Research Evaluation

Research may be evaluated with respect to quality, quantity, and impact. However, measuring these criteria is far from trivial. The very few outstanding scientific achievements are easy to recognize and agree on, but for the large-scale evaluation of research, one must resort to secondary measures, that is, indicators.


Quality is typically judged by peer review, and the quality standards that must be met for an article to be accepted vary with the particular journal, conference, workshop, etc. Nevertheless, aspects of quality such as scientific soundness, originality, and relevance are difficult to measure objectively, and research on peer review shows that its reliability is relatively low. However, the research communities have so far not come up with a better alternative.


The quantity of the research of a person, group, or institution is typically measured in terms of the number of publications. In practice, quantity is often combined with a quality weighting. For example, the Norwegian "research counting" system distinguishes between Level 2 and Level 1 publications, the former weighted three times higher than the latter.
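The weighting idea above can be sketched in a few lines. This is only an illustration: the function name, the point values (1 for Level 1, 3 for Level 2, matching the three-to-one ratio stated here), and the division by the number of authors (mentioned later in this text) are assumptions; the real Norwegian system also distinguishes publication types such as journal articles, book chapters, and monographs.

```python
def publication_points(level, n_authors):
    """Illustrative scoring in the spirit of the Norwegian
    "research counting" system: a Level 2 publication weighs
    three times a Level 1 publication, and the score is divided
    by the number of authors (assumed simplification)."""
    weight = {1: 1.0, 2: 3.0}[level]
    return weight / n_authors

# A Level 2 article with three authors yields 1.0 point per author;
# a single-author Level 1 article also yields 1.0 point.
print(publication_points(2, 3))
print(publication_points(1, 1))
```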

In software and systems engineering, the Journal of Systems and Software (JSS) annually ranks scholars and institutions by the number of publications over the last five-year period in the seven journals that a group of senior research scientists consider the most important in the field. For the period 2003-2007, I was ranked among the top 15 scholars on JSS's list (out of a total of about 4000 scholars who had at least one article in the sample). The Software Engineering Department at Simula Research Laboratory was ranked number 1 in the world for the period 2003-2008, when I was its leader.


What really matters in the long run is the impact of the research, which in academia is typically indicated by citation measures. In software engineering, the transfer of research outcomes to industrial practice is an important criterion of impact, but it is difficult to measure. In the most recognized citation database, ISI Web of Science (WoS), I have 592 citations (August 2010), whereas Google Scholar gives me about 1500. Scopus lies somewhere in between.

To combine quantity, quality, and impact, the h-index is increasingly used. For a given set of publication sources (which indicates quality), the h-index is the largest number h such that at least h publications are cited at least h times each. For example, in ISI WoS, my h-index is 14, that is, at least 14 of my publications are cited at least 14 times. In Google Scholar, my h-index is 24.
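The definition above translates directly into a small computation. The following sketch assumes only a list of per-publication citation counts:

```python
def h_index(citations):
    """Return the h-index: the largest h such that at least h
    publications have at least h citations each."""
    # Sort citation counts in descending order, then advance the
    # 1-based rank as long as the count at that rank is >= the rank.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# A scholar with these citation counts has an h-index of 4:
# four papers are each cited at least 4 times, but not five papers
# at least 5 times.
print(h_index([25, 8, 5, 4, 3, 0]))  # → 4
```

Note that sorting first means the publications satisfying the condition always form a prefix of the ranked list, so the loop can stop at the first failure.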

The h-index is a reasonable indicator, but it is not well suited for comparison across disciplines. For example, the number of authors of a publication is not taken into account (as opposed to the Norwegian research counting measure, which divides the score by the number of authors). Computer science, with its tradition of relatively few co-authors, will therefore generally have lower h-indices than sciences with more co-authors. Another problem for computer science with respect to the h-index is that much of its research is published at conferences, which have only very recently been included in ISI WoS. Yet another problem with the h-index is that it simply accumulates all research published over the years, which favours people who have been in the game the longest. To indicate the present research level, it may be better to calculate the h-index for (say) the last 10 years. My ISI WoS h-index for 2001-2010 is 11.
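The time-windowed variant suggested above amounts to filtering publications by year before computing the index. A minimal sketch, assuming publications are given as (year, citation count) pairs:

```python
def windowed_h_index(pubs, start_year, end_year):
    """h-index restricted to publications from start_year through
    end_year inclusive. 'pubs' is a list of (year, citations) pairs
    (an assumed representation for illustration)."""
    counts = sorted((c for y, c in pubs if start_year <= y <= end_year),
                    reverse=True)
    # With counts sorted descending, the publications meeting the
    # h-index condition form a prefix, so counting them gives h.
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)

# Only the 2003 and 2005 papers fall in the 2001-2010 window,
# giving an h-index of 2 for that decade.
print(windowed_h_index([(2003, 10), (2005, 6), (2012, 3), (1999, 20)],
                       2001, 2010))  # → 2
```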

Concluding Remarks

There is obviously excellent research that is not captured by the measures described above. For example, my former MSc supervisor and later colleague, Kristen Nygaard, received the ACM Turing Award and the IEEE John von Neumann Medal for conceiving the fundamental concepts of object-oriented programming. Still, he had only a couple of high-profile publications (e.g., O.-J. Dahl and K. Nygaard: SIMULA - an ALGOL-based simulation language, Communications of the ACM 9(9): 671-678, 1966) and an h-index of 2. However, he was one of very few exceptional cases. (See also the obituary I wrote on him.) For the large majority of us, the measures described above are reasonable indicators of the level of our research, and we must accept that they will be used by funding bodies, employers, etc. who do not possess research competence in our specific areas.

© 2010   Dag Sjøberg, Department of Informatics, University of Oslo