
Fun with the h-index

The h-index is a widely used measure of a scientist's productivity and impact, somewhat more sophisticated than a simple count of publications. The h-index is the largest integer h such that the scientist has h papers each of which has been cited at least h times. If you're wondering how relevant the h-index is in practice, I have no way of telling in general. I know however that I've been on committees where the h-index evidently was an interesting point of reference for some of its members, and I have also been asked a few times what my h-index is. (Before you ask, according to SPIRES my h-index is either 14 or 16, depending on whether you count all or only published papers.) The absolute number isn't of much importance in most cases; what matters instead is how you compare to others in your particular field - as Einstein taught us, everything is relative ;-)
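The definition translates directly into a few lines of code. Here is a minimal sketch, assuming the citation counts are already at hand as a plain list of integers (how to obtain them from a database like Google Scholar or SPIRES is the harder, and as we'll see less reliable, part):

```python
def h_index(citations):
    """Largest h such that the author has h papers with >= h citations each."""
    # Sort citation counts in descending order, then find the last rank i
    # (1-based) at which the i-th most-cited paper still has >= i citations.
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Five papers cited 10, 8, 5, 4 and 3 times: four papers have
# at least 4 citations each, but not five with at least 5, so h = 4.
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

Note that the result depends entirely on the input list: feed the function inflated citation counts and it will happily report an inflated h-index, which is exactly the point of the spoof discussed below.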

Next time somebody asks for my h-index, I'll refer them to this hilarious paper by Cyril Labbé from the Laboratoire d'Informatique de Grenoble at the Université Joseph Fourier:

    "Ike Antkare, One of the Great Stars in the Scientific Firmament"
    22nd newsletter of the International Society for Scientometrics and Informetrics (June 2010)
    PDF here

Labbé has created a fictional author, Ike Antkare, and pimped Ike's h-index to 94. For this, Labbé created 102 "publications" using SCIgen, a piece of software resembling a dada-generator for computer science, and a net of self-citations among them. Labbé's paper contains an exact description of the procedure. His spoof works for tools that compute the h-index from Google Scholar's data; the best known of these is maybe Publish or Perish.

What lesson do we learn from that?

First, Labbé's method works mainly because he uses the h-index computed from a quite unreliable database, Google Scholar, to which it is comparably easy to add "fake" papers. While for example the arXiv database also contains unpublished papers, it does have some amount of moderation, which I doubt 102 dada-generated papers by the same author would get past. In addition, SPIRES offers the h-index for published papers only. (Considering however that I know more and more people - all tenured of course - who don't bother with journals, restricting to published papers only might in some cases give a very misleading result.)

Second, and maybe more importantly, I doubt that any committee faced with Ike's amazing h-index would be fooled, since it takes only a brief look at his publications to set the record straight.

Nevertheless, Labbé's paper is a warning not to use automatically generated measures of scientific success without taking a look at the results so obtained. Since the use of metrics to evaluate departments and universities is becoming more and more common, it's an important message indeed, and an excellent example of how secondary criteria (a high h-index) can deviate from primary goals (good research).

For more on science metrics, see my posts Science Metrics and Against Measure. For more on the dynamics of optimization in the academic system, and the mismatch between primary goals and secondary criteria, see The Marketplace of Ideas and We have only ourselves to judge each other.

Thanks to Christine for drawing my attention to this study.
