The *h*-index is a way to assess the impact of the published work of a scientist, in terms of citations. It is an attempt to get around more simple-minded citation measures and works rather well in the sense that the scientists whom you’d expect to have high *h*-indices, usually do; while scientists who happen to have published one or two high-impact papers, but have had otherwise unremarkable careers, won’t score very high.

Basically, the definition of the *h*-index is: it is the maximum number *h* such that the scientist has published *h* papers, each with at least *h* citations. That is, if I have published 10 papers that have each been cited 10 or more times, but have not published 11 papers which have each been cited 11 or more times, my *h*-index is 10.
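For concreteness, here is a minimal Python sketch of that definition. The function name `h_index` and the sample citation counts are made up for illustration; this is just one straightforward way to compute it.

```python
def h_index(counts):
    """Return the largest h such that at least h of the counts are >= h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank   # the top `rank` items all have at least `rank` citations
        else:
            break      # counts only get smaller from here, so we can stop
    return h


# Invented citation counts, purely for illustration.
citations = [25, 8, 5, 3, 3, 1]
print(h_index(citations))
```

Sorting first means the check at each rank only needs to look at one paper: if the paper at position `rank` has at least `rank` citations, so do all the papers above it.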

Based on a Facebook wisecrack by a friend (on the occasion of Sachin falling short of his 100th international 100), the thought occurred to me: how about ranking batsmen in cricket by an analogous score? That is, a batsman has an index *h* if, on *h* occasions, he has scored *h* or more runs. (As before, we take the maximum possible *h*.)

It turns out that the top five in this list (for Test cricket only) are basically the top five rungetters, in the same order: Sachin Tendulkar, Rahul Dravid, Ricky Ponting, Jacques Kallis, Brian Lara. The top 20 or so almost all appear in the top 20 list for rungetters. So it’s not very interesting — yet. Tendulkar’s *h*-score is 76 — that is, on 76 occasions he has scored 76 or more. There is a big gap between him and Dravid (69) but the others follow closely behind.

Suppose we modify it as follows: the *nh* index is that value of *h*, for a given *n*, such that on *h* occasions the batsman has scored *nh* or more runs. For example, the 10*h* index would be: if on 5 occasions I have scored 50 runs or more (and I have not scored 60 runs or more on 6 occasions) I have a 10*h* index of 5. For *n* > 1, basically, I am giving more importance to higher-scoring innings, and also benefiting those who played fewer matches (most older players played far fewer games than Tendulkar and can’t remotely approach either his career aggregate, or his *h*-score).
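A minimal Python sketch of the *nh* index, for anyone who wants to play with it. The function name `nh_index` and the list of innings scores are invented for illustration, and this is not necessarily how the numbers in this post were computed.

```python
def nh_index(scores, n=1):
    """Largest h such that on at least h occasions the score was >= n * h.

    Assumes n > 0. With n = 1 this is the ordinary h-index applied to innings.
    """
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= n * rank:
            h = rank
        else:
            break  # the threshold n * rank keeps rising while scores keep falling
    return h


# Invented innings scores, purely for illustration.
innings = [50, 62, 55, 70, 51, 12, 0, 33]
print(nh_index(innings, n=10))  # five scores of 50+, but not six of 60+
print(nh_index(innings, n=1))
```

With these made-up scores the 10*h* index is 5, matching the worked example above: five innings of 50 or more, but not six of 60 or more.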

What is the 10*h* ranking of batsmen, then? It turns out to be substantially different. The top 6 batsmen are now DG Bradman, Lara, Tendulkar, V Sehwag, KC Sangakkara and DPMD Jayawardene. Bradman scored 180 or more on 18 occasions; Lara’s 10*h* score is 17, Tendulkar’s is 16, and the other three get 15 each. Also in the top 10 (well, tied for 10) is Gary Sobers, who ranks quite far below both in career aggregate and in *h*-index. Immediately after him is Wally Hammond, who drops off today’s lists in the aggregate as well as the above *h*-index.

Specifically, the top-20 list goes like this:

| h | Batsman | Score (at least) |
|---|---|---|
| 18 | DG Bradman | 185 |
| 17 | BC Lara | 178 |
| 16 | SR Tendulkar | 160 |
| 15 | V Sehwag | 164 |
| 15 | KC Sangakkara | 152 |
| 15 | DPMD Jayawardene | 150 |
| 14 | SR Waugh | 150 |
| 14 | RT Ponting | 150 |
| 14 | R Dravid | 148 |
| 14 | JH Kallis | 148 |
| 14 | GS Sobers | 145 |
| 13 | WR Hammond | 140 |
| 13 | SM Gavaskar | 147 |
| 13 | ML Hayden | 131 |
| 13 | Javed Miandad | 145 |
| 13 | IVA Richards | 135 |
| 13 | GC Smith | 133 |
| 13 | G Kirsten | 133 |
| 12 | Zaheer Abbas | 126 |
| 12 | Younis Khan | 126 |

As you may expect, a 5*h* ranking sort of interpolates these: Tendulkar now tops again, with Bradman and Lara tied next (and close behind). Sobers and Hammond continue to rank high.

While it is always difficult to rank batsmen from different eras, it seems to me that any list of all-time-great batsmen must put Bradman at or near the top, and must include Sobers, Hammond, Sunil Gavaskar, Vivian Richards, Zaheer Abbas, Javed Miandad and other past greats in the top 20. The *nh*-index seems to do this, for suitable choices of *n*. But what is the optimal choice?

(This is based on raw batting data downloaded from cricinfo.)

UPDATE 07 Dec 2011: Gangan Prathap points, in the comments, to this 2010 paper, by him, where he proposes a “mock h index” (different from my nh-index above); his scheme ranks Bradman above four top Indian batsmen, but does not consider other international greats.

## Bala

/ November 28, 2011

An excellent post, as usual, and a great idea (the h-index for batsmen). As for the optimal value of n, resisting the temptation to say “42” (which, as we know, is the ultimate answer to all deep questions), I’d still say “1”. So we’re back to the original h index, and Tendulkar is numero uno once again, and all is right with the world:-)

## Rahul Siddharthan

/ December 1, 2011

Bala – so, just for fun, I looked at the numbers with n=42. Now Bradman, Sehwag and Lara are on top (6 times they have scored 6*42=252 or more), followed by Abbas, Hammond, Dravid, Miandad and Jayawardene on 5. Sachin, with a bunch of others, is on 4.

I disagree that 1 is optimal (in fact, I think it is not optimal even in publishing) but it would be an interesting exercise to work out, and justify theoretically, what is optimal in a given field.

## Bala

/ December 1, 2011

I figured n = 1 might be optimal only in a vague hand-waving way, not after any real thought. It was more that I couldn’t see any obvious natural ‘scale’ to fix n, and so fell back on 1. Maybe one should go nonlinear and try n = sqrt{h} or something. As for publishing, I think there IS no optimal measure that is anywhere near universally applicable. After all, there are as many motivations and styles of doing science as there are scientists themselves. Of course one speaks of impact, but I think that is a corollary of our present-day myopic impatience with things like perspective and the big picture. A milieu that increasingly regards yesterday as ancient history can’t imagine waiting for a decade or more to assess the real impact of anything.

## Rahul Siddharthan

/ December 1, 2011

Yes, the question of a natural “scale” is important. Also important is the quality of the opposition (in cricket), the citing papers (in research), etc., which is ignored by the h-index and this generalisation. People have come up with more sophisticated metrics in the literature, but none have the simplicity of the h-index. I agree this is rather unreliable for recent research, and it may also be unfair to some older research, but that is why people talk about “impact” metrics, not “quality” metrics. The problem comes when impact is confused with quality. All things being fair, I believe that high-impact papers will, 99% of the time, be high-quality; but of course people are gaming the system now and not all highly-cited papers are actually so worthwhile.

## Alpan

/ December 1, 2011

Interesting. Have you tried doing this for bowlers (i.e., largest h such that the bowler has taken nh wickets in h matches)?

## Rahul Siddharthan

/ December 1, 2011

No, I haven’t tried. Given that the maximum any bowler has ever taken in a match is 19, while many bowlers have played over 100 matches, I’d guess the choice of n needs to be much less than 1. I was thinking of trying other sports, including some that I’m not familiar with, just to see how it does in an unbiased situation (i.e., in cricket I believe Bradman should be on top, but in baseball, say, I’d have no clue, but then I’ll need a baseball expert to comment on the results).

## CIngram

/ December 2, 2011

Wonderful idea. The ideal number from my point of view (I’m an Essex boy) would be n: Graham Gooch = nº 1, but I think we’d have to go non-linear (as Bala suggests) and then cheat a bit to get it to work ;-) It would be easier I think to make Jack Hobbs top for all first-class cricket.

But I wouldn’t argue with Tendulkar as nº 1. And even less would I argue with the Don. I remember reading, as a very young boy, the story of his last test series in England (told by I can’t remember who). The drama of the narrative was compelling. Reading the lines I could see before me this greatest of all batsmen doing impossible things before proving that he was just, only just, on our side of immortality. I wanted to watch him play, and couldn’t understand why my father said it wasn’t possible because he’d retired many years before. Time means nothing to a six-year-old.

It would be good to come up with something similar for bowlers, and then perhaps to get the ICC, or at least Cricinfo, interested in the idea.

## Rahul Siddharthan

/ December 5, 2011

Heh. It will take a lot more work before I’d consider getting the ICC or even Cricinfo interested!

## sunny

/ December 4, 2011

Can you please try 16.

Here is how I calculated it. Average cricket career is 8 years. 2 Big test scores a year. I think that can be taken as good output from a batsman.

## Rahul Siddharthan

/ December 5, 2011

Ok, here you go: Bradman (13), Lara (12), Sehwag/Tendulkar/Sangakkara (11), Hammond/Gavaskar/Dravid/Miandad/Jayawardene (10), and about 20 batsmen on 9. Not a bad argument, by the way. I’d have tried to fix the ideal “high score” first (centuries are too common, double-centuries perhaps too rare) and the number of times that score should be reached by a really good batsman. So I’d have thought a really good batsman should get, say, 150 about 10 times in his career. That leads to n=15 (almost the same as what you ask), with Bradman still on top, Lara still 2nd, Sangakkara now alone 3rd, and a bunch of others tied below.

## Alpan

/ December 7, 2011

Ok, here’s a really geeky approach to optimize n: a maximum entropy algorithm. Clearly if n is too large, everyone will have an h-index of 0, i.e., the same value. Similarly, if n is too small, everyone’s h-index will be approximately equal to the number of matches they have played. If we normalize by number of matches, again everyone will have almost the same h-index. Thus, an optimal value of n is one which most differentiates between batsmen. So my recommended (greedy) algorithm would run as follows:

For each n, compute the normalized h-index for all (or a large subset of) batsmen; from the normalized h-index distribution compute the entropy; end. Pick the n value that corresponds to the largest entropy.

I don’t have the data, so not sure how well this works!

## Rahul Siddharthan

/ December 7, 2011

Alpan, what do you mean by “normalised h-index”? The sum of h over all batsmen should be 1? That doesn’t really make sense to me.

One could calculate the probability of a random batsman having h, for each value of h (not normalised): then the sum over h of p(h) would be 1 and one can calculate and maximise the entropy of this distribution. That is not hard to do, but it turns out that the maximum entropy value of n is 1 (or 0, if we allow 0).
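For anyone who wants to try this, here is a minimal Python sketch of that entropy calculation. It is only an illustration under stated assumptions: the function names (`nh_index`, `entropy_of_h`, `max_entropy_n`) and the toy careers are invented, p(h) is estimated as the fraction of batsmen with that h, and the entropy is the plain Shannon entropy in nats. It is not the code behind the result quoted above.

```python
from collections import Counter
from math import log


def nh_index(scores, n=1):
    """Largest h such that on at least h occasions the score was >= n * h (n > 0)."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= n * rank:
            h = rank
        else:
            break
    return h


def entropy_of_h(careers, n):
    """Shannon entropy of the empirical distribution p(h) over batsmen, for a given n."""
    hs = [nh_index(scores, n) for scores in careers.values()]
    total = len(hs)
    counts = Counter(hs)  # how many batsmen share each value of h
    return -sum((c / total) * log(c / total) for c in counts.values())


def max_entropy_n(careers, candidates):
    """Candidate n with the largest entropy (first one wins on ties)."""
    return max(candidates, key=lambda n: entropy_of_h(careers, n))


# Invented careers (lists of innings scores), purely for illustration.
careers = {
    "A": [120, 95, 80, 64, 33, 10],
    "B": [250, 201, 15, 7],
    "C": [45, 44, 43, 42, 41, 40, 39],
    "D": [12, 9, 3],
}
for n in (1, 5, 10, 50):
    print(n, round(entropy_of_h(careers, n), 3))
print("max-entropy n among candidates:", max_entropy_n(careers, [1, 5, 10, 50]))
```

When n is very large every batsman ends up with h = 0, so the distribution collapses to a single value and the entropy is zero, which is the degenerate case Alpan describes.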

## Gangan Prathap

/ December 7, 2011

Is there a place for a mock h-index?

Gangan Prathap

Scientometrics (2010) 84:153–165

Although the h-index was introduced in a publication-citation context, it can easily be applied to other source-item relations, as was done for library classification (Liu and Rousseau 2009). An obvious application is its application to sports. It is usual to use averages, e.g., batting average = C/P, where C is the total number of runs scored in a career of P innings played, say, borrowing an example from cricket. This is meaningful, where the distribution is Gaussian. However, if it is Lotkaian or Paretian, as is often the case, a composite measure of correction for quality like (C²/P)^(1/3) may be profitably used to rank such performers. Figure 3 shows an example compiled quite tediously a few years ago showing the performances of four leading Indian batsmen and how they compared with a legendary great, Sir Donald Bradman of Australia. As seen, the distribution is identical to what will obtain in bibliometrics, with tall cores and long tails. If a conventional h-index evaluation is done, then Tendulkar ranks as the best in this list, and Bradman ranks last, only because in those days, cricketers got to play fewer games in an active career (Table 6). No self-respecting cricket enthusiast will accept such a conclusion. The h-index procedure fails to recognize the very tall core and the very short tail implied by Bradman’s career. However, the mock h procedure captures this very faithfully, restoring Bradman rightfully to the top of the list.

## Rahul Siddharthan

/ December 7, 2011

Thanks – your mock h looks very interesting. But it would have been nice if you had looked at a few more batsmen, at least the rest of the current top 5, i.e. Lara, Ponting, Kallis. Also, your h-score for Bradman (44) looks different from what I get (48): how did you calculate it? For Gavaskar we agree, for Ganguly I get 51 where you have 47. Tendulkar and Dravid are, of course, still playing (and Dravid has had a good run since you published your paper) so some difference is expected.