Batting averages: 99.94 and all that


Cricket's ultimate failure is a duck, and a duck is also the most common score in cricket. For batters in men's Test cricket, a tenth of all innings finish without a run, so failure is the most frequent outcome for a batter, yet this is not truly reflected in batting averages. The high rate of failure and low rate of success make the median, not the average, a better measure of a batter's effectiveness.

Figure: Batting innings in men's Test cricket, players with 2000+ runs. Collective innings for all batters in men's Tests who have scored 2000+ runs.

Successes are over-represented in batting averages. The collective average of all men's Test batters who have made 2000+ runs is an impressive 41.89, with individual averages bookended by two greats of the game: 17.32 (Shane Warne) at one end and 99.94 (Donald Bradman) at the other*. Batting innings are more likely to fail than succeed, as shown by the familiar exponential fall-off in the frequency of Test scores from zero upwards. Half of all innings end before 25 runs, and a quarter make eight or fewer, yet this is not reflected in the collective average of 41.89.

*Data retrieved 17th June 2021

Rare successes are over-represented in batting averages

With distributions like these, the average is expected to be biased by outliers, and this is exactly the case with Test innings. Sporadic, rare successful innings carry more weight than the far more frequent batting failures. For example, in this sample of 33,487 innings, just under a tenth (3118 innings, 9.3%) reach a score of a hundred or more; only 2.8% (938 innings) pass 150 and 0.97% (324 innings) go on to 200 or more.
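As a quick arithmetic check, the percentages above follow directly from the counts quoted in the paragraph. The helper name `share` is my own and is only there for illustration.

```python
def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100 * count / total

total_innings = 33_487  # sample size quoted above

# Innings reaching each threshold, as quoted above.
reached = {100: 3118, 150: 938, 200: 324}

for threshold, count in reached.items():
    print(f"{threshold}+ runs: {count} innings, {share(count, total_innings):.2f}%")
```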

A simple experiment highlights how much influence rare big scores have on averages compared to the more numerous failures. Taking all batters' innings, if the 2.8% of scores that reached 150+ are removed, the collective average (unsurprisingly) decreases. What is surprising is the size of the decrease: it drops by nearly six runs, from a respectable 41.69 to a more pedestrian 35.82 runs per innings. To complete the experiment, when the same number (2.8%) of randomly selected failures (innings of fewer than 10 runs) are removed, the average is basically unchanged, going from 41.69 to 41.70. So a small number of successful innings can change an average by six runs, whereas removing the same number of failures has little discernible effect.
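The shape of that experiment can be sketched in a few lines of Python. This is only a rough illustration: the scores are invented, the function names are my own, and it uses a plain mean over all innings rather than the batting average, so it will not reproduce the figures quoted above. It simply compares the two kinds of removal on a small skewed list.

```python
import random
from statistics import mean

# Invented scores for illustration only; NOT the Test data discussed above.
scores = [0] * 10 + [5] * 25 + [15] * 20 + [30] * 20 + [60] * 15 + [110] * 7 + [180] * 3

def drop_top(scores, k):
    """Return the scores with the k largest removed."""
    return sorted(scores)[: len(scores) - k]

def drop_random_failures(scores, k, threshold=10, seed=0):
    """Return the scores with k randomly chosen innings below `threshold` removed."""
    rng = random.Random(seed)
    failure_idx = [i for i, s in enumerate(scores) if s < threshold]
    removed = set(rng.sample(failure_idx, k))
    return [s for i, s in enumerate(scores) if i not in removed]

k = 3  # remove the same number of innings in both cases
baseline = mean(scores)
without_top = mean(drop_top(scores, k))
without_failures = mean(drop_random_failures(scores, k))

print(f"all innings:         {baseline:.2f}")
print(f"top {k} removed:       {without_top:.2f}")
print(f"{k} failures removed:  {without_failures:.2f}")
```

In this toy list, removing the three biggest scores moves the mean several times further than removing three failures does.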

Can the high weight of rare, successful innings be reduced by counting not outs as innings, which penalises averages? Here the mean = total runs / number of innings, whereas the average = total runs / (number of innings - number of not out innings).
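To make the two definitions concrete, here is a minimal sketch. The `Innings` class, the function names and the five-innings career are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Innings:
    runs: int
    not_out: bool = False

def batting_average(innings):
    """Total runs divided by the number of dismissals (not outs excluded from the count)."""
    dismissals = sum(1 for i in innings if not i.not_out)
    if dismissals == 0:
        return None  # undefined: the batter has never been dismissed
    return sum(i.runs for i in innings) / dismissals

def mean_score(innings):
    """Total runs divided by every innings, not outs included."""
    return sum(i.runs for i in innings) / len(innings)

# Invented career of five innings, two of them not out.
career = [
    Innings(0),
    Innings(12),
    Innings(45, not_out=True),
    Innings(7),
    Innings(100, not_out=True),
]

print(batting_average(career))  # 164 runs / 3 dismissals
print(mean_score(career))       # 164 runs / 5 innings
```

The two not outs keep their runs in the total but vanish from the divisor of the average, which is why it sits well above the mean for this made-up career.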

Not out, not fair: using mean scores in place of averages

A not out innings has a high price in cricket, but effectively little value for a team. Runs from a batter's not out innings are counted in their overall tally of Test runs, but the innings itself is erased from the count. Effectively, the innings no longer exists but the runs do. Ostensibly this allows for runs that would have been scored had the batter continued batting, but in no other aspect of cricket are phantom measures included in calculations of performance: a bowler is not awarded potential wickets, a 'keeper extra dismissals, or a captain more wins.

If an opener carries their bat through the innings, it is positive for the team: they will score a good number of runs, stabilise the innings to help others score runs too, and generally tire and demoralise the bowling team. Exceptionally, remaining not out can salvage a draw for your team in first-class and Test cricket. Being not out is often good for the team, but so is bowling uphill into the wind all day, or smashing a few runs while getting out cheaply. Unfortunately, bowling and batting averages do not capture this, but they do capture the fundamentals.

The mean score is one way to ignore not outs. For the mean, the total sum of runs is divided by the number of innings, which gives 38.15 for the top-scoring batters in men's Tests (compared to a batting average of 41.89). With batting averages, the presence of a few high scores produces an artificially large measure of batting performance. A similar pattern is seen when the mean is used, so it is not a panacea. Including not outs in the innings count makes sense, but the mean still has the same problem as the average: values are higher than they should be.

Median measures better reflect biased distributions

The median score of all innings for batters with 2000+ runs is 24. The median is the middle value of all innings scores placed in order, so it can overcome the problem of outliers artificially raising averages. Like the mean, the median does not treat not out innings as a special case. The median also only decreases by 1 when the top scores are removed, and increases by 1 to 25 when the lowest scores are removed, so it is much less influenced by successful outlier innings than the average or mean.
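The robustness of the median is easy to see on a small example. The sketch below uses an invented, skewed list of scores (not the Test data above) and trims the same number of innings from each end.

```python
from statistics import mean, median

# Invented scores for illustration only; NOT the Test data discussed above.
scores = [0] * 10 + [5] * 25 + [15] * 20 + [30] * 20 + [60] * 15 + [110] * 7 + [180] * 3

ordered = sorted(scores)
k = 3  # number of innings trimmed from one end

versions = {
    "all innings": ordered,
    "top scores removed": ordered[:-k],
    "lowest scores removed": ordered[k:],
}

for label, data in versions.items():
    print(f"{label:22s} mean={mean(data):6.2f}  median={median(data)}")
```

In this toy list the mean shifts with either trim, while the median stays at 15 throughout. The list contains many tied scores, which is why the median does not move at all here; in the real data it shifts by a run in each direction, as described above.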

Many others have discussed the problems with batting averages, and done so better (e.g., https://www.espncricinfo.com/story/kartikeya-date-the-calculus-of-the-batting-average-745791), and indeed the current treatment of not outs can be defended mathematically (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2117199). For me, as can be seen from the chart above, the median (which is not distorted by the outliers that are over-represented in the mean) is a better, more intuitive measure than the mean.

Winners and losers when alternatives to the mean are used

Does using a different measure really make any difference? It may be statistically fairer to use a median rather than a batting average, but that will matter little to anyone. Arguably, the biggest differences from using the median (or mean) in place of the average will be seen in batters with a high number of not out innings, so it could have practical effects. To me it makes more sense to use the median rather than the average, but I am not completely sold on the idea; below is a list of the top 100 batters (ranked by average) compared to their mean and median rankings.

The use of the median to measure batters may be statistically fairer, but the table below may offend cricket traditionalists. Bradman is still top, but his median batting "average" collapses to a less pleasing 56.5 from 99.94, although this is still over 10 runs per innings better than Ken Barrington in second place. Steve Smith falls to fifth, as Bradman, Barrington, and Hobbs make up the top three. 

Whatever measure is used, Bradman remains the greatest men's Test batter of all time. His sheer, ruthless volume of run accumulation is immune to any statistical pressure. Counter-intuitively, Bradman is also the biggest loser when the mean (which penalises not outs) is used rather than the average, as his runs per innings decrease from 99.94 to 87.45. This mostly reflects the fact that he is operating at almost a different scale to other batters, with an average/mean nearly twice as large as the next best.
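A small sketch shows how the choice of measure can reshuffle a ranking. The three batters and their scores are hypothetical, invented only to show the effect, and the helper names are my own.

```python
from statistics import median

# Hypothetical batters: each innings is (runs, not_out). Invented for illustration.
careers = {
    "Batter A": [(0, False), (10, False), (150, False), (20, True), (5, False)],
    "Batter B": [(30, False), (35, True), (40, False), (25, False), (45, True)],
    "Batter C": [(2, False), (60, False), (8, False), (70, False), (15, False)],
}

def average(innings):
    """Runs per dismissal; assumes the batter has been dismissed at least once."""
    dismissals = sum(1 for _, not_out in innings if not not_out)
    return sum(runs for runs, _ in innings) / dismissals

def mean_score(innings):
    """Runs per innings, not outs included in the count."""
    return sum(runs for runs, _ in innings) / len(innings)

def median_score(innings):
    """Middle score of all innings, not outs treated like any other."""
    return median(runs for runs, _ in innings)

def ranking(measure):
    """Batter names ordered from best to worst under the given measure."""
    return sorted(careers, key=lambda name: measure(careers[name]), reverse=True)

for label, measure in [("average", average), ("mean", mean_score), ("median", median_score)]:
    print(f"{label:8s}", ranking(measure))
```

With these made-up careers each measure produces a different order: the batter with two not outs leads on average, the one with a single huge score leads on mean, and that same batter drops to last on median.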

Figure: Difference in average and mean in batters' performance, men's Test cricket. Collective innings for all batters in men's Tests who have scored 2000+ runs.

The full breakdown of measures for the top 100 Test batters, ranked using the different statistics, is shown below for perusal at leisure.

 
