Australia's success with the DRS: What the numbers say

The recently concluded India-Australia Test series was intriguing not only for the cricketing battle on the pitch, but also for the controversy that swirled around the Decision Review System (DRS). In the second Test, Steve Smith looked towards the dressing room for guidance on whether to review the on-field decision after he was given out, a moment he later attributed to a “brain fade.”

Cricket's rules clearly state that players cannot consult anyone other than those on the field while deciding whether to review an on-field decision. Indian captain Virat Kohli, however, insisted that this was not a one-off incident but something the visitors had been doing systematically during the Test match, and he all but accused Steve Smith of cheating.

To get some clarity on this issue, I ran a statistical analysis of the total number of reviews Australia have taken since Smith took over the captaincy in 2015, and of the ratio of reviews that were upheld versus struck down. Since Smith took charge, Australia have played a total of 21 Test matches, of which 8 were played in his first season (2015-16) and the remaining 13 in the current season, from July 2016 to the conclusion of the recent Test series in India (March 2017).

I analysed the ratio of the number of reviews that Australia got right (upheld) to the total number of reviews they requested (both upheld and struck down) across the two seasons. If there is any truth to the claim that Australia were systematically consulting the dressing room before reviewing an on-field verdict, one should expect them to show a significant improvement in this metric over time. Specifically, Australia should be getting significantly[1] more of their reviews right in the second season than in the first.

I removed the last two Test matches that Australia played (Ranchi and Dharamsala) from the statistical analysis, as the issue had become a huge talking point by then and Australian players were being scrutinised more closely whenever they considered a review. The key metric was the ratio of reviews Australia got right to the total number of reviews they called for, averaged over innings in each of the two seasons of Smith’s captaincy. I excluded innings in which Australia did not take any reviews.
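As a rough illustration of the metric described above, the sketch below computes a per-innings ratio of upheld reviews and averages it, skipping innings without reviews. It reflects one reading of the method, not the exact procedure behind the figures in this article; the function names and the sample innings are hypothetical.

```python
from statistics import mean


def innings_ratio(upheld, struck_down):
    """Share of reviews upheld in one innings, or None if no reviews were taken."""
    total = upheld + struck_down
    if total == 0:
        return None
    return upheld / total


def average_ratio(innings):
    """Mean of the per-innings ratios, skipping innings with no reviews."""
    ratios = [innings_ratio(upheld, struck) for upheld, struck in innings]
    ratios = [r for r in ratios if r is not None]
    return mean(ratios)


# Hypothetical innings, NOT real match data: (reviews upheld, reviews struck down)
sample_innings = [(1, 1), (0, 2), (0, 0), (1, 3)]
print(average_ratio(sample_innings))  # 0.25
```

The third innings has no reviews, so it is dropped and the average is taken over the remaining three ratios (0.5, 0.0 and 0.25).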

I find that in the 15 innings of his first 8 Test matches as captain, Australia’s average ratio was .13, compared to .37 in the 21 innings of his next 11 matches, implying that Australia got more of their reviews right in Smith’s second season than in his first. In percentage terms, the jump is about 185%. Further, this difference is statistically significant, with a probability value of .036: if there were no real difference between the two seasons, a gap this large would be expected by chance less than 4% of the time.
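The article does not name the test behind the probability value, so the following is only a sketch of one common way to compare two sets of per-innings ratios: a two-sided permutation test on the difference in means. The function name, the number of resamples and the ratios below are all hypothetical and are not the data used for this analysis.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the share of random relabellings whose absolute difference in
    means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_b) - mean(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        # Small tolerance so floating-point noise does not hide ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / n_resamples


# Hypothetical per-innings ratios, NOT the article's data.
first_season = [0.0, 0.0, 0.5, 0.0, 0.25]
second_season = [0.5, 0.33, 0.0, 1.0, 0.5, 0.25]
print(permutation_p_value(first_season, second_season))
```

A small returned value means that few random relabellings of the innings produce a gap as large as the observed one, which is the sense in which a difference is called significant.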

Now, if we look at Australia’s performance in their last two Test matches (3 innings), the ratio drops to .07, even lower than the average ratio of .13 in their first season. This is a substantial drop and stands in stark contrast to their average of .37 in the previous 11 Test matches. However, an adequate statistical analysis could not be performed for the last two matches, given the small sample size of three innings.

As a point of comparison, I also examined how other teams have performed over the same period. An alternative explanation for the above results could be that teams in general have learned over time and become better at challenging the on-field verdict.

I looked at the performance of England, New Zealand, Pakistan, South Africa and Sri Lanka on the same metric across the two seasons. India were not included, as they began using DRS only this year, after it was imposed by the ICC.

Team	Number of Innings in 2015-2016 season	Number of Innings in 2016-2017 season	Ratio of reviews upheld to total number of reviews in 2015-2016 season	Ratio of reviews upheld to total number of reviews in 2016-2017 season	Percentage Improvement
Australia	15	21	0.13	0.37	184.62
England	13	25	0.22	0.31	40.91
New Zealand	14	16	0.18	0.18	0.00
South Africa	8	20	0.32	0.29	-9.38
Sri Lanka	8	20	0.21	0.28	33.33
Pakistan	6	23	0.21	0.28	33.33

See the table above for each team’s ratio of successful reviews to total reviews across the two seasons. The results indicate that most of these teams have improved over the two seasons, but none of them has improved significantly, that is, with a probability value below .05. In fact, the team with the largest improvement within the comparison group was England (from .22 to .31), but this improvement of 41% is not statistically significant, as the probability value is only .38.
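The Percentage Improvement column can be recomputed from the two ratio columns as the change in ratio divided by the first-season ratio. The sketch below does this from the rounded ratios shown in the table, so it should reproduce the published figures up to rounding; the function and variable names are illustrative.

```python
# Ratios of reviews upheld, copied from the table: (2015-16, 2016-17).
ratios = {
    "Australia": (0.13, 0.37),
    "England": (0.22, 0.31),
    "New Zealand": (0.18, 0.18),
    "South Africa": (0.32, 0.29),
    "Sri Lanka": (0.21, 0.28),
    "Pakistan": (0.21, 0.28),
}


def percentage_improvement(before, after):
    """Relative change from the first season to the second, in percent."""
    return (after - before) / before * 100


for team, (before, after) in ratios.items():
    print(f"{team}: {percentage_improvement(before, after):.2f}%")
```

Because the change is divided by the first-season ratio, a low starting ratio inflates the percentage: Australia's jump looks especially large partly because their first-season ratio of .13 was the lowest in the group.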

Within this comparison set of other national teams, Australia’s improvement stands out as a statistical anomaly. Additionally, the drop in their review performance after the “brain fade” incident stands in sharp contrast to their performance in the previous 11 matches.

If Australia are to maintain the level of review performance they showed before the “brain fade” incident, they will need a significant improvement in their review hit rate in the Test matches that follow. However, if their review performance stays similar to that of the last two Test matches, significantly below their average in the second season, it would suggest a systematic shift in their decision-making process following Kohli’s accusations.

Although the data cannot pinpoint a reason, the statistical evidence on the outcome is clear: Smith’s (Australia’s) ability to call correctly for reviews in his second season, up to the Bangalore Test match, was significantly better than in his first season and better than that of his contemporaries.

[1] In statistical language, a significant difference here means a probability value below .05: if the two metrics were truly the same, a gap this large would arise by chance less than 5% of the time. A difference that does not clear this threshold is not treated as reliable, even though the two metrics may differ in their mean value or overall magnitude.