Data Science for Cricket
Data science, ML and AI are the flavors of the season
all over the world, and India is no exception. Interest in these
techniques is at a fever pitch; academicians and industrial practitioners alike
are exploring the fundamental underpinnings and application potential of these
techniques. As a result, data science is being evaluated for
implementation in all imaginable fields and sub-fields. Sports is
one such area with tremendous potential for data science
applications. While sports analytics has been around for a long time,
sabermetrics being a prime example, the scale of what can be achieved now with
analytics is quite stupendous. Disparate data from various sources
can be collected and analyzed towards making decisions regarding almost all
aspects of any sport. Take cricket, which is the focus of this blog,
as an example. Ball-by-ball commentary, video feeds and bats that are
actual IoT devices producing fascinating information can all be merged
together for analysis. This blog describes a data science tool for
cricket. While one would usually associate sports analytics with tools that can
be used to improve player performance or game outcomes, this effort aimed at
enriching the spectator experience. The aim was to go beyond traditional statistics
by bringing in multiple layers of nuanced analysis that heighten
sports lovers’ engagement with the game.
This particular effort was a result of
ESPNcricinfo’s interest in rationally addressing the two talking points that most
engage spectators. The first is the impact of luck
on the result of a game. The second is how important a particular
performance is to the outcome of a game or, equivalently, how to quantify the
inherent value of a performance, taking into account as much game context as
possible. Sports lovers will surely have taken part in
arguments centered on these two themes. At the end of these
discussions, usually highly energetic and contentious, one is more often than
not left with a sense of dissatisfaction or non-closure. ESPNcricinfo
wanted to arm the debaters with a little more data-based reasoning to bolster
their arguments. In other words, the effort was to mathematically frame these
questions, albeit not from a viewpoint of definitive answers (which obviously
is not possible), but from a spectator engagement perspective. At
this juncture, many of you reading this blog may chuckle at the
foolhardiness of such an exercise. You might wonder how one would be able to
quantify a fundamentally abstract concept such as luck, especially in a debate
where the participants have pre-conceived notions and favorite
teams. Let us pause here for a moment to enunciate the fundamental
guiding principle that underlies this work. Clearly, there is no
unique best approach to address this problem. Rather, it is
desirable that any approach that one develops satisfy the following two
requirements. One, consistent application of the same approach to
all scenarios, leading to “apples-to-apples” comparisons, should be possible and
two, it should make “cricketing sense”. At this point, there might
be dismay that we are trading one abstraction, “quantify luck” for another,
“cricketing sense”. If one were to think a little more carefully, it
would become apparent that this is not as arbitrary a concept as it may seem at
first encounter. Assume that you make a statement about some
cricketing situation to a room full of spectators. If a large
majority of the room agrees with the statement, then the statement is deemed to
make cricketing sense. Of course, how one practically realizes this
(we cannot assemble a room full of spectators and poll them) is a tricky
question; in our case, a panel of ESPNcricinfo experts painstakingly checked
if the algorithm results make cricketing sense for the large number of games
that we tested on. Essentially, the underlying idea is some form of
“majority is right”, however small the sample may be. We apply this
principle to considerably more important things in life than
cricket. The result of a presidential-style election, however important
the country may be on the world stage, cannot be viewed as the right answer (in any
mathematical or practical sense) but as an answer that is delivered by a
majority (in some cases, not even that).
Let us return to the first problem at
hand. Given a score-card of a game that is completed, how can we
quantify the impact of luck on the game? In rather short order one
would realize that just a scorecard is not enough information. In
this project we decided to work with information at a higher level of granularity,
which is the easily available, “ball-by-ball”
commentary. Ball-by-ball commentary is, in general, unstructured
data; however, some structuring of this data and a resultant database were
already available with ESPNcricinfo. When we broke the problem of quantifying
luck down further, several more questions emerged:
1. What are luck events?
2. Do these events affect the batsman and bowler in the same manner? In essence, is it zero-sum?
3. Is the impact of a luck event on the batting or bowling team the same as the impact on the batsman or bowler?
4. How would one quantify the impact of disparate luck events in an apples-to-apples fashion anyway?
5. What is the cumulative impact of all the luck events on the two teams? How does one account for the luck events in both innings together?
Of course, at this point, answering all these questions looks like a formidable endeavor, and a comprehensive solution might be elusive. In searching for a data science approach, our first decision was to enumerate a reasonably comprehensive list of luck events. A list of such luck events is shown in the figure above.
It can be seen that
we have a hierarchical arrangement of the luck events. At the highest level of
hierarchy, we have dismissal and non-dismissal events. Dismissal events are
further categorized into replacement and reinstatement events. There
are multiple luck events under each of these final nodes of this
classification. We report an illustrative list here; the actual
application considers many more events. The non-dismissal
events have a reasonably simple logic for their run impact computations.
Replacement events are those in which, in the alternate situation, the batsman
would have been replaced by another one. In contrast, reinstatement events
are those in which a batsman has been given out unluckily, and one has to imagine
what would happen to the scorecard if the batsman were reinstated. This
hierarchical arrangement and the identified luck events answer our first
question in the enumerated list of five questions.
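The hierarchy described above maps naturally onto a small data structure. The sketch below is illustrative only: the three leaf categories mirror the text, but the event codes, the two events other than “Catch dropped”, and the placement of each event in the tree are our assumptions, not ESPNcricinfo's actual list.

```python
from dataclasses import dataclass
from enum import Enum


class LuckCategory(Enum):
    """Leaf nodes of the luck-event hierarchy."""
    NON_DISMISSAL = "non-dismissal"
    DISMISSAL_REPLACEMENT = "dismissal/replacement"
    DISMISSAL_REINSTATEMENT = "dismissal/reinstatement"

    @property
    def is_dismissal(self) -> bool:
        # Replacement and reinstatement events both sit under the dismissal node.
        return self is not LuckCategory.NON_DISMISSAL


@dataclass(frozen=True)
class LuckEvent:
    code: str          # hypothetical short code used when annotating a delivery
    description: str
    category: LuckCategory


# Illustrative events only (assumed placements); the real application has many more.
EVENTS = [
    LuckEvent("DROP", "Catch dropped", LuckCategory.DISMISSAL_REPLACEMENT),
    LuckEvent("BAD_OUT", "Batsman wrongly given out", LuckCategory.DISMISSAL_REINSTATEMENT),
    LuckEvent("EDGE4", "Edge runs away for four", LuckCategory.NON_DISMISSAL),
]


def events_by_category(events):
    """Group events under their leaf node of the hierarchy."""
    tree = {}
    for event in events:
        tree.setdefault(event.category, []).append(event)
    return tree
```

A dropped catch is treated here as a replacement event because, had it been held, the next batsman would have come in; a wrong dismissal is a reinstatement event because the luck-removed scorecard must imagine the batsman continuing.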
The table above is
developed for all the luck events (the table shows only a subset of events).
These tables, only one of which is shown here, allow us to answer questions 2
and 3: for each luck event, they record whether it affects the batsman, the
bowler and the respective teams, and how, with the correct interpretation of
positive or negative luck. Y stands for impact and N stands for no impact in
luck computations. Using these tables, we also address the difference between
luck and skill to some extent. Consider the event “Catch dropped”: when a
regulation catch is dropped, the bowling team is not unlucky; rather, it has
failed to execute a basic skill (hence the N entry). Now armed with this
formalism, one can then proceed to answer question
4, which is the identification of a quantifying metric for these luck events
that will make commensurate comparisons possible. The most obvious
quantifying metric is the run impact of the luck event. This would
allow luck events to be compared on an equal footing. This
necessitated the development of a core data science module that can predict
future runs that will be scored from any given situation in a game. This was
named the forecaster.
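One row of such a table can be encoded as signed flags and used to distribute a luck event's run impact among the four parties. This is a hypothetical sketch: only the bowling-team N for a dropped regulation catch comes from the text; the other three entries, the +1/-1/0 encoding and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ImpactRow:
    """One row of a luck-impact table: +1 lucky, -1 unlucky, 0 = 'N' (no impact)."""
    batsman: int
    bowler: int
    batting_team: int
    bowling_team: int


# Hypothetical row for a dropped regulation catch. Only bowling_team = 0 (the
# 'N' entry) comes from the text; the other three values are assumptions.
DROPPED_REGULATION_CATCH = ImpactRow(batsman=+1, bowler=-1, batting_team=+1, bowling_team=0)


def attribute_luck(row: ImpactRow, run_impact: float) -> Dict[str, float]:
    """Spread one event's run impact (e.g. from the forecaster) over the four parties."""
    return {
        "batsman": row.batsman * run_impact,
        "bowler": row.bowler * run_impact,
        "batting_team": row.batting_team * run_impact,
        "bowling_team": row.bowling_team * run_impact,
    }


def players_zero_sum(row: ImpactRow) -> bool:
    """Question 2: does the luck cancel out between batsman and bowler?"""
    return row.batsman + row.bowler == 0


def teams_zero_sum(row: ImpactRow) -> bool:
    """Question 3: does the luck cancel out between the batting and bowling teams?"""
    return row.batting_team + row.bowling_team == 0
```

Under these assumed flags, the event is zero-sum between batsman and bowler but not between the two teams, which is exactly the kind of distinction questions 2 and 3 are meant to capture.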
The basic
mathematical problem is: given the score at the end of the n-th over
(runs scored and wickets fallen), how does one predict the score at the end of the (n+k)-th over? Initially,
since this looks like a natural time series problem, we used a recurrent neural
network architecture for this prediction. However, there were
difficulties with this approach, largely related to data requirements and
explainability. We also could not explore this solution fully, given
the incredibly short time that we had (3 months) to go from a blank page
all the way to a deployed application on the ESPNcricinfo
website. It would be interesting to revisit this with more data and
deeper (figuratively and literally) architectures. Nevertheless, we abandoned
this approach and moved on to an operations research approach, with machine
learning models as required. Here, from a given situation, there are
a certain number of balls remaining to be bowled (resource), and these need to
be allocated to the remaining batsmen (allocation). We solve this
resource allocation problem based on multiple statistical parameters derived
from the data. Once this problem is solved, to predict the score
after the (n+k)-th over we need to predict the strike rates of the
batsmen who will play out the allocated balls. Here, we use
different machine learning models with self-correction abilities, trained on
data for all the batsmen in the database. These models take several factors
into account and are conceptually extendable to include other factors in
the future. In our experience, the most accurate machine learning model
depends on the format of the game (T20, ODI). This prediction module can
then be used to compute the impact of a luck event: the score prediction
algorithm is run on both the actual situation and the luck-removed alternate
situation, and the difference in the predicted scores quantifies the luck
impact. Though not used in luck computations, a model for the probability of a
result (win or loss) for each team was also developed, based on the forecaster
and historical data. There are also other nuances, such as post-game and live
luck computations, that are not discussed here for
brevity. Further, the computations were carefully designed so that these impact
numbers could be aggregated to address question 5 in the list of questions.
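The forecaster and the luck computation can be sketched in a few lines. This is a deliberately simplified illustration, not the deployed algorithm: the greedy ball allocation, the hypothetical per-batsman `expected_balls` parameter and the fixed strike rates (standing in for the machine learning models) are all assumptions. A luck event's impact is the forecast for the actual situation minus the forecast for the luck-removed one; since every impact is in runs, we assume a team's impacts can simply be summed across both innings for question 5.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Batsman:
    name: str
    strike_rate: float     # runs per 100 balls; stands in for an ML model's prediction
    expected_balls: float  # hypothetical: balls this batsman is expected to face before dismissal


@dataclass
class Situation:
    runs: int                      # runs scored so far
    balls_remaining: int           # the resource to be allocated
    batting_order: List[Batsman]   # not-out batsmen, in the order they will bat


def allocate_balls(balls_remaining: int, batsmen: List[Batsman]) -> List[float]:
    """Greedy allocation: each batsman in turn faces up to his expected balls."""
    allocation = []
    left = float(balls_remaining)
    for batsman in batsmen:
        share = min(batsman.expected_balls, left)
        allocation.append(share)
        left -= share
    return allocation  # any balls left over go unused (side bowled out)


def forecast(situation: Situation) -> float:
    """Predicted final score: current runs plus runs off the allocated balls."""
    allocation = allocate_balls(situation.balls_remaining, situation.batting_order)
    extra = sum(balls * b.strike_rate / 100
                for balls, b in zip(allocation, situation.batting_order))
    return situation.runs + extra


def luck_impact(actual: Situation, luck_removed: Situation) -> float:
    """Run impact of a luck event: actual forecast minus luck-removed forecast."""
    return forecast(actual) - forecast(luck_removed)


def net_luck(team_impacts: List[float], opponent_impacts: List[float]) -> float:
    """Assumed aggregation over both innings: net runs of luck in a team's favor."""
    return sum(team_impacts) - sum(opponent_impacts)
```

For a dropped catch (a replacement event), the luck-removed situation would simply omit the reprieved batsman from `batting_order`, so the next batsman takes his share of the remaining balls.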
Now that the luck
events are enumerated, each delivery bowled can be annotated with a luck
code. This necessitated that the database be altered to include as
many columns in the table as there are luck events. As a commentator
is providing text commentary, he or she will also score the presence or absence
of luck events for each of the deliveries. The default value is
zero, which signifies absence of the luck event; this ensures that online
scoring of luck events is simple and efficient. Traditionally,
ESPNcricinfo had not been scoring these luck events, and hence considerable effort
went into retrospectively scoring a selected set of matches for luck events through
manual curation of the commentary. In some cases, the
original match footage had to be revisited for this annotation
exercise. A set of 50-odd games was annotated for luck events and
then used to benchmark and evaluate the appropriateness of the algorithms that
were developed.
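A minimal sketch of the per-delivery annotation scheme, using Python's built-in `sqlite3` module purely for illustration: one integer column per luck event, defaulting to 0, so the commentator only marks events that actually occurred. The table name, column names and event codes are hypothetical; ESPNcricinfo's actual schema is not described here.

```python
import sqlite3
from typing import Iterable

# Hypothetical luck-event codes; the real table has one column per luck event.
LUCK_EVENTS = ["catch_dropped", "wrongly_given_out", "edge_for_four"]


def create_deliveries_table(conn: sqlite3.Connection) -> None:
    """One row per delivery, one 0/1 column per luck event (default 0 = absent)."""
    luck_columns = ", ".join(f"{code} INTEGER NOT NULL DEFAULT 0" for code in LUCK_EVENTS)
    conn.execute(
        "CREATE TABLE deliveries ("
        "match_id INTEGER, innings INTEGER, ball TEXT, commentary TEXT, "
        f"{luck_columns})"
    )


def record_delivery(conn: sqlite3.Connection, match_id: int, innings: int,
                    ball: str, commentary: str, luck_codes: Iterable[str] = ()) -> None:
    """Insert one delivery; the commentator names only the luck events that occurred."""
    luck_codes = list(dict.fromkeys(luck_codes))  # drop duplicates, keep order
    unknown = set(luck_codes) - set(LUCK_EVENTS)
    if unknown:  # also guards the column names interpolated into the SQL below
        raise ValueError(f"unknown luck events: {sorted(unknown)}")
    columns = ["match_id", "innings", "ball", "commentary", *luck_codes]
    values = [match_id, innings, ball, commentary, *([1] * len(luck_codes))]
    placeholders = ", ".join("?" for _ in values)
    conn.execute(
        f"INSERT INTO deliveries ({', '.join(columns)}) VALUES ({placeholders})",
        values,
    )
```

Because every luck column defaults to 0, an ordinary delivery needs no extra input from the commentator, which is what keeps online scoring simple.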
We also developed
algorithms for identifying the inherent value of different performances – a
suite of algorithms collectively called smartstats. Here, the key idea is to
value performances based on a notional pressure felt by a batsman or bowler
when they are performing. Performances in high pressure situations are valued
more than the ones where the pressure is minimal. The pressure that we feel
(and presumably the players also feel similar pressure) while watching the game
is directly related to the scoreboard pressure. To capture this, the difference
between the predicted score and the target is mathematically transformed into a
value for pressure. The first innings pressure is calculated based
on a notional target, akin to the par score that teams batting first usually
target. This instantaneous pressure is used to appropriately increase or
decrease the runs scored off every ball. From these adjusted values, the
algorithm constructs an alternate scorecard from which smart strike rates and
other smart statistics can be derived.
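The pressure weighting can be sketched as follows. The blog does not give the actual transform, so the `tanh` mapping and its `scale` parameter below are purely hypothetical; they only preserve the qualitative behavior described: a side projected to fall short of its target (or of a notional par score in the first innings) is under more pressure, and runs scored under pressure count for more.

```python
import math
from typing import Iterable, Tuple


def pressure(predicted_score: float, target: float, scale: float = 20.0) -> float:
    """Hypothetical map from projected shortfall to a pressure weight in (0, 2).

    Equals 1 when the forecast exactly meets the target, rises above 1 when
    the batting side is projected to fall short, and drops below 1 when it is
    projected to exceed the target.
    """
    shortfall = target - predicted_score
    return 1.0 + math.tanh(shortfall / scale)


def smart_runs(balls: Iterable[Tuple[int, float]], target: float) -> float:
    """Sum runs off each ball, weighted by the pressure when it was bowled.

    Each item is (runs_off_ball, predicted_score_before_ball). In the first
    innings, pass a notional par score as the target.
    """
    return sum(runs * pressure(predicted, target) for runs, predicted in balls)
```

Summing the weighted values ball by ball for each batsman gives one entry of the alternate scorecard described above.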
We
will now look at some of the results from our suite of algorithms for the IPL
2019 season and the recently concluded ODI World Cup. We sample some
interesting results and describe them briefly. One of the first successes of
the forecaster tool at the ODI World Cup came in a game between South Africa
and Bangladesh. The forecaster predicted a final score of 335 for Bangladesh
after 25 overs, and they went on to make 330 at the end of the innings. This was
one of the forecaster's early predictions of a score above 300.
In another
Bangladesh match featuring West Indies, the forecaster gave a thumbs-up for
Bangladesh by the halfway mark with a win percentage of about 63%. At this
point in the game, Bangladesh still had about 160 runs to score with three top
order batsmen gone. It turned out that the forecaster was right and the game
ended in Bangladesh’s favor. Of course, there are also cases where the
forecaster’s predictions didn’t turn out to be as accurate.
In terms of the luck index, there were several interesting results throughout the IPL season. Here, we point out a consolidated result on the overall impact of luck as judged by the algorithms. Below, you will see two tables, one with the actual standings at the end of the league games, where MI, CSK, DC and SRH were the top four teams and moved on to the play-offs. If we were to remove all luck events from all the games, our algorithms predict that RR would have replaced CSK and gone on to the play-offs. Whatever you make of this result, one thing is for sure: this table will not make us popular with CSK fans (no lucky guesses needed here!!).
Let us look at some
results from the smartstats algorithms that were developed. We describe two
prototypical results here, one for batsmen and one for bowlers. Let us look at
what smartstats says about the performances of KL Rahul and M Agarwal in a KXIP
v MI match. The pressure was high (required run rate was over 10) when Mayank
came out to bat. Mayank scored 43 off 21 balls and turned the match in Punjab's
favor. During his partnership with Rahul, Mayank scored the bulk of the runs at
a high strike rate and reduced the pressure of the required rate on Rahul and
the other batsmen to follow. Though Rahul scored 28 runs more than Mayank,
Mayank scored 1 smart run more than Rahul in the innings, as judged by the
smartstats algorithms (shown below).
From a bowling
viewpoint, let us look at the performances of Axar Patel and Sandeep Sharma in
a KXIP vs RCB game (IPL 2017). Sandeep Sharma and Axar Patel took three
wickets each for Punjab. However, while Axar took the wickets of Shane Watson,
Pawan Negi and Samuel Badree, Sandeep was the bowler who derailed RCB's chase with
the wickets of Chris Gayle, Virat Kohli and AB de Villiers inside the Powerplay.
Sandeep's three wickets were worth 4.86 smart wickets; Axar's three were
worth 2.85.
One of the fun
aspects of this work has been the feedback from fans who followed the IPL and the
ODI World Cup on the ESPNcricinfo website. Here is a representative collection of
comments from the website. The first commenter has words of
encouragement for the forecaster, and another is impressed by the forecaster’s
early, precise prediction.
In the comment “has
Forecaster seen this Sri Lanka team even bat” the commenter seems to be
skeptical of the forecaster’s prediction of a big score for Sri Lanka. It
turned out that the forecaster’s prediction in this case was quite accurate in
the end. The comment that follows shows that fans are interested in
having direct access to the forecaster tool.
In summary, it was an incredible experience working at the intersection of
data science and cricket, both of which are exciting domains. Let me end this
blog with an answer to an interesting question that we pondered over when we
started to build these algorithms. At what point in the game will we get the
best predictions from our algorithms? Based on the performance of our
algorithms in IPL 2019 and the ODI world cup matches, we see that for T20
games, the 11th over predictions seem to be best and for ODI,
the 25th over predictions seem to be the best in terms of
accuracy of the final score predicted. In both formats, this is roughly the
middle of the game. This might be because predictions towards the end
are generally plagued by random errors (with too few overs left for them to
average out), while predictions at the beginning might not have enough
information about the current game to work with.
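A comparison like this can be sketched by grouping each match's forecasts by the over at which they were made and comparing the error in the predicted final score. Mean absolute error is an assumed metric here, since the text only refers to the “accuracy of the final score predicted”; the function names and input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mae_by_over(predictions: Iterable[Tuple[int, float, float]]) -> Dict[int, float]:
    """Mean absolute error of the predicted final score, per forecasting over.

    Each item is (over, predicted_final_score, actual_final_score), one per
    match per over at which a forecast was made.
    """
    errors = defaultdict(list)
    for over, predicted, actual in predictions:
        errors[over].append(abs(predicted - actual))
    return {over: mean(errs) for over, errs in sorted(errors.items())}


def best_over(predictions: Iterable[Tuple[int, float, float]]) -> int:
    """Over whose forecasts have the smallest mean absolute error."""
    maes = mae_by_over(predictions)
    return min(maes, key=maes.get)
```

Run separately on T20 and ODI forecasts, this would identify the over whose predictions are most accurate in each format.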








