Predicting IPL Chases with KNN


"It's not over until it's over."
We've all heard this saying and it is more than true in the IPL. Over the years, we've witnessed countless last-ball thrillers, impossible finishes, and chases that defied every logical prediction. But which teams truly defied the odds? Who managed to chase down targets that, to the average viewer, seemed completely out of reach?
In this blog, I try to answer that question. I’ve built a simple win probability predictor for IPL chases using the K-Nearest Neighbors (KNN) algorithm, a simple, beginner-friendly, yet surprisingly effective approach.
So let’s dive in and explore how we can use machine learning to predict the probability of chasing a target in an IPL match.
Match Scenarios
Before we start predicting win probabilities, it's important to first understand what actually influences a run chase in a cricket match. In my case, due to the limitations of the data I had, I chose four core features that I believe capture the essence of a chase. These features are calculated for every ball in the second innings:
Runs Required
Balls Left
Wickets in Hand
Required Run Rate
I believe these four variables cover the most important aspects of a run chase. Of course, I haven’t accounted for factors like the quality of the batter or bowler, or pitch conditions because those are hard to quantify objectively, especially with only ball-by-ball data. Similarly, external factors like match pressure or tournament stage have also been left out.
The idea here is to assume that these four features are enough to generalize most chases, at least well enough for a basic model like KNN.
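To make the feature set concrete, here is a minimal sketch of how one ball-level match state could be turned into the four features. The class name, field names, and the choice to return 0.0 when no balls remain are my own assumptions for illustration, not the exact code behind the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChaseState:
    """One ball-level snapshot of a chase (names are illustrative assumptions)."""
    runs_required: int
    balls_left: int
    wickets_in_hand: int

    @property
    def required_run_rate(self) -> float:
        # Runs needed per six-ball over; treated as 0.0 once no balls remain
        # (an assumption made here to avoid dividing by zero).
        if self.balls_left <= 0:
            return 0.0
        return self.runs_required * 6 / self.balls_left

    def as_features(self) -> List[float]:
        # Order: runs required, balls left, wickets in hand, required run rate.
        return [
            float(self.runs_required),
            float(self.balls_left),
            float(self.wickets_in_hand),
            self.required_run_rate,
        ]


state = ChaseState(runs_required=80, balls_left=40, wickets_in_hand=6)
print(state.as_features())  # [80.0, 40.0, 6.0, 12.0]
```

Note that the required run rate is fully determined by the first two features, so it adds no new information; it simply gives the distance calculation a direct view of how steep the chase is.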
For example, the model will look at match situations like:
"80 runs required off 40 balls with 6 wickets in hand", and predict the probability of the chasing team going on to win from that point.
Applying KNN to IPL Chases
As I also mentioned in my earlier blog on natural language querying for IPL stats, I’ve built an extensive ball-by-ball PostgreSQL data warehouse using data from Cricsheet. This setup allows me to extract detailed features from matches and analyze them in meaningful ways.
For this project, I queried the data warehouse to extract all the relevant features discussed earlier: runs required, balls left, wickets in hand, and required run rate, for every delivery in the second innings of all IPL matches.
Alongside these features, the dataset also includes a label:
is_win
→ A binary value indicating whether the chasing team went on to win the match.
This is the class whose probability we’re trying to predict: given a match state, how likely is it that the chasing team will win?
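As a rough illustration of the idea, the sketch below estimates a win probability as the share of the K most similar historical states that ended in a win. It is a simplified pure-Python version under my own assumptions: plain Euclidean distance on unscaled features, unweighted votes, and a handful of made-up rows rather than real IPL data. The function name knn_win_probability is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Each row: ([runs_required, balls_left, wickets_in_hand, required_run_rate], is_win)
Row = Tuple[Sequence[float], int]


def knn_win_probability(history: List[Row], query: Sequence[float], k: int) -> float:
    """Fraction of the k nearest historical states whose chase was won."""
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and the number of historical rows")
    # Euclidean distance on raw features (an assumption; scaling the features
    # first is a common alternative).
    nearest = sorted(history, key=lambda row: math.dist(row[0], query))[:k]
    return sum(is_win for _, is_win in nearest) / k


# Made-up rows for illustration only, NOT real IPL data.
toy_history = [
    ([20, 30, 8, 4.0], 1),
    ([35, 24, 7, 8.75], 1),
    ([40, 24, 5, 10.0], 1),
    ([50, 18, 4, 16.67], 0),
    ([60, 18, 3, 20.0], 0),
    ([30, 12, 2, 15.0], 0),
]

probability = knn_win_probability(toy_history, [38, 24, 6, 9.5], k=3)
print(f"Estimated win probability: {probability:.2%}")
```

Because the estimate is just a vote share, it can only take values that are multiples of 1/K, which is one reason the choice of K matters for how fine-grained the probabilities are.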
Choosing the Right Value of K
One important decision in using KNN is picking the right value of K.
Ideally, K should be an odd number to avoid ties during voting.
A small K can lead to overfitting, as predictions may be overly sensitive to noise.
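A large K might lead to underfitting, as predictions get averaged over neighbours that are no longer very similar to the current match state.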
To find a good balance, I needed a way to evaluate how good a given K value is. That’s where I discovered the Brier Score: a metric used to evaluate the accuracy of probabilistic predictions.
The Brier Score measures the mean squared difference between predicted probabilities and actual outcomes.
(You can read more about it here on Wikipedia.)
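The definition translates almost directly into code. This is a small sketch of the metric itself, not the evaluation script used for the model; the function name and the sample numbers are illustrative.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(predicted)


# Three made-up predictions and what actually happened (1 = chase won).
print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))
```

Lower is better: a perfect forecaster scores 0, while always predicting 50% scores 0.25 regardless of the outcomes.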
Using this metric, I evaluated multiple values of K and chose the one that gave me the best Brier Score, i.e., the most reliable probability estimates.
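The selection loop can be sketched roughly as follows: score each candidate K on held-out match states and keep the one with the lowest Brier Score. Everything here is an assumption for illustration: the helper names, the two-feature toy rows (required run rate, wickets in hand), the train/validation split, and the candidate list. It is not the actual evaluation setup.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (feature vector, is_win)


def knn_probability(train: List[Row], query: Sequence[float], k: int) -> float:
    """Share of the k nearest training rows that ended in a win."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(label for _, label in nearest) / k


def brier_score(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Mean squared difference between probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(predicted)


def score_candidates(train: List[Row], valid: List[Row],
                     candidates: Iterable[int]) -> Dict[int, float]:
    """Brier Score on the validation rows for every candidate K."""
    outcomes = [label for _, label in valid]
    scores = {}
    for k in candidates:
        predictions = [knn_probability(train, features, k) for features, _ in valid]
        scores[k] = brier_score(predictions, outcomes)
    return scores


# Made-up rows for illustration only: [required_run_rate, wickets_in_hand].
train_rows = [
    ([6.0, 8], 1), ([7.0, 7], 1), ([8.0, 6], 1), ([9.0, 5], 1), ([10.0, 6], 0),
    ([11.0, 4], 1), ([12.0, 5], 0), ([13.0, 3], 0), ([14.0, 4], 0), ([15.0, 2], 0),
]
valid_rows = [([7.5, 7], 1), ([10.5, 5], 1), ([12.5, 4], 0), ([14.5, 3], 0)]

scores = score_candidates(train_rows, valid_rows, candidates=[1, 3, 5, 7, 9])
best_k = min(scores, key=scores.get)  # lowest Brier Score wins
for k, score in scores.items():
    print(f"K={k}: Brier Score {score:.4f}")
print("Best K:", best_k)
```

With real ball-by-ball data, it is probably safer to split by match rather than by delivery, since consecutive balls from the same match are nearly identical and would make the validation score look better than it is.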
Real Match Scenarios
To evaluate how the model performs on real-world data, I tested it against historical IPL chases. Each scenario includes match context and the model’s predicted win probability at key points in the innings.
1. RCB vs KKR (IPL 2019)
At the end of the 17th over, Kolkata Knight Riders required 53 runs off the final 18 balls, with 6 wickets in hand. This put the required run rate at about 17.7 per over, a scenario where most teams historically struggle to finish successfully.
The KNN model predicted a win probability of 6.98% at this point, reflecting the low success rate of similar past scenarios. Despite the prediction, KKR completed the chase with 5 balls to spare, an outlier result, driven by an unusually high-scoring end phase.
2. PBKS vs RR (IPL 2020)
Rajasthan Royals were chasing a large target and needed 51 runs off the last 3 overs. With a required run rate of 17 and only 4 wickets remaining, this is a situation that has historically ended in a loss far more often than not.
The model predicted a win probability of 18.6% at the 17-over mark. However, following an over in which 30 runs were scored (including five sixes), the model updated its estimate to 83.72% with 21 runs required from 12 balls. The final result was a successful chase, consistent with the updated probability.
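A side note on how to read these percentages. If the probability is a plain, unweighted share of winning neighbours (an assumption on my part for this sketch), then each quoted value must equal some whole number of wins divided by K. The short check below searches for the smallest K that reproduces all three percentages quoted in the two scenarios above. Treat the result as an inference from the rounded numbers only; any multiple of that K would fit equally well.

```python
from typing import List, Optional

# Percentages quoted in the two scenarios above.
reported_percentages = [6.98, 18.6, 83.72]


def matching_wins(k: int, percentage: float) -> Optional[int]:
    """Number of winning neighbours out of k that rounds to the percentage, if any."""
    for wins in range(k + 1):
        if round(100 * wins / k, 2) == round(percentage, 2):
            return wins
    return None


def smallest_consistent_k(percentages: List[float], max_k: int = 200) -> Optional[int]:
    """Smallest k for which every percentage is an exact vote share."""
    for k in range(1, max_k + 1):
        if all(matching_wins(k, pct) is not None for pct in percentages):
            return k
    return None


k = smallest_consistent_k(reported_percentages)
votes = [matching_wins(k, pct) for pct in reported_percentages]
print(k, votes)  # 43 [3, 8, 36]
```

Read this way, 6.98% corresponds to 3 wins among 43 similar past situations, 18.6% to 8 of 43, and 83.72% to 36 of 43.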
Some more examples worth checking out:
PBKS vs KKR (IPL 2025): KKR failed to chase a modest target of 112. They looked in control for a large part of the innings, but a sudden batting collapse shifted the win probability sharply in PBKS's favor, and KKR eventually lost the match.
LSG vs RCB (IPL 2025): RCB successfully chased the third-highest target in IPL history. Based on historical patterns, the model initially assigned low win probabilities. However, as RCB gained momentum and wickets were preserved, the probability steadily rose, eventually reflecting a favorable outlook before they won.
There are many more examples I probably don’t remember right now. You can explore them yourself through the live demo. It allows you to track how win probability evolves ball by ball for any IPL match.
These scenarios demonstrate that the model captures the expected patterns of past data, while dynamically adjusting to real-time match changes. Whether it’s a collapse or a comeback, the model reflects shifts in momentum as they happen.
Reflections, Limitations, and What's Next
While the KNN-based win probability predictor does a decent job of capturing the dynamics of IPL chases, it’s important to recognize its limitations.
Context Blindness: The model doesn't account for the quality of batters, bowlers, or the pitch. A scenario like “80 off 40 balls with 6 wickets” is treated the same irrespective of the strength of the batters.
No Real-Time Momentum: Although win probability is updated ball by ball, the model lacks awareness of recent momentum shifts like a batter hitting 3 sixes in an over or a sudden batting collapse, unless it’s reflected in the current state variables (e.g. fewer balls remaining, fewer runs required or fewer wickets in hand).
Non-quantifiable Factors: Things like pressure situations, crowd support, dew, and match importance (league game vs final) aren’t captured. These factors may not be decisive, but they can still have some effect on the final outcome.
Despite this, the model still provides a reasonable approximation of win probabilities based purely on historical patterns. It gives fans and analysts a way to put numbers to feelings like “this match is slipping away” or “they still have a chance.”
Future Directions
This was a first step using a simple model on curated data. There’s a lot of room to grow:
Incorporating player-level stats to add context to who's batting and bowling.
Using more advanced models like gradient boosting or neural networks.
Expanding beyond chases: What if we could predict par scores or bowling win probabilities too? We could also use win probability contributions from each player to measure a player's "Impact."
Thanks for reading! You can check out the demo here and explore any IPL match yourself to see how the model reacts. Feel free to reach out or leave suggestions; I'm always open to feedback and cricket discussions!
Written by

Sanchit Jain
Hi! I'm Sanchit. A curious mind always exploring data, coding, and AI/ML. I blog about my learning journey, diving into everything from analytics to artificial intelligence and machine learning. Join me as I write, learn, and share insights along the way!