OpenFloor #1: How well does each factor predict winning?
Understanding the importance of the four factors.
I was listening to Michael Beuoy’s presentation at SSAC25. I know Beuoy from his beautiful website, which I mainly use to check basketball-related stuff. It turns out he got into volleyball as well and attempted to come up with volleyball’s own four factors. If you don’t know what the four factors are, the longer name, “Four Factors of Basketball Success”, might be more explanatory. Those factors are shooting (measured by eFG%), turnovers (measured by TOV%), rebounding (measured by OREB%), and free throws (measured by FT rate). These can be calculated for both offense and defense, which is what I did in the previous post.
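For reference, here is a minimal sketch of how the four factors are commonly computed from box-score totals. The function name, the argument names, and the box score are made up for illustration. The 0.44 free-throw weight is the usual NBA convention, and I'm assuming the FTA/FGA flavor of FT rate (some sources use made free throws instead), so treat both as assumptions rather than the exact definitions behind the numbers in this post.

```python
def four_factors(fgm, fg3m, fga, fta, tov, oreb, opp_dreb):
    """Return the four factors for one team in one game, as fractions."""
    # Effective field goal percentage: a made three counts as 1.5 makes.
    efg = (fgm + 0.5 * fg3m) / fga
    # Turnover percentage: turnovers per estimated possession-ending play.
    # The 0.44 weight on free throw attempts is a convention, not a law.
    tov_pct = tov / (fga + 0.44 * fta + tov)
    # Offensive rebound percentage: share of available offensive boards.
    oreb_pct = oreb / (oreb + opp_dreb)
    # Free throw rate, assumed here to be FTA / FGA.
    ft_rate = fta / fga
    return {"eFG%": efg, "TOV%": tov_pct, "OREB%": oreb_pct, "FTr": ft_rate}


# Made-up box score, only to show the call.
example = four_factors(fgm=30, fg3m=10, fga=60, fta=20, tov=12,
                       oreb=10, opp_dreb=25)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

The defensive versions come from calling the same function with the opponent's numbers.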
Anyway, as I was saying, Mike Beuoy tried to come up with volleyball’s own four factors. When creating such metrics, one aspect to take into account is how much each of them matters for winning. This gets mentioned a lot in NBA convos on different social media platforms, and recently Gabriel Guzman added a function to his website that reflects it as well (you can select a metric and see how often a team wins when it performs better than its opponent with respect to the selected metric).
These are nice, but they are all about the NBA. This raises an issue called space generalizability: are the conclusions drawn from NBA data generalizable to a different league (such as the EuroLeague)? So, let’s see how it looks for the most recent EuroLeague regular season.
Simple Way
One of the simplest ways to check how the four factors relate to winning is to frame the question as a conditional probability:
If someone tells me that a team did better than its opponent in eFG%, how much does that tell me about who won the game? In other words, what’s the probability that Team A wins given that it did better than Team B in eFG%: P(Team_A wins | Team_A’s eFG% > Team_B’s eFG%)?
Teams that outperform their opponents in eFG% end up winning 77.3% of the time.
Teams with lower TOV% prevail in 58.6% of games.
Teams with higher OREB% end up winning 55.2% of the time.
Teams with better FT rate (FTr) end up winning 55.8% of the time.
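The counting behind numbers like these can be sketched in a few lines. This is not the code I ran, just a minimal illustration: the `games` list, its keys, and the helper name `win_rate_when_better` are all hypothetical, and the four games are made up. Games where both teams post the same value are skipped, which is my own assumption about how to handle ties.

```python
def win_rate_when_better(games, key_a, key_b, higher_is_better=True):
    """Share of games won by the team that did better on one metric."""
    wins = 0
    decided = 0
    for game in games:
        a, b = game[key_a], game[key_b]
        if a == b:
            continue  # nobody "did better", so the game carries no information
        a_better = (a > b) if higher_is_better else (a < b)
        better_team_won = game["a_won"] if a_better else not game["a_won"]
        decided += 1
        wins += better_team_won
    if decided == 0:
        raise ValueError("no games with a difference in this metric")
    return wins / decided


# Made-up games: one row per game, team A vs. team B.
games = [
    {"efg_a": 0.55, "efg_b": 0.50, "tov_a": 0.15, "tov_b": 0.12, "a_won": True},
    {"efg_a": 0.48, "efg_b": 0.52, "tov_a": 0.14, "tov_b": 0.18, "a_won": False},
    {"efg_a": 0.51, "efg_b": 0.47, "tov_a": 0.16, "tov_b": 0.13, "a_won": False},
    {"efg_a": 0.50, "efg_b": 0.50, "tov_a": 0.12, "tov_b": 0.15, "a_won": True},
]

efg_rate = win_rate_when_better(games, "efg_a", "efg_b")
# For turnovers, lower is better.
tov_rate = win_rate_when_better(games, "tov_a", "tov_b", higher_is_better=False)
print(f"P(win | better eFG%) = {efg_rate:.3f}")
print(f"P(win | lower TOV%)  = {tov_rate:.3f}")
```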
I also checked how these numbers change when I exclude low-importance possessions, i.e., possessions that don’t move the needle much in terms of win probability. Those can be found in the ugly¹ plot above.
Less Simple Way
Well, the approach above does not control for other variables. Thus, it’s hard to attribute a win to a single factor, since a team might be doing better than its opposition with respect to more than one factor.
I framed the problem in a different way: instead of predicting the outcome as win/loss, I decided to fit a statistical model to predict the final point margin. Although the model gives me a point prediction (i.e., a single value), that prediction comes with some error, so each predicted margin is better described as a distribution, and the share of that distribution above zero is the estimated probability of winning. After checking some assumptions, I ended up with distributions of predicted results (shown below) when only one factor differs by 1%, while the others remain equal and home-court advantage is taken into account.
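As a rough sketch of that last step: if the prediction error is treated as normal around the predicted margin, the win probability is simply the area of that normal curve above zero. The function name and every number below are hypothetical placeholders, not the estimates from my model.

```python
from statistics import NormalDist


def win_probability(predicted_margin, residual_sd):
    """P(final margin > 0), assuming normally distributed prediction error."""
    return 1 - NormalDist(mu=predicted_margin, sigma=residual_sd).cdf(0)


# Hypothetical values, for illustration only.
home_court = 3.0    # points added for playing at home
efg_effect = 1.5    # points gained per 1% advantage in eFG%
residual_sd = 10.0  # spread of the prediction errors, in points

# All other factors equal, so only home court and the eFG% edge contribute.
margin = home_court + efg_effect * 1.0
print(f"Predicted margin: {margin:+.1f}")
print(f"Estimated win probability: {win_probability(margin, residual_sd):.3f}")
```

Note that this version plugs in the coefficients as if they were known exactly.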
Estimated probabilities of winning can be read under the title of each plot. However, because of the approach I’ve taken, these should be slightly off: it treats the estimated coefficients for each factor as known, whereas in reality those estimates carry uncertainty as well. To incorporate the uncertainty in the coefficient estimates alongside the uncertainty in the predictions, I opted for combining simulations with Bayesian methods. Here are the results (again adjusted for home-court advantage):
Shaded areas reflect the estimated probability. I’d prefer the latter approach since it includes the uncertainty in parameters alongside the predictions.
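To give a feel for what including the uncertainty in the parameters means, here is a heavily simplified simulation sketch. It is not the Bayesian model I fitted: it pretends the posterior of each coefficient is an independent normal, keeps the residual spread fixed, and uses made-up numbers throughout. All names are hypothetical.

```python
import random
from statistics import NormalDist

# Hypothetical posterior summaries as (mean, sd); not the post's estimates.
HOME_ADV = (3.0, 1.0)   # home-court advantage, in points
EFG_COEF = (1.5, 0.2)   # points per 1% advantage in eFG%
RESID_SD = 10.0         # residual spread, held fixed for simplicity


def simulate_win_prob(efg_diff, n_sims=20_000, seed=0):
    """Share of simulated games with a positive margin."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # Step 1: draw the parameters instead of plugging in point estimates.
        home = rng.gauss(*HOME_ADV)
        coef = rng.gauss(*EFG_COEF)
        # Step 2: draw a game outcome around the implied expected margin.
        margin = rng.gauss(home + coef * efg_diff, RESID_SD)
        wins += margin > 0
    return wins / n_sims


sim_prob = simulate_win_prob(efg_diff=1.0)
# Plug-in version for comparison: coefficients treated as exactly known.
plug_in_prob = 1 - NormalDist(HOME_ADV[0] + EFG_COEF[0], RESID_SD).cdf(0)
print(f"Simulated, with parameter uncertainty: {sim_prob:.3f}")
print(f"Plug-in, coefficients treated as known: {plug_in_prob:.3f}")
```

With placeholder values like these the two numbers land close to each other, because the residual spread dominates; the gap grows as the coefficient estimates get noisier.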
From a ranking perspective, it seems to be in line with how Dean Oliver assigns weights to these factors: Shooting > Turnovers > Rebounding > Free Throws.
However, I wasn’t expecting turnovers to matter that much. I wonder if it’s related to the percentage of live-ball turnovers in the EuroLeague and whether there’s a meaningful difference compared to the NBA.
In addition, I didn’t check how likely it is to observe a 1% difference in each factor. It looks like I’ll have to pick this up in another post!
¹ I don’t like making plots in Python (I prefer R for that). I used Python for this post, so the plots feel ugly. When you see better-looking plots/tables, you can be sure that I used R during the process. Anyway, I’m hoping to improve my plotting skills in Python as well, but until then…