Wager or Investment?
Is this anything more than a wager?
At the end of the day, not really. This is simply a monetary guess based on some level of historical data presented in a fashion to drive a decision one way or another. Wait… that kind of sounds like an investment strategy…
Well, in my estimation it is an investment strategy. Not a fixed-return 5% CD, but also not a penny stock or a shot in the dark on the next digital coin or blockchain company. It is a strategy based on machine learning models understanding the statistical cross-section of matches in a given league, run through thousands of simulations that sift the features down to a point that allows for statistically accurate prediction of matches ending in a draw (and soon expanding to other outcomes). Have to add the disclaimer though, current results may not reflect historical outcomes or activity, lol. Seriously though, outcomes are dependent on sporting activity, which by its very nature and definition is chaotic and difficult to gauge. The good thing here is that the strategy is based more on the longer-term viewpoint than a micro, single-match one.
Why “Draws”
So why pick the worst outcome if you are cheering for one team or another? The cursed “kiss-the-sister” outcome? A couple of reasons really. The first is that the average odds for a draw are always fairly high and a draw is certainly never the favorite outcome. Secondly, it allowed me to move the model from a multi-class question (Home Winner / Draw / Away Winner) to a binary one (Draw or not), which seemed to push the results forward and allowed for direct focus on that single classification: whether the match ends in a draw. Given the average odds for a draw and the eventual payout, if a person could maintain a 33% precision rate (the success rate when the model actually makes a prediction), they would be profitable… although at that rate (1/3) it is a long, hard slog to profitability. If one gets upward of 50% precision per league, then the profit grows substantially.
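To make the 33% versus 50% comparison concrete, here is a minimal sketch of the break-even arithmetic for flat $100 wagers. The average draw odds of 3.3 (decimal odds) is purely an assumption for illustration, as are the function names; plug in whatever the real average odds are for a league.

```python
def break_even_precision(decimal_odds: float) -> float:
    """Precision at which flat-stake wagering neither makes nor loses money."""
    return 1.0 / decimal_odds


def expected_profit_per_wager(precision: float, decimal_odds: float,
                              stake: float = 100.0) -> float:
    """Average profit per wager for a given precision and decimal odds."""
    win_profit = stake * (decimal_odds - 1.0)  # profit on a correct draw pick
    return precision * win_profit - (1.0 - precision) * stake


# Hypothetical average draw odds, NOT a figure taken from the platform.
ASSUMED_DRAW_ODDS = 3.3

print(f"break-even precision: {break_even_precision(ASSUMED_DRAW_ODDS):.1%}")
for precision in (0.33, 0.50):
    profit = expected_profit_per_wager(precision, ASSUMED_DRAW_ODDS)
    print(f"precision {precision:.0%}: ${profit:.2f} expected profit per $100 wager")
```

Under that assumed price, 33% precision sits only slightly above break-even, which is why it is such a slog, while 50% leaves a wide margin.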
A quick aside on “accuracy” and “precision”
This was one of the first concepts I wrapped my head around to better understand how the models are working and ways to drive them to higher levels of profitability. Simply put, what I call “accuracy” here is the percentage of matches that actually ended in a draw for which the model predicted a draw (strictly speaking, machine learning folks call this “recall”). “Precision” is the success rate of the model when it makes a prediction, i.e. the percentage of its draw predictions that turned out to be draws. Think of “accuracy” as how much of the total surface area of matches that did end in a draw the model covered, and precision as how many of the predictions ended up winning.
In order to really understand how the models were performing I needed to put in place a client that had visualization capability. Literally, like the worst thing for my skill-set. Regardless, I had some experience in Angular and much less in React, so it was a pretty easy decision to make as I didn’t have a ton of time to spend learning another tech stack. Fast-forward and currently we have the following (broken up into 2 images as I suffer from “BAM: Big-Ass Monitor” syndrome):
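As a small illustration of the two numbers, here is a sketch that computes them from a list of actual results and model predictions. The “accuracy” described above (the share of actual draws the model caught) is what is usually called recall, so that is the name used in the code. The function name and the sample data are made up for the example.

```python
def draw_metrics(actual, predicted):
    """Return (recall, precision) for binary draw / not-draw labels.

    actual[i] is True if match i ended in a draw.
    predicted[i] is True if the model predicted a draw for match i.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)       # predicted draw, was a draw
    false_pos = sum(1 for a, p in pairs if not a and p)  # predicted draw, was not
    false_neg = sum(1 for a, p in pairs if a and not p)  # missed draw

    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    return recall, precision


# Made-up sample: 10 matches, 4 actual draws, 5 draw predictions.
actual = [True, True, True, False, False, False, False, False, True, False]
predicted = [True, True, False, True, True, False, False, False, True, False]

recall, precision = draw_metrics(actual, predicted)
print(f"recall ('accuracy' in this post): {recall:.0%}")  # 3 of 4 draws caught
print(f"precision: {precision:.0%}")                      # 3 of 5 predictions correct
```

Only precision feeds directly into profit per wager; recall determines how many wagering opportunities the model finds in the first place.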
So let’s break this out into 3 sections:
League Selection
The top icon bar has icon-buttons for each league supported. Selecting the league brings up the detail represented in the above lists and graph.
Initial Simulations / Refinements
These two sections list out the simulations (containers for a certain number of model executions) that have been run. The Refinement section contains two types of simulations: a simple refinement (half-star) and a final refinement (full-star), the latter representing the basis for any future predictions being made. Breaking the entries down:
For refinements, the star again represents the level of refinement that was performed.
$xxxx value. This is the profit made over the last 100 matches, which form the test set for the final refinement model feature set. It is the exact dollar value of profit one would walk away with by wagering $100 on each prediction. The value is pure profit, over and above the amounts wagered. So with the model shown above, one would have walked away with a profit of $4190 on the basis of a $100 wager made on each prediction over that time period. 100 matches basically equates to 2-3 months. Also, keep in mind this is one league out of 15…
Accuracy:Precision. The next column represents the percentage values for accuracy and precision described above. The key thing to note is getting both of these values north of 50%. With this model we are at 78% accuracy and 67% precision, well above the profit objective.
Date that the simulation started
Action icon buttons are for displaying the model execution feature details, generating a refinement simulation and deleting the simulation and all model execution results.
Result Graph
Just a visualization of the results described above.
Final Disclaimer
I certainly realize that the numbers in the projection are a bit startling, so let me be clear about exactly what they represent, specifically the $ value for a simulation run. This is not the actual value I made but the theoretical value that could have been made by wagering on that strategy for the league. That value comes from the TEST SET and represents what the trained model DID predict for the last 100 matches. Read into it what you will. I am always a bit skeptical, but returns at the very least seem to be trending quite profitable.
This second image contains the importance values for the selected simulation (initial, refined or final refinement). Each row represents one of the features that has been engineered to offer into the model, along with its model importance and SHAP value. The model importance values are extracted directly from the scikit-learn metrics collection and indicate the importance of that feature to the outcome. The SHAP (SHapley Additive exPlanations) value has its origins in game theory and represents how much a specific feature drives the prediction towards one conclusion or another.
The feature encoding is: “stat”_“operation”_“scope”_“domain”_“window” where
stat is the underlying stat providing basis to the feature
operation is the mathematical operation to take on the stat. Values include average/mean, variance, slope/trend and ratio.
scope determines which matches to consider: all matches for a team, home-only, away-only or closest (described in the first blog post)
domain is which team stat to consider or whether this is a combined operation
window is the number of games, going back in time, to include in the stat calculation.
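For anyone who wants to pick these names apart programmatically, here is a minimal sketch of a parser for that five-part encoding. It assumes the window is an integer and that only the stat portion may itself contain underscores. The example feature name is hypothetical, not one taken from the actual model.

```python
from typing import NamedTuple


class FeatureName(NamedTuple):
    stat: str
    operation: str
    scope: str
    domain: str
    window: int


def parse_feature_name(name: str) -> FeatureName:
    """Split a stat_operation_scope_domain_window feature name into its parts."""
    # Split from the right so a stat such as "shots_on_target" stays intact
    # (assumption: the last four components never contain an underscore).
    parts = name.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 components in feature name: {name!r}")
    stat, operation, scope, domain, window = parts
    return FeatureName(stat, operation, scope, domain, int(window))


# Hypothetical feature name, for illustration only.
print(parse_feature_name("shots_on_target_slope_away_team_10"))
```

Grouping the importance and SHAP values by one of these components (say, window) would then be a quick way to see which kinds of features carry the most weight.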
So there it is, a quick summary of where the platform is currently as I get my arms around the returns that the models are showing and the wagering direction it will take. Thanks for reading! This was a quick follow-up post on the original, and I certainly plan on more detailed ones about some of the interesting aspects of getting to this destination and the “art of bridge-engineering”. As a last entry, here are the picks from the last weekend. While I realize it is so easy to post after the fact, these are truly the matches I wagered on this weekend and I can pass the receipts on if needed. Regardless, I will start posting picks prior to the weekend, but here it is / was:
The result column is the financial result of the wager: either -100 for a loss or the actual winnings with odds factored in. The ‘Win-Loss’ column is a running tally of, well, wins and losses. The key column is the running balance, really profit, through the weekend. The balance does not include any of the entry fees and is simply pure profit. If the model lost all the matches the balance/profit would be -2000 dollars.
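The three columns can be reproduced with a few lines of code. This is a rough sketch, assuming decimal odds and a flat $100 stake, with a win recorded as profit only (stake excluded) to match the “pure profit” balance. The function name and the sample picks are made up and are not the weekend's actual wagers.

```python
def running_tally(picks, stake=100.0):
    """Build (result, win_loss, balance) rows from (decimal_odds, was_draw) picks."""
    rows = []
    wins = losses = 0
    balance = 0.0
    for decimal_odds, was_draw in picks:
        if was_draw:
            result = stake * (decimal_odds - 1.0)  # profit only, stake excluded
            wins += 1
        else:
            result = -stake                        # lost the whole stake
            losses += 1
        balance += result
        rows.append((result, f"{wins}-{losses}", balance))
    return rows


# Made-up picks: (decimal odds for the draw, did the match end in a draw?)
sample_picks = [(3.5, True), (3.2, False), (3.0, True)]
for result, win_loss, balance in running_tally(sample_picks):
    print(f"{result:>8.2f}  {win_loss:>5}  {balance:>8.2f}")
```

Feeding in twenty losing picks gives a final balance of -2000, matching the worst case described above.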