Methodology & How It Works — PredictPal

1. Wilson Confidence Interval

A common error in sports statistics is to overestimate a player's success rate from too few matches. For example, if a player's only recorded match is one where they lost Set 1 and came back to win, their raw "comeback rate" is 100%, which clearly overstates their true ability.

To correct this small-sample bias, PredictPal computes a 95% Wilson score interval for each rate. The lower ($p_{lo}$) and upper ($p_{hi}$) bounds of the proportion are:

$$ p_{lo, hi} = \frac{\hat{p} + \frac{z^2}{2n} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}} $$

Where $\hat{p}$ is the observed rate ($k/n$, i.e., $k$ successes out of $n$), $n$ is the sample size (e.g., matches where Set 1 was lost), and $z$ is the standard normal quantile for the chosen confidence level ($z \approx 1.96$ for a 95% confidence level).

When a player has few samples (small $n$), the Wilson interval widens significantly, which makes the metric's high uncertainty immediately visible.
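As an illustration, the formula above can be sketched in a few lines of Python. This is a minimal sketch rather than PredictPal's actual code: the function name `wilson_interval` and its parameters `k` (successes) and `n` (sample size) are assumptions chosen for this example.

```python
import math
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the (lower, upper) Wilson score bounds for k successes in n trials.

    Hypothetical helper for illustration; z = 1.96 corresponds to 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")

    p_hat = k / n                         # observed rate
    denom = 1 + z**2 / n                  # shared denominator of the formula
    centre = p_hat + z**2 / (2 * n)       # shifted centre term
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))

    lower = (centre - margin) / denom
    upper = (centre + margin) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, lower), min(1.0, upper)
```

For the single-match comeback case, `wilson_interval(1, 1)` gives an interval of roughly 0.21 to 1.00, a far more cautious reading than the raw 100%.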

2. Resolution Chain (Lookup Chain)

To keep estimates robust, we implement a cascading fallback hierarchy that prioritizes the information with the highest statistical relevance:

1. Matchup History (H2H): If both players have at least 10 recorded matches under the current conditional state, the direct conditional probability of that matchup is used along with its respective Wilson Interval.
2. Player Average: If the head-to-head matchup has insufficient samples, the model falls back to evaluating the conditional performance of each player separately against the entire league (provided they have played at least 30 matches overall).
3. League Average: If the player is new or has very few samples in the system, we apply the average transition rates of the entire league (Setka Cup or TT Elite Series).
4. Constant Value (50%): As an absolute last resort, if no league data is available, pure chance is assumed (0.5).
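The four tiers above can be sketched as a simple chain of guarded returns. This is a hedged illustration, not the production implementation: the names `Estimate` and `resolve_probability`, the argument layout, and the assumption that each tier arrives as a single pre-computed rate with one sample count are all hypothetical. In particular, how the two players' separate rates are combined in tier 2 is not specified here, so the sketch takes an already combined `player_rate`.

```python
from dataclasses import dataclass
from typing import Optional

H2H_MIN_MATCHES = 10     # threshold stated for tier 1
PLAYER_MIN_MATCHES = 30  # threshold stated for tier 2


@dataclass
class Estimate:
    probability: float
    source: str  # which tier produced the value


def resolve_probability(
    h2h_wins: int,
    h2h_matches: int,
    player_rate: Optional[float],
    player_matches: int,
    league_rate: Optional[float],
) -> Estimate:
    """Walk the fallback chain and return the first tier with enough data.

    Assumption: h2h_matches counts matchup samples in the current conditional
    state, and player_matches is the smaller of the two players' overall counts.
    """
    if h2h_matches >= H2H_MIN_MATCHES:
        return Estimate(h2h_wins / h2h_matches, "h2h")
    if player_rate is not None and player_matches >= PLAYER_MIN_MATCHES:
        return Estimate(player_rate, "player")
    if league_rate is not None:
        return Estimate(league_rate, "league")
    return Estimate(0.5, "constant")  # pure chance as the last resort
```

Returning the `source` alongside the probability makes it easy to show which tier an estimate came from, so a league-level fallback is never mistaken for a matchup-specific number.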

3. Calibration and Brier Score

To audit the quality of our conditional probabilities, we automatically calculate the Brier Score (BS) metric for every set state prediction (Sets 1+2). The Brier Score is defined as:

$$ BS = \frac{1}{N} \sum_{i=1}^{N} (f_i - o_i)^2 $$

Where $f_i$ is the conditional probability assigned by our model to the home player's victory, and $o_i$ is the observed binary outcome ($o_i = 1$ if the home player wins the match, $o_i = 0$ if they lose).

A Brier Score of 0.0 represents a perfect prediction (absolute certainty matching the outcome). A model that always assigns a 50% probability (pure chance) scores exactly 0.250, because $(0.5 - o_i)^2 = 0.25$ for either outcome. The PredictPal v0 model scores in the 0.150 to 0.165 range, well below that 0.250 baseline, which indicates a predictive edge over chance.
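The definition above translates directly into code. The following is a minimal sketch with a hypothetical function name, `brier_score`, and made-up inputs used only to reproduce the two reference values mentioned in the text (0.0 for a perfect forecast, 0.250 for a constant 50% forecast).

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if len(forecasts) == 0:
        raise ValueError("at least one prediction is required")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Illustrative inputs only, not real match data.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])  # constant 50% forecast
perfect = brier_score([1.0, 0.0], [1, 0])                    # certain and correct
```

Here `coin_flip` evaluates to 0.25 and `perfect` to 0.0, matching the two reference points against which a model's score can be read.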

📌 Why Conditional?

In fast-format professional table tennis (like Setka Cup), emotional momentum and rapid dynamics after winning or losing the first set cause drastic performance shifts. Static pre-match analysis often overlooks these patterns. By evaluating dynamic probabilities conditioned on partial scores, PredictPal captures behavioral anomalies that traditional odds take time to adjust for.

📊 Clean Data Format

The clean data comes from an ETL pipeline that discards incomplete matches, premature retirements, walkovers, and scoreboard capture anomalies. Only the 98.4% of raw table tennis matches that pass this quality filter are incorporated into the conditional prediction model.