Dan Cervone
November 2, 2015
Based on “box score” data:
Limitations:
What's in the box score:
What's missing:
Installed in 2013, tracks:
About 1 billion space-time points per season
Let \( \Omega \) be the space of all possible basketball possessions. For \( \omega \in \Omega \)
Regression-type prediction (points versus possession features):
Markov chains (discretizing \( Z_t \)):
Brute force, “God model” for basketball:
Finite collection of states, \( \mathcal{C} \)
\( \mathcal{C}_{\text{poss}} \): Ball possession states {player} \( \times \) {region} \( \times \) {defender within 5 feet}
\( \mathcal{C}_{\text{trans}} \): Transition states
{{pass linking \( c, c' \in \mathcal{C}_{\text{poss}} \) },
{shot attempt from \( c \in \mathcal{C}_{\text{poss}} \)},
turnover in progress,
rebound in progress }
\( \mathcal{C}_{\text{end}} \): End states {made 2, made 3, turnover}
\[ \mathcal{C} = \mathcal{C}_{\text{poss}} \cup \mathcal{C}_{\text{trans}} \cup \mathcal{C}_{\text{end}} \]
Some useful stopping times:
\[
\begin{aligned}
\color{blue}{\tau_t} &= \begin{cases}
\min \{ s : s > t, C_s \in \mathcal{C}_{\text{trans}}\} & \text{if } C_t \in \mathcal{C}_{\text{poss}} \\
t & \text{if } C_t \notin \mathcal{C}_{\text{poss}}
\end{cases} \\
\color{red}{\delta_t} &= \min\{s : s \geq \tau_t, C_s \notin \mathcal{C}_{\text{trans}} \}
\end{aligned} \]
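A minimal sketch of how \( \tau_t \) and \( \delta_t \) could be read off a discretized possession, assuming each time step has already been labeled as a possession, transition, or end state. The labels, the function names `tau` and `delta`, and the example sequence are illustrative assumptions, not part of the original analysis.

```python
# Labels for the three kinds of coarsened state (C_poss, C_trans, C_end).
POSS, TRANS, END = "poss", "trans", "end"


def tau(kinds, t):
    """Return tau_t: the first s > t spent in a transition state when kinds[t]
    is a possession state, and t itself otherwise. None if never reached."""
    if kinds[t] != POSS:
        return t
    for s in range(t + 1, len(kinds)):
        if kinds[s] == TRANS:
            return s
    return None


def delta(kinds, t):
    """Return delta_t: the first s >= tau_t that is not a transition state."""
    start = tau(kinds, t)
    if start is None:
        return None
    for s in range(start, len(kinds)):
        if kinds[s] != TRANS:
            return s
    return None


# Hypothetical possession: dribble, pass in flight, new ballhandler, shot, make.
kinds = [POSS, POSS, TRANS, TRANS, POSS, TRANS, END]
```

With this sequence, the ballhandler at time 0 starts a transition at index 2 and that transition resolves at index 4.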
Key assumptions:
Assume (A1)–(A2), then for all \( 0 \leq t < T \),
\[ \nu_t = \sum_{c \in \{ \mathcal{C}_{\text{trans}} \cup \mathcal{C}_{\text{end}} \}} \mathbb{E}[X | C_{\delta_t} = c]\mathbb{P}(C_{\delta_t} = c | \mathcal{F}^{(Z)}_t). \]
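The decomposition above is a probability-weighted average: each state \( c \) that the chain may occupy at time \( \delta_t \) contributes its conditional expected points, weighted by the probability of reaching it. A small sketch of that arithmetic follows; the state labels and all numbers are made up for illustration, and `epv` is a hypothetical helper name.

```python
def epv(exit_probs, exit_values, tol=1e-9):
    """Weighted sum of E[X | C_delta = c] * P(C_delta = c | history) over c."""
    if abs(sum(exit_probs.values()) - 1.0) > tol:
        raise ValueError("exit probabilities must sum to 1")
    return sum(p * exit_values[c] for c, p in exit_probs.items())


# Hypothetical probabilities P(C_delta = c | F_t) for four exit states.
exit_probs = {
    "teammate_has_ball": 0.50,
    "made_2": 0.20,
    "made_3": 0.05,
    "turnover": 0.25,
}
# E[X | C_delta = c]: fixed for end states, hypothetical for the possession state.
exit_values = {
    "teammate_has_ball": 1.0,
    "made_2": 2.0,
    "made_3": 3.0,
    "turnover": 0.0,
}

nu_t = epv(exit_probs, exit_values)
```

In the full model the probabilities depend on the complete tracking history, while the conditional expectations depend only on the coarsened state.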
Let \( M(t) \) be the event \( \{\tau_t \leq t + \epsilon\} \).
Monte Carlo computation of \( \nu_t \):
Player \( \ell \)'s position at time \( t \) is \( \mathbf{z}^{\ell}(t) = (x^{\ell}(t), y^{\ell}(t)) \).
\[ x^{\ell}(t + \epsilon) = x^{\ell}(t) + \alpha^{\ell}_x[x^{\ell}(t) - x^{\ell}(t - \epsilon)] + \eta^{\ell}_x(t) \]
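The update above says that the next \( x \)-coordinate is the current one plus a fraction \( \alpha^{\ell}_x \) of the most recent displacement, plus noise. A minimal simulation sketch, assuming mean-zero Gaussian noise with a fixed standard deviation (the noise distribution is an assumption here, and `step` and `simulate_path` are hypothetical names):

```python
import random


def step(x_now, x_prev, alpha, eta):
    """One update: x(t + eps) = x(t) + alpha * [x(t) - x(t - eps)] + eta(t)."""
    return x_now + alpha * (x_now - x_prev) + eta


def simulate_path(x_prev, x_now, alpha, sigma, n_steps, seed=0):
    """Simulate n_steps future x-coordinates, assuming Gaussian noise."""
    rng = random.Random(seed)
    path = [x_prev, x_now]
    for _ in range(n_steps):
        eta = rng.gauss(0.0, sigma)  # assumed noise model
        path.append(step(path[-1], path[-2], alpha, eta))
    return path
```

With `sigma=0.0` and `alpha=0.5`, a player who moved from 0 to 1 keeps drifting in the same direction at half the previous step each time: `simulate_path(0.0, 1.0, 0.5, 0.0, 3)` gives `[0.0, 1.0, 1.5, 1.75, 1.875]`.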
Recall \( M(t) = \{ \tau_t \leq t + \epsilon \} \):
\[ \log(\lambda^{\ell}_j(t)) = [\mathbf{W}_j^{\ell}(t)]'\boldsymbol{\beta}_j^{\ell} + \xi_j^{\ell}\left(\mathbf{z}^{\ell}(t)\right) \]
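The log-linear form above means the hazard is the exponential of a covariate term plus a spatial effect evaluated at the player's location. A small sketch of that computation; the covariate values, coefficients, and the function name `hazard` are illustrative assumptions.

```python
import math


def hazard(covariates, beta, xi_at_z):
    """Return exp(W' beta + xi(z)) for one player and one event type j."""
    if len(covariates) != len(beta):
        raise ValueError("covariates and beta must have the same length")
    linear = sum(w * b for w, b in zip(covariates, beta))
    return math.exp(linear + xi_at_z)


# Hypothetical inputs: an intercept, one time-varying covariate, and a
# spatial effect of 0.4 at the player's current location.
lam = hazard([1.0, 0.3], [-2.0, 1.5], 0.4)
```

Because the spatial effect enters on the log scale, a location with \( \xi = 0.4 \) multiplies the hazard by \( e^{0.4} \) relative to a location with \( \xi = 0 \).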
LeBron James' shot-taking \( \xi \)
LeBron James' pass (to Center) \( \xi \)
Made/missed shots, 2013-14 season
Dwight Howard
Shrinkage needed:
Spatial effects \( \xi^{\ell}_j \)
Functional basis representation: \( \xi^{\ell}_j (\mathbf{z}) = [\mathbf{w}^{\ell}_j]'\boldsymbol{\phi}_j(\mathbf{z}) \).
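Under the basis representation, evaluating a spatial effect at a court location reduces to a dot product between a player-specific weight vector and shared basis functions. The sketch below uses Gaussian bumps as stand-in basis functions; that choice, the centers, scales, and weights are all assumptions made for illustration, not the bases used in the talk.

```python
import math


def gaussian_bump(center, scale):
    """Build a hypothetical basis function phi(z) peaked at `center`."""
    cx, cy = center

    def phi(z):
        x, y = z
        return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * scale ** 2))

    return phi


def xi(z, weights, basis):
    """Evaluate xi(z) = w' phi(z) for one player and one event type."""
    if len(weights) != len(basis):
        raise ValueError("need one weight per basis function")
    return sum(w * phi(z) for w, phi in zip(weights, basis))


# Two illustrative basis functions on court coordinates measured in feet.
basis = [gaussian_bump((5.0, 25.0), 4.0), gaussian_bump((23.0, 25.0), 6.0)]
weights = [1.2, -0.5]
value_near_basket = xi((5.0, 25.0), weights, basis)
```

Since the basis functions are shared across players, only the short weight vector has to be estimated per player.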
Information sharing:
“Partially Bayes” estimation of all model parameters:
Distributed computing implementation:
Our EPV framework can be extended to better incorporate unique basketball strategies:
Thanks to: