Predicting Basketball Possession Outcomes Using Optical Tracking Data

Dan Cervone
November 2, 2015

Traditional Basketball Analytics

Based on “box score” data:

Made/missed shots.
Assists.
Rebounds, blocks, steals, turnovers, fouls.

Limitations:

Data misses meaningful actions and events.
Players depend on teammates.
Actions analyzed out of context.

Box Score Data

Example possession

What's in the box score:

Made shot by LeBron James.
Assist by Rashard Lewis.

What's missing:

Norris Cole's dribble penetration.
Characterization of defense.
LeBron James' dribble penetration.

NBA Optical Tracking Data

[Figure: sportvu]

Installed in 2013, tracks:

\( (x,y) \) locations of all 10 players
\( (x,y,z) \) locations of ball
25 observations per second

About 1 billion space-time points per season
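The "1 billion" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes regulation-length games only (48 minutes, no overtime or playoffs) and a 1,230-game regular season (30 teams times 82 games, divided by 2); neither number is stated on the slide.

```python
# Rough count of space-time points per season.
FRAMES_PER_SECOND = 25        # from the slide
TRACKED_OBJECTS = 10 + 1      # 10 players plus the ball (from the slide)
GAME_SECONDS = 48 * 60        # assumption: regulation time only
GAMES_PER_SEASON = 1230       # assumption: 30 teams * 82 games / 2

points_per_season = (
    FRAMES_PER_SECOND * TRACKED_OBJECTS * GAME_SECONDS * GAMES_PER_SEASON
)
print(f"{points_per_season:,} space-time points per season")
```

Under these assumptions the count comes out just under one billion, which agrees with the order of magnitude quoted above.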

Expected Possession Value (EPV)

[Figure: epv_ticker]

EPV Definition

Let \( \Omega \) be the space of all possible basketball possessions. For \( \omega \in \Omega \):

\( X(\omega) \in \{0, 2, 3\} \): point value of possession \( \omega \).

\( T(\omega) \): length of possession \( \omega \).

\( Z_t(\omega), 0 \leq t \leq T(\omega) \): time series of optical tracking data.

\( \mathcal{F}^{(Z)}_t = \sigma(\{Z_s^{-1}: 0 \leq s \leq t\}) \): natural filtration.

Definition: The expected possession value (EPV) at time \( t \geq 0 \) during a possession is \( \nu_t = \mathbb{E}[X|\mathcal{F}^{(Z)}_t] \).

EPV provides an instantaneous snapshot of the possession's value, given its full spatiotemporal history.

\( \nu_t \) is a martingale: \( \mathbb{E}[\nu_{t + h} | \mathcal{F}^{(Z)}_t] = \nu_t \) for all \( h > 0 \).
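The martingale property follows from the tower property of conditional expectation. Since \( X \) is bounded and \( \mathcal{F}^{(Z)}_t \subseteq \mathcal{F}^{(Z)}_{t+h} \) for \( h > 0 \),

\[ \mathbb{E}[\nu_{t + h} | \mathcal{F}^{(Z)}_t] = \mathbb{E}\big[\mathbb{E}[X | \mathcal{F}^{(Z)}_{t+h}] \,\big|\, \mathcal{F}^{(Z)}_t\big] = \mathbb{E}[X | \mathcal{F}^{(Z)}_t] = \nu_t. \]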

Calculating EPV

Regression-type prediction (points versus possession features):

Data are not traditional input/output pairs.

No guarantee of stochastic consistency.

Markov chains (discretizing \( Z_t \)):

Information is lost through discretization.

Many rare transitions.

Brute force, “God model” for basketball:

\( \mathbb{P}(Z_{t + \epsilon} | \mathcal{F}^{(Z)}_t) \): full-resolution transition kernel.
Allows Monte Carlo calculation of \( \nu_t \) by simulating future possession paths.
\( Z_t \) is high dimensional and includes discrete events (passes, shots, turnovers).

A Coarsened Process

Finite collection of states, \( \mathcal{C} \)

\( C_t \in \mathcal{C} \): state of the possession at time \( t \).
Observable states: \( C_t = h(Z_t) \).
\( C^{(0)}, C^{(1)}, \ldots, C^{(K)} \): discrete sequence of distinct states.
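The relationship between the frame-level states \( C_t \) and the sequence of distinct states \( C^{(0)}, \ldots, C^{(K)} \) can be sketched as a run-length collapse. The state labels below are hypothetical and serve only as an illustration; the coarsening function \( h \) itself is not implemented here.

```python
from itertools import groupby


def distinct_state_sequence(states):
    """Collapse runs of repeated states into C^(0), C^(1), ..., C^(K)."""
    return [state for state, _ in groupby(states)]


# Hypothetical per-frame coarsened states C_t = h(Z_t), one entry per frame.
frames = ["A_wing", "A_wing", "A_paint", "A_paint", "A_paint", "shot", "made2"]
sequence = distinct_state_sequence(frames)
print(sequence)
```

The collapsed sequence drops the time spent in each state, which is why it is paired with the continuous-time process \( C_t \) rather than replacing it.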

State Space

\( \mathcal{C}_{\text{poss}} \): Ball possession states {player} \( \times \) {region} \( \times \) {defender within 5 feet}

\( \mathcal{C}_{\text{trans}} \): Transition states {pass linking \( c, c' \in \mathcal{C}_{\text{poss}} \), shot attempt from \( c \in \mathcal{C}_{\text{poss}} \), turnover in progress, rebound in progress}

\( \mathcal{C}_{\text{end}} \): End states {made 2, made 3, turnover}

\[ \mathcal{C} = \mathcal{C}_{\text{poss}} \cup \mathcal{C}_{\text{trans}} \cup \mathcal{C}_{\text{end}} \]

Possible Paths for C

Some useful stopping times:

\[ \begin{align*} \color{blue}{\tau_t} &= \begin{cases} \text{min} \{ s : s > t, C_s \in \mathcal{C}_{\text{trans}}\} & \text{if } C_t \in \mathcal{C}_{\text{poss}} \\ t & \text{if } C_t \not \in \mathcal{C}_{\text{poss}} \end{cases} \\ \color{red} {\delta_t} &= \text{min}\{s : s \geq \tau_t, C_s \not \in \mathcal{C}_{\text{trans}} \} \end{align*} \]
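On frame-indexed data, the two stopping times can be computed by scanning forward through the coarsened state sequence. The sketch below is a minimal illustration, assuming time is discretized into frames and using hypothetical state labels; returning `None` when no qualifying frame is found is a convention of this sketch, not part of the definition.

```python
def tau(states, t, poss, trans):
    """First frame s > t in a transition state if frame t is a possession
    state; otherwise t itself."""
    if states[t] not in poss:
        return t
    for s in range(t + 1, len(states)):
        if states[s] in trans:
            return s
    return None  # no transition state observed in the remaining frames


def delta(states, t, poss, trans):
    """First frame s >= tau_t that is no longer in a transition state."""
    start = tau(states, t, poss, trans)
    if start is None:
        return None
    for s in range(start, len(states)):
        if states[s] not in trans:
            return s
    return None  # the transition has not resolved in the remaining frames


# Hypothetical possession: player 1 holds, passes to player 2, who shoots.
states = ["p1", "p1", "pass", "pass", "p2", "p2", "shot", "made2"]
poss, trans = {"p1", "p2"}, {"pass", "shot"}
print(tau(states, 0, poss, trans), delta(states, 0, poss, trans))
```

In this toy sequence, \( \tau_0 \) is the frame where the pass begins and \( \delta_0 \) is the frame where the receiver gains possession.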

Stopping Times for Switching Resolutions

Key assumptions:

(A1) \( C_t \) is marginally semi-Markov.
(A2) For all \( s > \delta_t \), \( \mathbb{P}(C_s | C_{\delta_t}, \mathcal{F}^{(Z)}_t) = \mathbb{P}(C_s | C_{\delta_t}) \).

Under (A1)–(A2), for all \( 0 \leq t < T \),

\[ \nu_t = \sum_{c \in \{ \mathcal{C}_{\text{trans}} \cup \mathcal{C}_{\text{end}} \}} \mathbb{E}[X | C_{\delta_t} = c]\mathbb{P}(C_{\delta_t} = c | \mathcal{F}^{(Z)}_t). \]

Multiresolution Models

\[ \nu_t = \sum_{c \in \{ \mathcal{C}_{\text{trans}} \cup \mathcal{C}_{\text{end}} \}} \mathbb{E}[X | C_{\delta_t} = c]\mathbb{P}(C_{\delta_t} = c | \mathcal{F}^{(Z)}_t). \]

Let \( M(t) \) be the event \( \{\tau_t \leq t + \epsilon\} \).

(M1) \( \mathbb{P}(Z_{t + \epsilon} | M(t)^c, \mathcal{F}^{(Z)}_t) \): the microtransition model.

(M2) \( \mathbb{P}(M(t) | \mathcal{F}^{(Z)}_t) \): the macrotransition entry model.

(M3) \( \mathbb{P}(C_{\delta_t} | M(t), \mathcal{F}^{(Z)}_t) \): the macrotransition exit model.

(M4) \( \mathbf{P} \), with \( P_{qr} = \mathbb{P}(C^{(n+1)} = c_r | C^{(n)} = c_q) \): the Markov transition probability matrix.

Monte Carlo computation of \( \nu_t \):

Draw \( C_{\delta_t} | \mathcal{F}^{(Z)}_t \) using (M1)–(M3).

Calculate \( \mathbb{E}[X | C_{\delta_t}] \) using (M4).
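The second step, computing \( \mathbb{E}[X | C_{\delta_t}] \) from a Markov transition matrix, amounts to finding expected terminal rewards in an absorbing chain. The sketch below is a toy illustration only: the four states and all transition probabilities are made up, and simple fixed-point iteration stands in for whatever solver is used in practice.

```python
def expected_points(P, end_values, n_iter=1000, tol=1e-12):
    """Expected point value from each state of an absorbing Markov chain.

    P maps each non-end state to {next_state: probability};
    end_values maps each end state to its point value.
    """
    v = {s: 0.0 for s in P}
    v.update(end_values)
    for _ in range(n_iter):
        max_change = 0.0
        for s in P:
            new = sum(p * v[r] for r, p in P[s].items())
            max_change = max(max_change, abs(new - v[s]))
            v[s] = new
        if max_change < tol:
            break
    return v


# Hypothetical chain: every probability here is invented for illustration.
P = {
    "poss": {"shot": 0.8, "turnover": 0.2},
    "shot": {"made2": 0.5, "poss": 0.5},  # a miss returns to possession
}
end_values = {"made2": 2.0, "turnover": 0.0}
values = expected_points(P, end_values)
print(values["poss"], values["shot"])
```

Given such values, the Monte Carlo estimate of \( \nu_t \) would be the average of \( \mathbb{E}[X | C_{\delta_t}] \) over the simulated draws of \( C_{\delta_t} \) from the first step.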

Microtransition Model

Player \( \ell \)'s position at time \( t \) is \( \mathbf{z}^{\ell}(t) = (x^{\ell}(t), y^{\ell}(t)) \).

\[ x^{\ell}(t + \epsilon) = x^{\ell}(t) + \alpha^{\ell}_x[x^{\ell}(t) - x^{\ell}(t - \epsilon)] + \eta^{\ell}_x(t) \]

\( \eta^{\ell}_x(t) \sim \mathcal{N} (\mu^{\ell}_x(\mathbf{z}^{\ell}(t)), (\sigma^{\ell}_x)^2) \).
\( \mu_x \) has Gaussian Process prior.
\( y^{\ell}(t) \) modeled analogously (and independently).
Different parameters for all players, offense (ball carrier or not) and defense.
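The update equation above can be simulated one coordinate at a time. The sketch below is a minimal illustration with made-up parameter values; the drift function `mu_fn` is a hypothetical stand-in for the fitted \( \mu^{\ell}_x \), and for simplicity it depends only on the x-coordinate rather than on the full position \( \mathbf{z}^{\ell}(t) \).

```python
import random


def step_position(x_now, x_prev, alpha, mu, sigma, rng):
    """One step of x(t + eps) = x(t) + alpha * [x(t) - x(t - eps)] + eta,
    with eta ~ N(mu, sigma^2)."""
    eta = rng.gauss(mu, sigma)
    return x_now + alpha * (x_now - x_prev) + eta


def simulate_path(x_prev, x_now, n_steps, alpha, mu_fn, sigma, rng):
    """Simulate n_steps future x-coordinates from two starting positions."""
    path = [x_prev, x_now]
    for _ in range(n_steps):
        mu = mu_fn(path[-1])  # location-dependent mean of the innovation
        path.append(step_position(path[-1], path[-2], alpha, mu, sigma, rng))
    return path


rng = random.Random(0)
# Hypothetical drift pulling the player toward x = 25 (illustrative only).
path = simulate_path(10.0, 10.5, n_steps=25, alpha=0.9,
                     mu_fn=lambda x: 0.01 * (25.0 - x), sigma=0.05, rng=rng)
print(path[-1])
```

The autoregressive term carries the player's current velocity forward, while the innovation mean acts as a location-dependent acceleration.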

Microtransition Acceleration Effects

[Figures: parker_with, howard_with, parker_without, howard_without]

Macrotransition Entry Model

Recall \( M(t) = \{ \tau_t \leq t + \epsilon \} \):

Six different “types”, based on entry state \( C_{\tau_t} \), with \( \cup_{j=1}^6 M_j(t) = M(t) \).
Hazards: \( \lambda_j(t) = \lim_{\epsilon \rightarrow 0} \frac{\mathbb{P}(M_j(t) | \mathcal{F}^{(Z)}_t)}{ \epsilon} \).

\[ \log(\lambda_j(t)) = [\mathbf{W}_j^{\ell}(t)]'\boldsymbol{\beta}_j^{\ell} + \xi_j^{\ell}\left(\mathbf{z}^{\ell}(t)\right) \]

\( \mathbf{W}_j^{\ell}(t), \boldsymbol{\beta}_j^{\ell} \): time-varying covariates and coefficients.
\( \xi_j^{\ell} \): spatial effect on log-hazard.
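The log-linear hazard can be evaluated directly once covariates, coefficients, and the spatial effect are available. The sketch below uses made-up numbers; combining the competing hazards as \( 1 - \exp(-\epsilon \sum_j \lambda_j) \) assumes the hazards are constant over the window \( \epsilon \), which is a simplification introduced for this example.

```python
import math


def hazard(covariates, coefficients, spatial_effect):
    """lambda_j(t) from log(lambda_j) = W' beta + xi(z)."""
    linear = sum(w * b for w, b in zip(covariates, coefficients))
    return math.exp(linear + spatial_effect)


def entry_probability(hazards, eps):
    """Approximate P(M(t)) over a window eps, assuming constant hazards."""
    return 1.0 - math.exp(-eps * sum(hazards))


# Hypothetical covariates and coefficients for two macrotransition types.
shot_hazard = hazard([1.0, 0.3], [-2.0, 1.5], spatial_effect=0.4)
pass_hazard = hazard([1.0, 0.7], [-1.0, 0.2], spatial_effect=-0.1)
p_entry = entry_probability([shot_hazard, pass_hazard], eps=1 / 25)
print(shot_hazard, pass_hazard, p_entry)
```

For small \( \epsilon \), the result is close to \( \epsilon \sum_j \lambda_j \), consistent with the limit definition of the hazards.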

Spatial Effects

[Figure: lebron_take] LeBron James' shot-taking \( \xi \)

[Figure: lebron_pass] LeBron James' pass (to Center) \( \xi \)

Hierarchical Modeling

[Figure: howard_chart] Made/missed shots, 2013-14 season

[Figure: howard_face] Dwight Howard

Hierarchical Modeling

Shrinkage needed:

Across space.

[Figure: shot_corr]

Across different players.

[Figure: player_graph]

Basis Representation of Spatial Effects

Spatial effects \( \xi^{\ell}_j \)

\( \ell \): ballcarrier identity.

\( j \): macrotransition type (pass, shot, turnover).

Functional basis representation: \( \xi^{\ell}_j (\mathbf{z}) = [\mathbf{w}^{\ell}_j]'\boldsymbol{\phi}_j(\mathbf{z}) \).

\( \boldsymbol{\phi}_j = (\phi_{j1}, \ldots, \phi_{jd})' \): \( d \) spatial basis functions.

\( \mathbf{w}_j^{\ell} \): weights/loadings.

Information sharing:

\( \boldsymbol{\phi}_j \): allows for non-stationarity and correlations between disjoint regions.

\( \mathbf{w}_j^{\ell} \): weights across players follow a conditional autoregressive (CAR) model based on player similarity graph \( \mathbf{H} \).
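The basis representation \( \xi^{\ell}_j (\mathbf{z}) = [\mathbf{w}^{\ell}_j]'\boldsymbol{\phi}_j(\mathbf{z}) \) is a weighted sum of shared spatial functions, with only the weights varying by player. In the sketch below, Gaussian bumps are a hypothetical stand-in for the basis functions, and the centers, scales, and weights are invented for illustration.

```python
import math


def gaussian_bump(center, scale):
    """Return a hypothetical basis function phi(z) centered at `center`."""
    def phi(z):
        dx, dy = z[0] - center[0], z[1] - center[1]
        return math.exp(-(dx * dx + dy * dy) / (2.0 * scale * scale))
    return phi


def spatial_effect(weights, basis, z):
    """xi(z) = w' phi(z): weighted sum of basis functions at location z."""
    return sum(w * phi(z) for w, phi in zip(weights, basis))


# Shared basis (same for every player); coordinates in feet are made up.
basis = [gaussian_bump((5.0, 25.0), 4.0), gaussian_bump((23.0, 25.0), 6.0)]
# Player-specific weights (hypothetical): one player favors the rim area,
# the other favors the perimeter.
rim_player = [1.2, -0.8]
perimeter_player = [-0.5, 0.9]
print(spatial_effect(rim_player, basis, (5.0, 25.0)))
print(spatial_effect(perimeter_player, basis, (5.0, 25.0)))
```

Because the basis is shared, shrinkage across players only needs to act on the low-dimensional weight vectors rather than on entire spatial surfaces.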

Bases for Hierarchical Models

Basis functions \( \boldsymbol{\phi}_j \) learned in a preprocessing step:

Graph \( \mathbf{H} \) learned from players' court occupancy distribution:

[Figures: bplots, h_basis]

Inference

“Partially Bayes” estimation of all model parameters:

Multiresolution transition models provide a partial likelihood factorization.
All model parameters are estimated using the R-INLA software.

Distributed computing implementation:

Preprocessing involves low-resource, highly parallelizable tasks.
Parameter estimation involves several CPU- and memory-intensive tasks.
Calculating EPV from parameter estimates involves low-resource, highly parallelizable tasks.

New Insights from Basketball Possessions

[Figure: ticker_again]

New Metrics for Player Performance

EPV-Added: Top 10 and bottom 10 players by EPV-added (EPVA) per game in 2013-14, minimum 500 touches during season.

Rank	Player	EPVA
1	Kevin Durant	3.26
2	LeBron James	2.96
3	Jose Calderon	2.79
4	Dirk Nowitzki	2.69
5	Stephen Curry	2.50
6	Kyle Korver	2.01
7	Serge Ibaka	1.70
8	Channing Frye	1.65
9	Al Horford	1.55
10	Goran Dragic	1.54

Rank	Player	EPVA
277	Zaza Pachulia	-1.55
278	DeMarcus Cousins	-1.59
279	Gordon Hayward	-1.61
280	Jimmy Butler	-1.61
281	Rodney Stuckey	-1.63
282	Ersan Ilyasova	-1.89
283	DeMar DeRozan	-2.03
284	Rajon Rondo	-2.27
285	Ricky Rubio	-2.36
286	Rudy Gay	-2.59

New Metrics for Player Performance

Shot satisfaction: Top 10 and bottom 10 players by shot satisfaction in 2013-14, minimum 500 touches during season.

Rank	Player	Satis.
1	Mason Plumlee	0.35
2	Pablo Prigioni	0.31
3	Mike Miller	0.27
4	Andre Drummond	0.26
5	Brandan Wright	0.24
6	DeAndre Jordan	0.24
7	Kyle Korver	0.24
8	Jose Calderon	0.22
9	Jodie Meeks	0.22
10	Anthony Tolliver	0.22

Rank	Player	Satis.
277	Garrett Temple	-0.02
278	Kevin Garnett	-0.02
279	Shane Larkin	-0.02
280	Tayshaun Prince	-0.03
281	Dennis Schroder	-0.04
282	LaMarcus Aldridge	-0.04
283	Ricky Rubio	-0.04
284	Roy Hibbert	-0.05
285	Will Bynum	-0.05
286	Darrell Arthur	-0.05

Acknowledgements and Future Work

Our EPV framework can be extended to better incorporate unique basketball strategies:

Additional macrotransitions can be defined, such as pick and rolls, screens, and other set plays.
Use more information in defensive matchups (only defender locations, not identities, are currently used).
Summarize and aggregate EPV estimates into useful player- or team-specific metrics.

Thanks to:

Luke Bornn, Alex D'Amour, Alex Franks, Kirk Goldsberry, Andrew Miller.

Moore/Sloan Foundations.