This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
American football is an appealing field of research for the use of information technology. While much effort has been made in recent years to analyze the offensive team, reasoning about defensive behavior is an emerging topic. As defensive performance and positioning contribute substantially to the overall success of the whole team, this study introduces a method to simulate defensive trajectories. The simulation is evaluated by comparing the movements in individual plays to a simulated league-average behavior. A data-driven ghosting approach is proposed. Deep neural networks are trained with a multi-agent imitation learning approach, using the player tracking data of a whole National Football League (NFL) regular season. To evaluate the quality of the predicted movements, a formation-based pass completion probability model is introduced. With the implementation of a learnable order-invariant model, based on insights from molecular dynamics machine learning, the accuracy of the model is increased to 81%. The trained pass completion probability model is used to evaluate the ghosted trajectories and serves as a metric to compare the true trajectory to the ghosted ones. Additionally, the study evaluates the ghosting approach with respect to different optimization methods and dataset augmentation. It is shown that a multi-agent imitation learning approach trained with a dataset aggregation method outperforms baseline approaches on the dataset. This network and evaluation scheme present a new method for teams, sports analysts, and sports scientists to evaluate defensive plays in American football and lay the foundation for more sophisticated data-driven simulation methods.
American football is a sport in which team performance is widely evaluated with statistical methods. Performance indicators for play-by-play data such as expected points added (EPA), the defense-adjusted value over average (DVOA), and defensive passing and rushing yards help to evaluate defensive plays (Cohea and Payton, 2011). Tracking data is also incorporated to evaluate single plays or specific game situations (Yurko et al., 2019). As demonstrated by recent Super Bowl winners, defensive effectiveness has a major impact on winning. A prominent acknowledgment of this fact is coach Paul Bryant's famous mantra:
Improving defensive behavior is, therefore, a major factor in winning championships. Of the last eight Super Bowl winners, four were ranked first or second in overall defensive rating in the league, whereas just one was ranked first or second in offensive rating. Hence, although coaches' decisions on offensive strategy are important, defense is a key element for winning, and the statistics support this. It is cumbersome to enumerate all possible defensive formations applicable against a specific offense. Furthermore, it is hard to determine which defender contributed to a specific defensive play, as defensive outcomes are commonly evaluated as a team achievement. With the emergence of tracking data, it is possible to cluster and classify specific contributions of defensive players, which helps to choose the right player for the corresponding play.
Modeling defensive behavior by simulating possible running trajectories of defensive players, given the behavior of the offense, can be a valuable tool. It could be used to set up tactics beforehand or for retrospective analysis. Furthermore, this method can give offensive and defensive coaches a tool to adapt their decisions on play strategy.
The presented ghosting model is capable of generating movement trajectories of the defensive teams via imitation learning, from the time the ball is snapped until the quarterback throws a pass forward. The model is evaluated using the expected pass completion probability at the moment the quarterback throws the pass forward.
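The time window described above can be illustrated with a minimal sketch. It assumes a time-ordered list of tracking frames, each carrying a hypothetical `event` field; the labels `ball_snap` and `pass_forward` are assumptions about the annotation scheme and may differ between tracking feeds.

```python
from typing import Dict, List

# Assumed event labels; real tracking feeds may name these differently.
SNAP_EVENT = "ball_snap"
PASS_EVENT = "pass_forward"


def snap_to_pass_window(frames: List[Dict]) -> List[Dict]:
    """Return the frames from the snap up to and including the forward pass.

    `frames` is a time-ordered list of dicts with a hypothetical "event" key
    (None when nothing is annotated for that frame).
    """
    start = end = None
    for i, frame in enumerate(frames):
        event = frame.get("event")
        if event == SNAP_EVENT and start is None:
            start = i
        elif event == PASS_EVENT and start is not None:
            end = i
            break
    if start is None or end is None:
        return []  # the play has no complete snap-to-pass window
    return frames[start:end + 1]


# Toy play: only frames 2 to 4 fall inside the modeled window.
play = [
    {"frame_id": 1, "event": None},
    {"frame_id": 2, "event": "ball_snap"},
    {"frame_id": 3, "event": None},
    {"frame_id": 4, "event": "pass_forward"},
    {"frame_id": 5, "event": "pass_arrived"},
]
window = snap_to_pass_window(play)
print([f["frame_id"] for f in window])
```

Plays without both events return an empty window here, which is one possible way to exclude them from training; other policies are equally plausible.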
The proposed model supports the decision-making of defensive coaches and helps with the evaluation of defensive strategies. It can be incorporated into media or fan applications or be used for extensive match analysis.
The availability of tracking data in American football has led to an increasing number of projects on evaluation and application engineering. The most commonly used tools in the area of team performance analytics are advanced statistical methods as well as machine learning and artificial intelligence. This section is divided into three parts: statistical methods, neural networks, and imitation learning.
Statistical methods have flourished in the past several years and expanded the highly competitive landscape of sports analysis. Fernandez and Bornn (2018) modeled pitch control with a parametric approach in which the influence area of each player is described by a Gaussian function and the influence of each player is added to a team influence model based on the spatial coordinates on the field. Dutta et al. (2019) investigated defensive player behavior by classifying the behavior of defensive backs into two different coverage schemes, man coverage and zone coverage, using Gaussian mixture models to capture the state in an unsupervised manner.
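The idea of Gaussian influence areas can be illustrated with a deliberately simplified sketch. It assumes an isotropic Gaussian with a fixed, hypothetical width `sigma` (in yards) and a plain sum per team; it is not the formulation of Fernandez and Bornn (2018), whose influence functions are more elaborate.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def gaussian_influence(player: Point, location: Point, sigma: float = 5.0) -> float:
    """Isotropic Gaussian influence of one player at a field location, in (0, 1].

    `sigma` is an assumed width, not a value taken from any cited study.
    """
    dx = location[0] - player[0]
    dy = location[1] - player[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def team_influence(players: Iterable[Point], location: Point, sigma: float = 5.0) -> float:
    """Sum the individual influences of all players of one team at a location."""
    return sum(gaussian_influence(p, location, sigma) for p in players)


# Toy example: two defenders close to a spot occupied by one offensive player.
defense = [(10.0, 20.0), (15.0, 25.0)]
offense = [(12.0, 22.0)]
spot = (12.0, 22.0)
# A positive difference means the defense has more summed influence at the spot.
print(team_influence(defense, spot) - team_influence(offense, spot))
```

Evaluating the difference on a grid of field coordinates would give a coarse control surface under these assumptions.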
Offensive player routes were analyzed with neural networks by recognizing and classifying running routes into different categories, covering wideout routes and backfield routes, in order to compare the number of routes run by the offense and the probability of targeting a receiver on a given route (Team, 2019). Mehrasa et al. (2017) reduced player trajectories with one-dimensional convolutional neural networks for play recognition and team classification in basketball and ice hockey. The authors conclude that franchise players or starting lineups contribute heavily to team classification and identification using tracking data. Burke (2019) used deep neural networks to analyze the decision-making of quarterbacks and compute the pass completion probabilities of the quarterback with respect to the position of receivers and the closest defenders. Deshpande and Evans (2020) picked up this idea and extended the model by incorporating hypothetical pass probabilities in a Bayesian nonparametric catch probability model. Most of the features of models regarding hypothetical passes are unobserved and are, therefore, imputed from observable inputs.
Imitation learning has multiple areas of application in sport. Seidl et al. (2017) proposed a sketching tool for basketball play-by-play analysis, in which imitation learning is used to synthesize NBA defense. Coordinated multi-agent imitation learning was first proposed by Le et al. (2017) and was shown to be superior to an unstructured solution on a predator-prey problem, called the pursuit domain, and on a soccer domain, where the coordinated approach also yielded a smaller loss than unstructured behavior. The ghosted soccer players were trained with Long Short-Term Memory (LSTM) layers (Hochreiter and Schmidhuber, 1997), while the pursuit domain was modeled with a random forest. A main finding of the study is the benefit of alternating training of the model and of the cascading training process of the LSTM layers for the problem.
The recently proposed methods and the extensive work by the NFL to make advanced statistics publicly available are the motivation to build a ghosting model for the defensive player trajectories of American football. Imitation learning is used to predict the trajectories of defensive players. Subsequently, these predicted positions are evaluated by comparison with the actual positions using a pass completion probability model.
In this study, a method is developed to simulate the individual and collective defensive behavior of American football players from the time of the ball snap until the quarterback throws the pass forward. A main aim of cornerbacks and safeties is to intercept passes or to prevent offensive receivers from running with the ball after the catch. Other defenders (e.g., linebackers, defensive ends) try to rush and tackle the quarterback, so that the pass cannot even be thrown. To the best of our knowledge, no model to date accounts for the different team strategies or the contribution of individual defensive players to the outcome of the play.
The proposed ghosting model takes advantage of a comprehensive representation of tracking data. Individual defensive players are modeled with the positional information of the offensive team. As ghosted trajectories do not coincide with the true running trajectories, a learned pass completion probability model, similar to previous studies (Burke, 2019; Deshpande and Evans, 2020), is proposed to compare the true running trajectories with the synthesized trajectories.
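The comparison of true and ghosted defenses through a completion probability can be sketched as follows. The function `toy_completion_probability` is a hypothetical stand-in for the learned model, with illustrative coefficients that are not fitted to any data; only the pattern of scoring both formations with the same model and taking the difference reflects the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def toy_completion_probability(receivers: List[Point], defenders: List[Point]) -> float:
    """Hypothetical stand-in for a learned pass completion model.

    Uses only the mean distance from each receiver to the nearest defender,
    so the result does not depend on the order in which players are listed.
    """
    separations = [min(math.dist(r, d) for d in defenders) for r in receivers]
    mean_sep = sum(separations) / len(separations)
    # Illustrative logistic coefficients, not fitted to any data.
    return 1.0 / (1.0 + math.exp(-(0.4 * mean_sep - 1.0)))


def ghost_delta(receivers: List[Point],
                true_defenders: List[Point],
                ghost_defenders: List[Point]) -> float:
    """Completion probability under the true defense minus that under the ghost.

    A positive value means the true defense allowed a higher completion
    probability than the simulated defense at the moment of the pass.
    """
    return (toy_completion_probability(receivers, true_defenders)
            - toy_completion_probability(receivers, ghost_defenders))


# Player positions at the frame in which the pass is thrown (toy values).
receivers = [(30.0, 10.0), (35.0, 40.0)]
true_def = [(33.0, 14.0), (35.0, 46.0)]   # looser coverage
ghost_def = [(31.0, 10.0), (35.0, 42.0)]  # tighter coverage
print(ghost_delta(receivers, true_def, ghost_def))
```

Because both formations are scored by the same function, the difference isolates the effect of defensive positioning under the stated assumptions.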
Motivational play: Eli Manning (10) throws a pass short right to Cody Latimer (12) for 8 yards before Latimer is stopped by a tackle. The trajectories of the offensive team are displayed in orange, the trajectories of the defensive team are displayed in blue, and the ball trajectory is displayed in red. The line of scrimmage is labeled and displayed in yellow.
Pass completion can be modeled in various ways. The NFL introduced a model to evaluate pass completion probabilities of specific players based on 10 features corresponding to every receiver (Team, 2018). With this method, it is difficult to evaluate the positions of all players simultaneously, as every single route is computed and a player-to-player comparison is conducted. Consequently, a single evaluation metric cannot be generated without engineered adjustments. To circumvent this issue, the pass is captured as a binary problem for the entire team in this study. This simplification helps to capture the completion probability and combines the probabilities of the player-to-player single-route analyses in one model, in which the different routes are automatically combined in an end-to-end approach. In the model, y captures whether the pass was caught, given the specific formation and speed of the players, neglecting the targeted player. The following formulas illustrate that this issue can be considered a binary classification problem with a completion probability:
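One common way to write such a binary formulation is sketched below. The notation is assumed for illustration: $X$ stands for the formation and speeds of all players at the moment of the pass, $f_\theta$ for the learned model, $\hat{p}$ for the predicted completion probability, and the threshold of 0.5 is a conventional choice.

```latex
\[
y \in \{0, 1\}, \qquad
\hat{p} = P(y = 1 \mid X) = f_\theta(X), \qquad
\hat{y} =
\begin{cases}
1 & \text{if } \hat{p} \geq 0.5, \\
0 & \text{otherwise.}
\end{cases}
\]
```

Here $y = 1$ denotes a caught pass and $y = 0$ an incomplete one, matching the definition of $y$ in the text.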