Editor’s note: This article appears in the September/October 2011 issue of Analytics Magazine.
By: Benjamin Alamar and Vijay Mehrotra
Over the past few years, the world of sports has experienced an explosion in the use of analytics. In this three-part series, we reflect on the current state of sports analytics and consider what the future of sports analytics may look like.
We define sports analytics as “the management of structured historical data, the application of predictive analytic models that utilize that data, and the use of information systems to inform decision makers and enable them to help their organizations in gaining a competitive advantage on the field of play.” Our definition is both expansive (in the sense that it includes not only statistical models but also the broader information value chain that surrounds these models) and restrictive (because it excludes traditional analytics applications such as demand forecasting, revenue management and financial modeling, all of which are certainly relevant in the business of professional sports). Our framework for sports analytics is presented in Figure 1.
Data management includes any and all processes associated with acquiring, verifying and storing data in an efficient manner. In a sports organization, data can come from a variety of sources and may be presented in many different forms.
As shown in Figure 1, the data management function will feed both the predictive analytics function and the information systems that support decision-makers. Given this crucial role, good data management is essential, and therefore missing, incomplete and/or inaccessible data inherently reduces the value of any other investments in analytics.
In many organizations, data is stored in isolated silos, so obtaining it is rarely a smooth process. Different groups within an organization, such as scouting or training, may have extensive data on players that other groups either cannot access or do not even know exists.
For example, the personnel group at one NFL team had been collecting extensive performance data on various groups of both opposing players and their own players. The coaching staff had no idea that the data existed, but when they did discover it, they had difficulty accessing it. The data resided in spreadsheets on the computers of the personnel group instead of being integrated into a common data archive. This is a common situation within professional sports organizations.
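As a rough illustration of what integrating such spreadsheets into a common data archive might involve, the following Python sketch merges per-player records from two groups into a single structure keyed by player. The function name `merge_silos`, the field names and all values are hypothetical and invented for illustration; they do not describe any team's actual data or systems.

```python
from collections import defaultdict


def merge_silos(*sources):
    """Combine per-player records from several groups into one archive.

    Each source is a (group_name, records) pair, where records is a list of
    dicts that all carry a shared "player_id" key (an assumed convention).
    """
    archive = defaultdict(dict)
    for group_name, records in sources:
        for record in records:
            player_id = record["player_id"]
            for field, value in record.items():
                if field != "player_id":
                    # Prefix each field with its origin so provenance is kept.
                    archive[player_id][f"{group_name}.{field}"] = value
    return dict(archive)


# Hypothetical rows, standing in for spreadsheets kept by separate groups.
personnel = [{"player_id": 12, "snaps": 48, "grade": 0.71}]
coaching = [
    {"player_id": 12, "missed_assignments": 2},
    {"player_id": 30, "missed_assignments": 0},
]

archive = merge_silos(("personnel", personnel), ("coaching", coaching))
print(archive[12])
```

Keying everything on one shared player identifier is the design choice that matters here: once both groups agree on it, either group can see what the other has collected.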
Predictive analysis, the next piece of the framework, is the process of applying statistical tools to data to gain insight into what is likely to happen in the future. In sports, this can involve the projection of the pro careers of amateur players, identifying how the strengths and weaknesses of an opponent will play out against your own team’s strengths and weaknesses, or assessing whether a free agent would fill a need on a team at an appropriate cost. Depending on the importance of the problem, the time until an answer is needed and the data available, these analyses can range from simple comparisons to extremely complicated and cutting-edge statistical analysis. The results of these analyses may feed directly into an intelligent information system that provides decision-makers with standardized results. Alternately, such results may be reported directly to a decision-maker for special projects that may be outside of any standard systems.
Information systems, the next component in the framework, are increasingly common in the world of sports. When designed and implemented correctly, such information systems typically allow for visualization and interactive analysis of relevant information from multiple sources in one place, organized in a meaningful way to provide insights for decision makers. For example, a cutting-edge sports information system might combine unstructured information from scouting reports, summary reports from multiple data sources and results from predictive models. Such a system not only provides a data-driven decision support platform and integrates data from multiple sources, but (as we will discuss in Part 2 of this series) also has the potential to fundamentally alter and enhance the way a decision-maker does his or her job.
Decision-makers are the ultimate customers for all components in the sports analytics framework. However, the modern professional sports organization typically has many different decision-makers, including the general manager, coaches, scouts, trainers, salary cap managers and other personnel executives. Decision-makers in different functional areas may utilize different data and models to tackle different types of questions. Conversely, as mentioned above, one key problem today is that decision-makers in one functional area (such as scouts) rarely have easy access to information generated by personnel in other areas (such as assistant coaches or salary cap managers).
To summarize, our definition and framework for sports analytics encompasses several different and related aspects associated with turning raw data into information that is valued by – and has an impact on – decision-makers in the world of sports.
An Explosion of Interest in Sports Analytics
Though sports analytics is still a nascent, unstructured field (as we will discuss in more detail in Part 3), interest and activity in it have been exploding in recent years.
While studies applying mathematical models to professional sports data can be traced back more than 50 years, it is important to remember what the world looked like as recently as 2005, when the first issue of the Journal of Quantitative Analysis in Sports (www.bepress.com/jqas/) was published. At the time this journal was launched, only two or three NBA teams thought about using advanced statistics in connection with players and strategy. Michael Lewis’ seminal book, “Moneyball: The Art of Winning an Unfair Game,” about the Oakland A’s use of data and models, had recently been published, and no one had yet thought seriously about the application of motion capture technology in the context of professional sports. Just six short years later, more than half of NBA teams now utilize the tools of analytics on the team side of their operation, most MLB teams now consider analytics a normal part of baseball operations, and companies such as STATS LLC are installing cameras in NBA arenas and NFL stadiums to capture more and more data.
On a broader scale, the annual Sloan Sports Analytics Conference serves as a vivid symbol of the growth of sports analytics. The first Sloan conference took place in 2006 in a few classrooms on the MIT campus with fewer than 300 attendees. The 2011 conference was held at the Boston Convention Center and attracted more than 2,000 attendees.
An Explosion of Data
Data within a sports organization used to consist of individual box scores, player and team summary statistics, text-based scouting reports and raw game films. However, the data available to decision-makers has grown exponentially over the last 15 years.
Several factors have contributed to this explosion in data. Innovations in sports science, ranging from training routines to nutritional regimens, coupled with improved reporting from medical staffs and trainers, have all come with their own data sets that are gathered and tracked somewhere within an organization. With improved communications via the Internet, the frequency and amount of information captured, stored and distributed by scouts and coaches at all levels have grown significantly. Thanks to increased computing power and reduced storage costs, historical data about the games themselves is now packaged into many different formats, with companies such as STATS LLC, StatDNA and Sports Data Hub emerging to provide organizations with high-quality historical data presented with unique summaries and indexes.
Finally, the advent of motion capture technology has expanded the data collected from each game. This technology tracks everything that moves on a field every 100th of a second. The impact of this is staggering, for it transforms the amount of information captured for a single game from a few hundred rows of data to well over one million. Major League Baseball, the NBA and pro soccer teams have implemented this type of technology.
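A back-of-the-envelope calculation shows how quickly the row count climbs. The Python sketch below assumes one row per tracked object per sample, a 48-minute NBA regulation game, and 11 tracked objects (10 players plus the ball); only the 100-samples-per-second rate comes from the text above, and the function name `tracking_rows` is ours.

```python
def tracking_rows(game_seconds, samples_per_second, tracked_objects):
    """Rows produced if every tracked object yields one row per sample."""
    return game_seconds * samples_per_second * tracked_objects


# Assumptions: 48 minutes of regulation play, 10 players plus the ball.
nba_regulation_seconds = 48 * 60
rows = tracking_rows(nba_regulation_seconds, samples_per_second=100, tracked_objects=11)
print(f"{rows:,}")  # prints 3,168,000
```

Under these assumptions a single game yields roughly three million rows, which is consistent with the "well over one million" figure and several orders of magnitude beyond a traditional box score.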
The result of all of this is clear: The world of sports generates far, far more data today than could have been imagined just a few short years ago. Dean Oliver, director of Production Analytics at ESPN, has spoken of finding “data that can win championships.”
However, as the computer scientist Clifford Stoll has said, “Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.” Analysts still spend too much time using their skills to try to answer questions that are not meaningful to decision-makers in pro sports. For example, very little is interesting about the next new statistic that ranks all NBA players. General managers do not lose sleep trying to figure out who the best player in the game is, as that information is neither accurate nor actionable. By contrast, Mark Cuban, owner and general manager of the Dallas Mavericks, has often cited studies that either predict or examine the effects of injuries as delivering useful and actionable information to his coaching staff and team.
In other words, despite the remarkable growth in the amount and variety of data available for examination and analysis, the world of sports analytics still faces the same ubiquitous challenge: how to get meaningful information into the hands – and minds – of the people who are in a position to make effective use of it.
In Part 2, we examine some of the predictive models that are being used to create actionable information in the world of sports today and the information systems that effectively deliver valuable information to decision-makers.
Benjamin Alamar (email@example.com) is the founding editor of the Journal of Quantitative Analysis in Sports, a professor of sports management at Menlo College and the director of Basketball Analytics and Research for the Oklahoma City Thunder of the NBA. He is co-author of the annual “Football Outsiders Almanac” and a regular contributor to the Wall Street Journal.
Vijay Mehrotra (firstname.lastname@example.org) is an associate professor, Department of Finance and Quantitative Analytics, School of Business and Professional Studies, University of San Francisco. He is also an experienced analytics consultant and entrepreneur, an angel investor in several successful analytics companies and a San Francisco Giants season-ticket holder.
- Lindsey, G. R. “Statistical Data Useful for the Operation of a Baseball Team,” Operations Research, Vol. 7, No. 2, March-April 1959, pp. 197-207.
- Lewis, M. “Moneyball: The Art of Winning an Unfair Game.”