Machine learning meets sports
Repository: https://github.com/kaushal1014/research-paper-template-Athlete-Injuries
Machine learning meets sports: Predicting the injuries of NFL players due to variation in playing surface
Hello, my name is Kaushal.
I am a 12th-grade student at Delhi Public School Bangalore East who is passionate about computer science and swimming. Injury is a moment I have lived through many times in my sports career. As a competitive swimmer for years, I have experienced what inevitable injuries mean to an athlete. Injuries are a part of sport that we all experience, but what many do not know is the agony an athlete goes through when unable to practice every day, when missing competitions, or when losing nail-biting finishes because pain keeps you from giving your 100%. When my competitive swimming career was temporarily suspended during the COVID-19 pandemic, I devoted my time to my second passion: developing skills in machine learning and AI.
I was a state medalist in swimming for two consecutive years, at the 2018 and 2019 junior championships in Karnataka. In my swimming journey, I have endured injuries that not only hindered my performance but also kept me out of practice. To help the athlete community, I decided to combine my two passions, computer science and love of sport, to better understand the use of machine learning in sports. This work was guided and mentored by Professor Gregory M. Kapfhammer.
Abstract
Data about sports have long been the subject of research and analysis by sports scientists. Machine learning (ML) has been applied across science, health care, and finance to tasks such as image detection, cancer detection, stock market prediction, and customer churn prediction. In some areas, such as sports, the effective use of machine learning still has considerable room for improvement. The main areas for improvement are data collection, which drives prediction accuracy, and sports science and medicine.
This article takes a deep dive into the positive impact of ML integration on sports analytics. ML can unearth valuable insights through a data-driven approach to decision-making. Of the many aspects of ML integration into sports, this paper focuses on applying ML to sports injury prediction. ML not only increases our knowledge of sports injuries but also assists in taking proactive steps to avoid them by predicting them ahead of time. To this end, technological advancements have enabled the collection of many data points for use in analysis and injury prediction. The full breadth of the available data has, however, only recently begun to be explored using suitable statistical methods and ML algorithms that can process such large data sets.
This paper uses advances in automatic and interactive data analysis, with the help of machine learning, to examine the intricacies of the relationship between playing surface and injury. Public data shared by the NFL for a sports analytics competition is analyzed for the relationship between playing surface, NFL players' movements, and their injuries, with the aim of improving performance and minimizing the risk of injury. The article also briefly underlines the importance of accurate data collection for critical sports parameters and its direct impact on the accuracy of ML predictions.
Introduction
Artificial turf continues to be an alternative surface on which many players at the high school, intercollegiate, and professional levels practice and compete. Several studies and literature reviews have investigated the properties of artificial turf, their impact on injuries, and the resulting injury patterns. The overall rate of football injuries has been reported to be significantly higher on artificial surfaces than on natural grass [5–6]. It is increasingly clear that the playing surface is an important factor in injury incidence and mechanisms.
John Powell from the University of Iowa [1] was among the first to quantify the higher incidence of these injuries. He published a paper in 1992 showing that professional football teams had more major knee injuries on artificial turf than on natural grass.
Since that time, artificial turf companies have made significant strides to simulate more natural surfaces. Despite these modern advancements, the effect of artificial turf on injury rates is still controversial. Natural grass fields are not free from problems either. There are studies that demonstrate that playing on a grass surface that is not well maintained may also increase injury rates.
This issue has become particularly important in cold-weather areas such as Green Bay, Minnesota, New England, and New York. In these areas the weather can take a heavy toll on the fields, making them dangerous despite the best efforts of ground crews. The question now becomes: Is today's generation of synthetic surfaces responsible for an increased risk of sports injuries? The answer remains unclear, as not all research studies have arrived at the same conclusion. Some studies do indicate a higher incidence of orthopedic injuries on artificial turf. For example, one study by Jason L. Dragoo [2], published in the National Library of Medicine, concluded that college football players suffer about 1.39 times as many ACL tears when not playing on a natural surface.
Yet, according to other research, synthetic and natural surfaces lead to an equal number of orthopedic injuries. Gould H. and colleagues published a study in May 2022 that provided a comprehensive systematic review of sports injuries on artificial turf versus natural grass. A total of 53 articles published between 1972 and 2020 were identified for inclusion. The study suggests that the rates of knee injuries and hip injuries are similar between playing surfaces, although elite-level football athletes may be more predisposed to knee injuries on artificial turf than on natural grass. Only a few articles in the literature reported a higher overall injury rate on natural grass than on artificial turf, and all of these studies received financial support from the artificial turf industry.
Rossi A, Pappalardo L, and Cintia P [4] describe very well the importance of applying ML in sports. With the technological advances of the last few decades, it is possible to record a huge quantity of data from athletes. Wearable devices, video analysis systems, tracking systems, and questionnaires are only a few examples of the tools currently used to record data in sports. These data can be used for scouting, performance analysis, and tactical analysis, but there is increasing interest in using them to assess the risk of injury. With this huge amount of data, complex models are needed for analysis and, for this reason, machine learning models are increasingly used in sports science. To demonstrate the application of ML and showcase the outcome of the ML approach, we have taken data publicly shared by the NFL for a sports analytics competition and analyzed it for the relationship between playing surface, NFL players' movements, and their injuries, with the aim of improving performance and minimizing the risk of injury. The article also briefly underlines the importance of accurate data collection for critical sports parameters and its direct impact on the accuracy of ML predictions.
Methods
The methods and analysis code are available in the code repository.
Note: Please review the code on a laptop or desktop computer.
Research questions:
- RQ1: Is there a correlation between playing surface and player injury?
- RQ2: Are injuries on synthetic surfaces more severe than injuries on natural surfaces?
- RQ3: What kinds of injury are seen on synthetic versus natural surfaces?
- RQ4: Where do the injuries occur on synthetic versus natural surfaces?
- RQ5: Is there any variation in speed and distance between artificial and natural turf?
Results
Here, the data shared by the NFL is analyzed with machine learning code, and some of the relevant plots and graphs developed by competition participants are regenerated to demonstrate the benefits of applying ML to injury prediction. Let us start analyzing the data and making our observations.
Answer to RQ1: Is there a correlation between playing surface and player injury?
After verifying the data, we find that synthetic surfaces have a greater probability of injury. As seen in Fig. 1, the injury rate on synthetic surfaces is approximately 1.8 times that on natural surfaces. This indicates a correlation between synthetic surfaces and injury probability.
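The rate comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: it assumes the rate is normalized per 1,000 plays (the text does not say how the rate was normalized), the function name `injuries_per_1000_plays` is hypothetical, and the counts are toy values rather than NFL data.

```python
from collections import Counter


def injuries_per_1000_plays(play_surfaces, injury_surfaces):
    """Return {surface: injuries per 1,000 plays} for every surface seen in the plays."""
    plays = Counter(play_surfaces)
    injuries = Counter(injury_surfaces)
    # Counter returns 0 for a surface with no recorded injuries.
    return {surface: 1000 * injuries[surface] / n for surface, n in plays.items()}


# Hypothetical toy data: one surface label per play and one per injury.
play_surfaces = ["Natural"] * 6000 + ["Synthetic"] * 4000
injury_surfaces = ["Natural"] * 3 + ["Synthetic"] * 4

rates = injuries_per_1000_plays(play_surfaces, injury_surfaces)
ratio = rates["Synthetic"] / rates["Natural"]
print(rates, ratio)
```

Normalizing by exposure (plays, games, or player-days) matters because the two surfaces do not host the same number of plays, so raw injury counts alone could mislead.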
Answer to RQ2: Are injuries on synthetic surfaces more severe than injuries on natural surfaces?
We analyze the data based on the number of days that a player was away from play after the injury. Injured players are categorized into four buckets based on the number of days they were away from play: (a) 1 day, (b) 7 days, (c) 28 days, and (d) 42 days.
Fig. 2 shows that the number of injured players in each category is higher on synthetic surfaces than on natural turf. Fig. 3 shows that the longer the time away from play, the higher the percentage of injured players on the artificial surface relative to the natural surface. For example, among players who were away from play for more than 42 days, the number injured on synthetic surfaces is 40% higher.
This shows that more players are getting injured on synthetic surfaces and, more importantly, that the injuries appear to be more severe on synthetic surfaces, leading to players being away from play for longer.
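The bucketing above can be sketched as follows. This is an illustrative sketch under stated assumptions: the buckets are treated as cumulative thresholds (an injury that costs 30 days counts toward the 1, 7, and 28 day buckets), the helper names `severity_counts` and `synthetic_share` are hypothetical, and the records are toy values, not NFL data.

```python
THRESHOLDS = (1, 7, 28, 42)  # days-missed thresholds named in the text


def severity_counts(injuries):
    """Count injuries per surface that reached each days-missed threshold.

    `injuries` is an iterable of (surface, days_missed) pairs.
    """
    counts = {}
    for surface, days_missed in injuries:
        row = counts.setdefault(surface, {t: 0 for t in THRESHOLDS})
        for t in THRESHOLDS:
            if days_missed >= t:  # assumption: buckets are cumulative
                row[t] += 1
    return counts


def synthetic_share(counts, threshold):
    """Fraction of injuries at a threshold that happened on synthetic turf."""
    synthetic = counts["Synthetic"][threshold]
    natural = counts["Natural"][threshold]
    return synthetic / (synthetic + natural)


# Hypothetical toy records: (surface, days missed).
injuries = [
    ("Synthetic", 50), ("Synthetic", 30), ("Synthetic", 3),
    ("Natural", 45), ("Natural", 8), ("Natural", 2),
]
counts = severity_counts(injuries)
print(counts)
print(synthetic_share(counts, 28))
```

Comparing the synthetic share across thresholds is one simple way to see whether the gap between surfaces widens as injuries become more severe.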
Answer to RQ3: What kinds of injury are seen on synthetic versus natural surfaces?
Fig. 4 shows that foot or lower limb injuries are seen more often on the artificial surface. This is evident in the box plot. On artificial turf, foot or lower limb injuries form the largest category, followed by knee injuries. Sports scientists can use this outcome of the ML application to understand particular kinds of injuries on artificial surfaces.
In contrast, Fig. 5 shows that foot or lower limb injuries are not as frequent on natural surfaces as on artificial surfaces, although knee injuries do occur on natural surfaces. These are still not as frequent as the foot injuries seen on synthetic surfaces.
We can therefore conclude from this ML application that lower limb or foot injuries are seen more often on artificial surfaces than on natural surfaces.
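The per-surface breakdown by injury type amounts to a grouped tally, which might look like the sketch below. The function name `body_part_ranking` and the records are hypothetical; the body-part labels are illustrative and are not taken from the NFL data.

```python
from collections import Counter, defaultdict


def body_part_ranking(injuries):
    """Rank injured body parts per surface, most frequent first.

    `injuries` is an iterable of (surface, body_part) pairs.
    """
    by_surface = defaultdict(Counter)
    for surface, body_part in injuries:
        by_surface[surface][body_part] += 1
    return {surface: tally.most_common() for surface, tally in by_surface.items()}


# Hypothetical toy records: (surface, injured body part).
injuries = [
    ("Synthetic", "Foot"), ("Synthetic", "Foot"), ("Synthetic", "Knee"),
    ("Natural", "Knee"), ("Natural", "Knee"), ("Natural", "Ankle"),
]
ranking = body_part_ranking(injuries)
for surface, parts in ranking.items():
    print(surface, parts)
```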
Answer to RQ4: Where do the injuries occur on synthetic versus natural surfaces?
Fig. 6 shows that injuries on synthetic turf are spread widely across the field. By comparison, injuries on the natural surface are more concentrated in the center of the field. This could be due to the speed that players reach in the middle of the field, where they are usually expected to reach higher speeds. One caveat is important: because of the limited computing power available for the analysis, only a limited number of samples was considered here. If the entire sample shared by the NFL were considered, the outcome could be more accurate and might vary slightly.
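One way to quantify how spread out injuries are is to bin injury positions into a coarse grid and measure the share that falls in the central cell. The sketch below assumes positions are given in yards on a field 120 yards long (including both end zones) and 53 1/3 yards wide. The 3 x 3 grid, the function names, and the positions are hypothetical choices for illustration.

```python
from collections import Counter

FIELD_LENGTH = 120.0    # yards, including both end zones
FIELD_WIDTH = 160 / 3   # yards (53 1/3)


def zone(x, y, n_long=3, n_wide=3):
    """Map a field position in yards to a (lengthwise, widthwise) grid cell."""
    col = min(max(int(x / FIELD_LENGTH * n_long), 0), n_long - 1)
    row = min(max(int(y / FIELD_WIDTH * n_wide), 0), n_wide - 1)
    return col, row


def central_share(positions):
    """Fraction of positions that fall in the central cell of a 3 x 3 grid."""
    counts = Counter(zone(x, y) for x, y in positions)
    return counts[(1, 1)] / len(positions)


# Hypothetical toy injury positions (x, y) in yards.
positions = [(60, 26), (55, 30), (10, 5), (110, 50)]
print(central_share(positions))
```

A central share computed separately for each surface would put a number on the visual impression that injuries are more concentrated on one surface than the other.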
Answer to RQ5: Is there any variation in speed and distance between artificial and natural turf?
From Fig. 8 and Fig. 9, we observe that the maximum speed is reached at the beginning of the turf length. This observation is common to artificial and natural turf. In other words, we find no difference in the maximum speed that players reach; it appears to be the same on both kinds of turf. The same applies to maximum distance.
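The speed and distance comparison can be sketched by first summarizing each play and then averaging the summaries per surface. This is a minimal sketch, assuming each tracking sample carries a play identifier, an instantaneous speed, and the distance covered since the previous sample. The function names, tuple layout, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def play_summaries(samples):
    """Return {play_key: (max_speed, total_distance)} from tracking samples.

    `samples` is an iterable of (play_key, speed, step_distance) tuples.
    """
    max_speed = defaultdict(float)
    total_distance = defaultdict(float)
    for play_key, speed, step_distance in samples:
        max_speed[play_key] = max(max_speed[play_key], speed)
        total_distance[play_key] += step_distance
    return {key: (max_speed[key], total_distance[key]) for key in max_speed}


def mean_by_surface(summaries, surface_of_play):
    """Average the per-play (max_speed, total_distance) pairs for each surface."""
    grouped = defaultdict(list)
    for play_key, summary in summaries.items():
        grouped[surface_of_play[play_key]].append(summary)
    return {
        surface: (mean(s[0] for s in rows), mean(s[1] for s in rows))
        for surface, rows in grouped.items()
    }


# Hypothetical toy tracking samples: (play key, speed, distance since last sample).
samples = [
    ("p1", 2.0, 0.5), ("p1", 6.0, 1.0), ("p1", 4.0, 0.5),
    ("p2", 3.0, 0.5), ("p2", 5.0, 1.5),
    ("p3", 7.0, 1.0), ("p3", 1.0, 0.25),
]
surface_of_play = {"p1": "Natural", "p2": "Synthetic", "p3": "Synthetic"}

summaries = play_summaries(samples)
means = mean_by_surface(summaries, surface_of_play)
print(means)
```

Summarizing per play before averaging keeps long plays, which contribute many samples, from dominating the comparison.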
Discussions
Recommendations from the study:
The ML analysis indicates that synthetic surfaces lead to (I) a higher injury rate, (II) higher injury severity, measured by the number of days absent, (III) injuries spread across the artificial turf, and (IV) more injuries to the lower limbs and feet.
There is no difference between artificial and natural turf in the maximum speed and distance that players achieve.
These findings closely match the study published in the American Journal of Sports Medicine in 2019 [8], "Higher Rates of Lower Extremity Injury on Synthetic Turf Compared with Natural Turf Among National Football League Athletes."
Since ML can handle large amounts of data, one possible recommendation to the NFL is to further improve and increase the sensitivity of the recorded parameters. The NFL could also add variables such as footwear and padding, and could even fit athletes with accelerometers or other wearable devices to measure their physical parameters. The application of ML opens the door to many opportunities for sports science.
In a second-phase study, we could collect more data on lower-limb injuries, which are seen more often on artificial turf. This could further improve the assessment and support work on injury prevention through proactive measures. It also opens up further study of injury prevention and sports training modules in conjunction with ML techniques and applications.
Limitations:
As shown above, we used machine learning algorithms in Python to analyze the data. The accuracy of the data is critical, and the larger the volume of data, the better the prediction. As this case shows, results can vary slightly with sample size and data accuracy. This is an important factor to consider when deploying ML in sports or in any other domain.
Background and Related Work
Here is the background description of this research.
The NFL recently held a competition to investigate the relationship between the playing surface and the injury and performance of National Football League (NFL) athletes, and to examine factors that may contribute to lower extremity injuries. The competition was conducted on the Kaggle platform. The NFL shared all the data required for applying machine learning techniques in this case study.
According to the NFL, there are 12 stadiums with synthetic turf. Recent investigations by the NFL showed a significantly higher lower limb injury rate among football athletes on synthetic turf compared with natural turf (Mack et al., 2018; Loughran et al., 2019). Their epidemiologic investigations and biomechanical studies of football cleat-surface interactions have shown that synthetic turf surfaces do not release cleats as readily as natural turf and may contribute to the incidence of non-contact lower limb injuries (Kent et al., 2015). Given these differences in cleat-turf interactions, the NFL has an interest in determining whether player movement patterns and other measures of player performance differ across playing surfaces and how these may contribute to the incidence of lower limb injury.
The NFL has now shared the data to examine the effects that playing on synthetic turf versus natural turf can have on player movements and the factors that may contribute to lower extremity injuries. NFL player tracking, also known as Next Gen Stats, is the capture of real-time location data, speed, and acceleration for every player, on every play, on every inch of the field. As part of this challenge, the NFL has provided full player tracking of on-field positions for 250 players over two regular season schedules. One hundred of the athletes in the study data set sustained one or more injuries during the study period that were identified as a non-contact injury of a type that may have turf interaction as a contributing factor. The remaining 150 athletes serve as a representative sample of the larger NFL population that did not sustain a non-contact lower-limb injury during the study period. Details of the surface type and environmental parameters that may influence performance and outcome are also provided.
Our task is to characterize any differences in player movement between the playing surfaces and to identify specific scenarios (e.g., field surface, weather, position, play type) that interact with player movement to present an elevated risk of injury. More details on the entry criteria are available in the Evaluation tab of the competition page.
About the NFL
The National Football League is America's most popular sports league, comprised of 32 franchises that compete each year to win the Super Bowl, the world's biggest annual sporting event. Founded in 1920, the NFL developed the model for the successful modern sports league, including national and international distribution, extensive revenue sharing, competitive excellence, and strong franchises across the country. The NFL is committed to advancing progress in the diagnosis, prevention, and treatment of sports-related injuries. The NFL's ongoing health and safety efforts include support for independent medical research and engineering advancements and a commitment to work to better protect players and make the game safer, including enhancements to medical protocols and improvements to how the game is taught and played. As more is learned, the league evaluates and changes rules to evolve the game and try to improve protections for players. Since 2002 alone, the NFL has made 50 rule changes intended to eliminate potentially dangerous tactics and reduce the risk of injuries.
Citation References
[14] A Machine Learning Approach to Assess Injury Risk in Elite Youth Football Players.
[15] The Effect of Playing Surface on Injury Rate: A Review of the Current Literature.
[19] The Effect of Weather in Soccer Results: An Approach Using Machine Learning Techniques.
[20] Dragoo, J.L., Braun, H.J. The Effect of Playing Surface on Injury Rate.
[23] Orchard, J. Is There a Relationship Between Ground and Climatic Conditions and Injuries in Football?