In conclusion, the framework shows promise for addressing data scarcity in sports injury prediction within its evaluated scope. Performance improvements, while statistically significant, remain incremental. Translation to practice requires resolving identified technical, practical, and ethical limitations through systematic validation and governance framework development. Until these prerequisites are met, implementation should proceed only under expert supervision with transparency about current limitations. With continued refinement addressing these challenges, such approaches may eventually support evidence-based injury prevention while respecting athlete welfare and privacy.
This study uses multiple public sports injury datasets to validate the proposed cross-sport transfer learning framework. Data sources include: the Dutch competitive runners dataset (https://www.kaggle.com/datasets/shashwatwork/injury-prediction-for-competitive-runners) with 74 athletes' 7-year training records covering sprinting and middle/long-distance running; a FIFA Sports Injury Monitoring System data subset (https://www.kaggle.com/datasets/kolambekalpesh/football-player-injury-data) with football players' training load and injury records; and the UCI Machine Learning Repository motion sensor dataset (https://doi.org/10.24432/C5C59F) with kinematic data from team sports including basketball and volleyball. The datasets comprise 312 athletes from 5 sports, spanning 2018-2023. Using the sliding window method, 10,847 time-series samples were generated for model training. All datasets have been de-identified and contain no personal privacy information.
Each sample contains multi-dimensional features: training load indicators (daily training volume, training intensity zones, acute-to-chronic workload ratio), physiological monitoring data (heart rate variability, subjective fatigue score, sleep quality score), kinematic parameters (acceleration, deceleration, number of direction changes), and individual characteristics (age, gender, previous injury history, years of specialized training). The dataset is limited to commonly available training metrics and lacks advanced biomechanical telemetry (e.g., ground reaction forces, joint angles, EMG data) due to the constraints of public data sources. While such data could potentially improve model performance, the current features represent standard practice in sports monitoring and have shown predictive value in prior research. The injury label is defined as whether an injury event requiring training cessation for more than 3 days occurs within the next 7 days. This operational definition follows standard sports epidemiology protocols but has recognized limitations. It may not capture: (1) subclinical injuries where athletes continue training despite pain, (2) gradual-onset overuse injuries, and (3) reporting inconsistencies across different teams and coaches.
A sliding window method was used for data preprocessing to construct time-series samples, with a window size of 14 days and a prediction window of 7 days. The 14-day window captures two-week training cycles common in sports periodization, while the 7-day prediction horizon aligns with weekly training planning practices. Continuous variables were standardized within athletes to eliminate individual baseline differences:
$$\tilde{x}_{ij}(t) = \frac{x_{ij}(t) - \mu_{ij}}{\sigma_{ij}}$$

where $\mu_{ij}$ and $\sigma_{ij}$ are the mean and standard deviation of the j-th feature of the i-th athlete, respectively. Missing values are filled by temporal interpolation, and samples with missing rates exceeding 30% are excluded. To address class imbalance (injury samples account for approximately 8.7%), the training set is resampled with the SMOTE-Tomek hybrid technique, adjusting the positive-to-negative sample ratio to 1:5, following established practice for handling ~10% minority-class imbalance in medical prediction tasks.
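The preprocessing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the function names, the toy data, and the choice to compute the statistics over the athlete's whole series are hypothetical.

```python
import numpy as np

def standardize_within_athlete(x):
    """Z-score each feature column using this athlete's own mean and std."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant features
    return (x - mu) / sigma

def make_windows(features, injury_onset, window=14, horizon=7):
    """Pair each 14-day feature window with a label for the following 7 days.

    features: (n_days, n_features) array for one athlete.
    injury_onset: (n_days,) 0/1 array marking days on which a qualifying injury starts.
    """
    samples, labels = [], []
    for start in range(len(features) - window - horizon + 1):
        end = start + window
        samples.append(features[start:end])
        labels.append(int(injury_onset[end:end + horizon].any()))
    return np.stack(samples), np.array(labels)

# Toy data (hypothetical): 30 days, 3 features, one injury starting on day 20.
rng = np.random.default_rng(0)
raw = rng.normal(loc=50.0, scale=10.0, size=(30, 3))
onset = np.zeros(30, dtype=int)
onset[20] = 1

z = standardize_within_athlete(raw)
X, y = make_windows(z, onset)
print(X.shape, y.tolist())
```

Each window is labeled positive only when an injury begins in the 7 days after the window ends, so the features never overlap the period being predicted.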
Based on the similarity of sports characteristics, the dataset is divided into the source domain and the target domain. Track and field events (with a large sample size) serve as the source domain, while ball sports (with a small sample size) serve as the target domain, used to evaluate the transfer learning effectiveness. Data is divided chronologically: 60% training set, 20% validation set, and 20% test set, ensuring temporal integrity. The dataset comprises 312 athletes across 5 sports from publicly available sources. While no independent external validation datasets were available due to data accessibility constraints in sports medicine, we implemented rigorous cross-validation with temporal splits to minimize overfitting and assess generalization within the available data.
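A chronological 60/20/20 split can be sketched as below, assuming the samples are already sorted by time. The helper name is hypothetical, and the total of 10,847 samples is used only as an example size; the study may split per athlete or per domain.

```python
def chronological_split(n_samples, train=0.6, val=0.2):
    """Return index ranges for train/val/test, assuming samples are sorted by time."""
    n_train = int(n_samples * train)
    n_val = int(n_samples * val)
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, n_samples))

train_idx, val_idx, test_idx = chronological_split(10847)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the ranges are contiguous and ordered, every validation sample is later than every training sample, and every test sample is later than both.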
Time series graph encoding is a technique that converts one-dimensional time series data into a two-dimensional image representation, which can preserve temporal dependencies and reveal potential pattern features. In recent years, this method has achieved remarkable results in fields such as financial forecasting and seismic damage assessment. Inspired by the successful application of time series image encoding in sports injury prediction by Ye et al., this study adopts multiple encoding methods for feature transformation of sports training data.
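As shown in Fig. 5, this study implements four main time series graph encoding methods. The Gramian Angular Field (GAF) constructs an angle information matrix by mapping time series data to a polar coordinate system. The Gramian Angular Summation Field (GASF) captures temporal correlation by calculating the cosine of the sum of angles:

$$\mathrm{GASF}_{ij} = \cos(\phi_i + \phi_j)$$

where $\phi_i = \arccos(\tilde{x}_i)$ and $\tilde{x}_i$ is the series value at time step i rescaled to $[-1, 1]$. The Gramian Angular Difference Field (GADF) reflects temporal changes through the sine of angular differences:

$$\mathrm{GADF}_{ij} = \sin(\phi_i - \phi_j)$$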
These two encoding methods preserve the temporal dependencies of time series while enhancing feature distinguishability through trigonometric transformation.
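The two Gramian encodings can be sketched in a few lines of NumPy. The function name and the 14-day training-load series are hypothetical, and the min-max rescaling to [-1, 1] is the usual convention rather than a detail confirmed by the study.

```python
import numpy as np

def gramian_angular_fields(series):
    """Return (GASF, GADF) for a 1-D series."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so arccos is defined; assumes the series is not constant.
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    gasf = np.cos(phi[:, None] + phi[None, :])  # cosine of angle sums
    gadf = np.sin(phi[:, None] - phi[None, :])  # sine of angle differences
    return gasf, gadf

# Hypothetical 14-day training load values.
load = [310, 420, 380, 0, 450, 500, 290, 330, 460, 410, 0, 480, 520, 300]
gasf, gadf = gramian_angular_fields(load)
print(gasf.shape, gadf.shape)
```

GASF is symmetric, while GADF is antisymmetric with a zero diagonal, so the two matrices carry complementary information about the same window.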
The Markov Transition Field (MTF) constructs a state transition probability matrix after discretizing the time series, effectively capturing dynamic change patterns in training load. The Recurrence Plot (RP) constructs a binary matrix by calculating similarities between time points:

$$R_{ij} = \Theta(\varepsilon - \lVert x_i - x_j \rVert)$$

where $\Theta$ is the Heaviside function and $\varepsilon$ is the threshold. This method is particularly suitable for identifying periodicity and abrupt change points in training patterns.
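A minimal sketch of both encodings follows. The quantile binning with four states, the threshold value, and the toy series are assumptions made for illustration; the study does not state them here.

```python
import numpy as np

def markov_transition_field(series, n_bins=4):
    """MTF[i, j] = estimated probability of moving from the state at time i to the state at time j."""
    x = np.asarray(series, dtype=float)
    # Interior quantile edges assign each point a state in 0..n_bins-1.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))[1:-1]
    states = np.digitize(x, edges)
    counts = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return probs[states[:, None], states[None, :]]

def recurrence_plot(series, eps):
    """R[i, j] = 1 when |x_i - x_j| <= eps (Heaviside step taken as 1 at zero)."""
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(int)

# Hypothetical 14-day training load values.
load = [310, 420, 380, 0, 450, 500, 290, 330, 460, 410, 0, 480, 520, 300]
mtf = markov_transition_field(load)
rp = recurrence_plot(load, eps=50)
print(mtf.shape, int(rp.sum()))
```

The two rest days (load 0) recur with each other in the recurrence plot, which is the kind of periodic structure the encoding is meant to expose.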
To fully leverage the complementary advantages of different encoding methods, this study proposes a multi-modal fusion strategy that performs a weighted combination of the standardized encoding matrices:

$$X_{\text{fused}} = w_1\,\mathrm{GASF} + w_2\,\mathrm{GADF} + w_3\,\mathrm{MTF} + w_4\,\mathrm{RP}$$

where the weight coefficients $w_1, \dots, w_4$ are optimized on the validation set. This fusion encoding integrates angular information, state transitions, and recurrence patterns, providing a rich feature representation for the subsequent graph neural networks.
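The weighted fusion can be sketched as below. The min-max standardization, the normalization of the weights to sum to 1, and the example weights are all assumptions; random matrices stand in for the four encodings.

```python
import numpy as np

def min_max(m):
    """Scale a matrix to [0, 1]; a constant matrix maps to zeros."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m, dtype=float)

def fuse(encodings, weights):
    """Weighted sum of standardized encoding matrices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # assumption: weights are normalized to sum to 1
    return sum(wk * min_max(m) for wk, m in zip(w, encodings))

# Stand-ins for the GASF, GADF, MTF, and RP matrices of one 14-day window.
rng = np.random.default_rng(0)
encodings = [rng.normal(size=(14, 14)) for _ in range(4)]
fused = fuse(encodings, weights=[0.4, 0.3, 0.2, 0.1])  # hypothetical weights
print(fused.shape)
```

In a validation-driven search, candidate weight vectors would be scored by downstream validation performance and the best one kept.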
The spatio-temporal graph neural network architecture proposed in this study integrates graph convolution, temporal modeling, and cross-sport transfer learning mechanisms to fully explore the spatio-temporal characteristics of sports training data. As shown in Fig. 6, the overall architecture adopts a dual-path parallel processing strategy to extract spatial and temporal features respectively, and realizes feature fusion through an attention mechanism.
The graph construction module converts the encoded temporal data into a graph structure representation. Nodes represent the athlete's state at different time points, and edge weights are determined by calculating feature similarity:

$$A_{ij} = \exp\left(-\frac{\lVert h_i - h_j \rVert^2}{2\sigma^2}\right)$$

where $h_i$ is the feature vector of node i, and $\sigma$ is the bandwidth parameter. This dynamic graph construction method can adaptively capture the temporal relationships of training patterns.
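A similarity-based adjacency matrix with a bandwidth parameter can be sketched with a Gaussian kernel. The exact kernel form, the function name, and the toy node features are assumptions.

```python
import numpy as np

def gaussian_adjacency(node_features, sigma=1.0):
    """Edge weight exp(-||h_i - h_j||^2 / (2 * sigma^2)) for every node pair."""
    h = np.asarray(node_features, dtype=float)
    sq_dist = ((h[:, None, :] - h[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * sigma ** 2))

# Three hypothetical node states: nodes 0 and 1 are close, node 2 is far away.
nodes = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
A = gaussian_adjacency(nodes, sigma=1.0)
print(np.round(A, 3))
```

A smaller bandwidth makes the graph sparser in effect, because weights between dissimilar states decay toward zero more quickly.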
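Spatial feature extraction employs a three-layer Graph Convolutional Network (GCN). Each GCN layer updates node representations by aggregating neighbor node information:

$$h_i^{(l+1)} = \rho\left(\sum_{j \in \mathcal{N}(i)} \frac{1}{\sqrt{d_i d_j}}\, W^{(l)} h_j^{(l)}\right)$$

where $\mathcal{N}(i)$ represents the neighbor set of node i, $d_i$ is the node degree, $W^{(l)}$ is the learnable parameter matrix of layer l, and $\rho$ is a nonlinear activation function. This design can effectively capture dependencies between different training states.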
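One degree-normalized aggregation step can be sketched in NumPy as follows. The symmetric normalization, the ReLU activation, the self-loops, and the toy graph are assumptions chosen to match the common GCN formulation.

```python
import numpy as np

def gcn_layer(adj, h, weight):
    """One GCN step: ReLU(D^-1/2 A D^-1/2 H W).

    Assumes every node has positive degree (for example via self-loops).
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ h @ weight, 0.0)

# Toy 3-node path graph with self-loops.
adj = np.array([[1.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [0.0, 1.0, 1.0]])
h = np.eye(3)              # one-hot node features
weight = np.ones((3, 2))   # hypothetical layer weights
out = gcn_layer(adj, h, weight)
print(out.shape)
```

Stacking three such layers lets each node's representation depend on states up to three hops away in the graph.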
Temporal feature extraction employs a Temporal Convolutional Network (TCN), which captures long-term dependencies through dilated causal convolutions and residual connections. With dilation factors that increase exponentially across layers, the receptive field of the TCN grows exponentially with the number of layers, enabling efficient modeling of temporal dynamics. Dou validated the effectiveness of this parallel architecture in athlete action recognition research, demonstrating that the spatial-temporal dual-path design can significantly enhance feature representation capability.
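The exponential growth of the receptive field can be checked with a small sketch. It assumes one convolution per layer with dilations 1, 2, 4, and so on; the kernel values and the three-layer residual stack are hypothetical.

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_j kernel[j] * x[t - j * dilation], with zeros before the series start."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

def receptive_field(kernel_size, n_layers):
    """Receptive field with one convolution per layer and dilations 1, 2, 4, ..."""
    return 1 + (kernel_size - 1) * (2 ** n_layers - 1)

def tcn_stack(x, kernel, n_layers):
    """Residual stack: each layer adds a dilated causal convolution of its input."""
    h = np.asarray(x, dtype=float)
    for layer in range(n_layers):
        h = h + causal_dilated_conv(h, kernel, dilation=2 ** layer)
    return h

x = np.arange(14, dtype=float)
x_future_changed = x.copy()
x_future_changed[10] = 99.0  # perturb a later time step
kernel = [0.5, 0.3, 0.2]
y_a = tcn_stack(x, kernel, n_layers=3)
y_b = tcn_stack(x_future_changed, kernel, n_layers=3)
print(receptive_field(3, 4))
```

Outputs before the perturbed time step are identical in both runs, which is the causality property: no prediction uses information from the future.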
A multi-head attention mechanism is used for adaptive fusion of spatial and temporal features:
By learning the importance weights of different features, the model can dynamically adjust feature combination strategies according to specific tasks.
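One way such a fusion could work is sketched below using standard scaled dot-product attention over two tokens, one per path. How the study forms queries, keys, and values is not stated, so the token layout, the random projection matrices, the omitted output projection, and the final averaging are all assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_fusion(spatial, temporal, wq, wk, wv, n_heads):
    """Fuse two feature vectors with multi-head scaled dot-product attention."""
    tokens = np.stack([spatial, temporal])  # (2, d)
    d = tokens.shape[1]
    d_head = d // n_heads

    def split(m):
        # Project, then split the feature dimension into heads: (n_heads, 2, d_head)
        return (tokens @ m).reshape(2, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(wq), split(wk), split(wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (n_heads, 2, 2)
    heads = attn @ v                                             # (n_heads, 2, d_head)
    fused_tokens = heads.transpose(1, 0, 2).reshape(2, d)        # concatenate heads
    return fused_tokens.mean(axis=0), attn

rng = np.random.default_rng(0)
d = 8
spatial, temporal = rng.normal(size=d), rng.normal(size=d)
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
fused, attn = multi_head_fusion(spatial, temporal, wq, wk, wv, n_heads=2)
print(fused.shape, attn.shape)
```

Each row of the attention matrix sums to 1, so every head expresses a convex weighting between the spatial and temporal representations.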
The domain adaptation module is the key to achieving cross-sport transfer. Through adversarial training, it minimizes the feature distribution differences between source and target domains, enabling the model to learn representations with sport invariance, thereby improving generalization capability on data-scarce sports.
Cross-sport transfer learning is a key technology for solving model training in small-sample sports. This study adopts a domain adaptation strategy, achieving knowledge transfer between different sports through architectural design of shared feature extractors and sport-specific classifiers.
The transfer learning framework contains three core components: a shared feature encoder, a domain discriminator, and a sport-specific predictor. The shared encoder is responsible for learning sport-agnostic universal injury patterns, with its parameters fixed after pre-training on the source domain. The domain discriminator is trained adversarially against the encoder so that the distribution differences between source and target domains are minimized:

$$\mathcal{L}_{adv} = -\mathbb{E}_{x_s}\left[\log D(G(x_s))\right] - \mathbb{E}_{x_t}\left[\log\left(1 - D(G(x_t))\right)\right]$$

where G is the feature extractor, D is the domain discriminator, and $x_s$ and $x_t$ represent source domain and target domain samples, respectively.
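The domain-classification loss can be illustrated numerically. The sketch assumes the common binary cross-entropy form, with the discriminator outputting the probability that a feature comes from the source domain; the function name and the example outputs are hypothetical.

```python
import numpy as np

def domain_adversarial_loss(d_source, d_target, eps=1e-12):
    """Binary cross-entropy of the discriminator: source labeled 1, target labeled 0."""
    d_source = np.clip(np.asarray(d_source, dtype=float), eps, 1 - eps)
    d_target = np.clip(np.asarray(d_target, dtype=float), eps, 1 - eps)
    return float(-(np.log(d_source).mean() + np.log(1 - d_target).mean()))

# Discriminator that separates the domains well (features are NOT yet sport-invariant).
separable = domain_adversarial_loss([0.9, 0.9], [0.1, 0.1])
# Discriminator reduced to guessing (features look the same across sports).
confused = domain_adversarial_loss([0.5, 0.5], [0.5, 0.5])
print(round(separable, 4), round(confused, 4))
```

The discriminator is trained to lower this loss, while a gradient reversal layer pushes the encoder to raise it, toward the value reached when the discriminator can only guess.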
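To adapt to the feature distributions of different sports, a feature adversarial mechanism is introduced. Through the Maximum Mean Discrepancy (MMD) constraint, the feature distributions of the source and target domains are made to converge:

$$\mathrm{MMD}^2(X_s, X_t) = \left\lVert \frac{1}{n_s}\sum_{i=1}^{n_s} \phi(x_i^s) - \frac{1}{n_t}\sum_{j=1}^{n_t} \phi(x_j^t) \right\rVert_{\mathcal{H}}^2$$

where $\phi(\cdot)$ represents the kernel function mapping, and $n_s$ and $n_t$ are the numbers of source and target samples.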
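With a kernel, the squared MMD can be estimated without computing the feature mapping explicitly. The sketch below uses the biased estimator with an RBF kernel; the kernel choice, its width, and the synthetic feature samples are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2) for all pairs."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def mmd_squared(source, target, gamma=0.5):
    """Biased estimate of squared MMD with an RBF kernel."""
    return float(rbf_kernel(source, source, gamma).mean()
                 + rbf_kernel(target, target, gamma).mean()
                 - 2 * rbf_kernel(source, target, gamma).mean())

rng = np.random.default_rng(0)
source = rng.normal(size=(200, 4))
target_same = rng.normal(size=(200, 4))            # same distribution as source
target_shifted = rng.normal(size=(200, 4)) + 2.0   # shifted distribution
print(mmd_squared(source, target_same), mmd_squared(source, target_shifted))
```

The estimate is near zero when both samples come from the same distribution and grows as the distributions separate, which is why it can serve as an alignment penalty.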
As shown in Table 2, the experimental setup covers different scales of source domain and target domain configurations. The source domain consists of track and field events with larger sample sizes (sprint, middle-distance, and long-distance running), while the target domain includes small-sample ball sports (basketball, volleyball). In the model configuration, the shared encoder adopts a 3-layer GCN with hidden dimensions of 128, 256, and 512, respectively. The domain discriminator uses a 2-layer fully connected network and employs gradient reversal layers to implement adversarial training.
The fine-tuning strategy adopts progressive unfreezing, first training only the sport-specific layers, then gradually unfreezing the high-level parameters of the shared layers. This strategy both preserves the general knowledge learned from the source domain and allows the model to adapt to specific patterns in the target domain. Through this design, even when the target sport has only a small number of labeled samples, the model can still achieve good predictive performance.
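The model adopts an end-to-end joint optimization strategy, and the overall loss function contains three parts:

$$\mathcal{L}_{total} = \mathcal{L}_{focal} + \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{reg}$$

where $\mathcal{L}_{focal}$ is the focal loss for injury prediction, $\mathcal{L}_{adv}$ is the domain adversarial loss, $\mathcal{L}_{reg}$ is the L2 regularization term, and $\lambda_1$ and $\lambda_2$ are weighting coefficients.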
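The focal loss is defined as:

$$\mathcal{L}_{focal} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where $p_t$ is the predicted probability of the true class, $\alpha_t$ is dynamically adjusted according to class frequency, and the focusing parameter $\gamma$ makes the model pay more attention to hard-to-classify injury samples.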
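The effect of the focal loss on easy and hard samples can be illustrated with a short sketch of the standard binary form. The values alpha = 0.75 and gamma = 2.0 are hypothetical defaults chosen for the example, not the study's settings.

```python
import numpy as np

def focal_loss(p, y, alpha=0.75, gamma=2.0, eps=1e-12):
    """Mean binary focal loss; p is the predicted injury probability, y the 0/1 label.

    alpha and gamma are hypothetical example values.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-dependent weight
    return float((-alpha_t * (1 - p_t) ** gamma * np.log(p_t)).mean())

easy = focal_loss([0.9], [1])  # injury sample the model already gets right
hard = focal_loss([0.1], [1])  # injury sample the model misses
print(easy, hard)
```

With gamma set to 0 the expression reduces to class-weighted cross-entropy; increasing gamma shrinks the contribution of well-classified samples, so the missed injury dominates the gradient.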
Training uses the Adam optimizer with an initial learning rate of 0.001, decayed through a cosine annealing strategy. Regularization strategies include: applying 0.3 Dropout to all hidden layers, randomly dropping 15% of edges in graph convolutional layers, and an L2 regularization coefficient of 0.0001. Batch normalization is applied after each GCN layer to accelerate convergence. The training process adopts an early stopping mechanism, terminating when validation set AUC does not improve for 20 consecutive epochs, with gradient clipping threshold set to 5.0 to prevent gradient explosion.
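The learning-rate schedule and stopping rule can be sketched in plain Python. The total of 100 epochs, the minimum learning rate of 0, and the simulated validation AUC curve are assumptions; only the initial rate of 0.001 and the patience of 20 epochs come from the text.

```python
import math

def cosine_lr(epoch, total_epochs, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing from lr_max at epoch 0 down to lr_min at total_epochs."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

class EarlyStopping:
    """Stop when validation AUC has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = -math.inf
        self.bad_epochs = 0

    def step(self, val_auc):
        """Record one epoch's AUC and return True when training should stop."""
        if val_auc > self.best:
            self.best = val_auc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Simulated validation AUC (hypothetical): improves for 5 epochs, then plateaus.
stopper = EarlyStopping(patience=20)
stopped_at = None
for epoch in range(100):
    val_auc = 0.70 + 0.01 * min(epoch, 4)
    if stopper.step(val_auc):
        stopped_at = epoch
        break
print(stopped_at, cosine_lr(50, 100))
```

In this simulation the last improvement happens at epoch 4, so the 20th consecutive non-improving epoch is epoch 24.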
Model evaluation adopts multi-dimensional metrics: AUC measures overall classification performance, sensitivity and specificity evaluate injury detection rate and non-injury identification rate respectively, F1 score balances precision and recall, and Matthews Correlation Coefficient (MCC) is suitable for class imbalance scenarios. The experiment adopts 5-fold cross-validation, maintaining temporal integrity to avoid data leakage.
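The threshold-based metrics follow directly from the confusion matrix, as the sketch below shows. The counts are hypothetical and chosen only so that positives make up 8.7% of 1,000 samples; they are not results from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F1, and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # injury detection rate (recall)
    specificity = tn / (tn + fp)   # non-injury identification rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "mcc": mcc}

# Hypothetical counts: 87 injury windows and 913 non-injury windows.
m = binary_metrics(tp=60, fp=91, tn=822, fn=27)
print({k: round(v, 3) for k, v in m.items()})
```

In this toy case specificity is high while F1 stays moderate, which illustrates why F1 and MCC are reported alongside accuracy-style metrics when positives are rare.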
In cross-sport transfer experiments, the source domain uses full data for pre-training, while the target domain uses 10%, 30%, 50%, and 100% labeled data for fine-tuning, respectively, evaluating transfer effectiveness under different data scales. Baseline methods cover three categories of models: traditional machine learning (logistic regression, random forest, XGBoost), deep learning (LSTM, CNN-LSTM, Transformer), and graph neural networks (GCN, GAT, GraphSAGE). All methods use the same data preprocessing pipeline, with statistical significance verified through a paired t-test (α = 0.05). Ablation experiments systematically evaluate the contributions of temporal graph encoding, attention mechanism, and transfer learning modules. Experiments are conducted in a PyTorch 1.12 environment, with all results based on the mean and standard deviation of 5 repeated experiments. Detailed experimental configurations are provided in Supplementary Table S2, including all hyperparameters, random seeds (set to 42 for reproducibility), and model architecture specifications to facilitate reproduction of our results.
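The paired t-test over repeated runs can be sketched with the standard library. The AUC values below are hypothetical placeholders, not reported results; 2.776 is the two-sided critical value of the t distribution for α = 0.05 with 4 degrees of freedom (5 paired runs).

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

T_CRIT_DF4 = 2.776  # two-sided, alpha = 0.05, df = 4

# Hypothetical AUCs from 5 paired runs of a proposed model and a baseline.
proposed = [0.84, 0.86, 0.85, 0.87, 0.85]
baseline = [0.81, 0.82, 0.83, 0.83, 0.82]
t_stat = paired_t_statistic(proposed, baseline)
significant = abs(t_stat) > T_CRIT_DF4
print(round(t_stat, 3), significant)
```

Pairing matters because both models are evaluated on the same folds or seeds, so the test works on per-run differences rather than on two independent samples.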
The computational complexity of the proposed framework consists of:
The total time complexity is where C is the number of channels and L is the number of GNN layers. Memory requirements scale as for storing graph structure and intermediate representations.
In practical implementation, with typical parameter settings, the computational cost is dominated by the GNN forward pass. On a single NVIDIA RTX 3090 GPU, the average training time per epoch is approximately 2.5 min for a batch size of 64, and inference requires 0.3 s per sample. These computational requirements, while substantial, are within acceptable ranges for offline training and batch prediction scenarios. However, real-time deployment on edge devices would require model compression techniques such as knowledge distillation or pruning to reduce the computational burden.