In terms of practical implications, our findings identify deficiencies in current waste management practices and their negative impacts on safety perception, providing evidence-based guidance for policymakers to develop targeted management strategies. Furthermore, this research highlights a broader insight into urban governance: the creation of safe and sustainable communities depends not only on initial planning and construction but also on the effectiveness of long-term management practices. This emphasizes the crucial role of sustainable urban management in shaping urban experiences.
This study explores the relationships between different categories of urban street waste and safety perception, while examining waste presence as an indicator of dynamic urban management effectiveness in relation to broader factors shaping urban safety perceptions.
Our analysis unfolds in four interconnected stages. First, we demonstrate the performance of our computer vision model for safety perception calculation (Table 1), followed by mapping the predicted perceptions across NYC (Fig. 1) and presenting relevant statistical analyses. Second, we examine various categories of street waste, encompassing both controlled and uncontrolled waste (Fig. 3) while validating our computer vision model's capability in waste identification (Table 2). The spatial distribution and statistical characteristics of waste presence across the city are then presented (Fig. 4). Third, we investigate the statistical relationships between waste presence and safety perception, examining the correlation patterns and magnitude across different waste types (Fig. 5). Finally, we explore the relative importance of waste presence as a contributing factor to safety perception, identifying the dominant waste types that influence perceived safety. To investigate these dominant factors, we employ multiple analytical methods in the final section, utilizing explainable machine learning techniques alongside Class Activation Mapping (CAM) visualization (Figs. 6 and 7).
Based on comprehensive experimental evaluations of four mainstream CNN architectures (Table 1), we selected ResNet-50 as our primary model architecture. ResNet-50 demonstrated superior overall performance with the highest accuracy (0.748) and consistently balanced metrics across safe and unsafe classifications (F1 scores of 0.746 and 0.750, respectively). While MobileNet-V2 achieved comparable accuracy (0.745), it showed notable disparities between safe and unsafe categories (precision: 0.719 vs 0.775; recall: 0.789 vs 0.702), indicating potential classification bias. EfficientNet-B0 and ShuffleNet-V2, despite their computational efficiency, exhibited lower overall accuracy (0.738 and 0.678) and less consistent performance across evaluation metrics. The balanced precision-recall trade-off and robust F1 scores of ResNet-50, combined with its well-established architecture in computer vision tasks, made it the optimal choice for our safety perception model.
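As a consistency check on the class-wise metrics quoted above, recall that the F1 score is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the MobileNet-V2 precision and recall values given in the text. The function name `f1_score` is ours. Because the inputs are rounded to three decimals, the results are illustrative and may differ slightly from the entries in Table 1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MobileNet-V2 precision and recall as quoted in the text (safe vs. unsafe).
mobilenet_safe = f1_score(0.719, 0.789)
mobilenet_unsafe = f1_score(0.775, 0.702)

# Gap between class-wise F1 scores; a larger gap means less balanced behavior.
mobilenet_gap = abs(mobilenet_safe - mobilenet_unsafe)

# ResNet-50 F1 scores are quoted directly in the text (0.746 and 0.750).
resnet_gap = abs(0.746 - 0.750)

print(f"MobileNet-V2 F1 gap: {mobilenet_gap:.3f}, ResNet-50 F1 gap: {resnet_gap:.3f}")
```

Under these rounded inputs, the class-wise F1 gap for MobileNet-V2 is several times larger than that of ResNet-50, which is consistent with the balance argument made above.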
The trained model was applied to infer safety perceptions from street-view images across the study area. We adopted a confidence-based scoring methodology, following established approaches in urban perception studies, to transform binary classifications into continuous safety scores. Specifically, locations classified as safe were assigned positive values while those classified as unsafe received negative values, with the magnitude proportional to the model's prediction confidence. This quantitative transformation enables more nuanced differentiation of safety perceptions, capturing subtle variations between locations that share the same binary classification but exhibit different degrees of perceived safety.
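The transformation described above can be sketched as follows. The text specifies only that the sign follows the predicted class and that the magnitude scales with prediction confidence, so the exact formula here is an assumption: we take the score to be the softmax probability of the safe class minus that of the unsafe class, which lies in [-1, 1]. The function names and the two-logit interface are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def safety_score(logit_unsafe: float, logit_safe: float) -> float:
    """Map binary-classifier logits to a continuous score in [-1, 1].

    Assumed form: p_safe - p_unsafe. The score is positive when the image is
    classified as safe, negative when unsafe, and its magnitude grows with
    the model's confidence in the predicted class.
    """
    p_unsafe, p_safe = softmax([logit_unsafe, logit_safe])
    return p_safe - p_unsafe


# Two locations that share the "safe" label but differ in confidence.
weakly_safe = safety_score(0.0, 0.5)
strongly_safe = safety_score(0.0, 3.0)
print(weakly_safe, strongly_safe)
```

The two example locations receive the same binary label but different scores, which is the kind of within-class differentiation the continuous score is meant to capture.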
The spatial analysis of perceived safety across New York City reveals distinct geographical patterns. Our model-inferred safety perception scores, as visualized in Fig. 1, show a core-periphery pattern in each borough, particularly evident in Manhattan and Brooklyn, where central areas consistently exhibit higher safety perception compared to their peripheral counterparts. Notable concentrations of high safety perception are observed in Midtown Manhattan, central Brooklyn, and eastern Queens, as indicated by the pronounced blue regions in Fig. 1.
The relationship between safety perception and socioeconomic indicators (population density, education level, and income level, shown in Supplementary Figs. 2-4) exhibits complex spatial variations across boroughs. In Manhattan and Brooklyn, areas of high population density strongly correlate with elevated safety perception scores. However, this relationship does not persist uniformly across all boroughs. Eastern Queens presents a notable exception, displaying high safety perception scores despite relatively lower population density, suggesting that population density alone cannot fully explain the variations in perceived safety. Income level distributions show strong spatial correspondence with safety perception patterns, most notably in Queens and the Bronx, where communities with higher income levels consistently report higher levels of perceived safety. Additionally, areas with higher educational attainment exhibit elevated safety perception scores across all boroughs, indicating a robust relationship between education level and perceived environmental safety.
Statistical analysis of the safety perception scores yields a mean of μ = -0.047, indicating a near-neutral perception baseline, and a standard deviation of σ = 0.668. Using thresholds at one standard deviation from the mean, we established four safety perception categories: low safety perception (x < μ - σ, or x < -0.715), moderately low safety perception (μ - σ ≤ x < μ, or -0.715 ≤ x < -0.047), moderately high safety perception (μ ≤ x < μ + σ, or -0.047 ≤ x < 0.621), and high safety perception (x ≥ μ + σ, or x ≥ 0.621), where x represents the safety perception score. Representative street view images (SVIs) from each category are presented in Fig. 2.
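The threshold rule above translates directly into a small classification function. The sketch below uses the reported mean and standard deviation. The function name and the category labels as strings are our own choices for illustration.

```python
MU = -0.047    # reported mean of the safety perception scores
SIGMA = 0.668  # reported standard deviation


def safety_category(score: float, mu: float = MU, sigma: float = SIGMA) -> str:
    """Assign one of four categories using thresholds at mu - sigma, mu, mu + sigma."""
    if score < mu - sigma:
        return "low"
    if score < mu:
        return "moderately low"
    if score < mu + sigma:
        return "moderately high"
    return "high"


# The thresholds reproduce the values quoted in the text.
lower, upper = round(MU - SIGMA, 3), round(MU + SIGMA, 3)
print(lower, upper)  # -0.715 0.621

for x in (-0.9, -0.5, 0.0, 0.7):
    print(x, safety_category(x))
```

Each interval is closed on the left and open on the right, so a score exactly equal to the mean falls in the moderately high category, as in the definition above.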
Further analysis of SVI across these safety categories reveals distinct environmental characteristics. Areas classified as safe consistently exhibit well-maintained streetscapes featuring abundant greenery, organized parking infrastructure, and well-preserved building facades, as evidenced in Fig. 2. Conversely, areas categorized as unsafe frequently display vacant lots, active construction sites, and deteriorating infrastructure. The moderately safe category comprises areas with mixed urban features, characterized by intermediate levels of maintenance and organization that contribute to moderate safety perception scores.
In this study, we focus on street waste, defined as waste accumulation occurring on urban streets outside of designated waste containers. To minimize the impact of random noise on our research findings, we specifically concentrate on identifiable waste clusters while excluding randomly scattered single pieces of litter. Based on extensive observations of urban street conditions, we identified distinct patterns in waste manifestation across the urban landscape. These patterns vary significantly in their spatial distribution and formation mechanisms. Consequently, we categorized street waste into two primary classifications: controlled and uncontrolled waste.
Controlled waste refers to temporarily placed, properly contained waste (such as securely bagged garbage or systematically stacked recyclables) positioned at designated collection points along streets in accordance with municipal collection schedules and regulations (Fig. 3A). Uncontrolled waste refers to improperly disposed materials that deviate from municipal waste management guidelines, encompassing three distinct subtypes: widespread litter, uncontrolled litter dumpsites, and construction waste.
Based on our waste categorization, we developed specialized deep learning models for each waste type. The waste identification models, implemented using the Swin Transformer architecture, demonstrated robust performance across all waste categories, validating our approach's effectiveness in real-world urban waste detection scenarios. The detailed performance metrics are presented in Table 2.
The models achieved notable accuracy across the different waste categories, with particularly strong performance in controlled waste detection (92.01% accuracy for bagged waste) and widespread litter identification (93.17% accuracy). The latter demonstrated high precision in distinguishing between areas with and without widespread litter, effectively capturing varying degrees of litter presence. Overall performance remained consistently strong across all waste categories, with accuracies ranging from 90.43% to 96.14%.
We observed relatively lower performance in detecting uncontrolled dumpsites and construction waste. This largely reflects how rarely these categories occur in urban environments: they constitute relatively small proportions of our dataset (3.8% and 7.7%, respectively). To address this class imbalance, we implemented targeted data augmentation techniques, which helped maintain model performance while preserving the authentic representation of waste distribution patterns in urban settings. In addition, during the inference phase we manually verified all positive detections to ensure classification accuracy. Although this approach may yield conservative waste counts given the inherent sparsity of these categories, it minimizes potential classification bias among the detected cases and strengthens the robustness of our subsequent analysis of the relationship between waste presence and perceived safety.
Utilizing our developed waste classification models, we systematically identified and mapped the spatial distribution of distinct waste categories across the study area, as visualized in Fig. 4. Our detection system revealed 2351 instances of bagged waste (controlled waste), 1771 cases of widespread litter, 614 uncontrolled litter dumpsites, and 358 locations with construction waste.
The spatial analysis revealed distinct distribution patterns between controlled and uncontrolled waste categories. Controlled waste, primarily represented by bagged waste, showed the highest concentration in Manhattan compared to other boroughs. In contrast, uncontrolled waste categories exhibited markedly different spatial patterns, with a notably lower presence in Manhattan and a significant concentration in the Rockaway Peninsula area of southern Queens. Among the uncontrolled waste categories, widespread litter emerged as the most prevalent issue, while construction waste showed a relatively modest presence throughout the city.
We analyzed the cumulative distribution of safety perception scores across different waste categories, as illustrated in Fig. 5. Safety perception scores range from -1 to 1, with higher scores indicating greater perceived safety. The median score (at cumulative proportion = 0.5) is the value at or below which 50% of locations within a category fall, providing a representative measure of the overall safety perception level for that waste category. The analysis reveals distinct patterns in how safety perception varies between areas with and without waste presence, and among different waste types.
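To make the reading of the cumulative curves concrete, the sketch below builds an empirical cumulative distribution for a handful of scores and locates the median. The scores in `toy_scores` are made up for illustration and are not data from this study. The helper name `ecdf` is ours.

```python
from statistics import median


def ecdf(scores):
    """Return the sorted scores and the cumulative proportion at each one."""
    xs = sorted(scores)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


# Illustrative scores only (hypothetical locations within one waste category).
toy_scores = [0.4, -0.9, 0.1, -0.6, -0.2]

xs, proportions = ecdf(toy_scores)
for x, p in zip(xs, proportions):
    print(f"score <= {x:+.1f}: proportion {p:.1f}")

# The median is the score at which the curve crosses a proportion of 0.5.
toy_median = median(toy_scores)
```

A curve that rises steeply over a narrow score range indicates that most locations in that category share similar scores, whereas a gradual rise indicates wide variation.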
Out of the total 295,189 sampling points in NYC, only 4697 points (approximately 1.6%) contained waste in any form. The baseline distribution across all points in NYC (yellow line) shows that safety perception scores are approximately normally distributed, with a median score of -0.04, indicating a relatively neutral overall safety perception in the city. However, when examining areas where any type of waste is present (purple line), the distribution shifts notably toward lower safety scores, with the median dropping to -0.53, suggesting a substantial negative association between waste presence and perceived safety.
Further analysis of specific waste categories reveals a marked distinction between controlled and uncontrolled waste types. Controlled waste, represented by bagged waste (green line), shows a relatively modest negative association with safety perception, with a median score of -0.128. The gradual slope of its cumulative distribution curve indicates considerable variation in safety perceptions in areas with bagged waste, suggesting that its presence does not consistently correspond to negative safety perceptions.
In contrast, uncontrolled waste categories demonstrate a markedly stronger negative relationship with perceived safety. Areas with construction waste, widespread litter, and uncontrolled litter dumpsites exhibit substantially lower median safety scores of -0.923, -0.921, and -0.896, respectively. The cumulative distribution curves for these uncontrolled waste categories display steeper slopes and closely aligned patterns, indicating a more consistent and pronounced negative relationship with safety perception. The similarity in both the median values and distribution patterns among uncontrolled waste types suggests that the presence of any form of uncontrolled waste corresponds strongly with reduced safety perception, regardless of the specific type.
To systematically investigate the underlying mechanisms driving these correlations, we employed two analytical methods to examine the relationships. For statistical analysis, we utilized explainable machine learning techniques to assess the relative importance and directional effects (positive or negative) of various environmental factors on safety perception. These factors encompass both static environmental characteristics, such as road and wall surface areas, and dynamic management indicators, such as waste presence, extracted from SVI. Additionally, we implemented CAM as a visual interpretation technique to identify and highlight specific regions within SVI that significantly influence the model's safety judgments. This visualization approach provides insights into the spatial attention patterns of the model, helping us understand how environmental features are weighted in the algorithmic assessment of safety perception.
To analyze the determinants of visual safety perception, we employed four regression models: Ordinary Least Squares (OLS), Random Forest, XGBoost, and Gradient Boosting Decision Tree (GBDT). These models incorporated sociodemographic factors, visual environmental characteristics, and waste-related variables. To assess the specific impact of waste-related variables on model performance, we conducted parallel analyses with and without these variables for each model architecture. The comparative results are presented in Table 3.
The GBDT regression model demonstrated superior predictive performance, achieving the highest R² (69.47%) and the lowest Mean Squared Error (0.101) among all tested models. Notably, the inclusion of waste-related variables consistently enhanced model performance across all architectures. Specifically, the GBDT model with waste-related variables showed an improvement of 4.0 percentage points in R² compared to its counterpart without these variables (65.48%). This pattern was consistent across all models, with gains in R² ranging from 3.5 to 6.0 percentage points, underscoring the significant contribution of waste-related factors to safety perception prediction.
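For reference, the two metrics used in this comparison can be computed as in the sketch below, which also reproduces the percentage-point gain from the two R² values quoted above. The arrays `y_true` and `y_pred` are made-up numbers for illustration, not outputs of the models in Table 3. The function name is ours.

```python
def r2_and_mse(y_true, y_pred):
    """Coefficient of determination (R^2) and mean squared error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot, ss_res / n


# Hypothetical safety scores and predictions, for illustration only.
y_true = [-0.8, -0.2, 0.1, 0.6]
y_pred = [-0.6, -0.3, 0.2, 0.5]
r2, mse = r2_and_mse(y_true, y_pred)
print(f"R^2 = {r2:.3f}, MSE = {mse:.4f}")

# Gain from adding waste-related variables to GBDT, using the reported R^2 values.
gain_pp = 69.47 - 65.48
print(f"gain = {gain_pp:.2f} percentage points")
```

The gain is expressed in percentage points because it is a difference between two percentages, not a relative change.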
Based on these results, we selected the GBDT model for subsequent analysis. To further elucidate the complex relationships between environmental factors and safety perception, we employed SHapley Additive exPlanations (SHAP) value analysis. This approach enabled us to quantify both the relative importance and directional effects of individual environmental factors on safety perception, providing interpretable insights into the model's decision-making process.
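To illustrate what a SHAP value represents, the sketch below computes exact Shapley values for a toy three-feature model by averaging each feature's marginal contribution over all feature orderings. This brute-force approach is not the algorithm used for tree ensembles in practice, where efficient approximations are standard. The toy linear model, its coefficients, the feature names, and the baseline are all hypothetical and chosen only to make the signs easy to read.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    For each ordering, features are switched from their baseline value to
    their actual value one at a time, and the resulting change in the model
    output is credited to the feature that was switched.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]
            updated = model(current)
            phi[i] += updated - previous
            previous = updated
    return [p / len(orderings) for p in phi]


def toy_model(features):
    """Hypothetical score model: [tree coverage, sky visibility, litter present]."""
    tree, sky, litter = features
    return 0.5 * tree - 0.3 * sky - 0.8 * litter


x = [0.4, 0.2, 1.0]         # one hypothetical location
baseline = [0.1, 0.3, 0.0]  # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the model output at `x` and at the baseline, which is the additivity property that makes SHAP values interpretable as a per-feature decomposition of a prediction. In this toy case the binary litter indicator receives the largest negative contribution by construction.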
The SHAP value analysis (Fig. 6) reveals the complex interplay between visual environmental features and sociodemographic characteristics in shaping visual safety perception. The results demonstrate diverse patterns of influence, varying in both magnitude and direction. Notably, while both physical environmental characteristics and socioeconomic factors significantly influence safety perception, their relative importance differs substantially.
Among the analyzed features, environmental elements demonstrate the strongest influence on safety perception. Sky visibility and tree coverage emerge as the most influential factors, exhibiting notable non-linear relationships with safety perception. Higher tree coverage is consistently associated with enhanced safety perception, suggesting that abundant urban vegetation contributes positively to visual safety perception. Conversely, increased sky visibility correlates with decreased perceived safety, potentially indicating that more enclosed urban spaces, with limited sky exposure, are perceived as safer environments. This finding aligns with urban design theories about human-scale spaces and the role of natural surveillance in safety perception.
Notably, waste-related factors, encoded as binary variables (0 or 1), demonstrate substantial negative impacts on safety perception. Both widespread litter and uncontrolled litter dumpsites show distinct binary distributions in their effects, consistent with their categorical nature, and both demonstrate strong negative relationships with safety perception. The magnitude of these effects positions waste-related factors among the most influential features, suggesting that waste management issues serve as powerful environmental cues for visual safety assessment. This finding emphasizes the critical role of municipal waste management in urban safety perception.
Built environment features and human activity indicators display moderate but consistent effects. Houses demonstrate positive associations with safety perception, while walls show a negative correlation. Notably, higher volumes of vehicles and pedestrians are associated with increased safety perception, suggesting that more developed urban environments with greater human activity generally foster stronger safety perception.
Regarding demographic and socioeconomic indicators, population density shows a positive correlation with safety perception, supporting the notion that more densely developed urban areas tend to evoke stronger perceptions of safety. Other socioeconomic indicators, including household income and educational attainment, show relatively modest influences on visual safety perception, suggesting their impact is less pronounced compared to physical environmental features.
These findings highlight the multifaceted nature of environmental safety perception, with particular emphasis on the significant impact of waste-related issues and urban vitality. The results suggest that urban visual safety enhancement strategies should prioritize effective waste management while maintaining active, well-populated spaces, alongside traditional urban design elements such as green space and built environment features.
To provide an interpretable visualization of our findings, we employed CAM to reveal the model's attention patterns in safety assessment. As illustrated in Fig. 7, the heat maps reveal regions of model attention through color intensity, with red areas indicating zones of highest attention. In images classified as unsafe, the model's attention predominantly concentrated on areas containing scattered litter, construction debris, and illegal dumping sites. This focused attention pattern suggests that the model identified these uncontrolled waste elements as key visual indicators of unsafe environments, aligning with human perceptual patterns.
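The idea behind such heat maps can be sketched in a few lines. In the basic CAM formulation, the map for a class is a weighted sum of the final convolutional feature maps, using that class's classifier weights, rescaled for display. The text does not state which CAM variant was used, so this is a generic illustration. The tiny 2×2 feature maps and the weights are made up.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps, min-max normalized to [0, 1].

    feature_maps: list of K maps, each an H x W nested list.
    class_weights: list of K weights for the class being visualized.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    raw = [
        [
            sum(w * fmap[i][j] for w, fmap in zip(class_weights, feature_maps))
            for j in range(width)
        ]
        for i in range(height)
    ]
    lo = min(min(row) for row in raw)
    hi = max(max(row) for row in raw)
    if hi == lo:  # flat map: nothing to highlight
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / (hi - lo) for v in row] for row in raw]


# Two hypothetical 2x2 feature maps and their weights for the "unsafe" class.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
unsafe_weights = [0.5, 1.0]

heat = class_activation_map(feature_maps, unsafe_weights)
print(heat)  # [[0.25, 0.0], [0.0, 1.0]]
```

In practice the normalized map is upsampled to the input image size and overlaid as a color gradient, so the cell with value 1.0 here would correspond to the reddest region.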
In the analysis of scenes classified as safe, an intriguing pattern emerged: despite the presence of controlled waste (i.e., bagged waste) in some scenes, the model's attention primarily focused on surrounding architectural features rather than the waste itself. This selective attention suggests that controlled waste plays a less prominent role in safety perception, particularly when situated within well-maintained urban environments with established waste management systems.
These CAM-based visual interpretations provide strong supporting evidence for our findings: while uncontrolled waste serves as a primary visual indicator for unsafe environment perception, controlled waste demonstrates substantially less influence in shaping safety perceptions. This complementary relationship between the two analyses offers a more comprehensive understanding of how different waste management practices influence perceived safety in urban landscapes.