In a paper published in the journal Applied Geography, researchers integrated transaction records and explainable artificial intelligence (XAI) methods to investigate the nonlinear effects of private service amenities, public service amenities, and street view on housing prices in Shanghai. The research uncovered the relationships between neighborhoods and housing prices at an individual level using XAI models. The current study contributes to real estate market research by combining multi-source data and XAI methods.
Background
The housing market poses challenges for many cities, and understanding the factors that influence housing prices is crucial for effective real estate policy and city planning. These factors can be categorized into structural, neighborhood, and locational characteristics. Extensive research has used the hedonic pricing model (HPM) to understand these relationships. However, fully comprehending the mechanisms behind housing prices remains a challenge. The current study aims to explore the nonlinear relationships between neighborhoods and housing prices at an individual level using XAI models.
Evolution from HPM to the XAI model
The HPM has been widely used to study housing prices, considering location, neighborhood, and structural attributes as significant determinants. However, the HPM lacks sensitivity for detecting spatial dependencies and heterogeneity. Geographically Weighted Regression (GWR) has emerged as a favored localized multiple-regression model, addressing spatial variations. Nevertheless, non-linearity and multicollinearity can affect GWR performance. In recent years, advanced machine learning (ML) algorithms, like XGBoost, have been applied to housing price studies, offering greater flexibility and predictive power. However, interpreting ML results can be challenging. XAI methods, such as SHAP, have addressed interpretability and revealed the spatial variations of housing determinant contributions.
Methodology
The researchers obtained a dataset comprising transaction records, street view imagery, and points of interest (POIs) for the study area. Factors influencing housing unit values, including locational, neighborhood, and structural attributes, were determined. They implemented local regression models using XGBoost on the housing prices dataset and introduced the SHAP method to explain the nonlinear effects on housing prices. The contributions of neighborhood attributes derived from street view data and POI were computed and mapped.
The study was conducted in Shanghai, known for its economic advancement. Various datasets were used, including transaction data, POI data, population density data, and street view images. Transaction data contained information on housing unit attributes, while POI data provided details on public and private service amenities. Street view images were processed using semantic image segmentation.
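As an illustration of how a segmented street view image can be turned into view variables, the sketch below computes the share of pixels assigned to each class. It assumes that a view index is a simple pixel proportion, which the summary above does not state; the class ids and the toy mask are hypothetical.

```python
from collections import Counter

# Hypothetical class ids; a real segmentation model defines its own label map.
VEGETATION, SKY, BUILDING, ROAD = 0, 1, 2, 3


def view_indices(mask):
    """Return the share of pixels per class in a 2D mask of class ids."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask is empty")
    return {label: n / total for label, n in counts.items()}


# Toy 2 x 4 mask standing in for one segmented street view image.
toy_mask = [
    [SKY, SKY, BUILDING, VEGETATION],
    [ROAD, ROAD, BUILDING, VEGETATION],
]

shares = view_indices(toy_mask)
green_view = shares.get(VEGETATION, 0.0)  # vegetation pixels / all pixels
sky_view = shares.get(SKY, 0.0)
building_view = shares.get(BUILDING, 0.0)
```

In practice, such shares would typically be averaged over several images taken around each housing unit, though that step is also an assumption here.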
Explanatory variables were categorized into structural attributes, location, and neighborhoods, with a focus on a 1 km distance threshold. The XGBoost model, a powerful ML technique, was employed to explore the nonlinear association between neighborhoods and housing prices. Hyperparameters were fine-tuned to optimize model performance. Additionally, evaluation metrics and cross-validation were used to assess the model.
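One plausible way to apply a 1 km threshold is to count the POIs of each category that fall within 1 km of a housing unit. The sketch below does this with great-circle (haversine) distances. Whether the study counted POIs or used another measure is not stated above, so the counting rule, the coordinates, and the category names are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_pois_within(home, pois, radius_km=1.0):
    """Count POIs per category within radius_km of a (lat, lon) home location."""
    counts = {}
    for category, lat, lon in pois:
        if haversine_km(home[0], home[1], lat, lon) <= radius_km:
            counts[category] = counts.get(category, 0) + 1
    return counts


# Hypothetical housing unit and POIs (category, lat, lon).
home = (31.2304, 121.4737)
pois = [
    ("school", 31.2350, 121.4737),  # roughly 0.5 km north
    ("bank", 31.2304, 121.4800),    # roughly 0.6 km east
    ("school", 31.2500, 121.4737),  # roughly 2.2 km north, outside the buffer
]

poi_counts = count_pois_within(home, pois)
```

For a full city dataset, a spatial index would be preferable to this pairwise loop, which scales with the number of units times the number of POIs.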
The SHAP method, which is based on game theory, was used to interpret the XGBoost model by explaining feature contributions at both the global and local levels. SHAP values estimated the importance of each feature, while SHAP main effects revealed the nonlinear effects of individual features on housing prices.
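The game-theoretic idea behind SHAP can be illustrated with exact Shapley values on a toy example. The sketch below enumerates every coalition of features and averages each feature's marginal contribution. It is not the SHAP library and not the study's model: the `toy_price` function and its feature names are hypothetical, and brute-force enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_price(present):
    """Hypothetical price premium given the set of features that are present."""
    price = 0.0
    if "metro" in present:
        price += 10.0
    if "green" in present:
        price += 4.0
    if "metro" in present and "green" in present:
        price += 2.0  # interaction term, split evenly between the two features
    return price


features = ["metro", "green", "school"]
phi = shapley_values(toy_price, features)
```

The values sum to the difference between the premium with all features and with none, which is the property that lets SHAP split a single prediction into per-feature contributions. This sketch folds the interaction into each feature's value rather than separating a main effect from it.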
Results
The XGBoost model was used to estimate the relative importance of the variables. Neighborhood characteristics had the highest relative importance (45.48%), followed by location (34.39%) and structural attributes (20.13%). Within the neighborhood category, public services (23.18%) contributed more than private services (7.06%), and population density accounted for 11.24%. Among the public service variables, financial services (11.73%), scenic spots (3.75%), and public educational services (1.94%) were significant factors. Street view variables made a small contribution, most notably the green view.
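The reported shares can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the percentages quoted above. The residual it computes is not a figure from the paper, and reading it as the combined share of the remaining neighborhood variables, such as street view, is an assumption.

```python
from decimal import Decimal

# Category-level relative importance, in percent, as reported above.
categories = {
    "neighborhood": Decimal("45.48"),
    "location": Decimal("34.39"),
    "structural": Decimal("20.13"),
}

# Itemized components of the neighborhood category, as reported above.
neighborhood_parts = {
    "public_services": Decimal("23.18"),
    "private_services": Decimal("7.06"),
    "population_density": Decimal("11.24"),
}

total = sum(categories.values())
itemized = sum(neighborhood_parts.values())

# Share of the neighborhood category not covered by the itemized parts.
residual = categories["neighborhood"] - itemized
```

Decimal arithmetic is used here so that the two-decimal percentages add up exactly instead of picking up floating-point rounding.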
The XGBoost model outperformed other regression models, with an R² of 0.912 and the lowest mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). In other words, the model explained 91.2% of the variance in housing prices. The SHAP model provided insights into the associations between variables and housing prices. Population density showed a nonlinear relationship, with a rapid increase in housing prices at lower densities. Public service amenities exhibited threshold associations. Public educational services and financial services had positive effects up to a certain threshold, while medical services showed two categories of effects. Shopping and catering services adversely affected housing prices. Street view analysis revealed positive associations with the green view and negative associations with the sky view and building view.
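For reference, the four evaluation metrics named above can be computed as follows. This is a minimal sketch using the standard definitions; the observed and predicted prices are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, RMSE, and MAPE (in percent) for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # MAPE is undefined when any observed value is zero.
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical observed and predicted prices.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

metrics = regression_metrics(observed, predicted)
```

R² is one minus the ratio of residual to total variation, which is why an R² of 0.912 is read as the model explaining 91.2% of the variance in the target.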
The spatial dispersion of SHAP main effect values revealed considerable variability in the impacts of variables across the study area. Locational characteristics, including proximity to the station and distance to the CBD, exhibited a decay effect with increasing distance. The influences of financial services, population density, and commercial amenities demonstrated substantial variations between the central city and its suburban regions.
Conclusion
In summary, the current study applied XGBoost and SHAP to housing transaction data to analyze the nonlinear effects of structural, locational, and neighborhood factors on Shanghai housing prices. XAI helps to explain the importance of features both globally and locally, providing insights into housing characteristics and spatial patterns.
Threshold effects were observed for various amenities. Positive associations were found for most public services, while negative associations were found for shopping and catering services. Street view features, particularly the green view, were also associated with house values. However, further research is needed to explore the interactive effects of neighborhood characteristics and other locational determinants.