E-commerce Data and ML for Poverty Estimation in Indonesia

In an article published in the journal Nature, researchers explored a novel approach to speeding up poverty assessment in Indonesia using e-commerce data and machine learning (ML). The authors applied statistics-based feature selection and compared three ML algorithms for predicting city-level poverty rates.

Study: E-commerce Data and ML for Poverty Estimation in Indonesia. Image credit: VRVIRUS/Shutterstock

Background

Poverty has remained a major challenge in developing countries in recent decades. In Indonesia, 9.82 percent of the population, or 25.95 million people, were classified as poor in March 2018. Traditional assessment methods such as the National Socio-economic Survey (SUSENAS) are time-consuming, costly, and conducted infrequently, which hinders timely and cost-effective policymaking. Recognizing the opportunities created by the digital revolution, this study examined e-commerce data as a timely, fine-grained gauge of socio-economic conditions. This focus is particularly relevant for Indonesia, which has one of the largest e-commerce markets in Southeast Asia.

Previous studies have estimated poverty from other data sources, such as satellite imagery and call detail records, but these approaches rest on assumptions and carry limitations of their own. E-commerce data present a promising alternative because they offer direct insight into household expenditure without such assumptions. This paper addressed the scarcity of research on using e-commerce data for poverty prediction, which underlines its novelty and potential significance.

Previous efforts mostly relied on a narrow set of feature selection algorithms, whereas this research combined three statistics-based feature selection methods with three ML algorithms to improve the accuracy of poverty estimation models. In doing so, the paper sought to fill gaps in existing research with a comprehensive, original approach to poverty prediction from e-commerce data, one that could have implications beyond Indonesia.

Method

To address the need for timely and cost-effective poverty assessment, the researchers used a sample of advertising data from a prominent Indonesian e-commerce company. The dataset focused on Java Island and covered eight item categories from 2016, such as motorbikes, cars, apartments, houses, and land for sale or rent. Poverty was measured against a predefined poverty line, the minimum expenditure required to meet basic needs. The dataset initially contained 96 features, such as the number of items sold and their prices, for 118 cities.
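The poverty rate being predicted is presumably the standard headcount ratio: the percentage of people whose per-capita expenditure falls below the poverty line. The sketch below illustrates that definition only. It assumes unweighted, individual-level expenditure figures (official SUSENAS-based estimates apply survey weights), and the function name and the numbers in the example are hypothetical.

```python
def headcount_poverty_rate(per_capita_expenditure, poverty_line):
    """Percentage of people whose per-capita expenditure is below the poverty line.

    Hypothetical helper for illustration: each list entry is one person's
    monthly per-capita expenditure; survey weights are ignored.
    """
    if not per_capita_expenditure:
        raise ValueError("need at least one observation")
    poor = sum(1 for x in per_capita_expenditure if x < poverty_line)
    return 100.0 * poor / len(per_capita_expenditure)
```

For example, with hypothetical monthly expenditures of 300,000, 450,000, 520,000, and 610,000 rupiah and a poverty line of 400,000 rupiah, one person in four is below the line, so the function returns 25.0.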

To identify the most relevant features and improve computational efficiency, the study applied three statistics-based feature selection algorithms: f-score, chi-square, and correlation-based feature selection. The overall workflow had several stages: data preprocessing, normalization, feature selection, model training, and evaluation.
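One common way to compute an f-score for a continuous target is a univariate F-test: each feature's Pearson correlation r with the target is converted to F = r² / (1 − r²) · (n − 2), and features are ranked by F. This is the statistic behind scikit-learn's `f_regression`. The NumPy sketch below assumes the study's f-score works this way, which the article does not confirm; `f_scores` and `select_top_k` are hypothetical names.

```python
import numpy as np

def f_scores(X, y):
    """Univariate F-statistic of each column of X against a continuous target y.

    F = r^2 / (1 - r^2) * (n - 2), where r is the Pearson correlation between
    the feature and the target. Constant columns receive a score of 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # center each feature
    yc = y - y.mean()                # center the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Avoid dividing by zero for constant columns: leave r = 0 there.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    r2 = np.clip(r ** 2, 0.0, 1.0 - 1e-12)  # keep 1 - r^2 strictly positive
    return r2 / (1.0 - r2) * (n - 2)

def select_top_k(X, y, k):
    """Indices of the k features with the highest F-scores, best first."""
    return np.argsort(f_scores(X, y))[::-1][:k]
```

Because the F-statistic depends only on correlations, it is unaffected by min-max normalization, so in this sketch features can be ranked before or after scaling. With the study's data, keeping the 90 best of 96 features would correspond to `select_top_k(X, y, 90)`.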

The authors compared three ML algorithms: support vector regression (SVR), k-nearest neighbor regression (k-NN), and linear regression (LR). SVR, chosen for its successful application in various domains, was tuned with a grid search over its hyperparameters. The models predicted poverty rates and were evaluated with leave-one-out cross-validation.
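Leave-one-out cross-validation holds out each city in turn, trains on all the others, and predicts the held-out city, so every city receives exactly one out-of-sample prediction. The sketch below shows that loop combined with a grid search over one hyperparameter. To stay self-contained it uses k-NN regression written in NumPy rather than SVR, so it is not the authors' code and does not reproduce their SVR parameter grid. Fitting the min-max scaler inside each fold is an assumption about good practice, not a documented detail of the study, and all function names are hypothetical.

```python
import numpy as np

def minmax_params(X):
    """Column minima and ranges for min-max scaling; constant columns get range 1."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)

def knn_predict(X_train, y_train, x, k):
    """Mean target of the k training rows closest to x (Euclidean distance)."""
    dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    return float(y_train[nearest].mean())

def loocv_predictions(X, y, k):
    """One held-out prediction per row: train on all other rows, predict the left-out one."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        # Fit the scaler on the training rows only, so the held-out row stays unseen.
        lo, span = minmax_params(X[train])
        preds[i] = knn_predict((X[train] - lo) / span, y[train], (X[i] - lo) / span, k)
    return preds

def grid_search_k(X, y, ks):
    """Pick the k with the lowest leave-one-out RMSE; returns (best_k, best_rmse)."""
    y = np.asarray(y, dtype=float)
    best_k, best_rmse = None, float("inf")
    for k in ks:
        rmse = float(np.sqrt(np.mean((loocv_predictions(X, y, k) - y) ** 2)))
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return best_k, best_rmse
```

Scaling matters here because k-NN (like an RBF-kernel SVR) is distance-based, so unscaled price features would dominate count features. Selecting k by leave-one-out error mirrors, in simplified form, how a grid search would choose SVR hyperparameters.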

Performance was measured with the root mean squared error (RMSE), which quantifies the typical difference between actual and predicted values, and the coefficient of determination (R2), which indicates how much of the variation in actual poverty rates the predictions explain. With 96 features for only 118 cities, the e-commerce data are high-dimensional relative to the sample size, which is the challenge the feature selection step was designed to address. Together, these steps constituted a novel application of e-commerce data and ML to poverty estimation in Indonesia.
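Under their standard definitions, RMSE = sqrt((1/n) Σ (yᵢ − ŷᵢ)²) and R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)², where yᵢ is a city's actual poverty rate, ŷᵢ its prediction, and ȳ the mean actual rate. Under leave-one-out validation each fold holds a single city, so R² is presumably computed over the pooled held-out predictions; that is an assumption, as the article does not say. A minimal sketch of both metrics:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted poverty rates."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot
```

With hypothetical actual rates of 10, 12, 14, and 16 percent and predictions of 11, 12, 13, and 16 percent, the squared errors sum to 2 and the total sum of squares is 20, giving an R² of 0.9 and an RMSE of about 0.71 percentage points. On this scale, an R² of about 0.43 means the predictions account for roughly 43 percent of the variance in actual rates, and R² can become negative when a model does worse than always predicting the mean.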

Results and Discussion

The study used the f-score and chi-square feature selection algorithms to identify relevant features in the e-commerce data and thereby improve the ML models' poverty predictions. Correlation-based feature selection produced inconsistent results and was excluded from further analysis. Prediction experiments with SVR, k-NN, and LR compared results with and without feature selection. The best-performing model was SVR with f-score feature selection and 90 features, which achieved an R2 of 0.42765.

Visualizations of the SVR, k-NN, and LR models showed that SVR performed best, particularly in minimizing prediction errors. Choropleth maps of actual and predicted poverty rates across Java Island revealed that the predicted rates were generally lower than the actual ones. City-level comparisons of actual and predicted poverty percentages provided further detail. The findings highlighted the effectiveness of feature selection for high-dimensional e-commerce data, with SVR emerging as the most reliable model for poverty prediction. However, the systematic underestimation suggested room for further model refinement.
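A systematic underestimation like the one visible in the choropleth maps can be quantified with the mean signed error (predicted minus actual), the share of cities that are underpredicted, and the mean signed error restricted to high-poverty cities. The sketch below is an illustrative diagnostic, not an analysis reported in the study; the function name, the threshold, and the example numbers are hypothetical.

```python
def underestimation_summary(actual, predicted, high_threshold):
    """Summarize signed errors (predicted - actual) across cities.

    Returns a tuple of:
      - mean signed error (negative means predictions are too low on average),
      - share of cities whose prediction is below the actual rate,
      - mean signed error among cities whose actual rate exceeds high_threshold
        (None if no city exceeds it).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_error = sum(errors) / len(errors)
    share_under = sum(1 for e in errors if e < 0) / len(errors)
    high = [e for a, e in zip(actual, errors) if a > high_threshold]
    mean_error_high = sum(high) / len(high) if high else None
    return mean_error, share_under, mean_error_high
```

For hypothetical actual rates of 5, 8, 12, and 15 percent predicted as 5.5, 7.5, 10, and 12 percent, the function returns a mean signed error of -1.25 points, an underprediction share of 0.75, and a mean error of -2.5 points for cities above 10 percent, a pattern in which the bias grows with the poverty level.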

Conclusion

In conclusion, the researchers demonstrated the potential of using e-commerce data for poverty prediction with ML. The f-score feature selection algorithm outperformed the alternatives and improved the performance of SVR in predicting poverty rates. However, predicting poverty in regions with higher poverty rates remained challenging. Despite limitations, such as the use of only one year of data, the study suggested that e-commerce datasets are viable proxies for socio-economic conditions.

Future research could use larger datasets to improve model accuracy. The main drawback is that e-commerce data are difficult to access and subject to confidentiality constraints. Overall, the findings underscored the promise of combining e-commerce data, feature selection, and ML for effective poverty estimation.


Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine Learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Nandi, Soham. (2024, February 16). E-commerce Data and ML for Poverty Estimation in Indonesia. AZoAi. Retrieved on September 19, 2024 from https://www.azoai.com/news/20240216/E-commerce-Data-and-ML-for-Poverty-Estimation-in-Indonesia.aspx.

  • MLA

    Nandi, Soham. "E-commerce Data and ML for Poverty Estimation in Indonesia". AZoAi. 19 September 2024. <https://www.azoai.com/news/20240216/E-commerce-Data-and-ML-for-Poverty-Estimation-in-Indonesia.aspx>.

  • Chicago

    Nandi, Soham. "E-commerce Data and ML for Poverty Estimation in Indonesia". AZoAi. https://www.azoai.com/news/20240216/E-commerce-Data-and-ML-for-Poverty-Estimation-in-Indonesia.aspx. (accessed September 19, 2024).

  • Harvard

    Nandi, Soham. 2024. E-commerce Data and ML for Poverty Estimation in Indonesia. AZoAi, viewed 19 September 2024, https://www.azoai.com/news/20240216/E-commerce-Data-and-ML-for-Poverty-Estimation-in-Indonesia.aspx.
