In a paper published in the journal PLOS ONE, researchers aimed to develop a lightweight, interpretable machine-learning (ML) classifier to detect opioid overdoses in emergency medical services (EMS) records. They used annotations from harm reduction paramedics to compare three feature engineering methods, including term frequency-inverse document frequency (TF-IDF) and a custom keyword-based approach.
Models trained with custom features outperformed others, achieving an area under the receiver operating characteristic curve (AUROC) score of up to 0.93. The study suggests that applying this approach to county EMS data could significantly enhance local harm reduction initiatives.
Related Work
Past work has highlighted significant challenges in accurately tracking opioid overdoses, mainly because current surveillance methods rely on delayed and underreported data. Recent efforts have turned to ML applications to address these gaps, aiming at both individual risk assessment and population-level surveillance. While EMS encounter data offer promise for more comprehensive monitoring, challenges persist, including reliance on unreliable diagnostic proxies and difficulty capturing cases where patients refuse transport.
Data Sourcing, Ethics Approval, Modeling
Researchers sourced the data for this study from three Texas county EMS providers: Travis, Williamson, and El Paso counties. They chose these counties deliberately to represent diverse county profiles in terms of demographics and geographical characteristics. Travis County, predominantly urban and white, had 33.6% of residents identifying as Hispanic or Latino. Williamson County, a mix of affluent suburbs and rural areas, had fewer than 25% identifying as Hispanic or Latino.
El Paso County, the westernmost county, was majority-minority, with 87.2% identifying as Hispanic or Latino. Researchers obtained ethics approval and formulated data requests with subject matter experts and community advisory boards. They tailored these requests to target specific parameters indicative of opioid-related overdose events occurring between 2019 and 2021. A total of 2958 records were annotated for opioid overdose events in a two-phase process, addressing class imbalances and ensuring reliability through expert and paramedic annotations.
Feature engineering involved three approaches: TF-IDF, Cui2Vec concept embeddings, and a custom "flags" method. TF-IDF calculated salient terms for text classification, while Cui2Vec provided concept vectors for chief narratives to mitigate overfitting risks associated with large feature sets. The custom flag approach, derived from qualitative analyses and integrative reviews, identified keywords related to opioids and overdoses, offering interpretability and computational efficiency.
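To make the custom "flags" idea concrete, the following is a minimal sketch of keyword flagging, not the study's implementation. The keyword lists, the field names (`chief_narrative`, `primary_impression`), and the function `flag_features` are all hypothetical and chosen only for illustration; the actual flag vocabulary was derived from the researchers' qualitative analyses.

```python
import re

# Hypothetical keyword lists (assumption): the study's real vocabulary is not given here.
OPIOID_TERMS = ["heroin", "fentanyl", "opioid", "opiate", "narcan", "naloxone"]
OVERDOSE_TERMS = ["overdose", "od", "unresponsive", "pinpoint pupils"]


def _pattern(terms):
    # Whole-word, case-insensitive match for any term in the list.
    joined = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + joined + r")\b", re.IGNORECASE)


OPIOID_RE = _pattern(OPIOID_TERMS)
OVERDOSE_RE = _pattern(OVERDOSE_TERMS)


def flag_features(record):
    """Return binary keyword flags for each free-text field of one EMS record (a dict)."""
    features = {}
    # Field names are assumptions for this sketch, not the providers' schema.
    for field in ("chief_narrative", "primary_impression"):
        text = record.get(field) or ""
        features[field + "_opioid_flag"] = int(bool(OPIOID_RE.search(text)))
        features[field + "_overdose_flag"] = int(bool(OVERDOSE_RE.search(text)))
    return features


example = {
    "chief_narrative": "Pt found unresponsive, Narcan given with positive response.",
    "primary_impression": "Overdose - unspecified",
}
print(flag_features(example))
```

Each record becomes a handful of 0/1 columns, which is what keeps such a feature set small and easy to inspect compared with a full TF-IDF vocabulary.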
Researchers trained multiple models for each feature set, including generalized linear models (GLM), neural networks, Naive Bayes, and extreme gradient boosting (XGBoost), with ensembling using caretEnsemble. Random search hyperparameter optimization and k-fold cross-validation were employed to reduce overfitting risks. Researchers evaluated performance using AUROC, with final benchmarking conducted on an 80/20 train/test split. They also assessed variable importance to enhance model interpretability.
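For readers unfamiliar with the evaluation metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition in plain Python; the function name `auroc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 1 = opioid overdose, 0 = other encounter.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
print(round(auroc(labels, scores), 3))
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic, which is fine for illustration; rank-based implementations are preferable for large datasets.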
Model Performance Analysis
Researchers trained 12 models and assembled 3 ensembles across two phases to detect opioid overdoses in EMS encounter data. Phase one models utilizing TF-IDF features exhibited AUROC scores ranging from 0.39 to 0.74, while those trained on Cui2Vec features ranged from 0.364 to 0.81. Custom feature models generally outperformed others, with AUROC scores ranging from 0.78 to 0.85. Phase two models showed similar trends, with TF-IDF models ranging from 0.59 to 0.76, Cui2Vec models from 0.83 to 0.89, and custom feature models from 0.92 to 0.93 AUROC. Notably, the custom flags approach consistently yielded better-performing models than TF-IDF or Cui2Vec embeddings.
The phase two flags ensemble emerged as the top-performing opioid overdose classifier with an AUROC of 0.93, assessed on a reserved testing set of 592 EMS encounters. A confusion matrix revealed 486 true negatives, 45 true positives, 44 false negatives, and 8 false positives. Demographics-based error analyses uncovered disparities in false negative rates by gender but not by ethnicity. Manual analysis of prediction errors highlighted common sources of false negatives, including positive Narcan responses and misleading chief complaints. Conversely, false positives often stemmed from cases of non-opioid overdoses or patient denial of opioid use.
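The confusion matrix counts above can be turned into the usual rates with a few lines of arithmetic. This is a sketch using the four counts exactly as reported in this article (they sum to 583); the variable names are illustrative, and the derived rates are computed here rather than quoted from the paper.

```python
# Counts as reported above for the phase two flags ensemble.
tn, tp, fn, fp = 486, 45, 44, 8

total = tn + tp + fn + fp              # 583 encounters across the four cells
sensitivity = tp / (tp + fn)           # share of true overdoses that were detected
specificity = tn / (tn + fp)           # share of non-overdoses correctly ruled out
precision = tp / (tp + fp)             # share of flagged encounters that were overdoses
false_negative_rate = fn / (tp + fn)   # share of true overdoses that were missed
accuracy = (tp + tn) / total

for name, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("precision", precision),
    ("false negative rate", false_negative_rate),
    ("accuracy", accuracy),
]:
    print(f"{name}: {value:.3f}")
```

On these counts, specificity is high while roughly half of the true overdoses are missed at this decision threshold, which is consistent with the article's emphasis on analyzing false negatives.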
Variable importance analysis indicated that opioid flags in chief narratives and overdose flags in primary impressions were the most critical predictors. This finding underscores the significance of these variables in reliably predicting opioid overdoses. Despite the promising results, the study's approach has limitations, such as its reliance on keyword flagging, which may overlook cases identified solely by positive Narcan response. Future research may explore additional feature engineering techniques to address such limitations and develop classifiers capable of distinguishing between different overdose events.
In particular, researchers should endeavor to develop non-binary classifiers capable of distinguishing between mono-opioid and polysubstance overdoses, as well as accidental and intentional overdoses. These advancements could enhance the system's utility for first responders and public health agencies. Future studies should also aim to train models using larger datasets from diverse EMS contexts to ensure robustness and generalizability.
Localized approaches may also be beneficial, allowing individual services to tailor models to their specific data management and demographic contexts. Overall, while the proposed ML framework shows promise for opioid overdose surveillance, ongoing research and refinement are necessary to address its limitations and maximize its effectiveness.
Conclusion
In summary, the study outlines the development of a lightweight and interpretable binary classifier for identifying opioid overdose events in county EMS records. By comparing feature engineering approaches across several model architectures, the researchers found that the custom features method offers both computational efficiency and interpretability.
Moving forward, researchers should focus on creating non-binary classifiers to differentiate among various types of overdose events. Nonetheless, the current system represents a valuable tool for enhancing opioid overdose surveillance and informing public health and harm reduction initiatives, particularly through location-aware EMS datasets.
Article Revisions
- Jul 11 2024 - Fixed broken journal link.