A recent study published in the journal PLOS ONE introduced a highly accurate multi-pill detection framework, the a Priori Graph-assisted Pill Detection Network (PGPNet). It addresses pill misuse, a common problem because many widely used pills are visually similar.
Background
According to reports, drug misuse and prescription errors cause thousands of deaths annually. As chronic diseases requiring ongoing medication become more prevalent among the elderly, the problem demands urgent solutions. Despite many attempts to exploit deep learning for pill identification, most prior work focuses on single-pill identification and cannot distinguish visually similar pills appearing together. The present study is the first to address the multi-pill detection challenge in real-world settings.
About the study
In this study, the researchers aimed to localize and identify pills in images taken by users at the time of ingestion. They also present a dataset of multi-pill images captured in unconstrained settings.
The authors propose a novel approach for building heterogeneous a priori graphs that capture three types of inter-pill relationships, namely co-occurrence probability, relative size, and visual semantic association, to handle challenging samples. They then provide a method for integrating these a priori with the pills' visual features to improve detection precision. Their experiments demonstrate the framework's robustness, reliability, and explainability.
The following are some of the study’s key contributions:
- The study provides the first real-world collection of multi-pill images, containing 9,426 images across 96 pill classes. The pictures were captured in a variety of scenarios using ordinary smartphones.
- To handle challenging pill samples, the researchers propose a novel pill detection framework called PGPNet that exploits three graph-based a priori, namely co-occurrence probability, relative pill size, and visual semantic association.
- They also offer a technique for constructing these heterogeneous a priori graphs from the supplied prescriptions and the training dataset of pill images.
- They carried out extensive experiments to assess the effectiveness of the proposed solution by comparing it to the current state of the art.
The proposed framework - PGPNet
The authors concentrated on a practical application: identifying pills in images taken when patients ingest their medication. Their model takes an image containing several pills as input and produces a bounding box and name for every pill. The crucial challenge is recognizing tablets that share the same shape, color, and size.
Exploiting the relationships between pills, rather than classifying each one in isolation, can increase identification accuracy. To this end, the authors introduce two forms of a priori: the first models the relative sizes of pills, and the second reflects their probability of co-occurrence. The a priori are extracted from a given set of prescriptions together with the pill-image training dataset and are expressed as heterogeneous graphs.
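The article does not reproduce the paper's graph-construction code; the following Python sketch merely illustrates how a prescription-based co-occurrence graph could be assembled. The function name, the conditional-probability normalization, and the input format (each prescription as a list of pill-class names) are assumptions for illustration.

```python
from itertools import combinations

import numpy as np

def build_cooccurrence_graph(prescriptions, pill_classes):
    """Estimate a co-occurrence matrix W from a set of prescriptions.

    prescriptions: list of prescriptions, each a list of pill-class names.
    pill_classes:  ordered list of all pill-class names (graph nodes).
    Returns W where W[i, j] approximates P(pill j | pill i), i.e. how
    likely pill j is to be prescribed given that pill i is. (Illustrative
    scheme; the paper's exact normalization may differ.)
    """
    idx = {name: k for k, name in enumerate(pill_classes)}
    n = len(pill_classes)
    pair_counts = np.zeros((n, n))
    occ_counts = np.zeros(n)

    for rx in prescriptions:
        pills = [idx[p] for p in set(rx) if p in idx]
        for i in pills:
            occ_counts[i] += 1
        for i, j in combinations(pills, 2):
            pair_counts[i, j] += 1
            pair_counts[j, i] += 1

    # Conditional co-occurrence: P(j | i) ~ count(i, j) / count(i)
    return pair_counts / np.maximum(occ_counts[:, None], 1)
```

A relative-size graph could be built analogously, for instance from ratios of average bounding-box areas observed in the training images.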
The overall workflow comprises the following steps:
- Modeling the a priori graphs: To represent the relationships between all the medicines in terms of co-occurrence and relative size, the researchers construct two a priori graphs: a prescription-based medicine co-occurrence graph (Co-graph) and a relative size graph (Size-graph). For the former, they employ a predetermined collection of prescriptions to estimate the associations between medications (i.e., medications that tend to be prescribed together for the same conditions).
- Extraction of visual features: A convolutional network (ConvNet) extracts visual features from the input image of the pills, and a Region Proposal Network (RPN) locates candidate Regions of Interest (RoIs). The outputs of these two components are passed into an RoI pooling layer to extract per-pill visual representations.
- Extraction of inter-pill relational features: The two a priori graphs are combined with the pills' visual features to produce condensed variants of the Co-graph and Size-graph that emphasize the relationships among only those pills likely to appear in the image.
- Multimodal data fusion: The intra-pill visual features and the inter-pill relational features are then fused into enhanced feature vectors, each capturing a pill's own traits as well as its relationships to other pills. The final detections are produced from these enhanced feature vectors (a minimal sketch of this pipeline appears after this list).
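PGPNet's exact architecture is detailed in the paper; the PyTorch sketch below only mirrors the data flow described above, with RoI-pooled visual features, graph-derived relational features, and a simple fusion layer. All module names, dimensions, and the way the graphs are condensed via preliminary class scores are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RelationalFusionHead(nn.Module):
    """Illustrative fusion of per-RoI visual features with graph-based
    relational features (not the authors' exact architecture)."""

    def __init__(self, num_classes, feat_dim=1024, rel_dim=256):
        super().__init__()
        # One learnable embedding per pill class, mixed via the graphs.
        self.class_emb = nn.Embedding(num_classes, rel_dim)
        self.fuse = nn.Linear(feat_dim + rel_dim, feat_dim)
        self.cls_head = nn.Linear(feat_dim, num_classes + 1)  # +1 for background
        self.box_head = nn.Linear(feat_dim, 4)

    def forward(self, roi_feats, prelim_logits, co_graph, size_graph):
        # roi_feats:     (R, feat_dim) RoI-pooled visual features
        # prelim_logits: (R, C) preliminary per-class scores for each RoI
        # co_graph, size_graph: (C, C) a priori adjacency matrices
        probs = prelim_logits.softmax(dim=-1)        # which classes look likely
        graph = 0.5 * (co_graph + size_graph)        # naive combination of graphs
        # Relational feature: expected embedding of a pill's graph neighbors.
        rel = probs @ graph @ self.class_emb.weight  # (R, rel_dim)
        fused = torch.relu(self.fuse(torch.cat([roi_feats, rel], dim=-1)))
        return self.cls_head(fused), self.box_head(fused)
```

In the full framework, the backbone, RPN, and RoI pooling come from a standard two-stage detector such as Faster R-CNN; a head like this one would sit on top of its pooled features.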
Results and conclusion
Overall, five algorithms were implemented in this work: three built on two-stage object detectors and two on one-stage object detectors. Two evaluation metrics were used to assess the results: average precision (AP) and mean average precision (mAP). In every setting, PGPNet's mAP exceeded that of the corresponding vanilla object detector. The proposed technique outperforms standard Faster R-CNN by 9.2% in mAP, and under stricter criteria such as AP75 it beats Faster R-CNN by 8–9%. The researchers observed the same behavior with a ResNet-50-FPN backbone, where PGPNet improved mAP by 9.4%.
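For context, mAP is simply the mean of the per-class average precisions, while AP75 counts a detection as correct only if its box overlaps the ground truth by at least 75% IoU. A toy calculation with hypothetical AP values, not figures from the paper:

```python
# Toy mAP computation: the mean of per-class average precisions.
# (Hypothetical AP values for illustration, not results from the paper.)
per_class_ap = {"pill_A": 0.91, "pill_B": 0.78, "pill_C": 0.85}
mAP = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mAP = {mAP:.3f}")  # -> mAP = 0.847
```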
The study proposed PGPNet, a practical framework for pill detection that is both accurate and explainable. PGPNet exploits external knowledge, namely co-occurrence probability, relative pill size, and visual semantic association, during training to deal with difficult samples. The authors integrated PGPNet into two well-known object detectors and evaluated the approach on a real-world multi-pill identification dataset. The experimental findings showed that it significantly outperforms these baseline models, and extensive ablation studies further demonstrated the framework's robustness, reliability, and explainability.