In a paper published in the journal Transportation Research Part F: Traffic Psychology and Behaviour, researchers presented a framework using pre-trained deep neural networks and the random forest (RF) algorithm to automate distraction detection in Australian Naturalistic Driving Study (ANDS) video data.
Background
The naturalistic driving study (NDS) is a research method that focuses on driver performance and behavior across driving situations, including normal, impaired, and safety-critical scenarios. It involves continuously recording data with a Data Acquisition System (DAS) to monitor the behavior of participating vehicles and other road users. NDS data provide valuable insights for enhancing road safety by revealing how drivers handle risky situations and avoid accidents.
The current study specifically investigates driver behavior in Australian conditions and focuses on analyzing driver distraction, inattention, and drowsiness using real-time driver monitoring technology.
Previous works
Driver distraction significantly impacts driver performance and safety. Previous studies have used NDS data to investigate the causes and effects of driver distraction and its impact on driver behavior. Observable indicators of distraction include drivers taking their hands off the steering wheel, directing their attention inside the vehicle rather than at the road, and vehicles drifting within or across lane lines. The US 100-Car NDS demonstrated that driver distraction played a major role in vehicle crashes and collisions. Video analysis revealed that front-seat passengers and child occupants were common sources of driver distraction, and truck drivers were also frequently engaged in distracting activities. Researchers have employed machine learning techniques, such as convolutional neural networks (CNNs) and gradient boosting, to differentiate between drivers and passengers reading messages while driving.
Distracted driving statistics
Statistics from the National Highway Traffic Safety Administration in the USA indicate a significant number of distracted-driving incidents, with millions of reported injuries and property-damage crashes. Distraction factors include activities such as talking on the phone, reading, or typing text messages or emails. An analysis of crashes in South Australia between 2014 and 2018 revealed that distraction, inattention, and misprioritized attention contributed to a significant percentage of accidents.
Data used
The current study utilizes transfer learning and RF methods to analyze naturalistic driving data from the ANDS. The dataset covers 346 privately owned vehicles, each equipped with a DAS, monitored over a four-month period. A total of 337 drivers and other household members from metropolitan Sydney, Melbourne, and regional areas of New South Wales and Victoria operated these vehicles.
The DAS, provided by the Virginia Tech Transportation Institute, captured variables such as acceleration, indicator status, gyroscopic motion, speed, GPS position, and multi-angle video recordings of the driver's face and interactions. A subset of 194,961 randomly selected trips from the dataset was used for analysis.
Proposed framework
The proposed framework analyzes videos to detect distraction, considering two types of correlation: spatial (between neighboring pixels in the same frame) and temporal (between successive frames). Two approaches can be employed: an end-to-end model or separate models for each type of correlation. The framework takes the latter route: a pre-trained CNN is fine-tuned to capture the spatial correlation and output a per-frame distraction probability, and the resulting 1-D time series of probabilities represents the temporal correlation. The RF algorithm then approximates the function mapping the probabilities within a window centered at each time step to a filtered distraction probability.
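This two-stage structure can be sketched in code. In the sketch below, the ResNet-18 backbone, the 11-frame window, the RF settings, and the commented placeholder arrays (train_probs, train_frame_labels, test_probs) are illustrative assumptions rather than the authors' reported configuration.

```python
# Minimal sketch of the two-stage idea: a fine-tuned CNN scores each frame,
# and an RF filters the resulting probability time series.
import numpy as np
import torch
import torch.nn as nn
from torchvision import models
from sklearn.ensemble import RandomForestClassifier

# Stage 1: pre-trained CNN, fine-tuned to output a per-frame distraction probability.
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # assumed backbone
cnn.fc = nn.Linear(cnn.fc.in_features, 2)  # two classes: distracted / not distracted
# ... fine-tune cnn on labelled frames here ...

def frame_probabilities(frames: torch.Tensor) -> np.ndarray:
    """Return P(distracted) for each frame in an (N, 3, H, W) tensor."""
    cnn.eval()
    with torch.no_grad():
        logits = cnn(frames)
    return torch.softmax(logits, dim=1)[:, 1].cpu().numpy()

# Stage 2: the RF maps the probabilities in a window centred on each frame
# to a filtered distraction decision for that frame.
WINDOW = 11          # assumed (odd) window length
HALF = WINDOW // 2

def windowed(probs: np.ndarray) -> np.ndarray:
    """Stack a sliding window of probabilities around every frame (edge-padded)."""
    padded = np.pad(probs, HALF, mode="edge")
    return np.stack([padded[i:i + WINDOW] for i in range(len(probs))])

rf = RandomForestClassifier(n_estimators=200, random_state=0)
# rf.fit(windowed(train_probs), train_frame_labels)        # labelled training trips
# filtered = rf.predict_proba(windowed(test_probs))[:, 1]  # filtered probabilities
```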
Results
The dashboard detector and face detector were trained and tested using images from five and seven trips, respectively, with the models' hyperparameters tuned by grid search. The RF classifiers were trained using 45 and 140 trips for the dashboard and face cameras, respectively, with five-fold cross-validation. The proposed framework achieved promising results, with true positive rate, false positive rate, and precision values of 0.609, 0.218, and 0.325 for the dashboard camera and 0.748, 0.344, and 0.651 for the face camera, respectively.
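One common way to combine a grid search with k-fold cross-validation is scikit-learn's GridSearchCV; the sketch below applies it to the RF stage purely as an illustration, and the parameter grid, the precision scoring choice, and the placeholder inputs (X_windows, y_frame_labels) are assumptions rather than the paper's actual search space.

```python
# Illustrative grid search over RF hyperparameters with five-fold cross-validation.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {                      # assumed search space
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                 # five-fold cross-validation, as in the study
    scoring="precision",  # assumed metric; the paper reports TPR, FPR, and precision
    n_jobs=-1,
)
# search.fit(X_windows, y_frame_labels)  # windowed probabilities and frame labels
# best_rf = search.best_estimator_
```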
The researchers used the Local Interpretable Model-Agnostic Explanations (LIME) approach to interpret the fine-tuned neural networks. Superpixels were generated to identify which regions of each image were most important for the distraction prediction. Visual inspection with LIME revealed that the fine-tuned CNN for the face camera sometimes emphasized unimportant background areas, possibly due to overfitting.
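For readers unfamiliar with LIME on images, the hedged sketch below shows the general superpixel-based workflow using the open-source lime package; the random placeholder frame and the dummy classifier_fn stand in for a real driver-facing frame and the authors' fine-tuned CNN.

```python
# Generic LIME-for-images workflow: superpixels of a frame are perturbed and a
# local surrogate model estimates each superpixel's contribution to the prediction.
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

rng = np.random.default_rng(0)
frame_rgb = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)  # placeholder frame

def classifier_fn(images: np.ndarray) -> np.ndarray:
    """Placeholder for the fine-tuned CNN: maps (N, H, W, 3) images to
    (N, 2) probabilities for [not distracted, distracted]."""
    p = rng.uniform(size=(len(images), 1))
    return np.hstack([1.0 - p, p])

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    frame_rgb, classifier_fn, top_labels=1, num_samples=500
)
label = explanation.top_labels[0]
img, mask = explanation.get_image_and_mask(
    label, positive_only=True, num_features=5, hide_rest=False
)
overlay = mark_boundaries(img / 255.0, mask)  # outlines the most influential superpixels
```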
Conclusion and future work
In summary, the researchers proposed a framework for video data reduction that automates the otherwise manual annotation process and saves effort and resources. The use of machine learning helps tackle questions involving large video datasets without requiring extensive manpower.
The framework was applied to detect distractions from the face and dashboard cameras, yielding promising but not highly accurate results. According to the authors, further improvement can be achieved by expanding the training dataset and fine-tuning the CNN.
Future work includes exploring two approaches: consolidating decisions from multiple specialized networks and combining inputs from various sources for robust distraction detection.
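As a rough illustration of the first direction, a decision-level fusion step could simply combine the per-frame probabilities produced by the two camera-specific detectors; in the sketch below the weights and threshold are arbitrary assumptions, not values from the paper.

```python
# Hypothetical decision-level fusion of the face and dashboard detectors' outputs.
import numpy as np

def fuse_decisions(p_face: np.ndarray, p_dash: np.ndarray,
                   w_face: float = 0.6, w_dash: float = 0.4,
                   threshold: float = 0.5) -> np.ndarray:
    """Weighted average of per-frame distraction probabilities from the two
    cameras, thresholded into a binary distraction flag per frame."""
    fused = w_face * p_face + w_dash * p_dash
    return fused >= threshold
```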