Autonomous vehicles (AVs) may soon become a reality in many industries, including agriculture, transportation, and the military. Many of their capabilities depend on data from sensors and on artificial intelligence (AI) systems. A vehicle must gather data about its surroundings, plot a course, and then follow that course. These tasks, particularly the final two, call for sophisticated use of machine learning methods, a branch of AI. The recent surge of interest in AI is likely to spark new developments in this domain.
Many of the tasks an autonomous car must perform still pose serious difficulties and call for advanced solutions: replicating a human's cognitive and physical skills is hard and will take years to perfect. AI must be applied across many functions to build a trustworthy and safe autonomous vehicle.
Uses of AI in Autonomous Vehicles
The applications of AI in AVs can be broadly categorized into the following branches:
- Perception
- Localization and Mapping
- Planning and Decision Making
Perception
Perception gathers data from sensors and extracts pertinent information about the surrounding environment. It builds contextual awareness of the driving scene, including the detection, tracking, and segmentation of obstacles, road signs and markings, and drivable free space. Depending on the sensors available, the environment perception problem can be addressed with light detection and ranging (LIDAR) sensors, cameras, radars, or a combination of the three.
Perception techniques come in three types: mediated perception, behavior reflex perception, and direct perception. Mediated perception builds a detailed representation of the environment in the form of nearby objects such as cars, people, trees, and road signs. Behavior reflex perception maps sensor data [image, point cloud, global positioning system (GPS) position] directly to driving maneuvers. Direct perception combines the metric estimates of mediated perception with behavior reflex perception.
Traffic sign identification is one of the most traditional perception problems that AI techniques have mastered. In certain studies, deep neural networks (DNNs) reached an accuracy of 99.46%, exceeding human recognition. Convolutional neural network (CNN) models show comparable accuracy on related tasks, such as identifying lanes and traffic signals. CNNs are a class of deep learning algorithms that learn the spatial structure of a scene by sliding learned filters across the image, capturing local patterns such as edges and shapes.
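To make the idea concrete, here is a minimal sketch of a CNN image classifier in PyTorch, assuming 32x32 RGB inputs and 43 sign classes (the sizes used in the well-known German Traffic Sign Recognition Benchmark); the layer sizes and structure are illustrative assumptions, not taken from the studies above.

```python
# A minimal sketch of a CNN classifier for traffic-sign recognition.
# Assumes 32x32 RGB inputs and 43 classes; sizes are illustrative only.
import torch
import torch.nn as nn

class TrafficSignCNN(nn.Module):
    def __init__(self, num_classes: int = 43):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # learn local edge/color filters
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),   # learn sign shapes/symbols
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),                   # one logit per sign class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = TrafficSignCNN()
logits = model(torch.randn(1, 3, 32, 32))  # dummy batch of one image
print(logits.shape)                        # torch.Size([1, 43])
```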
Some of the important problems solved by AI under the perception umbrella are 2D and 3D object detection, semantic segmentation, object tracking, depth estimation, and human pose estimation. The outputs of these tasks form the input for making driving decisions and driving the vehicle in a safe, human-like manner.
Localization and Mapping
Localization and mapping has advanced over the years, from static indoor mapping for mobile robots to outdoor, dynamic, high-speed localization and mapping for AVs. The procedure is commonly known by the acronym SLAM, for simultaneous localization and mapping. Its two main types are visual SLAM, which works on data from a camera, and LIDAR SLAM, which works on data from a LIDAR sensor.
Models can fuse data from GPS, inertial odometry, and cameras as input to SLAM in order to estimate the vehicle trajectory and a sparse 3D scene reconstruction. Image pairs are aligned based on similarity and further used to detect potential street-view changes.
A very basic example of SLAM is a robot tasked with cleaning a room. A robot that proceeds randomly, without understanding the structure of the room, will do a suboptimal job, taking longer and consuming more energy, while a robot using SLAM can learn the room's structure and layout and follow the most efficient path.
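To illustrate the mapping half of the problem, here is a minimal NumPy sketch of an occupancy grid being filled in from simulated range readings; the robot pose is assumed known here, whereas full SLAM must estimate the pose and the map simultaneously.

```python
# A minimal sketch of occupancy-grid mapping: a robot with a *known* pose
# marks cells along simulated range readings as free and the endpoint as
# occupied. Real SLAM also has to estimate the pose; it is given here.
import numpy as np

GRID = np.full((20, 20), 0.5)   # 0.5 = unknown, 0.0 = free, 1.0 = occupied
robot = (10, 10)                # known robot cell (row, col)

def integrate_reading(grid, robot, direction, distance):
    """Mark cells along one range reading; direction is a unit (dr, dc)."""
    r, c = robot
    for step in range(1, distance):
        grid[r + direction[0] * step, c + direction[1] * step] = 0.0  # free space
    grid[r + direction[0] * distance, c + direction[1] * distance] = 1.0  # obstacle

# Simulated readings: walls 5 cells away in the four cardinal directions.
for d in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
    integrate_reading(GRID, robot, d, 5)

print(GRID)  # a planner can now prefer known-free cells over unknown ones
```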
Planning and Decision Making
Planning and decision making is the heart of the algorithm that drives an autonomous vehicle. It can be divided into two subtasks: human behavior modeling/prediction and vehicle driving behavior modeling and decision making.
Human Behavior Modeling/Prediction
Pedestrian behavior modeling can typically be classified as physics-based, pattern-based, or planning-based. Most deep learning-based methods, including generative adversarial network (GAN)-based methods, are pattern-based, while reinforcement learning-based methods are planning-based.
GANs learn to generate data through a competition between two models: a generator tries to fool a discriminator into accepting its synthetic samples as real data, while the discriminator tries to tell the two apart, and training is a constant contest between them. Reinforcement learning refers to a type of learning in which a model learns by trial and error, receiving rewards for good actions and penalties for mistakes.
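The following is a minimal PyTorch sketch of that generator-versus-discriminator contest on toy 1-D data; the architectures, data distribution, and hyperparameters are illustrative assumptions, not taken from any of the cited methods.

```python
# A minimal GAN training loop on toy 1-D data (samples from N(4, 1)).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                  nn.Linear(16, 1), nn.Sigmoid())                 # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 4.0    # "real" data: N(4, 1)
    fake = G(torch.randn(64, 8))       # generator maps noise to samples

    # Discriminator learns to label real as 1 and fake as 0.
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator learns to make the discriminator output 1 on fakes.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(G(torch.randn(5, 8)).detach().flatten())  # samples should drift toward ~4.0
```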
To capture the temporal pattern, pedestrian behavior modeling techniques mostly use recurrent neural networks (RNNs) and long short-term memory networks (LSTMs), while GAN-based techniques enhance prediction model training with adversarial data. Some techniques improve the capacity to model social interaction in crowded pedestrian scenes by incorporating an attention mechanism into the prediction framework. The attention mechanism, generally employed for natural language problems, thus finds an interesting use case here.
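Below is a minimal NumPy sketch of the scaled dot-product attention at the core of such mechanisms, framed here, purely as an assumption for illustration, as pedestrians attending to each other's features.

```python
# A minimal sketch of scaled dot-product attention: each of N pedestrians
# attends to all others, weighting neighbors by feature similarity.
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (N, d) arrays; returns (N, d) context vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over neighbors
    return weights @ V                               # similarity-weighted mix

feats = np.random.randn(5, 16)            # 5 pedestrians, 16-d features each
context = attention(feats, feats, feats)  # self-attention over the crowd
print(context.shape)                      # (5, 16)
```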
Vehicle Driving Behavior Modeling and Decision Making
Vehicle behavior prediction models are categorized into physics-based, maneuver-based, and interaction-aware models. Deep learning-based methods are roughly classified by model type: CNNs, RNNs, LSTMs, gated recurrent units (GRUs), GANs, graph neural networks (GNNs), and deep reinforcement learning (RL).
Deep RL with a Deep Q-Network (DQN) can be used to learn maneuver decisions based on a compact semantic state representation of all scene objects, such as vehicles, pedestrians, lane segments, signs, and traffic lights, where the state and reward are extended by a behavior adaptation function and a parameterization, respectively. This is an elegant formulation of the problem in RL terms.
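A minimal DQN sketch follows; the toy maneuver environment, state features, reward, and network sizes are all invented for illustration and do not come from the work described above.

```python
# A minimal DQN sketch on a hypothetical maneuver task: the state is a small
# feature vector (e.g., gaps to surrounding cars) and the actions are
# {keep lane, change left, change right}. Everything here is a toy stand-in.
import random
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 3
q_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay, GAMMA, EPS = [], 0.99, 0.1

def toy_env_step(state, action):
    """Hypothetical dynamics: random next state, reward favors keeping lane."""
    return torch.randn(STATE_DIM), (1.0 if action == 0 else 0.0)

state = torch.randn(STATE_DIM)
for step in range(1000):
    # Epsilon-greedy action selection.
    if random.random() < EPS:
        action = random.randrange(N_ACTIONS)
    else:
        action = q_net(state).argmax().item()
    next_state, reward = toy_env_step(state, action)
    replay.append((state, action, reward, next_state))
    state = next_state

    if len(replay) >= 64:
        batch = random.sample(replay, 64)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        # Bellman target: r + gamma * max_a' Q_target(s', a')
        with torch.no_grad():
            target = r + GAMMA * target_net(s2).max(dim=1).values
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()

    if step % 100 == 0:  # periodically sync the target network
        target_net.load_state_dict(q_net.state_dict())
```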
One technique rasterizes the context around each traffic actor into an image, which is then fed into deep convolutional networks to extract the features relevant for forecasting several potential trajectories.
TrafficPredict, an LSTM-based real-time traffic prediction algorithm on the frontal view, uses an instance layer to learn the movements and interactions of instances (vehicles, bicycles, and pedestrians) and a category layer to learn the similarities among instances of the same type, improving the prediction. LSTMs are a type of recurrent neural network designed for sequential data such as time series.
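Here is a minimal PyTorch sketch of LSTM-based trajectory prediction, predicting an agent's next (x, y) position from its past positions on random stand-in data; TrafficPredict's instance and category layers are well beyond this sketch.

```python
# A minimal sketch of LSTM trajectory prediction: given the past 8 (x, y)
# positions of an agent, predict its next position. Data here is random.
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # map last hidden state to (x, y)

    def forward(self, past: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(past)           # past: (batch, 8, 2)
        return self.head(out[:, -1])       # predict the next position

model = TrajectoryLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
past = torch.randn(16, 8, 2)               # 16 agents, 8 observed steps each
future = torch.randn(16, 2)                # ground-truth next positions
loss = nn.functional.mse_loss(model(past), future)
opt.zero_grad()
loss.backward()
opt.step()
print(model(past).shape)  # torch.Size([16, 2])
```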
Operating on vectorized high-definition (HD) maps and agent trajectories, recent work from Google dubbed VectorNet is a hierarchical GNN that exploits the spatial locality of individual road components, represented as vectors, and models the high-order interactions among all components. GNNs are neural networks that operate on graph-structured data, explicitly representing the connections between entities.
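To show the core operation of a GNN, here is a minimal NumPy sketch of one message-passing step, in which each node mixes its neighbors' features with its own; VectorNet's hierarchy and vectorized map encoding go well beyond this illustration.

```python
# A minimal sketch of one graph message-passing step: each node (e.g., a lane
# segment or agent) averages its neighbors' features and mixes them with its
# own through a weight matrix ("learned" in practice, random here).
import numpy as np

N, D = 4, 8
feats = np.random.randn(N, D)           # one feature vector per node
adj = np.array([[0, 1, 1, 0],           # adjacency: which nodes interact
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = np.random.randn(2 * D, D)           # mixing weight matrix

deg = adj.sum(axis=1, keepdims=True)
neighbor_mean = (adj @ feats) / np.maximum(deg, 1)   # aggregate neighbors
updated = np.tanh(np.concatenate([feats, neighbor_mean], axis=1) @ W)
print(updated.shape)  # (4, 8): new features informed by graph structure
```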
Future Challenges and Work Ahead
The major challenges in applying AI to autonomous vehicles stem either from the autonomous driving task itself or from the shortcomings of deep learning. Training a deep learning model still requires a large amount of data, and model overfitting and sensitivity to image perturbations remain challenging.
Sensor fusion is a prerequisite for perception and localization, especially in bad weather, but modeling each sensor's characteristics (capacity and limitations) for these corner cases is not well defined.
Additionally, more data is required to train vehicle and pedestrian trajectory prediction models, and there is currently a shortage of behavior and intention modeling for both short-term and long-term forecasts. Current models also need more cues from perception, such as a person's gaze and posture, a driver's attitude and hand gestures, and a car's turn signals.
Looking ahead, deep learning applications, especially for real-time deployment in congested and highly dynamic traffic settings, need more maturity in behavior planning and motion planning, since characterizing driving scenarios and replicating driving behaviors from examples remain challenging.
References
Y. Ma, Z. Wang, H. Yang, and L. Yang, "Artificial intelligence applications in the development of autonomous vehicles: a survey," IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 2, pp. 315-329, 2020, doi: 10.1109/JAS.2020.1003021.
Y. Huang and Y. Chen, "Autonomous driving with deep learning: a survey of state-of-art technologies," arXiv:2006.06091v3, 2020.
L.-H. Wen and K.-H. Jo, "Deep learning-based perception systems for autonomous driving: a comprehensive survey," Neurocomputing, vol. 489, pp. 255-270, 2022, doi: 10.1016/j.neucom.2021.08.155.