In a study published in Electronics Letters, researchers from Iran University of Science and Technology propose a fast neural architecture search (NAS) approach to design optimal convolutional neural networks (CNNs) for license plate recognition (LPR). Their method utilizes a flexible super-network with efficient custom convolutional filters and structural pruning to effectively explore the architecture space. Multi-objective optimization generates Pareto-optimal networks, balancing accuracy and computational cost. Experiments on an Iranian license plate dataset demonstrate the approach can produce accurate yet efficient LPR models tailored for vector processors.
Background
LPR is critical for applications like traffic monitoring and law enforcement. CNNs achieve high performance, but further architecture optimization can improve efficiency. NAS automates finding ideal network configurations by searching over possible architectures. However, NAS is computationally expensive. This study presents a fast single-shot NAS using filter pruning to efficiently search architectures for end-to-end LPR.
The authors construct an LPR super-network with depth-wise separable convolutions and recurrent stages. Structural pruning based on filter importance ranking repeatedly shrinks this network to extract efficient architectures. By considering error and computational cost, the method generates Pareto-optimal trade-offs between objectives. Notably, custom rectangular filters are designed to utilize vector processor capabilities fully.
Experiments on a 10,000-sample Iranian LPR dataset demonstrate the NAS approach produces accurate models with over 5x less computation than regular CNNs. Analysis shows most resulting networks lie on the Pareto frontier between error and cost. The findings highlight the potential of tailored CNN architecture search to automate the development of efficient neural models for applications like LPR.
Neural Architecture Search
Neural architecture search aims to automate finding optimal network architectures for a task by searching over possible configurations, improving on manual trial-and-error design. However, NAS can be prohibitively expensive due to the numerous training iterations required. Recent methods address this through efficient search spaces, weight sharing, and single-shot approaches.
Single-shot NAS enables architecture search in a single training run. One method is to gradually prune redundant filters and operations from an over-parameterized super-network. The smallest final architecture that maintains accuracy becomes the NAS result. Structural pruning based on filter ranking has shown promise for fast NAS.
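The prune-and-extract idea can be sketched in a few lines. The following is an illustrative simplification, not the authors' exact algorithm: each layer is represented only by a list of filter-importance scores, and pruning removes the globally least important fraction of filters. All scores and the `prune_supernet` helper are hypothetical.

```python
# Sketch of single-shot NAS via filter pruning (illustrative only).
# A layer is modeled as a list of filter-importance scores; pruning
# drops the globally least important fraction across the network.

def prune_supernet(importance_per_layer, fraction):
    """Return per-layer filter counts after removing the least
    important `fraction` of all filters in the super-network."""
    all_scores = sorted(s for layer in importance_per_layer for s in layer)
    cutoff_idx = int(len(all_scores) * fraction)
    threshold = (all_scores[cutoff_idx]
                 if cutoff_idx < len(all_scores) else float("inf"))
    return [sum(1 for s in layer if s >= threshold)
            for layer in importance_per_layer]

# Hypothetical importance scores for a 3-stage super-network
supernet = [[0.9, 0.1, 0.5], [0.8, 0.2], [0.7, 0.05, 0.6, 0.3]]

# Several candidate architectures extracted from one trained super-network
candidates = {p: prune_supernet(supernet, p) for p in (0.15, 0.30, 0.50)}
```

Each pruning fraction yields a smaller architecture from the same trained weights, which is what makes the search "single-shot".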
Constructing the Super-Network Architecture
The authors designed a large LPR super-network comprising convolutional stages to extract features from license plate images, recurrent stages for sequence modeling, and a CTC loss layer for end-to-end character recognition. Notably, depth-wise separable convolutions are used as building blocks for efficiency, with pointwise convolutions fusing the separate feature channels. Rectangular filters with dimensions ranging from 1x1 to 8x8 are crafted to fully utilize vector processor SIMD instructions. In addition, 2x2 max pooling, Gaussian noise, dropout, and batch normalization aid generalization.
The recurrent stages apply fully connected layers for sequence processing. This compact design achieved high performance without over-parameterization. The network is trained end-to-end with CTC loss suitable for variable input and output sequence lengths in LPR.
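The efficiency of the depth-wise separable building block is easy to quantify: a standard convolution couples every input channel to every output channel through a full kernel, while the separable version splits this into a per-channel spatial filter followed by a 1x1 pointwise fusion. A small arithmetic sketch (channel counts are illustrative, not taken from the paper):

```python
# Parameter-count comparison: standard vs depth-wise separable
# convolution. Channel and kernel sizes below are illustrative.

def standard_conv_params(c_in, c_out, k_h, k_w):
    # every output channel has a full c_in x k_h x k_w kernel
    return c_in * c_out * k_h * k_w

def separable_conv_params(c_in, c_out, k_h, k_w):
    depthwise = c_in * k_h * k_w   # one k_h x k_w filter per input channel
    pointwise = c_in * c_out       # 1x1 convolution fusing the channels
    return depthwise + pointwise

# Example: 64 -> 128 channels with a 3x3 kernel
std = standard_conv_params(64, 128, 3, 3)   # 73728 parameters
sep = separable_conv_params(64, 128, 3, 3)  # 576 + 8192 = 8768 parameters
ratio = std / sep                           # roughly 8.4x fewer parameters
```

The saving grows with channel count, which is why separable blocks keep the super-network compact even before any pruning.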
Architecture Space via Pruning
The core technique is structured pruning of filters from the trained super-network to extract Pareto-optimal architectures. An L1 regularizer drives unimportant filters toward zero, so they can be identified for removal by their low weight magnitude.
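Ranking filters by the L1 norm of their weights is a standard importance proxy consistent with this L1-regularized scheme. A minimal NumPy sketch (the weight tensor and `rank_filters_by_l1` helper are hypothetical, not from the paper):

```python
import numpy as np

# Rank convolutional filters by the L1 norm of their weights;
# filters with the smallest norms are candidates for removal.

def rank_filters_by_l1(weights):
    """weights: array of shape (num_filters, c_in, k_h, k_w).
    Returns filter indices ordered from least to most important."""
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    return np.argsort(norms)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16, 3, 3))
w[3] *= 0.01  # simulate a filter driven toward zero by L1 regularization

order = rank_filters_by_l1(w)
# the near-zero filter (index 3) ranks least important
```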
Pruning percentage thresholds of 15%, 30%, etc., generate networks of varying complexities in a single training run. Each pruned architecture is then trained independently and evaluated on accuracy and computational cost. The non-dominated networks form the final Pareto-optimal NAS solutions balancing the two objectives.
The approach allows customizing each stage separately instead of using a shared building block everywhere. This tailored optimization significantly reduces operations compared to standard NAS techniques.
Results on License Plates
Experiments used 10,000 real and synthetic Iranian license plate images under diverse conditions for training and evaluation. The NAS method achieved high-accuracy LPR models with up to 7x less computation than regular CNNs designed manually.
Analysis of the Pareto-frontier networks shows that larger filters are pruned first to maximize the reduction in computation. Later stages heavily use small, efficient filters to retain spatial details. The first-stage filters also follow an expanded bottleneck shape in the most compressed network.
Notably, 5 out of 7 candidate architectures lie on the Pareto frontier, indicating that pruning effectively explores the accuracy-efficiency trade-off. The approach can be applied to similar sequence recognition tasks in optical character recognition and speech processing.
Future Outlook
This work demonstrates a fast and effective NAS technique for optimizing LPR model architecture. Tailored convolutional filters for vector processors, efficient super-network design, and multi-objective filter pruning provide customized accuracy-computation trade-offs. The method can help automate the development of application-specific CNN architectures like LPR that are accurate yet efficient. Ongoing research can further enhance the approach and extend it to related domains.