In a paper published in the journal PLOS ONE, researchers drew on advances in computer vision, particularly convolutional neural networks (CNNs), to classify tourism images and gain deeper insight into tourists' real perceptions of tourism resources, a task beyond the reach of human vision alone. They applied and refined SqueezeNet, a lightweight CNN, using a dataset of tourism images from Slender West Lake.
Related Work
Previous studies have underscored the crucial role of images as carriers of human perception and cultural discourse. The evolution of visual representation, from ancient paintings to modern photography, has played a pivotal role in various social and cultural contexts. The integration of art and science, exemplified by works such as Leonardo da Vinci's Mona Lisa and Zhang Zeduan's 'Along the River During the Qingming Festival,' reflects the deep historical roots of image documentation.
The advancement of photography technology, from daguerreotypes to mobile phone cameras, has heightened visual expectations. Contemporary tourism relies heavily on photography, as travelers capture moments that strengthen their connection to places. Although tourism images provide an objective record of tourist activities, they remain difficult to quantify, which calls for scientific technologies that complement human vision in studying them.
Evolution of Lightweight CNNs
The researchers traced the evolution of CNNs for image classification back to LeNet-5 in 1998, a pioneering model that defined the basic structure of CNNs, with features such as automatic feature extraction and a small number of parameters. Later models such as AlexNet (2012) introduced innovations including training on multiple graphics processing units (GPUs) and rectified linear unit (ReLU) activation functions, which accelerated convergence.
These advances brought new challenges, however. In 2014, the network from the Visual Geometry Group (VGG) improved image classification but consumed substantial computing resources, while Google's GoogLeNet had to contend with the overfitting that accompanies large parameter sets. In response, SqueezeNet emerged in 2016 as a lightweight CNN: building on AlexNet, it introduced the fire module and achieved similar accuracy with far fewer parameters, making it suitable for mobile devices.
The researchers characterized SqueezeNet's architecture by its compact size and efficient feature extraction. The model is built from fire modules, each containing a squeeze layer and an expand layer with ReLU activations. Notably, the use of 1×1 and 3×3 convolution kernels optimizes parameter efficiency. The overall structure arranges convolutional, fire, and pooling layers sequentially and ends in a softmax function for classification. This design substantially reduces the number of parameters, making the model well suited to fast computation on mobile terminals.
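To make the parameter savings concrete, the short Python sketch below counts the weights and biases in a fire module and compares them with a single 3×3 convolution that produces the same number of output channels. The layer sizes are an assumption for illustration: they follow the fire2 configuration from the original SqueezeNet v1.0 paper (96 input channels, 16 squeeze filters, 64 + 64 expand filters), not values reported in this study.

```python
def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Weights plus one bias per filter for a k x k convolution."""
    return k * k * in_ch * out_ch + out_ch


def fire_params(in_ch: int, s1x1: int, e1x1: int, e3x3: int) -> int:
    """Parameters of a fire module: a 1x1 squeeze layer feeding parallel
    1x1 and 3x3 expand layers whose outputs are concatenated."""
    squeeze = conv_params(in_ch, s1x1, 1)
    expand_1x1 = conv_params(s1x1, e1x1, 1)
    expand_3x3 = conv_params(s1x1, e3x3, 3)
    return squeeze + expand_1x1 + expand_3x3


# Assumed sizes: fire2 from the SqueezeNet v1.0 paper
# (96 input channels, 16 squeeze filters, 64 + 64 expand filters).
fire = fire_params(96, 16, 64, 64)
# A plain 3x3 convolution producing the same 128 output channels.
plain = conv_params(96, 128, 3)

print(f"fire module: {fire:,} parameters")     # 11,920
print(f"plain 3x3 conv: {plain:,} parameters")  # 110,720
print(f"reduction: {plain / fire:.1f}x")
```

With these sizes the fire module needs roughly a ninth of the parameters, mainly because the narrow squeeze layer shrinks the input that the costly 3×3 filters must process.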
SqueezeNet has proven versatile in practical applications, such as grading peanut pod quality (97.83% accuracy), analyzing microscopic feature maps of powdered Chinese medicinal materials (90.33% accuracy), and classifying rock images (90.88% validation-set accuracy). In these studies, SqueezeNet compared favorably with other CNNs in both classification accuracy and model size, reinforcing its significance in image classification research.
SqueezeNet Efficiency in Tourism
In the experimental setup, the researchers analyzed Slender West Lake tourism images on a computer with an Intel Core i5-3230M processor, an NVIDIA graphics card, and 4 GB of memory, running Windows 10 version 20H2; the model was implemented in MATLAB (matrix laboratory) R2021b. Because high-resolution images suffer from optical interference and system-induced noise, the study included an image preprocessing step. This preprocessing, also performed in MATLAB R2021b, involved adding Gaussian noise and applying linear spatial filtering to address issues such as blurred images and to improve digital image quality.
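The two preprocessing operations can be illustrated with a minimal pure-Python sketch on a small grayscale image. It is not the authors' MATLAB code; MATLAB's Image Processing Toolbox offers `imnoise`, `fspecial`, and `imfilter` for these operations, but the article does not say which functions or settings the study used. The noise level, the 3×3 averaging kernel, and the flat test image are all hypothetical choices.

```python
import random


def add_gaussian_noise(img, sigma, seed=0):
    """Return a copy of a 2-D grayscale image (list of lists, values in [0, 1])
    with zero-mean Gaussian noise added and values clipped to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def linear_filter(img, kernel):
    """Linear spatial filter: correlate the image with an odd-sized kernel,
    replicating edge pixels at the border."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(h - 1, max(0, y + dy))  # replicate padding
                    xx = min(w - 1, max(0, x + dx))
                    acc += kernel[dy + r][dx + r] * img[yy][xx]
            out[y][x] = acc
    return out


def variance(img):
    """Pixel variance, used here as a simple measure of noise."""
    vals = [p for row in img for p in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


# Hypothetical 3x3 averaging kernel; a Gaussian kernel works the same way.
box = [[1 / 9] * 3 for _ in range(3)]

clean = [[0.5] * 32 for _ in range(32)]  # flat gray test image
noisy = add_gaussian_noise(clean, sigma=0.1)
smoothed = linear_filter(noisy, box)
print(f"variance noisy={variance(noisy):.5f} smoothed={variance(smoothed):.5f}")
```

Averaging neighboring pixels suppresses independent pixel noise, which is why the smoothed image shows a much lower variance than the noisy one.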
The experimental process consisted of several steps. First, the researchers built an improved SqueezeNet model by adding a two-dimensional convolution layer, named "coven," after SqueezeNet's fire9 module. Further enhancements included modifications to the convolution layer conv10, which improved the learning speed.
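In SqueezeNet, conv10 is a 1×1 convolution with one filter per output class, followed by ReLU, global average pooling, and softmax, which is presumably why adapting the network to a new image set involves changing conv10 and the classification layer. The pure-Python sketch below shows that head on toy data. The feature-map sizes, weights, and three-class setup are hypothetical, and the study's added "coven" layer is not modeled because its configuration is not described here.

```python
import math


def conv1x1(fmap, weights, bias):
    """1x1 convolution. fmap is [C][H][W], weights is [K][C], bias is [K];
    returns a [K][H][W] map with one channel per output class."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[bias[k] + sum(weights[k][c] * fmap[c][y][x] for c in range(C))
              for x in range(W)]
             for y in range(H)]
            for k in range(len(weights))]


def relu_map(fmap):
    """Element-wise ReLU over a [K][H][W] map."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in fmap]


def global_avg_pool(fmap):
    """Average each channel over all spatial positions: one score per class."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def softmax(scores):
    """Turn class scores into probabilities (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy sizes: 4 feature channels on a 2x2 grid coming out of the
# last fire module, and 3 tourism-image classes.
features = [[[float(c + y + x) for x in range(2)] for y in range(2)]
            for c in range(4)]
weights = [[0.1, 0.0, -0.1, 0.2],
           [0.0, 0.3, 0.0, -0.1],
           [-0.2, 0.1, 0.1, 0.0]]
bias = [0.0, 0.0, 0.0]

class_maps = conv1x1(features, weights, bias)  # shape [3][2][2]
probs = softmax(global_avg_pool(relu_map(class_maps)))
print([round(p, 3) for p in probs])
```

Because the class count is set only by the number of 1×1 filters, swapping in a dataset with a different number of categories changes this final layer rather than the fire modules.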
The researchers then adjusted the classification layer to match the new model structure. Next, they imported 3740 images of Slender West Lake, allocating 70% to the training set and 30% to the validation set. The images were resized to 227 × 227 pixels, and random rotation and rescaling were applied to improve training.
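The data-preparation step can be sketched in Python as a seeded 70/30 split plus random augmentation parameters. This is an illustration under stated assumptions, not the authors' pipeline: MATLAB provides `splitEachLabel`, `imageDataAugmenter`, and `augmentedImageDatastore` for these tasks, but the article does not say which the study used. The file names, rotation range, and scale range below are hypothetical, and this simple split is not stratified by class.

```python
import random


def split_dataset(items, train_pct=70, seed=0):
    """Shuffle a copy of items and split it into (train, validation) lists,
    using integer arithmetic for the train_pct percent cut."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    return shuffled[:n_train], shuffled[n_train:]


def sample_augmentation(rng, max_angle=20.0, scale_range=(0.9, 1.1)):
    """Draw one random rotation angle (degrees) and scale factor.
    The ranges are hypothetical placeholders, not values from the study."""
    return rng.uniform(-max_angle, max_angle), rng.uniform(*scale_range)


# Hypothetical file names standing in for the 3740 Slender West Lake images.
images = [f"img_{i:04d}.jpg" for i in range(3740)]
train, val = split_dataset(images)
print(len(train), len(val))  # 2618 1122

rng = random.Random(1)
angle, scale = sample_augmentation(rng)
print(f"rotate {angle:.1f} deg, scale {scale:.2f}, then resize to 227x227")
```

A 70% share of 3740 images gives 2618 training and 1122 validation images; fixing the seed keeps the split reproducible between runs.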
Training used stochastic gradient descent optimization, and the network used ReLU activation functions and dropout to prevent overfitting. After 1040 iterations, the model reached a classification accuracy of 90% on the training set and 85.75% on the validation set. The improved SqueezeNet model was also efficient: it had a compact size of 2.64 MB and completed training in 74 minutes and 27 seconds.
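The three training ingredients named above can be sketched in a few lines of Python. The article does not specify the gradient descent variant, so the sketch assumes stochastic gradient descent with momentum (the form behind MATLAB's `'sgdm'` solver in `trainingOptions`). The learning rate, momentum, dropout probability, and the toy quadratic objective are hypothetical; for reference, the original SqueezeNet design applies 50% dropout after fire9.

```python
import random


def relu(xs):
    """Rectified linear unit: max(0, x) element-wise."""
    return [max(0.0, x) for x in xs]


def dropout(xs, p, training, rng):
    """Inverted dropout: during training, zero each unit with probability p
    and scale survivors by 1/(1-p) so the expected activation is unchanged;
    at inference time, pass values through untouched."""
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


def sgdm_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One gradient descent step with momentum:
    v <- momentum * v - lr * g;  w <- w + v."""
    new_v = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_w = [w + v for w, v in zip(params, new_v)]
    return new_w, new_v


# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3).
# In real training the gradient would come from a random mini-batch.
w, v = [0.0], [0.0]
for _ in range(200):
    g = [2.0 * (w[0] - 3.0)]
    w, v = sgdm_step(w, g, v, lr=0.05, momentum=0.9)
print(round(w[0], 3))  # converges toward 3.0
```

The momentum term accumulates past gradients, which smooths the noisy mini-batch updates of stochastic training; inverted dropout keeps inference simple because no rescaling is needed at test time.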
Comparisons with other models, including AlexNet, GoogLeNet, and VGG19, showed SqueezeNet's superior performance in accuracy and model size, making it an accurate and efficient network model. The researchers emphasized SqueezeNet's potential, particularly on mobile devices, for enhancing tourism image analysis and promoting tourism destination resources. The study concluded that the SqueezeNet CNN model makes valuable contributions to image classification for tourism-related applications.
Conclusion
In summary, this study used an enhanced SqueezeNet model to classify Slender West Lake tourism images, achieving a validation accuracy of 85.75% with a compact model size of 2.64 MB. Computer vision provided an objective evaluation tool for tourism image classification, although the researchers acknowledged that it cannot fully replicate tourists' complex aesthetic perceptions. They advocate that future studies combine human and computer vision for a comprehensive understanding of tourism images, and they anticipate that artificial intelligence will advance the field further.