T-Max-Avg Pooling for CNNs Unleashes Adaptive Feature Extraction

In an article recently published in the journal Scientific Reports, researchers proposed the novel T-Max-Avg pooling layer for convolutional neural networks (CNNs) to address the limitations of traditional pooling methods.

Study: T-Max-Avg Pooling for CNNs Unleashes Adaptive Feature Extraction. Image credit: Pixels Hunter/Shutterstock

Background

CNNs are built from convolutional, activation, pooling, and batch normalization layers that extract features from raw data, particularly in image analysis. The pooling layer is a crucial component that typically follows a convolutional layer. It shrinks the spatial dimensions of the feature maps, which reduces the data volume and the parameter count of the layers that follow.

This layer partitions every feature map into fixed-size regions and either computes the average value within each region (average pooling) or selects the maximum value within each region (max pooling). These are the two primary types of pooling operations.

However, these standard pooling operations are not effective for all data types and applications, which has motivated the development of custom pooling layers that can adaptively learn and extract relevant features from specific datasets.
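As a minimal sketch of the two standard operations, the following pure-Python example applies non-overlapping 2x2 pooling to a small feature map. The helper names (pool2d, mean) and the sample values are illustrative and do not come from the study.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply non-overlapping size x size pooling to a 2D list of numbers."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Flatten the current region into a list of its values.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        pooled.append(out_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Illustrative 4x4 feature map (made-up values).
fmap = [
    [1, 3, 2, 0],
    [5, 7, 4, 8],
    [9, 1, 0, 2],
    [3, 5, 6, 4],
]

max_pooled = pool2d(fmap, 2, max)   # [[7, 8], [9, 6]]
avg_pooled = pool2d(fmap, 2, mean)  # [[4.0, 3.5], [4.5, 3.0]]
```

Max pooling keeps only the strongest response in each region, while average pooling blends all of them, which is the trade-off the custom layer described in this article tries to balance.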

The proposed approach

In this study, researchers proposed a novel approach for designing and implementing customizable pooling layers to improve the feature extraction capabilities of CNNs. The proposed T-Max-Avg pooling layer selects the K most representative pixels in each pooling region and incorporates a threshold parameter T, which controls whether the output feature is based on the maximum value or on a weighted average of those pixels.

By learning the optimal pooling strategy during training, the custom pooling layer can capture and represent discriminative information in the input data more effectively, which improves classification performance. The objective of the study was to develop a pooling method that overcomes the limitations of conventional pooling functions, specifically by preserving highly representative values that conventional methods can disregard and by ensuring that these values are appropriately represented.

To realize this goal, researchers selected the K pixels with the highest representational capacity and introduced a learnable parameter T that determines, depending on the crucial feature information, whether the average or the maximum of these pixels is used. This strategy can reduce the drawbacks of the average pooling and max pooling methods while eliminating noise.
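The idea can be sketched for a single pooling region as follows. This is one plausible reading rather than the authors' implementation: the article does not spell out the exact rule, so the threshold test used here (return the maximum when it reaches T, otherwise average the K largest values) is an assumption, as are the function name t_max_avg, the fixed (not learned) value of T, and the sample values normalized to the range 0 to 1.

```python
def t_max_avg(window, k, t):
    """Pool one region with a hypothetical threshold rule over its K largest values."""
    top_k = sorted(window, reverse=True)[:k]
    peak = top_k[0]
    # Assumed rule (not confirmed by the source): a sufficiently strong
    # peak is kept as-is, like max pooling; otherwise the K largest
    # values are averaged, like Avg-TopK pooling.
    if peak >= t:
        return peak
    return sum(top_k) / len(top_k)


# A flattened 3x3 region with made-up activations.
window = [0.1, 0.9, 0.3, 0.2, 0.6, 0.4, 0.5, 0.0, 0.7]

strong = t_max_avg(window, k=3, t=0.8)   # peak 0.9 reaches T, so the max is returned
weak = t_max_avg(window, k=3, t=0.95)    # peak is below T, so 0.9, 0.7, 0.6 are averaged
```

In a trainable layer, T would be a parameter updated during training rather than a constant argument, and the rule would be applied to every region of every feature map.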

Evaluation of the approach

Experiments were performed on three benchmark datasets (MNIST, CIFAR-10, and CIFAR-100) with the LeNet-5 CNN model to compare the proposed T-Max-Avg pooling method with existing pooling methods, namely Avg-TopK, max, and average pooling.

Researchers selected the LeNet-5 model due to its robust classification capabilities and simple structure. The LeNet-5 network comprised seven layers: three convolutional layers, two pooling layers, and two fully connected layers. The CIFAR-10 dataset, which is used extensively for computer vision (CV) tasks, contains 60,000 color images categorized into 10 distinct classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

Additionally, the CIFAR-100 dataset is an extended version of the CIFAR-10 dataset used for CV tasks. It consists of 100 fine-grained categories that cover various everyday items, animals, and objects, with 600 images per category for a total of 60,000 images.
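To illustrate how the three convolutional and two pooling layers shrink a 32x32 CIFAR-10 image, the sketch below traces the spatial size through the classic LeNet-5 layout (5x5 convolutions and 2x2 pooling with stride 2). These kernel and pool sizes are the textbook values and are an assumption here; the study's own configuration, which reports pool sizes of three and four, would give different numbers.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded ('valid') convolution or pooling step."""
    return (size - kernel) // stride + 1


size = 32  # CIFAR-10 images are 32x32 pixels
trace = []
# (name, kernel, stride) for the classic LeNet-5 feature extractor;
# the layer names are illustrative labels, not taken from the study.
for name, kernel, stride in [
    ("conv1", 5, 1),
    ("pool1", 2, 2),
    ("conv2", 5, 1),
    ("pool2", 2, 2),
    ("conv3", 5, 1),
]:
    size = conv_out(size, kernel, stride)
    trace.append((name, size))

# trace == [("conv1", 28), ("pool1", 14), ("conv2", 10), ("pool2", 5), ("conv3", 1)]
```

The two fully connected layers then operate on the flattened output of the last convolutional layer, which completes the seven-layer structure.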

The MNIST dataset is a handwritten digit recognition dataset utilized extensively in machine learning (ML) and CV research and education. This dataset contains samples representing digits from 0 to 9, with every sample being a 28x28 pixel grayscale image.

Moreover, the effectiveness of the T-Max-Avg pooling technique in transfer learning models, including ChestX, ResNet50, and VGG19, was investigated through a series of experiments on the CIFAR-10 dataset. Researchers used Google Colab for the experiments with the LeNet-5 model and a device with an NVIDIA GeForce GTX 1050 GPU for the extended experiments.

Significance of the study

Experiments performed using the LeNet-5 CNN model on the three datasets demonstrated that max pooling was more effective than average pooling. However, the proposed T-Max-Avg method outperformed both average pooling and max pooling and also improved on the accuracy of the Avg-TopK pooling method.

The T-Max-Avg pooling method achieved the highest accuracy among all pooling methods on CIFAR-10, CIFAR-100, and MNIST datasets. In the proposed method, selecting a T value of 0.7, K value of six, and pool size of four yielded the highest score for color images, and selecting a T value of 0.8, K value of three, and pool size of three resulted in the highest score for grayscale images.

These results indicated that the T-Max-Avg method can capture feature information more accurately and provide better results during the model training process. Moreover, when applied to the ChestX and ResNet50 transfer learning models, the T-Max-Avg method attained better results than conventional pooling methods.

To summarize, the findings of this study demonstrated that the proposed method could effectively address the limitations of conventional pooling methods and offer an alternative option for the expansion and development of existing methods in the field.

Written by

Samudrapom Dam

Samudrapom Dam is a freelance scientific and business writer based in Kolkata, India. He has been writing articles related to business and scientific topics for more than one and a half years. He has extensive experience in writing about advanced technologies, information technology, machinery, metals and metal products, clean technologies, finance and banking, automotive, household products, and the aerospace industry. He is passionate about the latest developments in advanced technologies, the ways these developments can be implemented in a real-world situation, and how these developments can positively impact common people.

Citation

Dam, Samudrapom. (2024, January 26). T-Max-Avg Pooling for CNNs Unleashes Adaptive Feature Extraction. AZoAi. Retrieved on November 21, 2024 from https://www.azoai.com/news/20240126/T-Max-Avg-Pooling-for-CNNs-Unleashes-Adaptive-Feature-Extraction.aspx.
