Analog In-Memory Computing: A Breakthrough for Efficient AI Processing

Artificial intelligence (AI) models with large numbers of parameters have demonstrated remarkable accuracy but are challenging to run energy-efficiently on conventional processors. Analog in-memory computing (analog-AI) emerges as a solution by enabling energy-efficient parallel matrix computations. In a recent publication in the journal Nature, researchers presented a chip comprising 35 million memory devices distributed across 34 tiles, achieving a performance of up to 12.4 tera-operations per second per watt (TOPS/W).

Study: Analog In-Memory Computing: A Breakthrough for Efficient AI Processing. Image credit: NicoElNino/Shutterstock

Background

Over the last decade, AI techniques have found applications in diverse domains, encompassing tasks such as image recognition, speech transcription, and text generation. These advancements hinge on ever-expanding deep neural networks (DNNs) with increasing numbers of parameters. Models such as transformers and recurrent neural-network transducers (RNNTs) with billions of parameters have notably improved word error rates (WERs) for speech transcription on the Librispeech and Switchboard datasets.

However, hardware progress has lagged, resulting in prolonged training and inference times and higher energy consumption. Analog in-memory computing (analog-AI) addresses this gap by leveraging non-volatile memory arrays to perform computation directly in memory, which is particularly promising for large DNNs with fully connected layers. An experimental chip combining phase-change memory arrays with analog peripheral components demonstrates accurate and energy-efficient natural language processing (NLP) inference, even for substantial models such as RNNTs. By eliminating much of the data movement between memory and compute units, this approach offers a potential leap in performance.
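To make the underlying principle concrete, the following minimal sketch shows how a conductance array computes a matrix-vector product in a single parallel analog step. The array shape mirrors one tile's crossbar, but the conductance and voltage ranges are illustrative assumptions, not the chip's specifications.

```python
import numpy as np

# Minimal sketch of analog in-memory matrix-vector multiplication.
# Weights live in the array as conductances G (siemens); inputs are
# applied as voltages V. Ohm's law gives per-device currents G * V,
# and Kirchhoff's current law sums them along each column, so the
# column currents equal the matrix-vector product -- computed in one
# parallel analog step, with no weight movement.

rng = np.random.default_rng(0)

n_rows, n_cols = 512, 2048                          # one tile's crossbar shape
G = rng.uniform(0.0, 25e-6, size=(n_rows, n_cols))  # conductances (S), assumed range
V = rng.uniform(0.0, 0.5, size=n_rows)              # input voltages (V), assumed range

I_out = G.T @ V   # column currents: the entire MAC happens "in memory"
print(I_out.shape)                                  # (2048,)
```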

Chip architecture

The chip's architecture features a grid of 34 analog tiles, each housing a 512 × 2,048 phase-change memory (PCM) crossbar array. These tiles are organized into six power domains, labeled north, center, or south and further categorized as east or west. Within each power domain, an input landing pad (ILP) and an output landing pad (OLP) are connected to sizable static random-access memory (SRAM). The ILP receives digital input vectors (each with 8-bit unsigned integer (UINT8) entries) from external sources and converts them into pulse-width-modulated (PWM) durations transmitted via parallel wires on the 2D mesh. Conversely, the OLP acquires PWM durations and reverses the process, converting them back into UINT8 values for transport off the chip.
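The landing-pad conversion can be pictured with a short sketch. The helper names and the one-tick-per-code granularity are hypothetical illustrations, since the real ILP and OLP implement this in hardware.

```python
# Sketch of the ILP/OLP conversion between UINT8 values and
# pulse-width-modulated (PWM) durations. The names and the 1-ns tick
# are illustrative assumptions, not the chip's actual timing.

TICK_NS = 1.0  # hypothetical duration of one clock tick, in nanoseconds

def uint8_to_pwm(x: int) -> float:
    """ILP direction: encode an 8-bit value as a pulse width (ns)."""
    if not 0 <= x <= 255:
        raise ValueError("UINT8 input must lie in [0, 255]")
    return x * TICK_NS

def pwm_to_uint8(width_ns: float) -> int:
    """OLP direction: recover the 8-bit value by counting ticks."""
    return min(255, max(0, round(width_ns / TICK_NS)))

assert pwm_to_uint8(uint8_to_pwm(200)) == 200  # round trip is lossless
```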

Communication between analog tiles occurs directly as durations, avoiding analog-to-digital conversion at the tile periphery. PCM devices encode analog conductance states by adjusting the ratio of crystalline to amorphous material. Variable PCM configurations enable flexible weight encoding. Local controllers on each tile configure weight programming, multiply-accumulate (MAC) operations, and routing schemes within the 512 × 512 wire mesh. Complex routing patterns are managed by 'border guard' circuits and tri-state buffers.
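The 'flexible weight encoding' can be illustrated with the differential scheme common in PCM-based analog AI, in which a signed weight is stored as the difference between two non-negative conductances. The mapping below is a sketch under an assumed maximum conductance, not the chip's calibrated programming procedure.

```python
import numpy as np

# Sketch of differential weight encoding: PCM conductances are
# non-negative, so a signed weight w is stored on a device pair as
# w ~ (G_plus - G_minus). G_MAX and the linear mapping are assumed
# values for illustration only.

G_MAX = 25e-6  # assumed maximum programmable conductance (S)

def encode(w: np.ndarray, w_max: float):
    """Map weights in [-w_max, w_max] onto a (G_plus, G_minus) pair."""
    g = np.clip(w / w_max, -1.0, 1.0) * G_MAX
    return np.maximum(g, 0.0), np.maximum(-g, 0.0)

def decode(g_plus: np.ndarray, g_minus: np.ndarray, w_max: float):
    """Read the signed weight back from the conductance pair."""
    return (g_plus - g_minus) / G_MAX * w_max

w = np.array([-0.8, 0.0, 0.3])
gp, gm = encode(w, w_max=1.0)
print(decode(gp, gm, w_max=1.0))  # [-0.8  0.   0.3]
```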

From keyword spotting to speech-to-text transcription

The chip's performance was first demonstrated on a multi-class keyword spotting (KWS) task. While the MLPerf reference model for KWS typically employs a convolutional neural network, the researchers chose a fully connected (FC) network architecture: although the convolutional model achieves higher classification accuracy, the FC model offers a simpler architecture and faster execution. Both network variants require upstream digital preprocessing to prepare incoming audio waveforms for input.

To execute an end-to-end implementation on the chip, the researchers adjusted the audio-spectrum preprocessing to generate 1,960 inputs and expanded the hidden layers to 512 units per tile. To counter the network's sensitivity to analog noise, they incorporated weight and activation noise injection, weight clipping, L2 regularization, and bias removal into training, as sketched below. A pruned version of this network, sized to the chip's capacity, was adopted for implementation. For KWS, the researchers employed four tiles in total: two for the first weight layer and two for the subsequent two layers.
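A minimal sketch of this hardware-aware training idea for a single FC layer follows. The noise scales and clipping range are placeholder assumptions rather than the study's tuned values; only the layer sizes (1,960 inputs, 512 hidden units) come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_forward(x, W, w_noise=0.05, a_noise=0.02, w_clip=1.0):
    """Forward pass with injected analog-style noise (placeholder scales)."""
    W_eff = np.clip(W, -w_clip, w_clip)                # weight clipping
    W_eff = W_eff + rng.normal(0.0, w_noise * np.abs(W_eff).max(), W_eff.shape)
    a = x @ W_eff                                      # MAC with noisy weights
    a = a + rng.normal(0.0, a_noise * np.abs(a).max(), a.shape)  # activation noise
    return np.maximum(a, 0.0)                          # ReLU

x = rng.normal(size=(8, 1960))    # batch of preprocessed audio features
W = rng.normal(size=(1960, 512))  # first FC layer (sizes from the text)
print(noisy_forward(x, W).shape)  # (8, 512)
```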

To enhance accuracy and compensate for asymmetries in the peripheral circuits, the researchers introduced the MAC asymmetry balance (AB) method, which cancels out circuit asymmetries to ensure accurate computation. Each audio frame took 2.4 microseconds to process, significantly faster than the best-case latency reported by MLPerf. The KWS implementation achieved an accuracy of 86.14%, exceeding MLPerf's software-equivalent accuracy threshold of 85.88%.
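The core idea behind asymmetry balancing, cancelling a systematic, polarity-dependent error by combining two passes of opposite polarity, can be sketched as follows. The error model here is an illustrative assumption, not the paper's circuit-level description.

```python
import numpy as np

def analog_mac(x, w, asym=0.03):
    """Toy analog MAC with a polarity-dependent systematic error."""
    y = x @ w
    return y + asym * np.abs(y)     # illustrative asymmetry model

def balanced_mac(x, w):
    """AB idea: two passes of opposite polarity cancel the asymmetry."""
    up = analog_mac(x, w)           # normal polarity
    down = analog_mac(-x, w)        # inputs sign-flipped
    return 0.5 * (up - down)        # systematic term cancels exactly

rng = np.random.default_rng(2)
x, w = rng.normal(size=8), rng.normal(size=8)
print(abs(analog_mac(x, w) - x @ w))    # nonzero systematic error
print(abs(balanced_mac(x, w) - x @ w))  # ~0: asymmetry cancelled
```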

For a more complex task, the chip demonstrated speech-to-text transcription using the RNNT model. The network's components were mapped onto the chip, and although digital preprocessing remained essential, the chip adeptly handled the vector-vector products and activation functions. This capability extended to multiple chips, with one chip's output feeding into the next. Remarkably, the implementation remained resilient even after more than a week of PCM conductance drift, which caused a mere 0.4% increase in the RNNT's WER.

Analyzing power consumption and efficiency

An analysis of power consumption highlighted the dominant contribution of the 1.5-volt (V) and 0.8 V power supplies. Sustained TOPS/W values were recorded, with chip four demonstrating the highest performance. An evaluation of overall system energy efficiency revealed that incorporating the necessary digital computation on-chip would yield similar energy efficiencies, and that combined analog-digital processing remains significantly more efficient than purely digital processing.
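For readers unfamiliar with the metric, TOPS/W is simply operations per second divided by power. The worked example below uses assumed timing and power figures purely to demonstrate the arithmetic, not the paper's measurement conditions.

```python
# Worked example of the TOPS/W metric: tera-operations per second per
# watt = (operations per second) / (power in watts) / 1e12, counting a
# MAC as two operations (multiply + add). All numbers below are
# illustrative assumptions.

ops_per_mac = 2
macs_per_tile = 512 * 2048          # one full crossbar integration
tiles = 34
t_integration_s = 2.4e-6            # assumed time per parallel MAC pass
power_w = 10.0                      # hypothetical total chip power

ops_per_s = ops_per_mac * macs_per_tile * tiles / t_integration_s
print(f"{ops_per_s / 1e12:.1f} TOPS, {ops_per_s / 1e12 / power_w:.2f} TOPS/W")
```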

Conclusion

In summary, the researchers showcased commercially relevant applications on analog-AI chips, focusing on speech recognition and transcription in the field of NLP. Using an all-analog setup with the novel AB technique, the 14-nm analog inference chip demonstrates software-equivalent accuracy in end-to-end KWS. The study extends to the MLPerf RNNT on Librispeech, achieving a 9.258% WER with a weight-expansion approach. This work establishes the first demonstration of commercially significant accuracy using more than 140 analog-AI tiles, together with efficient inter-tile communication of neural-network activations. The findings suggest that, combined with efficient on-chip auxiliary computation, analog-AI systems can deliver sustained energy efficiency and throughput at impressive levels.

Journal reference:
Ambrogio, S., Narayanan, P., Okazaki, A., et al. (2023). An analog-AI chip for energy-efficient speech recognition and transcription. Nature, 620, 768–775. https://doi.org/10.1038/s41586-023-06337-5

Written by Dr. Sampath Lonka


