MIT’s New AI Compiler Speeds Up Machine Learning by 30x Using Data Redundancies

A groundbreaking AI compiler from MIT, SySTeC, slashes computational costs by up to 30x by automatically optimizing deep-learning algorithms. By harnessing both sparsity and symmetry, this innovation paves the way for faster, more energy-efficient AI applications.

Research: SySTeC: A Symmetric Sparse Tensor Compiler. Image Credit: Chaosamran_Studio / Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Deep-learning models, which use neural networks in artificial intelligence applications like medical image processing and speech recognition, perform operations on hugely complex data structures requiring enormous amounts of computation. This is one reason they consume so much energy.

To improve the efficiency of AI models, MIT researchers created an automated system that enables developers of deep learning algorithms to simultaneously take advantage of two types of data redundancy. This reduces the amount of computation, bandwidth, and memory storage needed for machine learning operations.

Existing techniques for optimizing algorithms can be cumbersome and typically only allow developers to capitalize on either sparsity or symmetry, two different types of redundancy in deep learning data structures.

The MIT researchers' approach enables developers to build an algorithm from scratch that takes advantage of both redundancies simultaneously, boosting the speed of computations by nearly 30 times in some experiments.

Because the system utilizes a user-friendly programming language, it could optimize machine-learning algorithms for a wide range of applications. The system could also help scientists who are not experts in deep learning but want to improve the efficiency of AI algorithms they use to process data. In addition, the system could have applications in scientific computing.

"For a long time, capturing these data redundancies has required a lot of implementation effort. Instead, a scientist can tell our system what they would like to compute in a more abstract way, without telling the system exactly how to compute it," says Willow Ahrens, an MIT postdoc and co-author of a paper on the system, which will be presented at the International Symposium on Code Generation and Optimization.

She is joined on the paper by lead author Radha Patel '23, SM '24, and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Cutting out computation

In machine learning, data are often represented and manipulated as multidimensional arrays known as tensors. A tensor is like a matrix, a rectangular array of values arranged along two axes: rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, making it more difficult to manipulate.

Deep-learning models perform operations on tensors through repeated matrix multiplication and addition; this process is how neural networks learn complex patterns in data. The sheer volume of calculations that must be performed on these multidimensional data structures requires an enormous amount of computation and energy.
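As a rough illustration of that workload (the shapes here are toy values of our choosing, not figures from the paper), a single step of a network boils down to a matrix product and an addition:

```python
import numpy as np

# A toy "layer": multiply an input batch by a weight matrix and add a bias.
# For an m x k input and a k x n weight matrix, the product alone costs
# roughly 2*m*k*n floating-point operations.
rng = np.random.default_rng(0)
x = rng.standard_normal((512, 1024))   # activations (batch x features)
w = rng.standard_normal((1024, 1024))  # weights
b = rng.standard_normal(1024)          # bias

y = x @ w + b  # one of many such operations in a deep network
```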

However, because of how data in tensors are arranged, engineers can often boost the speed of a neural network by cutting out redundant computations.

For instance, if a tensor represents user review data from an e-commerce site, most of its values are likely zero, since not every user reviewed every product. This type of data redundancy is called sparsity. A model can save time and computation by storing and operating on only the non-zero values.
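Here is a minimal sketch of sparsity in general, using SciPy's sparse matrices rather than anything SySTeC-specific; the user and product indices are invented for illustration:

```python
import numpy as np
from scipy import sparse

# A user-by-product review matrix: almost every entry is zero because
# most users review only a handful of products.
rows = [42, 99, 7_000]       # hypothetical user indices
cols = [7, 1_234, 4_500]     # products they reviewed
vals = [5.0, 3.0, 4.0]       # star ratings
reviews = sparse.csr_matrix((vals, (rows, cols)), shape=(10_000, 5_000))

# Storage and arithmetic scale with the 3 stored non-zeros,
# not with the full 50 million entries of the dense matrix.
print(reviews.nnz)           # 3

# Operations such as matrix-vector products touch only the non-zeros.
weights = np.ones(5_000)
scores = reviews @ weights
```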

In addition, sometimes a tensor is symmetric, meaning one half of the data structure mirrors the other. In this case, the model only needs to operate on one half, reducing the amount of computation. This type of data redundancy is called symmetry.
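As a sketch of the same idea in NumPy (again purely illustrative, not SySTeC's internal representation), a symmetric matrix can be stored as one triangle and rebuilt on demand:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
s = a + a.T                      # symmetric: s[i, j] == s[j, i]

# Keep only the upper triangle; the lower half is redundant.
upper = np.triu(s)

# The full matrix can always be rebuilt from one half, so an algorithm
# can, in principle, store and read roughly half the data.
rebuilt = upper + np.triu(s, k=1).T
assert np.allclose(rebuilt, s)
```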

"But when you try to capture both of these optimizations, the situation becomes quite complex," Ahrens says.

To simplify the process, she and her collaborators built SySTeC, a new compiler: a computer program that translates complex code into a simpler form that a machine can process. SySTeC optimizes computations by automatically exploiting both sparsity and symmetry in tensors.

They began building SySTeC by identifying three key optimizations the compiler can perform using symmetry.

First, if the algorithm's output tensor is symmetric, then it only needs to compute one-half of it. Second, if the input tensor is symmetric, the algorithm only needs to read half of it. Finally, if the intermediate results of tensor operations are symmetric, the algorithm can skip redundant computations.
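The first of these can be sketched by hand. The Gram matrix AᵀA is always symmetric, so only its upper triangle needs to be computed; the lower triangle is mirrored. This toy Python version is our illustration of the idea, not code generated by SySTeC:

```python
import numpy as np

def gram_upper(a: np.ndarray) -> np.ndarray:
    """Compute G = A.T @ A, exploiting the fact that G is symmetric:
    only entries with i <= j are computed; the rest are mirrored."""
    n = a.shape[1]
    g = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):        # roughly half the (i, j) pairs
            g[i, j] = a[:, i] @ a[:, j]
            g[j, i] = g[i, j]        # mirror instead of recompute
    return g

a = np.random.default_rng(0).standard_normal((20, 6))
assert np.allclose(gram_upper(a), a.T @ a)
```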

Simultaneous optimizations

To use SySTeC, developers input their programs, and the system automatically optimizes their code for all three types of symmetry. The second phase of SySTeC performs additional transformations to store only non-zero data values, optimizing the program for sparsity.

In the end, SySTeC generates ready-to-use code.
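To get a feel for the kind of kernel such a pipeline targets, here is a hand-written sketch that exploits both redundancies at once: a matrix-vector product where the matrix is sparse, symmetric, and stored as its upper triangle only. This is our illustration, not SySTeC output:

```python
import numpy as np
from scipy import sparse

def sym_spmv(upper: sparse.csr_matrix, x: np.ndarray) -> np.ndarray:
    """y = S @ x, where S is symmetric and only its upper triangle
    (including the diagonal) is stored. Each stored non-zero s[i, j]
    contributes to both y[i] and y[j], so no entry is stored twice."""
    y = upper @ x                     # contributions s[i, j] * x[j]
    strict = sparse.triu(upper, k=1)  # strictly upper part
    y += strict.T @ x                 # mirrored contributions s[j, i] * x[i]
    return y

# Check against a dense reference.
rng = np.random.default_rng(0)
s = rng.standard_normal((5, 5))
s = s + s.T                           # make it symmetric
u = sparse.csr_matrix(np.triu(s))     # store the upper triangle only
x = rng.standard_normal(5)
assert np.allclose(sym_spmv(u, x), s @ x)
```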

"In this way, we get the benefits of both optimizations. And the interesting thing about symmetry is, as your tensor has more dimensions, you can get even more savings on computation," Ahrens says.

The researchers demonstrated speedups of nearly a factor of 30 with code generated automatically by SySTeC.

Because the system is automated, it could be especially useful in situations where scientists want to process data using an algorithm they are writing from scratch.

In the future, the researchers want to integrate SySTeC into existing sparse tensor compiler systems to create a seamless user interface. They would also like to use it to optimize code for more complicated programs.

This work is funded, in part, by Intel, the National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy.


Journal reference:
  • Preliminary scientific report. Patel, R., Ahrens, W., & Amarasinghe, S. (2024). SySTeC: A Symmetric Sparse Tensor Compiler. arXiv. https://arxiv.org/abs/2406.09266
