Forget flattening. This new AI layer respects your data’s structure — and it's changing the rules of deep learning from the inside out.

A new paper by researchers at Ensemble AI, published on the arXiv preprint* server under the title “NdLinear Is All You Need for Representation Learning,” introduces a powerful new approach to linear layers in neural networks. At the heart of every deep learning model — whether it’s recognizing faces, translating languages, or forecasting weather — lies a simple operation called a linear layer. This layer multiplies inputs by weights, mixes them, and passes the result forward through the network.
The problem? To perform this mixing, traditional models flatten input data into a 1D vector, destroying the natural hierarchy of the information.
Take a color image as an example. It’s composed of height, width, and three color channels (RGB). Flattening it ignores this spatial layout, potentially scattering nearby pixels far apart in the input. The model then has to work harder to learn that these pixels are actually neighbors.
NdLinear flips this script by processing each dimension sequentially rather than all at once. It treats height, width, and color as unique axes and applies transformations to each one in sequence, respecting their structure throughout.
This results in models that are not only more accurate but also more efficient, easier to train, and fully compatible with existing architectures such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and multilayer perceptrons (MLPs).
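To make the mechanics concrete, here is a minimal PyTorch-style sketch of the per-axis idea, built only from transpose operations and ordinary linear maps. The class name, shapes, and hidden sizes are illustrative assumptions for this article, not the authors' actual implementation, which is available in the GitHub repository linked at the end.

```python
# A minimal sketch of the per-axis idea behind NdLinear (illustrative only;
# see the authors' repository for the real implementation).
import torch
import torch.nn as nn


class PerAxisLinear(nn.Module):
    """Applies an independent linear map along each non-batch axis in turn."""

    def __init__(self, in_dims, out_dims):
        super().__init__()
        # One small weight matrix per axis instead of one giant flattened one.
        self.maps = nn.ModuleList(
            [nn.Linear(d_in, d_out) for d_in, d_out in zip(in_dims, out_dims)]
        )

    def forward(self, x):  # x: (batch, d1, d2, ..., dn)
        for axis, linear in enumerate(self.maps, start=1):
            x = x.transpose(axis, -1)   # bring the target axis last
            x = linear(x)               # mix only along that axis
            x = x.transpose(axis, -1)   # restore the original layout
        return x


# Example: a batch of 8 RGB images, 32x32, transformed axis by axis.
x = torch.randn(8, 32, 32, 3)                    # (batch, H, W, C)
layer = PerAxisLinear((32, 32, 3), (16, 16, 8))  # reshape each axis separately
print(layer(x).shape)                            # torch.Size([8, 16, 16, 8])
```

Because each weight matrix only ever sees one axis, neighboring pixels stay neighbors throughout the transformation rather than being scattered by flattening.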
Big Gains, Fewer Parameters
One of the most surprising outcomes of using NdLinear is how much smaller the resulting models are — without sacrificing accuracy.
In a standard image classification task using the popular CIFAR-100 dataset, researchers replaced the final dense layer in a convolutional network with an NdLinear layer. The result was not only higher accuracy but also over 60% fewer parameters. Similar improvements were observed across a range of tasks, including text classification using datasets such as SST-2 and CoLA, time-series forecasting using benchmarks like ETTm1 and ETTh1, and model distillation in large-scale vision transformers.
The key lies in multi-axis factorization: instead of learning one giant weight matrix, NdLinear breaks the task into multiple smaller transformations, each targeting a single axis. This technique resembles Tucker decomposition, which is used in physics and statistics to simplify complex systems.
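To get a feel for where the savings come from, consider a toy calculation. The dimensions below are illustrative assumptions, not figures from the paper: a dense layer mapping a flattened 32×32×3 input to a flattened 16×16×8 output needs millions of weights, while three per-axis maps need only about a thousand.

```python
# Back-of-the-envelope comparison of a flattened dense layer versus
# per-axis factorization. The dimensions are illustrative, not the paper's.
in_dims, out_dims = (32, 32, 3), (16, 16, 8)

# One giant weight matrix over the flattened input, plus a bias.
flat_params = (32 * 32 * 3) * (16 * 16 * 8) + (16 * 16 * 8)

# One small weight matrix (plus bias) per axis.
axis_params = sum(d_in * d_out + d_out for d_in, d_out in zip(in_dims, out_dims))

print(flat_params)  # 6293504
print(axis_params)  # 1088
```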
Plug and Play: Works with Transformers, CNNs, RNNs and More
NdLinear isn't just powerful — it's incredibly versatile.
It can replace any linear layer in a modern deep-learning model. In transformer architectures, including those used in language models like GPT, it reduced the number of parameters while improving performance on language-understanding tasks. In CNNs, it preserved spatial patterns more effectively than standard dense layers. In RNNs, it made it easier to process structured inputs at each timestep, such as video frames or multi-channel sensor data. In MLPs, it allowed deep networks to handle grid-like inputs directly, even without convolutional layers.
In one standout experiment involving a BERT model, replacing the linear layers in its classification head with NdLinear improved accuracy on text classification benchmarks while cutting the head's parameter count by more than a factor of seven.
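As a hedged illustration of what such a swap might look like (the 32×24 grid and the layer sizes here are assumptions for this article, not the authors' exact head), a flat 768-dimensional hidden state can be viewed as a small grid and mixed one axis at a time before producing class logits:

```python
# A sketch of a factorized classification head. The 32 x 24 grid and the
# hidden sizes are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn


class FactorizedHead(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.col_mix = nn.Linear(24, 8)        # mixes along the second axis
        self.row_mix = nn.Linear(32, 8)        # mixes along the first axis
        self.out = nn.Linear(8 * 8, num_classes)

    def forward(self, pooled):                 # pooled: (batch, 768)
        x = pooled.view(-1, 32, 24)            # impose a 32 x 24 structure
        x = self.col_mix(x)                    # (batch, 32, 8)
        x = self.row_mix(x.transpose(1, 2))    # (batch, 8, 8)
        return self.out(x.flatten(1))          # (batch, num_classes)


logits = FactorizedHead()(torch.randn(4, 768))
print(logits.shape)                            # torch.Size([4, 2])
```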
Efficient Enough for Edge Devices
As AI expands beyond data centers to phones, cameras, and even IoT sensors, efficiency is becoming critical. The fewer parameters a model has, the less memory and energy it consumes.
Because NdLinear significantly reduces model size and computation without degrading performance, it opens the door to powerful deep-learning models that can run on the edge, where resources are limited.
Unlike some efficiency methods that rely on specialized hardware or complex training procedures, NdLinear is built using standard operations like transpose, reshape, and matrix multiplication, all of which are supported by major machine learning frameworks such as PyTorch and TensorFlow.
Better Generalization Through Structural Bias
One subtle but powerful benefit of NdLinear is that it introduces a useful inductive bias — it assumes that different dimensions of the data should be treated differently.
This bias acts like a helpful rule of thumb, encouraging the network to focus on the kinds of patterns that are likely to matter — like vertical lines in images or time order in sequences. Although NdLinear is slightly less expressive than a fully connected layer, this constraint acts as a form of regularization, boosting performance when the data’s structure aligns with its design.
In practice, this improves generalization to new data, reduces overfitting, and enhances robustness in real-world settings.
Ready for Large Language Models, Too
In their final experiment, the team replaced linear layers in Open Pretrained Transformer (OPT) models — relatives of GPT — with NdLinear layers. Despite having fewer parameters, these modified models achieved lower perplexity, meaning they were better at predicting the next word in a sentence.
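For readers unfamiliar with the metric, perplexity is simply the exponential of the model's average next-token loss, so a lower value means the model wastes less probability on wrong words. A toy calculation with made-up numbers:

```python
# Perplexity from an average cross-entropy loss (toy numbers, not results
# from the paper). Lower perplexity means better next-word prediction.
import math

avg_loss = 3.2                       # hypothetical mean negative log-likelihood, in nats
print(round(math.exp(avg_loss), 1))  # 24.5
```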
As the models scaled from 124 million to 350 million parameters, NdLinear's benefits became even more pronounced, with a widening performance gap.
The Bigger Picture
The NdLinear paper is part of a broader movement in machine learning: to make models more efficient, structurally aware, and capable of dealing with real-world data as it naturally appears.
Flattening inputs — a relic from early neural network design — may soon be obsolete.
“If we want models that understand space, time, and structure like humans do, we have to stop squashing everything into a line,” says co-author Jerry Yao-Chieh Hu.
By respecting data's inherent dimensionality, NdLinear offers a promising path forward—not just for AI researchers but for everyone who relies on machines to make sense of the world.
Learn more and access the code: https://github.com/ensemble-core/NdLinear

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Journal reference:
- Preliminary scientific report.
Reneau, A., Hu, J. Y., Zhuang, Z., & Liu, T. (2025). NdLinear Is All You Need for Representation Learning. arXiv. https://arxiv.org/abs/2503.17353