Scientists reveal how a hidden simplicity bias helps AI systems excel at solving complex problems, mirroring principles found in nature and evolution.
Research: Deep neural networks have an inbuilt Occam’s razor. Image Credit: nobeastsofierce / Shutterstock
A new study from Oxford University has uncovered why the deep neural networks (DNNs) that power modern artificial intelligence are so effective at learning from data. The findings demonstrate that DNNs have an inbuilt 'Occam's razor': when presented with multiple solutions that fit the training data, they tend to favor the simpler ones. This preference arises in a Bayesian framework, where the network architecture determines a prior over functions that favors simplicity. What is special about this version of Occam's razor is that the bias exactly cancels the exponential growth in the number of possible solutions as their complexity increases. The study has been published in the journal Nature Communications.
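To make that framing concrete, the idea can be sketched in standard Bayesian notation (the notation here is ours, chosen for illustration, not taken from the paper): randomly drawing a network's parameters induces a prior over the functions the architecture can express, and training data then reweights that prior.

```latex
% Prior over functions induced by randomly drawn parameters \theta \sim p(\theta),
% where f_\theta denotes the function the network computes with parameters \theta:
P(f) = \int \mathbf{1}\left[ f_\theta = f \right] \, p(\theta) \, \mathrm{d}\theta
% Conditioning on the training data D reweights this prior:
P(f \mid D) \propto P(D \mid f) \, P(f)
```

The study's claim, stated in these terms, is that P(f) places exponentially more mass on simpler functions.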
The researchers hypothesized that DNNs would need a kind of 'built-in guidance' to help them choose the right patterns to focus on in order to make good predictions on new, unseen data—even when there are millions or even billions more parameters than training data points.
"Whilst we knew that the effectiveness of DNNs relies on some form of inductive bias towards simplicity – a kind of Occam's razor – there are many versions of the razor. The precise nature of the razor used by DNNs remained elusive," said theoretical physicist Professor Ard Louis (Department of Physics, Oxford University), who led the study.
To uncover DNNs' guiding principle, the authors investigated how they learn Boolean functions—fundamental rules in computing where a result can only have one of two possible values: true or false. The Bayesian analysis used by the researchers enabled them to quantify the prior probability of a function, showing that simpler functions are exponentially more likely under this framework. They discovered that even though DNNs can technically fit any function to data, they have a built-in preference for simpler functions that are easier to describe. This means DNNs are naturally biased towards simple rules over complex ones.
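The flavor of this measurement can be sketched in a few lines of code (our illustrative construction, not the authors' code): sample many randomly initialized small networks, record the Boolean function each one computes as a truth table over all 2^n inputs, and compare each function's sampling frequency, an estimate of its prior probability P(f), against a crude complexity measure.

```python
# Illustrative sketch (not the authors' code): estimate the prior P(f) over
# Boolean functions induced by randomly initializing a small network.
import itertools
import collections
import numpy as np

rng = np.random.default_rng(0)
n = 5                                                 # number of Boolean inputs
X = np.array(list(itertools.product([0.0, 1.0], repeat=n)))  # all 2^n inputs

def random_truth_table(width=64):
    """Truth table of one randomly initialized two-layer ReLU network."""
    W1 = rng.standard_normal((n, width))
    b1 = rng.standard_normal(width)
    W2 = rng.standard_normal(width)
    b2 = rng.standard_normal()
    h = np.maximum(0.0, X @ W1 + b1)                  # hidden ReLU layer
    out = h @ W2 + b2                                 # one scalar per input row
    return "".join("1" if v > 0 else "0" for v in out)

def lz78_phrases(s):
    """LZ78-style phrase count: a crude stand-in for Lempel-Ziv complexity."""
    seen, phrase, count = set(), "", 0
    for ch in s:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)

# Each parameter sample "draws" one function from the prior; tally frequencies.
counts = collections.Counter(random_truth_table() for _ in range(20_000))
for table, c in counts.most_common(8):
    print(f"P(f) ~ {c / 20_000:.4f}   complexity ~ {lz78_phrases(table):2d}   {table}")
```

In runs of this kind, the most frequently sampled truth tables tend to be the highly compressible ones (constant functions, for example), consistent with the exponential simplicity bias the authors report.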
Furthermore, the authors discovered that this inherent Occam's razor has a unique property: it exactly counteracts the exponential increase in the number of complex functions as the system size grows. This allows DNNs to identify the rare, simple functions that generalize well (making accurate predictions on both the training data and unseen data), while avoiding the vast majority of complex functions that fit the training data but perform poorly on unseen data.
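Schematically, the cancellation can be written as follows (illustrative notation and constants, not the paper's exact statement):

```latex
% The number of Boolean functions of descriptional complexity K grows exponentially:
N(K) \sim 2^{K}
% while the prior on any single function of complexity K(f) decays exponentially:
P(f) \lesssim 2^{-a\,K(f) + b} \qquad (a,\, b \text{ constants})
% so the total prior mass per complexity level is roughly flat when a \approx 1:
N(K)\, 2^{-aK + b} = 2^{(1-a)K + b}
```

When the decay rate matches the growth rate, no complexity level dominates a priori, so each individual simple function carries exponentially more weight than any individual complex one, which is what lets the network pick out the rare simple functions that generalize.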
This emergent principle helps DNNs do well when the data follows simple patterns. However, when the data is more complex and does not fit simple patterns, DNNs do not perform as well, sometimes no better than random guessing. Quantitative experiments on Boolean functions showed that this drop in performance was particularly pronounced when the inductive bias towards simplicity was weakened by changes to the network's activation functions or initialization parameters. Fortunately, real-world data is often relatively simple and structured, which matches the DNNs' preference for simplicity and helps them avoid overfitting (where the model becomes too closely tuned to its training data).
To delve deeper into the nature of this razor, the team investigated how the network's performance changed when its learning process was altered by modifying the activation functions, the mathematical functions that decide whether a neuron should 'fire' or not. They found that weakening the simplicity bias reduced the DNNs' ability to generalize on structured data, while a stronger bias kept the network's preferences better aligned with the structure of the data.
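As a hedged sketch of this kind of intervention (our assumed setup: a scikit-learn MLP, a deliberately simple Boolean target, and a small training sample; not the paper's protocol), one can swap activation functions and compare accuracy on unseen inputs:

```python
# Hedged illustration, not the paper's experiment: swap activation functions
# and compare generalization on held-out rows of a simple Boolean target.
import itertools
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 7
X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
y = X[:, 0].astype(int)           # a deliberately simple target: the first bit

train = rng.choice(len(X), size=32, replace=False)     # few labelled rows
test = np.setdiff1d(np.arange(len(X)), train)          # the unseen rows

for act in ["relu", "tanh", "logistic"]:               # candidate activations
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), activation=act,
                        max_iter=2000, random_state=0)
    clf.fit(X[train], y[train])
    print(f"{act:8s} accuracy on unseen inputs: {clf.score(X[test], y[test]):.3f}")
```

On a target this simple, all three activations may generalize well; the paper's point is that the strength of the simplicity bias, which depends on such architectural choices, controls how reliably that happens.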
Even though these modified DNNs still favored simple solutions, slight adjustments to this preference significantly reduced their ability to generalize (that is, to make accurate predictions) on simple Boolean functions. Similar patterns were observed in larger systems, such as networks trained on the MNIST and CIFAR-10 image datasets, suggesting that the principle applies well beyond simple Boolean models. The same degradation occurred in other learning tasks, demonstrating that having the correct form of Occam's razor is crucial for a network to learn effectively.
The new findings help to 'open the black box' of how DNNs arrive at their conclusions, an opacity that currently makes it difficult to explain or challenge AI systems' decisions. However, while these findings apply to DNNs in general, they do not fully explain why some specific DNN models work better than others on certain types of data.
Christopher Mingard (Department of Physics, Oxford University), co-lead author of the study, said: "This suggests that we need to look beyond simplicity to identify additional inductive biases driving these performance differences." Future research may explore how optimization methods like stochastic gradient descent (SGD) introduce new biases that complement the initial simplicity bias.
According to the researchers, the findings suggest a strong parallel between artificial intelligence and fundamental principles of nature. For example, the simplicity bias in DNNs mirrors evolutionary pressures toward symmetry in natural systems like protein complexes. Indeed, DNNs' remarkable success on a broad range of scientific problems indicates that this exponential inductive bias must mirror something deep about the structure of the natural world.
"Our findings open up exciting possibilities," said Professor Louis. "The bias we observe in DNNs has the same functional form as the simplicity bias in evolutionary systems that helps explain, for example, the prevalence of symmetry in protein complexes. This points to intriguing connections between learning and evolution, a connection ripe for further exploration."
Source: University of Oxford

Journal reference: Mingard, C., et al. (2025). Deep neural networks have an inbuilt Occam's razor. Nature Communications.