Researchers have developed a cutting-edge framework to predict how AI identification methods perform at scale, paving the way for safer, more ethical technology applications.
Research: A scaling law to model the effectiveness of identification techniques.
AI tools are increasingly being used to track and monitor us both online and in person, yet their effectiveness comes with significant risks. Computer scientists at the Oxford Internet Institute, Imperial College London, and UCLouvain have developed a new mathematical model based on the Pitman-Yor process, a Bayesian statistical framework, that could help people better understand the dangers posed by AI and assist regulators in protecting people's privacy. The findings have been published in the journal Nature Communications.
A Breakthrough in Evaluating Identification Techniques
For the first time, the method provides a robust scientific framework for evaluating 'exact,' 'sparse,' and 'robust' identification techniques, especially when dealing with large-scale data. This could include, for instance, measuring how accurately advertising code and invisible trackers identify online users from small pieces of information such as time zones or browser settings (a technique called 'browser fingerprinting').
The method draws on entropy and tail complexity, two key parameters of the Pitman-Yor process, to learn how identifiable individuals are on a small scale and then extrapolate identification accuracy to larger populations. It does so up to 10 times better than previous heuristics and rules of thumb. This gives the method unique power in assessing how different identification techniques will perform at scale across various applications and behavioral settings. It could also help explain why some AI identification techniques perform highly accurately when tested in small case studies but then misidentify people in real-world conditions.
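To give a feel for why accuracy measured on a small sample need not carry over to a large population, the sketch below uses the textbook Pitman-Yor process rather than the researchers' own model. It assumes each person's record is drawn from a Pitman-Yor process with the standard discount and concentration parameters (named `d` and `theta` here; both names and the example values are illustrative assumptions, and no mapping to the entropy and tail complexity parameters described above is attempted). Under that assumption, the expected share of people whose record is unique in a population of size n has a closed form, (theta + d)_(n-1) / (theta + 1)_(n-1), where (x)_m is the rising factorial.

```python
import math


def expected_unique_fraction(n: int, d: float, theta: float) -> float:
    """Expected share of n people whose record is unique, assuming records
    follow a Pitman-Yor process with discount d and concentration theta.

    Illustrative only: this is the standard Pitman-Yor singleton formula,
    not the scaling law from the published paper.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not (0.0 <= d < 1.0) or theta <= -d:
        raise ValueError("need 0 <= d < 1 and theta > -d")
    # Ratio of rising factorials, computed with log-gamma to avoid overflow:
    # (theta + d)_(n-1) / (theta + 1)_(n-1)
    log_fraction = (
        math.lgamma(theta + d + n - 1) - math.lgamma(theta + d)
        - math.lgamma(theta + n) + math.lgamma(theta + 1)
    )
    return math.exp(log_fraction)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the trend.
    for population in (1_000, 100_000, 10_000_000):
        share = expected_unique_fraction(population, d=0.8, theta=50.0)
        print(f"population {population:>10,}: expected unique share {share:.3f}")
```

In this toy setting the share of uniquely identifiable people shrinks as the population grows, and how quickly it shrinks depends on the two parameters. That is the general reason a technique can look highly accurate in a small case study yet perform differently at scale.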
Real-World Applications and Ethical Considerations
Lead author Dr. Luc Rocher, Senior Research Fellow, Oxford Internet Institute, part of the University of Oxford, said: "We see our method as a new approach to help assess the risk of re-identification in data release, but also to evaluate modern identification techniques in critical, high-risk environments. In places like hospitals, humanitarian aid delivery, or border control, the stakes are incredibly high, and the need for accurate, reliable identification is paramount."
The findings are timely, given the challenges to anonymity and privacy posed by the rapid rise of AI-based identification techniques. For instance, AI tools are being trialed to automatically identify humans from their voices in online banking, their eyes in humanitarian aid delivery, or their faces in law enforcement. However, the research also highlights that a high correctness score in testing may not always guarantee reliable performance at scale, especially for underrepresented groups or minorities.
Supporting Data Protection and Privacy Legislation
According to the researchers, the new method could help organizations better balance the benefits of AI technologies with the need to protect people's personal information, making daily interactions with technology safer and more secure. By incorporating compliance checks for global data protection laws, including GDPR, their testing method allows for the identification of potential weaknesses and areas for improvement before full-scale implementation, which is essential for maintaining safety and accuracy.
Co-author Associate Professor Yves-Alexandre de Montjoye (Data Science Institute, Imperial College, London) said: "Our new scaling law provides, for the first time, a principled mathematical model to evaluate how identification techniques will perform at scale. Understanding the scalability of identification is essential to evaluate the risks posed by these re-identification techniques, especially in high-stakes environments where misidentifications could have critical consequences, including ensuring compliance with modern data protection legislations worldwide."
Dr. Luc Rocher concluded: "We believe that this work forms a crucial step towards the development of principled methods to evaluate the risks posed by ever more advanced AI techniques and the nature of identifiability in human traces online. We expect that this work will be of great help to researchers, data protection officers, ethics committees, and other practitioners aiming to find a balance between sharing data for research and protecting the privacy of patients, participants, and citizens."