New Bayesian Framework Balances AI Accuracy and Privacy Risks

Researchers have developed a cutting-edge framework to predict how AI identification methods perform at scale, paving the way for safer, more ethical technology applications.

Research: A scaling law to model the effectiveness of identification techniques. Image Credit: vs148 / Shutterstock

AI tools are increasingly being used to track and monitor us both online and in person, yet their effectiveness comes with significant risks. Computer scientists at the Oxford Internet Institute, Imperial College London, and UCLouvain have developed a new mathematical model based on the Pitman-Yor process, a Bayesian statistical framework, that could help people better understand the dangers posed by AI and assist regulators in protecting people's privacy. The findings have been published in the journal Nature Communications.

A Breakthrough in Evaluating Identification Techniques

For the first time, the method provides a robust scientific framework for evaluating 'exact,' 'sparse,' and 'robust' identification techniques, especially when dealing with large-scale data. This could include, for instance, monitoring how accurate advertising code and invisible trackers are at identifying online users from small pieces of information such as time zones or browser settings (a technique called 'browser fingerprinting').
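
As a rough illustration of why such small signals matter, the sketch below (not from the paper; the per-attribute entropy values are assumed, order-of-magnitude guesses, and the attributes are treated as independent for simplicity) adds up the identifying information a handful of browser attributes can carry:

```python
# Hypothetical illustration of browser fingerprinting: a few low-entropy
# attributes, combined, can narrow a visitor down to a tiny share of users.
# The per-attribute entropy values are assumed estimates, and attributes are
# (unrealistically) treated as independent.
attribute_entropy_bits = {
    "time zone": 3.0,
    "browser and version": 5.0,
    "screen resolution": 4.5,
    "language settings": 2.5,
    "installed fonts": 8.0,
}

total_bits = sum(attribute_entropy_bits.values())
print(f"combined fingerprint entropy ≈ {total_bits:.1f} bits")
print(f"≈ one matching configuration per {2 ** total_bits:,.0f} browsers")
```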

The method draws on entropy and tail complexity, two key parameters of the Pitman-Yor process, to learn how identifiable individuals are at small scale and then extrapolate identification accuracy to larger populations, up to 10 times more accurately than previous heuristics and rules of thumb. This gives the method unique power in assessing how different data identification techniques will perform at scale across various applications and behavioral settings. It could help explain why some AI identification techniques achieve high accuracy in small case studies but then misidentify people in real-world conditions.
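
The mechanics of such an extrapolation can be sketched with a simple simulation. The example below is a minimal sketch, not the authors' published method: it uses the standard concentration/discount parameterization of the Pitman-Yor process, arbitrary parameter values, and the fraction of individuals with a unique profile as a crude stand-in for exact identifiability.

```python
import random

def simulate_unique_fraction(n, alpha, d, seed=0):
    """Assign n individuals to profiles via a Pitman-Yor 'Chinese restaurant'
    scheme and return the fraction whose profile is unique, a simple proxy
    for exact identifiability at population size n."""
    rng = random.Random(seed)
    counts = []                      # counts[k] = people sharing profile k
    for i in range(n):               # i people already assigned
        k_profiles = len(counts)
        # probability that individual i gets an entirely new profile
        if i == 0 or rng.random() < (alpha + d * k_profiles) / (alpha + i):
            counts.append(1)
        else:
            # otherwise join existing profile k with probability ∝ (counts[k] - d)
            total = sum(c - d for c in counts)
            r = rng.uniform(0, total)
            acc = 0.0
            for k, c in enumerate(counts):
                acc += c - d
                if r <= acc:
                    counts[k] += 1
                    break
    singletons = sum(1 for c in counts if c == 1)
    return singletons / n

# Arbitrary, assumed parameters: in practice they would be fitted to a small
# observed sample before extrapolating the curve to much larger populations.
for n in (1_000, 10_000, 50_000):
    frac = simulate_unique_fraction(n, alpha=5.0, d=0.5)
    print(f"n={n:>7,}  fraction of unique profiles ≈ {frac:.3f}")
```

Under this kind of model, the share of uniquely identifiable individuals typically falls as the population grows, which is the scale effect the authors' framework aims to quantify.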

Real-World Applications and Ethical Considerations

Lead author Dr. Luc Rocher, Senior Research Fellow, Oxford Internet Institute, part of the University of Oxford, said: "We see our method as a new approach to help assess the risk of re-identification in data release, but also to evaluate modern identification techniques in critical, high-risk environments. In places like hospitals, humanitarian aid delivery, or border control, the stakes are incredibly high, and the need for accurate, reliable identification is paramount." 

The findings are highly timely, given the challenges to anonymity and privacy posed by the rapid rise of AI-based identification techniques. For instance, AI tools are being trialed to automatically identify humans from their voices in online banking, their eyes in humanitarian aid delivery, or their faces in law enforcement. However, the research also highlights that a high correctness score in testing may not always guarantee reliable performance at scale, especially for underrepresented groups or minorities.

Supporting Data Protection and Privacy Legislation

According to the researchers, the new method could help organizations better balance the benefits of AI technologies with the need to protect people's personal information, making daily interactions with technology safer and more secure. By incorporating compliance checks for global data protection laws, including GDPR, their testing method helps identify potential weaknesses and areas for improvement before full-scale deployment, which is essential for maintaining safety and accuracy.

Co-author Associate Professor Yves-Alexandre de Montjoye (Data Science Institute, Imperial College London) said: "Our new scaling law provides, for the first time, a principled mathematical model to evaluate how identification techniques will perform at scale. Understanding the scalability of identification is essential to evaluate the risks posed by these re-identification techniques, especially in high-stakes environments where misidentifications could have critical consequences, including ensuring compliance with modern data protection legislations worldwide."

Dr. Luc Rocher concluded: "We believe that this work forms a crucial step towards the development of principled methods to evaluate the risks posed by ever more advanced AI techniques and the nature of identifiability in human traces online. We expect that this work will be of great help to researchers, data protection officers, ethics committees, and other practitioners aiming to find a balance between sharing data for research and protecting the privacy of patients, participants, and citizens."
