Quality Diversity through Human Feedback (QDHF)

In a recent submission to the arXiv* server, researchers introduced an innovative approach known as Quality Diversity through Human Feedback (QDHF). This approach leverages human feedback to derive diversity metrics, thus expanding the potential applications of quality diversity (QD) algorithms.

Study: Enhancing Quality Diversity through Human Feedback. Image credit: Peshkova / Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

Background

Foundation models, including large language models and text-to-image systems, have enabled a wide range of applications, empowering individuals to express creativity and tackle practical challenges. Reinforcement learning from human feedback (RLHF) improves the usability of these models by aligning them with human instructions and preferences. However, RLHF typically optimizes a reward model that reflects average human preferences. In contrast, the current study introduces the idea of learning diversity metrics from human feedback to guide the optimization process within QD algorithms.

QD methods, which grew out of novelty search, aim to discover a broad range of diverse, high-performing solutions. Recent research has expanded on QD by improving diversity maintenance, the search process, and optimization mechanisms. Nonetheless, these methods often rely on manually defined diversity metrics, which are difficult to specify for complex real-world tasks.

Quality Diversity with Human Feedback

QD algorithms excel at exploring diverse, high-quality solutions within a solution space. Recent research has explored automatic diversity discovery through unsupervised dimensionality reduction methods, but the metrics learned this way may not capture the notions of diversity that matter most for optimization.

Researchers, inspired by work in RLHF, introduced a novel paradigm known as Quality Diversity through Human Feedback (QDHF). In QDHF, diversity metrics are obtained from human feedback on solution similarity. This method works well in complex and abstract domains where defining numeric diversity measurements is difficult. It is also more adaptable than manually designing diversity metrics.

Characterization of Diversity Using Latent Projection: Recent research has demonstrated the use of unsupervised dimensionality reduction techniques to learn robot behavioral descriptors from raw sensory data. Within this framework, diversity characterization is viewed as a more general process. Given descriptive data containing diverse information, a latent projection transforms it into a semantically meaningful latent space. First, a feature extractor maps the raw data to a feature vector.

Following this, a dimensionality reduction function is used to project the feature vector into a compact latent representation. The latent space contains axes representing diversity metrics, with their magnitudes and directions capturing nuanced characteristics of the data. Linear projection is employed for dimensionality reduction, with parameters learned using a contrastive learning process.
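
The pipeline described above can be sketched roughly as follows. The PyTorch setup, the placeholder feature extractor, and the dimensions (a 512-dimensional feature vector projected to a 2D latent space) are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

class LatentProjection(nn.Module):
    """Linear projection from a feature vector to a compact latent space
    whose axes serve as the learned diversity metrics."""
    def __init__(self, feature_dim: int = 512, latent_dim: int = 2):
        super().__init__()
        self.proj = nn.Linear(feature_dim, latent_dim, bias=False)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.proj(features)

def extract_features(solutions: torch.Tensor) -> torch.Tensor:
    """Placeholder feature extractor; in practice this could be a pretrained
    encoder appropriate to the domain (images, robot states, and so on)."""
    return solutions.flatten(start_dim=1)

# Project a batch of raw solution descriptors into the 2D diversity space.
raw = torch.randn(8, 512)                     # 8 solutions, 512-dim descriptors
projector = LatentProjection(feature_dim=512, latent_dim=2)
latents = projector(extract_features(raw))    # shape: (8, 2)
```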

Through this process, the learned latent space is aligned with human notions of similarity and difference. A triplet loss is employed to optimize the spatial relations of latent embeddings based on human input, minimizing the distance between embeddings judged similar while maximizing the distance between dissimilar ones. Human judgment is gathered using the two-alternative forced choice (2AFC) approach, which asks which of two candidates in an input triplet is more similar to the reference, and it accommodates human, heuristic, and AI-based feedback.
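
A minimal sketch of how such triplet judgments might be used to fit the projection is shown below, assuming a PyTorch setup and synthetic stand-ins for the human-labeled triplets; the dimensions, margin, and learning rate are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# A bias-free linear layer standing in for the latent projection; the feature
# and latent dimensions are illustrative assumptions.
projector = nn.Linear(512, 2, bias=False)
triplet_loss = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(projector.parameters(), lr=1e-3)

def fit_projection_on_triplets(anchor_feats, positive_feats, negative_feats,
                               epochs=50):
    """Fit the projection so that each anchor lands closer in latent space to
    the solution judged more similar (positive) than to the one judged less
    similar (negative) in the 2AFC comparison."""
    loss = None
    for _ in range(epochs):
        optimizer.zero_grad()
        a = projector(anchor_feats)
        p = projector(positive_feats)
        n = projector(negative_feats)
        loss = triplet_loss(a, p, n)
        loss.backward()
        optimizer.step()
    return loss.item()

# Synthetic stand-in for human judgments: 100 labeled triplets of features.
anchor = torch.randn(100, 512)
positive = anchor + 0.1 * torch.randn(100, 512)   # judged similar to the anchor
negative = torch.randn(100, 512)                   # judged dissimilar
final_loss = fit_projection_on_triplets(anchor, positive, negative)
```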

QDHF: The researchers propose an implementation of QDHF that combines latent space projection with contrastive learning on human judgments. The latent space acts as the measure space, with each dimension serving as a diversity metric, so the diversity metrics in QDHF are derived from human feedback on solution similarities. Two training strategies are devised, QDHF-offline and QDHF-online, for scenarios with or without prior human judgment data, respectively. QDHF-offline assumes that human judgment data are already available and trains the latent projection before running the QD algorithm. In contrast, QDHF-online adopts an active learning strategy, fine-tuning the latent projection iteratively during the QD process by gathering human judgments on triplets of solutions. The frequency of fine-tuning decreases as the learned metrics become more robust, and each update of the latent projection consumes a portion of the human feedback budget.
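
In schematic terms, the online variant could resemble the sketch below: a MAP-Elites-style archive indexed by the learned latent coordinates, with the projection periodically refit on newly gathered triplet judgments. The archive layout, the toy mutation operator, and the collect_human_triplets and fit_projection callbacks are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def qdhf_online(init_solutions, fitness_fn, features_fn, project_fn,
                fit_projection, collect_human_triplets,
                iterations=1000, update_every=200, grid_shape=(50, 50)):
    """Schematic QDHF-online loop: a MAP-Elites-style archive indexed by
    learned diversity metrics, with the latent projection periodically
    refit on freshly collected human triplet judgments."""
    archive = {}  # maps grid cell index -> (solution, fitness)
    bins = np.array(grid_shape)

    def cell_of(solution):
        # Project to latent coordinates (assumed normalized to [-1, 1])
        # and discretize into an archive cell.
        z = np.asarray(project_fn(features_fn(solution)))
        idx = np.clip(((z + 1.0) / 2.0 * bins).astype(int), 0, bins - 1)
        return tuple(idx)

    def try_insert(solution):
        c, f = cell_of(solution), fitness_fn(solution)
        if c not in archive or f > archive[c][1]:
            archive[c] = (solution, f)

    for s in init_solutions:
        try_insert(s)

    for it in range(iterations):
        if it % update_every == 0:
            # Spend part of the human feedback budget, refit the projection,
            # then re-bin the current elites under the updated metrics.
            project_fn = fit_projection(collect_human_triplets(archive))
            elites = [sol for sol, _ in archive.values()]
            archive.clear()
            for s in elites:
                try_insert(s)
        parent, _ = list(archive.values())[np.random.randint(len(archive))]
        child = parent + 0.1 * np.random.randn(*np.shape(parent))  # toy mutation
        try_insert(child)

    return archive
```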

Experiments and Results

The researchers performed experiments across three benchmark tasks: the robotic arm, Kheperax, and latent space illumination (LSI). The robotic arm task aims to find inverse kinematics solutions for a planar arm with revolute joints, minimizing the variance of the joint angles while the arm's 2D endpoint position characterizes diversity. In the Kheperax task, the aim is to discover neural network policy controllers for a Khepera-like robot navigating a maze using limited-range lasers and contact sensors. The LSI task explores the latent space of a generative image model.
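
To make the robotic arm benchmark concrete, the sketch below computes a candidate pose's 2D endpoint (the diversity descriptor) and a smoothness-based fitness; the unit link lengths and exact conventions are assumptions for illustration.

```python
import numpy as np

def arm_descriptor_and_fitness(joint_angles, link_lengths=None):
    """Planar arm with revolute joints: the 2D endpoint position acts as the
    ground truth diversity descriptor, and the negative variance of the joint
    angles acts as the fitness (smoother, more uniform poses score higher)."""
    joint_angles = np.asarray(joint_angles, dtype=float)
    if link_lengths is None:
        link_lengths = np.ones_like(joint_angles)  # assumed unit-length links
    orientations = np.cumsum(joint_angles)         # absolute link orientations
    endpoint = (float(np.sum(link_lengths * np.cos(orientations))),
                float(np.sum(link_lengths * np.sin(orientations))))
    fitness = -float(np.var(joint_angles))
    return endpoint, fitness

# Example: evaluate one pose of a 4-joint arm.
descriptor, fitness = arm_descriptor_and_fitness([0.3, -0.2, 0.5, 0.1])
```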

In the robotic arm and Kheperax tasks, a predefined ground truth diversity metric, based on the position of the arm endpoint or robot in 2D space, is used to simulate human feedback, and the evaluation measures are the QD score and coverage. Since no ground truth diversity metric is available for the LSI task, human feedback on image similarity is collected, and the effectiveness of QDHF is demonstrated qualitatively. On the tasks with ground truth diversity metrics, QDHF performs strongly, particularly QDHF-online in the robotic arm task. In the LSI task, QDHF generates noticeably more diverse images than random sampling.
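
For reference, the two evaluation measures can be computed roughly as in this sketch; the exact fitness normalization used in the paper is not specified here and is an assumption.

```python
def qd_score_and_coverage(archive, grid_shape, fitness_offset=0.0):
    """QD score: sum of elite fitnesses, offset so each filled cell contributes
    a non-negative amount. Coverage: fraction of archive cells that are filled."""
    total_cells = 1
    for n in grid_shape:
        total_cells *= n
    qd_score = sum(f - fitness_offset for _, f in archive.values())
    coverage = len(archive) / total_cells
    return qd_score, coverage

# Example usage with the archive from the earlier QDHF-online sketch:
# qd, cov = qd_score_and_coverage(archive, grid_shape=(50, 50))
```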

Sample efficiency and the alignment between learned and ground truth diversity metrics are also assessed. The results show that QDHF can effectively align its learned diversity space with the ground truth diversity, especially in the scales along each axis.

Conclusion

In summary, researchers introduced QDHF, which utilizes human feedback to enhance diversity in QD algorithms. Empirical results demonstrate the superiority of QDHF in automatic diversity discovery, comparing favorably to QD with human-designed metrics. In a latent space illumination task, QDHF significantly improves image diversity.



Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.

