Visual metaphors are powerful tools of communication that use imagery to convey complex ideas and emotions. However, generating high-quality visual metaphors is challenging and often benefits from collaboration between human artists and AI systems. In a recent study posted to the arXiv* preprint server, researchers explored the use of Chain-of-Thought (CoT) prompting to enhance the generation of visual metaphors by diffusion-based text-to-image models. The results demonstrated the potential of Human-AI collaboration in improving the quality and compositionality of visual metaphors.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
The limitations of current models
Existing diffusion-based text-to-image models excel at handling literal, content-based language but struggle to capture the abstraction and implicit meaning of figurative language. Because visual metaphors are inherently abstract, these models have difficulty accurately depicting the intended meaning and symbolism. Novel approaches are therefore required to bridge the gap between linguistic metaphors and their visual representations.
Advancements in language and text-to-image models
Recent advancements in large language models and text-to-image models have shown promise in facilitating creative processes across various domains. These models have demonstrated an ability to understand and generate human-like text, making them a valuable resource for creative endeavors. For instance, PopBlends, developed by Wang et al. (2023), leverages large language models to automatically generate conceptual blends for pop culture references, opening up new avenues for creative expression. Liu et al. (2023) introduced Generative Disco, an AI system that generates music visualizations using language and text-to-image models, offering a practical tool for creative professionals. These developments highlight the potential of AI systems in augmenting human creativity.
Creating the HAIVMet dataset
To address the challenges in generating visual metaphors, the present study proposed a collaborative approach combining the strengths of large language and diffusion-based models. The process involves three key steps.
First, visually grounded linguistic metaphors are selected from various sources to ensure their potential for visualization; metaphors with strong visual imagery and implicit meaning are preferred for the dataset. Second, a large language model, specifically Instruct GPT-3 with Chain-of-Thought (CoT) prompting, generates visual elaborations of the linguistic metaphors. CoT prompting guides the model to reason step by step, yielding a more precise and faithful account of the implicit meanings and visual elements; the resulting elaborations capture the essential objects and implicit meanings of each metaphor. Finally, diffusion-based models such as DALL·E 2 and Stable Diffusion take these visual elaborations as input to generate high-quality visual metaphors, which human experts then validate and refine to ensure accuracy and artistic quality.
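The second step above can be sketched in code. The prompt wording and the few-shot example below are hypothetical illustrations of how a CoT prompt for visual elaboration might be composed, not the exact prompts used in the study.

```python
# Illustrative sketch of Chain-of-Thought prompting for visual elaboration.
# The instructions and few-shot example are hypothetical, not taken from
# the study's actual prompts.

FEW_SHOT_EXAMPLE = """\
Metaphor: "My lawyer is a shark."
Step 1 - Objects: a lawyer, a shark.
Step 2 - Implicit meaning: the lawyer is aggressive and relentless.
Step 3 - Visual elaboration: A lawyer in a suit with the head of a shark,
baring its teeth across a courtroom table."""

def build_cot_prompt(metaphor: str) -> str:
    """Compose a few-shot CoT prompt that asks the language model to reason
    step by step from a linguistic metaphor to a concrete visual elaboration,
    which is then passed to a text-to-image model."""
    return (
        "Turn each linguistic metaphor into a visual elaboration by "
        "reasoning step by step.\n\n"
        f"{FEW_SHOT_EXAMPLE}\n\n"
        f'Metaphor: "{metaphor}"\n'
        "Step 1 - Objects:"
    )

prompt = build_cot_prompt("Time is a thief.")
print(prompt)
```

The trailing "Step 1 - Objects:" cue invites the model to continue the same step-by-step pattern shown in the example, so its final step emerges as a concrete, paintable scene description.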
Evaluation and results
Professional artists and designers evaluated the collaborative approach to assess the effectiveness of the generated visual metaphors. The evaluation compared the output of diffusion-based models with and without visual elaborations from large language models as input. The results showed that the collaborative approach significantly improved the quality of the generated visual metaphors: with visual elaborations as input, the models better captured the implicit meanings and more faithfully depicted the objects and relationships involved in the linguistic metaphors. LLM-DALL·E 2 emerged as the most successful model, demonstrating the effectiveness of Human-AI collaboration in enhancing visual metaphor generation.
The HAIVMet dataset
The collaborative approach resulted in the HAIVMet (Human-AI Visual Metaphor) dataset, comprising 6,476 visually metaphoric images that cover 1,540 unique linguistic metaphors, ensuring a diverse and comprehensive representation of visual metaphors. The HAIVMet dataset serves as a valuable resource for further research and development in the field of visual metaphor generation. It provides a benchmark for evaluating the performance of different models and techniques, allowing for the exploration of new approaches and improvements in the generation of visual metaphors.
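The two figures above imply that each linguistic metaphor is paired with several image variants, which a quick calculation makes explicit:

```python
# Reported HAIVMet dataset sizes.
total_images = 6476
unique_metaphors = 1540

# On average, each linguistic metaphor has roughly four image variants.
avg_images_per_metaphor = total_images / unique_metaphors
print(f"{avg_images_per_metaphor:.1f}")  # → 4.2
```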
Compositionality in visual metaphors
One of the significant findings of this research is the compositional nature of visual metaphors. Visual metaphors often require the combination of multiple elements to capture the metaphorical meaning effectively. The HAIVMet dataset showcases numerous examples of compositional visual metaphors, where the models successfully combine different objects, properties, and relationships to convey the intended metaphorical meaning. This highlights the importance of considering the compositionality of visual metaphors and the need for collaboration between human artists and AI systems to achieve these complex metaphorical representations.
Utilizing visual metaphors in downstream applications
Visual metaphors not only hold artistic and aesthetic value but also have practical implications in various downstream applications. The HAIVMet dataset, with its diverse collection of visual metaphors, was utilized in a Visual Entailment (VE) task to demonstrate its usefulness. The dataset was used to enhance a state-of-the-art VE model, resulting in a substantial improvement in accuracy compared to the model trained solely on real-world images. This showcases the practical utility and meaningful impact of visual metaphors in advancing vision-language models and their ability to capture metaphoric meanings.
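To make the VE setup concrete, the sketch below shows one plausible way to structure entailment examples from metaphoric images. The schema, field names, file path, and helper function are hypothetical illustrations, not the dataset's actual format.

```python
from dataclasses import dataclass

# Hypothetical schema for a visual-entailment example built from HAIVMet
# images; the field names and labels are illustrative, not the dataset's
# actual format.

@dataclass
class VEExample:
    premise_image: str   # path to a generated visual-metaphor image
    hypothesis: str      # textual claim to verify against the image
    label: str           # "entailment", "neutral", or "contradiction"

def make_ve_pair(image_path: str, metaphor_meaning: str, distractor: str):
    """Pair one metaphoric image with an entailed hypothesis (its implicit
    meaning) and a contradictory distractor hypothesis."""
    return [
        VEExample(image_path, metaphor_meaning, "entailment"),
        VEExample(image_path, distractor, "contradiction"),
    ]

# Hypothetical example built around the metaphor "Time is a thief".
pairs = make_ve_pair(
    "haivmet/time_is_a_thief_01.png",
    "Time steals moments from our lives.",
    "Time gives back everything it takes.",
)
print(len(pairs), pairs[0].label)  # → 2 entailment
```

Training on such pairs forces a VE model to connect an image's metaphoric content, not just its literal objects, to the hypothesis text, which is one plausible reason the metaphor-augmented model outperforms one trained solely on real-world images.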
Conclusion and future directions
In conclusion, this research demonstrates the potential of Human-AI collaboration in improving the generation of visual metaphors. By leveraging the strengths of large language models and diffusion-based models, researchers have paved the way for improved quality and compositionality in visual metaphors. The collaborative approach and the creation of the HAIVMet dataset provide valuable resources for further research and development.
Future investigations can build upon these findings to advance AI systems' understanding and generation of visual metaphors, opening up new possibilities for creative expression and communication. It is crucial to continue exploring the impact of prompt phrasing and model variations, and to expand the research to other languages to ensure broader representation and inclusivity in visual metaphor generation. With continued research and collaboration, the future holds even greater possibilities for generating visually compelling metaphors.