Through innovative AI techniques, the study unveils how ChatGPT and DALL·E2 can both highlight and blur the distinct characteristics that make cities unique, offering a fresh perspective on urban identity and design.
Image was created with the assistance of DALL·E 3 - Prompt: Now, please create a 2000-pixel wide, 1333-pixel high photorealistic image that captures the previous question's answers: What is the place identity of streetscapes in Sydney? Research: Place identity: a generative AI’s perspective
In a research paper published in the journal Humanities and Social Sciences Communications, researchers explored the potential of generative artificial intelligence (AI) models, specifically ChatGPT (Chat Generative Pre-trained Transformer) and DALL·E2, to capture the place identity of cities through textual and visual representations. The study is one of the first to assess how well these AI models can simulate the built environment with respect to place-specific meanings.
By comparing AI-generated content with real-world data from Wikipedia and Google, the authors assessed the models' ability to reflect the distinctive characteristics of 64 global cities and discussed the implications for urban design, geography, and future research opportunities. The comparison involved advanced computational techniques, such as sentence transformers for text analysis and the Learned Perceptual Image Patch Similarity (LPIPS) metric for images, to rigorously evaluate the consistency of AI-generated outputs.
Background
Place identity focuses on how individuals develop a sense of identity through interactions with their physical environment. This concept has been expanded to include terms like place attachment and place uniqueness, highlighting people's relationships with locations.
Traditional studies have explored how physical settings and human perceptions shape place identity, often through qualitative approaches like interviews and photo-elicitation. However, these methods can be time-consuming and limited by small sample sizes, leading to potential biases. More recent studies have leveraged text and images from user-generated content to explore place identity, but challenges in measuring the concept remain due to its subjectivity.
This paper addressed those gaps by testing the potential of generative AI models, specifically ChatGPT and DALL·E2, in capturing the identity of 64 global cities. The study compared AI-generated results with factual data to assess the models' reliability and contribution to understanding place-specific identities.
Computational Methodology for Place Identity Analysis
Using generative AI, the researchers presented a two-step computational framework to explore place identity. First, they built text and image datasets, using ChatGPT to generate city descriptions and DALL·E2 to generate city streetscapes. Second, they validated these AI-generated outputs against real-world data collected from Wikipedia and Google Images, measuring text similarity with cosine similarity scores and image similarity with the LPIPS metric.
The text similarity between ChatGPT responses and Wikipedia entries was calculated using a sentence transformer model, while DALL·E2 images were compared to Google Images using the LPIPS metric. The study also took a human-in-the-loop approach, conducting surveys to assess how similar people judged the AI-generated images to be to real-world visuals.
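As a minimal sketch of the text comparison described above, cosine similarity between two embedding vectors can be computed as follows. The short vectors are made-up placeholders standing in for sentence transformer embeddings of a ChatGPT response and a Wikipedia entry. The function name, variable names, and values are illustrative assumptions and do not come from the study.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional embeddings (assumption: real sentence
# embeddings typically have hundreds of dimensions).
chatgpt_vec = [0.2, 0.7, 0.1, 0.4]
wikipedia_vec = [0.3, 0.5, 0.0, 0.6]

score = cosine_similarity(chatgpt_vec, wikipedia_vec)
print(f"cosine similarity: {score:.3f}")
```

A score near 1 means the two vectors point in nearly the same direction, and a score near 0 means they are close to orthogonal. In practice the vectors would come from an embedding model rather than being typed by hand.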
Finally, city-by-city comparisons were conducted using Chamfer distance to determine whether generative AI could distinguish between geographically and culturally similar or distant cities. This approach offered valuable insights into the potential of AI in capturing and analyzing urban place identity.
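The sketch below shows one common form of Chamfer distance between two sets of points: the mean nearest-neighbor Euclidean distance from each set to the other, summed over both directions. Definitions vary (some use squared distances or average the two directions), and this summary does not say which variant the authors used. Treating each city as a small set of 2-D points is an assumption made purely for illustration, and all names and values are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chamfer_distance(set_a, set_b):
    """Symmetric Chamfer distance between two non-empty point sets.

    For every point in one set, find the distance to its nearest
    neighbor in the other set, average those distances, and add the
    averages for the two directions.
    """
    if not set_a or not set_b:
        raise ValueError("both point sets must be non-empty")
    a_to_b = sum(min(euclidean(p, q) for q in set_b) for p in set_a) / len(set_a)
    b_to_a = sum(min(euclidean(q, p) for p in set_a) for q in set_b) / len(set_b)
    return a_to_b + b_to_a


# Hypothetical feature points for two cities (illustrative values only).
city_a = [[0.0, 0.0], [1.0, 0.0]]
city_b = [[0.0, 1.0], [1.0, 1.0]]

distance = chamfer_distance(city_a, city_b)
print(f"Chamfer distance: {distance:.3f}")
```

Under this definition, a smaller distance means the two sets are more alike, and identical sets have a distance of zero.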
Results and Analysis
Results from generative AI models, such as ChatGPT and DALL·E2, were examined to evaluate their ability to capture place identity—a concept referring to the unique characteristics of a location shaped by social, cultural, and historical factors. Despite their impressive capabilities in various tasks, generative AI models often produce outputs based on statistical patterns, which raises concerns about their trustworthiness, especially when dealing with nuanced concepts like place identity.
To assess this, the researchers compared ChatGPT's textual outputs with Wikipedia corpora and DALL·E2's visual outputs with Google Images results for cities like Beijing and New York. For textual analysis, cosine similarity was calculated, revealing moderate alignment between ChatGPT and Wikipedia descriptions, with scores ranging from 0.56 to 0.59.
Visual outputs from DALL·E2 also showed variability in similarity to real-world images, with perceptual scores averaging 0.575. While the models generated contextually relevant and visually recognizable outputs, limitations such as a lack of ground-truth datasets and differences in content length between generated and real-world data were noted. Additionally, the study highlighted the need for more robust human involvement in evaluating these outputs, emphasizing the importance of human judgment in interpreting AI-generated content.
Insights and Challenges
Results showed that generative AI could effectively capture unique characteristics of cities, such as architectural styles, but that it also generated generic urban scenes lacking distinct identity markers. For instance, while DALL·E2 successfully depicted iconic features of cities like New York and Paris, it struggled with cities like Tokyo, where it produced more generic and less distinctive images. The authors highlighted ethical concerns, such as bias in representing different communities, as well as the limitations of relying on training data from sources like Wikipedia, which might introduce circularity.
The authors suggested improvements, including better prompt engineering and using alternative data sources like social media to capture more diverse opinions. Additionally, addressing language barriers in future research is crucial to ensuring more inclusive outputs.
Researchers were encouraged to explore advanced evaluation methods, such as object detection, to further verify the accuracy of generative AI-generated representations. Ultimately, generative AI holds potential for urban studies, particularly in understanding subjective urban experiences and enhancing urban planning by reflecting community preferences.
Conclusion
The authors underscored the promising potential of generative AI models, such as ChatGPT and DALL·E2, in capturing the place identity of cities through textual and visual representations. By comparing AI-generated outputs with real-world data, the research highlighted both the strengths and limitations of these models.
While generative AI could effectively depict certain city characteristics, the study emphasized the need for ongoing improvements in data sources, prompt engineering, evaluation methods, and human-in-the-loop assessment to address the complexities of place identity. Future research should address these gaps and explore advanced techniques to enhance the accuracy and inclusivity of AI-generated representations.
Journal reference:
- Jang, K. M., Chen, J., Kang, Y., Kim, J., Lee, J., Duarte, F., & Ratti, C. (2024). Place identity: a generative AI's perspective. Humanities and Social Sciences Communications, 11(1). DOI: 10.1057/s41599-024-03645-7, https://www.nature.com/articles/s41599-024-03645-7