Behind the rhetoric: how "Open AI systems" shield big tech’s dominance from real scrutiny.
Research: Why ‘open’ AI systems are actually closed, and why this matters.
In an article published in the journal Nature, researchers examined the concept of "open" artificial intelligence (AI), critiquing its claims of promoting innovation and democracy and highlighting its failure to disrupt the concentration of industry power. They analyzed AI systems' openness across models, data, labor, development frameworks, and computational power, identifying benefits such as transparency and reusability. However, the authors argued that "open" rhetoric often reinforced power imbalances, emphasizing the need for clear definitions in policy to address these challenges effectively.
Background
The concept of "open" AI has gained significant attention in recent policy and research discourse. Traditionally, openness in software has been associated with transparency, accessibility, and innovation, as seen in the open-source software movement. However, applying these principles to AI is difficult because AI development is complex and resource-intensive, and the industry is concentrated in a few dominant firms. Previous studies have explored the risks and benefits of AI openness, but they often lack clarity on the nuanced trade-offs and systemic power dynamics at play.
This paper addressed these gaps by analyzing the material components of AI (models, data, labor, development frameworks, and computational power) to determine what "openness" could genuinely provide. It highlighted the affordances of transparency, reusability, and extensibility while exposing how claims of openness were frequently leveraged to maintain industry dominance. By critiquing "openwashing" practices and advocating for clearer definitions, the paper aimed to ground policy discussions in the realities of the current AI ecosystem.
Limits and Affordances of "Open" AI Systems
Open AI systems are shaped by resource-intensive development processes dominated by large tech companies with significant computing power, data access, and research teams. These companies control the conditions for AI development and market access, creating a landscape where openness does not inherently disrupt competitive imbalances.
Despite these limitations, open AI provides three key affordances:
- Transparency: Open AI often offers access to model weights, training data, and documentation, enabling some validation and auditing. However, the researchers emphasized that this transparency is often superficial, with opaque training data and labor processes underlying supposedly "open" systems and creating barriers to accountability. Moreover, because AI systems are probabilistic, transparency does not guarantee predictability or explainability of system behavior (the first sketch after this list illustrates what weight access does, and does not, reveal).
- Reusability: Some models and data are licensed for third-party use, supporting claims of fostering market competition. Nevertheless, high computational costs and dependence on proprietary tooling such as Nvidia's CUDA platform significantly hinder reusability in practice. The researchers pointed out that even well-funded open AI start-ups often depend on partnerships with dominant players like Microsoft Azure or Google Cloud, perpetuating industry dependence.
- Extensibility: Open AI models allow fine-tuning for specific tasks, enabling users to build on pre-trained models (see the fine-tuning sketch after this list). This extensibility often serves corporate interests by shifting the costs of customization and development to users while foundational control remains with large companies. For instance, Meta's LLaMA models provide only limited access to training data and impose restrictive licensing terms, illustrating the "openwashing" practices that undermine genuine openness.
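To see why weight access alone is shallow transparency, consider what an auditor can and cannot learn from an openly released checkpoint. The sketch below is illustrative rather than drawn from the paper; it assumes the Hugging Face transformers library and uses bert-base-uncased as a stand-in for any publicly hosted model.

```python
# Hypothetical audit of an "open" checkpoint; the checkpoint name is a
# stand-in, and none of this comes from the paper itself.
from transformers import AutoConfig, AutoModel

checkpoint = "bert-base-uncased"  # illustrative openly hosted model

# The released configuration exposes architectural choices...
config = AutoConfig.from_pretrained(checkpoint)
print(config.num_hidden_layers, config.hidden_size)

# ...and the released weights can be downloaded and counted directly.
model = AutoModel.from_pretrained(checkpoint)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")

# But nothing loaded here documents the training data, the annotation labor,
# or any later fine-tuning steps: weight access is a shallow form of
# transparency.
```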
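The extensibility claim can be made concrete in the same way. The minimal fine-tuning sketch below, again assuming PyTorch and Hugging Face transformers with a placeholder checkpoint and toy data, shows how the user bears the customization work while the foundational weights remain exactly as the releasing company shipped them.

```python
# Illustrative fine-tuning sketch; checkpoint, labels, and "dataset" are toys.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2
)

# Freeze the pre-trained backbone: only the small classification head is
# trained, so the foundational weights (and their undisclosed provenance)
# stay exactly as released upstream.
for param in model.bert.parameters():
    param.requires_grad = False

texts = ["a great example", "a terrible example"]  # toy stand-in data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
model.train()
for _ in range(3):  # a few illustrative steps
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```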
Ultimately, open AI’s benefits are constrained by systemic market dominance and bottlenecks created by large technology companies.
The Political Economy of "Open" AI Systems
The creation and deployment of large AI systems involve various elements, including models, data, labor, development frameworks, and computational power. Understanding these components reveals the challenges and limitations of “openness” in AI.
- AI Models: AI models, such as the generative pre-trained transformer (GPT) models underlying ChatGPT, are only one part of operational systems that also include software clients and supporting infrastructure. While some models are labeled "open," their underlying data, labor processes, and reinforcement-learning details often remain inaccessible. For example, Meta's LLaMA models and MosaicML's MPT advertise openness but rely on closed datasets or omit critical documentation. This selective openness creates confusion and limits the practical utility of these models.
- Data: Data is vital yet often remains opaque in supposedly open AI systems. Training datasets are frequently derived from copyrighted or culturally sensitive materials, raising ethical and legal concerns. Careful curation is possible but labor-intensive: BigScience's documentation of the training data behind its BLOOM model required substantial effort, and such transparency remains rare. This opacity hinders reproducibility and disproportionately harms communities in the Global South, whose intellectual resources are often exploited without acknowledgment or compensation.
- Labor: Developing AI systems involves extensive human labor for data preparation, model calibration, and content moderation. Practices like reinforcement learning from human feedback (RLHF) depend on low-paid, often precarious workers; the authors highlighted workers in the Global South earning less than $2 per hour to perform the calibration and moderation work these systems require (the first sketch after this list shows where such preference judgments enter the training objective). Despite their contributions, these workers are excluded from the benefits of AI development, perpetuating inequities and undermining claims of democratization.
- Development Frameworks: Frameworks like PyTorch and TensorFlow streamline AI development while benefiting their corporate creators, Meta and Google, by fostering dependence on their ecosystems. These frameworks standardize AI development, aligning it with corporate platforms and entrenching industry dominance. For example, Meta's PyTorch is tightly integrated with its internal systems, incentivizing developers to adopt its framework while indirectly reinforcing Meta's influence in the AI landscape.
- Computational Power: Large-scale AI development requires immense computational resources, primarily controlled by corporations like Nvidia, which dominates the AI chip market. Nvidia's CUDA ecosystem, cited as the "de facto industry standard," imposes significant barriers to entry for smaller players due to its proprietary nature (the second sketch after this list makes this dependence concrete). Training state-of-the-art models demands energy-intensive processes, making access to computational resources one of the most significant bottlenecks for independent AI development.
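To ground the labor point, the first sketch below shows the standard pairwise objective used to train an RLHF reward model, with toy values in PyTorch; every (chosen, rejected) pair encodes a human judgment, which is precisely the low-paid annotation work the authors describe.

```python
# Toy Bradley-Terry-style reward-model loss; the scores are made up and would
# come from a learned reward network in practice.
import torch
import torch.nn.functional as F

reward_chosen = torch.tensor([1.3, 0.2, 0.9])     # responses annotators preferred
reward_rejected = torch.tensor([0.4, 0.5, -0.1])  # responses annotators rejected

# Push preferred responses above rejected ones; each pair exists only because
# a human worker compared the two outputs.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(loss.item())
```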
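The computational dependence is equally visible in everyday code. The second sketch, a common PyTorch idiom, shows how Nvidia's CUDA stack is the assumed fast path in most training scripts, with everything else treated as a fallback.

```python
# Standard device-selection idiom: CUDA (i.e., an Nvidia GPU) is the default
# accelerated path, and the CPU is the slow fallback.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")
if device.type == "cuda":
    print(torch.cuda.get_device_name(0))  # almost always Nvidia hardware

# The same code runs without CUDA, but at training scale it becomes orders of
# magnitude slower, which is the practical barrier described above.
x = torch.randn(1024, 1024, device=device)
y = x @ x  # a matrix multiply placed on whichever device was available
```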
Conclusion
The "open" AI concept, often associated with transparency and innovation, faces critical challenges in disrupting industry dominance. While open AI offers benefits like transparency, reusability, and extensibility, its potential is limited by opaque data practices, labor inequities, and concentrated computational power. Corporate actors often exploit the rhetoric of openness to consolidate power while masking monopolistic tendencies.
The authors emphasized that true democratization of AI development requires not just openness but robust policy interventions, including antitrust measures and protections for labor and data privacy. They also called for a re-evaluation of the role of openness, advocating for alternative approaches to address systemic inequities in AI.