Building bridges between AI and modeling: a unified software ecosystem sets the stage for revolutionary advances in climate modeling, material discovery, and energy optimization.
Research: Cohesive AI and Simulation Software Ecosystem for Scientific Innovation. Image Credit: Shutterstock AI
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
In an article recently submitted to the arXiv preprint* server, researchers highlighted the need for a cohesive artificial intelligence (AI)/modeling and simulation (ModSim) software stack to support scientific applications.
This stack must be updated regularly while ensuring version compatibility across diverse software and computing systems. It should also support binary distributions for emerging scientific workflows. The article emphasized investing in a unified AI/ModSim community stack that complements computer system stacks while addressing unique scientific challenges in interdisciplinary research domains.
Unified AI/ModSim Software Stack
To address future scientific challenges, the next-generation scientific software stack must offer a cohesive portfolio of libraries and tools supporting AI and ModSim approaches.
As scientific research becomes more interdisciplinary, scientists require these integrated toolsets to tackle complex, data-rich problems in climate modeling, material discovery, and energy optimization. This integration must also support the scalability and reliability needed to manage scientific workflows on high-performance computing (HPC) platforms.
A unified software ecosystem that combines established AI frameworks and emerging scientific AI frameworks with established ModSim libraries is essential for effectively solving these next-generation scientific challenges.
The U.S. Department of Energy's (DOE) Office of Science has sponsored significant software stewardship and advancement initiatives, particularly in the post-Exascale Computing Project (ECP) era, to ensure long-term support for libraries and tools used by the DOE community. These efforts have included funding for various libraries and tools in mathematics, data visualization, performance analysis, and programming systems.
Despite the availability of many essential AI libraries and tools within the DOE-supported stack, installing a comprehensive AI stack remains challenging and labor-intensive, especially on HPC systems.
The PESO (Partnering for Scientific Software Ecosystem Stewardship Opportunities) project, led by the paper's authors, is one of the post-ECP efforts. It focuses on expanding the use of Spack in open-science codes and on curating and delivering the Extreme-scale Scientific Software Stack (E4S), the software stack developed by ECP. These efforts emphasize creating a portable and cohesive software stack that addresses the needs of both the AI and ModSim communities.
Version Management and Deployment Challenges
A critical challenge in developing a cohesive AI and ModSim stack is managing versions and ensuring compatibility across different tools and libraries. The AI community has progressed with package managers like pip and Conda, which simplify the installation of individual libraries, but these tools often fail to address compatibility across the stack as a whole. This can lead to deployment bottlenecks when scientists attempt to integrate AI tools into ModSim-driven workflows.
This gap complicates integration with ModSim tools, making it difficult for scientists to ensure consistent behavior across diverse computing environments. To overcome this, a comprehensive software portfolio that integrates AI and ModSim components is needed.
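The compatibility gap described above can be illustrated with a minimal sketch. It assumes two hypothetical components, ai_framework and sim_solver, that both depend on a hypothetical shared library, mathlib, with version ranges that do not overlap. None of these names or numbers come from the paper; they only show how a library can install cleanly for one component while breaking another.

```python
# Illustrative only: package names, versions, and constraints are hypothetical.

def parse(version):
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version, minimum, below):
    """True if minimum <= version < below."""
    return parse(minimum) <= parse(version) < parse(below)

# Range each component accepts for a shared dependency (assumed values).
REQUIREMENTS = {
    "ai_framework": {"mathlib": ("2.0", "3.0")},
    "sim_solver": {"mathlib": ("1.4", "2.0")},
}

def find_conflicts(installed, requirements):
    """Return (consumer, dependency, installed_version) for each violated range."""
    conflicts = []
    for consumer, deps in requirements.items():
        for dep, (minimum, below) in deps.items():
            if not satisfies(installed[dep], minimum, below):
                conflicts.append((consumer, dep, installed[dep]))
    return conflicts

# Installing the version the AI framework wants silently breaks the solver.
installed = {"mathlib": "2.1.0"}
conflicts = find_conflicts(installed, REQUIREMENTS)
print(conflicts)
```

In this toy case no single mathlib version satisfies both components, which is the kind of problem a curated, stack-wide portfolio is meant to surface before deployment rather than after.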
In contrast, the ModSim community has traditionally focused on building software from source, offering greater configuration and performance optimization flexibility. However, with the increasing need for more productive scientific computing environments, there has been a growing demand for ease of deployment. Consequently, the ModSim community now requires prebuilt binaries for core libraries and tools, reflecting a shift toward standardization and accessibility.
Tools like CMake are used for managing builds, and Spack has emerged as a crucial tool for handling dependencies and configurations across libraries and platforms. Spack enables both building from source and creating reusable binaries, providing an efficient path for rapid software deployment. The reusable binary caches offered by Spack and E4S are now widely adopted in the community.
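The idea of reusing binaries while keeping source builds available can be sketched as a lookup keyed on the full build configuration. The sketch below is a simplified illustration, not Spack's actual hashing scheme or API: the spec fields, the solverlib package, and the fake_build function are all assumptions made for the example.

```python
import hashlib
import json

def spec_hash(spec):
    """Hash a fully specified build configuration deterministically."""
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

def install(spec, cache, build_from_source):
    """Reuse a cached binary when this exact configuration was built before."""
    key = spec_hash(spec)
    if key in cache:
        return cache[key], "cache"
    artifact = build_from_source(spec)  # slow path: compile for this configuration
    cache[key] = artifact               # publish so later installs can reuse it
    return artifact, "source"

def fake_build(spec):
    """Hypothetical stand-in for a real source build."""
    return f"{spec['name']}-{spec['version']}-{spec['target']}.tar.gz"

cache = {}
spec = {"name": "solverlib", "version": "1.2.0",
        "compiler": "gcc-13", "target": "x86_64"}

first = install(spec, cache, fake_build)         # built from source
second = install(dict(spec), cache, fake_build)  # same configuration: cache hit
other = install({**spec, "target": "aarch64"}, cache, fake_build)  # new target: rebuilt
print(first[1], second[1], other[1])
```

Keying the cache on every field of the configuration is the design choice that matters here: a binary is reused only when compiler, target, and version all match, so a change in any of them falls back to a source build.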
AI-ModSim Integration Challenges
The AI and ModSim communities interact with computing resources in significantly different ways. The AI community relies heavily on persistent services and tools like Jupyter Notebooks, Kubernetes, and Kubeflow, which contrast with the ModSim community's traditional reliance on batch schedulers such as Slurm or Flux. AI models require versioning, regular updates, and bug fixes. They are typically deployed as persistent services and managed through practices known as machine learning operations (MLOps). MLOps processes must be adapted to scientific contexts to support scalable, reproducible solutions. A combined AI/ModSim stack must therefore work in both traditional batch environments and cloud-like persistent services.
Significant effort is required to develop a consistent software stack that integrates AI and ModSim. Core needs must be identified, converted into specifications, and produced into a cohesive, portable stack that can be regularly updated. Continuous Integration (CI) systems are essential in this effort, allowing automated testing and updates to ensure the stack remains compatible with evolving hardware and software technologies.
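One way to picture the role of CI here is a test matrix that exercises the stack across combinations of software and hardware choices. The following minimal sketch assumes a hypothetical matrix and a placeholder smoke_test function with an invented failure rule; a real pipeline would build and run the stack for each configuration instead.

```python
from itertools import product

# Hypothetical axes a stack release might be tested against.
MATRIX = {
    "python": ["3.10", "3.11"],
    "accelerator": ["cpu", "gpu"],
    "mpi": ["mpi-a", "mpi-b"],
}

def expand(matrix):
    """Return every combination of the matrix axes as a list of dicts."""
    keys = sorted(matrix)
    return [dict(zip(keys, values))
            for values in product(*(matrix[k] for k in keys))]

def smoke_test(config):
    """Placeholder for building and testing the stack in one configuration."""
    # Assumed failure for illustration: pretend mpi-b lacks GPU support.
    return not (config["accelerator"] == "gpu" and config["mpi"] == "mpi-b")

def run_ci(matrix, test):
    """Run the test on every configuration and group results by outcome."""
    results = {"passed": [], "failed": []}
    for config in expand(matrix):
        results["passed" if test(config) else "failed"].append(config)
    return results

report = run_ci(MATRIX, smoke_test)
print(len(report["passed"]), "passed;", len(report["failed"]), "failed")
for config in report["failed"]:
    print("incompatible:", config)
```

Running the full matrix on every update is what lets incompatible combinations show up as a failed job rather than as a broken installation on a user's system.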
While AI tools currently target a broad range of domains, the authors anticipate the creation of AI libraries specifically tailored for scientific applications. These tools must be curated and made available to the scientific community to advance scientific research, particularly on DOE leadership computing platforms.
Conclusion
In summary, supporting a cohesive, integrated portfolio of AI/ModSim scientific tools and libraries would enable the DOE to accelerate development, foster collaboration, and scale AI/ModSim techniques for scientific discoveries.
Continued investments in community-driven ecosystems like the E4S and Spack were recommended to bridge the AI and ModSim communities. Such efforts would ensure a standardized, portable, and performance-optimized ecosystem capable of meeting the demands of next-generation scientific innovation.
Journal reference:
- Preliminary scientific report.
Heroux, M. A., et al. (2024). Toward a Cohesive AI and Simulation Software Ecosystem for Scientific Innovation. arXiv. DOI: 10.48550/arXiv.2411.09507, https://arxiv.org/abs/2411.09507