By fusing large-scale AI models with real-time reasoning, Johns Hopkins APL equips robots to navigate, adapt, and make decisions in chaotic environments—redefining what’s possible in disaster zones and combat scenarios.

Researchers are using generative artificial intelligence and cutting-edge scene-mapping technology to elevate robots from simple tools to full teammates capable of providing aid in disaster and battlefield scenarios. Credit: Johns Hopkins APL/Ed Whitman
Researchers at the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, are advancing robotic perception capabilities by using artificial intelligence (AI) to equip autonomous agents with the capacity to make sense of unstructured environments and make plans like humans.
It's an effort with significant implications for the nation's warfighters and first responders, particularly in complex or challenging off-road environments.
"Robots with strong, human-like perception, coupled with the ability to reason about tasks they're given, is a broad capability we could apply wherever there's dangerous or dirty work for humans," said David Patrone, acting program manager for Robotics and Autonomy at APL. "Humans could take a managerial role instead of going into dangerous situations themselves."
The project, known as Full Scene Extraction, involves training robots to gather information about their surroundings and build a contextual understanding of their environment. The goal is for autonomous agents to understand the space they're in independently, plan potential paths, and execute sequential tasks accordingly.
For instance, an agent equipped with Full Scene Extraction could be instructed to hide while navigating a tree-lined path. Intuitively, the robot would understand that moving behind a tree or under a bush accomplishes this task.
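The "hide" example above can be sketched as an affordance lookup: the perception stage labels objects in the scene, and the planner keeps only those that offer cover. The class labels, affordance table, and function names below are illustrative assumptions, not APL's implementation.

```python
# Hypothetical sketch: map detected objects to the affordances a "hide"
# command implies, then pick the nearest object that offers cover.
# The affordance table is a toy assumption for illustration.
COVER_AFFORDANCES = {"tree": "move_behind", "bush": "move_under", "rock": "move_behind"}

def plan_hide(detected_objects):
    """Return the (label, action, distance) triple that best satisfies 'hide'."""
    options = []
    for obj in detected_objects:
        action = COVER_AFFORDANCES.get(obj["label"])
        if action:  # object offers cover
            options.append((obj["label"], action, obj["distance_m"]))
    # Prefer the nearest object that offers cover; None if nothing does
    return min(options, key=lambda o: o[2]) if options else None

scene = [
    {"label": "tree", "distance_m": 4.0},
    {"label": "mailbox", "distance_m": 1.5},  # offers no cover in our table
    {"label": "bush", "distance_m": 2.5},
]
print(plan_hide(scene))  # nearest cover: the bush
```

A real system would derive the affordances from a vision-language model rather than a fixed table, but the selection logic follows the same shape.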
"The ability for us humans to understand and perceive our environment is something we take for granted, but it's something that robots have historically struggled with," said Corban Rivera, a senior AI and robotics researcher at APL. "When things move quickly or when they're in a dynamic scene, robots have trouble navigating. Our goal is to close that gap between human instinct and robotic reaction."
Applying AI
Just out of the box, today's robots require extensive training and human guidance, usually with a controller, before they can complete even simple tasks. With Full Scene Extraction, however, APL researchers are working toward a paradigm in which embodied agents can perceive and reason simultaneously, leveraging foundation models (large-scale AI models trained on vast datasets to perform a wide range of tasks) to process complex environments and execute commands given in plain English, much like communicating with a human.
By integrating large-scale perception models with advanced reasoning capabilities, researchers aim to enable robots to see and understand their surroundings, adapt dynamically, and make informed decisions in real time.
"We want to enable the agent to complete a command, make progress against the command or come back with follow-up questions to sort out any ambiguity," Rivera said. "Full Scene Extraction is leveraging agentic artificial intelligence to achieve these human-like perception and reasoning abilities for autonomous agents. It's a significant step forward in the robotics field."
The team is tapping into advances in large language models and visual language models to help a robot understand its environment.
"The Full Scene Extraction framework has a robot reason on its own through all the tiny, in-between steps of a task," said Rohita Mocharla, a computer vision engineer at APL. "Previously, we needed to program each step for a robot to be successful. Agentic AI allows the robot to plan out these steps."
Where robots once needed several weeks in a simulated environment to learn and make progress toward a task, Rivera and Mocharla said a robot can now accomplish that task on its first try, evidence that the AI algorithms applied in Full Scene Extraction are advancing the robot's skills.
"That's a bar that has never existed before," said Rivera. "I'm amazed and inspired by what's possible today that wasn't even possible two years ago."
Building Blocks for Innovation
Full Scene Extraction builds on other APL projects and innovations in human-robot teaming and robotic perception, such as Concept Agent. Concept Agent, also funded by the U.S. Army Combat Capabilities Development Command Army Research Laboratory, is an autonomous AI agent framework that uses a large language model to reason sequentially, enhancing a robot's ability to create task execution plans, evaluate progress, and replan to complete a command.
According to Rivera, who helped develop Concept Agent, the Full Scene Extraction team is building on this previous work to develop a more advanced human-robot teaming capability. While Concept Agent focused on extending the open-world reasoning capabilities of robots, Full Scene Extraction advances perception by infusing specific military concepts of interest into open-world models using parameter-efficient training. Together, Full Scene Extraction's perception accuracy and Concept Agent's open-world reasoning enable robots to execute tasks autonomously with greater success.
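Parameter-efficient training, as mentioned above, typically means learning a small update on top of a large frozen model rather than retraining it. Below is our own toy sketch in the style of low-rank adaptation (LoRA); the matrices, dimensions, and values are illustrative assumptions with no real model involved.

```python
# Toy sketch of parameter-efficient adaptation in the LoRA style: instead
# of retraining a large frozen weight matrix W, learn small low-rank
# factors A and B and apply W + A @ B at inference time.

def matmul(X, Y):
    """Plain-Python matrix multiply (no external dependencies)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Frozen pretrained weights (4x4 identity as a stand-in): never updated
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Trainable rank-1 factors: only 4 + 4 = 8 parameters to learn, standing
# in for new concepts infused into the open-world model
A = [[0.5], [0.0], [0.0], [0.5]]   # 4x1
B = [[0.0, 1.0, 0.0, 0.0]]        # 1x4

W_adapted = add(W, matmul(A, B))   # effective weights = W + A @ B

trainable = sum(len(r) for r in A) + sum(len(r) for r in B)
total = sum(len(r) for r in W)
print(f"trainable parameters: {trainable} of {total + trainable}")
```

The payoff is the parameter count: only the small factors are trained, so new domain concepts can be added without touching the bulk of the pretrained model's weights.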
Operational Impact
While the capabilities of Full Scene Extraction are still in development, the technology has a range of potential applications, particularly for warfighters or first responders. Search and rescue, casualty extraction, building clearing, tree line detection, or humanitarian relief and recovery are among the potential uses, but the APL team sees even more possibilities.
Researchers also plan to examine how sensor data commonly gathered in austere environments affects autonomous agents.
"For example, what happens when lidar or radar are introduced to an autonomous agent?" Rivera asked. "How does that data improve or complicate perception? This is a space where APL can certainly contribute."