WASHINGTON, D.C. - A U.S. Naval Research Laboratory (NRL) research team successfully conducted the first reinforcement learning (RL) control of a free-flyer in space on May 27. In just three months, the crew of three young scientists overcame a swarm of challenges to achieve this groundbreaking advancement in robotic autonomy and space operations.

"The APIARY team's achievement will allow us to rapidly adapt robotic systems to new tasks and environments," said NRL Senior Scientist for Robotics and Autonomous Systems Glen Henshaw, Ph.D. "What has previously taken years took only three months, proving that further advanced development is on the horizon. This swift innovation means more affordable robotic solutions for the U.S. Navy and Department of Defense."

Revolutionizing Space Robotics

The team, composed of NRL Space Roboticist Samantha Chapin, Ph.D., NRL Computer Research Scientist Kenneth Stewart, Ph.D., and NRL Computer Research Scientist Roxana Leontie, Ph.D., conducted the experiment aboard the International Space Station using the Astrobee robotic platform. Their project, known as the Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY), demonstrated that RL algorithms can control a robot in a zero-gravity environment.

"This research is significant because it marks, to our knowledge, the first autonomous robotic control in space using reinforcement learning algorithms," Stewart said. "We believe this breakthrough will build confidence in these algorithms for space applications and generate further interest in expanding this research."

A Scientific Breakthrough

The NRL APIARY experiment pioneers the use of RL for control of free-flying robots in the zero-gravity environment of space. RL can enable robots to perform very complex tasks, such as assembling large space telescopes and solar power beaming stations. The experiment demonstrates the transformative potential of RL for improving robotic autonomy in space exploration and logistics, and for developing new RL-based behaviors, tailored to real-time mission needs, in minutes to hours.

"With APIARY, we've definitively shown the viability of using reinforcement learning for space robot control," Chapin said. "This achievement is transformative, as it validates our ability to implement highly complex autonomous behaviors, paving the way for a new era of advanced robotic operations and services in orbit."

This test represents a major scientific breakthrough: successfully demonstrating complex robotics algorithms in the space environment despite its inherent difficulties. Within a mere five-minute window on the International Space Station, the algorithms performed flawlessly on their first attempt, a rare occurrence in robotics.

Inside the APIARY Experiment

Astrobee robots navigate using ducted fans, which are propellers enclosed in protective cages to prevent snagging. Equipped with multiple cameras, they provide diverse perspectives.

"The Astrobees are free-flying robots designed for operation within the International Space Station," Henshaw said. "In addition to acting as a platform for space robotics experimentation, these volleyball-sized robots can help provide NASA Mission Control with flexible camera views in areas lacking fixed cameras. This allows ground teams to remotely inspect equipment or monitor operations without requiring astronaut intervention, freeing up valuable crew time."
Beyond their imaging capability, the Astrobee robots have enabled experimenters to develop and test various software, as well as attach new sensors or payloads, providing a cost-effective and accessible platform for in-space research that would be challenging to conduct on larger, more expensive spacecraft. The robots' names are Bumble, Honey, and Queen.

"Our project's [APIARY] focus is on using free-flying robots for complex in-space assembly, manufacturing, and servicing," Chapin said. "Space robotics are currently in the early stages in terms of how complex autonomy is in space. It's a risk-averse environment where teleoperation by humans is still the norm for critical tasks."

While effective, this human-in-the-loop approach limits scalability, she added.

"To advance deep-space exploration and large-scale construction, we urgently need higher levels of robotic autonomy. The primary challenge is the lack of safe, accessible platforms for testing advanced algorithms without risking valuable space assets," Chapin said.

RL is a type of machine learning in which an agent, in this case a robot, is given a task to complete and is rewarded based on how well it completes that task. The agent is not instructed on how to complete the task but learns through trial and error to maximize its reward.

Overcoming the Sim-to-Real Challenge

RL typically requires a robot to interact extensively with its environment to learn. It takes an action, receives feedback from the environment, and assesses that feedback against a "reward function." This iterative process is impractical in space; one can't realistically send a robot up to space solely for training. For that reason, the team used NVIDIA's Omniverse, a highly accurate physics simulator that precisely models the physics of a zero-gravity environment. This allowed them to effectively "turn off" gravity, enabling the robot to maneuver as if in space. Because the simulation was so faithful to reality, the team successfully bridged the "sim-to-real" gap: training in a simulated zero-gravity environment translated directly to successful command and control of the robot in actual zero-gravity conditions.

"We specifically used the Proximal Policy Optimization algorithm, a method of deep reinforcement learning," Stewart said. "This means we employed deep neural networks, specifically in a parallel configuration. An 'actor network' trains the robot to perform actions like maneuvering, while a 'critic network' evaluates its performance. These two networks work together to efficiently train the robot to move effectively in a 3D, zero-gravity environment."

"I also implemented curriculum learning, an approach where the robot begins training in a simplified environment with an easier task," Stewart added. "We then gradually increase the complexity. For instance, the robot initially learned to move to a single, fixed position. Over time, we introduced increasing levels of randomization, preparing it to handle and adapt to greater variation. This progressive training substantially helped in bridging the 'sim-to-real' gap."

The APIARY team had an advantage: they had already been developing ways to perform RL using NVIDIA Omniverse. Team members had previously focused independently on manipulation with robotic arms and on locomotion with quadrupeds. As they came together to work on APIARY, they investigated different types of AI algorithms, such as RL, to enable greater robot autonomy.
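To make the approach Stewart describes concrete, the sketch below shows actor-critic PPO training with a simple curriculum on a toy "fly to a goal" task. It is a minimal illustration written against the open-source Gymnasium and Stable-Baselines3 libraries, not the team's actual NVIDIA Omniverse setup; the ZeroGTranslate environment, its mass and thrust constants, and the curriculum schedule are hypothetical stand-ins chosen only to show the pattern of zero-gravity dynamics, a distance-based reward, and goal randomization that widens as training progresses.

```python
# Illustrative sketch only: a toy zero-gravity goal-reaching task trained with PPO
# (actor + critic networks) and a simple curriculum. Not the APIARY team's code;
# class names, constants, and the schedule below are hypothetical.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import BaseCallback


class ZeroGTranslate(gym.Env):
    """Point-mass free-flyer: thrust along x/y/z, no gravity term in the dynamics."""

    def __init__(self, goal_radius=0.1, mass_kg=9.0, dt=0.1, max_steps=200):
        self.goal_radius, self.mass_kg, self.dt, self.max_steps = goal_radius, mass_kg, dt, max_steps
        # Observation: position (3), velocity (3), goal position (3)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(9,), dtype=np.float32)
        # Action: normalized thrust command along each axis
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def set_goal_radius(self, radius):
        self.goal_radius = float(radius)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.zeros(3, dtype=np.float32)
        self.vel = np.zeros(3, dtype=np.float32)
        # Curriculum knob: goals are sampled in a box whose size grows over training
        self.goal = self.np_random.uniform(-self.goal_radius, self.goal_radius, size=3).astype(np.float32)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        thrust = np.clip(action, -1.0, 1.0) * 0.1          # map command to newtons (toy scale)
        self.vel += (thrust / self.mass_kg) * self.dt       # a = F/m; no gravity acceleration added
        self.pos += self.vel * self.dt
        self.steps += 1
        dist = float(np.linalg.norm(self.goal - self.pos))
        # Reward: move toward the goal, lightly penalize thruster (fuel) use
        reward = -dist - 0.01 * float(np.square(action).sum())
        terminated = dist < 0.05
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return np.concatenate([self.pos, self.vel, self.goal]).astype(np.float32)


class GoalCurriculum(BaseCallback):
    """Widen the goal-sampling radius from 0.1 m to 1.0 m as training progresses."""

    def _on_step(self):
        fraction = min(self.num_timesteps / 200_000, 1.0)
        self.training_env.env_method("set_goal_radius", 0.1 + 0.9 * fraction)
        return True


if __name__ == "__main__":
    # "MlpPolicy" builds the actor and critic networks that PPO trains in tandem.
    model = PPO("MlpPolicy", ZeroGTranslate(), verbose=1)
    model.learn(total_timesteps=200_000, callback=GoalCurriculum())
    model.save("zero_g_translate_ppo")
```

In the real experiment the simulator was Omniverse rather than a hand-written point mass, and the policy had to command Astrobee's ducted fans and handle rotation as well as translation, but the actor-critic training loop and the gradually widening randomization follow the same pattern described above.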
By combining their expertise, the team was prepared to tackle training a robot to perform tasks.

"We specialize in reinforcement learning, a cutting-edge approach to robotic control," Leontie said. "This allows us to create highly adaptable control policies that are key to achieving greater robot autonomy and moving towards fully unsupervised operations. Applying our RL algorithms to free-flying robots is a direct outcome of our ongoing research, enabling us to advance their self-governance and precision."

Accelerated Timelines and High-Stakes Success

The team faced an exceptionally tight three-month deadline to conduct their experiment on the International Space Station with Astrobee, a stark contrast to the typical years-long timelines for space projects. This rapid turnaround was possible because Astrobee's hardware and software were already established. On their end, the NRL team had already developed RL techniques using NVIDIA Omniverse, with expertise in training various robots.

"We achieved success through a rigorous process of development, extensive simulation, and ground testing," Stewart said. "Our team held numerous meetings and continuously refined our code. Close collaboration with the Astrobee team was vital."

Limited Astrobee testing availability made the demonstration of space-based algorithms a challenge. However, the team secured valuable test time through the generous assistance of a fellow research group. This allowed them to successfully demonstrate their algorithms during Astrobee's final testing of this program phase aboard the International Space Station.

On Tuesday, May 27, the APIARY team successfully controlled the Bumble Astrobee on board the space station using reinforcement learning.

"The goal for free-flying robots in in-space assembly and servicing is to enable rapid, multi-client operations, like refueling or correcting deployment failures," Chapin said. "While current efforts, such as the Robotic Servicing of Geosynchronous Satellites [RSGS] project, largely rely on scripted maneuvers with limited autonomy for rendezvous and proximity operations due to their high-speed, contact-intensive nature, our research pushes for fuller autonomous capabilities. This unlocks rapid task adaptation and expands the operational scope of these essential space servicers."

The team concentrated on developing fundamental robotic motions critical for In-Space Assembly, Manufacturing, and Servicing (ISAM). While the broader objective encompasses highly complex ISAM operations, the constrained project timeline necessitated an initial focus on foundational maneuvers: translation, rotation, undocking, and docking.

Docking and undocking are central to ISAM operations. These robots need to be able to undock from a base to become free-flying, perform their tasks, and then reliably re-dock for recharging, maintenance, or data transfer. This ensures they can execute their missions and safely return.

"My research at NRL quickly identified Astrobee as a unique pathway to obtain real data from the space environment," Chapin said. "Traditional ground-based methods, from simulations to quadcopter tests, simply can't replicate space's intricacies. Collaborating with the Astrobee team, Dr. Kenneth Stewart and I spearheaded the project, developing simulations, conducting tests at NASA Ames, and ultimately executing an experiment on the International Space Station. This hands-on access to a genuine microgravity environment was indispensable for our scientific objectives."
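For the foundational maneuvers listed above, one common way to express "reach this position and orientation" to an RL agent is a reward built from a position error and a quaternion attitude error. The helper below is a hypothetical illustration of that idea, not the APIARY reward function; the weights and the scalar-last unit-quaternion convention are assumptions made for the example.

```python
# Hypothetical sketch of a pose-tracking reward for translation + rotation goals,
# the kind of objective that could score docking-style maneuvers. Not NRL's actual reward.
import numpy as np


def pose_reward(pos, quat, goal_pos, goal_quat, w_pos=1.0, w_att=0.5, w_effort=0.01, effort=0.0):
    """Return a higher (less negative) reward as the free-flyer's pose approaches the goal pose.

    pos, goal_pos:   3-vectors in meters.
    quat, goal_quat: unit quaternions (x, y, z, w) describing body attitude.
    effort:          optional thruster-effort term to penalize fuel use.
    """
    pos_err = float(np.linalg.norm(goal_pos - pos))
    # Angle between the two attitudes: theta = 2 * arccos(|<q_goal, q_current>|), in radians.
    dot = float(np.clip(abs(np.dot(quat, goal_quat)), 0.0, 1.0))
    att_err = 2.0 * np.arccos(dot)
    return -(w_pos * pos_err + w_att * att_err + w_effort * effort)


# Example: 0.5 m away from a docking target and rotated 90 degrees about the z axis.
pos, goal_pos = np.zeros(3), np.array([0.5, 0.0, 0.0])
quat = np.array([0.0, 0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)])   # 90 deg about z
goal_quat = np.array([0.0, 0.0, 0.0, 1.0])                          # goal: identity attitude
print(pose_reward(pos, quat, goal_pos, goal_quat))                  # roughly -1.29
```

Under this kind of formulation, undocking and re-docking can be posed as reaching a goal pose with near-zero residual velocity, leaving final capture to the docking mechanism itself.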
Implications for the Future of Space Autonomy

This immediate success provided critical validation for the team's entire simulation pipeline, which included training reinforcement learning algorithms in NVIDIA Omniverse, testing them on Astrobee simulators, and conducting pre-flight ground tests at NASA Ames' Granite Lab. The fact that these algorithms translated directly from simulated and ground-based environments to the unique conditions of space proves the pipeline's accuracy and reliability.

"Our experiment marked a momentous milestone: the first successful application of reinforcement learning to a free-flying robot in space," Leontie said. "This is particularly critical in the highly risk-averse space environment, where the immense cost of orbital assets often hinders the adoption of cutting-edge technologies. By demonstrating that RL can deliver safe and reliable control in a single, brief, on-orbit test, we've taken a crucial step toward building confidence in this advanced autonomy. This achievement is vital for accelerating the integration of RL into future space applications, ultimately enabling more complex and adaptable robotic missions."

This achievement significantly boosts confidence in developing and deploying even more advanced robotic manipulation and maneuvering techniques for future space applications, establishing a crucial precedent that simulation-to-space deployment is indeed viable.

"The APIARY team's demonstration that reinforcement learning enables autonomous systems to operate effectively in orbit proves the technology's viability and unlocks its potential across diverse domains," Henshaw said. "We're developing tools to rapidly model terrestrial, maritime, and undersea environments."

Henshaw offered a scenario: with just a few scans of a new location, the team can build a model and retrain a robot to operate in that environment in under an hour.

"This will allow warfighters in the field to define new tasks and environments, and then have the robot train itself to solve those problems," he said. "Reinforcement learning provides flexibility and potential to control robots across domains, from space to the ground, and from ships to underwater. Our vision is to equip warfighters with the power to adapt robots to any environment and any task, on demand."

NRL is greatly appreciative of the partnerships that arise from joint research activities and knowledge sharing with colleagues working in civil space at NASA. By teaming the best of national security space research and civil space research, we can help advance our national capabilities to the benefit of all. NASA's willingness to have guest researchers participate in research and development activities aboard the International Space Station is invaluable to advancing what we can all accomplish together.

About the U.S. Naval Research Laboratory

NRL is a scientific and engineering command dedicated to research that drives innovative advances for the U.S. Navy and Marine Corps from the seafloor to space and in the information domain. NRL is located in Washington, D.C., with major field sites at Stennis Space Center, Mississippi; Key West, Florida; and Monterey, California, and employs approximately 3,000 civilian scientists, engineers and support personnel.

For more information, contact NRL Corporate Communications at (202) 480-3746 or [email protected].

###