SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks

ICLR 2025 Spotlight

1NVIDIA Corporation, 2University of Southern California, 3University of Washington

Abstract

Enabling robots to learn novel tasks in a data-efficient manner is a long-standing challenge. Common strategies involve carefully leveraging prior experiences, especially transition data collected on related tasks. Although much progress has been made on general pick-and-place manipulation, far fewer studies have investigated contact-rich assembly tasks, where precise control is essential.

We introduce SRSA (Skill Retrieval and Skill Adaptation), a novel framework designed to address this problem by utilizing a pre-existing skill library containing policies for diverse assembly tasks. The challenge lies in identifying which skill from the library is most relevant for fine-tuning on a new task. Our key hypothesis is that skills showing higher zero-shot success rates on a new task are better suited for rapid and effective fine-tuning on that task. To this end, we propose to predict the transfer success for all skills in the skill library on a novel task, and then use this prediction to guide the skill retrieval process. We establish a framework that jointly captures features of object geometry, physical dynamics, and expert actions to represent the tasks, allowing us to efficiently learn the transfer success predictor.

Extensive experiments demonstrate that SRSA significantly outperforms the leading baseline. When retrieving and fine-tuning skills on unseen tasks, SRSA achieves a 19% relative improvement in success rate, exhibits 2.6x lower standard deviation across random seeds, and requires 2.4x fewer transition samples to reach a satisfactory success rate, compared to the baseline. In a continual learning setup, SRSA efficiently learns policies for new tasks and incorporates them into the skill library, enhancing future policy learning. Furthermore, policies trained with SRSA in simulation achieve a 90% mean success rate when deployed in the real world.

5-min Overview Video with Narration.

Problem Setup

In this work, we consider the problem setting of solving a new target task leveraging pre-existing skills from a skill library. We focus on two-part assembly tasks as shown below.

Given a target task, we assume access to a prior task set. The skill library contains policies that solve each of the prior tasks, respectively. To solve a target task, the goal of reinforcement learning is to find a policy that produces an action for each state to maximize the expected return. We propose to first retrieve a skill (i.e., policy) for the most relevant prior task, and then rapidly and effectively adapt to the target task by fine-tuning the retrieved skill.
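For concreteness, in standard RL notation (ours, for illustration; not taken verbatim from the paper), the target-task objective and the warm-start from the retrieved skill can be written as:

$$
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi,\,T}\!\left[\sum_{t=0}^{H-1} \gamma^{t}\, r(s_t, a_t)\right],
\qquad
\pi_{\theta}^{(0)} \;\leftarrow\; \pi_{\text{retrieved}},
$$

where $r$ is the target task's reward, $\gamma$ the discount factor, $H$ the episode horizon, and $\pi_{\text{retrieved}}$ the policy retrieved from the skill library.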

Skill Retrieval

To effectively retrieve the skills that are useful for a new target task $T$, we require a means to estimate the potential of applying a source policy to the target task. We are inspired by two intuitive points:

  • If applying the same policy to two tasks yields similar success rates, the tasks likely share similar dynamics and initial state distributions.
  • Fine-tuning a source policy on a target task whose dynamics are similar to the source task's is likely to be efficient.
Therefore, we propose using zero-shot transfer success as a metric to gauge how efficiently a source policy can be adapted to a target task. To identify a source policy with high zero-shot transfer success on a given target task, we learn a function $F$ that predicts the zero-shot transfer success for any pair of source policy and target task.
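In symbols (notation ours), with $\pi_i$ denoting the skill trained on prior task $T_i$, the predictor $F$ is trained to approximate the zero-shot success rate:

$$
F(\pi_i, T) \;\approx\; S(\pi_i, T) \;=\; \Pr\big[\text{success in } T \,\big|\, a_t \sim \pi_i(\cdot \mid s_t),\; s_0 \sim \rho_T\big],
$$

where $\rho_T$ is the initial-state distribution of the target task $T$ and no fine-tuning is performed.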

1. Dataset Formulation to Learn Transfer Success Predictor

We treat any two tasks from the prior task set as a source-target task pair. For each pair, we evaluate the source policy on the target task to obtain the zero-shot transfer success rate. The transfer success predictor takes the information about source and target tasks as input, and outputs the zero-shot transfer success.
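As a rough sketch of how such a dataset could be assembled (function names, argument types, and the episode count below are our placeholders, not the paper's code):

```python
import itertools
from typing import Callable, Dict, List, Tuple

def build_transfer_dataset(
    prior_tasks: Dict[str, object],      # task id -> simulation environment
    skill_library: Dict[str, object],    # task id -> trained policy for that task
    evaluate_fn: Callable[[object, object, int], float],  # (policy, env, n_episodes) -> success rate
    n_eval_episodes: int = 100,
) -> List[Tuple[str, str, float]]:
    """Collect (source task, target task, zero-shot success rate) tuples by
    rolling out every prior skill on every prior task without fine-tuning."""
    dataset = []
    for src_id, tgt_id in itertools.product(prior_tasks, repeat=2):
        success_rate = evaluate_fn(skill_library[src_id], prior_tasks[tgt_id], n_eval_episodes)
        dataset.append((src_id, tgt_id, success_rate))
    return dataset
```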

2. Learning Task Features for Transfer Success Predictor

We need a strong featurization of both the source and target tasks to learn the transfer success predictor efficiently. We propose a framework that jointly captures features of geometry, dynamics, and expert actions to represent the tasks; a minimal sketch of the three encoders follows the list below.

  • (a) Geometry features are learned from point-cloud input using a PointNet autoencoder.
  • (b) Dynamics features are learned from transition segments using a state-prediction objective.
  • (c) Expert-action features are learned from transition segments using an action-reconstruction objective.
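The sketch below shows one way these encoders could be instantiated in PyTorch. The architectures, dimensions, and training heads are our own assumptions for illustration; the paper's actual networks may differ. The geometry autoencoder is trained with a point-cloud reconstruction loss, while the dynamics and expert-action features come from two separate segment encoders trained through their respective prediction heads.

```python
import torch
import torch.nn as nn

class PointNetAutoencoder(nn.Module):
    """(a) Geometry: encode the assembly point cloud; trained to reconstruct it."""
    def __init__(self, feat_dim=128, n_points=1024):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, feat_dim, 1),
        )
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                     nn.Linear(512, n_points * 3))

    def encode(self, points):                      # points: (B, N, 3)
        x = self.encoder(points.transpose(1, 2))   # (B, feat_dim, N)
        return x.max(dim=2).values                 # global max-pool -> (B, feat_dim)

    def forward(self, points):
        z = self.encode(points)
        return self.decoder(z).view(points.shape[0], -1, 3), z   # reconstruction, feature


class SegmentEncoder(nn.Module):
    """Backbone for (b) and (c): embed a transition segment with a GRU
    (instantiated separately for the dynamics and expert-action features)."""
    def __init__(self, state_dim, action_dim, feat_dim=128):
        super().__init__()
        self.gru = nn.GRU(state_dim + action_dim, feat_dim, batch_first=True)

    def forward(self, states, actions):            # (B, T, state_dim), (B, T, action_dim)
        _, h = self.gru(torch.cat([states, actions], dim=-1))
        return h.squeeze(0)                        # (B, feat_dim)


class DynamicsHead(nn.Module):
    """(b) Dynamics: train the segment feature with a next-state prediction objective."""
    def __init__(self, state_dim, action_dim, feat_dim=128):
        super().__init__()
        self.head = nn.Linear(feat_dim + state_dim + action_dim, state_dim)

    def forward(self, z, states, actions):         # predict s_{t+1} from (z, s_t, a_t)
        zt = z.unsqueeze(1).expand(-1, states.shape[1], -1)
        return self.head(torch.cat([zt, states, actions], dim=-1))


class ActionHead(nn.Module):
    """(c) Expert actions: train the segment feature to reconstruct expert actions."""
    def __init__(self, state_dim, action_dim, feat_dim=128):
        super().__init__()
        self.head = nn.Linear(feat_dim + state_dim, action_dim)

    def forward(self, z, states):                  # predict a_t from (z, s_t)
        zt = z.unsqueeze(1).expand(-1, states.shape[1], -1)
        return self.head(torch.cat([zt, states], dim=-1))
```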


3. Training Transfer Success Predictor

For each task, the geometry, dynamics, and expert-action features are concatenated to form its task feature. The source and target task features are then passed through an MLP to predict the transfer success.
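A minimal sketch of the predictor, assuming each task feature is the concatenation of the three 128-dimensional features from the sketch above and that the success rate is regressed with an MSE loss (both are our assumptions):

```python
import torch
import torch.nn as nn

class TransferSuccessPredictor(nn.Module):
    """MLP mapping concatenated (source, target) task features to a predicted
    zero-shot transfer success rate in [0, 1]."""
    def __init__(self, task_feat_dim, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * task_feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, source_feat, target_feat):
        return self.mlp(torch.cat([source_feat, target_feat], dim=-1)).squeeze(-1)

predictor = TransferSuccessPredictor(task_feat_dim=3 * 128)  # geometry + dynamics + expert-action
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
# Training loop over the source-target pairs collected from the prior task set:
# for source_feat, target_feat, success_rate in dataloader:
#     loss = loss_fn(predictor(source_feat, target_feat), success_rate)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```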


4. Inferring Transfer Success for Retrieval

At test time, we use the trained transfer success predictor to estimate the transfer success of applying each prior policy to the new task. We then retrieve the source policies with the highest predicted transfer success.
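At a high level, retrieval could then look like the following (a hypothetical helper around the predictor; `prior_task_feats` and `new_task_feat` are assumed to be precomputed with the encoders above):

```python
import torch

@torch.no_grad()
def retrieve_skills(predictor, skill_library, prior_task_feats, new_task_feat, top_k=1):
    """Score every prior skill on the new task with the trained predictor and
    return the skill(s) with the highest predicted zero-shot transfer success."""
    scores = {
        task_id: predictor(prior_task_feats[task_id].unsqueeze(0),
                           new_task_feat.unsqueeze(0)).item()
        for task_id in skill_library
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(tid, skill_library[tid], scores[tid]) for tid in ranked[:top_k]]
```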

We compare our approach (SRSA) with the baseline retrieval strategies (Signature, Behavior, Forward, Geometry), as shown in the figure below. Overall, SRSA retrieves source policies that obtain around 10% higher success rates on the test tasks.


Skill Adaptation

Our ultimate goal is to solve the new task as an RL problem. The retrieved skill is used to initialize the policy network, which we then fine-tune on the target task with proximal policy optimization (PPO) and self-imitation learning; a minimal sketch of this fine-tuning loop follows the list of settings below. We compare SRSA with the leading baseline, AutoMate, for learning specialist policies for two-part assembly tasks. We consider two settings:

  • Dense-reward setting: includes a reward term that imitates reversed disassembly demonstrations, as well as a curriculum.
  • Sparse-reward setting: provides a nonzero reward only upon task success, emulating real-world RL fine-tuning, where dense-reward information is much harder to acquire.
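A high-level sketch of the adaptation loop (the rollout, PPO, and self-imitation routines are passed in as placeholders; this is our reading of the procedure, not the paper's code):

```python
import copy

def finetune_retrieved_skill(env, retrieved_policy, collect_rollouts,
                             ppo_update, sil_update, n_epochs=1000):
    """Fine-tune the retrieved skill on the target task.

    `collect_rollouts`, `ppo_update`, and `sil_update` stand in for standard
    on-policy rollout collection, a clipped-PPO update, and a self-imitation
    update on the policy's own successful trajectories.
    """
    policy = copy.deepcopy(retrieved_policy)   # warm-start from the retrieved skill
    success_buffer = []                        # buffer of successful episodes

    for _ in range(n_epochs):
        trajectories = collect_rollouts(policy, env)               # on-policy data
        ppo_update(policy, trajectories)                           # RL objective
        success_buffer.extend(t for t in trajectories if t.success)
        if success_buffer:
            sil_update(policy, success_buffer)                     # imitate past successes
    return policy
```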
The learning curves on the set of test tasks show that SRSA achieves strong performance with fewer training epochs and greater stability. Quantitatively, in the dense-reward setting, SRSA reaches an average success rate of 82.6% on 10 test tasks, outperforming AutoMate (69.4%), a 19% relative improvement. SRSA is also more stable: AutoMate exhibits a 2.6x higher standard deviation across random seeds. In the sparse-reward setting, SRSA delivers a notable 135% relative improvement in average success rate over the baseline.


SRSA Policy Deployment in Simulation

We deploy the trained SRSA policies in simulation on 5 distinct assemblies and show the policy performance in the videos below:

Assembly IDs: 01029, 01129, 01136, 01053, 01079.

SRSA vs. AutoMate in the Real World

We deploy the trained SRSA policies in the real world and directly compare them to AutoMate.

The experimental setup is the same as in AutoMate: we place the robot in lead-through (a.k.a. manual guide) mode, grasp a plug, guide it into the socket, and record the pose as the target pose. We then programmatically lift the plug until it is free from contact; apply an $xy$ perturbation of $\pm 10$ mm, a $z$ perturbation of $15 \pm 5$ mm, and a yaw perturbation of $\pm 5^{\circ}$; apply $x$, $y$, and $z$ observation noise of $\pm 2$ mm each; and deploy a policy.
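For reference, the randomization described above could be sampled as follows (a small illustrative snippet; only the numeric ranges come from the text):

```python
import numpy as np

def sample_initial_perturbation(rng: np.random.Generator):
    """Sample the randomized initial offset and observation noise; the numeric
    ranges follow the protocol above, the implementation is illustrative."""
    dx, dy = rng.uniform(-10.0, 10.0, size=2)    # xy perturbation: +/- 10 mm
    dz = 15.0 + rng.uniform(-5.0, 5.0)           # z perturbation: 15 +/- 5 mm
    dyaw = rng.uniform(-5.0, 5.0)                # yaw perturbation: +/- 5 deg
    obs_noise = rng.uniform(-2.0, 2.0, size=3)   # x, y, z observation noise: +/- 2 mm each
    return (dx, dy, dz), dyaw, obs_noise

# Example: offset, yaw, noise = sample_initial_perturbation(np.random.default_rng(0))
```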

We show the comparison between SRSA policies and AutoMate policies on 5 distinct assemblies in the videos below.

Frequently Asked Questions