Mobile ALOHA: Prepare to have your mind blown


Alan Timm

Jan 4, 2024, 10:24:31 PM
to RSSC-List
You wanna know what the next step in deep learning robotics is going to look like?  Here's a peek.  Sometimes it's a little unclear in the videos what is teleoperated and what is autonomous, but wow!


Mobile ALOHA
Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Abstract
Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet.
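The co-training idea from the abstract — mixing the new mobile-manipulation demos with the existing static ALOHA data in each training batch — can be sketched roughly like this. The 50/50 mixing ratio, dataset sizes, and toy (source, index) tuples standing in for (observation, action) pairs are all assumptions for illustration, not the paper's actual settings:

```python
import random

random.seed(0)

# Illustrative stand-ins for demonstration datasets.
# The paper uses ~50 demos per mobile task plus existing static ALOHA data.
mobile_data = [("mobile_obs", i) for i in range(50)]
static_data = [("static_obs", i) for i in range(800)]

def cotraining_batch(batch_size=16, mobile_fraction=0.5):
    """Sample one training batch drawn from both demonstration sources.

    mobile_fraction=0.5 is an assumed mixing ratio for illustration;
    the behavior-cloning loss is then computed on the combined batch.
    """
    n_mobile = int(batch_size * mobile_fraction)
    batch = random.choices(mobile_data, k=n_mobile)
    batch += random.choices(static_data, k=batch_size - n_mobile)
    random.shuffle(batch)  # interleave the two sources within the batch
    return batch

batch = cotraining_batch()
```

The point of the mix is that the large static dataset regularizes the policy while the small mobile dataset teaches the new whole-body behavior.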

Alan Downing

Jan 22, 2024, 10:18:41 PM
to Alan Timm, RSSC-List, hbrob...@googlegroups.com
Hi Alan T.,

FYI, you can buy your own Aloha hardware from Trossen:

Alternatively, you can use the pre-trained Aloha model (called Octo) and fine-tune it so that you can use it to control your own robot arm.  The repository for the Octo model is:

I currently have my old WidowX robot arm being controlled by the DeepMind RT-X model:

Unfortunately, without fine-tuning on a dataset for my robot arm configuration, the RT-X models with my robot arm don't really succeed at completing the language instructions (like "push the white cup next to the black fork").  My next step is to gather a small dataset to fine-tune the Octo model (which is also trained on a subset of the RT-X dataset) and see if the arm has a much higher success rate at performing the language instructions.
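For a rough sense of what fine-tuning on a small demo set optimizes, here's a toy behavior-cloning sketch with a linear policy on synthetic numpy data. The 8-D state, 6-DoF action, learning rate, and step count are arbitrary assumptions; real Octo fine-tuning trains a transformer on camera images through the Octo codebase, not anything like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in demos: (state, action) pairs from teleoperating the arm.
states = rng.normal(size=(64, 8))     # assumed 8-D proprioceptive state
true_w = rng.normal(size=(8, 6))      # unknown "expert" mapping
actions = states @ true_w             # assumed 6-DoF action targets

# Behavior cloning = supervised regression from states to demonstrated actions.
w = np.zeros((8, 6))                  # pre-trained weights would go here instead
lr = 0.05
for _ in range(500):
    pred = states @ w
    grad = states.T @ (pred - actions) / len(states)  # MSE gradient
    w -= lr * grad

mse = float(np.mean((states @ w - actions) ** 2))
```

The intuition carries over: even a few dozen demos can pull a policy toward your particular arm's configuration, which is exactly what the RT-X models are missing without fine-tuning.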

Thanks,
Alan D.




Alan Timm

Jan 23, 2024, 11:03:22 AM
to RSSC-List
Interesting...

Aside from the obvious success of that approach, there was something else that I didn't notice at first.

This was accomplished using bog-standard serial servos.  No BLDCs. 

Yeah, these are Dynamixels, which are pretty expensive, but there are cheap Chinese knockoffs that work just the same.

But...

You get:
  • a 12-bit position encoder (0–4095, ≈0.088° resolution),
  • an inferred force through a high-gear-ratio gearbox, via current measurement,
  • a few settings for stiffness, max speed, etc.
That's it.  Maybe we shouldn't be so quick to count out servobots after all...
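For what those encoder counts work out to in practice, here's a quick sketch of the counts-to-degrees conversion. It assumes a 12-bit encoder over a full revolution (4096 counts, as on Dynamixel X-series actuators); older 10-bit models instead use 1024 counts over a 300° range:

```python
# Assumption: 12-bit, full-revolution encoder (Dynamixel X-series style).
COUNTS_PER_REV = 4096
DEG_PER_COUNT = 360.0 / COUNTS_PER_REV  # ~0.088 degrees per count

def counts_to_degrees(counts: int) -> float:
    """Convert a raw encoder reading (0..4095) to an angle in degrees."""
    return counts * DEG_PER_COUNT

def degrees_to_counts(deg: float) -> int:
    """Convert a goal angle in degrees to the nearest encoder count."""
    return round(deg / DEG_PER_COUNT) % COUNTS_PER_REV
```

So a 90° goal position lands on count 1024 — fine-grained enough that position control, not encoder resolution, is the limiting factor on these servos.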

Gmail

Jan 23, 2024, 12:53:50 PM
to RSSC-List
I don’t see the torque listed. Either way, it’s too rich for my blood!



Thomas

-
Want to learn more about ROBOTS?