Robotics research has crossed a threshold. The same transformer architectures that rewrote language and vision are now producing general-purpose robot policies that can follow natural language instructions, generalize across tasks, and improve with more data. RT-2, pi-zero, Octo, and a growing list of vision-language-action models are turning what used to be hand-crafted pipelines into foundation-model calls.
But robotics still has a tooling problem. Running a serious experiment requires a physics engine, a rendering pipeline, a policy training framework, a dataset manager, and usually a GPU. Each piece has its own install story and version conflicts. The result is that students and small teams spend weeks getting the infrastructure to work before they write any real code.
SciRouter's robotics API collapses the stack. You describe the task, pick a robot model and a policy, and the API runs the simulation and returns results. No MuJoCo install, no CUDA, no policy checkpoints to download.
MuJoCo: the default physics engine
MuJoCo (Multi-Joint dynamics with Contact) has been the research community's favorite physics engine for over a decade. It is fast, numerically stable, and handles the contact-rich situations that break most rigid-body simulators. DeepMind acquired it in 2021 and open sourced it in 2022, which removed the last barrier to adoption.
Almost every recent robot manipulation paper uses MuJoCo. Its MJX backend runs on TPUs and GPUs through JAX, which makes it possible to run thousands of parallel simulations for reinforcement learning. Teams that need to train a policy fast use MJX; teams that need cleaner contact modeling use the native CPU backend.
LeRobot: the open source robotics community
LeRobot is Hugging Face's robotics framework. Launched in 2024 and with over 10,000 GitHub stars as of early 2026, it has become the gravity well for open source robot learning. A few things make it distinct:
- Standardized datasets. A unified schema for robot demonstrations so policies trained on one dataset can be evaluated on another without glue code.
- Affordable hardware recipes. Tutorials and firmware for building an SO-100 robot arm for a few hundred dollars, making research accessible to students.
- Pretrained policies. Open weights for baselines like ACT, Diffusion Policy, and Octo that you can fine-tune or evaluate directly.
- Tight MuJoCo integration. The library sits on top of MuJoCo and standard simulators so the same code runs in sim and on hardware.
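To make the "standardized datasets" point concrete, here is a minimal sketch of what a per-frame record in a LeRobot-style dataset looks like. The dotted-key convention is real, but the specific field values and the camera key below are illustrative, not copied from any particular dataset:

```python
# Illustrative per-frame record in a LeRobot-style dataset.
# Dotted keys follow LeRobot's naming convention; exact keys vary by dataset.
frame = {
    "observation.state": [0.12, -0.45, 0.30, 0.00, 1.57, 0.80],  # joint positions
    "observation.images.top": "frame_000042.png",                # camera frame reference
    "action": [0.10, -0.40, 0.32, 0.00, 1.55, 0.85],             # commanded joint targets
    "episode_index": 3,
    "frame_index": 42,
    "timestamp": 1.40,                                           # seconds from episode start
}

def episode_length_s(frames, fps=30):
    """Duration implied by frame count at a fixed capture rate."""
    return len(frames) / fps
```

Because every dataset shares this frame-level shape, evaluation code written against one dataset transfers to another without per-dataset glue.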
Vision-language-action foundation models
The most exciting development in robotics over the past two years is the rise of vision-language-action (VLA) models. These are transformers that take a natural language instruction and one or more camera images as input and output robot actions directly. Three names to know:
RT-2 from Google DeepMind
RT-2 was one of the first large VLA models to show strong generalization. It fine-tunes a pretrained vision-language backbone (PaLI-X or PaLM-E) on robot data, with actions encoded as tokens alongside language. The result was a policy that could follow novel instructions it had never seen during training, because the language backbone carried over real-world knowledge.
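"Actions encoded as tokens" usually means uniform binning: each continuous action dimension is clipped to a range and mapped to one of N discrete bins, which the transformer emits like any other token. A minimal sketch, assuming 256 bins over [-1, 1] (the bin count and range here are illustrative, not RT-2's exact configuration):

```python
def tokenize_action(action, num_bins=256, low=-1.0, high=1.0):
    """Discretize each continuous action dimension into a bin index
    by uniform binning over [low, high], clipping out-of-range values."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)
        # map [low, high] -> [0, num_bins - 1], rounding to the nearest bin
        idx = int((a - low) / (high - low) * (num_bins - 1) + 0.5)
        tokens.append(idx)
    return tokens

def detokenize_action(tokens, num_bins=256, low=-1.0, high=1.0):
    """Inverse map: bin index back to the bin's representative value."""
    return [low + t / (num_bins - 1) * (high - low) for t in tokens]
```

The round-trip error is at most half a bin width, which is why coarse discretization produces the jittery trajectories that flow-matching models like pi-zero avoid.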
pi-zero from Physical Intelligence
Physical Intelligence (the robotics-focused startup) released pi-zero, a VLA model trained on a large multi-robot demonstration corpus. pi-zero is designed for long-horizon manipulation tasks and uses flow matching for action prediction, which gives smoother trajectories than discretized token outputs.
Octo from Berkeley
Octo is an open source VLA transformer trained on the Open X-Embodiment dataset, which spans dozens of robot embodiments and hundreds of thousands of trajectories. Because it is open, Octo has become the default starting point for teams building on VLA methods without access to proprietary data.
A simulation call
Here is what a MuJoCo simulation call looks like. You specify the robot model, the task, and the policy, and the API returns the trajectory plus success metrics.
```python
import httpx

API_KEY = "sk-sci-..."
BASE = "https://scirouter.ai/v1"

response = httpx.post(
    f"{BASE}/robotics/simulate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "engine": "mujoco",
        "robot": "franka-panda",
        "task": "pick-and-place",
        "policy": "octo-base",
        "instruction": "pick up the red block and place it on the blue pad",
        "num_episodes": 10,
        "max_steps": 300,
        "render": True,
        "domain_randomization": True,
    },
    timeout=600,
)

result = response.json()
print(f"Success rate: {result['success_rate']:.1%}")
print(f"Mean episode length: {result['mean_steps']}")
print(f"Video URL: {result['video_url']}")
```

The call runs 10 episodes in a few minutes and returns success rates, trajectory data, and optionally a rendered video for each episode. Because the backend uses MJX on GPU, parallelization across episodes is free.
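The headline metrics are simple aggregates over per-episode records. A sketch of that aggregation, assuming a hypothetical per-episode schema (each episode a dict with a `success` bool and a `steps` count — the actual response fields beyond those printed above are not documented here):

```python
def summarize(episodes):
    """Aggregate per-episode records into headline metrics.
    Assumes each record has a 'success' bool and a 'steps' int."""
    n = len(episodes)
    successes = sum(1 for e in episodes if e["success"])
    return {
        "success_rate": successes / n,
        "mean_steps": sum(e["steps"] for e in episodes) / n,
    }

episodes = [
    {"success": True, "steps": 180},
    {"success": True, "steps": 210},
    {"success": False, "steps": 300},
    {"success": True, "steps": 195},
]
summary = summarize(episodes)
```

Keeping per-episode records around (rather than only the aggregate) matters when you want confidence intervals or per-instruction breakdowns later.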
Task libraries and benchmarks
The API ships with hundreds of prebuilt tasks covering the main robot learning benchmarks:
- Meta-World and Robosuite for classical manipulation benchmarks.
- LIBERO for long-horizon lifelong learning.
- BEHAVIOR-1K for household-scale tasks with realistic physics.
- DeepMind Control Suite for locomotion and continuous control.
- Isaac Gym environments for high-throughput RL.
Researchers can evaluate a new policy across multiple benchmarks in parallel with a single API call, which is the kind of reproducible comparison the field needs.
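A cross-benchmark evaluation reduces to one request body per suite. The sketch below shows payload construction only; the `suite` field and suite identifiers are illustrative, not a documented API contract:

```python
# Illustrative benchmark identifiers; real suite names may differ.
BENCHMARKS = ["meta-world", "robosuite", "libero", "behavior-1k", "dm-control"]

def build_eval_payloads(policy, num_episodes=50):
    """One request body per benchmark suite, ready to POST in parallel."""
    return [
        {
            "engine": "mujoco",
            "suite": suite,
            "policy": policy,
            "num_episodes": num_episodes,
        }
        for suite in BENCHMARKS
    ]
```

Because every payload is identical except the suite, the comparison across benchmarks is apples-to-apples by construction.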
Sim-to-real and domain randomization
Policies trained in simulation rarely work out of the box on real hardware. Visual appearance differs. Friction and mass are not perfectly matched. Sensor noise is different. The standard solution is domain randomization: train with varied physics (friction, mass, damping), varied visuals (lighting, textures), and varied sensor noise, so the policy learns to be robust to the gap. The API exposes a domain randomization scheduler that lets you specify distributions for each parameter.
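The scheduler boils down to drawing a fresh setting for each randomized parameter at the start of every episode. A minimal sketch, with illustrative parameter names and ranges (real distributions would be tuned per task, and the API's actual parameter names are not shown here):

```python
import random

# Illustrative randomization ranges: (low, high) per parameter.
RANDOMIZATION = {
    "friction":     (0.5, 1.5),   # multiplier on nominal contact friction
    "mass_scale":   (0.8, 1.2),   # multiplier on link masses
    "damping":      (0.9, 1.1),   # multiplier on joint damping
    "light_angle":  (0.0, 360.0), # degrees, visual randomization
    "sensor_noise": (0.0, 0.02),  # std-dev of additive observation noise
}

def sample_episode_params(rng=random):
    """Draw one uniform sample per parameter for the next episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANDOMIZATION.items()}
```

Training across these perturbations forces the policy to rely on cues that survive the sim-to-real gap rather than on any one simulator's exact constants.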
More advanced approaches include privileged learning (train a teacher with full observability in sim, then distill to a student that matches the hardware observation space) and real-world fine-tuning with small data. The API supports the full menu so you can experiment with different recipes without building the infrastructure yourself.
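The privileged-learning recipe hinges on one data-plumbing step: the teacher acts on full sim state, but the student's training pairs use only observations the hardware will have. A toy sketch, with stand-in functions and field names invented for illustration:

```python
def teacher_policy(full_state):
    """Stand-in for a teacher trained with privileged sim state
    (exact object poses, contact forces) unavailable on real hardware.
    Toy behavior: reach just above the object."""
    x, y, z = full_state["object_pos"]
    return [x, y, z - 0.05]

def build_distillation_set(episodes):
    """Pair the student's hardware-matched observation (a camera frame id
    here) with the teacher's action, so the student can be trained by
    behavior cloning on inputs it will actually see at deployment."""
    return [
        (step["camera"], teacher_policy(step["state"]))
        for ep in episodes
        for step in ep
    ]
```

The teacher is easy to train because it sees everything; distillation then transfers its competence into the restricted observation space.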
Dataset collection and playback
Robot learning is data-hungry. Collecting demonstrations used to require physical teleoperation rigs; modern practice is to collect a small seed set on hardware, augment it with sim-generated data, and fine-tune with behavior cloning. The API supports each stage of that loop:
- Dataset playback. Play back any LeRobot-format dataset through the simulator to verify it imports cleanly.
- Synthetic generation. Use a scripted expert or a pretrained policy to generate thousands of synthetic demonstrations for a task.
- Format conversion. Convert between LeRobot, Robosuite, and Hugging Face datasets for cross-framework training.
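At the record level, format conversion is mostly key renaming. A simplified sketch of mapping a flat Robosuite-style step into LeRobot-style dotted keys (the key names below are illustrative; real converters also handle image encoding, fps metadata, and episode boundaries):

```python
# Illustrative key mapping from flat Robosuite-style names
# to LeRobot-style dotted keys.
KEY_MAP = {
    "robot_qpos": "observation.state",
    "agentview_image": "observation.images.agentview",
    "actions": "action",
}

def to_lerobot(record):
    """Rename known keys into LeRobot's dotted convention,
    passing unknown keys through unchanged."""
    return {KEY_MAP.get(k, k): v for k, v in record.items()}
```

Passing unknown keys through unchanged keeps the conversion lossless: anything the target schema does not name is still available downstream.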
Who is using the robotics API
Early users span a surprising range:
- University research groups evaluating new VLA policies on standard benchmarks.
- Student robotics clubs training policies before buying hardware.
- Startups building manipulation skills for pick-and-place robots.
- Course instructors providing reproducible environments for robot learning classes without requiring every student to have a GPU.
Getting started
The fastest path is Robotics Lab, the web interface that lets you pick a robot, a task, and a policy, and watch a simulation run in real time. Once you have a configuration you like, the lab exposes the corresponding API call for programmatic use.
For production research, the Python SDK handles batching (run hundreds of episodes in parallel), retries, and result aggregation. Using the API instead of a local MuJoCo install means your results are immediately reproducible: anyone with the call payload can reproduce your numbers exactly.
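The batching layer is conceptually simple: fan independent episodes out across workers, then aggregate. A sketch of that pattern with the standard library's thread pool, using a stub in place of the real HTTP call (the SDK's actual interface is not shown here):

```python
from concurrent.futures import ThreadPoolExecutor

def run_episode(seed):
    """Stub for one API episode; a real client would POST the simulate
    request with this seed and return the parsed response. The toy
    success rule below exists only to make the example deterministic."""
    return {"seed": seed, "success": seed % 3 != 0}

def run_batch(num_episodes, max_workers=8):
    """Fan episodes out across worker threads and aggregate the success
    rate, mirroring what an SDK batching layer does around HTTP calls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_episode, range(num_episodes)))
    return sum(r["success"] for r in results) / num_episodes
```

Threads are the right fit here because each episode is I/O-bound (waiting on the API), so a small pool keeps hundreds of requests in flight without multiprocessing overhead.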