Robotics research has crossed a threshold. The same transformer architectures that rewrote language and vision are now producing general-purpose robot policies that can follow natural language instructions, generalize across tasks, and improve with more data. RT-2, pi-zero, Octo, and a growing list of vision-language-action models are turning what used to be hand-crafted pipelines into foundation-model calls.
But robotics still has a tooling problem. Running a serious experiment requires a physics engine, a rendering pipeline, a policy training framework, a dataset manager, and usually a GPU. Each piece has its own install story and version conflicts. The result is that students and small teams spend weeks getting the infrastructure to work before they write any real code.
SciRouter's robotics API collapses the stack. You describe the task, pick a robot model and a policy, and the API runs the simulation and returns results. No MuJoCo install, no CUDA, no policy checkpoints to download.
MuJoCo: the default physics engine
MuJoCo (Multi-Joint dynamics with Contact) has been the research community's favorite physics engine for over a decade. It is fast, numerically stable, and handles the contact-rich situations that break most rigid-body simulators. DeepMind acquired it in 2021 and open sourced it in 2022, which removed the last barrier to adoption.
Almost every recent robot manipulation paper uses MuJoCo. Its MJX backend runs on TPUs and GPUs through JAX, which makes it possible to run thousands of parallel simulations for reinforcement learning. Teams that need to train a policy fast use MJX; teams that need cleaner contact modeling use the native CPU backend.
LeRobot: the open source robotics community
LeRobot is Hugging Face's robotics framework. Launched in 2024 and with over 10,000 GitHub stars as of early 2026, it has become the gravity well for open source robot learning. A few things make it distinct:
- Standardized datasets. A unified schema for robot demonstrations so policies trained on one dataset can be evaluated on another without glue code.
- Affordable hardware recipes. Tutorials and firmware for building an SO-100 robot arm for a few hundred dollars, making research accessible to students.
- Pretrained policies. Open weights for baselines like ACT, Diffusion Policy, and Octo that you can fine-tune or evaluate directly.
- Tight MuJoCo integration. The library sits on top of MuJoCo and standard simulators so the same code runs in sim and on hardware.
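To make the "standardized datasets" point concrete, here is a minimal sketch of what a per-frame record in a LeRobot-style dataset looks like. The dotted-key convention is real, but the specific field values and the camera key below are illustrative, not copied from any particular dataset:

```python
# Illustrative per-frame record in a LeRobot-style dataset.
# Dotted keys follow LeRobot's naming convention; exact keys vary by dataset.
frame = {
    "observation.state": [0.12, -0.45, 0.30, 0.00, 1.57, 0.80],  # joint positions
    "observation.images.top": "frame_000042.png",                # camera frame reference
    "action": [0.10, -0.40, 0.32, 0.00, 1.55, 0.85],             # commanded joint targets
    "episode_index": 3,
    "frame_index": 42,
    "timestamp": 1.40,                                           # seconds from episode start
}

def episode_length_s(frames, fps=30):
    """Duration implied by frame count at a fixed capture rate."""
    return len(frames) / fps
```

Because every dataset shares this frame-level shape, evaluation code written against one dataset transfers to another without per-dataset glue.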
Vision-language-action foundation models
The most exciting development in robotics over the past two years is the rise of vision-language-action (VLA) models. These are transformers that take a natural language instruction and one or more camera images as input and output robot actions directly. Three names to know:
RT-2 from Google DeepMind
RT-2 was one of the first large VLA models to show strong generalization. It fine-tunes a pretrained vision-language backbone (PaLI-X or PaLM-E) on robot data, with actions encoded as tokens alongside language. The result was a policy that could follow novel instructions it had never seen during training, because the language backbone carried over real-world knowledge.
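"Actions encoded as tokens" usually means uniform binning: each continuous action dimension is clipped to a range and mapped to one of N discrete bins, which the transformer emits like any other token. A minimal sketch, assuming 256 bins over [-1, 1] (the bin count and range here are illustrative, not RT-2's exact configuration):

```python
def tokenize_action(action, num_bins=256, low=-1.0, high=1.0):
    """Discretize each continuous action dimension into a bin index
    by uniform binning over [low, high], clipping out-of-range values."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)
        # map [low, high] -> [0, num_bins - 1], rounding to the nearest bin
        idx = int((a - low) / (high - low) * (num_bins - 1) + 0.5)
        tokens.append(idx)
    return tokens

def detokenize_action(tokens, num_bins=256, low=-1.0, high=1.0):
    """Inverse map: bin index back to the bin's representative value."""
    return [low + t / (num_bins - 1) * (high - low) for t in tokens]
```

The round-trip error is at most half a bin width, which is why coarse discretization produces the jittery trajectories that flow-matching models like pi-zero avoid.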
pi-zero from Physical Intelligence
Physical Intelligence (the robotics-focused startup) released pi-zero, a VLA model trained on a large multi-robot demonstration corpus. pi-zero is designed for long-horizon manipulation tasks and uses flow matching for action prediction, which gives smoother trajectories than discretized token outputs.
Octo from Berkeley
Octo is an open source VLA transformer trained on the Open X-Embodiment dataset, which spans dozens of robot embodiments and hundreds of thousands of trajectories. Because it is open, Octo has become the default starting point for teams building on VLA methods without access to proprietary data.
A simulation call
Here is what a MuJoCo simulation call looks like. You specify the robot model, the task, and the policy, and the API returns the trajectory plus success metrics.
```python
import httpx

API_KEY = "sk-sci-..."
BASE = "https://scirouter.ai/v1"

response = httpx.post(
    f"{BASE}/robotics/simulate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "engine": "mujoco",
        "robot": "franka-panda",
        "task": "pick-and-place",
        "policy": "octo-base",
        "instruction": "pick up the red block and place it on the blue pad",
        "num_episodes": 10,
        "max_steps": 300,
        "render": True,
        "domain_randomization": True,
    },
    timeout=600,
)

result = response.json()
print(f"Success rate: {result['success_rate']:.1%}")
print(f"Mean episode length: {result['mean_steps']}")
print(f"Video URL: {result['video_url']}")
```

The call runs 10 episodes in a few minutes and returns success rates, trajectory data, and optionally a rendered video for each episode. Because the backend uses MJX on GPU, parallelization across episodes is free.
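The headline metrics are simple aggregates over per-episode records. A sketch of that aggregation, assuming a hypothetical per-episode schema (each episode a dict with a `success` bool and a `steps` count — the actual response fields beyond those printed above are not documented here):

```python
def summarize(episodes):
    """Aggregate per-episode records into headline metrics.
    Assumes each record has a 'success' bool and a 'steps' int."""
    n = len(episodes)
    successes = sum(1 for e in episodes if e["success"])
    return {
        "success_rate": successes / n,
        "mean_steps": sum(e["steps"] for e in episodes) / n,
    }

episodes = [
    {"success": True, "steps": 180},
    {"success": True, "steps": 210},
    {"success": False, "steps": 300},
    {"success": True, "steps": 195},
]
summary = summarize(episodes)
```

Keeping per-episode records around (rather than only the aggregate) matters when you want confidence intervals or per-instruction breakdowns later.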
Task libraries and benchmarks
The API ships with hundreds of prebuilt tasks covering the main robot learning benchmarks:
- Meta-World and Robosuite for classical manipulation benchmarks.
- LIBERO for long-horizon lifelong learning.
- BEHAVIOR-1K for household-scale tasks with realistic physics.
- DeepMind Control Suite for locomotion and continuous control.
- Isaac Gym environments for high-throughput RL.
Researchers can evaluate a new policy across multiple benchmarks in parallel with a single API call, which is the kind of reproducible comparison the field needs.
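A cross-benchmark evaluation reduces to one request body per suite. The sketch below shows payload construction only; the `suite` field and suite identifiers are illustrative, not a documented API contract:

```python
# Illustrative benchmark identifiers; real suite names may differ.
BENCHMARKS = ["meta-world", "robosuite", "libero", "behavior-1k", "dm-control"]

def build_eval_payloads(policy, num_episodes=50):
    """One request body per benchmark suite, ready to POST in parallel."""
    return [
        {
            "engine": "mujoco",
            "suite": suite,
            "policy": policy,
            "num_episodes": num_episodes,
        }
        for suite in BENCHMARKS
    ]
```

Because every payload is identical except the suite, the comparison across benchmarks is apples-to-apples by construction.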
Sim-to-real and domain randomization
Policies trained in simulation rarely work out of the box on real hardware. Visual appearance differs. Friction and mass are not perfectly matched. Sensor noise is different. The standard solution is domain randomization: train with varied physics (friction, mass, damping), varied visuals (lighting, textures), and varied sensor noise, so the policy learns to be robust to the gap. The API exposes a domain randomization scheduler that lets you specify distributions for each parameter.
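The scheduler boils down to drawing a fresh setting for each randomized parameter at the start of every episode. A minimal sketch, with illustrative parameter names and ranges (real distributions would be tuned per task, and the API's actual parameter names are not shown here):

```python
import random

# Illustrative randomization ranges: (low, high) per parameter.
RANDOMIZATION = {
    "friction":     (0.5, 1.5),   # multiplier on nominal contact friction
    "mass_scale":   (0.8, 1.2),   # multiplier on link masses
    "damping":      (0.9, 1.1),   # multiplier on joint damping
    "light_angle":  (0.0, 360.0), # degrees, visual randomization
    "sensor_noise": (0.0, 0.02),  # std-dev of additive observation noise
}

def sample_episode_params(rng=random):
    """Draw one uniform sample per parameter for the next episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANDOMIZATION.items()}
```

Training across these perturbations forces the policy to rely on cues that survive the sim-to-real gap rather than on any one simulator's exact constants.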
More advanced approaches include privileged learning (train a teacher with full observability in sim, then distill to a student that matches the hardware observation space) and real-world fine-tuning with small data. The API supports the full menu so you can experiment with different recipes without building the infrastructure yourself.
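The privileged-learning recipe hinges on one data-plumbing step: the teacher acts on full sim state, but the student's training pairs use only observations the hardware will have. A toy sketch, with stand-in functions and field names invented for illustration:

```python
def teacher_policy(full_state):
    """Stand-in for a teacher trained with privileged sim state
    (exact object poses, contact forces) unavailable on real hardware.
    Toy behavior: reach just above the object."""
    x, y, z = full_state["object_pos"]
    return [x, y, z - 0.05]

def build_distillation_set(episodes):
    """Pair the student's hardware-matched observation (a camera frame id
    here) with the teacher's action, so the student can be trained by
    behavior cloning on inputs it will actually see at deployment."""
    return [
        (step["camera"], teacher_policy(step["state"]))
        for ep in episodes
        for step in ep
    ]
```

The teacher is easy to train because it sees everything; distillation then transfers its competence into the restricted observation space.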
Dataset collection and playback
Robot learning is data-hungry. Collecting demonstrations used to require physical teleoperation rigs; modern practice is to collect a small seed set on hardware, augment it with sim-generated data, and fine-tune with behavior cloning. The API supports each stage of that loop:
- Dataset playback. Play back any LeRobot-format dataset through the simulator to verify it imports cleanly.
- Synthetic generation. Use a scripted expert or a pretrained policy to generate thousands of synthetic demonstrations for a task.
- Format conversion. Convert between LeRobot, Robosuite, and Hugging Face datasets for cross-framework training.
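At the record level, format conversion is mostly key renaming. A simplified sketch of mapping a flat Robosuite-style step into LeRobot-style dotted keys (the key names below are illustrative; real converters also handle image encoding, fps metadata, and episode boundaries):

```python
# Illustrative key mapping from flat Robosuite-style names
# to LeRobot-style dotted keys.
KEY_MAP = {
    "robot_qpos": "observation.state",
    "agentview_image": "observation.images.agentview",
    "actions": "action",
}

def to_lerobot(record):
    """Rename known keys into LeRobot's dotted convention,
    passing unknown keys through unchanged."""
    return {KEY_MAP.get(k, k): v for k, v in record.items()}
```

Passing unknown keys through unchanged keeps the conversion lossless: anything the target schema does not name is still available downstream.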
Who is using the robotics API
Early users span a surprising range:
- University research groups evaluating new VLA policies on standard benchmarks.
- Student robotics clubs training policies before buying hardware.
- Startups building manipulation skills for pick-and-place robots.
- Course instructors providing reproducible environments for robot learning classes without requiring every student to have a GPU.
Getting started
The fastest path is Robotics Lab, the web interface that lets you pick a robot, a task, and a policy, and watch a simulation run in real time. Once you have a configuration you like, the lab exposes the corresponding API call for programmatic use.
For production research, the Python SDK handles batching (run hundreds of episodes in parallel), retries, and result aggregation. Using the API instead of a local MuJoCo install means your results are immediately reproducible: anyone with the call payload can reproduce your numbers exactly.
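The batching layer is conceptually simple: fan independent episodes out across workers, then aggregate. A sketch of that pattern with the standard library's thread pool, using a stub in place of the real HTTP call (the SDK's actual interface is not shown here):

```python
from concurrent.futures import ThreadPoolExecutor

def run_episode(seed):
    """Stub for one API episode; a real client would POST the simulate
    request with this seed and return the parsed response. The toy
    success rule below exists only to make the example deterministic."""
    return {"seed": seed, "success": seed % 3 != 0}

def run_batch(num_episodes, max_workers=8):
    """Fan episodes out across worker threads and aggregate the success
    rate, mirroring what an SDK batching layer does around HTTP calls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_episode, range(num_episodes)))
    return sum(r["success"] for r in results) / num_episodes
```

Threads are the right fit here because each episode is I/O-bound (waiting on the API), so a small pool keeps hundreds of requests in flight without multiprocessing overhead.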