Foundations
Everything in robotics rests on three observations: rigid bodies move, sensors observe imperfectly, and actuators are bandwidth-limited. This page is a tight refresher on the vocabulary every Midcore screen assumes you know.
Rigid bodies and coordinate frames
A rigid body is an idealised object whose internal distances never change. A robot is a tree (or graph) of rigid bodies, called links, connected by joints. The position and orientation of a link together form its pose: three numbers for translation in metres, plus either three numbers (Euler angles or a rotation vector) or four (a unit quaternion) for orientation.
Every pose is expressed relative to some frame. Common frames you will see in Midcore:
| Frame | Origin | Used for |
|---|---|---|
| World | A fixed point in the workspace (usually the floor under the robot base). | Mission planning, multi-robot coordination. |
| Robot base | The first non-moving link of the robot. | Reachability, base-relative obstacles. |
| Arm base | The first link of an arm (right at the shoulder mount). | End-effector poses for τ₀-WM and most VLA models. |
| End-effector (EE) | The gripper tip or tool centre point (TCP). | Grasps, tool offsets, calibration. |
| Camera | The optical centre of an image sensor. | Vision-based perception, eye-in-hand control. |
Frame discipline saves debugging
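As a minimal sketch of what frame discipline can look like in code, the example below names every transform `T_parent_child` and chains them so that adjacent frame names cancel. The naming convention, the helper functions, and all numeric values are illustrative assumptions, not part of any Midcore API.

```python
import math

def make_transform(yaw, translation):
    """4x4 homogeneous transform: rotation about z by yaw (rad) plus a translation (m)."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = translation
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]

def compose(a, b):
    """Matrix product a @ b for 4x4 nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_point(t, p):
    """Re-express point p (given in the child frame) in the parent frame of t."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(t[i][k] * v[k] for k in range(4)) for i in range(3))

# Hypothetical layout: robot base sits at (2, 1, 0) in the world, unrotated.
T_world_base = make_transform(0.0, (2.0, 1.0, 0.0))
# Hypothetical layout: arm base is 0.5 m above the robot base, yawed 90 degrees.
T_base_arm = make_transform(math.pi / 2, (0.0, 0.0, 0.5))

# Adjacent names cancel: world<-base composed with base<-arm gives world<-arm.
T_world_arm = compose(T_world_base, T_base_arm)

p_arm = (0.3, 0.0, 0.2)  # a point expressed in the arm base frame
p_world = transform_point(T_world_arm, p_arm)  # approximately (2.0, 1.3, 0.7)
```

Writing the frames into the variable names makes a mismatched product such as `compose(T_base_arm, T_world_base)` easy to spot in review.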
Three ways to write a rotation
- Euler angles (XYZ, ZYX, ...) — three numbers. Human-readable, but suffer from gimbal lock when two axes align. Not safe for interpolation across configurations.
- Quaternions (xyzw) — four numbers on the unit 3-sphere. No gimbal lock, smooth interpolation via SLERP, but represent each rotation twice (q and -q), which can confuse loss functions during training.
- 6D continuous rotation (Zhou et al., CVPR 2019) — the first two columns of the 3 × 3 rotation matrix, flattened. Six numbers, continuous everywhere, the de-facto choice for neural network rotation outputs. τ₀-WM’s policy head emits 6D internally; the wire format converts to quaternion at the boundary because it’s easier to inspect.
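Two of the points above can be checked in a few lines of plain Python. The sketch below recovers a full rotation from six raw numbers using the Gram-Schmidt step described by Zhou et al., and confirms that a quaternion and its negation give the same rotation matrix. The function names are hypothetical, and this is not a description of how τ₀-WM implements its conversion.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotation_from_6d(d6):
    """Gram-Schmidt: six raw numbers -> three orthonormal columns [b1, b2, b3]."""
    a1, a2 = d6[:3], d6[3:]
    b1 = normalize(a1)
    overlap = sum(x * y for x, y in zip(b1, a2))
    # Remove the part of a2 that lies along b1, then renormalise.
    b2 = normalize([y - overlap * x for x, y in zip(b1, a2)])
    b3 = cross(b1, b2)  # third column completes a right-handed frame
    return [b1, b2, b3]

def matrix_from_quaternion(q):
    """Rows of the rotation matrix for a unit quaternion in xyzw order."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

# Unnormalised, non-orthogonal input still yields a valid rotation (here: identity).
columns = rotation_from_6d([2.0, 0.0, 0.0, 1.0, 3.0, 0.0])

# Double cover: q and -q describe the same 45-degree rotation about z.
q = (0.0, 0.0, math.sin(math.pi / 8), math.cos(math.pi / 8))
neg_q = tuple(-c for c in q)
same_rotation = matrix_from_quaternion(q) == matrix_from_quaternion(neg_q)
```

Because every matrix entry is quadratic in the quaternion components, the sign flip cancels exactly. A loss that compares raw quaternion components would still see `q` and `neg_q` as far apart, which is the training issue mentioned above.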
Forward and inverse kinematics
Given joint angles q = [q₁, q₂, …], forward kinematics (FK) gives you the resulting end-effector pose. Always solvable, fast, unique.
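For intuition, here is FK for the simplest non-trivial case: a planar arm with two revolute joints. The link lengths and the function name are assumptions chosen for illustration; a real arm would chain one 3D transform per joint in the same way.

```python
import math

def forward_kinematics_2link(q1, q2, l1=0.4, l2=0.3):
    """Return (x, y, theta) of the tip of a planar 2-link arm.

    q1, q2 are joint angles in radians; l1, l2 are link lengths in metres
    (hypothetical defaults). theta is the tip orientation in the plane.
    """
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    theta = q1 + q2
    return x, y, theta

# Arm stretched out along x: the tip sits at the summed link lengths.
stretched = forward_kinematics_2link(0.0, 0.0)

# Elbow bent 90 degrees: the second link points along +y.
bent = forward_kinematics_2link(0.0, math.pi / 2)
```

Each joint vector maps to exactly one pose, and the computation is a handful of trigonometric evaluations, which is why FK is described as always solvable, fast, and unique.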
Going the other way is harder. Inverse kinematics (IK) asks: given a target end-effector pose, what joint angles get me there? For 6-DOF arms with a spherical wrist there are up to 8 valid solutions; for 7-DOF arms there is a continuous self-motion manifold. IK solvers either pick a closed-form solution (fast, brittle) or run a numerical Jacobian iteration (slower, more flexible).
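The multiplicity of IK solutions already shows up in a planar 2-link arm, where a reachable target position generally has two answers (elbow one way or the other). The sketch below is a closed-form solver for that toy case only; the link lengths and function names are assumptions, and it solves for position without orientation.

```python
import math

L1, L2 = 0.4, 0.3  # hypothetical link lengths in metres

def fk_position(q1, q2):
    """Tip position of the planar 2-link arm, used here to verify IK results."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y

def ik_2link(x, y):
    """Closed-form IK: return every (q1, q2) that puts the tip at (x, y)."""
    # Law of cosines gives the elbow angle up to sign.
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if abs(c2) > 1.0:
        return []  # target is outside the reachable annulus
    solutions = []
    for q2 in (math.acos(c2), -math.acos(c2)):
        # Shoulder angle: direction to target minus the offset caused by the elbow bend.
        q1 = math.atan2(y, x) - math.atan2(L2 * math.sin(q2), L1 + L2 * math.cos(q2))
        solutions.append((q1, q2))
    return solutions

solutions = ik_2link(0.5, 0.2)     # two distinct joint vectors, same tip position
unreachable = ik_2link(1.0, 0.0)   # beyond L1 + L2, so no solution
```

The closed form is fast but tied to this exact geometry, which is the brittleness trade-off noted above; a numerical solver would instead iterate on the Jacobian from an initial guess.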
Important: a model like τ₀-WM does not solve IK explicitly. It learns the joint → pose → image relationship directly from data, then emits pose targets that you (or a downstream controller) realise via IK. This separation is why a single VLA generalises across embodiments: it commits to a pose, not a joint vector.
A note on dynamics
Kinematics describes where; dynamics describes how forces produce motion. The full robot dynamics equation:
M(q) q̈ + C(q, q̇) q̇ + g(q) = τ + Jᵀ Fₑₓₜ
...says the joint torques τ plus any external force on the end-effector Fₑₓₜ accelerate the robot through its inertia M(q), fighting Coriolis C(q, q̇) q̇ and gravity g(q).
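To make the terms concrete, the sketch below specialises the equation to a single link modelled as a point mass on a massless rod, with no external force. In that case M(q) reduces to m·l², the Coriolis term vanishes, and g(q) = m·g₀·l·cos(q) when q is measured from the horizontal. The mass, length, and function names are illustrative assumptions.

```python
import math

G0 = 9.81  # gravitational acceleration in m/s^2

def gravity_torque(q, mass=2.0, length=0.5):
    """g(q) for a point mass on a massless rod, q measured from horizontal (rad)."""
    return mass * G0 * length * math.cos(q)

def joint_acceleration(q, tau, mass=2.0, length=0.5):
    """Solve M q_ddot + g(q) = tau for q_ddot (single link, no external force)."""
    inertia = mass * length ** 2  # M(q) is constant for one link
    return (tau - gravity_torque(q, mass, length)) / inertia

# With zero motor torque the horizontal link accelerates downward.
free_fall = joint_acceleration(0.0, 0.0)

# Applying exactly g(q) as torque holds the link still (gravity compensation).
held = joint_acceleration(0.0, gravity_torque(0.0))
```

For the default values, the unpowered link starts at -19.62 rad/s², and 9.81 N·m of torque is needed just to hold it level.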
Midcore’s Designer ships a Rapier-based physics preview that computes link world boxes, centre of mass, and the inertia ellipsoid live as you edit the robot. You don’t have to solve dynamics by hand — but you do have to understand that a 30 kg arm can’t reverse direction in one frame, and a 100 N grasp won’t hold a 200 N pull.
Sensors
| Sensor | What it returns | Where it shows up in Midcore |
|---|---|---|
| RGB camera | H × W × 3 pixel grid. τ₀-WM’s default is 192 × 256. | Fed straight into the policy’s vision encoder. |
| Depth / RGB-D | Per-pixel distance, often aligned to RGB. | Used for collision-aware planning + twin-state geometry. |
| IMU | Linear acceleration + angular velocity in the sensor frame. | Inertial state fusion; appears in Designer’s sensor list. |
| Force / torque | 3 forces + 3 torques at the sensor (usually wrist-mounted). | Compliance + contact-rich manipulation; surfaced via the safety tile. |
| Joint encoders | Per-joint position (always) + velocity (often). | Forms the state vector you send to the policy. |
| Tactile | High-density local force / shear across the gripper fingertip. | Future; τ₀-WM’s authors flag this as a known limitation. |
Actuators
- DC motors with planetary or harmonic-drive gearing — the default for industrial arms. Encoded position, position-velocity-current control loops.
- Series-elastic actuators (SEA) — a spring between motor and load lets you measure (and control) torque cheaply. Common in compliance-critical platforms.
- Parallel grippers — two fingers driven by a single linear stage. State reduces to a one-dimensional opening (0 = open, 120 = closed in the τ₀-WM observation space).
- Multi-fingered hands — 5+ DOF, much richer state. Out of scope for τ₀-WM today; targeted by humanoid-focused successor models.
Manipulator and mobile taxonomies
The robotics literature carves the field into platform classes because the engineering tradeoffs differ wildly. Midcore tracks the same split:
| Class | Typical DOF | Examples (Designer templates) |
|---|---|---|
| 6-DOF manipulator | 6 | UR5e, UR10e, FANUC LR Mate, ABB IRB 1200 |
| 7-DOF cobot | 7 | KUKA LBR iiwa 14 R820, generic cobot_7dof |
| Dual-arm bimanual | 14 (7+7) or 12 (6+6) | Dual-arm Franka FR3 (τ₀-WM ready) |
| Differential drive | 2 | Generic differential_drive_mobile |
| Quadruped | 12 | Boston Dynamics Spot, Unitree Go2 |
| Hexapod | 18 | Generic hexapod walker |
| Multirotor UAV | 4 | DJI Mavic 3 Enterprise, generic quadrotor |
| Fixed-wing UAV | 4 | Generic fixed_wing_uav |
| Subsea / surface marine | 3–6 | Subsea micro-ROV, USV workboat |
| Humanoid | 20–30 | humanoid_biped, humanoid_upper_body |
For bimanual manipulation, the class τ₀-WM targets, the canonical state is the pair of EE poses plus the pair of gripper openings. At seven numbers per pose (consistent with three for translation plus a four-component quaternion), that is the 14 + 2 = 16-channel input the Designer’s τ₀-WM state panel renders.
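The channel count can be reproduced with a small packing routine. This is a sketch under stated assumptions: each pose is taken to be a 3-vector position plus an xyzw quaternion, and the channel ordering (left pose, right pose, then both grippers) is hypothetical; the actual τ₀-WM layout is not specified on this page. The `ArmState` type and `pack_state` function are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ArmState:
    position: Tuple[float, float, float]           # metres
    quaternion: Tuple[float, float, float, float]  # unit quaternion, xyzw order
    gripper: float                                 # 0 = open, 120 = closed

def pack_state(left: ArmState, right: ArmState) -> List[float]:
    """Flatten two arm states into one vector (assumed ordering, see lead-in)."""
    channels: List[float] = []
    for arm in (left, right):
        channels.extend(arm.position)    # 3 channels per arm
        channels.extend(arm.quaternion)  # 4 channels per arm
    channels.extend([left.gripper, right.gripper])  # 2 gripper channels
    return channels

# Hypothetical example values: both grippers at identity orientation.
left = ArmState((0.4, 0.2, 0.3), (0.0, 0.0, 0.0, 1.0), 0.0)
right = ArmState((0.4, -0.2, 0.3), (0.0, 0.0, 0.0, 1.0), 120.0)
state = pack_state(left, right)  # 2 * (3 + 4) + 2 = 16 channels
```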
Where to go from here