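Introduction to Embodied AI
All-in-One Review Page
Covers Lect01-08: from the Embodied AI overview and robotics fundamentals, through motion planning and control, to vision-based grasping, imitation learning, and reinforcement learning. Untagged content is compiled from the lecture slides; sections tagged "Non-slide content" are added to help first-time students build a framework for understanding.
1. Knowledge points without an extra tag follow the slides' original definitions, conclusions, procedures, and terminology; the page structure has been reorganized for review, so it is not a page-by-page transcription.
2. Content tagged "Non-slide content" is meant to make the course easier to follow for first-time learners and should not replace the original slides.
3. The Lecture 8 logistics slide says "Scope: from Lecture 1 - Lecture 9", but this is a typo; the actual review scope is Lect01-08.
Midterm exam information (from Lect08 Logistics: Midterm)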
- Midterm (40% of the total score)
- One-page double-sided A4 cheat sheet; handwritten or printed are both OK
- Questions are all in English; terms already covered in class will not be explained again during the exam
- No dictionaries or calculators are allowed
- Multiple-select questions: selecting any wrong option scores $0$ for that question; each correct option left unselected deducts $1$ point; the minimum score is $0$
- Short answer questions: Explain why and how; some questions may require mathematical derivations
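Study suggestions (Non-slide content)
Lect02-04 Robotics fundamentals
Rigid transformations, FK/IK, SO(3), Euler angle / quaternion, motion planning, and PD/PID form the common language for the later vision and policy parts.
Lect07-08 Policy learning
state / observation / action, MDP / POMDP, BC / DAgger, reward-to-go, baseline, actor-critic, discount, and GAE are tightly interconnected concepts.
Lect05-06 Vision and grasping
open-loop grasping, 6D pose, ICP, continuous rotation representation, force closure, hand-eye calibration.
Lect01 Overview
Get the big picture straight: embodied AI vs. classical AI, generalist robots, VLA, synthetic data, and Sim vs Real.
Read Lect01 first to build the global picture, then treat Lect02-04 as the "robotics language layer", next work through the vision-grasping pipeline in Lect05-06, and finally use Lect07-08 to tie everything together under "policy learning".
Exam topic roadmap (Compiled overview)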
Lect01 Overview
What is a robot? Limitations of classical special-purpose robots, Embodied AI, generalist robots, VLA, synthetic data, sim-to-real.
Lect02 Robotics I
Kinematics vs Dynamics, rigid transformation, homogeneous coordinates, joint/link/DoF, FK/IK, Pieper's criterion, introduction to $SO(3)$ / $SE(3)$.
Lect03 Robotics II
Rotation representations: Euler angle, gimbal lock, angle-axis, Rodrigues, quaternion, slerp, distance on $SO(3)$.
Lect04 Robotics III
motion planning, configuration space, collision checking, PRM / RRT / RRT-Connect, path vs trajectory, P / PD / PID, PD tuning.
Lect05 Vision and Grasping I
open-loop grasping, 4-DoF / 6-DoF grasp, 6D object pose estimation, ICP, rotation regression, FoundationPose, NOCS.
Lect06 Vision and Grasping II
grasp detection, voxel / point cloud, GS-Net, DexGraspNet 2.0, force closure, GraspNet-1Billion, camera model, PnP, AX=XB.
Lect07 Policy I
policy / state / observation / dynamics model / world model, MDP, BC, distribution drift, teleoperation, HG-DAgger, teacher policy.
Lect08 Policy II
RL, reward, sparse vs dense, online / offline, REINFORCE, reward-to-go, baselines, actor-critic, discount, GAE.
Chapter 1 · Lect01 Overview
1.1 Course goals
- A frontier course on Embodied AI
- Covering basics in robotics and deep learning based vision and robotic system, from a modern perspective of Embodied AI
- To lay a solid foundation for conducting research in Embodied AI
1.2 Robots and robotics
- A Robot is a machine capable of carrying out a complex series of actions automatically.
- Robots can be guided by an external control device, or the control may be embedded within.
- Robots may be constructed to take on human form.
- Robotics is a branch of engineering that involves the conception, design, manufacture and operation of robots.
- The objective of the robotics field is to create intelligent machines that can assist humans in a variety of ways.
1.3 From classical special-purpose robots to Embodied AI
- Classical special-purpose robots in a car factory: predesign and compute the trajectory, then only replay the trajectory when using the arm
- Limitation 1: time-consuming deployment
- Limitation 2: can’t flexibly handle multi-tasks
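- Slide conclusion: Very different from human intelligence!
What this slide is really saying: traditional industrial robots are closer to "pre-programmed trajectory executors" than to agents that can autonomously perceive, decide, and interact in open environments. The rest of the course fills in the perception-decision-control-interaction chain.
1.4 Core intuition of Embodied AI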
- Perception-Action Loop
- How Humans Learn: Perceive, form hypotheses, and then take action to examine them.
- Our brain makes sense of the world around us by creating and testing hypotheses about how the world works.
1.5 Evolution of human intelligence and embodiment
Keywords for the evolution of human intelligence given in the slides:
| Keyword | Slide location |
|---|---|
| Walk Upright | Evolution of Human Intelligence |
| Embodied Interaction | Evolution of Human Intelligence |
| Tool Usage | Evolution of Human Intelligence |
| Language | Evolution of Human Intelligence |
1.6 Future vision: Generalist Robots
- Past & Existing: Industrial robots
- Now & Happening: Autonomous cars
- Future & Our Dream: Generalist Robots - Humanoids
- The slides summarize generalist robots as: Task generalists, Environment generalists
1.7 Two-level structure of the robot brain
| Part | Slide wording |
|---|---|
| Cerebrum | Responsible for high-level policies; Deciding what to do |
| Cerebellum | Responsible for low-level motor policies; Determining how to execute actions in a fast, accurate, and reliable manner |
For the goal To Build Generalist Robots, the slides further describe the robot brain as:
| Module | Content |
|---|---|
| Cerebrum | Perception, planning, decision making, language, etc.; Vision-Language-Action model (VLA) |
| Cerebellum | Motion control, trajectory tracking, stability, error feedback; Whole-body and whole-hand control; Learned mainly via Reinforcement Learning (RL) |
1.8 Vision-Language-Action and the data bottleneck
- VLA: Inputs: language, vision, other proprioceptive / sensor signals; Outputs: robot actions
- Advantages: end-to-end, benefits from VLM pretraining
- Limitations: zero-shot performance lags behind LLMs & VLMs
- Real-world teleoperation is too costly to be scalable
- Embodied foundation model pretraining may require trillions of trajectories
- The largest VLA dataset is at the scale of 1M trajectories
1.9 The role of simulation
- Simulation and Photorealistic Rendering: Annotation-free, Time efficient, Transferable to real world
- For problems already done in real, learning in simulation requires generalization to real.
- For problems still far to be done in real, simulation environment is a good place for exploration.
Chapter summary: key expressions
Perception-Action Loop: Perceive $\to$ form hypotheses $\to$ take action $\to$ perceive again
VLA: Inputs = language + vision + proprioceptive / sensor signals; Outputs = robot actions
Robot brain hierarchy: Cerebrum $\to$ decides what to do; Cerebellum $\to$ decides how to execute actions
Chapter 2 · Lect02 Robotics I
2.1 Kinematics vs. Dynamics
- Kinematics: describing the motion of bodies (position, linear/angular velocity, linear/angular acceleration, etc.)
- Kinematics does not consider how to achieve motion via force
- Dynamics models all the way from force and torque to the motion
2.2 Link / Joint / DoF
- Link: the rigid bodies connected in sequence
- Joint: the connectors between links
- DoF: the number of independent parameters that define the configuration
- Common joints: Revolute (R), Prismatic (P)
- Other joints: Helical (H), Spherical (S)
2.3 Rigid transformations and coordinate transformations
- Denote the frame that moves with the rigid body as the body frame $\mathcal{F}_b$, and the observer's frame as $\mathcal{F}_s$
- The pose question is: How to transform $\mathcal{F}_s$ so that it overlaps with $\mathcal{F}_b$?
- First rotate by $R_{s\to b}$ to align the axes, then translate by $t_{s\to b}$ to align the origins
- Point coordinate relation: $p^s = R_{s\to b} p^b + t_{s\to b}$
- Moving an arbitrary point $x$ by the same rigid transformation: ${x'}^s = R_{s\to b} x^s + t_{s\to b}$
2.4 Why introduce homogeneous coordinates
- $(R_{s\to b}, t_{s\to b})$ is not a linear transformation, because the translation term breaks linearity
- Homogeneous coordinate for 3D space: $\tilde{x} = \begin{bmatrix} x \\ 1 \end{bmatrix} \in \mathbb{R}^4$
- Homogeneous transformation matrix: $T_{s\to b} = \begin{bmatrix} R_{s\to b} & t_{s\to b} \\ 0 & 1 \end{bmatrix}$
- Coordinate transformation under linear form: $\tilde{x}^s = T_{s\to b} \tilde{x}^b$
2.5 Two rules of homogeneous transformations
| Rule | Formula |
|---|---|
| Composition rule | $T_{3\to 1} = T_{3\to 2} T_{2\to 1}$ |
| Change of observer's frame | $T_{2\to 1} = (T_{1\to 2})^{-1}$ |
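A minimal pure-Python sketch of the two rules, assuming the convention above that $T_{s\to b}$ maps coordinates expressed in $\mathcal{F}_b$ to coordinates expressed in $\mathcal{F}_s$. The helper names and the example frames are illustrative, not from the slides; the inverse uses the rigid-transform closed form $\begin{bmatrix}R^T & -R^Tt \\ 0 & 1\end{bmatrix}$ rather than a general matrix inverse.

```python
import math


def make_T(R, t):
    """Build the 4x4 homogeneous matrix [[R, t], [0, 1]] from a 3x3 R and a 3-vector t."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Plain matrix product for nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inv_T(T):
    """Inverse of a rigid transform: [[R^T, -R^T t], [0, 1]]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    return make_T(Rt, [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)])


def apply_T(T, p):
    """x^s = T_{s->b} x^b, using the homogeneous coordinate (p, 1)."""
    x = matmul(T, [[p[0]], [p[1]], [p[2]], [1.0]])
    return [x[0][0], x[1][0], x[2][0]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Hypothetical frames: T_21 maps frame-1 coordinates into frame 2, T_32 frame 2 into frame 3.
T_21 = make_T(rot_z(math.pi / 2), [0.0, 2.0, 0.0])
T_32 = make_T(rot_z(math.pi / 2), [1.0, 0.0, 0.0])
T_31 = matmul(T_32, T_21)          # composition rule: T_{3->1} = T_{3->2} T_{2->1}
T_13 = inv_T(T_31)                 # change of observer: T_{1->3} = (T_{3->1})^{-1}

p1 = [1.0, 0.0, 0.0]               # a point expressed in frame 1
p3 = apply_T(T_31, p1)             # the same point expressed in frame 3, about [-2, 0, 0]
print(p3, apply_T(T_13, p3))       # mapping back recovers p1 up to rounding
```

Using the closed-form inverse also keeps the result exactly in $SE(3)$ form, which a generic numerical inverse does not guarantee.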
2.6 Base link, end-effector, joint space, Cartesian space
- Base link / root link: the $0$-th link of the robot; the spatial frame $\mathcal{F}_s$ is attached to it
- End-effector link: the last link, e.g., a gripper; the frame $\mathcal{F}_e$ is attached to it
- Joint space: each coordinate is a vector of joint positions
- Cartesian space: the space of rigid transformations $(R_{s\to e}, t_{s\to e})$ of the end-effector
2.7 Forward Kinematics and Inverse Kinematics
- Forward Kinematics: map the joint space coordinate $\theta \in \mathbb{R}^n$ to a transformation matrix $T$, i.e., $T_{s\to e} = f(\theta)$
- FK is obtained by composing transformations along the kinematic chain
- Inverse Kinematics: given $T_{s\to e}(\theta)$ and a target pose $T_{target} \in SE(3)$, find $\theta$ such that $T_{s\to e}(\theta) = T_{target}$
- Workspace: the volume swept out by the end-effector as the robot performs all possible motions
2.8 Difficulties of IK and categories of solution methods
- IK may have multiple solutions, or no solution at all
- Even when a solution exists, it may require complex and expensive computations
- Categories: analytical methods and numerical methods
- Analytical methods: derive a closed-form exact solution directly; only applicable to relatively simple chains
- Numerical methods: use approximation and iteration; usually more expensive, but far more general purpose
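To make "multiple solutions or no solution" and "analytical methods for simple chains" concrete, here is a sketch for a planar two-link (RR) arm, a textbook example chosen for illustration rather than taken from the slides. The link lengths `l1`, `l2` are assumed parameters, and the closed form comes from the law of cosines.

```python
import math


def fk_2r(theta1, theta2, l1=1.0, l2=1.0):
    """Forward kinematics of a planar 2-link (RR) arm: joint angles -> end-effector (x, y)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def ik_2r(x, y, l1=1.0, l2=1.0):
    """Analytical IK: returns a list of (theta1, theta2) solutions (0, 1 or 2 of them)."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if c2 < -1.0 or c2 > 1.0:
        return []  # target outside the workspace: no solution
    solutions = []
    for sign in (1.0, -1.0):  # the two elbow branches
        theta2 = sign * math.acos(c2)
        theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
        solutions.append((theta1, theta2))
    if abs(c2) == 1.0:
        solutions = solutions[:1]  # exactly at the workspace boundary: the branches coincide
    return solutions


sols = ik_2r(1.0, 1.0)
print(len(sols))  # 2
for t1, t2 in sols:
    print(fk_2r(t1, t2))  # both reproduce (1, 1) up to rounding
print(ik_2r(3.0, 0.0))  # [] because the maximum reach is l1 + l2 = 2
```

For a 6- or 7-DoF arm without special structure no such closed form is generally available, which is where the numerical methods above come in.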
2.9 Pieper's criterion
A 6-DoF manipulator has a closed-form (analytical) IK solution if either of the following holds:
1. The axes of three consecutive rotational joints intersect at a single point.
2. The axes of three consecutive rotational joints are parallel.
Note: this is only a sufficient condition, not a necessary one.
2.10 $SO(3)$ 与 $SE(3)$
| Space | Definition | Meaning |
|---|---|---|
| $SO(3)$ | $\{R \in \mathbb{R}^{3\times 3}: \det(R)=1, RR^T=I\}$ | 3D rotations, 3 DoF |
| $SE(3)$ | $T = \begin{bmatrix}R & t \\ 0 & 1\end{bmatrix}, R\in SO(3), t\in\mathbb{R}^3$ | rigid transformations, 6 DoF |
Chapter formula summary
Point transform: $p^s = R_{s\to b} p^b + t_{s\to b}$
General transform: ${x'}^s = R_{s\to b} x^s + t_{s\to b}$
Homogeneous transform: $T_{s\to b}=\begin{bmatrix}R_{s\to b} & t_{s\to b} \\ 0 & 1\end{bmatrix}$
Composition: $T_{3\to1}=T_{3\to2}T_{2\to1}$
Inverse: $T_{2\to1}=(T_{1\to2})^{-1}$
Forward kinematics: $T_{s\to e}=f(\theta)$
Chapter 3 · Lect03 Robotics II
3.1 Basic properties of rotations
- A rotation preserves lengths
- Cross products are preserved by a rotation
- Hence rotation matrices satisfy: $RR^T = R^TR = I$, $\det(R)=1$
3.2 Why rotation matrices are not ideal
- Efficiency: a rotation has $3$ DoF, but a rotation matrix needs $9$ values
- Numerical stability: orthogonality must be maintained
3.3 Euler angle
- Euler angles are three angles introduced by Leonhard Euler to describe the orientation of a rigid body with respect to a fixed coordinate system.
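- The composition given in Lecture 3: $R = R_z(\gamma) R_y(\beta) R_x(\alpha)$
- Advantage: good interpretability
3.4 Gimbal lock and the limitations of Euler angles
Conclusion: Euler angles can parameterize every rotation, but the representation is not unique at some points, and at some points certain changes in the target space (rotations) cannot be produced by changes in the source space (the three angles).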
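A quick numerical check of the gimbal-lock statement, using the $R_z(\gamma)R_y(\beta)R_x(\alpha)$ convention above. Multiplying out the matrices at $\beta=\pi/2$ shows that the result depends only on $\alpha-\gamma$, so one DoF is lost; the specific angle values below are arbitrary choices for illustration.

```python
import math


def rx(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def ry(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rz(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_zyx(alpha, beta, gamma):
    """R = R_z(gamma) R_y(beta) R_x(alpha), the composition given in Lecture 3."""
    return matmul3(rz(gamma), matmul3(ry(beta), rx(alpha)))


def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


# At beta = pi/2 only alpha - gamma matters, so these two angle triples coincide.
locked = max_diff(euler_zyx(0.3, math.pi / 2, 0.1), euler_zyx(0.5, math.pi / 2, 0.3))
# Away from the singularity the same change of (alpha, gamma) gives a different rotation.
normal = max_diff(euler_zyx(0.3, 0.2, 0.1), euler_zyx(0.5, 0.2, 0.3))
print(locked, normal)  # locked is ~0 (rounding only); normal is clearly nonzero
```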
3.5 Angle-axis and Euler's theorem
- Any rotation is equivalent to a rotation about a fixed axis $\hat{\omega}\in\mathbb{R}^3$ with $\|\hat{\omega}\|=1$ through a positive angle $\theta$
- $\hat{\omega}$ is the unit vector of the rotation axis
- $\theta$ is the angle of rotation
- $R \in SO(3) := \mathrm{Rot}(\hat{\omega}, \theta)$
3.6 Rodrigues formula and exponential coordinates
- Skew-symmetric matrix: if $A = -A^T$, then $A$ is skew-symmetric
- The cross product can be written in linear form: $a \times b = [a]b$, where $[a]$ is the skew-symmetric matrix built from $a$
- Rodrigues formula (for a unit axis $\hat{\omega}$): $e^{[\hat{\omega}]\theta} = I + [\hat{\omega}]\sin\theta + [\hat{\omega}]^2(1-\cos\theta)$
- $\vec{\theta} = \hat{\omega}\theta$ is also called the rotation vector, or exponential coordinates
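A direct transcription of the Rodrigues formula into pure Python, as a sketch; the function names are illustrative. The axis is normalized first because the formula assumes a unit $\hat{\omega}$.

```python
import math


def skew(w):
    """[w]: the skew-symmetric matrix with [w] b = w x b."""
    return [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]


def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rodrigues(axis, theta):
    """exp([w] theta) = I + [w] sin(theta) + [w]^2 (1 - cos(theta)) for a unit axis w."""
    n = math.sqrt(sum(a * a for a in axis))
    w = [a / n for a in axis]  # normalize, since the formula assumes a unit axis
    K = skew(w)
    K2 = matmul3(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]


R = rodrigues([0.0, 0.0, 1.0], math.pi / 2)   # 90 degrees about z
x_axis_image = [R[i][0] for i in range(3)]    # R e_x is the first column
print(x_axis_image)                           # approximately [0, 1, 0]
```

Calling `rodrigues([0.0, 0.0, -1.0], -math.pi / 2)` returns the same matrix, which is the $(\hat{\omega}, \theta)$ vs $(-\hat{\omega}, -\theta)$ ambiguity discussed next.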
3.7 Non-uniqueness of the angle-axis parameterization
- $(\hat{\omega}, \theta)$ and $(-\hat{\omega}, -\theta)$ give the same rotation
- When $R=I$, $\theta=0$ and the axis can be arbitrary
- $(\hat{\omega}, \pi)$ and $(-\hat{\omega}, \pi)$ give the same rotation
- If $\theta$ is restricted to $(0,\pi)$, a unique parameterization exists
3.8 Quaternion
- A quaternion is a generalized complex number: $q = w + x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$
- Vector form: $q = (w, \vec{v})$
- A unit quaternion can represent a rotation
- Four numbers plus one constraint $\Rightarrow 3$ DoF
- To rotate a vector: first augment $\vec{x}$ to $x=(0,\vec{x})$, then compute $x' = qxq^*$
- To compose rotations with quaternions: just multiply the quaternions
- Each rotation corresponds to two quaternions, $q$ and $-q$ (double covering)
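A sketch of the quaternion operations listed above, with quaternions stored as `(w, x, y, z)` tuples. That storage order is an assumption of this example; libraries differ on whether $w$ comes first or last.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)


def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])


def from_axis_angle(axis, theta):
    """Unit quaternion (cos(theta/2), sin(theta/2) * axis) for a unit axis."""
    s = math.sin(theta / 2.0)
    return (math.cos(theta / 2.0), s * axis[0], s * axis[1], s * axis[2])


def rotate(q, v):
    """x' = q x q*, with the vector augmented to the pure quaternion x = (0, v)."""
    return qmul(qmul(q, (0.0, v[0], v[1], v[2])), qconj(q))[1:]


q_z = from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)  # 90 degrees about z
q_x = from_axis_angle((1.0, 0.0, 0.0), math.pi / 2)  # 90 degrees about x
v = (1.0, 0.0, 0.0)
step_by_step = rotate(q_x, rotate(q_z, v))           # apply q_z first, then q_x
composed = rotate(qmul(q_x, q_z), v)                 # compose by multiplying: q_x * q_z
print(step_by_step, composed)                        # both approximately (0, 0, 1)
```

Note the order: the product `q_x * q_z` applies `q_z` first, just like the matrix product $R_xR_z$. Rotating with `-q_z` gives the same result as `q_z`, which is the double covering.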
3.9 Quaternion vs rotation matrix
| Aspect | Quaternion | Rotation Matrix |
|---|---|---|
| Storage | 4 floating-point values | 9 values |
| Multiplication | 16 multiplications + 12 additions | 27 multiplications + 18 additions |
| Numerical stability | unit magnitude can be maintained by normalizing | successive operations may violate orthogonality |
3.10 SLERP and rotation distance
- Why not linear interpolation? Because the result must be re-normalized, and it does not have a constant rate of rotation
- Spherical linear interpolation (SLERP): the shortest path between two points on a sphere; it traverses a great arc on the sphere of unit quaternions
- It has uniform angular rotation velocity about a fixed axis
- Rotation distance: $\mathrm{dist}(R_1, R_2) = \arccos\left(\frac{\mathrm{tr}(R_2 R_1^T)-1}{2}\right)$
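A sketch of SLERP and the rotation distance, assuming `(w, x, y, z)` unit quaternions and $3\times3$ nested-list rotation matrices. The sign flip for a negative dot product (choosing the shorter of the two arcs allowed by the double cover) and the near-parallel fallback are common implementation choices, not something the slides specify.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions stored as (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q1 and -q1 are the same rotation: flip to follow the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    omega = math.acos(min(dot, 1.0))  # angle between q0 and q1 on the 4D unit sphere
    if omega < 1e-8:  # nearly identical: fall back to normalized linear interpolation
        w0, w1 = 1.0 - t, t
    else:
        w0 = math.sin((1.0 - t) * omega) / math.sin(omega)
        w1 = math.sin(t * omega) / math.sin(omega)
    out = [w0 * a + w1 * b for a, b in zip(q0, q1)]
    n = math.sqrt(sum(c * c for c in out))
    return tuple(c / n for c in out)


def rot_dist(R1, R2):
    """dist(R1, R2) = arccos((tr(R2 R1^T) - 1) / 2), in radians."""
    tr = sum(R2[i][k] * R1[i][k] for i in range(3) for k in range(3))  # tr(R2 R1^T)
    return math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0)))  # clamp against rounding


def rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


q_id = (1.0, 0.0, 0.0, 0.0)                                       # identity
q_z90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))  # 90 degrees about z
q_half = slerp(q_id, q_z90, 0.5)  # should be 45 degrees about z: (cos(pi/8), 0, 0, sin(pi/8))
print(q_half)
print(rot_dist(rz(0.2), rz(1.0)))  # about 0.8 rad
```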
3.11 Usage recommendations at the end of the slides
| Representation | Recommended use (per slides) |
|---|---|
| rotation matrices | define concepts |
| Euler angles | visualize rotations |
| angle-axis | visualize rotations and calculate derivatives |
| quaternion | write fast codes |
Chapter formula summary
$SO(3)$: $\{R \in \mathbb{R}^{3\times 3}: RR^T=I, \det(R)=1\}$
Euler composition: $R = R_z(\gamma)R_y(\beta)R_x(\alpha)$
Rodrigues: $e^{[\hat{\omega}]\theta}=I+[\hat{\omega}]\sin\theta+[\hat{\omega}]^2(1-\cos\theta)$
Quaternion rotation: $x' = qxq^*$, where $x=(0,\vec{x})$
Rotation distance: $\mathrm{dist}(R_1,R_2)=\arccos\left(\frac{\mathrm{tr}(R_2R_1^T)-1}{2}\right)$
Chapter 4 · Lect04 Robotics III
4.1 The big picture of the robotics stack
| Layer | Slide keywords | Core question |
|---|---|---|
| Goal | task / target | what to do |
| Motion Planning | collision-free search | what motion is collision-free and geometrically feasible? |
| Trajectory | path / waypoints | how to add timing to the path |
| Control | track & stabilize | how do we track that motion reliably in the real system? |
| Robot | execution | actual execution |
4.2 Problem definition of motion planning
- Problem: From one robot pose to another robot pose, how does the robot move safely in the world?
- Input: a start state $q_{start}$, a goal state $q_{goal}$, and the collision-free space $C_{free}$
- Output: a feasible path connecting the start to the goal (not actions!)
- Core: Motion planning is a search problem in a high-dimensional constrained space
4.3 Workspace vs configuration space
| Space | Slide description |
|---|---|
| Workspace | Real 3D world; obstacle, robot, shelf, objects |
| Configuration space | One point corresponds to one joint configuration |
- Planning is usually performed in configuration space
- The robot is not a point. Its shape, joints, and limits must be considered.
- Collision constraints depend on the full robot configuration
- A straight line connecting the start to the goal may be in collision
4.4 Collision checking
- Is a configuration $q$ in $C_{free}$? Run a collision check!
- Collision checking is called repeatedly inside planning algorithms
- Need to be very fast and maintain accuracy
- However, accurate collision check is very slow
- Visual mesh for rendering $\neq$ Collision mesh for collision check
- The collision mesh of each link is defined in the URDF, usually as a geometric approximation
- Convex-convex collision checking is usually very fast on CPU
4.5 Grid-based vs sample-based planning
- Grid-based: discretize the whole configuration space, compute $C_{free}$, then use a search algorithm such as A*
- Drawback: too slow for high-dimensional spaces
- Sample-based algorithms: PRM, RRT, RRT-Connect, etc.
4.6 PRM / RRT / RRT-Connect / Shortcutting
| Method | Slide notes |
|---|---|
| PRM | Probabilistic Roadmap; asymptotically optimal but requires massive sampling |
| RRT | Rapidly-exploring Random Tree |
| RRT-Connect | a classic algorithm for single-query path planning |
| Shortcutting | very useful post-processing heuristic; fast local optimization; improves jerky, unnatural paths |
The point of this section is not each algorithm's pseudocode but three ideas: first, planning is essentially a search for a path in a high-dimensional constrained space; second, sample-based methods are better suited to high dimensions; third, the path output by a planner is usually not directly executable and still needs trajectory generation and control.
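To see what "sample-based" means in practice, here is a minimal RRT sketch. It is not the slides' or OMPL's implementation: for illustration the configuration is a 2D point, steering is a straight line of at most `step`, and edges are validated by collision-checking a few interpolated points (a coarse assumption). A real planner would run in the robot's configuration space with a full collision checker. The scene, step size and goal bias are hypothetical.

```python
import math
import random


def rrt(start, goal, is_free, bounds, step=0.5, goal_bias=0.1, max_iters=5000, seed=0):
    """Minimal RRT for a 2D point configuration; returns a list of waypoints or None."""
    rng = random.Random(seed)

    def edge_free(a, b, checks=10):
        # Coarse edge validation: collision-check evenly spaced points on segment a-b.
        return all(is_free((a[0] + k / checks * (b[0] - a[0]), a[1] + k / checks * (b[1] - a[1])))
                   for k in range(checks + 1))

    nodes, parent = [start], [None]
    for _ in range(max_iters):
        # Sample a random configuration (occasionally the goal itself, to bias growth).
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(*bounds[0]), rng.uniform(*bounds[1]))
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        r = min(step, d) / d  # steer: move at most `step` toward the sample
        new = (near[0] + r * (sample[0] - near[0]), near[1] + r * (sample[1] - near[1]))
        if not edge_free(near, new):
            continue
        nodes.append(new)
        parent.append(i_near)
        if math.dist(new, goal) <= step and edge_free(new, goal):
            nodes.append(goal)
            parent.append(len(nodes) - 2)
            path, i = [], len(nodes) - 1
            while i is not None:  # backtrack through parents to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


def wall_free(p):
    """Hypothetical scene: a wall occupying 4 <= x <= 6 for y <= 7 inside a 10 x 10 box."""
    return not (4.0 <= p[0] <= 6.0 and p[1] <= 7.0)


path = rrt((1.0, 1.0), (9.0, 1.0), wall_free, bounds=((0.0, 10.0), (0.0, 10.0)))
print(len(path) if path else "no path found")
```

The returned path is typically jagged, which is exactly what the shortcutting post-processing in the table is meant to clean up before time parameterization.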
4.7 OMPL
- OMPL = Open Motion Planning Library
- Default planning library in ROS-MoveIt
- Geometric planners: consider only geometric and kinematic constraints; assume any feasible path can be turned into a dynamically feasible trajectory
- Control-based planners: if the system is subject to differential constraints, use state propagation instead of simple interpolation
4.8 From path to trajectory
- Path (Geometry): tells us where to go
- Trajectory (Time): tells us when to be there and how fast to move
- A path is represented as $q(s)$, where $s\in[0,1]$
- Time parameterization: $s=s(t) \Rightarrow q(t)=q(s(t))$
- Real robots have joint velocity and acceleration limits, so a geometric path alone is not enough
- Slide example: a Minimum Jerk Trajectory uses a $5$th-order polynomial, which ensures smooth starts and stops and minimizes jerk
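One common way to realize this is the quintic time scaling $s(\tau)=10\tau^3-15\tau^4+6\tau^5$ with $\tau=t/T$: it is the fifth-order polynomial with $s(0)=0$, $s(1)=1$ and zero velocity and acceleration at both ends. The sketch below assumes a straight-line path in joint space and a hypothetical duration `T`; the slides' exact parameterization may differ.

```python
def min_jerk_s(t, T):
    """Quintic time scaling s(t) in [0, 1]: zero velocity and acceleration at t = 0 and t = T."""
    tau = min(max(t / T, 0.0), 1.0)
    return 10.0 * tau**3 - 15.0 * tau**4 + 6.0 * tau**5


def min_jerk_q(q_start, q_goal, t, T):
    """q(t) = q(s(t)) for a straight joint-space path q(s) = q_start + s (q_goal - q_start)."""
    s = min_jerk_s(t, T)
    return [a + s * (b - a) for a, b in zip(q_start, q_goal)]


T = 2.0  # hypothetical duration in seconds
for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(t, min_jerk_q([0.0, 0.0], [1.0, -0.5], t, T))
```

Because the path $q(s)$ and the timing $s(t)$ are separate, the same geometric path can be slowed down (larger `T`) until velocity and acceleration limits are respected.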
4.9 Control system
- A control system regulates a system's behavior to follow a reference trajectory $x_{ref}(t), \dot{x}_{ref}(t)$, despite disturbances and uncertainties
- Main components: Sensor, Controller, Environment/System, Actuator
- Open loop (feedforward): the control signal is based only on the reference
- Closed loop (feedback): the controller uses the error to adjust the control signal
- Advantages of closed loop: robust to model uncertainties, rejects disturbances and noise, accurately tracks reference signals
4.10 Error metrics
- Tracking error: $x_e = x_{ref} - x$
- Steady-state error: $e_{ss}$
- Transient response metrics: rise time, settling time, overshoot
- Goal: small tracking error, small $e_{ss}$, small settling time, minimal oscillation
4.11 P / PD / PID
| Controller | Core slide wording |
|---|---|
| P | Serves as a virtual spring; $u = K_p x_e$ |
| PD | The D term is a velocity feedback term acting as virtual damping; accounts for future behavior / trend; attempts to reduce overshoot |
| PID | The I term is an error accumulation term that eliminates steady-state offset; reacts to past behavior / history |
- P only: a small $K_p$ gives slow response / large tracking error; a large $K_p$ gives a fast but oscillatory response
- PD: $K_p$ determines the strength of the action; $K_d$ determines the damping
- PID: the I term can eliminate steady-state error
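A toy 1-DoF simulation comparing the three controllers, assuming a unit mass, a constant disturbance standing in for an unmodeled force such as gravity, and hand-picked gains. None of these numbers come from the slides; the update uses semi-implicit Euler integration.

```python
def simulate(kp, kd=0.0, ki=0.0, x_ref=1.0, disturbance=-2.0, dt=0.001, duration=10.0):
    """Unit mass x'' = u + disturbance tracking a constant x_ref; returns the position history."""
    x, v, integral = 0.0, 0.0, 0.0
    xs = []
    for _ in range(int(duration / dt)):
        e = x_ref - x                 # tracking error x_e
        e_dot = -v                    # x_ref is constant, so d(x_e)/dt = -dx/dt
        integral += e * dt
        u = kp * e + ki * integral + kd * e_dot
        v += (u + disturbance) * dt   # semi-implicit Euler step
        x += v * dt
        xs.append(x)
    return xs


xs_p = simulate(kp=25.0)                      # P only
xs_pd = simulate(kp=25.0, kd=10.0)            # PD
xs_pid = simulate(kp=25.0, kd=10.0, ki=20.0)  # PID
for name, xs in (("P", xs_p), ("PD", xs_pd), ("PID", xs_pid)):
    print(f"{name:3s} peak={max(xs):.3f} final={xs[-1]:.3f}")
```

With these made-up numbers, P alone keeps oscillating around $0.92$ (virtual spring, no damping), PD settles at $0.92$ because a steady-state error of $-\text{disturbance}/K_p = 0.08$ remains, and PID removes that offset.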
4.12 Why modern robots often prefer PD over full PID
- PD is often sufficient for tracking mechanical systems, simpler to tune, and usually more robust in real hardware
- In many robots, the main challenge is fast and stable motion, not eliminating tiny steady-state errors
- In contact-rich tasks, integral action may lead to unwanted force buildup
- To reduce or eliminate steady-state error, modern robots often combine PD control with model-based compensation, gravity compensation, and feedforward torque
4.13 PD tuning
- A second-order system is described by its natural frequency $\omega_n$ and damping ratio $\xi$
- $\xi > 1$: overdamped; $\xi < 1$: underdamped; $\xi = 1$: critically damped
- Conclusion from matching terms (per slides): a larger $\omega_n$ corresponds to a larger P gain; more damping corresponds to a larger D gain
- This is typically used to heuristically design a good initial set of PD parameters, which still need finetuning afterwards
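The "matching terms" step can be written out as follows, assuming a unit-mass joint $\ddot{x} = u$ tracking a constant reference under $u = K_p x_e + K_d\dot{x}_e$, with no gravity or friction (a simplification). Since $\dot{x}_e = -\dot{x}$ and $\ddot{x}_e = -\ddot{x}$:

```latex
\ddot{x}_e = -\ddot{x} = -K_p x_e - K_d \dot{x}_e
\;\Longrightarrow\;
\ddot{x}_e + K_d\,\dot{x}_e + K_p\,x_e = 0

\text{Matching with } \ddot{x}_e + 2\xi\omega_n\dot{x}_e + \omega_n^2 x_e = 0:
\qquad K_p = \omega_n^2, \qquad K_d = 2\xi\omega_n
```

For example, $\omega_n = 5$ rad/s with critical damping $\xi = 1$ gives $K_p = 25$ and $K_d = 10$; for a mass $m \neq 1$ both gains are scaled by $m$.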
4.14 Rules of thumb for multi-DoF systems
- Rule of thumb: apply the single-joint tuning method to each joint
- In humanoids, ankle joints require more compliance
- A higher D gain can reduce oscillation, but it also amplifies noise from velocity sensors
- All PD tuning must respect hardware torque limits
- Because robotic arms can be modeled more accurately and operate in more controllable environments, model-based methods such as feedback linearization can also be used
Chapter formula summary
Path: $q(s), s\in[0,1]$
Time parameterization: $s=s(t) \Rightarrow q(t)=q(s(t))$
Tracking error: $x_e=x_{ref}-x$
P control: $u=K_p x_e$
PD control: $u=K_p x_e + K_d \dot{x}_e$
PID control: $u=K_p x_e + K_i \int x_e\,dt + K_d \dot{x}_e$
Standard second-order form: $\ddot{x}+2\xi\omega_n\dot{x}+\omega_n^2 x=0$
Chapter 5 · Lect05 Vision and Grasping I
5.1 From robotics to the grasping pipeline
- Kinematics: conversion between Cartesian space and joint space
- Motion Planning: get a geometrically valid path
- Control: ensures the motion follows the planned trajectory
- If we can provide where to move, a.k.a. end-effector pose in Cartesian space, this already forms a pipeline to grasp objects.
5.2 Grasping and grasp pose
- Grasping: restraining an object's motion in a desired way by applying forces and torques at a set of contacts
- Grasp Synthesis: a high-dimensional search or optimization problem to find gripper poses or joint configurations
- Grasp Pose defines the position, orientation and articulation of a hand
- 4-DoF grasp: a 3D position and a 1D hand orientation about the axis aligned with the direction of gravity, a.k.a. top-down grasping
- 6-DoF grasp: a 3D position and a 3D orientation
5.3 Two routes for open-loop grasping
| Route | Slide description |
|---|---|
| Path I | For known objects with labeled grasps: 6D object pose estimation $\to$ further get grasping pose from object pose $\to$ motion planning |
| Path II | For unknown and general objects: directly predict grasping pose $\to$ motion planning |
5.4 6D object pose
- Definition: the 6D transformation from object to camera space
- Consists of a 3-DoF translation and a 3-DoF rotation
- Instance-level 6D pose estimation: a small set of known instances; the pose is defined according to their CAD models; the input can be RGB / RGBD; the output is the object pose(s)
5.5 PoseCNN
- The slides use PoseCNN as a representative method for instance-level 6D pose estimation
- Its rotation estimation slide states explicitly: Regress quaternion
5.6 ICP
- ICP is a method for point cloud registration
- Input: two point clouds; find $R$ and $T$ such that one point cloud aligns to the other as closely as possible
- Steps: make data centered, find correspondences, solve the constrained orthogonal Procrustes problem, obtain translation, iterate
- Termination conditions: the change in transformation falls below a threshold / the change in loss falls below a threshold / the maximum number of iterations is reached
| ICP advantages | ICP disadvantages |
|---|---|
| Simple; no need for segmentation or feature extraction; decent accuracy and convergence given a good initialization | finding correspondences is costly; only considers point-to-point distances; highly dependent on the accuracy of the initial estimate |
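A simplified sketch of the ICP loop above, assuming 2D point clouds, brute-force nearest-neighbour correspondences, and the 2D closed-form Procrustes solution (in 3D this step is usually solved with an SVD). The point sets and the perturbation are made-up test data.

```python
import math


def best_rigid_2d(P, Q):
    """Closed-form 2D orthogonal Procrustes: angle th and translation t minimizing
    sum ||R(th) p_i + t - q_i||^2 for paired points."""
    n = len(P)
    pcx, pcy = sum(p[0] for p in P) / n, sum(p[1] for p in P) / n
    qcx, qcy = sum(q[0] for q in Q) / n, sum(q[1] for q in Q) / n
    dot = crs = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        px, py, qx, qy = px - pcx, py - pcy, qx - qcx, qy - qcy  # make data centered
        dot += px * qx + py * qy
        crs += px * qy - py * qx
    th = math.atan2(crs, dot)
    c, s = math.cos(th), math.sin(th)
    t = (qcx - (c * pcx - s * pcy), qcy - (s * pcx + c * pcy))  # obtain translation
    return th, t


def icp_2d(source, target, max_iters=50, tol=1e-12):
    """Point-to-point ICP; returns the accumulated angle, translation and aligned points."""
    cur = list(source)
    total_th, total_t = 0.0, (0.0, 0.0)
    prev_err = float("inf")
    for _ in range(max_iters):
        # find correspondences: brute-force nearest neighbour in the target cloud
        matches = [min(target, key=lambda q: math.dist(p, q)) for p in cur]
        th, t = best_rigid_2d(cur, matches)
        c, s = math.cos(th), math.sin(th)
        cur = [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in cur]
        total_th += th  # 2D rotations commute, so angles simply add
        total_t = (c * total_t[0] - s * total_t[1] + t[0], s * total_t[0] + c * total_t[1] + t[1])
        err = sum(math.dist(p, q) ** 2 for p, q in zip(cur, matches)) / len(cur)
        if abs(prev_err - err) < tol:  # termination: loss change below threshold
            break
        prev_err = err
    return total_th, total_t, cur


# Hypothetical target cloud, and a source made by rotating it 0.1 rad and shifting it.
target = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0), (-3.0, 2.0), (2.0, -4.0)]
c0, s0 = math.cos(0.1), math.sin(0.1)
source = [(c0 * x - s0 * y + 0.2, s0 * x + c0 * y - 0.1) for x, y in target]
th, t, aligned = icp_2d(source, target)
print(th)  # close to -0.1: ICP undoes the perturbation
```

The small 0.1 rad perturbation plays the role of a good initial estimate. With a large initial rotation the nearest-neighbour matches can be wrong and ICP may converge to a wrong local minimum, which is the "highly dependent on the initial estimate" drawback in the table.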
5.7 The representation problem in rotation regression
- A 3D rotation has only $3$ DoF, but a rotation matrix has $9$ elements, which makes it harder for a neural network to predict
- Alternative representations: Euler angle, axis angle, quaternion
- These representations often suffer from singularities and discontinuities
- Euler angles are discontinuous; axis-angle is problematic near $\theta=0$ and $\theta=\pi$; quaternions have double coverage
5.8 Continuous rotation representations
- 6D representation: simply eliminates the last column of the rotation matrix
- Then Gram-Schmidt orthogonalization maps the network output onto $SO(3)$
- 9D representation: corresponds to the full rotation matrix, mapped onto $SO(3)$ by SVD orthogonalization
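A sketch of the 6D-to-$SO(3)$ mapping, assuming the 6D vector stores the first two columns of $R$ (consistent with dropping the last column). In a real network this would run on batched tensors; plain lists keep the example self-contained.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def rot6d_to_matrix(x6):
    """Map a raw 6D vector (two predicted columns a1, a2) onto SO(3) via Gram-Schmidt."""
    a1, a2 = x6[:3], x6[3:]
    b1 = normalize(a1)
    d = sum(u * w for u, w in zip(b1, a2))
    b2 = normalize([w - d * u for u, w in zip(b1, a2)])  # remove the b1 component of a2
    b3 = cross(b1, b2)                                   # completes a right-handed frame
    return [[b1[i], b2[i], b3[i]] for i in range(3)]     # b1, b2, b3 are the columns


R = rot6d_to_matrix([2.0, 0.1, 0.0, 0.3, 1.0, 0.2])  # an arbitrary "network output"
print(R)
```

Any 6D output (with non-parallel halves) yields a valid rotation, and nearby outputs give nearby rotations, which is the continuity property that motivates this representation. Feeding the first two columns of a valid rotation back in reproduces it exactly.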
5.9 FoundationPose
- FoundationPose is a unified foundation model for 6D object pose estimation and tracking
- It supports both model-based and model-free setups
- Initialization: first detect the object, then initialize the translation from the 3D point at the bbox center and the median depth, then initialize rotations by sampling viewpoints from an icosphere
- Pose-conditioned input crop: uses the rendering at the coarse pose together with the input crop to output a refined pose
5.10 Category-level 6D pose estimation and NOCS
- Goal: estimate the 6D pose and 3D size of previously unseen objects from certain categories
- The key is NOCS = Normalized Object Coordinate Space
- Three normalization steps: rotation normalization, translation normalization, scale normalization
- The category-level pose is the transformation from NOCS to camera space
- From image to NOCS map to pose: predict the NOCS map, backproject it with depth, then do pose fitting with RANSAC
Chapter formula summary
6D object pose: 3-DoF translation + 3-DoF rotation
ICP objective: find $(R,T)$ such that the source point cloud aligns with the target point cloud as closely as possible
Quaternion regression: predict the rotation as a quaternion
6D rotation representation: take the first two columns of the rotation matrix, then map onto $SO(3)$ with Gram-Schmidt orthogonalization
Category-level pose: transformation from NOCS to camera space
Lecture 5 in one sentence: first decide "where to grasp", then decide "with what pose the hand grasps"; on the known-objects route, 6D object pose estimation is the core intermediate variable.
Chapter 6 · Lect06 Vision and Grasping II
6.1 MuJoCo
- MuJoCo = Multi-Joint dynamics with Contact
- A general-purpose physics engine designed for fast and accurate simulation of articulated structures interacting with their environments
6.2 Two formulations of grasp models
| Formulation | Slide description |
|---|---|
| Grasp detection | detect multiple grasp poses from observations |
| Conditional grasp generation | condition: observation; output: grasp poses |
The slides specifically note: Due to the multi-modal grasp distribution, grasping is formulated as a detection problem.
6.3 Visual input representation
| Representation | Slide keywords | Characteristics |
|---|---|---|
| Voxel Grids | VGN | Explicit geometry; limited by volume resolution |
| Point Cloud | PointNet / PointNet++ / GraspNet-baseline | Explicit geometry; less memory cost; higher resolution |
6.4 Grasp evaluation metrics
- Success Rate: the ratio of successful grasp executions
- Percent cleared: the percentage of objects removed during each round
- Planning time: the time between receiving input and returning grasps
6.5 GS-Net
- GS-Net treats grasp pose detection in the wild as a two-stage problem
- Where stage: locations with high graspability
- How stage: decide the grasp parameters, e.g., in-plane rotation, approaching depth, gripper width, grasp score
- graspness: a novel geometrically based quality for distinguishing graspable areas in cluttered scenes
- graspness is split into point-wise and view-wise graspness scores
- Computing the "success rate" relies on force closure and collision checking
6.6 DexGraspNet 2.0
- DexGraspNet 2.0: Diffusion-based Dexterous Grasp Generation
- Stage 1: Where to grasp; predict graspness and objectness scores
- Stage 2: How to grasp; predict the residual position, rotation, and finger joint angles
- To handle the multi-modal grasping pose distribution, the slides state that a diffusion model is used
- DexGraspNet 2.0 contains 7600 training scenes with 426 million grasps
6.7 Force closure and form closure
- Force Closure: the forces a grasp applies through a set of frictional contacts are sufficient to compensate any external wrench applied to the object
- Physical statement: the positive span of the wrench cones is the entire wrench space
- Form closure: the rigid body is fully immobilized by rigid stationary fixtures
- When planning a grasp by a robot hand, force closure is a good minimum requirement
- Form closure is usually too strict, requiring too many contacts
- Relation given in the slides: successful grasp $\le$ force closure $\le$ form closure
6.8 Datasets
| Type | Dataset | Slide notes |
|---|---|---|
| Object dataset | ShapeNet / ModelNet / Objaverse-XL | Objaverse-XL: 10M+ 3D Objects |
| Synthetic grasp dataset | ACRONYM | with grasping annotation |
| Real grasp dataset | GraspNet-1Billion | with grasping annotation |
Annotation pipeline of GraspNet-1Billion: sample grasp points from the point cloud, then sample grasp views / in-plane rotations / gripper depths and evaluate them, and finally project the grasps into the scene using the object 6D poses and run collision detection.
6.9 Why hand-eye calibration is needed
- Grasp poses are initially estimated in camera space
- For the robot to execute them, the poses must be transformed to the robot space
- Hence we need the transformation between the camera and the robot, i.e., hand-eye calibration
- Definition: determining the precise geometric relationship (transformation matrix) between a robot's coordinate system and a camera's coordinate system
6.10 Camera model and calibration
- Agenda: camera model: intrinsics and extrinsics
- Goal of calibration: estimate the camera intrinsics and extrinsics from one or multiple images
- If the intrinsics $K$ are known or already calibrated, the pose can further be recovered
- Perspective-n-Point (PnP) is the camera calibration approach given in the slides
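A sketch of the pinhole camera model behind "intrinsics and extrinsics", assuming no lens distortion and a standard $3\times3$ intrinsic matrix $K$ with focal lengths on the diagonal and the principal point in the last column (the numbers are made up). The reprojection error helper is the quantity used in the Validation step of 6.14; PnP-style pose estimation and calibration aim to make it small.

```python
def project(K, R, t, X):
    """Pinhole projection: X_c = R X + t (extrinsics), then pixel = K X_c / z_c (intrinsics)."""
    Xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0.0:
        raise ValueError("point is behind the camera")
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]  # normalized image coordinates
    u = K[0][0] * x + K[0][1] * y + K[0][2]
    v = K[1][1] * y + K[1][2]
    return u, v


def reprojection_error(K, R, t, points_3d, pixels):
    """Mean pixel distance between observed pixels and projected 3D points."""
    total = 0.0
    for X, (u_obs, v_obs) in zip(points_3d, pixels):
        u, v = project(K, R, t, X)
        total += ((u - u_obs) ** 2 + (v - v_obs) ** 2) ** 0.5
    return total / len(points_3d)


K = [[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]]  # hypothetical intrinsics
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project(K, I3, [0.0, 0.0, 0.0], [0.1, -0.05, 2.0]))  # about (350, 225)
```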
6.11 Eye-in-hand vs Eye-to-hand
| Setup | Mounting position | Relation to solve |
|---|---|---|
| Eye-in-hand | camera mounted on the robot | camera coordinate $\to$ end-effector coordinate |
| Eye-to-hand | camera mounted stationary next to the robot | camera coordinate $\to$ robot base coordinate |
6.12 AX = XB
In the eye-in-hand workflow:
$A$: end-effector pose change
$B$: camera pose change (computed from the marker pose)
$X$: end-effector to camera
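Where $AX = XB$ comes from, as a short derivation in the $T_{s\to b}$ notation of Chapter 2. The frame names are assumptions for illustration: $b$ is the robot base, $e_i$ the end-effector at pose $i$, $c_i$ the camera at pose $i$, $t$ the calibration target, and $X = T_{e\to c}$ the unknown, constant end-effector-to-camera transform. Because the target is fixed relative to the base, $T_{b\to t}$ is the same for any two poses $i = 1, 2$:

```latex
T_{b\to e_1}\, X\, T_{c_1\to t} \;=\; T_{b\to t} \;=\; T_{b\to e_2}\, X\, T_{c_2\to t}

\Longrightarrow\quad
\underbrace{\left(T_{b\to e_2}^{-1}\, T_{b\to e_1}\right)}_{A}\, X
\;=\;
X\, \underbrace{\left(T_{c_2\to t}\, T_{c_1\to t}^{-1}\right)}_{B}
```

$A$ comes from forward kinematics (the recorded end-effector poses) and $B$ from the target pose estimated by solvePnP, matching the roles listed above. Each pair of poses contributes one such equation, which is why many robot poses are collected.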
6.13 Eye-in-hand workflow
- Rigidly attach the camera to the robot's end-effector
- Fix a calibration target on a flat surface within workspace
- Move the robot to 10-30 different poses where camera clearly sees the board
- At each pose, record the end-effector pose and compute the target-to-camera transformation via solvePnP
- Best practices: vary the orientation significantly; cover different heights and tilt angles; avoid singular configurations; the board must be fully visible
6.14 Validation
- Reprojection Error: a low pixel error (usually $<1$ pixel) indicates a successful calibration
- TCP Touch Test: if the robot accurately touches the point, the calibration is physically verified
- The corresponding validation for eye-to-hand includes a hand-to-pixel test: the virtual dot overlays perfectly with the physical gripper
6.15 The depth sensing problem
Lecture 6 ends with examples of transparent / specular objects to illustrate the depth sensing problem.
Chapter formula summary
Force closure: positive span of the wrench cones = entire wrench space
Grasp quality relation: successful grasp $\le$ force closure $\le$ form closure
Hand-eye equation: $AX = XB$
Camera model: intrinsics + extrinsics
PnP setting: recover the pose given known intrinsics and 2D-3D correspondences
Chapter 7 · Lect07 Policy I
7.1 Basic definition of a policy
- A policy is an end-to-end mapping: state $\to$ action
- Stochastic: $a \sim \pi(a|s)$
- Deterministic: $a = \pi(s)$
7.2 State
- In Embodied AI, state $s_t$ at time step $t$ includes a complete description of the environment
- Contains all information needed to predict the future
- Usually not fully accessible
- Under true state, the system is Markovian
- Markov property:$P(s_{t+1}|s_t,a_t,s_{t-1},a_{t-1},\ldots)=P(s_{t+1}|s_t,a_t)$
7.3 Dynamics model and world model
- $P(s_{t+1}|s_t,a_t)$ is called the dynamics model or transition model
- It describes how the world evolves: predict the next state given the current state and action
- Sometimes people also call it a world model
- A world model is usually learned, and may additionally come with a reward model $r(s,a)$ and an observation model $P(o|s)$
7.4 MDP
- MDP: a framework for sequential decision making under uncertainty
- At each time step: observe the current state $s_t$, take action $a_t\sim\pi(a_t|s_t)$, and the environment transitions to $s_{t+1}\sim p(s_{t+1}|s_t,a_t)$
- Markov property: the next state depends only on the current state and action
7.5 Observation
- An observation is what the agent actually receives from its sensors
- It includes exteroceptive information: vision, depth, tactile sensing, audio
- It also includes proprioceptive information: joint angles, velocities, torques, motor states
- Observations are typically partial, noisy and ambiguous
- The same observation may correspond to different underlying states
- Observations are often non-Markov, while states are defined to be Markov
7.6 State-based vs observation-based policy
- In practice, policies usually operate on observations rather than true states
- In general, a realistic robotic policy can be written as $a\sim\pi(a|o,l)$
- where $o$ is the observation and $l$ is the language or task instruction
- A state-based policy $a\sim\pi(a|s)$ usually exists in a simulator, or when the state is estimated by other algorithms
7.7 IL vs RL
| Method | Prerequisite | Key idea |
|---|---|---|
| Imitation Learning | access to an expert | learn to mimic expert behavior directly from data |
| Reinforcement Learning | no expert available | learn what to do by evaluating the consequences of actions |
7.8 Behavior Cloning
- BC treats policy learning as supervised learning
- Input: observation (or state)
- Output: action
- For a deterministic policy, the MSE loss is usually adopted
- Minimizing MSE is equivalent to maximum likelihood under a Gaussian policy (with fixed covariance)
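The equivalence follows in one line, assuming a Gaussian policy $\pi_\theta(a|o) = \mathcal{N}(a;\mu_\theta(o),\sigma^2 I)$ with fixed $\sigma$ and a $d$-dimensional action:

```latex
-\log \pi_\theta(a\,|\,o)
= \frac{1}{2\sigma^2}\,\bigl\|a-\mu_\theta(o)\bigr\|^2 + \frac{d}{2}\log\bigl(2\pi\sigma^2\bigr)
```

The second term does not depend on $\theta$, so maximizing the likelihood of the demonstrated actions over the dataset is the same as minimizing the MSE between $a$ and $\mu_\theta(o)$; the deterministic policy is then $a = \mu_\theta(o)$.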
7.9 The core problem of BC: distribution drift / mismatch
BC trains the policy on states visited by the expert, but at test time the policy visits states induced by its own actions. As a result: Small mistake $\to$ new unseen state $\to$ larger mistake $\to$ OOD state, and eventually errors accumulate over time.
7.10 Data collection in Embodied AI
- ALOHA-style master-slave teleoperation: a human operator controls a master arm that is kinematically coupled to a slave robot arm
- The observation $o_t$ and action $a_t$ can be recorded to form a teleoperation dataset $\mathcal{D} = \{(o_t,a_t)\}_{t=1}^T$
- Observations can include visual observations and proprioception
- Actions can be joint-space targets, task-space poses, or the very common $\Delta x_t$
7.11 DAgger and HG-DAgger
- Problem with the original DAgger: for a $6$-DoF/$7$-DoF robotic arm, or even a humanoid, having a human label the action for every state is very difficult
- HG-DAgger: instead of labeling actions offline, the human intervenes during execution
- Procedure: run the policy, a human monitors execution, when the policy makes a mistake the human takes over, record the corrected data, aggregate and retrain
- Advantages: avoids manual action labeling; only labels when necessary; more data-efficient and practical
7.12 On-policy distillation
- Key idea: replace human labeling with a teacher policy
- Run student policy and collect states visited by the student
- Query teacher policy for labels
- Train student for one gradient step (no more than one to maintain pure on-policy)
- Examples of teacher policies: a privileged state-based policy, or a motion planner that directly provides action labels
Chapter formula summary
Stochastic policy: $a \sim \pi(a|s)$
Deterministic policy: $a = \pi(s)$
Markov property: $P(s_{t+1}|s_t,a_t,s_{t-1},a_{t-1},\ldots)=P(s_{t+1}|s_t,a_t)$
Observation-based policy: $\pi_\theta(a_t|o_t)$
General robotic policy: $a \sim \pi(a|o,l)$
Teleoperation dataset: $\mathcal{D}=\{(o_t,a_t)\}_{t=1}^T$
Chapter 8 · Lect08 Policy II
8.1 Basic definition of RL
- RL = Learning a policy by interacting with an environment
- Objective: maximize cumulative reward
- No need for expert demonstrations
8.2 Reward function
- Key property: local & myopic; only reflects the instant outcome
- Can be sparse or dense
- Sparse: reward only at success/failure; reward may be delayed; requires credit assignment
- Dense: frequent feedback that shapes behavior
8.3 POMDP
- Lecture 8 introduces the POMDP (Partially Observed Markov Decision Process) on top of the MDP
- For visual policies, $s_t$ is often unavailable and only $o_t$ can be observed
8.4 Online RL, on-policy, off-policy, offline policy learning
| Concept | Slide wording |
|---|---|
| Online RL | allows interaction with the environment while doing RL |
| On-Policy RL | train a policy using experiences collected from the most recent policy |
| Off-Policy RL | use data collected throughout training and stored in buffer $D$; more sample efficient |
| Behavior cloning | learning a policy by imitating expert demonstrations; no reward needed; not RL |
| Offline RL | collect data from any policy and store it in $D$, then no further interaction; needs reward; it is one type of RL |
8.5 Monte Carlo approximation
- Sample $N$ trajectories and approximate an expectation using sample averages
- Replace intractable expectation with empirical mean
- This is an application of the Law of Large Numbers
- It is an unbiased estimator of the true expectation
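A minimal Monte Carlo sketch: the expectation $\mathbb{E}[f(X)]$ is replaced by the mean of $N$ samples. The helper name `mc_estimate` and the test integrand are illustrative. In RL, the sample would be a trajectory and `f` its return.

```python
import random

def mc_estimate(f, sample, n, seed=0):
    """Approximate E[f(X)] by the empirical mean of f over n i.i.d. samples of X."""
    rng = random.Random(seed)
    return sum(f(sample(rng)) for _ in range(n)) / n

# Illustrative check: E[X^2] for X ~ Uniform(0, 1) is exactly 1/3.
estimate = mc_estimate(lambda x: x * x, lambda rng: rng.random(), n=100_000)
```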
8.6 Properties of REINFORCE
- Given a regular size of samples, policy gradient from REINFORCE is very noisy
- high variance
- still unbiased
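The noise of REINFORCE is easy to see on a hypothetical one-step problem with a softmax policy over two actions. The rewards `[11, 10]` are an arbitrary choice that makes all rewards large and positive. For a softmax policy, $\nabla_\theta \log \pi_\theta(a)$ has the closed form $\mathbf{1}[k=a] - \pi_k$. Each single-sample estimate is therefore cheap to compute and can be compared with the exact gradient.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_samples(theta, rewards, n, seed=0):
    """n single-sample REINFORCE estimates grad log pi(a) * r(a) for a one-step softmax policy."""
    rng = random.Random(seed)
    probs = softmax(theta)
    out = []
    for _ in range(n):
        a = rng.choices(range(len(probs)), weights=probs)[0]
        # Softmax score function: d log pi(a) / d theta_k = 1[k == a] - pi_k.
        out.append([((1.0 if k == a else 0.0) - probs[k]) * rewards[a] for k in range(len(probs))])
    return out

def exact_grad(theta, rewards):
    """Exact gradient of J = sum_a pi(a) r(a): dJ/dtheta_k = pi_k (r_k - J)."""
    probs = softmax(theta)
    J = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - J) for p, r in zip(probs, rewards)]

samples = reinforce_samples([0.0, 0.0], [11.0, 10.0], n=20_000)
first = [g[0] for g in samples]
mean0 = sum(first) / len(first)
std0 = (sum((x - mean0) ** 2 for x in first) / len(first)) ** 0.5
```

With these numbers the exact gradient is `[0.25, -0.25]`. Each single-sample estimate, however, is either `5.5` or `-5.0` in the first component.
- The sample mean approaches the exact value (unbiased).
- The per-sample spread is about 20 times larger than the gradient itself (high variance).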
8.7 Policy gradient in POMDP
- For visual policy, replace $s_t$ by $o_t$
- Slide quote: We can use policy gradient in POMDPs by simply modifying $s_t \to o_t$.
- The policy becomes $\pi_\theta(a_t|o_t)$
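8.8 Causality and reward-to-go
By causality, the action $a_t$ taken at time $t$ cannot affect rewards received before $t$. For each time step $t$, the total return of the whole episode can therefore be replaced by the reward-to-go $G_t = \sum_{t'=t}^{T} r_{t'}$. This reduces variance without introducing bias.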
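Reward-to-go can be computed for every time step in one backward pass. In this minimal sketch the optional discount `gamma` is an addition; the default `gamma=1.0` matches the undiscounted $G_t$ in the formula summary.

```python
def reward_to_go(rewards, gamma=1.0):
    """G_t = sum_{t'=t}^{T} gamma^(t'-t) * r_{t'} for every t, computed backwards in O(T)."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

rtg = reward_to_go([1.0, 0.0, 2.0])   # [3.0, 2.0, 2.0]
```

The total-return estimator would weight every step's $\nabla_\theta \log \pi_\theta(a_t|s_t)$ by `3`. Reward-to-go drops the past reward `1` for the later steps.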
8.9 Baseline
- Slide conclusion: subtracting a baseline is unbiased in expectation
- Average reward is not the best baseline, but it is pretty good
- The main purpose of introducing a baseline is to reduce variance
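The unbiasedness of a baseline can be checked exactly, without sampling, on a hypothetical one-step softmax problem. Enumerate the actions and compute the mean and variance of one component of $\nabla_\theta \log \pi_\theta(a)\,(r(a) - b)$.

The mean does not change because $\mathbb{E}_{a\sim\pi}[\nabla_\theta \log \pi_\theta(a)]\,b = b\,\nabla_\theta \sum_a \pi_\theta(a) = 0$ for any $b$ that does not depend on the action. The function name `grad_stats` and the reward values are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def grad_stats(theta, rewards, baseline, k=0):
    """Exact mean and variance of component k of grad log pi(a) * (r(a) - baseline),
    computed by enumerating every action of a one-step softmax policy."""
    probs = softmax(theta)
    vals = [((1.0 if a == k else 0.0) - probs[k]) * (rewards[a] - baseline)
            for a in range(len(probs))]
    mean = sum(p * v for p, v in zip(probs, vals))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))
    return mean, var

theta, rewards = [0.0, 0.0, 0.0], [11.0, 10.0, 12.0]
avg_reward = sum(p * r for p, r in zip(softmax(theta), rewards))   # expected reward, about 11
m0, v0 = grad_stats(theta, rewards, baseline=0.0)
mb, vb = grad_stats(theta, rewards, baseline=avg_reward)
```

With uniform logits and rewards `[11, 10, 12]`, both baselines give the same mean. The average-reward baseline shrinks the variance of that component from roughly 27 to below 0.1.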
8.10 Actor-critic
| Component | Role |
|---|---|
| Actor | the policy |
| Critic | value function |
- Actor-critic uses the critic to estimate the return, which reduces the variance of the policy gradient
- Batch actor-critic: trajectory-based gradient evaluation
- Online actor-critic: transition-based gradient evaluation
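An online (transition-based) actor-critic can be sketched in tabular form. The environment is a hypothetical two-state toy MDP:
- From state 0, the agent can move on to state 1 or quit with a small reward of `0.2`.
- State 1 gives a reward of `1.0` for the right action.

The critic is a value table updated with the one-step TD error $\delta = r + \gamma V(s') - V(s)$. The actor uses $\delta$ as its advantage estimate in place of a Monte Carlo return.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical toy MDP: (state, action) -> (reward, next_state); None marks termination.
MDP = {
    (0, 0): (0.0, 1),     # move on to state 1
    (0, 1): (0.2, None),  # quit early with a small reward
    (1, 0): (1.0, None),  # good terminal action
    (1, 1): (0.0, None),  # bad terminal action
}

def online_actor_critic(episodes=3000, gamma=0.9, lr_actor=0.05, lr_critic=0.2, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]   # actor: softmax logits per state
    V = [0.0, 0.0]                     # critic: state-value table
    for _ in range(episodes):
        s = 0
        while s is not None:
            probs = softmax(theta[s])
            a = rng.choices([0, 1], weights=probs)[0]
            r, s_next = MDP[(s, a)]
            # Critic: one-step TD error from this single transition (online mode).
            target = r + (gamma * V[s_next] if s_next is not None else 0.0)
            delta = target - V[s]
            V[s] += lr_critic * delta
            # Actor: policy-gradient step with the TD error as the advantage estimate.
            for k in range(2):
                theta[s][k] += lr_actor * delta * ((1.0 if k == a else 0.0) - probs[k])
            s = s_next
    return theta, V

theta, V = online_actor_critic()
```

A batch actor-critic would instead collect whole trajectories first, then fit the critic and take the policy-gradient step on the batch. The per-transition update above is the online variant.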
8.11 Discount factor $\gamma$
- higher $\gamma$ means considering a longer future
- smaller $\gamma$ focuses more on immediate rewards and transitions
- The slides later also stress: discount = variance reduction
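The effect of $\gamma$ shows up on a hypothetical sparse reward that arrives only at step 50. A common rule of thumb reads $1/(1-\gamma)$ as the effective horizon: about 100 steps for $\gamma=0.99$, but only 10 for $\gamma=0.9$.

```python
def discounted_return(rewards, gamma):
    """sum_t gamma^t * r_t for one trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

delayed = [0.0] * 50 + [1.0]          # hypothetical: reward only at step 50
long_view = discounted_return(delayed, 0.99)
short_view = discounted_return(delayed, 0.9)
```

With $\gamma=0.99$ the delayed reward still contributes $0.99^{50} \approx 0.61$. With $\gamma=0.9$ it shrinks to about $0.005$, so the agent effectively optimizes only nearby rewards. Down-weighting distant rewards is also why discounting lowers variance: far-future rewards are the noisiest part of a Monte Carlo return.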
8.12 N-step returns 与 GAE
- n-step return: a single parameter knob $(n)$ that balances bias and variance by deciding how long you trust the real trajectory before bootstrapping
- GAE: a weighted combination of n-step advantages / n-step returns
- Intuition from the slides: mostly prefer cutting earlier (less variance), weighting the n-step terms with an exponential falloff
- Typical choices in on-policy methods: $\gamma \approx 0.99, \lambda \approx 0.95$
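GAE is usually computed with the standard backward recursion $A_t = \delta_t + \gamma\lambda A_{t+1}$, where $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$. This is equivalent to the exponentially weighted combination of $n$-step advantages described above. The minimal sketch assumes:
- a single trajectory segment with no episode boundaries inside it;
- a `values` list with one extra bootstrap entry $V(s_T)$.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one segment.
    values has len(rewards) + 1 entries; values[-1] is V(s_T) (use 0.0 if terminal)."""
    T = len(rewards)
    adv = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # one-step TD error
        running = delta + gamma * lam * running                  # A_t = delta_t + gamma*lam*A_{t+1}
        adv[t] = running
    return adv

adv_td = gae([1.0, 0.0, 2.0], [0.5, 0.4, 0.3, 0.0], gamma=0.9, lam=0.0)
adv_mc = gae([1.0, 0.0, 2.0], [0.5, 0.4, 0.3, 0.0], gamma=0.9, lam=1.0)
```

The two extremes make the bias/variance knob visible:
- `lam=0` reduces to one-step TD errors: low variance, heavy reliance on the critic.
- `lam=1` gives the discounted return minus $V(s_t)$: only the final $V(s_T)$ is bootstrapped, and the variance is higher.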
8.13 RL summary at the end of the slides
- Actor-critic algorithms: reduce the variance of the policy gradient
- Policy evaluation: fitting a value function to the policy
- Discount factors: both a temporal horizon and a variance-reduction trick
- Actor-critic design: one network with two heads, or two networks; batch mode or online (+ parallel)
- State-dependent baselines: another way to use the critic; can be combined with n-step returns or GAE
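Formula summary for this chapter
Reward function: $r(s,a) \in \mathbb{R}$
Policy in POMDP: $\pi_\theta(a_t|o_t)$
Reward-to-go: $G_t = \sum_{t'=t}^{T} r_{t'}$
Q-function: the expected reward-to-go given $(s_t,a_t)$, i.e. $Q(s_t,a_t) = \mathbb{E}_{\pi_\theta}\left[\sum_{t'=t}^{T} r_{t'} \,\middle|\, s_t,a_t\right]$
State-based baseline: $V(s_t)$
Typical GAE hyperparameters: $\gamma \approx 0.99, \lambda \approx 0.95$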
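Following the slides' reasoning, split the total return into two parts: past rewards and future rewards. Consider the policy-gradient term at time step $t$. The current action $a_t$ cannot influence rewards that have already occurred, so the past part contributes $0$ in expectation.
Therefore, the only part actually related to the current action is the return from $t$ onward, i.e. the reward-to-go.
The benefit of this step: it drops past-reward terms that are unrelated to the current action but add noise, so the variance of the estimate decreases. 非课件内容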
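The causality argument can be written out in a few lines. This is a standard sketch for discrete actions. The objective symbol $J(\theta)$, the time range $t=1,\dots,T$, and the conditioning on the history up to $s_t$ are notational assumptions; for continuous actions the sum over $a$ becomes an integral.

$$
\begin{aligned}
\nabla_\theta J(\theta)
&= \mathbb{E}_{\tau}\Big[\sum_{t=1}^{T}\nabla_\theta\log\pi_\theta(a_t|s_t)\Big(\sum_{t'=1}^{t-1} r_{t'}+\sum_{t'=t}^{T} r_{t'}\Big)\Big],\\
\mathbb{E}_{\tau}\big[\nabla_\theta\log\pi_\theta(a_t|s_t)\,r_{t'}\big]
&= \mathbb{E}_{s_{1:t},\,a_{1:t-1}}\Big[r_{t'}\sum_{a}\pi_\theta(a|s_t)\,\nabla_\theta\log\pi_\theta(a|s_t)\Big]
= \mathbb{E}\Big[r_{t'}\,\nabla_\theta\sum_{a}\pi_\theta(a|s_t)\Big]
= \mathbb{E}\big[r_{t'}\,\nabla_\theta 1\big] = 0\quad (t'<t),\\
\Rightarrow\quad \nabla_\theta J(\theta)
&= \mathbb{E}_{\tau}\Big[\sum_{t=1}^{T}\nabla_\theta\log\pi_\theta(a_t|s_t)\,G_t\Big],\qquad G_t=\sum_{t'=t}^{T} r_{t'}.
\end{aligned}
$$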
Chapter 11 · Interactive self-test
Introduction to Embodied AI: Midterm Self-Test
Chapter 12 · Pre-exam cheat sheet
Lecture 1-4: Robotics basics
Rigid transform: $p^s = R_{s\to b} p^b + t_{s\to b}$
Homogeneous transform: $T = \begin{bmatrix}R & t \\ 0 & 1\end{bmatrix}$
Composition: $T_{3\to1}=T_{3\to2}T_{2\to1}$
Inverse: $T_{2\to1}=(T_{1\to2})^{-1}$
Forward kinematics: $T_{s\to e}=f(\theta)$
Rodrigues (unit axis $\omega$): $e^{[\omega]\theta}=I+[\omega]\sin\theta+[\omega]^2(1-\cos\theta)$
Rotation distance: $\mathrm{dist}(R_1,R_2)=\arccos\left(\frac{\mathrm{tr}(R_2R_1^T)-1}{2}\right)$
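The cheat-sheet formulas above can be checked numerically with NumPy. This minimal sketch assumes the document's convention that $T_{a\to b}$ maps coordinates expressed in frame $b$ into frame $a$, so that $p^s = R_{s\to b}p^b + t_{s\to b}$. The helper names `make_T`, `inv_T`, `skew`, `rodrigues` and `rotation_distance` are illustrative, not from the slides.

```python
import numpy as np

def make_T(R, t):
    """4x4 homogeneous transform [[R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def inv_T(T):
    """Closed-form inverse [[R^T, -R^T t], [0, 1]] (no general matrix inversion needed)."""
    R, t = T[:3, :3], T[:3, 3]
    return make_T(R.T, -R.T @ t)

def skew(w):
    """[w] such that [w] @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def rodrigues(axis, theta):
    """exp([w] theta) = I + sin(theta) [w] + (1 - cos(theta)) [w]^2 for the normalized axis w."""
    w = np.asarray(axis, dtype=float)
    K = skew(w / np.linalg.norm(w))
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rotation_distance(R1, R2):
    """Geodesic angle on SO(3); clipping guards against round-off just outside [-1, 1]."""
    c = (np.trace(R2 @ R1.T) - 1.0) / 2.0
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

R = rodrigues([0, 0, 1], np.pi / 2)                         # 90 degrees about z
T_21 = make_T(R, [1.0, 0.0, 0.0])                           # frame-1 coords -> frame-2 coords
T_32 = make_T(rodrigues([1, 0, 0], 0.3), [0.0, 2.0, 0.0])   # frame-2 coords -> frame-3 coords
T_31 = T_32 @ T_21                                          # composition T_{3->1} = T_{3->2} T_{2->1}
p1 = np.array([0.5, 0.2, 0.1, 1.0])                         # homogeneous point in frame 1
```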
Lecture 4: Control
Tracking error: $x_e = x_{ref} - x$
P control: $u = K_p x_e$
PD control: $u = K_p x_e + K_d \dot{x}_e$
PID control: $u = K_p x_e + K_i \int x_e dt + K_d \dot{x}_e$
$K_p$: increases speed and reduces steady-state error, but may increase overshoot
$K_d$: acts as damping (a brake); reduces overshoot and shortens settling time
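The qualitative effect of $K_p$ and $K_d$ can be reproduced with a hypothetical plant. Assumptions:
- the plant is a unit point mass $\ddot{x} = u$ with no friction;
- it tracks a constant reference;
- it is integrated with semi-implicit Euler.

For a constant reference, $\dot{x}_e = -\dot{x}$, so the D term acts directly as a damper on velocity. The integral term is omitted because this toy plant has no constant disturbance.

```python
def simulate_pd(kp, kd, x_ref=1.0, dt=0.01, steps=1000):
    """Hypothetical plant: unit point mass x'' = u, starting at rest at x = 0.
    Returns the position trajectory under u = kp * x_e + kd * x_e'."""
    x, v = 0.0, 0.0
    traj = []
    for _ in range(steps):
        err = x_ref - x          # tracking error x_e
        d_err = -v               # x_e' = -x' because x_ref is constant
        u = kp * err + kd * d_err
        v += u * dt              # semi-implicit Euler: update velocity first,
        x += v * dt              # then position with the new velocity
        traj.append(x)
    return traj

p_only = simulate_pd(kp=10.0, kd=0.0)
pd = simulate_pd(kp=10.0, kd=5.0)
overshoot_p = max(p_only) - 1.0
overshoot_pd = max(pd) - 1.0
```

With $K_p = 10$ alone, the mass oscillates around the reference with roughly 100% overshoot and never settles. Adding $K_d = 5$ cuts the overshoot to a few percent, and the error decays to nearly zero within the 10 s horizon.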
Lecture 5-6: Vision and grasping
4-DoF grasp: 3D position + 1D hand orientation aligned with gravity
6-DoF grasp: 3D position + 3D orientation
6D object pose: the 6D transformation from object space to camera space
ICP: point cloud registration; relies on a good initialization
Force closure: a good minimum requirement for grasp planning
Hand-eye calibration: solve for the precise geometric relationship between the camera and robot coordinate systems
Hand-eye equation: $AX = XB$
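A minimal point-to-point ICP sketch in NumPy. It alternates brute-force nearest-neighbour matching with the closed-form SVD (Kabsch) alignment of the matched pairs. Assumptions:
- the point clouds are noise-free;
- matching is brute force (a k-d tree would be used for large clouds);
- the initial misalignment is small.

As the slides note, ICP depends on a good initialization. Started far away, it can converge to a wrong local minimum.

```python
import numpy as np

def best_fit_transform(A, B):
    """Kabsch/SVD: R, t minimizing sum ||R a_i + t - b_i||^2 over paired rows of A and B."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # avoid reflections
    R = Vt.T @ D @ U.T
    t = cb - R @ ca
    return R, t

def icp(source, target, iters=50):
    """Point-to-point ICP: alternate nearest-neighbour matching and Kabsch alignment."""
    R_total, t_total = np.eye(3), np.zeros(3)
    src = source.copy()
    for _ in range(iters):
        # Brute-force nearest neighbours (fine for a few hundred points).
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        matched = target[d2.argmin(axis=1)]
        R, t = best_fit_transform(src, matched)
        src = src @ R.T + t
        # Accumulate: new = R (R_total p + t_total) + t.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

`R, t = icp(source, target)` returns the rotation and translation that map `source` onto `target`.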
Lecture 7-8: Policy learning
Policy: state / observation $\to$ action
Markov property: $P(s_{t+1}|s_t,a_t,\ldots)=P(s_{t+1}|s_t,a_t)$
Observation: partial, noisy, ambiguous; often non-Markov
BC problem: distribution mismatch / error compounding
Reward: scalar immediate feedback; can be sparse or dense
REINFORCE: high variance, still unbiased
Reward-to-go: follows from causality; reduces variance
Baseline: subtracting a baseline is unbiased in expectation
Actor-critic: actor = policy, critic = value function
Common GAE parameters: $\gamma \approx 0.99, \lambda \approx 0.95$
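1. Can you explain, without looking at notes, the differences among state, observation, action, reward, and policy?
2. Can you write out the homogeneous transform, Rodrigues' formula, $AX=XB$, and P/PD/PID?
3. Can you explain why BC drifts, why reward-to-go reduces variance, and why actor-critic is more stable than pure Monte Carlo?
4. Can you clearly explain why hand-eye calibration is required between "vision output in camera space" and "robot execution in robot space"?
5. Can you connect the whole course out loud: Embodiment $\to$ robotics basics $\to$ grasping $\to$ policy learning?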