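Spring 2026 · Midterm Review

Introduction to Embodied AI
All-in-one review page

Covers Lect01-08: from the overview of embodied AI, robotics fundamentals, and motion planning and control, to vision-based grasping, imitation learning, and reinforcement learning. Unmarked content is organized from the slides; parts tagged “non-slide content” are there to help students who are new to the course build a framework for understanding.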

8 lectures of slides

30+ self-test questions

12 review sections

1 double-sided A4 cheat sheet

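How to use this page
1. Knowledge points on this page with no extra tag follow the definitions, conclusions, procedures, and terminology of the slides; the page structure was reorganized for review, so it is not a slide-by-slide transcription.
2. Content tagged “non-slide content” is there to make the material easier for first-time learners and should not replace the original slide text.
3. The logistics slide of Lecture 8 says “Scope: from Lecture 1 - Lecture 9”; this has been confirmed to be a typo, so the review covers Lect01-08.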

Midterm information (from Lect08 Logistics: Midterm)

Midterm (40% of the total score)
One-page double-sided A4 cheat sheet; handwritten or printed are both OK
Questions are all in English; a term that has already been taught in class will not be explained again in the exam
No dictionaries or calculators are allowed
Multiple-select questions: selecting any incorrect option gives $0$ points for that question; each correct option left unselected deducts $1$ point; the minimum is $0$ points
Short answer questions: explain why and how; some questions may require mathematical derivations
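
The multiple-select rule can be written as a small function. This is only a sketch of the rule as summarized here: the full marks per question are not stated in the source, so `full_marks` is a hypothetical parameter.

```python
def multi_select_score(selected, correct, full_marks):
    """Score one multiple-select question.

    full_marks is an assumed parameter; the source does not state it.
    """
    selected, correct = set(selected), set(correct)
    if selected - correct:
        # Any incorrect option selected: the question scores 0.
        return 0
    missed = len(correct - selected)
    # Each missed correct option deducts 1 point, floored at 0.
    return max(0, full_marks - missed)


example = multi_select_score("AB", "ABC", full_marks=3)
```

Under this rule a safe but incomplete answer still earns partial credit, while a single wrong guess loses everything.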

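Review advice (non-slide content)

High priority

Lect02-04 Robotics fundamentals

Rigid-body transformations, FK/IK, SO(3), Euler angle / quaternion, motion planning, and PD/PID form the shared language for the later vision and policy parts.

High priority

Lect07-08 Policy learning

state / observation / action, MDP / POMDP, BC / DAgger, reward-to-go, baseline, actor-critic, discount, and GAE; these concepts are tightly connected.

Medium priority

Lect05-06 Vision and grasping

open-loop grasping, 6D pose, ICP, continuous rotation representation, force closure, hand-eye calibration.

Medium priority

Lect01 Overview

Keep the big picture clear: embodied AI versus classical AI, generalist robots, VLA, synthetic data, and Sim vs Real.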

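Recommended study order (non-slide content)
Read through Lect01 first to build the global picture, then treat Lect02-04 as the “language layer of robotics”, then go through the vision and grasping pipeline of Lect05-06, and finally use Lect07-08 to tie together policy learning. After each chapter, immediately do the questions embedded in that chapter and the interactive self-test at the end.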

Exam topic roadmap (organized overview)

1

Lect01 Overview

What is a robot? Limitations of classical special-purpose robots, Embodied AI, generalist robots, VLA, synthetic data, sim-to-real.

2

Lect02 Robotics I

Kinematics vs Dynamics, rigid transformation, homogeneous coordinates, joint/link/DoF, FK/IK, Pieper's criterion, introduction to $SO(3)$ / $SE(3)$.

3

Lect03 Robotics II

rotation representations：Euler angle，gimbal lock，angle-axis，Rodrigues，quaternion，slerp，distance on $SO(3)$。

4

Lect04 Robotics III

motion planning，configuration space，collision checking，PRM / RRT / RRT-Connect，path vs trajectory，P / PD / PID，PD tuning。

5

Lect05 Vision and Grasping I

open-loop grasping，4-DoF / 6-DoF grasp，6D object pose estimation，ICP，rotation regression，FoundationPose，NOCS。

6

Lect06 Vision and Grasping II

grasp detection，voxel / point cloud，GS-Net，DexGraspNet 2.0，force closure，GraspNet-1Billion，camera model，PnP，AX=XB。

7

Lect07 Policy I

policy / state / observation / dynamics model / world model，MDP，BC，distribution drift，teleoperation，HG-DAgger，teacher policy。

8

Lect08 Policy II

RL，reward，sparse vs dense，online / offline，REINFORCE，reward-to-go，baselines，actor-critic，discount，GAE。

Chapter 1 · Lect01 Overview

1.1 Course goals

A frontier course on Embodied AI
Covering basics in robotics and deep learning based vision and robotic system, from a modern perspective of Embodied AI
To lay a solid foundation for conducting research in Embodied AI

1.2 Robots and robotics

A Robot is a machine capable of carrying out a complex series of actions automatically.
Robots can be guided by an external control device, or the control may be embedded within.
Robots may be constructed to take on human form.
Robotics is a branch of engineering that involves the conception, design, manufacture and operation of robots.
The objective of the robotics field is to create intelligent machines that can assist humans in a variety of ways.

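1.3 From classical special-purpose robots to Embodied AI

Classical special-purpose robots in a car factory: predesign and compute the trajectory, then only replay the trajectory when using the arm
Limitation 1: time-consuming deployment
Limitation 2: can’t flexibly handle multi-tasks
Slide conclusion: Very different from human intelligence!

Study note (non-slide content)
This slide is really saying that traditional industrial robots are closer to pre-programmed trajectory executors than to agents that can perceive, decide, and interact autonomously in open environments. The rest of the course fills in the chain of perception, decision making, control, and interaction.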

1.4 The core intuition of Embodied AI

Perception-Action Loop
How humans learn: perceive, form hypotheses, and then take action to examine them.
Our brain makes sense of the world around us by creating and testing hypotheses about how the world works.

1.5 Evolution of human intelligence and embodiment

Keywords the slides give for the evolution of human intelligence:

Keyword	Slide location
Walk Upright	Evolution of Human Intelligence
Embodied Interaction	Evolution of Human Intelligence
Tool Usage	Evolution of Human Intelligence
Language	Evolution of Human Intelligence

1.6 The future picture: Generalist Robots

Past & Existing: Industrial robots
Now & Happening: Autonomous cars
Future & Our Dream: Generalist Robots - Humanoids
How the slides summarize generalist robots: task generalists, environment generalists

1.7 The two-level structure of the robot brain

Part	Slide wording
Cerebrum	Responsible for high-level policies; Deciding what to do
Cerebellum	Responsible for low-level motor policies; Determining how to execute actions in a fast, accurate, and reliable manner

Under “To Build Generalist Robots”, the slides expand the robot brain into:

Module	Content
Cerebrum	Perception, planning, decision making, language, etc.; Vision-Language-Action model (VLA)
Cerebellum	Motion control, trajectory tracking, stability, error feedback; Whole-body and whole-hand control; Learned mainly via Reinforcement Learning (RL)

1.8 Vision-Language-Action and the data bottleneck

VLA: Inputs: language, vision, other proprioceptive / sensor signals; Outputs: robot actions
Advantages: end-to-end, benefit from VLM pretraining
Limitations: zero-shot performance lagging behind LLMs & VLMs
Real-world teleoperation is too costly to be scalable
Embodied foundation model pretraining may require trillions of trajectories
The largest VLA dataset is at the scale of 1M trajectories

1.9 The role of simulation

Simulation and Photorealistic Rendering: Annotation-free, Time efficient, Transferable to real world
For problems already done in real, learning in simulation requires generalization to real.
For problems still far to be done in real, simulation environment is a good place for exploration.

Chapter summary of expressions

Perception-Action Loop: Perceive $\to$ form hypotheses $\to$ take action $\to$ perceive again

VLA: Inputs = language + vision + proprioceptive / sensor signals; Outputs = robot actions

Robot brain layers: Cerebrum $\to$ decide what to do; Cerebellum $\to$ decide how to execute actions

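Concept check · Lect01

Which of the following best matches the slides' description of VLA?

A. The input is robot actions and the output is language

B. The input includes language, vision, and other sensor signals; the output is robot actions

C. It only processes text and does not handle motion control

D. It is only used in simulation and cannot connect to real robots

Answer: B. Lecture 1 states Inputs: language, vision, other proprioceptive / sensor signals; Outputs: robot actions.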

Lect01

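Which of the following best matches Lecture 1's description of the future direction of robots?

A. A stronger trajectory replay system

B. An industrial arm that only performs a single workstation task

C. Generalist Robots - Humanoids

D. A disembodied AI that only does language reasoning

Answer: C. In “Past, Now and Future of Robotics”, the slides describe the future as “Generalist Robots - Humanoids”.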

Chapter 2 · Lect02 Robotics I

2.1 Kinematics vs. Dynamics

Kinematics: describing the motion of bodies (position, linear/angular velocity, linear/angular acceleration, etc.)
Kinematics does not consider how to achieve motion via force
Dynamics models all the way from force and torque to the motion

2.2 Link / Joint / DoF

Link: the rigid bodies connected in sequence
Joint: the connectors between links
DoF: the number of independent parameters that define the configuration
Common joints: Revolute (R), Prismatic (P)
Other joints: Helical (H), Spherical (S)

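2.3 Rigid-body transformations and coordinate transformations

Denote the frame that moves with the rigid body as the body frame $\mathcal{F}_b$, and the observer's frame as $\mathcal{F}_s$
The pose question is: how to transform $\mathcal{F}_s$ so that it overlaps with $\mathcal{F}_b$?
First rotate by $R_{s\to b}$ to align the axes, then translate by $t_{s\to b}$ to align the origins
Coordinates of the same point in the two frames: $p^s = R_{s\to b} p^b + t_{s\to b}$
For an arbitrary point moved by the same rigid motion, with both coordinates expressed in $\mathcal{F}_s$: ${x'}^s = R_{s\to b} x^s + t_{s\to b}$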

2.4 Why introduce homogeneous coordinates

The map defined by $(R_{s\to b}, t_{s\to b})$ is not a linear transformation, because the translation term breaks linearity
Homogeneous coordinate for 3D space: $\tilde{x} \in \mathbb{R}^4$
Homogeneous transformation matrix: $T_{s\to b} = \begin{bmatrix} R_{s\to b} & t_{s\to b} \\ 0 & 1 \end{bmatrix}$
Coordinate transformation under linear form: $\tilde{x}^s = T_{s\to b} \tilde{x}^b$

2.5 Two rules for homogeneous transformations

Rule	Formula
Composition rule	$T_{3\to 1} = T_{3\to 2} T_{2\to 1}$
Change of observer's frame	$T_{2\to 1} = (T_{1\to 2})^{-1}$
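
The two rules can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`make_T`, `inv_T`, `matmul`, `rot_z`) and the example frames 1, 2, 3 are illustrative assumptions, not from the slides.

```python
import math


def rot_z(angle):
    """Rotation matrix about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def make_T(R, t):
    """Build the 4x4 homogeneous matrix [[R, t], [0, 1]]."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inv_T(T):
    """Inverse of a rigid transform: [[R^T, -R^T t], [0, 1]]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_T(Rt, t_inv)


# Hypothetical frames 1, 2, 3 related by two rigid transforms.
T_1to2 = make_T(rot_z(math.pi / 2), [1.0, 0.0, 0.0])
T_2to3 = make_T(rot_z(math.pi / 4), [0.0, 2.0, 0.0])

T_2to1 = inv_T(T_1to2)               # change of observer's frame
T_3to2 = inv_T(T_2to3)
T_3to1 = matmul(T_3to2, T_2to1)      # composition rule
T_1to3 = matmul(T_1to2, T_2to3)      # same rule with the indices swapped
```

If both rules hold, `T_3to1` multiplied by `T_1to3` should be the identity up to floating-point error.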

2.6 Base link, end-effector, joint space, Cartesian space

Base link / root link: the $0$-th link of the robot; the spatial frame $\mathcal{F}_s$ is attached to it
End-effector link: the last link, for example a gripper; the frame $\mathcal{F}_e$ is attached to it
Joint space: each coordinate is a vector of joint poses
Cartesian space: the space of rigid transformations of the end-effector, given by $(R_{s\to e}, t_{s\to e})$

2.7 Forward Kinematics and Inverse Kinematics

Forward Kinematics: map the joint space coordinate $\theta \in \mathbb{R}^n$ to a transformation matrix $T$, i.e. $T_{s\to e} = f(\theta)$
FK is obtained by composing transformations along the kinematic chain
Inverse Kinematics: given $T_{s\to e}(\theta)$ and a target pose $T_{target} \in SE(3)$, find $\theta$ such that $T_{s\to e}(\theta) = T_{target}$
Workspace: the volume swept out by the end-effector as the robot does all possible motions
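
To make “composing transformations along the kinematic chain” concrete, here is a minimal sketch for a planar arm with two revolute joints. The robot, the link lengths `l1` and `l2`, and the use of $3\times 3$ planar homogeneous matrices instead of full $SE(3)$ matrices are all simplifying assumptions for illustration.

```python
import math


def se2(theta, tx, ty):
    """Planar homogeneous transform: child frame at (tx, ty), rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def fk_planar_2r(theta1, theta2, l1=1.0, l2=1.0):
    """Return T_{s->e} for a hypothetical planar 2R arm."""
    T = se2(theta1, 0.0, 0.0)                 # base frame -> link 1 (joint 1 rotation)
    T = matmul3(T, se2(theta2, l1, 0.0))      # along link 1, then joint 2 rotation
    T = matmul3(T, se2(0.0, l2, 0.0))         # along link 2 to the end-effector
    return T


T_se = fk_planar_2r(0.0, math.pi / 2)
end_effector_xy = (T_se[0][2], T_se[1][2])
```

The last column of the result is the end-effector position, which agrees with the familiar closed form $x = l_1\cos\theta_1 + l_2\cos(\theta_1+\theta_2)$, $y = l_1\sin\theta_1 + l_2\sin(\theta_1+\theta_2)$.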

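2.8 Difficulties of IK and categories of solvers

IK may have multiple solutions, or no solution at all
Even when a solution exists, it may require complex and expensive computations
Categories: analytical methods and numerical methods
Analytical methods: solve mathematically for a closed-form exact solution; only applicable to relatively simple chains
Numerical methods: use approximation and iteration; usually more expensive, but far more general purpose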
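
As one illustration of a numerical method, the sketch below solves position-only IK for a hypothetical planar 2R arm by repeatedly stepping along $J^T e$ (the Jacobian-transpose update). The slides only say that numerical methods use approximation and iteration; this particular update rule, the step size, and the example arm are assumptions.

```python
import math


def fk(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a planar 2R arm (hypothetical example robot)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def ik_jacobian_transpose(target, theta_init, step=0.1, iters=2000,
                          l1=1.0, l2=1.0):
    """Iteratively reduce the position error with theta += step * J^T e."""
    t1, t2 = theta_init
    for _ in range(iters):
        x, y = fk(t1, t2, l1, l2)
        ex, ey = target[0] - x, target[1] - y
        # Jacobian of (x, y) with respect to (theta1, theta2).
        j11 = -l1 * math.sin(t1) - l2 * math.sin(t1 + t2)
        j12 = -l2 * math.sin(t1 + t2)
        j21 = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
        j22 = l2 * math.cos(t1 + t2)
        t1 += step * (j11 * ex + j21 * ey)
        t2 += step * (j12 * ex + j22 * ey)
    return t1, t2


theta_solution = ik_jacobian_transpose(target=(1.0, 1.0), theta_init=(0.3, 1.0))
```

The target $(1, 1)$ has two joint solutions for this arm; which one the iteration reaches depends on the initial guess, which is one way the “multiple solutions” point shows up in practice.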

2.9 Pieper's criterion

Pieper's criterion is used to judge whether a 6-DOF robotic arm has a closed-form inverse kinematics solution. The sufficient conditions given in the slides (either one is enough):
1. The axes of three consecutive rotational joints intersect at a single point.
2. The axes of three consecutive rotational joints are parallel.
Note: this is only a sufficient condition, not a necessary one.

2.10 $SO(3)$ and $SE(3)$

Space	Definition	Meaning
$SO(3)$	$\{R \in \mathbb{R}^{3\times 3}: \det(R)=1, RR^T=I\}$	3D rotations, 3 DoF
$SE(3)$	$T = \begin{bmatrix}R & t \\ 0 & 1\end{bmatrix}, R\in SO(3), t\in\mathbb{R}^3$	rigid transformations, 6 DoF

Chapter formula summary

Point transform：$p^s = R_{s\to b} p^b + t_{s\to b}$

General transform：${x'}^s = R_{s\to b} x^s + t_{s\to b}$

Homogeneous transform：$T_{s\to b}=\begin{bmatrix}R_{s\to b} & t_{s\to b} \\ 0 & 1\end{bmatrix}$

Composition：$T_{3\to1}=T_{3\to2}T_{2\to1}$

Inverse：$T_{2\to1}=(T_{1\to2})^{-1}$

Forward kinematics：$T_{s\to e}=f(\theta)$

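Concept check · Lect02

Which of the following statements about forward kinematics and inverse kinematics is correct?

A. FK solves for the joint angles from the end-effector pose

B. IK is always unique and always has a solution

C. FK maps from joint space to the end-effector transformation; IK solves for the joint variables from a target pose

D. IK can only be solved with analytical methods

Answer: C. In Lecture 2, FK is defined as $T_{s\to e}=f(\theta)$, and IK solves for $\theta$ given a target pose; IK may have multiple solutions or none.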

Lect02

Which of the following is the correct form of a homogeneous transformation?

A. $T = \begin{bmatrix}R & t \\ 0 & 1\end{bmatrix}$

B. $T = R + t$

C. $T = \begin{bmatrix}1 & R \\ t & 0\end{bmatrix}$

D. $T = R^TR$

Answer: A. Lecture 2 gives the standard form of the homogeneous transformation matrix directly.

Chapter 3 · Lect03 Robotics II

3.1 Basic properties of rotations

A rotation preserves lengths
Cross products are preserved by a rotation
Rotation matrices therefore satisfy $RR^T = R^TR = I$ and $\det(R)=1$

3.2 Why the rotation matrix is not ideal

Efficiency: a rotation has $3$ DoF, but a rotation matrix needs $9$ values
Numerical stability: orthogonality has to be maintained

3.3 Euler angle

Euler angles are three angles introduced by Leonhard Euler to describe the orientation of a rigid body with respect to a fixed coordinate system.
Composition given in Lecture 3: $R = R_z(\gamma) R_y(\beta) R_x(\alpha)$
Advantage: good interpretability

3.4 Gimbal lock and the limitations of Euler angles

When $\beta = \pi/2$, the slides show that changing $\alpha$ and changing $\gamma$ have the same effect, so a degree of freedom disappears.
Conclusion: Euler angles can parameterize every rotation, but the representation is not unique at some points, and at those points some changes in the target space cannot be achieved by changes in the source space.
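
The loss of a degree of freedom can be seen numerically. This is a minimal sketch using the composition $R = R_z(\gamma) R_y(\beta) R_x(\alpha)$; the helper names and the specific test angles are illustrative assumptions.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyx(alpha, beta, gamma):
    """R = Rz(gamma) Ry(beta) Rx(alpha)."""
    return matmul(rot_z(gamma), matmul(rot_y(beta), rot_x(alpha)))


def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


# At beta = pi/2, shifting alpha and gamma by the same amount changes nothing.
locked_gap = max_abs_diff(euler_zyx(0.3, math.pi / 2, 0.1),
                          euler_zyx(0.5, math.pi / 2, 0.3))
# Away from the singularity, the same shift gives a different rotation.
generic_gap = max_abs_diff(euler_zyx(0.3, 0.4, 0.1),
                           euler_zyx(0.5, 0.4, 0.3))
```

At $\beta=\pi/2$ the matrix depends only on $\alpha-\gamma$, so `locked_gap` is zero up to rounding while `generic_gap` is not.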

3.5 Angle-axis and Euler's theorem

Any rotation is equivalent to a rotation about a fixed axis $\hat{\omega}\in\mathbb{R}^3$ with $\|\hat{\omega}\|=1$ through a positive angle $\theta$
$\hat{\omega}$ is the unit vector of the rotation axis
$\theta$ is the angle of rotation
$R \in SO(3) := \mathrm{Rot}(\hat{\omega}, \theta)$

3.6 Rodrigues formula and exponential coordinates

Skew-symmetric matrix: $A$ is skew-symmetric if $A = -A^T$
The cross product can be written in linear form: $a \times b = [a]b$, where $[a]$ is the skew-symmetric matrix of $a$
Rodrigues formula: $e^{[\hat{\omega}]\theta} = I + [\hat{\omega}]\sin\theta + [\hat{\omega}]^2(1-\cos\theta)$
$\vec{\theta} = \hat{\omega}\theta$ is also called the rotation vector, or exponential coordinate
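
The formula translates almost line by line into code. The sketch below assumes the axis passed in is already a unit vector; the function names are illustrative.

```python
import math


def skew(w):
    """Skew-symmetric matrix [w] such that [w] b = w x b."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rodrigues(w_hat, theta):
    """R = I + [w] sin(theta) + [w]^2 (1 - cos(theta)), for a unit axis w_hat."""
    K = skew(w_hat)
    K2 = matmul(K, K)
    s, c1 = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c1 * K2[i][j]
             for j in range(3)] for i in range(3)]


R_z90 = rodrigues([0.0, 0.0, 1.0], math.pi / 2)          # 90 degrees about z
R_flipped = rodrigues([0.0, 0.0, -1.0], -math.pi / 2)    # (-axis, -angle)
```

`R_z90` should map the x axis onto the y axis, and `R_flipped` should be the same matrix, since negating both the axis and the angle leaves the rotation unchanged.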

3.7 Non-uniqueness of the angle-axis parameterization

$(\hat{\omega}, \theta)$ and $(-\hat{\omega}, -\theta)$ give the same rotation
When $R=I$, $\theta=0$ and the axis can be arbitrary
$(\hat{\omega}, \pi)$ and $(-\hat{\omega}, \pi)$ give the same rotation
If we restrict $\theta\in(0,\pi)$, a unique parameterization exists

3.8 Quaternion

A quaternion is a generalized complex number: $q = w + x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$
Vector form: $q = (w, \vec{v})$
A unit quaternion can represent a rotation
Four numbers plus one constraint $\Rightarrow 3$ DoF
Rotating a vector: first augment $\vec{x}$ to $x=(0,\vec{x})$, then compute $x' = qxq^*$
Composing rotations with quaternions: just multiply the quaternions
Each rotation corresponds to two quaternions (double covering)
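
A minimal sketch of $x' = qxq^*$ in plain Python follows. Quaternions are stored as `(w, x, y, z)` tuples; that ordering and the helper names are conventions chosen here, not something the slides prescribe.

```python
import math


def qmul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)


def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def from_axis_angle(w_hat, theta):
    """Unit quaternion (cos(theta/2), w_hat * sin(theta/2)) for a unit axis."""
    h = theta / 2.0
    s = math.sin(h)
    return (math.cos(h), w_hat[0] * s, w_hat[1] * s, w_hat[2] * s)


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via x' = q x q*."""
    x = (0.0, v[0], v[1], v[2])            # augment the vector
    _, rx, ry, rz = qmul(qmul(q, x), qconj(q))
    return (rx, ry, rz)


q = from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
v_rot = rotate(q, (1.0, 0.0, 0.0))
v_rot_neg = rotate(tuple(-c for c in q), (1.0, 0.0, 0.0))   # -q, same rotation
```

Because $q$ appears twice in $qxq^*$, the sign cancels and $-q$ produces the same rotated vector, which is the double covering mentioned above.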

3.9 Quaternion vs rotation matrix

Aspect	Quaternion	Rotation Matrix
Storage	4 floating-point values	9 values
Multiplication	16 multiplications + 12 additions	27 multiplications + 18 additions
Numerical stability	unit magnitude can be maintained by normalizing	successive operations may violate orthogonality

3.10 SLERP and rotation distance

Why not linear interpolation? The result has to be renormalized, and it does not have a constant rate of rotation
Spherical linear interpolation (SLERP): the shortest path between two points on a sphere; it traverses a great arc on the sphere of unit quaternions
It has uniform angular velocity about a fixed axis
Rotation distance: $\mathrm{dist}(R_1, R_2) = \arccos\left(\frac{\mathrm{tr}(R_2 R_1^T)-1}{2}\right)$
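
Both ideas fit in a few lines. In this sketch the SLERP formula is the standard $\frac{\sin((1-t)\Omega)}{\sin\Omega}q_0 + \frac{\sin(t\Omega)}{\sin\Omega}q_1$; the sign flip for the shorter arc, the fallback to normalized linear interpolation for nearly identical quaternions, and the clamping inside `rotation_distance` are implementation choices assumed here, not taken from the slides.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q are the same rotation; flip to take the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical: fall back to normalized linear interpolation.
        mixed = [a + t * (b - a) for a, b in zip(q0, q1)]
        norm = math.sqrt(sum(c * c for c in mixed))
        return tuple(c / norm for c in mixed)
    omega = math.acos(dot)
    k0 = math.sin((1.0 - t) * omega) / math.sin(omega)
    k1 = math.sin(t * omega) / math.sin(omega)
    return tuple(k0 * a + k1 * b for a, b in zip(q0, q1))


def rotation_distance(R1, R2):
    """dist(R1, R2) = arccos((tr(R2 R1^T) - 1) / 2), in radians."""
    trace = sum(R2[i][j] * R1[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))   # guard rounding
    return math.acos(cos_angle)


q_identity = (1.0, 0.0, 0.0, 0.0)
q_z90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
q_half = slerp(q_identity, q_z90, 0.5)       # expected: 45 degrees about z

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
angle = rotation_distance(identity, R_z90)   # expected: pi / 2
```

Halfway between the identity and a 90 degree rotation, SLERP returns the 45 degree rotation about the same axis, which is the constant-rate behavior that plain linear interpolation lacks.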

3.11 Usage advice at the end of the slides

Representation	Use suggested by the slides
rotation matrices	define concepts
Euler angles	visualize rotations
angle-axis	visualize rotations and calculate derivatives
quaternion	write fast codes

Chapter formula summary

$SO(3)$：$\{R \in \mathbb{R}^{3\times 3}: RR^T=I, \det(R)=1\}$

Euler composition：$R = R_z(\gamma)R_y(\beta)R_x(\alpha)$

Rodrigues: $e^{[\hat{\omega}]\theta}=I+[\hat{\omega}]\sin\theta+[\hat{\omega}]^2(1-\cos\theta)$

Quaternion rotation: $x' = qxq^*$, where $x=(0,\vec{x})$

Rotation distance：$\mathrm{dist}(R_1,R_2)=\arccos\left(\frac{\mathrm{tr}(R_2R_1^T)-1}{2}\right)$

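Concept check · Lect03

Which of the following statements about rotation representations is wrong?

A. Euler angles are intuitive but suffer from gimbal lock

B. Quaternions are better suited to efficient code

C. The same rotation corresponds to two quaternions

D. A rotation matrix only needs 3 numbers, so it is more compact than a quaternion

Answer: D. A rotation matrix needs $9$ numbers, while a quaternion needs $4$ numbers plus a unit-norm constraint, so the quaternion is more compact.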

Lect03

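Why does the course cover rotation matrices as well as Euler angles, angle-axis, and quaternions?

Because each representation has different strengths and weaknesses. The summary at the end of the slides is: rotation matrices to define concepts, Euler angles to visualize rotations, angle-axis to visualize rotations and calculate derivatives, and quaternions to write fast code. In other words, the course does not want you to know only one representation; it wants you to know when to use which. (non-slide content)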

Chapter 4 · Lect04 Robotics III

4.1 The big picture of the robotics stack

Layer	Slide keywords	Core question
Goal	task / target	what to do
Motion Planning	collision-free search	what motion is collision-free and geometrically feasible?
Trajectory	path / waypoints	how to add timing to the path
Control	track & stabilize	how do we track that motion reliably in the real system?
Robot	execution	execution on the real system

4.2 Problem definition of motion planning

Question: from one robot pose to another robot pose, how does the robot move safely in the world?
Input: a start state $q_{start}$, a goal state $q_{goal}$, and the collision-free space $C_{free}$
Output: a feasible path connecting the start to the goal (not actions!)
Core: Motion planning is a search problem in a high-dimensional constrained space

4.3 Workspace vs Configuration space

Space	Slide description
Workspace	Real 3D world; obstacle, robot, shelf, objects
Configuration space	One point corresponds to one joint configuration

Planning is usually performed in configuration space
The robot is not a point. Its shape, joints, and limits must be considered.
Collision constraints depend on the full robot configuration
A straight line connecting the start to the goal may run into a collision

4.4 Collision checking

Is a configuration $q$ in $C_{free}$? Run a collision check!
Collision checking is called repeatedly inside planning algorithms
It needs to be very fast while maintaining accuracy
However, accurate collision checking is very slow
Visual mesh for rendering $\neq$ collision mesh for collision checking
The collision mesh of each link is defined in the URDF, usually as a geometric approximation
Convex-convex collision checking is usually very fast on CPU

4.5 Grid-based vs sample-based planning

Grid-based: discretize the whole configuration space, compute $C_{free}$, and run a search algorithm such as A*
Drawback: too slow for high-dimensional spaces
Sample-based algorithms: PRM, RRT, RRT-Connect, etc.

4.6 PRM / RRT / RRT-Connect / Shortcutting

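Method	Slide information
PRM	Probabilistic Roadmap; asymptotically optimal but requires massive sampling
RRT	Rapidly-exploring Random Tree
RRT-Connect	a classic algorithm for single-query path planning
Shortcutting	very useful post-processing heuristic; fast local optimization; improves jerky, unnatural paths

Study note (non-slide content)
What to remember from this part is not the pseudocode of each algorithm but three things: first, planning is essentially path finding in a high-dimensional constrained space; second, sample-based methods are better suited to high dimensions; third, the path output by a planner is usually not directly executable and still needs a trajectory and a controller.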

4.7 OMPL

OMPL = Open Motion Planning Library
Default planning library in ROS-MoveIt
Geometric planners: only consider geometric and kinematic constraints; assume any feasible path can be turned into a dynamically feasible trajectory
Control-based planners: if the system is subject to differential constraints, use state propagation rather than simple interpolation

4.8 From path to trajectory

Path (Geometry): tells us where to go
Trajectory (Time): tells us when to be there and how fast to move
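A path is written as $q(s)$, where $s\in[0,1]$
Time parameterization: $s=s(t) \Rightarrow q(t)=q(s(t))$
Real robots have joint velocity and acceleration limits, so a geometric path alone is not enough
Slide example: the Minimum Jerk Trajectory uses a $5$th-order polynomial, guarantees smooth starts and stops, and minimizes jerk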
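
A time parameterization $s(t)$ can be sketched as follows. The slides only state that a 5th-order polynomial is used; the coefficients $10, -15, 6$ below are the standard rest-to-rest minimum-jerk profile (zero velocity and acceleration at both ends), and the straight-line joint-space path is a further assumption for illustration.

```python
def min_jerk_s(t, duration):
    """Rest-to-rest quintic time scaling s(t) in [0, 1]."""
    tau = min(max(t / duration, 0.0), 1.0)     # normalized time, clamped
    return 10.0 * tau ** 3 - 15.0 * tau ** 4 + 6.0 * tau ** 5


def trajectory(q_start, q_goal, t, duration):
    """q(t) = q(s(t)) for a straight-line path q(s) = q_start + s (q_goal - q_start)."""
    s = min_jerk_s(t, duration)
    return [a + s * (b - a) for a, b in zip(q_start, q_goal)]


# Hypothetical 2-joint move lasting 2 seconds, sampled at the midpoint.
q_mid = trajectory([0.0, 0.0], [1.0, -0.5], t=1.0, duration=2.0)
```

The geometry of the path is unchanged; only the timing along it is reshaped so that the motion starts and stops smoothly.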

4.9 Control system

A control system regulates a system's behavior to follow a reference trajectory $x_{ref}(t), \dot{x}_{ref}(t)$, despite disturbances and uncertainties
Main components：Sensor, Controller, Environment/System, Actuator
Open loop (feedforward)：control signal based only on reference
Closed-loop (feedback)：controller uses the error to adjust the control signal
Advantages of closed-loop control: robust to model uncertainties, rejects disturbances and noise, accurately tracks reference signals

4.10 Error metrics

Tracking error: $x_e = x_{ref} - x$
Steady-state error：$e_{ss}$
Transient response metrics：rise time，settling time，overshoot
Goal: small tracking error, small $e_{ss}$, small settling time, minimal oscillation

4.11 P / PD / PID

Controller	Core slide wording
P	Serves as a virtual spring; $u = K_p x_e$
PD	D term is a velocity feedback term acting as virtual damping; accounts for future behavior / trend; attempts to reduce overshoot
PID	I term is an error accumulation term to eliminate steady-state offset; reacts to past behavior / history

P only: with small $K_p$, slow response / large tracking error; with large $K_p$, fast but oscillatory
PD: $K_p$ sets the strength of the action; $K_d$ sets the damping
PID: the I term can eliminate steady-state error

4.12 Why modern robots often prefer PD over pure PID

PD is often sufficient for tracking mechanical systems, simpler to tune, and usually more robust in real hardware
In many robots, the main challenge is fast and stable motion, not eliminating tiny steady-state errors
In contact-rich tasks, integral action may lead to unwanted force buildup
To reduce or eliminate steady-state error, modern robots often combine PD control with model-based compensation, gravity compensation, and feedforward torque

4.13 PD tuning

A second-order system is described by its natural frequency $\omega_n$ and damping ratio $\xi$
$\xi > 1$: overdamped; $\xi < 1$: underdamped; $\xi = 1$: critically damped
Conclusion from matching terms in the slides: a larger $\omega_n$ corresponds to a larger P gain; more damping corresponds to a larger D gain
This is usually used to heuristically design a good initial set of PD parameters, which still needs finetuning afterwards
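
The matching-terms idea can be tried on a toy plant. The sketch below assumes a single joint modeled as a mass with dynamics $m\ddot{x} = u$ and a constant reference; matching $\ddot{x} + (K_d/m)\dot{x} + (K_p/m)x$ against the standard second-order form then gives $K_p = m\omega_n^2$ and $K_d = 2m\xi\omega_n$. The plant model, the integration scheme, and the numbers are assumptions for illustration.

```python
def pd_gains(omega_n, xi, mass=1.0):
    """Gains from matching terms for the assumed plant m * x_ddot = u."""
    return mass * omega_n ** 2, 2.0 * mass * xi * omega_n


def simulate_pd(kp, kd, x_ref=1.0, mass=1.0, dt=1e-3, duration=3.0):
    """Step response under u = Kp * x_e + Kd * x_e_dot (semi-implicit Euler)."""
    x, v = 0.0, 0.0
    history = []
    for _ in range(int(duration / dt)):
        x_e = x_ref - x            # tracking error
        x_e_dot = 0.0 - v          # reference is constant, so its rate is 0
        u = kp * x_e + kd * x_e_dot
        v += (u / mass) * dt
        x += v * dt
        history.append(x)
    return history


critical = simulate_pd(*pd_gains(omega_n=5.0, xi=1.0))      # critically damped
underdamped = simulate_pd(*pd_gains(omega_n=5.0, xi=0.2))   # oscillatory
```

With the same $\omega_n$, the critically damped gains reach the reference without overshoot, while the low-damping gains overshoot noticeably, which matches the role of the D term as virtual damping.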

4.14 Rules of thumb for multi-DoF systems

Rule of thumb: apply the single-joint tuning method to each joint
In humanoids, ankle joints require more compliance
A higher D gain can reduce oscillation, but it also amplifies noise from velocity sensors
All PD tuning must respect hardware torque limits
Because robotic arms are modeled more accurately and operate in more controlled environments, model-based methods such as feedback linearization can also be used

Chapter formula summary

Path：$q(s), s\in[0,1]$

Time parameterization：$s=s(t) \Rightarrow q(t)=q(s(t))$

Tracking error：$x_e=x_{ref}-x$

P control：$u=K_p x_e$

PD control：$u=K_p x_e + K_d \dot{x}_e$

PID control：$u=K_p x_e + K_i \int x_e dt + K_d \dot{x}_e$

Standard second-order form：$\ddot{x}+2\xi\omega_n\dot{x}+\omega_n^2 x=0$

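Concept check · Lect04

Which of the following statements about path, trajectory, and control is correct?

A. The path answers where to go, the trajectory answers when to be there and how fast to move, and control is responsible for tracking

B. A trajectory contains only geometry, not time

C. Open-loop control rejects disturbances better than closed-loop control

D. The D term in PD is used to eliminate all steady-state error

Answer: A. In Lecture 4 the path is geometry, the trajectory is the path with timing, and control is responsible for tracking; closed-loop control is the one that rejects disturbances better, and it is the I term that eliminates steady-state offset.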

Lect04

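Which of the following most accurately describes the output of motion planning?

A. It directly outputs a sequence of joint torques

B. a feasible path connecting the start to the goal

C. value function

D. reward model

Answer: B. The slides explicitly stress that the output is a path, not actions.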

Chapter 5 · Lect05 Vision and Grasping I

5.1 From robotics to the grasping pipeline

Kinematics: conversion between Cartesian space and joint space
Motion Planning: get a geometrically valid path
Control: ensures the motion follows the planned trajectory
If we can provide where to move, a.k.a. end-effector pose in Cartesian space, this already forms a pipeline to grasp objects.

5.2 Grasping and grasp pose

Grasping: restraining an object's motion in a desired way by applying forces and torques at a set of contacts
Grasp Synthesis: a high-dimensional search or optimization problem to find gripper poses or joint configurations
Grasp Pose defines the position, orientation and articulation of a hand
4-DoF grasp: a 3D position and a 1D hand orientation aligned with the direction of gravity, a.k.a. top-down grasping
6-DoF grasp: a 3D position and a 3D orientation

5.3 Two routes for open-loop grasping

Route	Slide description
Path I	For known objects with labeled grasps: 6D object pose estimation $\to$ further get grasping pose from object pose $\to$ motion planning
Path II	For unknown and general objects: directly predict grasping pose $\to$ motion planning

5.4 6D object pose

Definition: the 6D transformation from object space to camera space
It consists of a 3DoF translation and a 3DoF rotation
Instance-level 6D pose estimation: a small set of known instances; pose defined according to their CAD model; the input can be RGB / RGBD; the output is the object pose(s)

5.5 PoseCNN

The slides use PoseCNN as the representative method for instance-level 6D pose estimation
Its rotation estimation slide states explicitly: Regress quaternion

5.6 ICP

ICP is a method for point cloud registration
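Given two point clouds, find $R$ and $T$ that align one point cloud to the other as closely as possible
Steps: center the data, find correspondences, solve the constrained orthogonal Procrustes problem, obtain the translation, iterate
Termination condition: the transformation changes by less than a threshold / the loss changes by less than a threshold / the maximum number of iterations is reached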
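
The loop above can be sketched in 2D with the standard library only. To avoid an SVD, this sketch replaces the 3D orthogonal Procrustes solve with its planar special case, where the optimal rotation angle has the closed form $\mathrm{atan2}(\sum s_i \times t_i, \sum s_i \cdot t_i)$ over centered pairs; the brute-force nearest-neighbor search, the point sets, and the function name are assumptions for illustration.

```python
import math


def icp_2d(source, target, max_iters=50, tol=1e-12):
    """Point-to-point ICP in the plane; returns (aligned_source, mean_sq_error)."""
    src = [tuple(p) for p in source]
    prev_err, err = float("inf"), float("inf")
    n = len(src)
    for _ in range(max_iters):
        # 1) Correspondences: nearest target point for each source point.
        matched = [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in src]
        # 2) Center both point sets.
        cs = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)
        ct = (sum(q[0] for q in matched) / n, sum(q[1] for q in matched) / n)
        # 3) Closed-form planar rotation (2D case of orthogonal Procrustes).
        sin_sum = cos_sum = 0.0
        for p, q in zip(src, matched):
            sx, sy = p[0] - cs[0], p[1] - cs[1]
            tx, ty = q[0] - ct[0], q[1] - ct[1]
            cos_sum += sx * tx + sy * ty
            sin_sum += sx * ty - sy * tx
        angle = math.atan2(sin_sum, cos_sum)
        c, s = math.cos(angle), math.sin(angle)
        # 4) Translation that maps the rotated source centroid onto the target centroid.
        t = (ct[0] - (c * cs[0] - s * cs[1]), ct[1] - (s * cs[0] + c * cs[1]))
        src = [(c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1]) for p in src]
        # 5) Terminate when the loss stops changing.
        err = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                  for p, q in zip(src, matched)) / n
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return src, err


# Hypothetical L-shaped source; target is the source rotated 10 degrees and shifted.
source_pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
a = math.radians(10.0)
target_pts = [(math.cos(a) * x - math.sin(a) * y + 0.2,
               math.sin(a) * x + math.cos(a) * y - 0.1) for x, y in source_pts]
aligned_pts, final_err = icp_2d(source_pts, target_pts)
```

The example uses a small initial misalignment on purpose: with a large one, the nearest-neighbor correspondences would be wrong and the iteration could settle in a poor local minimum, which is the dependence on initialization listed among the drawbacks of ICP.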

ICP advantages	ICP disadvantages
Simple; no need for segmentation or feature extraction; decent accuracy and convergence given a good initialization	finding correspondences is costly; only considers point-to-point distances; highly dependent on the accuracy of the initial estimate

5.7 The representation problem in rotation regression

A 3D rotation only has $3$ DoF, but a rotation matrix has $9$ entries, which makes it harder for a neural network to predict
Candidate representations: Euler angle, axis angle, quaternion
These representations often suffer from singularities and discontinuities
Euler angles are discontinuous; axis-angle has problems near $\theta=0$ and $\theta=\pi$; quaternions have double coverage

5.8 Continuous rotation representations

6D representation: simply eliminates the last column of the rotation matrix
Gram-Schmidt orthogonalization then maps the network output to $SO(3)$
9D representation: corresponds to the full rotation matrix; SVD orthogonalization then maps it to $SO(3)$
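
A minimal sketch of the 6D-to-$SO(3)$ mapping follows. The two input vectors stand for the raw (unnormalized) first two columns predicted by a network; the function names and example inputs are illustrative assumptions, and the SVD-based 9D variant is not shown.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rotation_from_6d(a1, a2):
    """Map two raw 3-vectors to a rotation matrix via Gram-Schmidt."""
    b1 = normalize(a1)
    proj = sum(x * y for x, y in zip(b1, a2))
    b2 = normalize([y - proj * x for x, y in zip(b1, a2)])   # remove b1 component
    b3 = cross(b1, b2)                                        # right-handed third column
    # Columns of R are b1, b2, b3.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


R_simple = rotation_from_6d([2.0, 0.0, 0.0], [1.0, 3.0, 0.0])    # expected: identity
R_generic = rotation_from_6d([1.0, 1.0, 0.0], [0.0, 1.0, 1.0])
```

Taking the third column as a cross product is what guarantees $\det(R)=+1$ rather than a reflection.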

5.9 FoundationPose

FoundationPose is a unified foundation model for 6D object pose estimation and tracking
It supports both model-based and model-free setups
Initialization: first detect the object, then initialize the translation from the 3D point corresponding to the bbox center and the median depth, then sample viewpoints from an icosphere to initialize the rotations
Pose-conditioned input crop: from the rendering of the coarse pose and the input crop, output a refined pose

5.10 Category-level 6D pose estimation and NOCS

Goal: estimate the 6D pose and 3D size of previously unseen objects from certain categories
The key is NOCS = Normalized Object Coordinate Space
Three normalization steps: rotation normalization, translation normalization, scale normalization
The category-level pose is the transformation from NOCS to camera space
From image to NOCS map to pose: predict the NOCS map, backproject it using depth, then do pose fitting with RANSAC

Chapter formula summary

6D object pose: 3DoF translation + 3DoF rotation

ICP objective: find $(R,T)$ that aligns the source point cloud to the target point cloud as closely as possible

Quaternion regression: predict the rotation as a quaternion

6D rotation representation: take the first two columns of the rotation matrix, then map to $SO(3)$ with Gram-Schmidt orthogonalization

Category-level pose: transformation from NOCS to camera space

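Concept check · Lect05

Which of the following statements about 6D pose estimation, ICP, and NOCS is correct?

A. ICP does not depend on the initial estimate, so it always converges reliably

B. NOCS only applies to instance-level pose estimation with known CAD models

C. In category-level pose estimation, NOCS serves as the canonical reference coordinate space

D. A 6D object pose contains only rotation, not translation

Answer: C. In Lecture 5 the category-level pose is the transformation from NOCS to camera space; ICP is sensitive to initialization; a 6D object pose contains both translation and rotation.

Study note (non-slide content)
Lecture 5 can be summarized in one sentence: first decide where to grasp, then decide with what pose the hand should grasp; 6D object pose estimation is the core intermediate variable in the known-objects route.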

Chapter 6 · Lect06 Vision and Grasping II

6.1 MuJoCo

MuJoCo = Multi-Joint dynamics with Contact
general-purpose physics engine designed for fast and accurate simulation of articulated structures interacting with their environments

6.2 Two formulations of grasp models

Formulation	Slide description
Grasp detection	detect multiple grasp poses from observations
Conditional grasp generation	condition: observation; output: grasp poses

The slides note specifically: Due to multi-modal grasp distribution, formulate grasping as a detection problem.

6.3 Visual input representation

Representation	Slide keywords	Characteristics
Voxel Grids	VGN	Explicit geometry; limited by volume resolution
Point Cloud	PointNet / PointNet++ / GraspNet-baseline	Explicit geometry; less memory cost; higher resolution

6.4 Grasp evaluation metrics

Success Rate：the ratio of successful grasp executions
Percent cleared：the percentage of objects removed during each round
Planning time：the time between receiving input and returning grasps

6.5 GS-Net

GS-Net treats grasp pose detection in the wild as a two-stage problem
Where stage: locations with high graspability
How stage: decide the grasp parameters, e.g. in-plane rotation, approaching depth, gripper width, grasp score
graspness: a novel geometrically based quality for distinguishing graspable areas in cluttered scenes
graspness can be divided into point-wise and view-wise graspness scores
The computation of “success rate” relies on force closure and collision checking

6.6 DexGraspNet 2.0

DexGraspNet 2.0: Diffusion-based Dexterous Grasp Generation
Stage 1: Where to grasp; predict graspness and objectness scores
Stage 2: How to grasp; predict the residual position, rotation, and finger joint angles
To handle the multi-modal grasping pose distribution, the slides state that a diffusion model is used
DexGraspNet 2.0 contains 7600 training scenes with 426 million grasps

6.7 Force closure and form closure

Force Closure: the grasp can apply forces through a set of frictional contacts that are sufficient to compensate any external wrench applied to the object
Physical statement: the positive span of the wrench cones is the entire wrench space
Form closure: the rigid body is fully immobilized by rigid stationary fixtures
When planning a grasp by a robot hand, force closure is a good minimum requirement
Form closure is usually too strict, requiring too many contacts
Relation given in the slides: successful grasp $\le$ force closure $\le$ form closure, reading $\le$ as “is a less strict requirement than”

6.8 Datasets

Type	Dataset	Slide notes
Object dataset	ShapeNet / ModelNet / Objaverse-XL	Objaverse-XL: 10M+ 3D Objects
Synthetic grasp dataset	ACRONYM	with grasping annotation
Real grasp dataset	GraspNet-1Billion	with grasping annotation

Annotation pipeline of GraspNet-1Billion: sample grasp points from the point cloud, then sample and evaluate grasp view / in-plane rotation / gripper depth, and finally project the grasps into the scene using the object 6D pose and run collision detection.

6.9 Why hand-eye calibration is needed

Grasp poses are initially estimated in camera space
For the robot to execute them, the poses must be transformed to the robot space
We therefore need the transformation between the camera and the robot, i.e. hand-eye calibration
Definition: determining the precise geometric relationship (transformation matrix) between a robot's coordinate system and a camera's coordinate system

6.10 Camera model and calibration

Agenda: camera model: intrinsics and extrinsics
Goal of calibration: estimate camera intrinsics and extrinsics from one or multiple images
If the intrinsics $K$ are known or already calibrated, the pose can be recovered as a further step
Perspective-n-Point (PnP) is the camera calibration approach given in the slides

6.11 Eye-in-hand vs Eye-to-hand

Setup	Mounting position	Relationship to solve for
Eye-in-hand	camera mounted on the robot	camera coordinate $\to$ end-effector coordinate
Eye-to-hand	camera mounted stationary next to the robot	camera coordinate $\to$ robot base coordinate

6.12 AX = XB

Core equation of hand-eye calibration: $AX = XB$
In the eye-in-hand workflow:
$A$: end-effector pose change
$B$: camera pose change (computed from the marker pose)
$X$: end-effector to camera

6.13 Eye-in-hand workflow

Rigidly attach the camera to the robot's end-effector
Fix a calibration target on a flat surface within workspace
Move the robot to 10-30 different poses where the camera clearly sees the board
At each pose, record the end-effector pose and compute the target-to-camera transformation with solvePnP
Best practices: vary the orientation significantly; cover different heights and tilt angles; avoid singular configurations; the board must be fully visible

6.14 Validation

Reprojection Error: a low pixel error (usually $<1$ pixel) indicates a successful calibration
TCP Touch Test: if the robot accurately touches the point, the calibration is physically verified
Validation for eye-to-hand includes the hand-to-pixel test: the virtual dot overlays perfectly with the physical gripper
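
The reprojection error check can be sketched with an ideal pinhole model. The sketch assumes the 3D points have already been expressed in the camera frame using the estimated extrinsics and that there is no lens distortion; the intrinsics values and function names are hypothetical.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Ideal pinhole projection of a camera-frame point to pixel coordinates."""
    X, Y, Z = point_cam
    return fx * X / Z + cx, fy * Y / Z + cy


def mean_reprojection_error(points_cam, pixels, fx, fy, cx, cy):
    """Mean pixel distance between projected points and observed pixels."""
    total = 0.0
    for point, (u_obs, v_obs) in zip(points_cam, pixels):
        u, v = project(point, fx, fy, cx, cy)
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(points_cam)


# Hypothetical intrinsics and two board corners with slightly noisy detections.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
corners_cam = [(0.1, 0.0, 2.0), (0.0, 0.0, 1.0)]
detected_px = [(350.5, 240.0), (320.0, 240.0)]
error_px = mean_reprojection_error(corners_cam, detected_px, fx, fy, cx, cy)
```

A value well below one pixel, as in this toy case, is what the reprojection criterion above treats as a successful calibration.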

6.15 Depth sensing problem

Lecture 6 ends by using examples of transparent / specular objects to illustrate the depth sensing problem.

Chapter formula summary

Force closure: positive span of the wrench cones = entire wrench space

Grasp quality relation: successful grasp $\le$ force closure $\le$ form closure

Hand-eye equation: $AX = XB$

Camera model: intrinsics + extrinsics

PnP setting: recover the pose when the intrinsics and the correspondences are known

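Concept check · Lect06

Which of the following statements about force closure and hand-eye calibration is correct?

A. Form closure is usually less strict than force closure

B. The role of hand-eye calibration is to establish the precise geometric relationship between the camera coordinate system and the robot coordinate system

C. Eye-to-hand means the camera is mounted on the end-effector

D. AX = XB is used to compute the reward function

Answer: B. Lecture 6 defines hand-eye calibration explicitly; form closure is stricter than force closure; eye-in-hand is the setup with the camera mounted on the robot.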

Lect06

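Why is hand-eye calibration required for grasping?

Because the vision module estimates objects and grasp poses in camera space, while the robot needs poses in robot space, or in frames tied to the base / end-effector, to execute actions. Without the precise geometric relationship between the camera and the robot, the vision output cannot be mapped reliably to executable robot actions. (non-slide content)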

Chapter 7 · Lect07 Policy I

7.1 Basic definition of a policy

A policy is an end-to-end mapping: state $\to$ action
Stochastic：$a \sim \pi(a|s)$
Deterministic：$a = \pi(s)$

7.2 State

In Embodied AI, state $s_t$ at time step $t$ includes a complete description of the environment
Contains all information needed to predict the future
Usually not fully accessible
Under true state, the system is Markovian
Markov property：$P(s_{t+1}|s_t,a_t,s_{t-1},a_{t-1},\ldots)=P(s_{t+1}|s_t,a_t)$

7.3 Dynamics model and world model

$P(s_{t+1}|s_t,a_t)$ is called the dynamics model or transition model
It describes how the world evolves: predict the next state given the current state and action
People sometimes also call it a world model
A world model is usually learned, and may additionally come with a reward model $r(s,a)$ and an observation model $P(o|s)$

7.4 MDP

MDP: a framework for sequential decision making under uncertainty
At each time step: observe the current state $s_t$, take action $a_t\sim\pi(a_t|s_t)$, and the environment transitions to $s_{t+1}\sim p(s_{t+1}|s_t,a_t)$
Markov property: the next state depends only on the current state and action

7.5 Observation

An observation is what the agent actually receives from its sensors
It includes exteroceptive information: vision, depth, tactile sensing, audio
It also includes proprioceptive information: joint angles, velocities, torques, motor states
Observations are typically partial, noisy and ambiguous
The same observation may correspond to different underlying states
Observations are often non-Markov, while states are defined to be Markov

7.6 State-based vs observation-based policy

In practice, policies usually operate on observations rather than true states
General definition: a realistic robotic policy can be $a\sim\pi(a|o,l)$
where $o$ is the observation and $l$ is the language or task instruction
A state-based policy $a\sim\pi(a|s)$ usually exists in a simulator, or when the state is estimated by other algorithms

7.7 IL vs RL

Method	Premise	Key idea
Imitation Learning	access to an expert	learn to mimic expert behavior directly from data
Reinforcement Learning	no expert available	learn what to do by evaluating the consequences of actions

7.8 Behavior Cloning

BC treats policy learning as supervised learning
input: observation (or state)
output: action
For a deterministic policy, the MSE loss is usually adopted
Minimizing the MSE is equivalent to maximum likelihood under a Gaussian policy (with fixed covariance)
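
The equivalence can be written out in one line. In this sketch $\mu_\theta(o)$ denotes the network's predicted action, $\sigma^2 I$ is the fixed covariance, and $d$ is the action dimension; these symbols are notation introduced here for the derivation, not taken from the slides.

```latex
\pi_\theta(a \mid o) = \mathcal{N}\!\left(a;\ \mu_\theta(o),\ \sigma^2 I\right)
\quad\Longrightarrow\quad
-\log \pi_\theta(a \mid o)
  = \frac{1}{2\sigma^2}\,\lVert a - \mu_\theta(o) \rVert^2
  + \frac{d}{2}\log\!\left(2\pi\sigma^2\right)
```

Since $\sigma$ is fixed, the second term is a constant, so maximizing $\sum_t \log \pi_\theta(a_t \mid o_t)$ over the dataset is the same as minimizing $\sum_t \lVert a_t - \mu_\theta(o_t) \rVert^2$, which is the MSE up to a scale factor.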

7.9 The core problem of BC: distribution drift / mismatch

Training data comes from expert trajectories and only covers states visited by the expert; test-time data comes from the states visited by the learned policy.
The result is: small mistake $\to$ new unseen state $\to$ larger mistake $\to$ OOD state, and in the end errors accumulate over time.

7.10 Data collection in Embodied AI

ALOHA-style master-slave teleoperation: a human operator controls a master arm that is kinematically coupled to a slave robot arm
This records observation $o_t$ and action $a_t$, forming a teleoperation dataset $\mathcal{D} = \{(o_t,a_t)\}_{t=1}^T$
Observations may include visual observations and proprioception
Actions may be joint-space targets, task-space poses, or, very commonly, an increment $\Delta x_t$

7.11 DAgger and HG-DAgger

Problem with the original DAgger: for a $6$-DoF/$7$-DoF robotic arm, or even a humanoid, manually labeling an action for every state is very difficult
HG-DAgger: instead of labeling actions offline, the human intervenes during execution
Procedure: run the policy; a human monitors execution; when the policy makes a mistake, the human takes over; record the corrected data; aggregate and retrain
Advantages: avoids manual action labeling; labels only when necessary; more data-efficient and practical
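
The HG-DAgger procedure above can be sketched as a loop on a toy 1D system. Everything concrete here is an assumption made for illustration, not the lecture's setup: the scalar state `x`, the linear policy `a = w * x`, the stand-in "human" `expert_action`, the intervention rule `abs(x) > threshold`, and the noise level.

```python
import random

def expert_action(x):
    # Hypothetical human expert: push the 1D state back toward 0.
    return -0.5 * x

def fit_linear_policy(dataset):
    # Least-squares fit of a = w * x on the aggregated (state, action) pairs.
    sxx = sum(x * x for x, _ in dataset)
    sxa = sum(x * a for x, a in dataset)
    return sxa / sxx if sxx > 0 else 0.0

def hg_dagger(rounds=5, horizon=30, threshold=1.0, seed=0):
    rng = random.Random(seed)
    dataset, w = [], 0.0
    for _ in range(rounds):
        x = rng.uniform(-0.5, 0.5)
        for _ in range(horizon):              # run the policy, human monitors
            if abs(x) > threshold:            # policy has drifted too far
                a = expert_action(x)          # human takes over
                dataset.append((x, a))        # record only the corrected data
            else:
                a = w * x                     # policy stays in control
            x = x + a + rng.gauss(0.0, 0.3)   # assumed toy dynamics
        w = fit_linear_policy(dataset)        # aggregate and retrain
    return w, len(dataset)
```

Labels are collected only during interventions, so the dataset stays much smaller than `rounds * horizon` once the policy stops drifting.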

7.12 On-policy distillation

Key idea: replace human labeling with a teacher policy
Run the student policy and collect the states visited by the student
Query the teacher policy for labels
Train the student for one gradient step (no more than one, to stay purely on-policy)
Examples of teacher policies: a privileged state-based policy, or a motion planner that directly provides action labels
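
The four steps above can be sketched on the same kind of toy 1D system. The linear student `a = w * x`, the stand-in `teacher`, the dynamics, and the learning rate are all illustrative assumptions rather than anything specified in the lecture.

```python
import random

def teacher(x):
    # Hypothetical privileged teacher policy.
    return -0.5 * x

def distill(iterations=2000, horizon=20, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = 0.0                                   # student policy: a = w * x
    for _ in range(iterations):
        # 1) Run the student and collect the states it visits.
        x, states = rng.uniform(-1.0, 1.0), []
        for _ in range(horizon):
            states.append(x)
            x = x + w * x + rng.gauss(0.0, 0.1)
        # 2) Query the teacher for labels on those states.
        labels = [teacher(s) for s in states]
        # 3) Take exactly one gradient step on the mean squared error,
        #    then discard the batch so the next data is on-policy again.
        grad = sum(2.0 * (w * s - y) * s for s, y in zip(states, labels)) / len(states)
        w -= lr * grad
    return w
```

Because each batch is used for a single update, every gradient is computed on states drawn from the current student, which is the point of the "no more than one step" rule.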

Chapter formula summary

Stochastic policy: $a \sim \pi(a|s)$

Deterministic policy: $a = \pi(s)$

Markov property: $P(s_{t+1}|s_t,a_t,s_{t-1},a_{t-1},\ldots)=P(s_{t+1}|s_t,a_t)$

Observation-based policy: $\pi_\theta(a_t|o_t)$

General robotic policy: $a \sim \pi(a|o,l)$

Teleoperation dataset: $\mathcal{D}=\{(o_t,a_t)\}_{t=1}^T$

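Concept check · Lect07

Which of the following statements about state, observation and policy is correct?

A. Observations are often partial, noisy and ambiguous, whereas the true state is Markov by definition

B. An observation is always more complete than the state

C. A robot policy can only be written as $\pi(a|s)$ and cannot depend on observations

D. In BC, the training and test state distributions are inherently identical

Answer: A. Lecture 7 explicitly distinguishes state from observation; in practice policies usually act on observations, and the key problem of BC is precisely distribution mismatch.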

Lect07

What is the most typical problem of Behavior Cloning in robotics?

A. The reward is too sparse

B. Distribution mismatch / error compounding

C. It cannot handle deterministic policies

D. It only works for state-based policies, not observation-based policies

Answer: B. The lecture calls it distribution drift / distribution mismatch and explains: small mistake $\to$ new unseen state $\to$ larger mistake $\to$ OOD state, so errors accumulate over time.

Chapter 8 · Lect08 Policy II

8.1 Basic definition of RL

RL = Learning a policy by interacting with an environment
Objective: maximize cumulative reward
No need for expert demonstrations

8.2 Reward function

Reward function $r(s,a) \in \mathbb{R}$ is a scalar signal that evaluates the quality of an action in a state and provides immediate feedback from the environment.

Key property: local and myopic; it only reflects the instant outcome
Can be sparse or dense
Sparse: reward only at success/failure; the reward may be delayed and requires credit assignment
Dense: frequent feedback that shapes behavior

8.3 POMDP

Lecture 8 introduces the POMDP (Partially Observed Markov Decision Process) on top of the MDP
For a visual policy, $s_t$ is often unavailable; only $o_t$ can be observed

8.4 Online RL, on-policy, off-policy, offline policy learning

Concept	Lecture wording
Online RL	allows interaction with the environment while doing RL
On-Policy RL	train a policy using experiences collected from the most recent policy
Off-Policy RL	use data collected throughout training and stored in buffer $D$; more sample efficient
Behavior cloning	learning a policy via imitating expert demonstration; no need of reward; not RL
Offline RL	collect data from any policy, store in $D$, then no further interaction; need reward; it is one type of RL

8.5 Monte Carlo approximation

Sample $N$ trajectories and approximate an expectation using sample averages
Replace intractable expectation with empirical mean
This is an application of the Law of Large Numbers
It is an unbiased estimator of the true expectation
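
The sample-average idea can be checked on a case where the true expectation is known in closed form. The reward process below (each step pays 1 with probability `p`) is invented for illustration, so the exact answer is `horizon * p`.

```python
import random

def sample_return(horizon, p, rng):
    # One sampled trajectory: each step yields reward 1 with probability p, else 0.
    return sum(1.0 for _ in range(horizon) if rng.random() < p)

def monte_carlo_expected_return(n_trajectories, horizon=10, p=0.3, seed=0):
    # Replace the expectation over trajectories with an empirical mean.
    rng = random.Random(seed)
    total = sum(sample_return(horizon, p, rng) for _ in range(n_trajectories))
    return total / n_trajectories
```

With `horizon=10` and `p=0.3` the true expected return is 3.0; the estimate approaches it as `n_trajectories` grows, while any single small batch can be noticeably off.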

8.6 Properties of REINFORCE

Given a regular size of samples, policy gradient from REINFORCE is very noisy
high variance
still unbiased

8.7 Policy gradient in POMDP

For visual policy, replace $s_t$ by $o_t$
Lecture's wording: We can use policy gradient in POMDPs by simply modifying $s_t \to o_t$.
The policy becomes $\pi_\theta(a_t|o_t)$

8.8 Causality and reward-to-go

Causality: actions only affect future rewards, not past ones.
Therefore, for each time step $t$, the reward-to-go can replace the total return of the whole episode, which reduces variance.
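
Computing the reward-to-go for every step of one trajectory is a reverse cumulative sum. A minimal sketch (the function name `reward_to_go` is chosen here for illustration):

```python
def reward_to_go(rewards):
    # G_t = r_t + r_{t+1} + ... + r_T, computed from the end of the episode backward.
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        out[t] = running
    return out
```

For rewards `[1.0, 0.0, 2.0]` this gives `[3.0, 2.0, 2.0]`: the first action is credited with everything, the last action only with the final reward.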

8.9 Baseline

Lecture conclusion: subtracting a baseline is unbiased in expectation
The average reward is not the best baseline, but it is pretty good
The main purpose of introducing a baseline is to reduce variance
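
Why the baseline leaves the gradient unbiased can be seen from a short standard argument. It is written here for a baseline $b$ that does not depend on the action (a constant, or a function of $s$ only); the notation is chosen for illustration.

```latex
\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}
  \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, b \right]
= b \int \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)\, da
= b \int \nabla_\theta \pi_\theta(a \mid s)\, da
= b\, \nabla_\theta \int \pi_\theta(a \mid s)\, da
= b\, \nabla_\theta 1
= 0
```

The subtracted term has zero mean, so the expected gradient is unchanged; only the spread of the individual samples around that mean is affected.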

8.10 Actor-critic

Component	Role
Actor	the policy
Critic	value function

Actor-critic uses the critic to estimate the return, thereby reducing the variance of the policy gradient
Batch actor-critic: trajectory-based gradient evaluation
Online actor-critic: transition-based gradient evaluation

8.11 Discount factor $\gamma$

higher $\gamma$ means considering a longer future
smaller $\gamma$ focuses more on immediate rewards and transitions
The lecture later also emphasizes: discount = variance reduction
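
The effect of $\gamma$ is easy to see numerically. The sketch below uses the common convention $G_t=\sum_{t'\ge t}\gamma^{t'-t} r_{t'}$; that exact form is an assumption here, since this page does not reproduce the lecture's discounted formula.

```python
def discounted_reward_to_go(rewards, gamma):
    # Backward recursion: G_t = r_t + gamma * G_{t+1}.
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out
```

With rewards `[1.0, 1.0, 1.0]`, `gamma=1.0` gives `[3.0, 2.0, 1.0]` while `gamma=0.5` gives `[1.75, 1.5, 1.0]`: a smaller $\gamma$ shrinks the weight of distant (and noisier) rewards.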

8.12 N-step returns and GAE

n-step return: a single parameter knob $(n)$ that balances bias and variance by deciding how long you trust the real trajectory before bootstrapping
GAE: a weighted combination of n-step advantages / n-step returns
Lecture's intuition: mostly prefer cutting earlier (less variance), weighting with an exponential falloff
Typical choices in on-policy methods: $\gamma \approx 0.99, \lambda \approx 0.95$
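
A minimal sketch of GAE in its widely used recursive form, $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ and $A_t = \delta_t + \gamma\lambda A_{t+1}$. This form is the standard one from the GAE literature rather than something quoted from the lecture slides, and the function and argument names are illustrative.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # values has len(rewards) + 1 entries; the last one is the bootstrap value
    # of the state after the final step (0.0 if the episode terminated).
    assert len(values) == len(rewards) + 1
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0.0` recovers the one-step advantage (lowest variance, most reliance on the critic), and `lam=1.0` recovers reward-to-go minus the value baseline (no bootstrapping bias, highest variance).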

8.13 The lecture's closing RL summary

Actor-critic algorithms: reduce variance of policy gradient
Policy evaluation: fitting value function to policy
Discount factors: both a temporal horizon and a variance reduction trick
Actor-critic design: one network with two heads or two networks; batch-mode or online (+ parallel)
State-dependent baselines: another way to use the critic; can be combined with n-step returns or GAE

Chapter formula summary

Reward function: $r(s,a) \in \mathbb{R}$

Policy in POMDP: $\pi_\theta(a_t|o_t)$

Reward-to-go: $G_t = \sum_{t'=t}^{T} r_{t'}$

Q-function: the expected reward-to-go given $(s_t,a_t)$, i.e. $Q(s_t,a_t)=\mathbb{E}[G_t \mid s_t,a_t]$

State-based baseline: $V(s_t)$

Typical GAE hyperparameters: $\gamma \approx 0.99, \lambda \approx 0.95$

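Concept check · Lect08

Which of the following statements about reward-to-go, baseline and actor-critic is correct?

A. Subtracting a baseline necessarily introduces bias, so it cannot be used

B. The purpose of reward-to-go is to also remove future rewards

C. Actor-critic uses the critic to estimate value / return, mainly to reduce the variance of the policy gradient

D. The smaller $\gamma$ is, the further ahead the agent looks

Answer: C. Lecture 8 emphasizes that a baseline introduces no bias in expectation, that reward-to-go uses causality and keeps future rewards, and that one core role of actor-critic is to reduce variance.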

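Worked long-answer question · Why is reward-to-go justified?

In Lect08 the lecture uses causality to argue that actions only affect future rewards, not past ones. Explain why this allows us to replace the total return at each time step with the reward-to-go.

The key point is not that the rewritten form is "more accurate", but that it "remains unbiased while having lower variance".

Following the lecture's reasoning, split the total return into two parts: past rewards and future rewards. For the policy gradient term at time step $t$, the current action $a_t$ cannot affect past rewards that have already occurred, so the past part contributes $0$ in expectation.

So the only part truly related to the current action is the return from $t$ onward, i.e. the reward-to-go.

The benefit of this step: it removes past reward terms that are unrelated to the current action but add noise, so the variance of the estimate decreases. (Non-lecture content)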
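
The "past part contributes $0$" step can be made explicit. The following is a sketch of the standard argument in the state-based notation used above, with $N$ sampled trajectories indexed by $i$ (an indexing introduced here for illustration). For any past time $t' < t$, the reward $r_{t'}$ is already fixed once the history up to $s_t$ is given, so it factors out of the inner expectation over $a_t$:

```latex
\mathbb{E}\!\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, r_{t'} \right]
= \mathbb{E}_{s_{1:t},\, a_{1:t-1}}\!\Big[ r_{t'}\,
    \underbrace{\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}
      \left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]}_{=\,0} \Big]
= 0, \qquad t' < t
\\[6pt]
\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T}
  \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t})
  \left( \sum_{t'=t}^{T} r_{i,t'} \right)
```

The inner expectation vanishes because $\int \nabla_\theta \pi_\theta(a \mid s_t)\, da = \nabla_\theta 1 = 0$. Dropping the zero-mean past terms leaves the expectation unchanged and removes their contribution to the variance.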

Chapter 11 · Interactive self-test

Introduction to Embodied AI: midterm self-test


Chapter 12 · Pre-exam quick reference

Lecture 1-4: Robotics fundamentals

Rigid transform: $p^s = R_{s\to b} p^b + t_{s\to b}$

Homogeneous transform: $T = \begin{bmatrix}R & t \\ 0 & 1\end{bmatrix}$

Composition: $T_{3\to1}=T_{3\to2}T_{2\to1}$

Inverse: $T_{2\to1}=(T_{1\to2})^{-1}$

Forward kinematics: $T_{s\to e}=f(\theta)$

Rodrigues: $e^{[\omega]\theta}=I+[\omega]\sin\theta+[\omega]^2(1-\cos\theta)$

Rotation distance: $\mathrm{dist}(R_1,R_2)=\arccos\left(\frac{\mathrm{tr}(R_2R_1^T)-1}{2}\right)$
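
The Rodrigues formula and the rotation distance can be checked numerically with a few lines of plain Python. This is a sketch that assumes $\omega$ is a unit vector and represents matrices as nested lists; the helper names `skew`, `matmul`, `rodrigues`, and `rotation_distance` are illustrative.

```python
import math

def skew(w):
    # [w]: the skew-symmetric matrix of a 3-vector.
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rodrigues(w, theta):
    # R = I + [w] sin(theta) + [w]^2 (1 - cos(theta)), for unit-length w.
    K = skew(w)
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]

def rotation_distance(R1, R2):
    # tr(R2 R1^T) equals the sum of elementwise products of R2 and R1.
    tr = sum(R2[i][j] * R1[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0)))
```

For example, `rodrigues((0.0, 0.0, 1.0), math.pi / 2)` is a quarter turn about the z-axis, and its distance to the identity rotation is $\pi/2$.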

Lecture 4: Control

Tracking error: $x_e = x_{ref} - x$

P control: $u = K_p x_e$

PD control: $u = K_p x_e + K_d \dot{x}_e$

PID control: $u = K_p x_e + K_i \int x_e dt + K_d \dot{x}_e$

$K_p$: increases speed and reduces steady-state error, but may increase overshoot

$K_d$: acts as damping / a brake; reduces overshoot and shortens settling time
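
The roles of $K_p$ and $K_d$ can be seen in a tiny simulation. The plant here is an assumption made for illustration and not taken from the lecture: a unit point mass with $\ddot{x} = u$, a constant reference, and semi-implicit Euler integration.

```python
def simulate_pd(kp, kd, x_ref=1.0, dt=0.001, steps=5000):
    # Assumed plant: unit point mass, x_ddot = u (illustrative only).
    x, v = 0.0, 0.0
    peak = x
    for _ in range(steps):
        x_e = x_ref - x              # tracking error
        x_e_dot = -v                 # x_ref is constant, so d(x_e)/dt = -dx/dt
        u = kp * x_e + kd * x_e_dot  # PD control law
        v += u * dt                  # semi-implicit Euler: velocity first
        x += v * dt                  # then position with the new velocity
        peak = max(peak, x)
    return x, peak
```

With `kp=25.0, kd=10.0` the state settles at the reference with essentially no overshoot, whereas `kp=25.0, kd=0.0` (pure P control) keeps oscillating around it on this undamped plant, which illustrates the "damping / brake" role of $K_d$.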

Lecture 5-6: Vision and grasping

4-DoF grasp: 3D position + 1D hand orientation aligned with gravity

6-DoF grasp: 3D position + 3D orientation

6D object pose: the 6D transformation from object space to camera space

ICP: point cloud registration; depends on good initialization

Force closure: a good minimum requirement for grasp planning

Hand-eye calibration: find the precise geometric relationship between the camera and robot coordinate systems

Hand-eye equation: $AX = XB$

Lecture 7-8: Policy learning

Policy: state / observation $\to$ action

Markov property: $P(s_{t+1}|s_t,a_t,\ldots)=P(s_{t+1}|s_t,a_t)$

Observation: partial, noisy, ambiguous; often non-Markov

BC problem: distribution mismatch / error compounding

Reward: scalar immediate feedback; can be sparse or dense

REINFORCE: high variance, still unbiased

Reward-to-go: follows from causality; reduces variance

Baseline: subtracting a baseline is unbiased in expectation

Actor-critic: actor = policy, critic = value function

Common GAE parameters: $\gamma \approx 0.99, \lambda \approx 0.95$

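Final checklist (non-lecture content)
1. Can you explain, without notes, the differences between state, observation, action, reward and policy?
2. Can you write down the homogeneous transform, Rodrigues formula, $AX=XB$, and P/PD/PID?
3. Can you explain why BC drifts, why reward-to-go reduces variance, and why actor-critic is more stable than pure Monte Carlo?
4. Can you explain why hand-eye calibration is required between "vision output in camera space" and "robot execution in robot space"?
5. Can you walk through the whole course out loud: Embodiment $\to$ robotics basics $\to$ grasping $\to$ policy learning?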
