YOLO11 + Depth Camera 6DoF Pose Estimation

Combine YOLO11 detection with an Orbbec DaBai DCW2 depth camera to recover the full 6DoF pose of an object, demonstrated on a GB/T 20234.3 DC charging port.

Python 3.10 · Ultralytics YOLO11 · pyorbbecsdk · OpenCV · NumPy · RANSAC

Overview

Goal: recover the complete 6DoF pose (position + orientation) of an object from YOLO11 detections combined with a depth camera.

This tutorial uses a GB/T 20234.3 DC charging port as the example object, with three detection classes: charging_port / DC_hole / PE.

Final output: T_cam2port, a 4×4 transformation matrix encoding the full position and orientation of the charging port in the camera coordinate frame.

YOLO11 detection → depth sampling → back-projected 3D points → RANSAC surface normal → T_cam2port

Why 6DoF Instead of Just Distance

YOLO plus a single depth reading only tells you how far away the object is: a scalar Z distance. A plug-insertion task requires all six degrees of freedom: X/Y/Z translation plus roll/pitch/yaw orientation.

| DOF | Direction | Recoverable from YOLO + single depth? |
|---|---|---|
| X | Left / right translation | Yes |
| Y | Up / down translation | Yes |
| Z | Forward distance (depth) | Yes |
| Roll | Rotation about Z | No |
| Pitch | Rotation about X | No |
| Yaw | Rotation about Y | No |
Analogy: you know there is a socket 1.2 m ahead, but without knowing which direction it faces, the robot arm cannot insert the plug precisely. Full rotation information is required to align the connector with the socket.

Pipeline Overview

The method in three steps:

  1. YOLO detects the charging_port bbox → extract the depth point cloud in that region → RANSAC plane fitting → surface normal (Z-axis direction)
  2. bbox center pixels of the two DC_hole detections and PE, plus their depths → back-project via camera intrinsics → three 3D keypoints
  3. The DC+ → DC- vector defines the X-axis; the DC midpoint → PE direction defines the Y-axis; the surface normal validates the Z-axis → full rotation matrix R

Data-flow diagram:

charging_port bbox → RANSAC → Z-axis (surface normal)
DC_hole + PE bboxes → back-projected 3D points → X-axis + Y-axis
X + Y + Z axes → rotation matrix R → T_cam2port

Prerequisites

  • YOLO11 custom model training completed (see training tutorial LEARN — 002)
  • pyorbbecsdk installed and the Orbbec DaBai DCW2 working
  • Detection classes include charging_port, DC_hole, PE
  • Python environment with ultralytics, opencv-python, numpy installed
D2C alignment: the Orbbec DaBai DCW2 supports hardware depth-to-color alignment (HW D2C). Once enabled, depth and RGB pixels correspond one-to-one, so the same rgb_intrinsic parameters can be used for backprojection of both.

Getting Camera Intrinsics

Camera intrinsics (fx, fy, cx, cy) are required to back-project pixel coordinates into 3D space. Call this after pipeline.start(config):

```python
# Call after pipeline.start(config)
cam_param = pipeline.get_camera_param()
fx = cam_param.rgb_intrinsic.fx
fy = cam_param.rgb_intrinsic.fy
cx = cam_param.rgb_intrinsic.cx
cy = cam_param.rgb_intrinsic.cy
# After HW D2C alignment, depth and color share one frame; use rgb_intrinsic for both
```
| Parameter | Meaning |
|---|---|
| fx | Focal length in X (pixels) |
| fy | Focal length in Y (pixels) |
| cx | Principal point X |
| cy | Principal point Y |
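To see what the intrinsics do, here is a round trip with hypothetical example values (real values come from pipeline.get_camera_param()): projecting a camera-frame 3D point to a pixel, then back-projecting it with the same depth, recovers the original point.

```python
import numpy as np

# Example values only (hypothetical); real ones come from the camera
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0

# Project a 3D camera-frame point (meters) onto the image plane
p = np.array([0.05, -0.02, 0.5])
u = p[0] / p[2] * fx + cx
v = p[1] / p[2] * fy + cy

# Back-projection (the inverse mapping) recovers the point given its depth
z = p[2]
p_back = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
print(u, v, p_back)  # 380.0 216.0 [ 0.05 -0.02  0.5 ]
```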

Keypoint Physical Coordinates

These are pre-measured physical dimensions of the charging port, not runtime measurements. The origin is the charging port center; units are meters. Source: GB/T 20234.3-2023.

```python
# GB/T 20234.3-2023 vehicle inlet hole coordinates (unit: meters)
# DC+/DC- horizontal spacing 34 mm, i.e. ±17 mm from center; 1.2 mm below center vertically
# PE is 21 mm directly below the center, horizontally centered
OBJECT_POINTS = np.array([
    [-0.017, -0.0012, 0],  # DC+ (left), -17 mm from center
    [ 0.017, -0.0012, 0],  # DC- (right), +17 mm from center
    [ 0.000, -0.021,  0],  # PE, 21 mm directly below the center
], dtype=np.float64)
Note: OBJECT_POINTS is a physical model constant, used for PnP solving (if you choose a solvePnP approach) or for geometric validation. This tutorial constructs the pose geometrically and does not depend on solvePnP, but these coordinates remain an important reference for understanding how the coordinate frame is defined.
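One way to use OBJECT_POINTS for the geometric validation mentioned above is to compare the known inter-keypoint distances against the back-projected 3D keypoints. The helper name `check_keypoint_spacing` and the 5 mm tolerance are assumptions for this sketch, not part of the original code.

```python
import numpy as np

OBJECT_POINTS = np.array([
    [-0.017, -0.0012, 0],  # DC+
    [ 0.017, -0.0012, 0],  # DC-
    [ 0.000, -0.021,  0],  # PE
])

def check_keypoint_spacing(p_dc_plus, p_dc_minus, p_pe, tol_m=0.005):
    """Reject a frame whose measured keypoint spacing deviates from the model."""
    # Model distances: DC+ to DC- (34 mm) and DC midpoint to PE (~19.8 mm)
    model_dc = np.linalg.norm(OBJECT_POINTS[1] - OBJECT_POINTS[0])
    model_pe = np.linalg.norm(OBJECT_POINTS[2] - (OBJECT_POINTS[0] + OBJECT_POINTS[1]) / 2)
    # Measured distances from the back-projected 3D keypoints
    meas_dc = np.linalg.norm(p_dc_minus - p_dc_plus)
    meas_pe = np.linalg.norm(p_pe - (p_dc_plus + p_dc_minus) / 2)
    return abs(meas_dc - model_dc) < tol_m and abs(meas_pe - model_pe) < tol_m
```

If the check fails, the depth samples or the detections are likely unreliable and the frame can be skipped, just like a failed normal validation.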

Depth Backprojection

Convert a pixel coordinate (u, v) plus its depth value into a 3D point in the camera frame using the intrinsics:

```python
def backproject(u, v, z_mm, fx, fy, cx, cy):
    """Pixel coordinate + depth → 3D point in the camera frame (meters)."""
    z = z_mm / 1000.0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```
Depth units: the Orbbec DaBai DCW2 outputs raw depth in millimeters. Dividing by 1000 converts to meters, keeping the units consistent with the rotation matrix output.

For each keypoint (DC+, DC-, PE), take the bbox center pixel as (u, v), sample the corresponding depth from the depth map, then call backproject() to get its 3D coordinate.
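Depth maps often have zero-valued holes exactly at hole centers, so sampling a single pixel can fail. A minimal, more robust sampler (the helper name `sample_depth` and the window size are assumptions, not part of the original code): take the median of the valid depths in a small window around the bbox center.

```python
import numpy as np

def sample_depth(depth_map, u, v, win=5):
    """Median of valid (non-zero) depths in a (2*win+1)^2 window around (u, v).

    Returns None when the window contains no valid depth."""
    h, w = depth_map.shape
    u0, u1 = max(0, u - win), min(w, u + win + 1)
    v0, v1 = max(0, v - win), min(h, v + win + 1)
    patch = depth_map[v0:v1, u0:u1].astype(np.float64)
    valid = patch[patch > 0]
    if valid.size == 0:
        return None
    return float(np.median(valid))
```

Usage: z_mm = sample_depth(depth, cu, cv); skip the keypoint (and the frame) when it returns None.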


RANSAC Plane Fitting

From the depth point cloud inside the charging_port bbox, fit the charging port plane robustly with RANSAC to obtain the surface normal (the Z-axis direction):

```python
def fit_plane_ransac(points, n_iter=100, thresh_m=0.005):
    """RANSAC plane fit on an Nx3 point cloud; returns the unit normal."""
    best_normal, best_inliers = None, 0
    for _ in range(n_iter):
        idx = np.random.choice(len(points), 3, replace=False)
        p0, p1, p2 = points[idx]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                      # degenerate (collinear) sample
        n = n / norm
        dists = np.abs((points - p0) @ n)
        inliers = np.sum(dists < thresh_m)
        if inliers > best_inliers:
            best_inliers = inliers
            best_normal = n
    return best_normal
```
| Parameter | Default | Description |
|---|---|---|
| n_iter | 100 | Number of random sampling iterations |
| thresh_m | 0.005 | Inlier distance threshold (5 mm) |
Why RANSAC: depth maps contain noise and holes, and a direct least-squares fit is easily distorted by outliers. RANSAC repeatedly samples three points, counts the inliers of the resulting plane, and is therefore far more robust to noise.
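A common follow-up step, not in the original code, is to refine the winning RANSAC hypothesis with a least-squares fit over its inliers. The sketch below does this with an SVD of the centered inlier cloud, whose smallest right singular vector is the plane normal.

```python
import numpy as np

def refine_plane(points, normal, p0, thresh_m=0.005):
    """Least-squares refit of a plane over the RANSAC inliers.

    points: Nx3 cloud; normal, p0: the winning RANSAC hypothesis."""
    dists = np.abs((points - p0) @ normal)
    inliers = points[dists < thresh_m]
    centroid = inliers.mean(axis=0)
    # Smallest right singular vector of the centered cloud = plane normal
    _, _, vt = np.linalg.svd(inliers - centroid)
    n = vt[-1]
    # Keep the same orientation as the RANSAC normal
    if np.dot(n, normal) < 0:
        n = -n
    return n / np.linalg.norm(n)
```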

Building the Full Coordinate Frame

```python
def normalize(v):
    return v / np.linalg.norm(v)

def build_pose(p_dc_plus, p_dc_minus, p_pe, normal_ransac):
    # X-axis: DC+ → DC-
    x_axis = normalize(p_dc_minus - p_dc_plus)
    # Y-axis reference: PE → DC midpoint (upward)
    dc_mid = (p_dc_plus + p_dc_minus) / 2.0
    y_raw = normalize(dc_mid - p_pe)
    # Initial Z-axis: X × Y_raw
    z_axis = normalize(np.cross(x_axis, y_raw))
    # Normal validation: discard this frame if the angle exceeds 30°
    if np.dot(z_axis, -normal_ransac) < np.cos(np.radians(30)):
        return None
    # Replace Z with the point-cloud normal, which is more accurate
    z_axis = -normal_ransac
    # Re-orthogonalize Y
    y_axis = normalize(np.cross(z_axis, x_axis))
    # Assemble the 4x4 transform
    T = np.eye(4)
    T[:3, 0] = x_axis
    T[:3, 1] = y_axis
    T[:3, 2] = z_axis
    T[:3, 3] = dc_mid
    return T
```

Why three points instead of two: PE provides the Y-axis constraint, and three mutually redundant points are more robust than two. With only DC+/DC-, the Y-axis would depend entirely on the surface normal with no cross-check. Adding PE gives two independent estimates that validate each other, reducing the effect of single-frame noise.

Re-orthogonalizing Y: after replacing Z with the RANSAC normal, the original Y may no longer be exactly orthogonal to the new Z. Recomputing Y as Z × X guarantees a strictly orthogonal rotation matrix (determinant +1).
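A quick sanity check, illustrative rather than part of the original pipeline, confirms that the assembled R is a proper rotation: R·Rᵀ = I and det R = +1.

```python
import numpy as np

def is_proper_rotation(R, tol=1e-6):
    """True iff R is orthogonal with determinant +1."""
    return (np.allclose(R @ R.T, np.eye(3), atol=tol)
            and np.isclose(np.linalg.det(R), 1.0, atol=tol))

# Example axes built the same way as build_pose()
x = np.array([1.0, 0.0, 0.0])
z = np.array([0.0, 0.0, -1.0])   # normal pointing toward the camera
y = np.cross(z, x)               # re-orthogonalized Y
R = np.column_stack([x, y, z])
print(is_proper_rotation(R))     # True
```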

Distinguishing DC+ from DC-

YOLO cannot distinguish DC+ from DC-: both belong to the same class DC_hole and look identical. The code sorts by image X coordinate: smaller X (left) = DC+, larger X (right) = DC-.

```python
# YOLO cannot tell DC+ from DC- (same class DC_hole), so sort by X coordinate
top2_dc = sorted(dc_boxes, key=lambda x: x[1], reverse=True)[:2]   # keep the 2 highest-confidence boxes
top2_dc = sorted(top2_dc, key=lambda x: (x[0][0] + x[0][2]) // 2)  # sort by bbox center X
dc_plus_box = top2_dc[0][0]   # left  = DC+
dc_minus_box = top2_dc[1][0]  # right = DC-
```
Assumption: this works when the camera faces the charging port roughly head-on (small yaw), so the image X-axis matches the port's physical X-axis and the left hole is DC+. If the camera views from the side at an extreme angle (beyond ~90°), the left/right relationship flips and extra logic is needed. In practice, constrain the camera's approach angle relative to the port.

Complete Main Loop

Each frame executes the following steps:

  1. Run YOLO inference and collect detection boxes by class: charging_port / DC_hole / PE
  2. Verify all three classes are present (charging_port ≥ 1, DC_hole ≥ 2, PE ≥ 1); otherwise display a missing-detection warning and skip this frame
  3. Extract the depth point cloud inside the charging_port bbox and call fit_plane_ransac() to get the surface normal
  4. Take the centers and depths of the two DC_hole bboxes → backproject() → 3D coordinates p_dc_plus, p_dc_minus
  5. Take the PE bbox center and depth → backproject() → 3D coordinate p_pe
  6. Call build_pose(p_dc_plus, p_dc_minus, p_pe, normal_ransac) to construct T_cam2port
  7. Call draw_axis() to overlay the coordinate axes on the image and print the transformation matrix to the terminal
Frame filtering: if build_pose() returns None (normal validation failed), skip the current frame and continue with the next. A sliding-window average over multiple frames further reduces per-frame noise.
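The sliding average mentioned above needs care for rotations: averaging rotation matrices element-wise leaves the result slightly non-orthogonal, so project it back onto a valid rotation with an SVD. A sketch (the class name `PoseSmoother` and the window size of 5 are assumptions):

```python
import numpy as np
from collections import deque

class PoseSmoother:
    """Sliding-window average of 4x4 poses.

    Translations are averaged directly; rotation matrices are averaged
    element-wise and projected back onto SO(3) with an SVD."""
    def __init__(self, window=5):
        self.buf = deque(maxlen=window)

    def update(self, T):
        self.buf.append(T)
        t = np.mean([T_[:3, 3] for T_ in self.buf], axis=0)
        R_mean = np.mean([T_[:3, :3] for T_ in self.buf], axis=0)
        U, _, Vt = np.linalg.svd(R_mean)
        R = U @ Vt
        if np.linalg.det(R) < 0:         # guard against a reflection
            U[:, -1] = -U[:, -1]
            R = U @ Vt
        T_avg = np.eye(4)
        T_avg[:3, :3], T_avg[:3, 3] = R, t
        return T_avg
```

Feed each accepted T_cam2port into update() and use the returned smoothed pose downstream.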

Visualizing the Coordinate Axes

Draw the three axes of the charging port frame on the image for debugging and visual validation of the pose:

```python
def draw_axis(img, T, fx, fy, cx, cy, length=0.03):
    """Draw the charging port's three axes on the image (red X, green Y, blue Z)."""
    origin = T[:3, 3]
    axes = {'X': (T[:3, 0], (0, 0, 255)),
            'Y': (T[:3, 1], (0, 255, 0)),
            'Z': (T[:3, 2], (255, 0, 0))}
    def proj(p):
        return (int(p[0] / p[2] * fx + cx), int(p[1] / p[2] * fy + cy))
    o2d = proj(origin)
    for name, (axis, color) in axes.items():
        end = origin + axis * length
        cv2.arrowedLine(img, o2d, proj(end), color, 2, tipLength=0.2)
        cv2.putText(img, name, proj(end), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
```
| Axis | Color (OpenCV BGR) | Direction |
|---|---|---|
| X | Red (0, 0, 255) | DC+ → DC- direction |
| Y | Green (0, 255, 0) | Upward direction of the port |
| Z | Blue (255, 0, 0) | Port normal (pointing toward the camera) |

length=0.03 draws 30 mm arrows, a good fit for the charging port's size; adjust it for the camera-to-port distance.


Understanding the Output

Structure of T_cam2port:

```
T_cam2port = [R | t]
             [0 | 1]

t = T[:3, 3]  → XYZ position of the port center in the camera frame (meters)
R = T[:3, :3] → rotation matrix of the port
    column 0 = X-axis (DC+ → DC- direction)
    column 1 = Y-axis (upward)
    column 2 = Z-axis (port normal, pointing toward the camera)

Example terminal output:

T_cam2port:
[[ 0.9998 -0.0123  0.0156  0.0023]
 [ 0.0119  0.9994  0.0312 -0.0187]
 [-0.0160 -0.0310  0.9994  0.4821]
 [ 0.      0.      0.      1.    ]]
Distance: 482 mm   X: 2.3 mm   Y: -18.7 mm   Z: 482.1 mm
```
| Reading | Source | Meaning |
|---|---|---|
| Z: 482.1 mm | T[:3,3][2] * 1000 | Depth from camera to port |
| X: 2.3 mm | T[:3,3][0] * 1000 | Lateral offset in the camera frame |
| Y: -18.7 mm | T[:3,3][1] * 1000 | Vertical offset in the camera frame |
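For logging, the rotation part can also be reduced to the three angles from the table in section 1 (roll about Z, pitch about X, yaw about Y). The Ry·Rx·Rz composition order below is an assumption made for this sketch; other conventions give different numbers for the same R.

```python
import numpy as np

def rotation_to_rpy(R):
    """Decompose R = Ry(yaw) @ Rx(pitch) @ Rz(roll): roll about Z,
    pitch about X, yaw about Y, matching the camera-frame convention."""
    pitch = np.arcsin(-R[1, 2])
    roll = np.arctan2(R[1, 0], R[1, 1])
    yaw = np.arctan2(R[0, 2], R[2, 2])
    return roll, pitch, yaw

# Identity rotation → all three angles are zero
print(rotation_to_rpy(np.eye(3)))
```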

Next Steps: Hand-Eye Calibration (TODO)

T_cam2port is expressed in the camera frame. For the robot arm to know where to move its end effector, hand-eye calibration is needed to obtain T_cam2gripper:

```
T_base2port = T_base2gripper × T_gripper2cam × T_cam2port

where:
  T_cam2gripper  — hand-eye calibration result (the fixed camera-to-gripper
                   transform); T_gripper2cam is its inverse
  T_base2gripper — robot forward kinematics (read in real time)
  T_cam2port     — output of this tutorial
```
| Transform | Source | Update rate |
|---|---|---|
| T_cam2gripper | Hand-eye calibration (one-time) | Static |
| T_base2gripper | Robot forward kinematics | Real-time |
| T_cam2port | This tutorial (vision) | Per frame |
Next step: see Tutorial Part 2, hand-eye calibration. Once calibrated, the T_cam2port output of this tutorial can be fed directly into the robot motion planner for autonomous charging plug insertion.
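The chain above is a plain matrix product. The sketch below uses placeholder transforms (the identity rotations and offsets are made-up values, not calibration results) and inverts T_cam2gripper to get the T_gripper2cam term in the formula.

```python
import numpy as np

def invert_se3(T):
    """Invert a rigid transform: [R | t]^-1 = [R.T | -R.T @ t]."""
    Ti = np.eye(4)
    Ti[:3, :3] = T[:3, :3].T
    Ti[:3, 3] = -T[:3, :3].T @ T[:3, 3]
    return Ti

# Placeholder transforms for illustration only
T_base2gripper = np.eye(4); T_base2gripper[:3, 3] = [0.3, 0.0, 0.5]    # from FK
T_cam2gripper  = np.eye(4); T_cam2gripper[:3, 3]  = [0.0, -0.05, 0.1]  # from calibration
T_cam2port     = np.eye(4); T_cam2port[:3, 3]     = [0.0, 0.0, 0.48]   # this tutorial

T_gripper2cam = invert_se3(T_cam2gripper)
T_base2port = T_base2gripper @ T_gripper2cam @ T_cam2port
print(T_base2port[:3, 3])  # port center in the robot base frame
```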