Combining YOLO11 detection with an Orbbec DaBai DCW2 depth camera to recover a full 6DoF pose, demonstrated on a GB/T 20234.3 DC charging port
Goal: recover the complete 6DoF pose (position + orientation) of an object from YOLO11 detections combined with a depth camera.
This tutorial uses a GB/T 20234.3 DC charging port as the example object, with three detection classes: charging_port / DC_hole / PE.
Final output: T_cam2port, a 4×4 transformation matrix encoding the full position and orientation of the charging port in the camera coordinate frame.
YOLO plus a single depth reading only tells you how far away the object is: a scalar Z distance. A plug-insertion task requires all 6 degrees of freedom: X/Y/Z translation plus Roll/Pitch/Yaw orientation.
| DOF | Direction | Available from YOLO + single depth? |
|---|---|---|
| X | Left / right translation | YES |
| Y | Up / down translation | YES |
| Z | Depth (forward / back) | YES |
| Roll | Rotation about Z | NO |
| Pitch | Rotation about X | NO |
| Yaw | Rotation about Y | NO |
Three-step principle:

1. YOLO detects the charging_port bbox → extract the region's depth point cloud → RANSAC plane fitting → surface normal (Z-axis direction)
2. DC_hole (two holes) + PE bbox center pixels + depth → back-project via camera intrinsics → three 3D keypoints
3. Three 3D keypoints + surface normal → construct an orthogonal coordinate frame → T_cam2port
- pyorbbecsdk installed and the Orbbec DaBai DCW2 working
- A YOLO11 model with the detection classes charging_port, DC_hole, PE
- Python environment with ultralytics, opencv-python, numpy
The Orbbec DaBai DCW2 supports hardware depth-to-color alignment (HW D2C). Once enabled, depth and RGB pixels correspond one-to-one and share the same rgb_intrinsic parameters for backprojection.
Camera intrinsics (fx, fy, cx, cy) are required to back-project pixel coordinates into 3D space. Query them after pipeline.start(config):
| Parameter | Meaning |
|---|---|
| fx | Focal length in X (pixels) |
| fy | Focal length in Y (pixels) |
| cx | Principal point X |
| cy | Principal point Y |
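As a minimal sketch of how these four numbers are used (the numeric values below are hypothetical placeholders; read the real ones from the SDK's camera-parameter query):

```python
import numpy as np

def make_K(fx, fy, cx, cy):
    """Assemble the 3x3 pinhole intrinsic matrix from the four parameters."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# Hypothetical values for illustration only -- query the real ones from the SDK.
K = make_K(fx=454.2, fy=454.2, cx=322.1, cy=241.8)

# Forward model: a camera-frame 3D point (X, Y, Z) projects to pixel
#   u = fx * X / Z + cx,   v = fy * Y / Z + cy
p = np.array([0.05, -0.02, 0.5])      # meters, camera frame
uv = (K @ p)[:2] / p[2]               # pixel coordinates (u, v)
```

Backprojection (pixel + depth → 3D) is simply the inverse of this mapping, which is what the next section implements.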
Pre-measured physical dimensions of the charging port, not measured at runtime. Origin at the charging-port center, unit: meters. Source: GB/T 20234.3-2023:
OBJECT_POINTS is a physical-model constant used for PnP solving (if using a solvePnP approach) or for geometric validation. This tutorial uses geometric construction rather than solvePnP, but these coordinates remain an important reference for understanding the coordinate-frame definition.
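A sketch of what OBJECT_POINTS might look like. The millimeter values below are placeholders, not the actual GB/T 20234.3 dimensions; substitute your measured values:

```python
import numpy as np

# Placeholder geometry: origin at the charging-port center, units in meters.
# Axes: X from DC+ toward DC-, Y toward PE, Z out of the port face.
# These spacings are HYPOTHETICAL -- replace with the measured dimensions.
HOLE_SPACING = 0.030   # hypothetical DC+/DC- center-to-center distance
PE_OFFSET_Y  = 0.020   # hypothetical PE offset from the DC pair midpoint

OBJECT_POINTS = {
    "DC+": np.array([-HOLE_SPACING / 2, 0.0, 0.0]),
    "DC-": np.array([+HOLE_SPACING / 2, 0.0, 0.0]),
    "PE":  np.array([0.0, PE_OFFSET_Y, 0.0]),
}
```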
Convert a pixel coordinate (u, v) plus a depth value into a 3D point in camera space using the intrinsic parameters:
For each keypoint (DC+, DC-, PE), use the bbox center pixel as (u, v), sample the corresponding depth from the depth map, then call backproject() to get the 3D coordinate.
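A minimal numpy version of backproject() under the standard pinhole model (the inverse of u = fx·X/Z + cx, v = fy·Y/Z + cy):

```python
import numpy as np

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth Z (meters) to a camera-frame 3D point."""
    z = float(depth_m)
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    return np.array([x, y, z])
```

In practice a single depth pixel can be a hole (zero) on dark or reflective material; a common mitigation is to take the median of a small window around the bbox center instead of one pixel.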
From the depth point cloud within the charging_port bbox region, robustly fit the charging-port plane using RANSAC to obtain the surface normal (the Z-axis direction):
| Parameter | Default | Description |
|---|---|---|
| n_iter | 100 | Number of random sampling iterations |
| thresh_m | 0.005 | Inlier distance threshold (5 mm) |
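A self-contained sketch of fit_plane_ransac() matching the parameters above, assuming points is an (N, 3) array in meters:

```python
import numpy as np

def fit_plane_ransac(points, n_iter=100, thresh_m=0.005, rng=None):
    """Fit a plane to (N, 3) points; return (unit normal, inlier mask)."""
    rng = np.random.default_rng(rng)
    best_normal, best_inliers = None, None
    for _ in range(n_iter):
        # Sample 3 distinct points and form the candidate plane.
        idx = rng.choice(len(points), size=3, replace=False)
        p0, p1, p2 = points[idx]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:                 # degenerate (collinear) sample
            continue
        n = n / norm
        # Point-to-plane distances; keep the candidate with the most inliers.
        dist = np.abs((points - p0) @ n)
        inliers = dist < thresh_m
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_normal, best_inliers = n, inliers
    return best_normal, best_inliers
```

The returned normal's sign is arbitrary; before using it as the port's Z-axis, flip it so it points toward the camera (negative camera-Z component), as assumed later when building the pose.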
Why three points instead of two: PE provides the Y-axis constraint, giving three-point redundancy that is more robust than two points alone. With only DC+/DC-, the Y-axis depends entirely on the surface normal, with no cross-check. Adding PE creates two independent estimates that validate each other, reducing the effect of single-frame noise.
After replacing Z with the RANSAC normal, the original Y may no longer be perfectly orthogonal to the new Z. Recomputing Y as Z × X guarantees the rotation matrix is strictly orthogonal (determinant +1).
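A sketch of how build_pose() can assemble T_cam2port from the three keypoints and the RANSAC normal. The 20° validation threshold and the axis sign conventions are assumptions; flip signs to match your own frame definition:

```python
import numpy as np

def build_pose(p_dc_plus, p_dc_minus, p_pe, normal, max_angle_deg=20.0):
    """Build a 4x4 T_cam2port, or return None if normal validation fails."""
    origin = 0.5 * (p_dc_plus + p_dc_minus)       # port center
    x = p_dc_minus - p_dc_plus                    # X: DC+ -> DC- direction
    x = x / np.linalg.norm(x)
    y_hint = p_pe - origin                        # rough Y from the PE keypoint
    y_hint = y_hint / np.linalg.norm(y_hint)
    n_kp = np.cross(x, y_hint)                    # keypoint-derived normal
    n_kp = n_kp / np.linalg.norm(n_kp)
    z = normal / np.linalg.norm(normal)
    if z[2] > 0:                                  # make Z point toward camera
        z = -z
    # Validation: keypoint normal and RANSAC normal must roughly agree.
    if abs(float(n_kp @ z)) < np.cos(np.radians(max_angle_deg)):
        return None
    y = np.cross(z, x)                            # re-orthogonalize: Y = Z x X
    y = y / np.linalg.norm(y)
    x = np.cross(y, z)                            # complete right-handed frame
    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2] = x, y, z
    T[:3, 3] = origin
    return T
```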
YOLO cannot distinguish DC+ from DC-: both belong to the class DC_hole and look identical. The code sorts by image X coordinate: smaller X (left) = DC+, larger X (right) = DC-.
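The left/right disambiguation can be as simple as the following (the pixel values are hypothetical detections, each represented as a bbox-center pair):

```python
# Hypothetical bbox centers (u, v) of the two DC_hole detections.
dc_holes = [(412.0, 240.5), (368.0, 238.9)]

# Smaller image X = left = DC+; larger image X = right = DC-.
dc_plus, dc_minus = sorted(dc_holes, key=lambda c: c[0])
```

This assumes the camera views the port roughly upright; a camera rolled past 90° would swap the assignment.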
Each frame executes the following steps:

1. Run YOLO inference and collect the detected bounding boxes by class: charging_port / DC_hole / PE
2. Verify all three classes are present (charging_port ≥ 1, DC_hole ≥ 2, PE ≥ 1); otherwise display a missing-detection warning and skip the frame
3. Extract the depth point cloud within the charging_port bbox region and call fit_plane_ransac() to fit the surface normal
4. Take the center + depth of the two DC_hole bboxes → call backproject() → 3D coordinates p_dc_plus, p_dc_minus
5. Take the center + depth of the PE bbox → call backproject() → 3D coordinate p_pe
6. Call build_pose(p_dc_plus, p_dc_minus, p_pe, normal_ransac) to construct T_cam2port
7. Call draw_axis() to overlay the coordinate axes on the image and print the transformation matrix to the terminal
If build_pose() returns None (normal validation failed), skip the current frame and continue with the next. A sliding-window average over multiple frames is recommended to further reduce single-frame noise.
Draw the three axes of the charging-port coordinate frame onto the image for debugging and visual validation:
| Axis | Color (OpenCV BGR) | Direction |
|---|---|---|
| X | Red (0, 0, 255) | DC+ → DC- direction |
| Y | Green (0, 255, 0) | Upward direction of the port |
| Z | Blue (255, 0, 0) | Port surface normal (pointing toward the camera) |
length=0.03 draws 30 mm axes, appropriate for the charging-port size; adjust based on the camera-to-port distance.
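A sketch of the projection behind draw_axis(): transform the frame origin and the three axis tips into the camera frame with T_cam2port, then project them with the intrinsics; the actual drawing is then one cv2.line or cv2.arrowedLine call per axis using these pixel endpoints:

```python
import numpy as np

def axis_endpoints_px(T_cam2port, fx, fy, cx, cy, length=0.03):
    """Project the port-frame origin and three axis tips to pixel coords.

    Returns a (4, 2) array of (u, v): rows are origin, X tip, Y tip, Z tip.
    """
    pts_port = np.array([[0.0, 0.0, 0.0],
                         [length, 0.0, 0.0],   # X-axis tip
                         [0.0, length, 0.0],   # Y-axis tip
                         [0.0, 0.0, length]])  # Z-axis tip
    # Port frame -> camera frame via the homogeneous transform.
    pts_cam = (T_cam2port @ np.c_[pts_port, np.ones(4)].T).T[:, :3]
    u = fx * pts_cam[:, 0] / pts_cam[:, 2] + cx
    v = fy * pts_cam[:, 1] / pts_cam[:, 2] + cy
    return np.column_stack([u, v])
```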
Structure of T_cam2port:
| Reading | Source | Meaning |
|---|---|---|
| Z: 482.1 mm | T[:3,3][2] * 1000 | Depth from camera to port |
| X: 2.3 mm | T[:3,3][0] * 1000 | Lateral offset in the camera frame |
| Y: -18.7 mm | T[:3,3][1] * 1000 | Vertical offset in the camera frame |
T_cam2port is expressed in the camera frame. For the robot arm to know where to move its end effector, eye-hand calibration is additionally needed to obtain T_cam2gripper:
| Transform | Source | Update rate |
|---|---|---|
| T_cam2gripper | Eye-hand calibration (one-time) | Static |
| T_base2gripper | Robot forward kinematics | Real-time |
| T_cam2port | This tutorial (vision) | Per frame |
See Tutorial Part 2: Eye-Hand Calibration. Once calibrated, the T_cam2port output of this tutorial can be fed directly into the robot motion planner to achieve autonomous charging-plug insertion.