Common Knowledge in Pose Estimation

This article is written in Chinese. The English version is translated by GPT-4o. Please refer to the original Chinese version for the original content.

This article introduces the basics of pose estimation and walks through a typical application.

Points and Vectors

Distinguishing between points and vectors:

Point: Position information only, with neither magnitude nor direction.
Vector: Displacement information, has both magnitude and direction. Formed by connecting two points.

Points and vectors in space are real entities that do not change in nature due to changes in the coordinate system. However, we need a coordinate system to describe these entities.

For an $n$ -dimensional space, we use $n$ linearly independent vectors to describe the space. This set of vectors is called a basis. With a basis, we can describe any vector in the space. After defining the origin of the coordinate system, we translate the starting point of the vector to the origin, allowing us to represent any point in the space with an $n$ -dimensional vector.

Below is a set of basis vectors in three-dimensional space to represent any vector in the space.

\boldsymbol a= \begin{bmatrix} \boldsymbol e_1 & \boldsymbol e_2 & \boldsymbol e_3 \end{bmatrix} \begin{bmatrix} a_1\\ a_2\\ a_3 \end{bmatrix} =a_1\boldsymbol e_1+a_2\boldsymbol e_2+a_3\boldsymbol e_3

Here, $\boldsymbol e_1,\boldsymbol e_2,\boldsymbol e_3$ form a basis, and $[a_1,a_2,a_3]^\text T$ are the coordinates of vector $\boldsymbol a$ under this basis.

Euclidean Transformations Between Coordinate Systems

The transformation between two coordinate systems can be represented by a rotation matrix and a translation vector.

Rotation Transformation

Rotation Matrix

From the above, we know that the same vector does not change with the choice of coordinate system. Therefore, during coordinate transformation, the following equation holds.

[\boldsymbol e_1, \boldsymbol e_2, \boldsymbol e_3] \begin{bmatrix} a_1\\ a_2\\ a_3 \end{bmatrix} = [\boldsymbol e'_1, \boldsymbol e'_2, \boldsymbol e'_3] \begin{bmatrix} a'_1\\ a'_2\\ a'_3 \end{bmatrix}

Since the basis vectors are orthonormal, multiplying both sides on the left by $[\boldsymbol e_1, \boldsymbol e_2, \boldsymbol e_3]^\text T$ gives

\begin{bmatrix} a_1\\ a_2\\ a_3 \end{bmatrix} = \begin{bmatrix} \boldsymbol e_1^\text T\boldsymbol e'_1 & \boldsymbol e_1^\text T\boldsymbol e'_2 & \boldsymbol e_1^\text T\boldsymbol e'_3\\ \boldsymbol e_2^\text T\boldsymbol e'_1 & \boldsymbol e_2^\text T\boldsymbol e'_2 & \boldsymbol e_2^\text T\boldsymbol e'_3\\ \boldsymbol e_3^\text T\boldsymbol e'_1 & \boldsymbol e_3^\text T\boldsymbol e'_2 & \boldsymbol e_3^\text T\boldsymbol e'_3 \end{bmatrix} \begin{bmatrix} a'_1\\ a'_2\\ a'_3 \end{bmatrix} =\boldsymbol R\boldsymbol a'

Here, $\boldsymbol R$ is the rotation matrix. It is composed of the inner products of the two sets of basis vectors and describes the relationship between the coordinates before and after rotation.

The set of $n$ -dimensional rotation matrices forms a special Lie group called the special orthogonal group $\mathrm {SO}(n)$ .

\mathrm{SO}(n)=\{\boldsymbol R\in\mathbb R^{n\times n}|\boldsymbol R^\text T\boldsymbol R=\boldsymbol I, \det(\boldsymbol R)=1\}
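As a minimal sketch of the construction above, the following Python snippet builds $\boldsymbol R$ from the inner products of two orthonormal bases and checks the two defining properties of $\mathrm{SO}(3)$. The helper names (`rotation_from_bases`, `is_rotation`) and the 30 degree example basis are assumptions made for illustration only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rotation_from_bases(e, e_prime):
    # R[i][j] = e_i^T e'_j, the inner products of the two sets of basis vectors
    return [[dot(e[i], e_prime[j]) for j in range(3)] for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_rotation(m, tol=1e-9):
    # Membership test for SO(3): R^T R = I and det(R) = 1
    rtr = [[sum(m[k][i] * m[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    orthogonal = all(abs(rtr[i][j] - (1.0 if i == j else 0.0)) < tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(m) - 1.0) < tol

# Old basis: the standard axes. New basis: the old one rotated 30 degrees about z.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
e_prime = [[c, s, 0], [-s, c, 0], [0, 0, 1]]

R = rotation_from_bases(e, e_prime)
print(is_rotation(R))
```

A reflection such as `[[1, 0, 0], [0, 1, 0], [0, 0, -1]]` is orthogonal but has determinant -1, so it fails the same check.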

Rotation Vector and Rodrigues’ Formula

Using the rotation matrix for rotation is abstract, and we cannot directly obtain specific rotation information from it. It is also difficult to derive the rotation matrix from a known rotation.

A rotation can be described by a rotation axis and a rotation angle, which together form the rotation vector. Its direction is the rotation axis, a unit vector $\boldsymbol k$ with the positive sense of rotation given by the right-hand rule. Its magnitude is the rotation angle $\theta$ . With the rotation vector, we can describe any fixed-axis rotation in space. This rotation is related to the rotation matrix mentioned earlier by Rodrigues’ rotation formula.

\boldsymbol P_\text{rot}=\cos\theta\boldsymbol P+(1-\cos\theta)(\boldsymbol k\cdot\boldsymbol P)\boldsymbol k+\sin\theta\boldsymbol k\times\boldsymbol P

For example, to rotate the vector $[1\ 0\ 0]^\text T$ around the z-axis by an angle $\theta$ , we can use Rodrigues’ formula. The rotation axis is $\boldsymbol k=[0\ 0\ 1]^\text T$ . Substituting into the formula, we get the rotated vector.

\begin{aligned} \boldsymbol P_\text{rot}&=\cos\theta[1\ 0\ 0]^\text T+(1-\cos\theta)([0\ 0\ 1]^\text T\cdot[1\ 0\ 0]^\text T)[0\ 0\ 1]^\text T+\sin\theta[0\ 0\ 1]^\text T\times[1\ 0\ 0]^\text T\\ &=[\cos\theta\ \sin\theta\ 0]^\text T \end{aligned}

This is a polar coordinate representation in the xy-plane, which aligns well with our expected result.
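The same calculation can be checked numerically. The sketch below implements the vector form of Rodrigues' formula in plain Python; the function name `rodrigues_rotate` is an assumption for illustration, and angles are taken in radians.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rodrigues_rotate(p, k, theta):
    """Rotate vector p about the unit axis k by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    k_dot_p = sum(ki * pi for ki, pi in zip(k, p))
    k_cross_p = cross(k, p)
    # cos(theta) P + (1 - cos(theta)) (k . P) k + sin(theta) k x P
    return [c * p[i] + (1 - c) * k_dot_p * k[i] + s * k_cross_p[i]
            for i in range(3)]

theta = math.radians(90)
p_rot = rodrigues_rotate([1, 0, 0], [0, 0, 1], theta)
print(p_rot)  # approximately [cos(theta), sin(theta), 0]
```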

Rodrigues’ formula only provides a way to rotate a vector, but we need to transform it into a rotation matrix.

\boldsymbol P_\text{rot}=\boldsymbol R\boldsymbol P= (\cos\theta\boldsymbol I+(1-\cos\theta)\boldsymbol k\boldsymbol k^\text T+\sin\theta\boldsymbol K)\boldsymbol P

Here, $\boldsymbol K$ is the skew-symmetric matrix generated by $\boldsymbol k$ . A skew-symmetric matrix satisfies $\boldsymbol K^\text T=-\boldsymbol K$ , and $\boldsymbol K$ turns the cross product with $\boldsymbol k$ into a matrix multiplication.

\boldsymbol k\times\boldsymbol P=\boldsymbol K\boldsymbol P

Proof:

\boldsymbol k\times\boldsymbol P= \begin{bmatrix} k_1\\ k_2 \\ k_3 \end{bmatrix} \times \begin{bmatrix} P_1\\ P_2 \\ P_3 \end{bmatrix} = \begin{bmatrix} k_2P_3-k_3P_2\\ k_3P_1-k_1P_3\\ k_1P_2-k_2P_1 \end{bmatrix}

\begin{aligned} \boldsymbol K\boldsymbol P&= \begin{bmatrix} 0 & -k_3 & k_2\\ k_3 & 0 & -k_1\\ -k_2 & k_1 & 0 \end{bmatrix} \begin{bmatrix} P_1\\ P_2\\ P_3 \end{bmatrix}\\ &= \begin{bmatrix} k_2P_3-k_3P_2\\ k_3P_1-k_1P_3\\ k_1P_2-k_2P_1 \end{bmatrix} \end{aligned}
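To make the matrix form concrete, here is a small sketch that builds $\boldsymbol K$ from $\boldsymbol k$ and assembles $\boldsymbol R=\cos\theta\boldsymbol I+(1-\cos\theta)\boldsymbol k\boldsymbol k^\text T+\sin\theta\boldsymbol K$. The helper names are illustrative assumptions, and the axis is assumed to be a unit vector.

```python
import math

def skew(k):
    # Skew-symmetric matrix K such that K P equals the cross product k x P
    return [[0, -k[2], k[1]],
            [k[2], 0, -k[0]],
            [-k[1], k[0], 0]]

def rotation_matrix(k, theta):
    """Rotation matrix for a unit axis k and angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    K = skew(k)
    return [[c * (1 if i == j else 0) + (1 - c) * k[i] * k[j] + s * K[i][j]
             for j in range(3)] for i in range(3)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# K P reproduces the cross product k x P
kxp = matvec(skew([1, 2, 3]), [4, 5, 6])

# A 90 degree rotation about z maps the x-axis onto the y-axis
R = rotation_matrix([0, 0, 1], math.radians(90))
p_rot = matvec(R, [1, 0, 0])
print(kxp, p_rot)
```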

Euler Angles for Describing Rotation

Euler angles are a common way to describe rotation. They decompose an arbitrary rotation into three elementary rotations about the coordinate axes. In three-dimensional space, we typically use yaw-pitch-roll, with an angle $\psi$ about the x-axis, $\theta$ about the y-axis, and $\phi$ about the z-axis.

Conventionally, rotation around the $x$ -axis is called roll, rotation around the $y$ -axis is called pitch, and rotation around the $z$ -axis is called yaw.

\boldsymbol R=\boldsymbol R_x(\psi)\boldsymbol R_y(\theta)\boldsymbol R_z(\phi)

Note that the order of rotation is crucial when using Euler angles: matrix multiplication is not commutative, so different rotation orders give different rotation matrices and different results. Because the matrix acts on a column vector from the left, the rightmost rotation is applied first. The above equation should therefore be read from right to left, i.e., yaw-pitch-roll.
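The effect of the rotation order can be seen in a short numerical sketch. The elementary rotation helpers below follow the usual right-handed definitions, and the specific angles are arbitrary values chosen for illustration.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

psi, theta, phi = math.radians(10), math.radians(20), math.radians(30)

# R = R_x(psi) R_y(theta) R_z(phi): the yaw rotation (rightmost) acts first
R = matmul(rot_x(psi), matmul(rot_y(theta), rot_z(phi)))
# Reversed order gives a different matrix
R_rev = matmul(rot_z(phi), matmul(rot_y(theta), rot_x(psi)))

v = [1.0, 2.0, 3.0]
step_by_step = matvec(rot_x(psi), matvec(rot_y(theta), matvec(rot_z(phi), v)))
max_diff = max(abs(R[i][j] - R_rev[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```

Applying `R` to `v` gives the same result as applying yaw, then pitch, then roll one after another, while `R_rev` differs from `R` in several entries.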

Translation Transformation

A translation moves the origin of the coordinate system to another position and can be represented by a vector. Unlike rotation, translation is not a linear transformation of the coordinates (it is affine): it shifts the position of every point, while the direction and length of displacement vectors are unchanged.

\boldsymbol P'=\boldsymbol R\boldsymbol P+\boldsymbol t

Here, $\boldsymbol t$ is the translation vector.

Transformation Matrix and Homogeneous Coordinates

By adding a 1 at the end of a three-dimensional vector, we obtain a four-dimensional vector called homogeneous coordinates. This allows rotation and translation to be uniformly represented as a matrix multiplication.

\begin{bmatrix} \boldsymbol a'\\ 1 \end{bmatrix} = \begin{bmatrix} \boldsymbol R & \boldsymbol t\\ \boldsymbol 0^\text T & 1 \end{bmatrix} \begin{bmatrix} \boldsymbol a\\ 1 \end{bmatrix} =\boldsymbol T \begin{bmatrix} \boldsymbol a\\ 1 \end{bmatrix}

Here, $\boldsymbol R$ is the rotation matrix, and $\boldsymbol t$ is the translation vector.

The set of such transformation matrices in $n$ -dimensional space forms a Lie group called the special Euclidean group, denoted as $\mathrm{SE}(n)$ . For brevity in later descriptions, when a coordinate is multiplied by an $\mathrm{SE}(n)$ transformation, the conversion to homogeneous coordinates is left implicit.
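A minimal sketch of the homogeneous form in plain Python follows. The helper names `make_transform` and `transform_point` and the example values are assumptions for illustration.

```python
def make_transform(R, t):
    # Assemble the 4x4 matrix [[R, t], [0^T, 1]] from a 3x3 R and a 3-vector t
    return [list(R[0]) + [t[0]],
            list(R[1]) + [t[1]],
            list(R[2]) + [t[2]],
            [0, 0, 0, 1]]

def transform_point(T, p):
    # Append 1 to get homogeneous coordinates, multiply, then drop the last entry
    ph = list(p) + [1]
    out = [sum(T[i][j] * ph[j] for j in range(4)) for i in range(4)]
    return out[:3]

# 90 degree rotation about z, followed by a translation of [1, 2, 3]
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]
T = make_transform(R, t)

p_out = transform_point(T, [1, 0, 0])
print(p_out)  # [1, 3, 3], the same as R p + t
```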

Multiple Coordinate System Transformations

In practical applications, we may need multiple coordinate system transformations. In such cases, we can combine multiple transformations into a single transformation matrix. This allows us to describe the transformation from one coordinate system to another with a single transformation matrix.

\boldsymbol P_b=\boldsymbol T_{ba}\boldsymbol P_a

Here, $\boldsymbol T_{ba}=\boldsymbol T_{bc}\boldsymbol T_{ca}$ , where $c$ is an intermediate coordinate system.
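Chaining can be sketched as a product of 4x4 matrices. In the snippet below, the frames a, b, c and the numerical values are hypothetical; the point is that multiplying the matrices once gives the same result as applying the transformations one after another.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(T, p):
    # Apply a 4x4 homogeneous transformation to a 3D point
    ph = list(p) + [1]
    return [sum(T[i][j] * ph[j] for j in range(4)) for i in range(3)]

# a -> c: rotate 90 degrees about z, then shift by 1 along x
T_ca = [[0, -1, 0, 1],
        [1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]]
# c -> b: shift by 2 along z
T_bc = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 2],
        [0, 0, 0, 1]]

# The transformation applied first sits on the right
T_ba = matmul(T_bc, T_ca)

p_a = [1, 0, 0]
p_b_direct = apply(T_ba, p_a)
p_b_steps = apply(T_bc, apply(T_ca, p_a))
print(p_b_direct, p_b_steps)
```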

Specific Example

In a common automatic aiming turret structure, we typically define four coordinate systems.

Armor Plate Coordinate System: Established on the target armor plate, with the origin at the hitting center. The xy-plane spans the armor plate plane, and the z-axis is perpendicular to the armor plate plane, pointing outward.
Camera Coordinate System: Established on the camera, with the origin at the optical center. The z-axis points upward, the x-axis points to the right, and the y-axis points forward.
Turret Coordinate System: Established on the turret, with the origin at the intersection of the yaw and pitch axes. The z-axis is perpendicular to the yaw plane, pointing upward, the y-axis coincides with the gun barrel direction, and the x-axis points to the right, perpendicular to the barrel.
World Coordinate System: Established in the world, with the origin coinciding with the turret coordinate system origin. The z-axis points vertically upward. The xy-plane forms the horizontal plane, with specific directions determined by the gyroscope’s zero position.

This article stipulates that when using subscripts to represent transformation matrices, the right subscript represents the original coordinate system, and the left subscript represents the transformed coordinate system.
For example, $\boldsymbol T_{wa}$ represents the transformation from the armor plate coordinate system to the world coordinate system. That is, $\boldsymbol T_{wa}\boldsymbol P_a=\boldsymbol P_w$ . This notation matches the way a transformation acts on the coordinate vector written to its right: the adjacent subscripts cancel.

Armor Plate Coordinate System to Camera Coordinate System

The relationship between the armor plate coordinate system and the camera coordinate system depends only on the pose of the armor plate. The pose of the armor plate is typically calculated using PnP.

Through PnP, we can obtain the transformation from the armor plate coordinate system to the camera coordinate system, including the rotation vector (rvec) and translation vector (tvec). These can be combined to form a transformation matrix. However, OpenCV’s default camera coordinate system has the z-axis pointing forward, the x-axis pointing to the right, and the y-axis pointing downward. Therefore, we need to transform the rvec and tvec by left-multiplying a corresponding coordinate axis transformation matrix.

\boldsymbol T_{ca}= \text{SE3}( \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{bmatrix},\boldsymbol 0 ) \text{SE3}(\boldsymbol{R},\boldsymbol{t})

Here, $\boldsymbol{R}$ is the rotation matrix obtained from the rvec via Rodrigues’ formula, and $\boldsymbol{t}$ is the tvec, both obtained from PnP.
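The axis change can be sanity-checked on a few vectors without calling OpenCV at all. In the sketch below, `tvec_cv` is a made-up translation standing in for a PnP result; only the fixed axis-change matrix comes from the equation above.

```python
# Fixed axis change from OpenCV's camera frame (x right, y down, z forward)
# to the camera frame used in this article (x right, y forward, z up).
AXIS_CHANGE = [[1, 0, 0],
               [0, 0, 1],
               [0, -1, 0]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

forward_cv = [0, 0, 1]       # OpenCV optical axis
down_cv = [0, 1, 0]          # OpenCV y-axis points downward
tvec_cv = [0.1, -0.2, 3.0]   # hypothetical PnP translation, in metres

forward = matvec(AXIS_CHANGE, forward_cv)   # becomes the +y (forward) axis
down = matvec(AXIS_CHANGE, down_cv)         # becomes the -z (downward) axis
tvec = matvec(AXIS_CHANGE, tvec_cv)
print(forward, down, tvec)
```

The rotation matrix from PnP is left-multiplied by the same matrix, which is what the product of the two SE3 terms expresses.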

Camera Coordinate System to Turret Coordinate System

The relationship between the camera coordinate system and the turret coordinate system depends only on the camera’s mounting. This is typically determined from mechanical assembly drawings, with adjustments made to correct for minor angle errors during assembly. In practice, many long-distance calculation errors stem from small angle errors during camera assembly.

Similarly, we have the following equation.

\boldsymbol P_g=\boldsymbol T_{gc}\boldsymbol P_c

$\boldsymbol T_{gc}$ can also be decomposed into a rotation matrix and a translation vector. Let’s consider how to determine the rotation matrix and translation vector.

We start with a simpler problem. Assume that in the mechanical design, the camera and the gun barrel have a pitch angle rotation $\theta$ and a translation $t$ along the z-axis. $\theta$ and $t$ are defined in the turret coordinate system, and in the figure below, they should be negative values.

We now need to determine the transformation matrix for this process. An intuitive method is to take the vector $[1\ 0\ 0]^\text T$ in the camera coordinate system, treated in this simplified example as the direction pointing directly forward from the camera. To represent the same vector in the turret coordinate system, we need to rotate this vector downward by the pitch angle. The rotation matrix is the elementary rotation about the y-axis, $\boldsymbol R=\boldsymbol R_y(\theta)$ , which also follows from Rodrigues’ formula with $\boldsymbol k=[0\ 1\ 0]^\text T$ .

To represent the point $[1\ 0\ 0]^\text T$ in the camera coordinate system, we also need to translate it. The translation vector is the vector from the turret coordinate system origin to the camera coordinate system origin. Since we rotate first and then translate, this vector needs to be expressed in the turret coordinate system. The translation vector is $[0\ 0\ t]^\text T$ .

Combining these two steps, the vector $[1\ 0\ 0]^\text T$ given in the camera coordinate system is expressed in the turret coordinate system as $\boldsymbol R_y(\theta)[1\ 0\ 0]^\text T+[0\ 0\ t]^\text T$ .

Considering the full form of the problem, assume that the camera and turret have a complete Euler angle transformation and a full translation. The Euler angle relationship between the camera and turret is typically described from the turret’s perspective, following the yaw-pitch-roll rotation order to describe the rotation from the turret to the camera. However, when we need to go from the camera back to the turret, we need to reverse the order, i.e., roll-pitch-yaw. The corresponding rotation matrix is $\boldsymbol R=\boldsymbol R_z(\phi)\boldsymbol R_y(\theta)\boldsymbol R_x(\psi)$ . The translation vector is the vector from the turret coordinate system origin to the camera coordinate system origin, expressed in the turret coordinate system.

Through these two examples, we can further abstract this process. When solving for the transformation from coordinate system A to B, we typically know the pose of A in the B coordinate system. From the above analysis, we see that the rotation matrix and translation vector have a consistent form. Although we need to transform a coordinate from A to B, the transformation matrix is built from the rotation and translation that carry frame B onto frame A.

Putting this together, a point $\boldsymbol P_c$ in the camera coordinate system is transformed to the turret coordinate system by $\boldsymbol P_g=\boldsymbol R_z(\phi)\boldsymbol R_y(\theta)\boldsymbol R_x(\psi)\boldsymbol P_c+\boldsymbol t$ . The corresponding transformation is $\text{SE3}(\boldsymbol R_z(\phi)\boldsymbol R_y(\theta)\boldsymbol R_x(\psi),\boldsymbol t)$ .
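A sketch of applying $\text{SE3}(\boldsymbol R_z(\phi)\boldsymbol R_y(\theta)\boldsymbol R_x(\psi),\boldsymbol t)$ to a camera-frame point is shown below. The function name `camera_to_turret` and the mounting values (a pitch of -2 degrees and an offset of -0.05 m along z) are hypothetical numbers, not measurements from a real turret.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def camera_to_turret(p_c, yaw, pitch, roll, t):
    """P_g = R_z(yaw) R_y(pitch) R_x(roll) P_c + t, angles in radians."""
    R = matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))
    rotated = matvec(R, p_c)
    return [rotated[i] + t[i] for i in range(3)]

# Simple case from the text: pitch only, translation only along z
pitch = math.radians(-2.0)   # hypothetical mounting angle
t = [0.0, 0.0, -0.05]        # hypothetical offset, in metres
p_g = camera_to_turret([1.0, 0.0, 0.0], 0.0, pitch, 0.0, t)
print(p_g)
```

With yaw and roll set to zero this reduces to the simple case $\boldsymbol R_y(\theta)[1\ 0\ 0]^\text T+[0\ 0\ t]^\text T$ discussed earlier.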

Turret Coordinate System to World Coordinate System

The transformation from the turret coordinate system to the world coordinate system is obtained through the gyroscope. The gyroscope’s zero position is the origin of the world coordinate system. The gyroscope’s output is the transformation from the turret coordinate system to the world coordinate system.

Generally, we align the turret and world coordinate system origins, so no additional translation is needed. Therefore, we only need to convert the gyroscope data into a rotation matrix. The process is similar to the derivation from the camera coordinate system to the turret coordinate system, and readers are encouraged to think through it.

Ballistic Equation

The motion equation of the projectile is:

\frac{\mathrm d\boldsymbol{v_w}}{\mathrm dt} = \boldsymbol{a} = -\frac{B}{m}\boldsymbol{v_w} + \boldsymbol{g}

Here, $\boldsymbol{v_w}$ is the projectile velocity in the world coordinate system, $\boldsymbol{a}$ is the acceleration, $B$ is the projectile air resistance coefficient, $m$ is the projectile mass, and $\boldsymbol{g}$ is the gravitational acceleration.

The initial conditions of the projectile in the turret coordinate system are:

\boldsymbol{v_g}(0)= \begin{bmatrix} 0\\ v_0\\ 0 \end{bmatrix}

Here, $v_0$ is the initial velocity of the projectile.

Note that in the motion equation, we use the velocity in the world coordinate system. Therefore, we need to transform the velocity from the turret coordinate system to the world coordinate system.

\boldsymbol v_w = \boldsymbol T_{wg}\boldsymbol v_g

The flight path equation is:

\boldsymbol P_w(t) = \int_0^t \boldsymbol v_w(\tau)\mathrm d\tau

The target hitting point constraint is:

\boldsymbol P_w(t_\text{hit}) = \boldsymbol P_{\text{target}}

Combining the above equations, we can solve for the pitch angle $\theta$ and yaw angle $\phi$ that hit the target.
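The motion equation can be integrated numerically. The sketch below uses a plain explicit Euler scheme with the ratio $k=B/m$ as a single parameter, and compares the result with the closed-form velocity of this linear-drag model, $\boldsymbol v(t)=\boldsymbol g/k+(\boldsymbol v(0)-\boldsymbol g/k)e^{-kt}$. The step size, the value of `k`, and the initial velocity are assumptions chosen for illustration.

```python
import math

G = (0.0, 0.0, -9.81)  # gravity in the world frame, z pointing up

def simulate(v0, k, t_end=1.0, dt=1e-4, g=G):
    """Explicit Euler integration of dv/dt = -k v + g and dP/dt = v (k = B/m)."""
    p = [0.0, 0.0, 0.0]
    v = list(v0)
    for _ in range(int(round(t_end / dt))):
        a = [-k * v[i] + g[i] for i in range(3)]
        p = [p[i] + v[i] * dt for i in range(3)]
        v = [v[i] + a[i] * dt for i in range(3)]
    return p, v

def closed_form_velocity(v0, k, t, g=G):
    # Analytic solution of the linear-drag equation, used here only as a check
    decay = math.exp(-k * t)
    return [g[i] / k + (v0[i] - g[i] / k) * decay for i in range(3)]

v0_world = [0.0, 15.0, 3.0]   # hypothetical initial velocity in the world frame
k = 0.1                       # hypothetical B/m, in 1/s

p_end, v_end = simulate(v0_world, k)
v_exact = closed_form_velocity(v0_world, k, 1.0)
print(p_end, v_end, v_exact)
```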

Case Without Roll Angle

In the absence of a roll angle, we can directly control the yaw to align the projectile’s flight plane with the target point. Thus, the yaw value can be directly determined. We only need to solve for the pitch angle within a plane.

First, project the target point onto the xy-plane and calculate the azimuth angle of the target point, which is the yaw angle.

\phi = \arctan\left(\frac{P_{\text{target},y}}{P_{\text{target},x}}\right)
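In code, the two-argument arctangent is usually the safer way to evaluate this azimuth, since a plain $\arctan(y/x)$ cannot distinguish opposite quadrants and fails when $x=0$. A minimal sketch, with the function name `yaw_to_target` chosen for illustration:

```python
import math

def yaw_to_target(target):
    """Azimuth of the target projected onto the xy-plane, in radians."""
    x, y = target[0], target[1]
    return math.atan2(y, x)

# A target in the third quadrant: arctan(y / x) alone would give +45 degrees
yaw = yaw_to_target([-1.0, -1.0, 0.5])
print(math.degrees(yaw))
```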

Since air resistance is considered, we need to solve the above ballistic differential equation. We can use the Newton-Raphson method to obtain an approximate numerical solution.

\begin{aligned} \theta &= \theta - \frac{\Delta z}{-\frac{v\cdot t}{\cos^2\theta}+\frac{g\cdot t^2}{v^2}\cdot\frac{\sin\theta}{\cos^3\theta}}\\ t&=\frac{e^{B\cdot l}-1}{B\cdot v\cdot\cos\theta} \end{aligned}

Here, $\Delta z$ is the height difference between the target point and the projectile’s flight path, $v$ is the projectile velocity, $t$ is the flight time, $g$ is the gravitational acceleration, $l$ is the flight distance, and $B$ is the projectile air resistance coefficient.
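The iteration can be sketched as follows. This is a simplified model under stated assumptions, not necessarily the exact update used above: the flight time comes from the expression for $t$ above, the height reached at distance $l$ is taken to be $v\sin\theta\, t-\frac12 gt^2$ (drag is ignored in the vertical direction), and the derivative needed by Newton-Raphson is approximated by a central difference. The function names and numerical values are illustrative.

```python
import math

def flight_time(theta, l, v, B):
    # t = (exp(B l) - 1) / (B v cos(theta)), as in the expression above
    return (math.exp(B * l) - 1.0) / (B * v * math.cos(theta))

def height_at(theta, l, v, B, g=9.81):
    # Assumed vertical model: gravity only, no vertical drag
    t = flight_time(theta, l, v, B)
    return v * math.sin(theta) * t - 0.5 * g * t * t

def solve_pitch(l, z, v, B, iters=20, tol=1e-9, h=1e-6):
    """Newton-Raphson on f(theta) = height_at(theta) - z."""
    theta = math.atan2(z, l)  # start by aiming straight at the target
    for _ in range(iters):
        f = height_at(theta, l, v, B) - z
        if abs(f) < tol:
            break
        # Central-difference approximation of df/dtheta
        df = (height_at(theta + h, l, v, B) - height_at(theta - h, l, v, B)) / (2 * h)
        theta -= f / df
    return theta

# Hypothetical target: 5 m away horizontally, 0.5 m above the muzzle
pitch = solve_pitch(l=5.0, z=0.5, v=15.0, B=0.05)
print(math.degrees(pitch), height_at(pitch, 5.0, 15.0, 0.05))
```

Starting from the line-of-sight angle keeps the iteration on the flatter of the two possible trajectories in this example.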

MicDZ's Blog