kl800.com省心范文网

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

1

Linear recovery of articulated pose change: comparing pre- and post-imposed constraints

T E de Campos, B J Tordoff and D W Murray Department of Engineering Science, University of Oxford Parks Road, Oxford OX1 3PJ, UK

Abstract This paper contrasts two methods of imposing constraints during the tracking of articulated objects. The ?rst method develops constraints using a conventional kinematical approach, but in a novel extension of Harris’ RAPiD rigid body tracker. This generates a linear system for pose update in terms of the minimal set of variables describing the six degrees of freedom of a base subpart and the joint parameters between the remaining subparts. Experimental demonstrations of this system are given in a video-rate implementation using imagery from multiple cameras. The second method is that of Drummond and Cipolla, which tracks the subparts of an articulated object individually, and hence uses the maximal set of variables, but then imposes the motion constraints using Lagrange multipliers. The paper shows that these methods, despite their very different formulations, are functionally equivalent in terms of the pose results recovered. Further comparisons between the methods are drawn in terms of computational speed and algorithmic simplicity and robustness, and it is the last area which is the most telling. The comparative results suggest that using built-in constraints is well-suited to tracking individual articulated objects, whereas applying constraints afterwards is most suited to problems involving contact and breakage between articulated (or rigid) objects, where the ability quickly to test tracking performance with constraints turned on or off is desirable. Index Terms real-time visual tracking, articulated objects, motion constraints

Oxford University Engineering Library Technical Report OUEL 2279/05 June 2005

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

2

C ONTENTS I II Introduction Scene and projected image motion II-A Scene and image motions in RAPiD . . . . . . . . . . . . . . . . . . . . . II-B Scene and image motions in DCT . . . . . . . . . . . . . . . . . . . . . . II-C Measurements of edge-normal motion . . . . . . . . . . . . . . . . . . . . An articulated RAPiD tracker: III-A Method . . . . . . . . . III-B Implementation . . . . III-C Examples . . . . . . . . 3 6 6 7 9

III

ART 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 15 16 18 20 20 21 21 24 24 24 27 29 30 34

IV

Enforcing constraints after measurement IV-A The cost of (mis-)?tting single object data . IV-B Developing and imposing constraints . . . . IV-C Handling general articulated graphs in DCT IV-D Speci?c Cases of Graphs . . . . . . . . . . IV-D.1 Non-branching kinematic chains IV-D.2 Kinematic trees . . . . . . . . . IV-D.3 Closed loop kinematic chains . . Experimental comparison of ART and DCT V-A Similarity of results . . . . . . . . . . V-B Computational Cost . . . . . . . . . . V-C Algorithmic robustness . . . . . . . . V-D Data robustness . . . . . . . . . . . . Discussion and Conclusions . . . . . . . . . . . .

V

VI

References

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

3

I. I NTRODUCTION The ability to track multiple and articulated modelled objects is an important one, not least in the areas of autonomous and teleoperated robotics, visual surveillance and human motion analysis. It is an area which is still proving challenging more than twenty years after Hogg [1] demonstrated visual tracking of a walking person, modelled using 3D cylinders a la Marr ` and Nishihara [2]. The taxing issues remain those of how to represent the objects and their composited pose, how to associate observable image data with the correct part of the object, by what computational means economically to adjust the high dimensional state vector to improve the ?t to current observations, and, lastly, how to overcome fundamental ambiguities in the observations. A variety of ways of addressing these issues have been proposed, a variety perhaps most apparent in recent work on understanding motion of the whole human body. To represent the object most use standard or generalized cylinders [1], [3], [4], but simpler models such as planar surfaces [5] and more sophisticated quadrics-based [6], [7], [8] and deformable [9] models have been explored. The most widely used feature in matching between image and model is the edge, eg [6], [10], [11], but internal corner features have been used, for example in [12]. Some use has been made of spatio-temporal grey level changes at boundaries alone [13], but increasing use is made of the image motion at internal pixels [14], [3], [4]. Motion at internal pixels is combined with edge motion in [15]. The key distinction in ?tting pose to the image data is between works that adopt statistical techniques based on simple unimodal pdf’s and solve deterministically, eg [3], [11], and those that represent arbitrary multimodal pdf’s using mixture models or particle ?lters [16], [10], [4]. Overcoming ambiguities, particularly troublesome from single views of course, is explored in [17] and [18]. The work reported in this paper is motivated by a desire to understand better the interaction of the human hand with objects at the transition between articulated, independent motion and constrained, possibly rigid, combined motion. An issue of especial concern is how to represent articulated pose to detect a transition from articulated to rigid motion and vice versa. Perhaps

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

4

3D translation + rotation (6DOFs) 5 constraints

3D translation + rotation (6DOFs)

Fig. 1

3D translation + rotation (6DOFs)

1 relative angle

T WO REPRESENTATIONS TO TRACK ARTICULATED OBJECTS ILLUSTRATED WITH AN OBJECT WITH TWO LINKS AND

REVOLUTE JOINT: POST- AND PRE - IMPOSITION OF CONSTRAINTS .

A

each link or subpart should be tracked independently, thereby introducing redundant degrees of freedom, and constraints imposed later in a lower dimensional subspace. Alternatively the available kinematic constraints might be imposed up-front within the tracking process. In support of the latter approach, Rehg et al [18] note that the kinematics de?ne the state of the scene and de?ne the mapping from scene to image. Figure 1 illustrates the two approaches. Exemplars of both classes of method have been reported in a number of application areas. In the former class, and applied to tracking a number of mechanisms, is the work of Drummond and Cipolla [19], [11]. They track subparts independently and then apply constraints using Lagrange multipliers. In the area of hand tracking is the work of Wu et al [20] who impose constraints on independent subparts via a Markov network. Hel-Or and Werman [21] fuse constraints and measurements using an extended Kalman ?lter (EKF) by treating constraints as measurements with zero uncertainty. The kinematic chain approach is applied to tracking a robotic arm by Nickels and Hutchinson [22], who use an EKF to recover a state vector of arm joint angles and velocities from point measurements; however they simplify matters by assuming a ?xed and known base pose. In hand tracking, Rehg and Kanade [23] put joint angles and pose into a nonlinear minimization. In full body tracking, Bregler and Malik [3] develop a linear relationship between instantaneous motion and pose change; and Sidenbladh et al [4] use the kinematics in a generative model of image appearance. In each case, however, the surrounding observation and computation methodologies are suf?ciently different to make immediate comparison dif?cult, and no detailed comparison is available in the literature.

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

5

Two works in the different classes that appear most similar in other respects are those of Bregler and Malik [3] and Drummond and Cipolla [11]. Both develop linear expressions for the pose updates of articulated objects using exponential representations of motion and Lie algebra. The differences — in addition of course to the way that articulation constraints are imposed — are the use of different image measures, viz. warped image patches and edges, respectively, and the use of different image projections, viz. scaled orthography and full perspective. This paper attempts to make a fair comparison of the two constraint approaches. First, edge data and perspective projection are used for both. Second, both methods are re-implemented using Harris’ earlier RAPiD tracker as a common base. In RAPiD [24], [25], Harris proposed an object pose update which is linear in the elements of the kinematic screw. Here it is shown that RAPiD is entirely equivalent to Drummond and Cipolla’s rigid body tracker based on Lie algebra [26], which forms the basis of their articulated tracker [11]. The screw is synonymous with the exponential twist, and so for articulated tracking with kinematic constraints we do not need slavishly to re-implement Bregler and Malik, but instead provide an extension of RAPiD to articulated objects. Our implementations therefore share much of their code. Obtaining a linear pose update was not the only contribution of Harris’s tracker. A number of early model-based trackers (eg [27], [28], [29]) recovered explicit image features, such as straight lines, throughout each image, and then adjusted the pose of the model so that the distance between projected and observed features was minimized. Harris [24], [25] made videorate tracking possible on meagre general purpose hardware by being far more parsimonious with the use of the image data, searching perpendicularly from just a few control points on projected lines to locate nearby image edges and then adjusting the pose to minimize the summed squareddistances, a method to become the norm in active contours. Although processor speeds have since increase a hundredfold, economy still remains an issue as the the number of control points to handle complex and multiple objects has increased similarly. Section II reviews the ways in which scene and image motion are described in RAPiD and Drummond and Cipolla’s tracker (here referred to as DCT), and shows that for rigid objects they

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

6

are entirely equivalent. Section III develops the articulated version of the RAPiD tracker (ART) which imposes kinematic constraints between subparts, and gives some illustrative tracking results. Section IV reviews how constraints are imposed in DCT and, for comparison with ART, gives detail of the solution for both kinematic chains and branching trees. Section V gives a comparison of the two approaches in terms of accuracy, ef?ciency, and robustness. Finally, Section VI of the paper draws conclusions on the applicability of the pre-constrained and postconstrained methods in the light of the earlier ?ndings. The main contributions described here have been reported in [30]. II. SCENE

AND PROJECTED IMAGE MOTION

RAPiD was originally formulated using inhomogeneous coordinates, whereas DCT used homogeneous coordinates. To provide continuity with previous work, this paper must move between both, but will do so mostly without comment. In both RAPiD and DCT, a single rigid object (or rigid subpart of an articulated object) is described in a object frame 0 by the coordinates of each of a set of control points — points

which may be genuine points on the object, but which more usually are parametrized locations on ?xed crease or albedo edges, or are generated on the ?y as extremal edges of a curved object, as sketched in Fig. 2. To allow multiple and possibly moving cameras to be treated equally, the object’s pose will be one or other representation of the rotation and translation ?? ??? that ? take points in frame 0 into points in a ?xed world frame C is de?ned similarly by ?? ?? calibrations are known. A. Scene and image motions in RAPiD In RAPiD, the aim is to recover the kinematic screw × containing the instantaneous angular and translational velocities and ? . The velocities are speci?ed in the “aligned” frame , one

? . The position of each camera

. It is assumed that both external and internal camera

whose origin coincides with that of the object but which is aligned with the world frame, such

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

7

y

C

k

x z Optic Axis W 01

Fig. 2

0

2

A2

A1

E ACH OBJECT IS MODELLED WITHIN ITS OWN

COORDINATE FRAME 0 AS A SET OF CONTROL POINTS LYING ON EDGES ,

WHICH MAY BE CREASE , ALBEDO OR EXTREMAL EDGES .

, ? AND

?

ARE THE

- TH CAMERA , THE WORLD AND ?- TH

OBJECT ALIGNED FRAMES , RESPECTIVELY.

that

?? ?

?

. The object’s instantaneous velocity in the world frame is then

?

? ?

?

?

? ????

?

?

(1)

?×

The antisymmetric matrix

?? is such that ??

is the vector product

?

. The velocity in

the world frame is transformed into a camera frame as

?? ? ×

(2)

The projection into a normalized image (i.e., one corrected by the intrinsic calibration to have focal length and aspect ratio unity, and origin at the optic centre) is ? projected motion is , and so the

?

?? ??

?? ? ? ? ? ? ? ? ????? ?? ?×

?

(3)

where ?? is the

? ? ? identity matrix and ? ???? is an outer product.

?

of coef?cients of the generators of SE(3)

B. Scene and image motions in DCT In DCT the aim is to recover the 6-vector

describing the change of homogeneous transformation between object and camera. To draw

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

8

proper comparison with RAPiD, we specify this change in the aligned frame, and so it is the

?

transformation from object to world frames that is updated after movement, from ?? ??

to ?? ??? , where ?

?? ?? ? ?. To conform with the conventional screw order, we switch

the ?rst and last three generators of the Lie group from [11]; in turn they are associated with angular velocities about the ?-, ? - and ? - axes, and with translational velocities in the ?-, ? - and

? -directions

?

?

?? ? ? ? ? ?? ? ? ? ? ? ?? ? ?

?

?

?

?

? ??? ? ??? ?? ? ? ? ? ???

?

?

?

?

? ?? ? ? ? ? ?? ? ? ? ? ? ? ??

?

?

? ? ? ?

coordinates can be written

?

? ? ? ?

? ? ? ?

? ? ? ?

?

? ? ? ?

? ? ? ?

? ? ? ?

? ? ? ?

?

As the change in pose is small, ? is approximated by ?

?·

?

? ? ? ?

? ? ? ?

? ? ? ?

? ? ? ?

?

(4)

? , and the velocity in world

? ?? ? ? ??? ? ? ?? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ? ? ? ?? ?

? ? ? ? ? ? ? ? ?

?

? ???

?

?

?

?? ? ?

?

?

?

? ?

?

(5)

Comparison of Eq (5) with Eq (1) shows that recovering ? is identical with recovering the screw

×. The equivalence is maintained whichever frame is used to specify motion.

DCT similarly recovers

?

from image motion, and for completeness it is shown that the

measurable image motion derived from the homogeneous expressions in [11] is identical with Eq. (3). In normalized image coordinates [11] has

?

? ? ?

?

?

?? ?

?

? ? ?

?

??

?

(6)

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

9

from which the inhomogeneous image motion is derived as

?

?

?

?

? ? ? ?? ?? ???

?

?

? ?

?

?? ? ? ????? ?? ? ? ????? ??

? ? ?

?

?

(7)

Inserting

?

from Eq. (2) into Eq. (7) again yields Eq. (3).

Thus, for rigid objects, the two methods are functionally equivalent. (Note however that in [11]

? is recovered for each object in the

object’s frame. The screw recovered,

within a linear transformation of that in the aligned frame, C. Measurements of edge-normal motion

?

?

? ?

?

? , is then

?

, as clari?ed later.)

Full motion vectors ? could be used to recover the screw ×, or ?, were three or more point to point matches available. But in both methods it is more usual to match control points to image lines, and the resulting aperture problem requires measurement of the projection of ? onto a direction normal to (or near-normal to) the edge on which the control point lies (Fig. 3). The is

measurement equation for each control point

??

?

?? ? ?

????? ?? ? × ?? ×

×

?

(8)

and both methods require stacking six or more measurement rows into (9)

before solving using some variant of least squares. III. AN

ARTICULATED

RAP I D

TRACKER:

ART

Having shown that the kinematic screw approach of Harris gives identical relationships between scene and image as the Lie algebra approaches of Drummond and Cipolla and Bregler and Malik, we extend the RAPiD approach to include kinematic constraints. We shall dub this tracker ART.

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

10

Predicted Control Point

x d

Predicted Edge

^

Located edge

^ n

x

Image Edge

Fig. 3 I N BOTH RAP I D AND DCT THE SEARCH FROM THE PREDICTED CONTROL POINT ? IS ALONG A CONVENIENT DIRECTION

CLOSE TO THE EDGE NORMAL .

F OR FAST IMAGE SEARCH ,

MAY BE TAKEN AS ONE OF THE EIGHT CARDINAL DIRECTIONS.

P z1 x1 z0 x0

Fig. 4 A SIMPLE ARTICULATED OBJECT TO ILLUSTRATE THE BASICS OF ART.

θ1

y1 l0 y 0

A. Method Consider two subparts labelled 0 and 1, connected by a pure revolute joint with joint angle

?

located at

?

in subpart 0’s frame, as shown in Fig 4. A point P at

?

referred to the local

frame attached to subpart 1 of the articulated mechanism is at

?

?? ?

?

??

?

?

?

?

?

?

?

(10)

in part 0’s frame. As point P is stationary in frame 1, differentiation with respect to time gives

? ?

?? ?

?

?

?

?

?

?

?? ?

?

(11)

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

11

where ?? is the element-by-element derivative of ? with respect to prismatic joints

?

. Similarly for purely

?? ?

?

??

?

· ?

?

?

??

?? ?

?

?? ??

?

(12)

where ?? is a unit vector and parameter

is now a length, not an angle.

This is straightforwardly extended to a point on subpart

?

? of a mechanism

? ? ??

?

?? ? ?

?

?

?

?? ? ? ?? ? ? ? ? ? · ? · ? ·

? ? ? ? ? ? ? ? ?

(13)

?

(14)

where ? ? ?

?? ?? ?? ?? ; ? ? ? ? ? ?

?

?? ?? ?? ?? ; and so on. For a mechanism with ? ? ?

this expression can be written as a linear sum over all

? ? ?

? joint velocities

?

?? · ?? parts, ? ??

?

? ?

??? ?? ?

??

(15)

where

?

?

? . As

?

is a direction vector, its fourth component is always zero. Below we as a

use it as a 3-vector, and write

?? ? ? ? matrix

?

??

?? ??

?

?? .

If the pose of base part (0) is given by ?? ??? ?

referred to the world frame, then the

instantaneous velocity in the world frame is (returning to non-homogeneous coordinates)

?

?? ? ?? ?

?

?

· ? ?? · ? ·? ?? · ?

? ?

(16)

So, as in Eq. (1), the velocity in the world frame is

?

?×

(17)

but now ? has

? extra columns at the right

?

?

? ??

?

???? ?? ??? ?

?

?

(18)

and × is augmented with the joint velocities

×

?

(19)

The construction of the pose update equation from the measurements then follows exactly as given in Eqs. (8) and (9) above.

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

12

Articulated RAPiD tracker for kinematic trees 1: At the base part set cumulative transformation ?? ? . ? 2: for each subpart in a depth-?rst expansion do 3: Set to be ’s parent 4: Compute and store cumulative ?? ?? ?? ? 5: Compute and store ? ? ? ?? ? 6: for each joint ? ? ? back to ?? ? do ? ? 7: Compute and store ? ? ? ? 8: end for 9: for each camera do 10: for each visible control points on subpart do 11: Construct , ? and thence 12: Search for image edge, compute . 13: Append row to matrix , and to vector 14: end for 15: end for 16: end for (Eq. 9). 17: Derive × from × 18: Update pose and joint angles

?

Fig. 5 O NE

ITERATION OF THE POSE UPDATE ALGORITHM IN

ART.

B. Implementation The articulated RAPiD tracker has been implemented as a video-rate (30Hz) process for multiple articulated objects viewed by one or more cameras. The algorithm is summarized in Fig. 5, and certain of its steps ?eshed out below. The articulated objects are restricted to acyclic mechanisms, and so the coupling of subparts with joints can be represented as a tree. Closed loop kinematic chains require further care and a solution is discussed in Section IV-D.3. To gather the information to complete each pose update, the tree is explored depth ?rst. Denoting the root and current subparts as nodes

? and

, respectively, and the parent of the current node as ?, the cumulative transformation (Eq. 13) at the current node is found as ??

?? ?? and stored at the node. To populate Eq. (14), the ?

? earlier than the current node’s

dependency on the joint angle (or length) between parent and current nodes is determined as

?

?? ?? , and the dependencies on joint angles (or lengths) ?

?

parent found as

?

? ? ? ? . The

?

matrices are again stored at the node.

The visibility of control points in each camera is determined by ray-tracing, and care is taken

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

13

Frame 1

T RACKING A BOOK

USING AN

Frame 264

Fig. 6 8

DEGREES OF FREEDOM MODEL IN

Frame 686

ART. T HE

IMAGES ( TOP ) ARE ALL FROM THE SAME

CAMERA OF THE THREE .

T HE

LIGHT CROSSES ARE THE PREDICTED CONTROL POINT POSITIONS, AND THE DARKER ARE

THE CORRESPONDING MEASURED IMAGE POSITIONS.

T HE GRAPHICAL OUTPUT SHOWS THE FITTED POSE

IN THE WORLD

COORDINATE FRAME , GENERATED FROM A VIEWPOINT OPPOSITE TO THAT OF THE CAMERA OF THE TOP ROW.

to exclude points which are close to being occluded. A few tens to few hundreds of control points will contribute rows to the measurement system. When using multiple cameras, the measurement equations are gathered into a single system to solve for the pose, as in [3], [31]. C. Examples To demonstrate ART, output from three video rate (30 Hz) experiments on increasingly complex objects is shown in Figures 6-8. The imagery was captured from three calibrated cameras viewing a

? ??

m? working area on a desk from near orthogonal directions.

Figure 6 shows three frames cut from a sequence where a book modelled with three planes (front, back and spine), two hinges, and eight degrees of freedom (dof) is tracked. The white crosses show the projection of the model’s control points in the image, and the darker crosses show the located edges. Here, two iterations of the linear update were used per frame. Tracking continues in the presence of potential distraction caused by the presence of occlusions, texture in the object, and the proximity of model and image edges.

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

14

Frame 3

T RACKING FOUR

OBJECTS AS A

Frame 2518

Fig. 7

Frame 4106

14- DOF ARTICULATED , OR “ MOTION - CONSTRAINED ”, OBJECT. A LSO SHOWN ARE

GRAPHICAL VIEWS GENERATED FROM THE CORRESPONDING CONTEMPORARY OBJECT POSES .

Frame 4

T RACKING A

HAND GRASPING A BOX.

Frame 301

Fig. 8

Frame 463

T HE GRAPHICAL IMAGES SHOW THE VIEW FROM ABOVE.

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

15

Although articulation has been described above in terms of single objects with subparts, the method can be more generally used to apply motion constraints between objects. For example, to track a ball while it rolls on a moving table, one can model the table as the base part, and the ball as connected to it using two 1-dof prismatic joints. Fig. 7 shows a more complicated example. Four entities, a plane, a mug, a ball, and two articulated cylinders, are tracked as one articulated 14 dof object, comprising six dof for the planar object, two each for ball and mug with respect to the plane, and four for jointed cylinders. Fig. 8 shows tracking of a box and hand, the latter with a single articulation for the ?ngers. The hand and the box are modelled as a single seven dof kinematic tree (three for the hand, plus one for the ?ngers and three for the box with respect to the hand), constrained to the table plane. When the hand grasps the box, the degrees of freedom between hand and box are switched off and the whole set is tracked as a rigid object, reducing the dimensionality of the problem. In all the above, mismatching is reduced using robust methods. Least median of squares is used to remove non-collinear outliers amongst measured control points which should belong to the same line [31], and guided MLESAC [32] is used to select control points in the robust generation of the solution to Eq. (9). Robust methods are equally applicable to both methods of applying constraints. Later we compare the cost implications. IV. E NFORCING

CONSTRAINTS AFTER MEASUREMENT

We now review Drummond and Cipolla’s method of applying motion constraints to articulated objects after making measurements on independent subparts. Their approach is to adjust the optimum screws (now denoted by

?) obtained for each individual subpart so that the constraints

are satis?ed and the overall ?tting cost is minimized. The solution is reached using Lagrange multipliers. A. The cost of (mis-)?tting single object data For each individual subpart, the measurement system

?

in [11] is similar to that

developed in Eq. (9) but the pose adjustment takes place in the object frame, not the aligned

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

16

frame. The measurements generated now using

?

are identical with those derived earlier, but the rows

of

are

?? ??? ?

?

?

?

?

?

(20)

Without constraints, the optimal least squares solution for each subpart would be used to update that subpart’s pose. However, when the constraints are applied, each value of

? is modi?ed to

?, and it is necessary to know the additional cost of ?tting. For any individually suboptimal solution ? , the sum-squared ?tting cost is ?? ? ? ? ? ? ? ? ? whose minimum is ?? ? ? ? ? ? ? ? ?. A little manipulation gives the well-known quadratic form for the extra

cost

?? ? ??

B. Developing and imposing constraints

?? ? ??

??? ? ??

(21)

Consider ?rst a simple example of two parts with translational velocities given by vectors of coef?cients

?? and ??

de?ned in their local Cartesian frames

? and ? . (That is, the ?rst’s

velocity is ??? ??

· ??? ?? · ??? ??, and similarly for the second.) The constraint that the two ? ??? ? ??? ? ? ?

? ?

? -velocities are equal can be written as

?

(22)

where the superscript

? denotes a value referred to the ?-th frame, and absence of superscript

denotes a value referred the native object frame given by the subscript. Drummond and Cipolla observed that exactly the same can be done using the -matrix basis, again provided the ? values refer to the same frame. So, to impose constraints between two subparts

? and ? , they wrote

(23)

??? ? ??? ?

? ?

?

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

17

where, to constrain the associated quantity, each
? ? , the set

? ? ? ? ? ? ?

?? ? is a 6-vector drawn from

?? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??

As frames

? and ? are related by

? ? ? ? ? ?

? ? ? ? ? ?

?

? ? ? ? ? ?

?? ? , the screw ?

? ? ? ? ? ?

?? is given by ?

(25)

? ? ? ? ? ?

(24)

?? ?

?????? ?

The adjoint transformation, abbreviated below to ??? clear: it is that the vectors
become easy to specify.

????, is derived in the Appendix. The ?

reason for considering the pose update in each subpart’s own frame in this method now becomes

Drummond and Cipolla’s method seeks the optimum solution for the articulated object as that which minimizes the additional cost of sub-optimally ?tting the individual subparts, subject to all relevant constraints being satis?ed. Writing minimizing the sum over all parts , they consider the problem as one

?? ??

? ? ? ???? ? ??? ???? ? ??? · ??? ? ??? ???? ? ???? ? ? ? ? ?

? ? ? ?? ? ?

? ? ? ?

? ?

(26)

subject to the constraint set

?

(27)

The relationship between ? and ? is detailed in the Appendix. However, we note that the ? second term of the minimization is actually invariant to changes in frame, and the problem is more ef?ciently written in terms of the becomes

?’s in their native frames, so that the second cost term

(28)

??? ? ?? ? ? ??? ? ?? ?

and the constraints are modi?ed to

?? ? ????? ?

? ? ? ?

? ?

?

(29)

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

18

p? p q a a+ b b+

Fig. 9 A GENERAL BRANCHING MODEL .

Depending on the physical constraints, each set contains

?

?? ?

individual constraint

equations, where the lower and upper equalities indicate, respectively, complete freedom and complete rigidity between the two subparts. C. Handling general articulated graphs in DCT In [11], Drummond and Cipolla describe the solution for two subparts connected by a hinge, and in [8] they sketch a solution for a chain of subparts. However, for comparison with the articulated RAPiD tracker, it is necessary properly to understand how DCT might handle branching in the kinematic chain. A solution that handles both non-branching chains and branching trees is developed here. Consider ?nding

All

? · ? subparts arranged in an articulated tree. The DCT problem becomes one of

??

??

?

??? ? ?? ? ? ??? ? ?? ?

(30)

subject to sets of motion constraint equations. Each subpart constraint set for each of its children

? with children generates one ?? ? ·

?·:

? ?·

?? ? ???· ??· ?

Now consider the subpart multiple lines of descendants ,

·

?

·

?

(31)

? as shown in Fig. 9, with ancestors ?, ?? , etc, and with possible

, etc, , etc. The constraint sets referencing

??

involve

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

19

the parent of

? and each of its children, if they exist. That is,

If parent If 1st child If 2nd child

? exists

exists exists

?? ? ????? ? ? ?? ? ? ? ? ? ? ?? ? ? ? ?

?

?

? ?
?
?

? ? ?

?? is therefore

?

(32) (33) (34)

and so on if

? has further children.

?? ? ?

?

That part of the Lagrange system depending of differentiation w.r.t.

? ? ??? ? ?? ? ?

? ? ?? ?

?

?? · ?

?? ?

?

?

?
? ? ?

·

?? ?

?

?
? ? ?

·

?

(35)

where the ?rst term derives from the cost, and the remainder from the constraints, and where the ’s are the Lagrange multipliers. The ?rst summation is omitted if ? has no parent, the second if

? has no children, and further terms of the sort shown in brackets are added if there are further

children. Rearranging,

?? ?? ·

?? ? ?

?

? ??

? ? ?? ?? ?

?

?? ? ?

?? ?

?

? ?

? ?

?

??
? ? ? ?

?? ?

?

? ?

? ?

?

??
?

? ?

?

?’s

in the

(36)

Similar expressions can be written for the other constraint set (Eq. 32) gives

??? ? ?

?

?’s, and

replacing all the

?? ??

?
?? ? ? ? ?? ?

?

?? ? ??
? ? ? ? ?? ?

?? ?

?

??

? ?? ?
?? ?? ?

?
? ??

· ??? ? ??? ?

?

·

?
? ??

???

??
? · ? ?

?

?? ?

? ?
? ? ?

???

??
?

? ?

·

(37)

?

? ?
? ? ? ?

? ? ? ??

?

?

?? ?

Replacing all those in the

?? ? ?

?

?? ? constraint set gives

?? ? ?
? ? ? ? ??

? ?? ?

?

?
?? ?

·

? · ?

?

?
? ? ?

? ?? ?

?? ?

?

·?? ? ??

?

? ?
? ?

·
?

?

??

??
? · ·

?
? ?

?

??
?

?

·

(38)

?
?

? ? ? ?? ?

?

??

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

20

If there are multiple children,

and so on, expressions similar to Eq.(38) can be written for the , and so on.

?? ? constraint set by swapping °

These expressions are more compactly expressed as

?? ? ?? ? · ?? ? ? ? · ?? ? ? ??

??

· ??

?

· ??

·?? ? ? · ·??

?

? ?

· ?
? ? · ?
?

°

,

· ?
·

?

? ?

?? ? ??

(39) (40)

Again, the bracketed terms are used for additional children, and a further equation of the form of Eq. (40) generated for each additional child, with The -th row and

°
, etc.

(41) (42)

? ?
? ?

?-th column of the various quantities are

?? ? ?

?? ? ? ?? ? ? ?? ? ? ??

?

? ?? ?? ?? ??

?
? ? ?????? ? ??? ?
? ? ? ???
? ? ? ?? ? ?
? ? ? · ??? ? ??? ? ?
? ? ??? ?
? ? ?

? ? ? ?

(43) (44) (45)

?

??
?

? ?

D. Speci?c Cases of Graphs Now consider speci?c cases through examples shown in Fig. 10. Below we detail the implications of each of these types of topographies in the solution with DCT. General graphs can be modelled by combining elements of each of these cases. 1) Non-branching kinematic chains: When the subparts are arranged as a linear chain, labelled here from

?

? to ? , as illustrated in Figure 10(a), the system to be solved for the

?? ? ?

’s becomes

?

block tridiagonal,

??? ?? ? ? ? ??? ?? ?? ? ??? ??? ??? ? ? ??? ?? ? ??? ??? ??? ? ??? ?? (46) . . . . . . . . . . . . . . . . . . . . . . . . ? ? ? ?? ?? ? ?? ?? ?? ? ?? ?? ?? ? ?? ? ?? ? ?? ? ?? ? ?? ? ?? ?? ? ? ? ? ?? ?? ? ?? ?? ? ? ?? ? for which a standard method, ? ?? ? in the number of links, exists to recover the values without

explicitly building or inverting the matrix (e.g. [33]). Once the ? ?·? are known, the constrained

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

21

0 1 2 3 N?2 N?1 N

0 1 2 3 4

(b)

Fig. 10

0 1 5 6 7 2 3 4

(c)

5 6 7

(a)

E XAMPLES OF

SPECIAL CASES OF KINEMATIC CHAINS: ( A ) A SINGLE KINEMATIC CHAIN; ( B ) A BRANCHING KINEMATIC TREE ; AND ( C ) A CLOSED LOOP KINEMATIC CHAIN.

screws

?? are derived for each subpart from Eq. (36). The algorithm for the pose update along

a chain is summarized in Fig. 11. 2) Kinematic trees: When branching occurs, the system loses its block tridiagonal form, and its structure depends of course on the object’s structure. By way of example, the structure in Fig. 10(b) generates the following system for solution.

? ?? ? ?

??? ??? ? ? ?? ? ?

? ? ?? ?? ??? ?? ? ?? ??? ??? ??? ? ?? ?? ?? ? ? ? ? ? ? ? ? ?

??? ? ??? ? ? ? ? ? ?? ?? ? ? ? ?

? ? ? ? ? ? ?

?? ?? ?? ?

?

??? ??? ??? ?? ?? ? ?

?

(47)

The structure is block symmetric, and is likely to remain sparse, but can no longer be solved in

???? time.

3) Closed loop kinematic chains: The general solution of Equations (39) and (40) can also be employed to build the constraint system for closed loop kinematic chains. In the example of Fig. 10(c) the constraints happens between

? ?

and

?

can be described as links to brother nodes. The same

and

for the other side of the branch. Therefore, the constraint system

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

22

D&C’s Tracker, with modi?cations for trees. 1: for each subpart ???? do 2: for each camera do 3: for each visible control point do 4: Construct ? and thence 5: Search for image edge, compute 6: Add row to measurement matrix ? 7: end for 8: end for 9: Derive ?? from ? ?? ? (Eq 9) 10: Compute ? ?? , from ? ? ? 11: Compute ???· ? from Eq. (61) (except leaves) ? 12: end for 13: for each constraint set do 14: Compute ?, ?, ?, ? and ?? from Eqs (39-40) 15: end for 16: Solve tridiagonal (Eq. 46) or block symmetric (e.g. Eq. 47) system for all 17: for each subpart do 18: Derive ?? from Equation (36) and update part poses. 19: end for

?

?

?

Fig. 11 O NE ITERATION OF THE POSE

UPDATE ALGORITHM IN

DCT

becomes:

?

??? ??? ? ? ? ?? ? ?

?? ?? ??? ??? ? ? ?? ? ? ?

? ? ?? ? ?? ??? ??? ?? ?? ?? ?? ? ? ? ? ? ?

? ? ??? ?? ?? ? ?? ??

??? ??? ? ? ? ?? ? ?

? ? ? ? ?? ?? ? ?

? ? ? ? ?? ? ? ?

??

?? ?? ?? ? ?

?

?

?

??? ??? ??? ?? ?? ?? ? ?

?

(48)

Again, the constraints system loses its block tridiagonal form and becames less sparse. The additional issue is that extra constraints added by loops introduce singularities into the constraints system, demanding rank monitoring to avoid noise imposing spurious rigidity. The iterative solution of [8] is an alternative that does not require rank monitoring. With ART, closed chains are represented following a standard solution for inverse kinematics. In [34], the pose change of the end effector (or of any chosen link in or after the loop) is represented following two paths from the base, i.e., using two Jacobian matrices and . Given

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

23

1 θ01

θ12

2 θ02 x02

0

Fig. 12 A N EXAMPLE OF CLOSED LOOP KINEMATIC CHAIN: THE PLANAR RRRP MECHANISM . C IRCLES REPRESENT REVOLUTE

JOINTS AND THE RECTANGLE REPRESENT A PRISMATIC JOINT.

L INK 0

IS THE BASIS AND THE POSITION OF THE PRISMATIC

JOINT IS THE END EFFECTOR .

the two sets of joint parameters

and

, the inverse kinematics solution is obtained by making

?? ?? ??

·

example,

?. In the example of Figure 12,

and

?

and

??? ?. For this simple

can be derived analytically (eg. [34]).

To obtain the same representation with ART, each control point lying on a link inside a loop adds two rows for the same measurement , each following a different path in the loop. For

the example of Fig. 10(c),

?

?? ? ?

? ?

?? ? ?? ? ?? ? ? ? ? ?? ? ?? ? ?

?? ? ? ?? ? ? ?? ? ? ?? ? ?

(49) (50) and the system can be solved

Therefore, each measurement of links 2–7 adds two rows

to

as before. The number of dofs of the system of Eq. (9) remains the same, so, unlike DCT, the computational complexity is not affected by the presence of loops. Note that it is necessary to use weighted least squares to avoid erroneously increasing the importance of nodes below a loop. For generic chains (specially long or non-planar mechanisms), it is troublesome to determine the set of singular poses analytically. Therefore rank monitoring is also necessary for ART. Note that the ?ow control of both ART and DCT algorithms given in Figures 5 and 11 need to be modi?ed in order to detect and deal with loop closing.

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

24

V. E XPERIMENTAL A. Similarity of results

COMPARISON OF

ART

AND

DCT

Despite their very different constraint formulations, both the articulated RAPiD tracker developed in Section III and the Drummond and Cipolla tracker reviewed in Section IV are single-shot linear methods, and, given that relationships between scene and image have been shown to be identical, one should expect them to give the same results. To verify this against known ground truth, a CAD model of two hinged subparts was generated, and both trackers deployed on the resulting imagery as the hinged opened between successive frames. In each experiment increasing amounts of zero-mean Gaussian noise was added to the image displacements measured between control points and image edges. Fig. 13(a) shows a typical view and match set. The lower trace in Fig. 13(b) shows the deviation in degrees from the veridical hinge angle recovered using ART. The upper trace shows the rising standard deviation. Fig. 13(c) shows the same when the individual subparts are tracked without imposition of constraints. Fig. 13(d) shows that as soon as the constraints are applied in DCT the modi?ed angle becomes all but identical with that recovered using kinematic constraints in (b). Indeed, the results from ART and DCT differed by at most parts in

?? , effectively at the limits of expected numerical accuracy.

B. Computational Cost Fig. 14 shows comparisons of the times for a single update cycle of the core operations of ART and DCT, run on a 1.8 GHz Pentium IV. Times were accumulated over many trials, with each data point taking at least 20 s to collect, giving each datum the same fractional error of order

??? ±. Also, in both cases, similar care was taken to avoid unnecessary calculation. The

?

object is taken to have

? subparts with ? control points per subpart, and is made up of a single

articulated chain. Sub?gure (a) shows that with up to 10 subparts there is negligible difference between the methods, and that at 30 subparts the time taken by ART is about twice that by DCT. Up to 100 parts the cost in ART is still dominated by does become ?

?

?? ? as expected for the Cholesky decomposition of the ?? · ?

?????, but beyond (not shown) it

?

matrix

that

occurs in the least squares solution of ×

. (Recall × contains the six screw components and

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

ART

20

25

hinge angle error (deg)

15

10

5

0

?5 0

2

4

6

8

10

(a)

Unconstrained

20 20

noise σ (pixels) DCT

(b)

hinge angle error (deg)

10

hinge angle error (deg)

2 4 6 8 10

15

15

10

5

5

0

0

?5 0

?5 0

2

4

6

8

10

(c)

noise σ (pixels)

noise σ (pixels)

(d)

Fig. 13 C OMPARISON OF THE NUMERICAL PERFORMANCE OF THE A RTICULATED RAP I D TRACKER WITH THE DC TRACKER FOR

TWO SUBPARTS CONNECTED BY A SINGLE REVOLUTE JOINT AS INCREASING MEASURED DISPLACEMENTS FOR THE OBJECT SHOWN IN PART ( A ), WHICH IS TRUTH IS AVAILABLE .

G AUSSIAN NOISE IS ADDED TO THE

CAD- GENERATED , SO EXACT GROUND

THE

R ESULTS FROM : ( B ) THE ARTICULATED RAP I D T RACKER ; ( C ) UNCONSTRAINED SUBPART DC TRACKER . I N EACH GRAPH THE SOLID CURVE IS

TRACKING ; ( D ) AFTER ADDING THE CONSTRAINTS IN THE

DEVIATION FROM GROUND TRUTH, AND THE DASHED CURVE IS THE STANDARD DEVIATION.

? · ? joint parameters.) Fig. 14(b) shows that both methods scale predominantly linearly with

the number of control points

? per link. (Exploration of the apparently noisy results at high ?

suggests these are a reproducible and uninteresting quirk of memory management in the matrix library used.) Fig. 15(a) shows, for the same CPU, the computational time of operations that are shared by both trackers, including occlusion handling, projection of control points, image operations, and related operating system overhead. Three objects of increasing complexity were used: a single part box with 6 dof, an object with 4 parts and 9 dof (Fig. 15b), and a hand with forearm model with 17 parts and 28 dof (Fig. 15c). The tests used 320?240 images acquired from three cameras. The search range from control points was

?10 pixels. The number of potential control

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

2

26

10

ART DCT

10

0.2

ART DCT

Time per cycle (ms)

Time per cycle (ms)

10

1

10

0

10

0

10

?0.2

10

?1

10

?0.4

10

?2

10

0

?0.6

10

10 Number of Subparts

1

10

2

10

1

10 Number of Points per Link

2

(a)

Fig. 14 T IMES ( IN MS ) FOR

UPDATE CYCLES OF THE CORES OF

(b)

ART AND DCT COMPARED . T HE COMPARISONS ARE : ( A ) A S A

? WITH A FIXED NUMBER OF SIX CONTROL POINTS PER PART; AND ( B ) AS A FUNCTION OF THE NUMBER OF CONTROL POINTS ? PER SUBPART, WITH A FIXED NUMBER OF FIVE SUBPARTS. I N THIS ? ? REGION ART IS PREDOMINANTLY ? ?? ? ( BUT DOES EVENTUALLY BECOMES ORDER ? ?? ? ), WHILE DCT REMAINS ? ??? . F OR FIXED ? BOTH METHODS EXHIBIT AN ORDER ? DEPENDENCE .

FUNCTION OF NUMBER OF SUBPARTS

points was increased in steps but, because of occlusion, not all the control points are always visible, and so the average visible number for a sequence is used as the abscissa. The time increase is dominated by a linear dependence on the number of visible control points. For all but small problems, DCT is faster than ART and its linear behaviour with number of parts demands its use on large problems, say involving more than 100 subparts. However, for modest numbers of subparts, the timing differences are insigni?cant compared with the equally shared costs. As examples, the different cores of DCT and ART took 0.45 ms and 0.41 ms respectively, to run on the four-block object with 100 control points, but shared operations took some 6 ms; on the hand model with some 170 control points ART’s time was 2.5 ms, but the shared operations took over 13 ms. The core times given for DCT are so far those for simple chains where the tridiagonal system is solved. On a chain with the same number of parts as the hand and, again, 170 control points, DCT took some 1.2 ms, but solving the actual branched system shown in Fig. 16 took some 2.5 times longer, and hence taking a little longer than ART. Once any branching occurs, we ?nd that that general matrix solvers give a solution time dependent mostly on the number of subparts, not the degree of branching: for example, with 16 parts, two chains of eight takes the same time

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

27

10 Overhead time per cycle (ms)

2

hand 4 blocks box

10

1

(b)

10

0

10 10 Number of control points in the whole body

2

3

(a)

Fig. 15 ( A ) T IMES ( IN MS ) FOR

RELATED

(c)

THE SHARED OPERATIONS ( OCCLUSION TESTING , IMAGE SEARCH AND MEASUREMENT, AND

OS

OVERHEAD ) IN ONE CYCLE OF

ART AND DCT, AS A “4

FUNCTION OF THE AVERAGE NUMBER OF VISIBLE

CONTROL POINTS IN THE WHOLE BODY. ( B , C ) SHOW THE

BLOCKS ” AND “ HAND ” MODELS RELATED TO THE UPPER TWO

TRACES IN PART ( A ).

0

1

14 8 11 15 9 12 16

2 3 4

5 6

R R Q R P Q R S S P Q R P Q S P S Q R P Q R P Q P S S Q R P Q R P Q P S S S

R S

R S

λ 0 1

1 2 2 3 3 4 1 5 5 6 6 7 1 8 8 9 9 10 1 11 11 12 12 13 1 14 14 15 15 16

S

S

S

S

P S

S

S

7 10 13

S Q R P Q R P Q S Q R P Q R P Q

(a)

Fig. 16

(b)

SUBSCRIPTS ( B ).

T HE HAND MODEL ( A ), AND THE BLOCK STRUCTURES OF ITS CORRESPONDING MATRIX AND

as four chains of four. However, we have not investigated to what extent careful hand-tuning of the code to a speci?c object would change this. C. Algorithmic robustness In general, tracking unconstrained subparts is certainly more fragile than tracking subparts collectively in a constrained system. The more degrees of freedom, the greater the likelihood of ?tting to noise. Breakage might occur when, for example, the available control points do not

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

28

Frame 34

Frame 40

Frame 50

Unconstrained

Constrained

Fig. 17 A N EXAMPLE SHOWING ( TOP ) FAILURE WHEN TRACKING WHEN SUBPARTS COMPLETELY UNCONSTRAINED , CONTRASTED

WITH ( BOTTOM ) SUCCESSFUL TRACKING THE SAME PARTS ARE CONSTRAINED.

T HE FAILURE ARISES HERE BECAUSE OF (A S THE TEXT EXPLAINS , THIS IS

SCENE ROTATION GIVING RISE TO RAPID MOTION AT THE END OF THE CHAIN OF PARTS . NOT A COMPARISON BETWEEN DCT AND

ART.)

provide suf?cient constraint — a cylinder with points only along its extremal boundaries is an example (Fig. 2) — or when a subpart moves quickly at the end of a long linkage (Fig. 17). On ?rst consideration this seems to make DCT inherently less robust than ART, but this is not the case. Assume that, at the start of an update cycle, both trackers have placed their subparts in the same location. They will generate the same control points and therefore generate the same measurements. Suppose ?rst that there are enough measurements to determine the pose of each subpart, but that the measurements belonging to a particular subpart are quite erroneous. Using ART, the costs are immediately shared and the pose updates of all the subparts will be disturbed somewhat, whereas in DCT the initial pose updates (?) will all be better, except that for the erroneous one. However, when the constraints are imposed, the costs of (mis-)?tting with the

? values are all

correctly accounted for and minimized. Because the measurements are the same, the solutions

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

29

must be same. Suppose now that there are insuf?cient measurements to determine the pose update of a particular subpart. It is important not to discard those measurements that are available, but instead invent a pose update

? for that subpart against which the change in cost when using

some ? instead may be measured, a change which could easily now be a decrease rather than the usual increase. This tactic extends to zero measurements, where the cost change when moving the part is zero. However, what is lost in this case is the ability to separate the recovery of the Lagrange multipliers and the

? values. Insuf?cient measurements means that ?’s and

is rank

de?cient, and so cannot be inverted in Eq. (36). There appears no generalizable alternative to solving the linear system in its entirety, with a stacked vector of D. Data robustness To consider the likely cost of computing the solutions to ART and DCT in a robust manner using a random sampler such as RANSAC [35] or LMedS [36], we use the formal relationship ’s as the unknowns.

?

for the con?dence

? ? ?? ? ? ??

(51)

? that a valid minimal set of ? measurements will be selected after ? trials

. Although, for a range of problems, experiment indicates

when the fraction of valid data is

that this should be regarded as an underestimate of the number of trials comparative results will be less affected. For required are

? actually required, the

? subparts with ?? ? ?? 1 dof joints, the trials ?

?

?

??

?? ?? ? ? ? ?? ?? ? ??

·

?

?? ?? ? ? ? ?? ?? ? ?

?

(52)

Multiplying these expressions by the different time costs per iteration of each method, gives the pairs of curves in Fig. 18 for 10, 20 and 30 subparts, derived at

± con?dence, and each plotted as a function of the percentage of outlying or invalid data, ????? ? ?. Also shown is

the locus of the crossover point, as the number of parts is varied. For a modest number of parts and quite low values of corruption, ART’s requirements undercut DCT’s, but more remarkable is how rapidly a certain fraction of outliers becomes intolerable as the number of subparts rises.

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

30

250

Time for I iterations (ms)

200

150

100

50

0 0

5

10

15 20 % outlying data

25

30

35

Fig. 18 PAIRS OF CURVES SHOWING THE TIMES REQUIRED TO COMPLETE THE REQUIRED RANDOM SAMPLING TRIALS AS

PERCENTAGE OF INVALID , OUTLYING , DATA VARIES . THE

T HE PAIRS OF CURVES ARE

FOR

30 ( LEFT ), 20 ( CENTRE ), AND 10 DCT, RESPECTIVELY. A LSO

( RIGHT ) SUBPARTS , AND

IN EACH PAIR THE SOLID AND DOTTED LINES ARE FROM ART AND SHOWN IN THE LOCUS OF CURVE PAIR INTERSECTIONS.

VI. D ISCUSSION

AND

C ONCLUSIONS

This paper has described and contrasted two different methods of tracking articulated objects in a video sequence. The ?rst method is a straightforward, but novel, extension of Harris’ RAPiD tracker to handle articulation (dubbed ART) by explicitly including the kinematics in the linearized pose update equation. The second is the articulated tracker of Drummond and Cipolla (dubbed DCT) which imposes motion constraints on subparts after they have been tracked independently. To motivate the comparison, Harris’s original method for recovering the kinematic screw of a single rigid object was described, and shown to be entirely equivalent to Drummond and Cipolla’s later formalism based on Lie algebra. It was noted that Bregler and Malik had also described the theory of a kinematics-based tracker using Lie algebra, and it is concluded that ART is equivalent to their formulation, but one suited explicitly to perspective projection. Drummond and Cipolla’s method was reviewed, and a method for solving the constraints system of branching kinematic chains was given. Comparative experiments on both methods showed, ?rst, that the results for the pose updates

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

31

are identical, notwithstanding the constraints being applied in very different ways. Both are performing linear least squares in a single shot, and so their equivalence is expected. Second, it was shown empirically that the computational kernel of DCT retains plexity in the number of subparts ?, whereas ART rises from for, say,

??? before eventually succumbing to ??? ?. For large ?, DCT’s behaviour is characterized in the best case by ? inversions of a ?xed size ? matrix and, at least for a non-branching chain, a ? ??? recovery of Lagrange multipliers, whereas ART has to invert a ? · ?? ? ??? matrix. However, for tens of subparts the absolute difference in core cost is

?

? ?

??

???? for low numbers, to ???? ?

???? com-

substantially outweighed by the commonly shared costs of model projection, image search, and so on. Where DCT appears less satisfactory than ART is in the area of algorithmic robustness. One dif?culty occurs when, rather than a simple chain of parts, there is branching into a tree structure. The description of the ART algorithm shows it to be, de facto, a simple tree-based method. But in DCT, as noted earlier, at a branch a subpart has more than two sets of motion constraints to satisfy, and the matrix to be solved in Eq. (47) no longer has a tridiagonal structure. Instead it acquires a block-symmetric structure dependent on the model’s structure, but no general fast solution methods exists for the solution of such systems. Indeed, in [8], Drummond and Cipolla adopt an iterative method of solution. Again, when there is insuf?cient information to solve for the initial pose update of a particular part, the tridiagonal method cannot work. A more general curiosity in DCT is that the less complicated the object’s kinematics, the more computation has to be done to impose the constraints. This, and the issues raised in the previous paragraph, are outcomes of allowing the degrees of freedom to grow to their maximum and then having to prune them back when those freedoms are not required. While this extra effort is rather neatly tamed if the problem is a well-behaved chain of subparts, exceptions become hard to manage. An attempt has been made to characterize the time requirements of both methods to complete trials of a random sampler. Here it does seem that ART has an advantage over DCT, but for both

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

32

methods the time required for trials increases sharply with rising number of parts, suggesting that it is more prudent to improve the quality of measurement than to rely on random sampling to “clean up” afterwards. We suggest the most signi?cant aspect of Drummond and Cipolla’s method is that it makes comparatively easy the switching on and off of constraints, after, and separately from, the expensive process of making image measurements. It appears to provide exactly the mechanism required to account for the motion of objects which make contact and later break apart. As contact approaches between two parts, tracking performance with and without the relevant constraints can be tested, to decide whether the motion was constrained or was still independent. While the application of constraint hypothesis-and-test to rigid objects is straightforward, a more signi?cant challenge is to apply it to separate articulated objects. The approach favoured by the conclusions of this paper would be to track the joined articulated entities using ART, and employ DCT at the contact. ACKNOWLEDGEMENTS This work was supported by Grant GR/N03266 from the UK Engineering and Physical Science Research Council, and by Brazilian Government CAPES scholarship BEX-1550/00-4 to TEdC.

A PPENDIX Consider two frames and where points are related by the homogeneous transformation

?

?

?

?

?

(53)

To derive the effect of changing frame on the screw vector, ?, consider writing the scene velocity in frame in two different ways

?

indicating that

? ? ?

?

?

?

?

?

(54)

? ?

(55)

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

33

Recalling that

? ? ? ? ?? ?

?

, and the earlier expressions for

, Eq. (55) is just (56)

where

?? is the antisymmetric matrix such that ??

?

?

?

?

?

?

?

?? ? ? ?? ? ? ? ? ?? ? ? ?. Hence ?? ?

(57)

which is equivalent to

?

Also

(58)

?

??

??? ? · ? ?

?

? ?? ?

·? ?

(59)

The relationship between the screws is de?ned as

?? ??

? ?? ?

? ? ?

(60)

and hence using Eqs. (57,59) the adjoint is given by

?? ?

As the measurement vector

(61)

Eq. (61) agrees1 with Drummond and Cipolla, given that they recover is an invariant,

?

?

?

? ??

?

(62)

and so

?? ??

which gives2

?? ??

1 2

?? ??

?

(63)

Though a minus sign appears missing from the de?nition of their antisymmetric matrix ?? . This differs from the equivalent in Drummond and Cipolla, a difference which may arise because equation (29) in ref ?? ? , in contradiction with the later (agreed) statement in equation (32) in ref [11] that [11] states ? ?? ??? ? ?? ?? .

?

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

34

R EFERENCES

[1] D. C. Hogg. Model-based vision: a program to see a walking person. Image and Vision Computing, 1(1):5–20, 1983. [2] D. Marr and H.K. Nishihara. Representation and recognition of the spatial organization of three-dimensional shapes. Proc Roy Soc Lond B, 200:269–294, 1978. [3] C. Bregler and J. Malik. Tracking people with twists and exponential maps. In Proc IEEE Conf on Computer Vision and Pattern Recognition, Santa Barbara, June 23-25, 1998, pages 8–15. IEEE Computer Society Press, 1998. [4] H. Sidenbladh, M.J. Black, and D.J. Fleet. Stochastic tracking of 3D human ?gures using 2D image motion. In D. Vernon, editor, Proc 6th European Conf on Computer Vision, Dublin, June/July 2000, LNCS 1843, pages 702–718. Springer, 2000. [5] S.X. Ju, M.J. Black, and Y. Yacoob. Cardboard people: a parametrized model of articulated motion. In Proc 2nd Int Conf on Automatic Face and Gesture Recognition, Killington VT, 1996, pages 38–44, 1996. [6] D.M. Gavrila and L.S. Davis. 3D model-based tracking of humans in action: a multiview approach. In Proc IEEE Conf on Computer Vision and Pattern Recognition, San Francisco CA, June 18-20, 1996, pages 73–80. IEEE Computer Society Press, 1996. [7] B. Stenger, P.R.S. Mendonca, and R. Cipolla. Model-based 3d tracking of an articulated hand. In Proc IEEE Conf on Computer Vision and Pattern Recognition, Kauai HI, Dec 8-14, 2001, volume 2, pages II:310–315. IEEE Computer Society Press, 2001. [8] T. Drummond and R. Cipolla. Real-time tracking highly articulated structures in the presence of noisy measurements. In Proc 8th IEEE Int Conf on Computer Vision, Vancouver. IEEE Computer Society Press, 2001. [9] T. Heap and D. Hogg. Towards 3d hand tracking using a deformable model. In 2th Conference on Face and Gesture Recognition, Killington, Vermont, USA, October 1996. IEEE Computer Society Press. [10] J. Deutscher, A. Blake, and I.D. Reid. Articulated body motion capture by annealed particle ?ltering. In Proc IEEE Conf on Computer Vision and Pattern Recognition, Hilton Head SC, June 13-15, 2000, pages 2126–2133. IEEE Computer Society Press, 2000. [11] T. Drummond and R. Cipolla. Real-time visual tracking of complex structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):932–946, 2002. [12] B.J. Tordoff, W.W. Mayol, T.E. de Campos, and D.W. Murray. Head pose estimation for wearable robot control. In Proc British Machine Vision Conference, Cardiff, Sept 2-5, 2002, pages II:807–816, 2002. [13] M. Brand. Shadow puppetry. In Proc 7th IEEE Int Conf on Computer Vision, Kerkyra Greece, pages 1237–1244. IEEE Computer Society Press, 1999. [14] Y. Yacoob and L.S. Davis. Learned temporal models of image motion. In Proc 6th IEEE Int Conf on Computer Vision, Bombay, pages 446–453. IEEE Computer Society Press, 1998. [15] S. Wachter and H-H. Nagel. Tracking persons in monocular image sequences. Computer Vision and Image Understanding, 74(3):174–192, 1999. [16] M. Isard and A. Blake. Contour tracking by stochastic propagation of conditional density. In Proc 4th European Conf on Computer Vision, Cambridge, pages 343–356. Springer, 1996. [17] C. Sminchisescu and W. Triggs. Covariance scaled sampling for monocular 3D body tracking. In Proc IEEE Conf on Computer Vision and Pattern Recognition, Kauai HI, Dec 8-14, 2001, pages I:447–454. IEEE Computer Society Press, 2001. [18] J.M. Rehg, D.D. Morris, and T. Kanade. Ambiguities in visual tracking of articulated objects using two- and threedimensional models. International Journal of Robotics Research, 22(6):393–418, 2003. [19] T. Drummond and R. Cipolla. Real-time tracking of multiple articulated structures in multiple views. In Proc 6th European Conf on Computer Vision, Dublin, June/July 2000, volume 1843 of LNCS, pages 20–36. Springer, 2000. [20] Y. Wu, G. Hua, and T. Yu. Tracking articulated body by dynamic Markov network. In Proc 9th Int Conf on Computer Vision, Nice France, Oct 13-16, 2003, pages 1094–1101. IEEE Computer Society Press, 2003. [21] Y. Hel-Or and M. Werman. Constraint fusion for recognition and localization of articulated objects. IJCV, 19(1):15–28, July 1996. [22] K. Nickels and S. Hutchinson. Model-based tracking of complex articulated objects. IEEE Trans. Robotics and Automation, 17(1):28–36, 2001. [23] J.M. Rehg and T. Kanade. Model-based tracking of self-occluding articulated objects. In Proc 5th IEEE Int Conf on Computer Vision, Boston MA, June 20-23, 1995, pages 612–617. IEEE Computer Society Press, 1995. [24] C. Harris and C. Stennett. RAPiD – a video rate object tracker. In Proc 1st British Machine Vision Conf, Oxford, pages 73–78, 1990. [25] C. Harris. Tracking with Rigid Models, pages 59–73. MIT Press, Cambridge MA, 1992. [26] T. Drummond and R. Cipolla. Visual tracking and control using lie algebras. In Proc IEEE Conf on Computer Vision and Pattern Recognition, Fort Collins CO, June , 1999, volume 2, pages 652–657. IEEE Computer Society Press, 1999. [27] D.B. Gennery. Visual tracking of known three-dimensional objects. International Journal of Computer Vision, 7(3):243– 270, 1992. [28] D.G. Lowe. Robust model-based motion tracking through the integration of search and estimation. International Journal of Computer Vision, 8(2):113–122, August 1992. [29] G.D. Sullivan. Visual interpretation of known objects in constrained scenes. Phil Trans Roy Soc Lond, 337:361–370, 1992.

DE CAMPOS ET AL, LINEAR RECOVERY OF ARTICULATED POSE CHANGE, OUEL 2279/05

35

[30] T.E. deCampos, B.J. Tordoff, and D.W. Murray. Recovering articulated pose: a comparison of two pre- and post-imposed constraint methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear in 2005 or 2006. In press. [31] R.L. Thompson, I.D. Reid, A. Mun oz, and D.W. Murray. Providing synthetic views for teleoperation using visual pose tracking in multiple cameras. IEEE Transactions on Systems, Man, and Cybernetics, 31(1):43–54, 2001. [32] B.J. Tordoff and D.W. Murray. Guided sampling and consensus for motion estimation. In Proc 7th European Conf on Computer Vision, Copenhagen, pages 82–96, 2002. [33] W.H.Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C - The Art of Scienti?c Computing. Cambridge University Press, 2nd edition, 1988. [34] Clement Gosselin and Jorge Angeles. Singularity analysis of closed-loop kinematic chains. IEEE Trans. Robotics and Automation, 6(3):281–290, June 1990. [35] M.A. Fischler and R.C. Bolles. Random sample concensus: A paradigm for model ?tting with applications to image analysis and automated cartography. Comm. ACM, 26:381–395, 1981. [36] P.J. Rousseeuw and A.M. Leroy. Robust regression and outlier detection. Wiley, New York, 1987.

- WB Terms Index(对公银行 常用英文)
- Index Terms — FPGA Runtime Environments, Module-Based
- Index Terms—Mass Customization, product variety, process
- Social Networks Index Terms K.6.1 [Management of Computing and Information
- Index Terms — Intuitionistic Fuzzy Set Theory, Relational Calculus
- Index Terms – Introductory circuits, technology-enhanced
- Index Terms Real Time Computing
- Index terms Computed tomography (CT),
- Index Terms — Particle filter – Regularized particle filter – Expectation Maximization
- Index terms—Bandwidth Management, Quality of Service