Towards the Vantage Project:
Camera calibration and structure recovery from a single image

Jonathan Deutscher
Oxford University

The Vantage Project proposes to deploy many cameras around SRC and to track people as they move from the field of view of one camera to another. It should be possible to use this information to analyse the movements of people and identify individuals.

This tracking (in particular the hand-off between cameras as a person moves between their fields of view) will be easier if we can make inferences about the real location of a person in the world based on that person's position in a camera's image plane. It would also be helpful to know the 3D structure of the scene to guide the tracking and enforce the constraint that the person is walking on the floor, not up the wall.

The problem can be divided into three areas:

Camera Calibration
A video camera can be approximated as a projection from the 3D world onto a 2D image plane, and a calibrated camera is one for which the projection matrix is known. Current methods for automatic calibration and structure recovery require stereo images, hand-registration of features or the observation of known objects. We use a method that exploits the Manhattan assumption (that most of the lines in a scene are aligned along three mutually perpendicular axes) to recover the camera calibration automatically from a single image.
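
To illustrate why perpendicular scene directions constrain the calibration, here is a minimal sketch. It is not the method used in this work. It assumes a pinhole camera with square pixels, zero skew and a known principal point p; the function names are hypothetical. Under those assumptions the vanishing points v1 and v2 of two perpendicular directions satisfy (v1 - p).(v2 - p) = -f^2, so the focal length f can be read off directly.

```python
import math


def project(P, X):
    """Apply a 3x4 projection matrix P (list of rows) to a 3D point X."""
    Xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)


def focal_from_vanishing_points(v1, v2, principal_point):
    """Focal length in pixels from the vanishing points of two
    perpendicular directions.

    Assumption: square pixels, zero skew, known principal point.
    """
    px, py = principal_point
    dot = (v1[0] - px) * (v2[0] - px) + (v1[1] - py) * (v2[1] - py)
    if dot >= 0:
        raise ValueError("vanishing points are not consistent with "
                         "perpendicular directions under this model")
    return math.sqrt(-dot)


# Illustrative camera: f = 800 pixels, principal point (320, 240),
# camera frame aligned with the world frame.
P = [[800.0, 0.0, 320.0, 0.0],
     [0.0, 800.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
image_point = project(P, (1.0, 0.5, 4.0))

# Vanishing points of the perpendicular directions (1, 0, 1) and (-1, 0, 1).
f = focal_from_vanishing_points((1120.0, 240.0), (-480.0, 240.0), (320.0, 240.0))
```

With three mutually perpendicular vanishing points the same relation holds for each pair, which is what makes a Manhattan scene informative enough to calibrate from one image.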

Image Segmentation
Once the camera has been calibrated we want to segment the pixels in the image into regions that correspond to structures in the world. Under the Manhattan assumption, most of the surfaces in the scene are planar and are separated by extended lines in one of the three primary directions. We detect these extended lines in the image and use them to define an initial set of regions. We then reduce this set by repeatedly merging the most similar neighbouring regions until the remaining neighbours differ by at least a minimum region difference.
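
The merging step can be sketched as a greedy loop. This is only an illustration under stated assumptions: the similarity measure (absolute difference of mean intensity), the size-weighted update and the name merge_regions are all hypothetical, since the measure actually used is not specified here.

```python
def merge_regions(means, sizes, adjacency, min_difference):
    """Greedily merge the most similar neighbouring regions.

    means:     dict, region id -> mean intensity (assumed similarity feature)
    sizes:     dict, region id -> pixel count
    adjacency: set of frozenset({a, b}) pairs of neighbouring region ids
    Merging stops once every remaining neighbouring pair differs by at
    least min_difference. Returns a dict mapping each original region id
    to the id of the region it ended up in.
    """
    means, sizes = dict(means), dict(sizes)
    adjacency = set(adjacency)
    label = {r: r for r in means}

    def difference(pair):
        a, b = sorted(pair)
        return abs(means[a] - means[b])

    while adjacency:
        # Most similar neighbouring pair; ids break ties deterministically.
        pair = min(adjacency, key=lambda p: (difference(p), sorted(p)))
        if difference(pair) >= min_difference:
            break
        a, b = sorted(pair)
        # Merge b into a, keeping a size-weighted mean.
        total = sizes[a] + sizes[b]
        means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[a] = total
        del means[b], sizes[b]
        # Redirect b's neighbours to a and drop the collapsed pair.
        rewired = set()
        for p in adjacency:
            q = frozenset(a if r == b else r for r in p)
            if len(q) == 2:
                rewired.add(q)
        adjacency = rewired
        for r in label:
            if label[r] == b:
                label[r] = a
    return label


# Four regions in a row: two dark neighbours and two bright neighbours.
labels = merge_regions(
    means={0: 10.0, 1: 12.0, 2: 50.0, 3: 52.0},
    sizes={0: 100, 1: 100, 2: 100, 3: 100},
    adjacency={frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})},
    min_difference=10.0,
)
```

In this toy input the two dark regions merge and the two bright regions merge, while the dark/bright boundary survives because its difference exceeds the threshold.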

Structure Recovery
Once the image has been segmented we assume that each segment represents a planar surface in the world. We begin by heuristically identifying the floor region. Assuming that the camera has been installed roughly upright, we can then compute the orientation of this floor region. We then compute the world coordinates of every pixel in the floor segment by performing simple line-plane intersections. From these world coordinates we can discern the boundaries of the floor, which should assist greatly in tracking people for the Vantage Project.
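
A line-plane intersection of this kind can be sketched as follows. The sketch assumes a pinhole camera with focal length f and principal point (cx, cy) in pixels, and a floor plane written as normal . X = distance in camera coordinates; the function name pixel_to_floor and the example numbers are hypothetical.

```python
def pixel_to_floor(u, v, f, cx, cy, normal, distance):
    """Intersect the viewing ray through pixel (u, v) with the plane
    normal . X = distance, expressed in camera coordinates.

    Returns the 3D intersection point, or None when the ray is parallel
    to the plane or the intersection lies behind the camera.
    """
    # Direction of the viewing ray through the pixel (camera at the origin).
    ray = ((u - cx) / f, (v - cy) / f, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-12:
        return None  # ray never meets the plane
    t = distance / denom
    if t <= 0:
        return None  # plane is behind the camera along this ray
    return tuple(t * r for r in ray)


# Illustrative set-up: image y axis points down, camera looks horizontally,
# floor lies 2.5 units below the camera, so the plane is y = 2.5.
floor_point = pixel_to_floor(320.0, 440.0, 800.0, 320.0, 240.0,
                             normal=(0.0, 1.0, 0.0), distance=2.5)
horizon = pixel_to_floor(320.0, 240.0, 800.0, 320.0, 240.0,
                         normal=(0.0, 1.0, 0.0), distance=2.5)
```

Pixels on or above the horizon line return None in this set-up, which is one simple way to reject pixels that cannot belong to the floor.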
