The overall goal of this work package is to produce, from multiple video streams, a view-independent video that is both informative and pleasant to watch. To this end, several computer vision techniques were developed for a system with two principal components: a virtual camera and automatic viewpoint selection.
The task of viewpoint selection is to choose a suitable view of the actions taking place within the observed space. A multi-camera network is used to detect and track the person in the scene. A Bayesian per-pixel classification approach segments each image into foreground and background regions and instantiates individual appearance and motion models for object tracking. In addition to the position and velocity of the person, relevant hand actions can be detected by means of skin-color blob detection. The rules for choosing a suitable viewpoint are inspired by cinematographic techniques and exploit the visual tracking data from all cameras in the network.
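To illustrate the Bayesian per-pixel decision, the following is a minimal sketch, not the project's implementation. It assumes grey-level images, a per-pixel Gaussian background model (mean and variance, e.g. estimated from the static background at startup) and a uniform likelihood for the unknown foreground colours. The function names `gaussian_pdf` and `classify_pixels`, the prior `prior_fg` and all numeric values are hypothetical. A pixel is labelled foreground when p(x | fg) P(fg) > p(x | bg) P(bg), i.e. when its posterior foreground probability exceeds the background one.

```python
import math

# Hypothetical models (illustrative assumptions only): a per-pixel Gaussian
# background N(mean, var) and a uniform foreground likelihood over grey levels.


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D normal distribution N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def classify_pixels(frame, bg_mean, bg_var, prior_fg=0.1, levels=256):
    """Return a mask with 1 where P(fg | x) > P(bg | x), else 0 (MAP decision)."""
    fg_likelihood = 1.0 / levels  # uniform model: any grey level equally likely
    mask = []
    for row, mean_row, var_row in zip(frame, bg_mean, bg_var):
        out = []
        for x, m, v in zip(row, mean_row, var_row):
            # Unnormalised posteriors: likelihood times prior.
            post_fg = fg_likelihood * prior_fg
            post_bg = gaussian_pdf(x, m, v) * (1.0 - prior_fg)
            out.append(1 if post_fg > post_bg else 0)
        mask.append(out)
    return mask


# Usage: background mean 100, variance 25 (std. dev. 5 grey levels).
bg_mean = [[100, 100, 100], [100, 100, 100]]
bg_var = [[25.0] * 3, [25.0] * 3]
frame = [[102, 180, 99], [101, 98, 30]]
mask = classify_pixels(frame, bg_mean, bg_var)
print(mask)  # [[0, 1, 0], [0, 0, 1]]
```

With these values, pixels deviating by more than roughly 16 grey levels from the background mean are classified as foreground. The tracker described above additionally maintains appearance and motion models for each person, which this sketch omits.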
Virtual camera views are created on request as artificial intermediate views between real camera viewpoints. To allow real-time operation, several optimizations were applied to the algorithm. The background is assumed to be static, so its model needs to be initialized only once at startup and requires no extra processing time during operation. Image regions showing the person in the foreground are segmented and treated separately to build a more detailed 3D model in front of the background. A plane sweep algorithm, implemented on the graphics card, efficiently computes a dense depth map, from which a 3D triangle mesh is generated using Delaunay triangulation. As a last step, the person's texture is projected onto this mesh, completing the 3D scene model. Finally, the interpolated view is rendered by specifying a new virtual camera position within the constructed 3D scene. Limited extrapolation beyond the real views is also possible.
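The core idea of the plane sweep can be shown on a simplified case. The sketch below is an illustration under stated assumptions rather than the GPU implementation described above: it uses only two rectified grey-level views, for which a fronto-parallel plane at depth Z induces a constant disparity d = f·B/Z. Testing a plane therefore reduces to shifting the second image by d pixels and scoring the match with a windowed mean of absolute differences. The function name `plane_sweep_depth`, the focal length `focal_px`, the baseline `baseline`, the window `radius` and the depth list are hypothetical; each pixel keeps the depth with the lowest cost (winner-take-all).

```python
def plane_sweep_depth(left, right, depths, focal_px, baseline, radius=1):
    """Winner-take-all depth per pixel for a rectified pair (simplified sketch).

    Each candidate plane at depth z corresponds to disparity d = f * B / z;
    a left pixel at column x is compared with the right pixel at column x - d.
    """
    h, w = len(left), len(left[0])
    best_cost = [[float("inf")] * w for _ in range(h)]
    depth_map = [[None] * w for _ in range(h)]  # None: no valid match found
    for z in depths:
        d = round(focal_px * baseline / z)  # disparity induced by the plane
        for y in range(h):
            for x in range(w):
                cost, n = 0.0, 0
                # Aggregate absolute differences over a (2r+1)x(2r+1) window.
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy, xl = y + dy, x + dx
                        xr = xl - d
                        if 0 <= yy < h and 0 <= xl < w and 0 <= xr < w:
                            cost += abs(left[yy][xl] - right[yy][xr])
                            n += 1
                # Keep the plane with the lowest mean cost seen so far.
                if n and cost / n < best_cost[y][x]:
                    best_cost[y][x] = cost / n
                    depth_map[y][x] = z
    return depth_map


# Usage: the left row equals the right row shifted by 2 pixels, so with
# f * B = 10 the correct plane is at depth 10 / 2 = 5.0.
left = [[0, 0, 10, 50, 20, 80, 30, 70]]
right = [[10, 50, 20, 80, 30, 70, 40, 90]]
depth = plane_sweep_depth(left, right, depths=[10.0, 5.0, 2.5],
                          focal_px=100.0, baseline=0.1)
print(depth[0][4], depth[0][5])  # 5.0 5.0
```

Every (pixel, plane) cost is computed independently, which is why plane sweeping maps well onto graphics hardware, as in the system described above. In the general multi-view case, the pixels of each plane are mapped into the other cameras with plane-induced homographies instead of simple horizontal shifts.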
Figure: Viewpoint selection. The large image shows the selected view, which is one of the five camera views shown at the bottom.
Figure: Interpolated view between two real cameras.
Figure: Close-up margins for the viewpoint selection.