Patent attributes
The present teaching relates to methods, systems, media, and implementations for understanding a three-dimensional (3D) scene. Image data of the 3D scene, acquired by a camera at different time instances, are received, wherein the 3D scene includes a user or one or more objects. The face of the user is detected and tracked across the different time instances. For some of the time instances, a 2D user profile representing the region of the image data occupied by the user is generated based on the corresponding detected face, and the corresponding 3D space occupied by the user in the 3D scene is estimated based on calibration parameters associated with the camera. The estimated 3D space is then used to dynamically update a 3D space occupancy record of the 3D scene.
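The abstract does not specify how the 2D user profile, the 3D estimate, or the occupancy record are computed. The following is a minimal sketch of one possible pipeline, not the patented method. It assumes a pinhole camera model (intrinsics fx, fy, cx, cy as the calibration parameters), a fixed average face width to infer depth, fixed body-to-face proportions to grow the face box into a user profile, and a voxel grid whose cells expire if not re-observed within a time-to-live. All names and constants (Intrinsics, user_profile_2d, estimate_3d_box, OccupancyRecord, AVG_FACE_WIDTH_M, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass, field

# Hypothetical constants for illustration only; the source does not specify them.
AVG_FACE_WIDTH_M = 0.15     # assumed real-world face width (metres)
BODY_DEPTH_M = 0.30         # assumed front-to-back thickness of a person (metres)
PROFILE_WIDTH_SCALE = 3.0   # assumed body width relative to face width
PROFILE_HEIGHT_SCALE = 7.0  # assumed body height relative to face height


@dataclass
class Intrinsics:
    """Assumed pinhole calibration: focal lengths and principal point, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def user_profile_2d(face):
    """Expand a face box (x, y, w, h) in pixels into a 2D user-profile box.

    The profile is centred horizontally on the face and extends downward
    from the top of the face (image y grows downward).
    """
    x, y, w, h = face
    pw, ph = w * PROFILE_WIDTH_SCALE, h * PROFILE_HEIGHT_SCALE
    centre_x = x + w / 2
    return (centre_x - pw / 2, y, pw, ph)


def estimate_3d_box(face, profile, K):
    """Back-project a 2D profile to an axis-aligned 3D box in camera coordinates.

    Depth comes from the apparent face width under the pinhole model,
    Z = fx * W_real / w_pixels; the whole profile is assumed to lie at that depth.
    Returns (x0, y0, z0, x1, y1, z1) in metres.
    """
    z = K.fx * AVG_FACE_WIDTH_M / face[2]
    px, py, pw, ph = profile
    x0 = (px - K.cx) * z / K.fx
    x1 = (px + pw - K.cx) * z / K.fx
    y0 = (py - K.cy) * z / K.fy
    y1 = (py + ph - K.cy) * z / K.fy
    return (x0, y0, z - BODY_DEPTH_M / 2, x1, y1, z + BODY_DEPTH_M / 2)


@dataclass
class OccupancyRecord:
    """Voxel-based occupancy record; voxels expire if not re-observed within ttl."""
    voxel_size: float = 0.25  # metres per voxel edge
    ttl: float = 1.0          # seconds
    last_seen: dict = field(default_factory=dict)

    def update(self, box, t):
        """Mark every voxel overlapping `box` as occupied at time t, then drop stale voxels."""
        x0, y0, z0, x1, y1, z1 = box
        s = self.voxel_size
        for i in range(math.floor(x0 / s), math.floor(x1 / s) + 1):
            for j in range(math.floor(y0 / s), math.floor(y1 / s) + 1):
                for k in range(math.floor(z0 / s), math.floor(z1 / s) + 1):
                    self.last_seen[(i, j, k)] = t
        # Dynamic update: forget space the user has not occupied recently.
        self.last_seen = {v: ts for v, ts in self.last_seen.items() if t - ts <= self.ttl}

    def occupied(self):
        return set(self.last_seen)


if __name__ == "__main__":
    K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    record = OccupancyRecord()
    face = (300, 100, 60, 80)  # a detected face box at one time instance
    box = estimate_3d_box(face, user_profile_2d(face), K)
    record.update(box, t=0.0)
    print(len(record.occupied()), "voxels occupied")
```

With these assumed intrinsics, a 60-pixel-wide face is placed at a depth of about 1.5 m. Expiring voxels by timestamp is one simple way to make the record "dynamic"; a probabilistic occupancy grid would be a common alternative.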