A multi-camera image processing system is disclosed. In various embodiments, image data is received from each of a plurality of sensors associated with a workspace, the image data comprising, for each sensor in the plurality of sensors, one or both of visual image information and depth information. Image data from the plurality of sensors is merged to generate merged point cloud data. Segmentation is performed based on visual image data from at least a subset of the sensors in the plurality of sensors to generate a segmentation result. One or both of the merged point cloud data and the segmentation result are used to generate a merged three-dimensional and segmented view of the workspace.
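The merging and segmentation steps described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes each sensor provides a depth image, pinhole intrinsics, a known camera-to-workspace transform, and a per-pixel label mask already produced by some segmentation step. The function names `depth_to_points` and `merge_views` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into camera-frame 3D points.

    Assumes a pinhole camera model; returns an (H*W, 3) array in
    row-major pixel order.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column, row
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def merge_views(views):
    """Merge per-sensor depth and label images into one labeled point cloud.

    Each view is a dict with hypothetical keys:
      "depth":        (H, W) float array, 0 where depth is missing
      "labels":       (H, W) int array, per-pixel segmentation labels
      "intrinsics":   (fx, fy, cx, cy)
      "cam_to_world": (4, 4) homogeneous camera-to-workspace transform
    """
    all_points, all_labels = [], []
    for view in views:
        points = depth_to_points(view["depth"], *view["intrinsics"])
        labels = view["labels"].reshape(-1)
        valid = points[:, 2] > 0  # drop pixels with no depth reading
        rotation = view["cam_to_world"][:3, :3]
        translation = view["cam_to_world"][:3, 3]
        # Express this sensor's points in the shared workspace frame.
        all_points.append(points[valid] @ rotation.T + translation)
        all_labels.append(labels[valid])
    return np.vstack(all_points), np.concatenate(all_labels)
```

Calling `merge_views` with one dictionary per sensor returns an `(N, 3)` array of workspace-frame points and a matching length-`N` label array, which together form one possible representation of a merged, segmented three-dimensional view. Carrying labels alongside points keeps the sketch simple; a real system might instead reconcile labels that disagree between overlapping sensors.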