Patent attributes
Aspects of the disclosure relate to training and using a model for identifying actions of objects. For instance, LIDAR sensor data frames including an object bounding box corresponding to an object as well as an action label for the bounding box may be received. Each sensor frame is associated with a timestamp and is sequenced with respect to other sensor frames. Each given sensor data frame may be projected into a camera image of the object based on the timestamp associated with the given sensor data frame in order to provide fused data. The model may be trained using the fused data such that the model is configured to, in response to receiving fused data, the model outputs an action label for each object bounding box of the fused data. This output may then be used to control a vehicle in an autonomous driving mode.