Patent attributes
Systems and methods provide editing operations in a smart editing system that may generate a focal point within a mask of an object for each frame of a video segment and perform editing effects on the frames of the video segment to quickly provide users with natural video editing effects. An eye-gaze network may produce a hotspot map of predicted focal points in a video frame. These predicted focal points may then be used by a gaze-to-mask network to determine objects in the image and generate an object mask for each of the detected objects. This process may then be repeated to effectively track the trajectory of objects and object focal points in videos. Based on the determined trajectory of an object in a video clip and editing parameters, the editing engine may produce editing effects relative to an object for the video clip.