A method and system may be used to determine gestures of one or more users from a video. Motion may be detected in an image frame of the video, and the image frame may be cropped around the detected motion. Body pose estimation may be performed on the cropped image frame. The location of each user's hands may be determined from the estimated body pose. Additional processing may be performed to identify hand gestures.
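
The sequence of steps may be illustrated by the following minimal sketch. It is not the claimed implementation. It assumes that motion is detected by simple frame differencing between two consecutive grayscale frames, which the text does not specify. The names `motion_box`, `crop`, `hand_locations`, the `pose_estimator` callable, and the keypoint names `left_wrist` and `right_wrist` are hypothetical and are introduced only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Frame = List[List[int]]          # grayscale image: rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive
Point = Tuple[int, int]          # (row, col)


def motion_box(prev: Frame, curr: Frame, threshold: int = 25) -> Optional[Box]:
    """Bounding box of pixels that changed by more than `threshold`.

    Frame differencing is an assumption; any motion detector could be used.
    """
    changed = [(r, c)
               for r, (p_row, c_row) in enumerate(zip(prev, curr))
               for c, (p, q) in enumerate(zip(p_row, c_row))
               if abs(q - p) > threshold]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)


def crop(frame: Frame, box: Box, margin: int = 1) -> Tuple[Frame, Point]:
    """Crop `frame` around `box` plus a margin; also return the crop's offset."""
    top, left, bottom, right = box
    top, left = max(0, top - margin), max(0, left - margin)
    bottom = min(len(frame), bottom + margin)
    right = min(len(frame[0]), right + margin)
    return [row[left:right] for row in frame[top:bottom]], (top, left)


def hand_locations(prev: Frame, curr: Frame,
                   pose_estimator: Callable[[Frame], Dict[str, Point]]
                   ) -> Dict[str, Point]:
    """Detect motion, crop, estimate pose, and return wrist positions.

    `pose_estimator` is a hypothetical stand-in for a body pose model; it is
    assumed to map keypoint names to (row, col) in crop coordinates.
    """
    box = motion_box(prev, curr)
    if box is None:
        return {}
    cropped, (off_r, off_c) = crop(curr, box)
    keypoints = pose_estimator(cropped)
    # Translate wrist keypoints from crop coordinates back to frame coordinates.
    return {name: (r + off_r, c + off_c)
            for name, (r, c) in keypoints.items()
            if name in ("left_wrist", "right_wrist")}
```

In this sketch the returned hand locations are expressed in full-frame coordinates, so a downstream gesture classifier (not shown) could crop a region around each hand from the original frame.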