Techniques are presented for detecting drowsy driving. In some implementations, a drowsiness model is trained with data associated with inward videos and outward videos captured during a trip. The inward videos capture the interior of the cabin, including the driver, and the outward videos capture the view in front of the vehicle in the direction of travel. Further, a device at the vehicle periodically calculates a drowsiness scale index value that indicates the driver's level of drowsiness. Calculating the drowsiness scale index value includes obtaining a set of inward frames from the inward videos; for each inward frame, creating a face image by cropping the inward frame; obtaining a set of outward frames from the outward videos; calculating inward embeddings of the face images and outward embeddings of the outward frames; and calculating, by the drowsiness model, the drowsiness scale index value.
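
The calculation steps listed above can be illustrated with a minimal Python sketch. It is not the claimed implementation: every name in it (`crop_face`, `placeholder_embed`, `placeholder_model`, `drowsiness_index`) is hypothetical, frames are assumed to be small grayscale grids, the face bounding boxes are assumed to be supplied by some upstream detector, and the embedding and model functions are toy stand-ins for the learned components. The sketch also assumes that the model consumes both sets of embeddings.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]            # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]      # (top, left, bottom, right); assumed given
Embedding = List[float]


def crop_face(frame: Frame, box: Box) -> Frame:
    """Create a face image by cropping an inward frame to the face box."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def placeholder_embed(image: Frame) -> Embedding:
    """Toy stand-in for a learned encoder: [mean, min, max] pixel value."""
    pixels = [p for row in image for p in row]
    return [fmean(pixels), min(pixels), max(pixels)]


def placeholder_model(inward: Sequence[Embedding],
                      outward: Sequence[Embedding]) -> float:
    """Toy stand-in for the trained drowsiness model (assumption):
    averages the first component of every embedding."""
    return fmean(e[0] for e in list(inward) + list(outward))


def drowsiness_index(
    inward_frames: Sequence[Frame],
    face_boxes: Sequence[Box],
    outward_frames: Sequence[Frame],
    embed: Callable[[Frame], Embedding],
    model: Callable[[Sequence[Embedding], Sequence[Embedding]], float],
) -> float:
    """Follow the listed steps: crop faces, embed both views, run the model."""
    # For each inward frame, create a face image by cropping.
    faces = [crop_face(f, b) for f, b in zip(inward_frames, face_boxes)]
    # Calculate inward embeddings of the face images.
    inward_embeddings = [embed(face) for face in faces]
    # Calculate outward embeddings of the outward frames.
    outward_embeddings = [embed(f) for f in outward_frames]
    # The drowsiness model produces the index value from both sets.
    return model(inward_embeddings, outward_embeddings)


# Usage with one tiny inward frame and one tiny outward frame.
inward = [[[0, 0, 0, 0], [0, 10, 20, 0], [0, 30, 40, 0], [0, 0, 0, 0]]]
boxes = [(1, 1, 3, 3)]
outward = [[[5, 5], [5, 5]]]
index = drowsiness_index(inward, boxes, outward,
                         placeholder_embed, placeholder_model)
```

In a real system the device would call something like `drowsiness_index` periodically on the most recent frames, with the placeholders replaced by the trained encoders and drowsiness model; passing them in as callables keeps the step ordering separate from the model details.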