Patent attributes
Patent Applicant
Current Assignee
Patent Jurisdiction
Patent Number
Patent Inventor Names
Junnan Li
Chu Hong Hoi
Dongxu Li
Date of Patent
May 21, 2024
Patent Application Number
17566173
Date Filed
December 30, 2021
Patent Primary Examiner
Patent abstract
Embodiments describe a method of video-text pre-learning that effectively learns cross-modal representations from sparse video frames and text. Specifically, an align-and-prompt approach provides a video-and-language pre-training framework that encodes the frames and the text independently, using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes a prompting entity modeling task that enables the model to capture fine-grained region-entity alignment.
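As a rough illustration of the first stages the abstract describes (sparse frame sampling, then comparing independently produced video and text embeddings), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the evenly spaced sampling rule, the mean pooling, and the temperature-scaled cosine similarity are not taken from the patent. The learned encoders, the multi-modal encoder, and prompting entity modeling are not modeled here.

```python
import math

def sample_sparse_frames(num_frames, k):
    """Pick k frame indices spread evenly over a video (segment midpoints).
    Hypothetical sampling rule; the patent abstract does not specify one."""
    if k <= 0 or num_frames <= 0:
        return []
    k = min(k, num_frames)
    seg = num_frames / k
    return [int(seg * i + seg / 2) for i in range(k)]

def mean_pool(vectors):
    """Average per-frame embeddings into a single video embedding."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def video_to_text_probs(video_emb, text_embs, temperature=0.07):
    """Softmax over temperature-scaled similarities between one video
    embedding and several candidate text embeddings (assumed scoring)."""
    logits = [cosine(video_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins for encoder outputs; a real system would use learned encoders.
frame_embs = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1], [0.8, 0.2]]
text_embs = [[1.0, 0.0], [0.0, 1.0]]  # first text "matches" the video

indices = sample_sparse_frames(num_frames=100, k=4)
video_emb = mean_pool(frame_embs)
probs = video_to_text_probs(video_emb, text_embs)
```

In this toy setup the pooled video embedding points mostly along the first axis, so the first candidate text receives nearly all of the probability mass. A contrastive pre-training objective would push the encoders toward producing that kind of separation for matched video-text pairs.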