US Patent 11989941 Systems and methods for video and language pre-training

Patent 11989941 was granted by the United States Patent and Trademark Office in May 2024 and is assigned to Salesforce.

Patent abstract

Embodiments describe a method of video-text pre-training that effectively learns cross-modal representations from sparse video frames and text. Specifically, an align-and-prompt framework for video and language pre-training encodes the frames and the text independently, using a transformer-based video encoder and a text encoder. A multi-modal encoder then captures cross-modal interactions between the video frames and the texts. Pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
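The abstract does not spell out how prompting entity modeling works. The following is a minimal sketch under one plausible reading, stated as an assumption. A prompter scores a cropped video region against text prompts for each entity (for example "A video of a dog."). It turns the cosine similarities into soft pseudo-labels with a temperature-scaled softmax. The multi-modal model's entity prediction for that region is then trained against those labels with a soft cross-entropy loss. The entity list, toy embeddings, temperature value, and all function names are hypothetical. In the patented system the embeddings would come from transformer encoders rather than fixed vectors.

```python
import math

# All names, vectors, and the temperature below are hypothetical illustrations.
# A real system would obtain embeddings from transformer-based video and text
# encoders; here they are fixed toy vectors.


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_total = math.log(sum(math.exp(s - m) for s in scores))
    return [s - m - log_total for s in scores]


def entity_pseudo_labels(region_emb, prompt_embs, temperature=0.1):
    """Soft entity labels for one video region: a temperature-scaled softmax
    over cosine similarities between the region and each entity prompt."""
    region = l2_normalize(region_emb)
    sims = [dot(region, l2_normalize(p)) / temperature for p in prompt_embs]
    return [math.exp(lp) for lp in log_softmax(sims)]


def soft_cross_entropy(pred_logits, target_probs):
    """Cross-entropy between the model's entity logits and soft pseudo-labels."""
    return -sum(t * lp for t, lp in zip(target_probs, log_softmax(pred_logits)))


# Usage with a hypothetical three-entity vocabulary.
entities = ["dog", "car", "guitar"]
# Stand-ins for text embeddings of prompts such as "A video of a dog."
prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
region_emb = [0.9, 0.1, 0.0]  # a cropped region that mostly resembles "dog"

targets = entity_pseudo_labels(region_emb, prompt_embs)
best = entities[targets.index(max(targets))]
loss = soft_cross_entropy([2.0, 0.1, -1.0], targets)  # model's logits for the region
print(best, [round(t, 3) for t in targets], round(loss, 3))
```

The low temperature sharpens the pseudo-labels toward the best-matching entity. Using soft targets rather than a hard argmax label still leaves some probability on plausible alternatives when the prompter is uncertain.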