The Segment Anything Model (SAM) is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training. The Segment Anything project was developed by Meta AI and released on April 5, 2023. Meta has released the model and its code under a permissive open license (Apache 2.0), with the accompanying dataset made available for research purposes. Segmentation is the process of identifying which image pixels belong to an object. Meta already uses this technology internally for tasks such as tagging photos, moderating prohibited content, and determining which posts to recommend to users on Facebook and Instagram.
SAM can identify objects in images from various input prompts, allowing a wide range of segmentation tasks without additional training. Supported prompts include foreground/background points, bounding boxes, and masks; text prompts have been explored but were not supported in the released model. SAM's promptable design enables the model to be integrated with other systems.
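A minimal sketch of prompted segmentation, assuming the `segment_anything` Python package from the official repository and a downloaded ViT-H checkpoint; the file names and prompt coordinates below are illustrative, not part of the release.

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load the model from a downloaded checkpoint (illustrative file name).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an image (RGB) and compute its embedding once.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt 1: a single foreground point (label 1 = foreground, 0 = background).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)

# Prompt 2: a bounding box around the object of interest (x1, y1, x2, y2).
masks, scores, _ = predictor.predict(
    box=np.array([425, 300, 700, 475]),
    multimask_output=False,
)
```

Because the image embedding is computed once by `set_image`, additional prompts on the same image only rerun the lightweight decoding step.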
In the blog accompanying the release of SAM, Meta discussed some of the future potential use cases of the model across various industries, including the following:
- AI systems—allowing a multimodal understanding of the world; for example, understanding both the visual and text content of a webpage
- AR/VR—enabling the selection of an object based on a user’s gaze and then “lifting” it into 3D
- Content creation—improving creative applications, such as extracting image regions for collages or video editing
- Science—studying natural occurrences on Earth or even in space; for example, by localizing animals or objects to study and track in video
Previously, there were two primary approaches to segmentation. The first, interactive segmentation, required a user to iteratively refine a mask. The second, automatic segmentation, handled specific object categories defined ahead of time, but required training on a substantial amount of manually annotated objects. SAM generalizes these two approaches in a single model: its promptable interface lets it perform both interactive and automatic segmentation in a flexible way. SAM is also trained on a diverse dataset of over 1 billion masks, enabling it to generalize to new types of objects and images.
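As a sketch of the fully automatic mode, again assuming the `segment_anything` package: the automatic mask generator samples a grid of point prompts over the image and returns a mask for everything it finds, with no predefined category list or task-specific training.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # one dict per detected mask

# Each entry carries the binary mask plus metadata such as its area and a
# predicted quality score.
print(len(masks), masks[0]["area"], masks[0]["predicted_iou"])
```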
SAM is structured with a ViT-H image encoder that runs once per image and outputs an image embedding. A prompt encoder embeds input prompts, such as clicks or boxes. A lightweight transformer-based mask decoder then predicts object masks from the image embedding and the prompt embeddings.
The image encoder has 632M parameters, while the prompt encoder and mask decoder together have 4M parameters. The image encoder is implemented in PyTorch and requires a GPU for efficient inference. The prompt encoder and mask decoder can run directly in PyTorch or be converted to ONNX, and they run efficiently on either a CPU or GPU.
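A quick way to see the split between the heavy image encoder and the lightweight prompt encoder and mask decoder is to count parameters per component. This sketch assumes the `segment_anything` package, where the loaded model exposes `image_encoder`, `prompt_encoder`, and `mask_decoder` submodules as in the reference implementation.

```python
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

def count_params(module):
    # Total number of parameters in a PyTorch module.
    return sum(p.numel() for p in module.parameters())

print(f"image encoder: {count_params(sam.image_encoder) / 1e6:.0f}M params")
light = count_params(sam.prompt_encoder) + count_params(sam.mask_decoder)
print(f"prompt encoder + mask decoder: {light / 1e6:.0f}M params")
```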