Grounded-SAM: Segment Any Object from Text
This is an introduction to "Grounded-SAM", a machine learning model that can be used with ailia SDK. You can easily use this model, along with many other ready-to-use ailia MODELS, to create AI applications with ailia SDK.
Overview
Grounded-SAM is a machine learning model capable of segmenting any object specified by text.

Source: https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/main/assets/demo2.jpg
Architecture
Grounded-SAM uses GroundingDINO to compute the bounding box of the object specified by the text, then passes that bounding box as input to the Segment Anything model to perform segmentation.
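The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration of how the stages compose, not the actual ailia or Grounded-SAM implementation: `detect_boxes` and `segment_in_box` are hypothetical toy stand-ins for GroundingDINO and Segment Anything, and the "image" is a small grid of brightness values rather than a real picture.

```python
def detect_boxes(image, caption):
    """Hypothetical stand-in for GroundingDINO.

    A real detector would use the caption; this toy version ignores it and
    returns one (x0, y0, x1, y1) box around all non-zero pixels.
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value > 0]
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def segment_in_box(image, box):
    """Hypothetical stand-in for Segment Anything prompted with a box.

    Returns a boolean mask that is True only for non-zero pixels inside the box.
    """
    x0, y0, x1, y1 = box
    return [[(y0 <= y < y1 and x0 <= x < x1 and value > 0)
             for x, value in enumerate(row)]
            for y, row in enumerate(image)]


def grounded_segment(image, caption, detector=detect_boxes, segmenter=segment_in_box):
    """Stage 1: text -> boxes. Stage 2: each box -> mask."""
    boxes = detector(image, caption)
    return [(box, segmenter(image, box)) for box in boxes]


# Toy 8x6 image with one bright, non-rectangular object.
image = [[0] * 8 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4, 5):
        image[y][x] = 255
image[2][3] = 0  # cut one corner so the mask differs from the box

results = grounded_segment(image, "the bright object")
for box, mask in results:
    print(box, sum(sum(row) for row in mask))
```

The point of the sketch is that the box is only an intermediate prompt: the final mask follows the object's shape inside the box rather than filling the whole rectangle.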

Grounded SAM architecture (Source: https://arxiv.org/abs/2401.14159)
As an application example, combining Grounded-SAM with Stable Diffusion makes advanced image editing possible, as shown in the image above. In the third row, the user specifies “bench” as text; the bench is segmented, and Stable Diffusion then changes its appearance seamlessly.
Grounded-SAM can segment objects based on text, including complex phrases such as “a person wearing pink clothes” or “a man wearing sunglasses”.

(Source: https://arxiv.org/abs/2401.14159)
You can refer to the following articles to get more information about the models used internally.
Usage
To use Grounded-SAM with ailia SDK, use the following command. Memory consumption is approximately 5 GB. If your VRAM is limited, add the -e 1 option to run it on the CPU.
$ python3 grounded_sam.py -i demo.jpg --caption "The running dog."
To run Grounded-SAM, you’ll need ailia_tokenizer, which provides the BERT tokenizer. Please install it using the following command.
$ pip3 install ailia_tokenizer
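If you launch the sample from another script, the run command above can be assembled programmatically. The following is a minimal sketch that uses only the options mentioned in this article (-i, --caption, and -e 1 for CPU execution). The helper name `build_command` is hypothetical and is not part of ailia SDK.

```python
import shlex


def build_command(image_path, caption, use_cpu=False):
    """Build the argument list for the grounded_sam.py sample (hypothetical helper)."""
    args = ["python3", "grounded_sam.py", "-i", image_path, "--caption", caption]
    if use_cpu:
        # "-e 1" runs the sample on the CPU, as described in the usage notes.
        args += ["-e", "1"]
    return args


cmd = build_command("demo.jpg", "The running dog.", use_cpu=True)
# shlex.join quotes the caption so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with captions that contain spaces.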
ailia Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.
ailia Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.
ailia Tech BLOG