Segment Anything

Automated image segmentation with Meta's Segment Anything Model

Segment Anything is a community skill for image segmentation using Meta's Segment Anything Model, covering automatic mask generation, point-prompted segmentation, box-prompted segmentation, text-guided selection, and batch processing for object isolation in images.

What Is This?

Overview

Segment Anything provides tools for extracting object masks from images using foundation models trained on large-scale segmentation datasets. It covers automatic mask generation that segments all objects in an image without manual prompts, point-prompted segmentation that isolates objects based on clicked coordinate points, box-prompted segmentation that extracts objects within specified bounding rectangles, text-guided selection that identifies and segments objects matching text descriptions, and batch processing that applies segmentation across multiple images in automated workflows. The skill helps users isolate objects from images programmatically.

Who Should Use This

This skill serves computer vision engineers building object detection pipelines, designers needing automated background removal, and researchers creating labeled datasets for machine learning training.

Why Use It?

Problems It Solves

Manual image segmentation with editing tools is slow and impractical for large image collections. Traditional segmentation models require task-specific training for each object category. Creating pixel-accurate masks for dataset labeling demands significant annotation effort. Existing segmentation tools often fail on unusual objects or complex backgrounds.

Core Highlights

Automatic segmenter detects and masks all objects without prompts. Point selector isolates objects from coordinate click positions. Box extractor segments objects within bounding regions. Batch processor applies segmentation across image collections.

How to Use It?

Basic Usage

import numpy as np
from PIL import Image


class SAMSegmenter:
    def __init__(self, model_path: str, device: str = 'cpu'):
        from segment_anything import sam_model_registry, SamPredictor
        # 'vit_h' is the largest SAM backbone; the checkpoint file must match it.
        sam = sam_model_registry['vit_h'](checkpoint=model_path)
        sam.to(device)
        self.predictor = SamPredictor(sam)

    def segment_point(self, image_path: str, points: list[tuple]) -> np.ndarray:
        """Segment the object under the given (x, y) points and return
        the highest-scoring candidate mask."""
        img = np.array(Image.open(image_path).convert('RGB'))
        # set_image computes and caches the image embedding, so repeated
        # predict() calls on the same image are cheap.
        self.predictor.set_image(img)
        coords = np.array(points)
        labels = np.ones(len(points), dtype=int)  # 1 = foreground point
        masks, scores, _ = self.predictor.predict(
            point_coords=coords, point_labels=labels)
        return masks[scores.argmax()]

    def save_mask(self, mask: np.ndarray, output: str):
        """Write a boolean mask to disk as an 8-bit grayscale PNG."""
        Image.fromarray((mask * 255).astype(np.uint8)).save(output)


seg = SAMSegmenter('sam_vit_h.pth')
mask = seg.segment_point('photo.jpg', [(300, 400)])
seg.save_mask(mask, 'mask.png')

Real-World Examples

import numpy as np
from PIL import Image
from pathlib import Path


class BatchSegmenter:
    def __init__(self, segmenter):
        self.seg = segmenter

    def auto_segment(self, image_path: str) -> list:
        """Generate masks for every object in an image, largest first."""
        from segment_anything import SamAutomaticMaskGenerator
        gen = SamAutomaticMaskGenerator(self.seg.predictor.model)
        img = np.array(Image.open(image_path).convert('RGB'))
        masks = gen.generate(img)
        return sorted(masks, key=lambda m: m['area'], reverse=True)

    def process_folder(self, input_dir: str, output_dir: str) -> dict:
        """Segment every .jpg in input_dir, save the five largest masks
        per image, and return a name -> mask-count mapping."""
        out = Path(output_dir)
        out.mkdir(exist_ok=True)
        results = {}
        for img_path in Path(input_dir).glob('*.jpg'):
            masks = self.auto_segment(str(img_path))
            results[img_path.name] = len(masks)
            for i, m in enumerate(masks[:5]):
                # 'segmentation' is a boolean array; scale to 0/255 grayscale.
                mask_img = Image.fromarray(
                    (m['segmentation'] * 255).astype(np.uint8))
                mask_img.save(out / f'{img_path.stem}_mask{i}.png')
        return results


batch = BatchSegmenter(seg)
counts = batch.process_folder('images', 'masks')
for name, count in counts.items():
    print(f'{name}: {count} objects')

Advanced Tips

Combine point and box prompts for more precise segmentation of objects near other elements. Use the automatic mask generator with area filters to focus on objects within a specific size range. Cache image embeddings when running multiple queries on the same image to avoid recomputation.
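The combined point-and-box tip can be sketched as a small helper that assembles the arrays SamPredictor.predict expects: float pixel coordinates, integer labels (1 = foreground), and an XYXY box. The helper name and the example coordinates are illustrative, not part of the SAM API.

```python
import numpy as np

def combined_prompt(points, box):
    """Build point + box prompt arrays in the layout that
    SamPredictor.predict accepts for its point_coords, point_labels,
    and box arguments."""
    coords = np.array(points, dtype=np.float32)      # (N, 2) pixel coords
    labels = np.ones(len(points), dtype=np.int32)    # 1 = foreground point
    return coords, labels, np.array(box, dtype=np.float32)  # XYXY box

# Hypothetical usage against an already-initialised SamPredictor:
# coords, labels, box = combined_prompt([(320, 240)], (100, 80, 540, 400))
# masks, scores, _ = predictor.predict(
#     point_coords=coords, point_labels=labels,
#     box=box, multimask_output=False)
```

The box constrains the search region while the point disambiguates which object inside it you mean, which helps when objects overlap.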

When to Use It?

Use Cases

Extract product images from photographs by segmenting items and removing backgrounds. Create training data for object detection models by generating masks for annotated regions. Isolate specific objects in medical or satellite imagery for analysis.
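The background-removal use case reduces to writing a SAM mask into an image's alpha channel. A minimal sketch, assuming the mask is the boolean array returned by segment_point:

```python
import numpy as np
from PIL import Image

def cutout(image: Image.Image, mask: np.ndarray) -> Image.Image:
    """Turn a boolean mask into a transparent-background cutout:
    masked pixels stay opaque, everything else becomes transparent."""
    arr = np.array(image.convert('RGBA'))
    arr[..., 3] = np.where(mask, 255, 0).astype(np.uint8)
    return Image.fromarray(arr)
```

Saving the result as PNG preserves the transparency; JPEG would discard it.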

Related Topics

Image segmentation, computer vision, object detection, SAM, mask generation, instance segmentation, and deep learning.

Important Notes

Requirements

Segment Anything Python package with PyTorch backend installed. Pre-trained SAM model checkpoint file downloaded for inference. GPU recommended for processing speed though CPU execution is supported.

Usage Recommendations

Do: use the appropriate prompt type for each task since point prompts work well for single objects and box prompts suit rectangular regions. Filter automatic masks by confidence score and area to reduce noise. Pre-process images to consistent sizes for predictable results.
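Filtering by confidence and area can be done directly on the dicts that SamAutomaticMaskGenerator.generate returns, which include 'area' and 'predicted_iou' keys. The thresholds below are illustrative and should be tuned per dataset:

```python
def filter_masks(masks, min_area=500, max_area=50_000, min_iou=0.90):
    """Keep only masks inside an area band with a high predicted IoU.
    Expects dicts shaped like SamAutomaticMaskGenerator output."""
    return [m for m in masks
            if min_area <= m['area'] <= max_area
            and m['predicted_iou'] >= min_iou]
```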

Don't: run automatic mask generation on very high resolution images without downscaling, since memory usage scales with image dimensions. Don't expect text-guided segmentation to match the precision of point prompts for specific objects, and don't use SAM for semantic segmentation tasks that require a category label on each mask.
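Downscaling before automatic mask generation can be as simple as capping the longer side. The 1024-pixel default below is an assumption chosen to match SAM's internal input resolution, not a requirement:

```python
from PIL import Image

def downscale(img: Image.Image, max_side: int = 1024) -> Image.Image:
    """Shrink an image so its longer side is at most max_side,
    preserving aspect ratio; returns the image unchanged if it
    already fits."""
    scale = max_side / max(img.size)
    if scale >= 1.0:
        return img
    w, h = img.size
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
```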

Limitations

SAM generates masks without semantic labels so post-processing is needed for classification. Large model checkpoints require significant disk space and memory for loading. Processing speed on CPU is slow for interactive applications requiring real-time feedback.
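A typical post-processing step for the missing-labels limitation is cropping each mask's tight bounding box and handing the patch to a separate classifier. A minimal sketch of the cropping step (the classifier itself is out of scope here):

```python
import numpy as np

def mask_crop(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop the tight bounding box of a boolean mask region from an
    image array, e.g. to feed the patch to a downstream classifier."""
    ys, xs = np.nonzero(mask)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```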