Faiss
Automate and integrate FAISS for fast and efficient similarity search and vector indexing
Category: productivity
Source: Orchestra-Research/AI-Research-SKILLs

FAISS is a community skill for building efficient vector similarity search systems, covering index construction, vector quantization, GPU-accelerated search, index serialization, and approximate nearest neighbor retrieval for embedding-based applications.
What Is This?
Overview
FAISS (Facebook AI Similarity Search) provides patterns for building fast similarity search over high-dimensional vector collections. It covers index construction that selects appropriate index types based on dataset size and accuracy requirements; vector quantization that compresses vectors to reduce memory usage while maintaining search quality; GPU-accelerated search that uses CUDA for throughput at scale; index serialization that saves and loads trained indexes for production deployment; and approximate nearest neighbor retrieval that trades small accuracy losses for orders-of-magnitude speed improvements. The skill enables developers to build vector search powering recommendation, retrieval, and similarity applications.
Who Should Use This
This skill serves AI engineers building retrieval-augmented generation systems, recommendation system developers implementing nearest neighbor lookup, and search engineers scaling vector similarity search to millions of embeddings.
Why Use It?
Problems It Solves
Brute-force nearest neighbor search becomes prohibitively slow as the vector collection grows beyond thousands of items. Storing full-precision vectors for millions of items requires excessive memory. CPU-only search cannot meet latency requirements for real-time applications at scale. Choosing between accuracy and speed requires understanding index type trade-offs.
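To make the scaling problem concrete, here is a minimal brute-force k-nearest-neighbor search in plain NumPy (the helper name and toy data are illustrative, not part of FAISS). Its cost is O(n · d) per query, growing linearly with the collection size, which is exactly the work that index structures avoid:

```python
import numpy as np

def brute_force_knn(vectors: np.ndarray, query: np.ndarray, k: int = 3):
    # Squared L2 distance from the query to every stored vector:
    # O(n * d) work per query, the bottleneck at scale.
    diffs = vectors - query                   # shape (n, d)
    dists = np.einsum('nd,nd->n', diffs, diffs)
    idx = np.argsort(dists)[:k]               # indices of the k closest vectors
    return idx, dists[idx]

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]],
                   dtype='float32')
query = np.array([0.9, 0.1], dtype='float32')
idx, dists = brute_force_knn(vectors, query, k=2)
# Nearest is [1.0, 0.0], then [0.0, 0.0]
```

A FAISS Flat index performs this same exact computation, only with optimized kernels; the approximate index types below exist because even optimized exhaustive search stops being viable at millions of vectors.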
Core Highlights
Index selector recommends Flat, IVF, or HNSW indexes based on dataset size and latency needs. Quantizer compresses vectors using product quantization to reduce memory by up to thirty-two times. GPU searcher accelerates retrieval using CUDA-optimized kernels. Serializer saves trained indexes to disk for deployment.
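The index selection just described can be sketched as a simple rule of thumb. The thresholds below are illustrative assumptions, not fixed FAISS recommendations:

```python
def recommend_index(n_vectors: int, low_latency: bool = False) -> str:
    # Rough heuristic: exact search is fine for small sets, IVF scales
    # to large collections, HNSW favors query latency at higher memory cost.
    if n_vectors < 100_000:
        return 'Flat'        # exact brute-force search, no training step
    if low_latency:
        return 'HNSW'        # graph index: fast queries, more memory
    return 'IVF'             # clustered index: train, then probe clusters

print(recommend_index(50_000))             # Flat
print(recommend_index(5_000_000, True))    # HNSW
```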
How to Use It?
Basic Usage
```python
import faiss
import numpy as np


class VectorIndex:
    def __init__(self, dimension: int, index_type: str = 'flat'):
        self.dim = dimension
        if index_type == 'flat':
            # Exact L2 search; no training step required
            self.index = faiss.IndexFlatL2(dimension)
        elif index_type == 'ivf':
            # Coarse quantizer assigns vectors to 100 clusters
            quantizer = faiss.IndexFlatL2(dimension)
            self.index = faiss.IndexIVFFlat(quantizer, dimension, 100)
        else:
            raise ValueError(f"unknown index_type: {index_type!r}")

    def add(self, vectors: np.ndarray):
        # IVF indexes must be trained before vectors can be added
        if hasattr(self.index, 'is_trained') and not self.index.is_trained:
            self.index.train(vectors)
        self.index.add(vectors)

    def search(self, query: np.ndarray, k: int = 10) -> tuple:
        # Returns (distances, indices), each of shape (n_queries, k)
        return self.index.search(query, k)
```
Real-World Examples
```python
def build_production_index(
    vectors: np.ndarray,
    dim: int,
    n_list: int = 256,
    m_subquant: int = 8
) -> faiss.Index:
    # IVF clustering plus product quantization for compact storage
    quantizer = faiss.IndexFlatL2(dim)
    index = faiss.IndexIVFPQ(
        quantizer, dim, n_list, m_subquant, 8)  # 8 bits per code
    index.train(vectors)
    index.add(vectors)
    index.nprobe = 16  # clusters probed per query (recall vs. latency)
    return index


def save_and_load(index: faiss.Index, path: str):
    faiss.write_index(index, path)
    loaded = faiss.read_index(path)
    return loaded


data = np.random.rand(1000000, 128).astype('float32')
idx = build_production_index(data, 128)
save_and_load(idx, 'prod.index')
```
Advanced Tips
Increase the nprobe parameter on IVF indexes to improve recall at the cost of higher latency, tuning the value against your accuracy requirements. Normalize vectors to unit length before indexing when using inner product similarity, which makes it equivalent to cosine similarity. Use the index factory string syntax for concise index construction, such as "IVF256,PQ8" for an IVF index with product quantization.
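The normalization tip can be sketched in plain NumPy. Dividing each row by its L2 norm (which is what `faiss.normalize_L2` does in place) makes inner product scores equal to cosine similarity; the helper name below is ours, not a FAISS API:

```python
import numpy as np

def normalize_rows(x: np.ndarray) -> np.ndarray:
    # Divide each row by its L2 norm so that <a, b> == cos(a, b).
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / norms

a = np.array([[3.0, 4.0]], dtype='float32')
b = np.array([[6.0, 8.0]], dtype='float32')
an, bn = normalize_rows(a), normalize_rows(b)
# Parallel vectors: inner product of the normalized pair is 1.0
score = float((an * bn).sum())
```

With normalized vectors, an IndexFlatIP (inner product) index ranks results by cosine similarity directly.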
When to Use It?
Use Cases
Build a semantic search engine over document embeddings using an IVF index for sub-millisecond retrieval. Create a recommendation system that finds similar items from millions of product vectors. Power a RAG pipeline with efficient retrieval of relevant context passages.
Related Topics
Vector search, FAISS, approximate nearest neighbors, embeddings, similarity search, and retrieval-augmented generation.
Important Notes
Requirements
FAISS library installed via pip or conda with optional GPU support. NumPy for vector data handling in float32 format. Sufficient RAM to hold the index in memory during search operations.
Usage Recommendations
Do: start with IndexFlatL2 for datasets under one hundred thousand vectors and switch to IVF or HNSW for larger collections. Train IVF indexes on a representative sample of the data for accurate cluster centroids. Benchmark recall and latency at different nprobe values to find the optimal accuracy-speed trade-off.
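Benchmarking recall, as recommended above, amounts to comparing an approximate index's results against exact ground truth from a Flat index. A minimal sketch of the recall@k metric (the helper name and toy ID arrays are illustrative):

```python
import numpy as np

def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    # Fraction of true neighbors recovered by the approximate index,
    # summed over all queries; both arrays have shape (n_queries, k).
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / exact_ids.size

exact = np.array([[0, 1, 2], [3, 4, 5]])    # ground truth from a Flat index
approx = np.array([[0, 1, 9], [3, 4, 5]])   # results from an IVF index
r = recall_at_k(approx, exact)              # 5 of 6 true neighbors found
```

Sweeping nprobe and plotting recall@k against query latency gives the accuracy-speed curve from which to pick an operating point.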
Don't: use IVF indexes without training them first, which produces poor search quality. Don't store float64 vectors without converting them to float32, the format FAISS requires. Don't assume GPU indexes always outperform CPU indexes, since small datasets may not amortize the GPU transfer overhead.
Limitations
FAISS indexes must fit in RAM or GPU memory, which limits the maximum collection size on a single machine. Product quantization reduces memory but introduces recall loss that varies with dataset characteristics. Index updates require rebuilding, since FAISS does not support efficient single-vector deletion from trained indexes.