Faiss
Automate and integrate FAISS for fast and efficient similarity search and vector indexing
Category: productivity
Source: Orchestra-Research/AI-Research-SKILLs

FAISS is a community skill for building efficient vector similarity search systems, covering index construction, vector quantization, GPU-accelerated search, index serialization, and approximate nearest neighbor retrieval for embedding-based applications.
What Is This?
Overview
FAISS (Facebook AI Similarity Search) provides patterns for building fast similarity search over high-dimensional vector collections. It covers index construction that selects appropriate index types based on dataset size and accuracy requirements; vector quantization that compresses vectors to reduce memory usage while maintaining search quality; GPU-accelerated search that uses CUDA for throughput at scale; index serialization that saves and loads trained indexes for production deployment; and approximate nearest neighbor retrieval that trades small accuracy losses for orders-of-magnitude speed improvements. The skill enables developers to build vector search powering recommendation, retrieval, and similarity applications.
Who Should Use This
This skill serves AI engineers building retrieval-augmented generation systems, recommendation system developers implementing nearest neighbor lookup, and search engineers scaling vector similarity search to millions of embeddings.
Why Use It?
Problems It Solves
Brute-force nearest neighbor search becomes prohibitively slow as the vector collection grows beyond thousands of items. Storing full-precision vectors for millions of items requires excessive memory. CPU-only search cannot meet latency requirements for real-time applications at scale. Choosing between accuracy and speed requires understanding index type trade-offs.
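To make the scaling problem concrete, here is a minimal brute-force k-nearest-neighbor search in plain NumPy (the helper name and toy data are illustrative, not part of FAISS). Its cost is O(n · d) per query, growing linearly with the collection size, which is exactly the work that index structures avoid:

```python
import numpy as np

def brute_force_knn(vectors: np.ndarray, query: np.ndarray, k: int = 3):
    # Squared L2 distance from the query to every stored vector:
    # O(n * d) work per query, the bottleneck at scale.
    diffs = vectors - query                   # shape (n, d)
    dists = np.einsum('nd,nd->n', diffs, diffs)
    idx = np.argsort(dists)[:k]               # indices of the k closest vectors
    return idx, dists[idx]

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]],
                   dtype='float32')
query = np.array([0.9, 0.1], dtype='float32')
idx, dists = brute_force_knn(vectors, query, k=2)
# Nearest is [1.0, 0.0], then [0.0, 0.0]
```

A FAISS Flat index performs this same exact computation, only with optimized kernels; the approximate index types below exist because even optimized exhaustive search stops being viable at millions of vectors.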
Core Highlights
Index selector recommends Flat, IVF, or HNSW indexes based on dataset size and latency needs. Quantizer compresses vectors using product quantization to reduce memory by up to thirty-two times. GPU searcher accelerates retrieval using CUDA-optimized kernels. Serializer saves trained indexes to disk for deployment.
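The index selection just described can be sketched as a simple rule of thumb. The thresholds below are illustrative assumptions, not fixed FAISS recommendations:

```python
def recommend_index(n_vectors: int, low_latency: bool = False) -> str:
    # Rough heuristic: exact search is fine for small sets, IVF scales
    # to large collections, HNSW favors query latency at higher memory cost.
    if n_vectors < 100_000:
        return 'Flat'        # exact brute-force search, no training step
    if low_latency:
        return 'HNSW'        # graph index: fast queries, more memory
    return 'IVF'             # clustered index: train, then probe clusters

print(recommend_index(50_000))             # Flat
print(recommend_index(5_000_000, True))    # HNSW
```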
How to Use It?
Basic Usage
```python
import faiss
import numpy as np


class VectorIndex:
    def __init__(self, dimension: int, index_type: str = 'flat'):
        self.dim = dimension
        if index_type == 'flat':
            # Exact L2 search; no training step required
            self.index = faiss.IndexFlatL2(dimension)
        elif index_type == 'ivf':
            # Coarse quantizer assigns vectors to 100 clusters
            quantizer = faiss.IndexFlatL2(dimension)
            self.index = faiss.IndexIVFFlat(quantizer, dimension, 100)
        else:
            raise ValueError(f"unknown index_type: {index_type!r}")

    def add(self, vectors: np.ndarray):
        # IVF indexes must be trained before vectors can be added
        if hasattr(self.index, 'is_trained') and not self.index.is_trained:
            self.index.train(vectors)
        self.index.add(vectors)

    def search(self, query: np.ndarray, k: int = 10) -> tuple:
        # Returns (distances, indices), each of shape (n_queries, k)
        return self.index.search(query, k)
```
Real-World Examples
```python
def build_production_index(
    vectors: np.ndarray,
    dim: int,
    n_list: int = 256,
    m_subquant: int = 8
) -> faiss.Index:
    # IVF clustering plus product quantization for compact storage
    quantizer = faiss.IndexFlatL2(dim)
    index = faiss.IndexIVFPQ(
        quantizer, dim, n_list, m_subquant, 8)  # 8 bits per code
    index.train(vectors)
    index.add(vectors)
    index.nprobe = 16  # clusters probed per query (recall vs. latency)
    return index


def save_and_load(index: faiss.Index, path: str):
    faiss.write_index(index, path)
    loaded = faiss.read_index(path)
    return loaded


data = np.random.rand(1000000, 128).astype('float32')
idx = build_production_index(data, 128)
save_and_load(idx, 'prod.index')
```
Advanced Tips
Increase the nprobe parameter on IVF indexes to improve recall at the cost of higher latency, tuning the value against your accuracy requirements. Normalize vectors to unit length before indexing when using inner product similarity, which makes it equivalent to cosine similarity. Use the index factory string syntax for concise index construction, such as "IVF256,PQ8" for an IVF index with product quantization.
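The normalization tip can be sketched in plain NumPy. Dividing each row by its L2 norm (which is what `faiss.normalize_L2` does in place) makes inner product scores equal to cosine similarity; the helper name below is ours, not a FAISS API:

```python
import numpy as np

def normalize_rows(x: np.ndarray) -> np.ndarray:
    # Divide each row by its L2 norm so that <a, b> == cos(a, b).
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / norms

a = np.array([[3.0, 4.0]], dtype='float32')
b = np.array([[6.0, 8.0]], dtype='float32')
an, bn = normalize_rows(a), normalize_rows(b)
# Parallel vectors: inner product of the normalized pair is 1.0
score = float((an * bn).sum())
```

With normalized vectors, an IndexFlatIP (inner product) index ranks results by cosine similarity directly.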
When to Use It?
Use Cases
Build a semantic search engine over document embeddings using an IVF index for sub-millisecond retrieval. Create a recommendation system that finds similar items from millions of product vectors. Power a RAG pipeline with efficient retrieval of relevant context passages.
Related Topics
Vector search, FAISS, approximate nearest neighbors, embeddings, similarity search, and retrieval-augmented generation.
Important Notes
Requirements
FAISS library installed via pip or conda with optional GPU support. NumPy for vector data handling in float32 format. Sufficient RAM to hold the index in memory during search operations.
Usage Recommendations
Do: start with IndexFlatL2 for datasets under one hundred thousand vectors and switch to IVF or HNSW for larger collections. Train IVF indexes on a representative sample of the data for accurate cluster centroids. Benchmark recall and latency at different nprobe values to find the optimal accuracy-speed trade-off.
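Benchmarking recall, as recommended above, amounts to comparing an approximate index's results against exact ground truth from a Flat index. A minimal sketch of the recall@k metric (the helper name and toy ID arrays are illustrative):

```python
import numpy as np

def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    # Fraction of true neighbors recovered by the approximate index,
    # summed over all queries; both arrays have shape (n_queries, k).
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / exact_ids.size

exact = np.array([[0, 1, 2], [3, 4, 5]])    # ground truth from a Flat index
approx = np.array([[0, 1, 9], [3, 4, 5]])   # results from an IVF index
r = recall_at_k(approx, exact)              # 5 of 6 true neighbors found
```

Sweeping nprobe and plotting recall@k against query latency gives the accuracy-speed curve from which to pick an operating point.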
Don't: use IVF indexes without training them first, which produces poor search quality. Don't store float64 vectors without converting them to float32, the format FAISS requires. Don't assume GPU indexes always outperform CPU indexes, since small datasets may not amortize the GPU transfer overhead.
Limitations
FAISS indexes must fit in RAM or GPU memory, which limits the maximum collection size on a single machine. Product quantization reduces memory but introduces recall loss that varies with dataset characteristics. Index updates require rebuilding, since FAISS does not support efficient single-vector deletion from trained indexes.