Chroma

Chroma vector database automation and integration for AI-powered applications

Chroma is a community skill for embedding storage and similarity search with the Chroma vector database. It covers collection management, document ingestion, vector querying, metadata filtering, and integration with LLM applications for retrieval-augmented generation.

What Is This?

Overview

Chroma provides patterns for storing and querying vector embeddings with the Chroma database. It covers collection management (creating and configuring named collections for different document sets), document ingestion (converting text into embeddings and storing them with metadata), vector querying (finding the documents most similar to a query embedding), metadata filtering (narrowing search results by attributes such as source and date), and LLM integration (feeding retrieved documents as context into language model prompts). Together these enable retrieval-augmented generation pipelines with persistent vector storage.

Who Should Use This

This skill serves developers building RAG applications that need document retrieval, teams creating semantic search over internal knowledge bases, and ML engineers prototyping embedding-based retrieval systems. It is also useful for data engineers who need to evaluate vector database options before committing to a production-scale solution.

Why Use It?

Problems It Solves

LLM context windows cannot hold entire document collections, so selective retrieval is required. Keyword search misses semantically related documents that do not share exact terms. Embedding storage needs persistent indexing for efficient similarity queries. Combining vector similarity with metadata filters requires integrated query support.

Core Highlights

Collection API manages named document sets with configurable embedding functions. Ingestion pipeline converts text to embeddings and stores with metadata. Similarity search returns nearest neighbors ranked by vector distance. Metadata filters combine vector search with attribute constraints, allowing precise scoping of results to specific document sources, date ranges, or custom tags.
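
For example, Chroma's where clause accepts boolean and comparison operators; a minimal sketch, assuming a collection whose documents carry illustrative source and year metadata fields:

# Scope results to wiki pages from 2023 onward.
# 'source' and 'year' are illustrative metadata fields, not built-ins.
results = collection.query(
    query_texts=['memory safety'],
    n_results=5,
    where={'$and': [
        {'source': {'$eq': 'wiki'}},
        {'year': {'$gte': 2023}}]})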

How to Use It?

Basic Usage

import chromadb

# In-memory client; data is lost when the process exits.
client = chromadb.Client()

# Create (or reuse) a collection configured for cosine distance.
collection = client.get_or_create_collection(
    name='documents',
    metadata={'hnsw:space': 'cosine'})

# Ingest three documents; Chroma embeds them with its default model.
collection.add(
    ids=['doc1', 'doc2', 'doc3'],
    documents=[
        'Python is a programming language',
        'JavaScript runs in browsers',
        'Rust provides memory safety'],
    metadatas=[
        {'source': 'wiki', 'lang': 'python'},
        {'source': 'wiki', 'lang': 'javascript'},
        {'source': 'blog', 'lang': 'rust'}])

# Return the two nearest neighbors, restricted to wiki sources.
results = collection.query(
    query_texts=['web development'],
    n_results=2,
    where={'source': 'wiki'})
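
Query responses are dicts of parallel lists keyed by ids, documents, metadatas, and distances, with one inner list per query text. A short sketch of reading back the results above:

# One inner list per query text; index 0 is our single query.
for doc_id, doc, dist in zip(results['ids'][0],
                             results['documents'][0],
                             results['distances'][0]):
    print(f'{doc_id} (distance {dist:.3f}): {doc}')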

Real-World Examples

import chromadb


class RAGPipeline:
    def __init__(self, collection_name: str, persist_dir: str):
        # Persist embeddings to disk so they survive restarts.
        self.client = chromadb.PersistentClient(path=persist_dir)
        self.collection = self.client.get_or_create_collection(
            collection_name)

    def ingest(self, docs: list[dict]):
        # Each doc dict carries an 'id', its 'text', and optional 'metadata'.
        self.collection.add(
            ids=[d['id'] for d in docs],
            documents=[d['text'] for d in docs],
            metadatas=[d.get('metadata', {}) for d in docs])

    def retrieve(self, query: str, n: int = 5) -> list[str]:
        # Return the n most similar documents for a single query.
        results = self.collection.query(query_texts=[query], n_results=n)
        return results['documents'][0]

    def ask(self, question: str, llm_fn) -> str:
        # Ground the LLM answer in retrieved context.
        context = self.retrieve(question)
        prompt = ('Context:\n'
                  + '\n'.join(context)
                  + f'\n\nQuestion: {question}')
        return llm_fn(prompt)
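
A minimal usage sketch; llm_fn here is a hypothetical stand-in for whatever completion API the application wraps:

pipeline = RAGPipeline('kb', persist_dir='./chroma_data')
pipeline.ingest([
    {'id': 'a1', 'text': 'Our API rate limit is 100 requests per minute.',
     'metadata': {'source': 'docs'}},
    {'id': 'a2', 'text': 'Support hours are 9am to 5pm UTC on weekdays.',
     'metadata': {'source': 'docs'}}])

def llm_fn(prompt: str) -> str:
    # Placeholder: call your LLM of choice here.
    return f'(model response to {len(prompt)} chars of prompt)'

print(pipeline.ask('What is the API rate limit?', llm_fn))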

Advanced Tips

Use PersistentClient to save collections to disk for reuse across application restarts. Chunk large documents into smaller segments before ingestion for more precise retrieval results. Aim for chunks that represent a single coherent idea or paragraph, as overly broad chunks reduce retrieval precision. Combine where filters with where_document for metadata and content-based filtering in a single query. Use collection-level embedding functions so all documents in a collection are embedded consistently without manual vectorization.
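
To make the last two tips concrete, a sketch that pins a sentence-transformers embedding function to a collection and combines where with where_document in one query; the model name, path, and field values are illustrative, and the wrapper requires the sentence-transformers package:

import chromadb
from chromadb.utils import embedding_functions

# Every add and query on this collection uses the same model.
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name='all-MiniLM-L6-v2')

client = chromadb.PersistentClient(path='./chroma_data')
collection = client.get_or_create_collection(
    name='documents',
    embedding_function=ef,
    metadata={'hnsw:space': 'cosine'})

# Metadata filter plus full-text constraint in one query.
results = collection.query(
    query_texts=['memory safety'],
    n_results=3,
    where={'source': 'wiki'},
    where_document={'$contains': 'memory'})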

When to Use It?

Use Cases

Build a documentation chatbot that retrieves relevant sections before generating answers. Create a semantic search engine over a knowledge base of internal documents. Implement a RAG pipeline that grounds LLM responses in company-specific data.

Related Topics

Vector databases, RAG, semantic search, embeddings, and LLM applications.

Important Notes

Requirements

The chromadb Python package for client and server operations. An embedding model for document vectorization: Chroma's default works for general text, but domain-specific retrieval may need a model suited to your documents. Disk space for persistent storage of vector collections.

Usage Recommendations

Do: use persistent storage for production applications to avoid re-ingesting documents on restart. Set a distance metric that matches your embedding model, for example cosine similarity for normalized sentence-transformer embeddings. Chunk documents to roughly 200 to 500 tokens for good retrieval granularity (see the chunking sketch below).

Don't: store millions of vectors in an in-memory client, which will exhaust available RAM. Don't mix embeddings from different models in the same collection, since their distances become meaningless. Don't skip metadata on documents, as that forfeits filtering capability.
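
The chunking sketch referenced above; the whitespace split is a rough stand-in for a real tokenizer:

def chunk(text: str, max_tokens: int = 400) -> list[str]:
    # Whitespace split approximates tokens; swap in a real tokenizer as needed.
    words = text.split()
    return [' '.join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]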

Limitations

In-memory mode loses all data when the process terminates. Query performance degrades on very large collections without proper HNSW index tuning. Chroma's default embedding model may not suit domain-specific retrieval needs. Updating a document's text goes through update or upsert, which re-embed the new text and overwrite the stored entry; stored embeddings cannot be patched in place.
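
A short sketch of refreshing an existing document through update (upsert behaves the same but also inserts ids that do not yet exist):

# Re-embeds the new text and overwrites the stored entry for 'doc1'.
collection.update(
    ids=['doc1'],
    documents=['Python is a widely used programming language'],
    metadatas=[{'source': 'wiki', 'lang': 'python'}])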