Long Context

Optimize long-context window processing and retrieval-augmented integration for long-document LLM workflows

Long Context is a community skill for processing extended text inputs with large language models, covering context window management, document chunking, retrieval-augmented strategies, summarization pipelines, and memory optimization for long-document LLM workflows.

What Is This?

Overview

Long Context provides tools for handling documents and conversations that exceed standard LLM context window limits. Context window management tracks token usage and structures prompts to maximize useful content within model limits. Document chunking splits long texts into overlapping segments with configurable size and stride for sequential processing. Retrieval-augmented strategies embed and index document chunks for selective retrieval instead of loading entire documents. Summarization pipelines compress long documents into condensed representations that preserve key information. Memory optimization reduces GPU memory usage through techniques like attention windowing and KV cache management. Together, these components enable applications to work with documents that exceed single context window capacity.
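As a rough illustration of the context window management idea, the sketch below tracks an estimated token budget against a model limit. The `ContextBudget` name, the 4-characters-per-token heuristic, and the fixed response reserve are illustrative assumptions, not part of the skill's API.

class ContextBudget:
    """Track estimated token usage against a model's context limit."""

    def __init__(self, context_limit: int, response_reserve: int = 512):
        self.context_limit = context_limit
        self.response_reserve = response_reserve  # tokens held back for the reply
        self.used = 0

    def estimate(self, text: str) -> int:
        # Very rough heuristic: ~4 characters per token for English text.
        return len(text) // 4

    def try_add(self, text: str) -> bool:
        cost = self.estimate(text)
        if self.used + cost > self.context_limit - self.response_reserve:
            return False  # would overflow the window; chunk or retrieve instead
        self.used += cost
        return True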

Who Should Use This

This skill serves application developers processing long documents with LLMs, AI engineers building document analysis pipelines, and researchers working with extended context models for long-form tasks.

Why Use It?

Problems It Solves

Documents that exceed context window limits cannot be processed in a single LLM call, forcing applications to adopt splitting strategies. Naive text truncation discards important content from the end of documents. Loading full documents into extended context windows consumes excessive GPU memory. Multi-document question answering requires selecting relevant passages rather than processing all content.

Core Highlights

Token counter tracks prompt and document token usage against model context limits. Chunker splits documents into overlapping segments with configurable parameters. Retriever indexes and selects relevant chunks based on query similarity. Summarizer compresses long content into condensed representations.

How to Use It?

Basic Usage

class DocumentChunker:
    """Split long text into overlapping word-based segments."""

    def __init__(self, chunk_size: int = 1000, overlap: int = 200):
        if overlap >= chunk_size:
            # A stride of chunk_size - overlap <= 0 would loop forever.
            raise ValueError('overlap must be smaller than chunk_size')
        self.chunk_size = chunk_size
        self.overlap = overlap

    def chunk_text(self, text: str) -> list[dict]:
        words = text.split()
        chunks = []
        start = 0
        idx = 0
        while start < len(words):
            end = min(start + self.chunk_size, len(words))
            chunks.append({
                'id': idx,
                'text': ' '.join(words[start:end]),
                'start': start,
                'end': end,
            })
            if end == len(words):
                break  # final words covered; avoid a redundant trailing chunk
            # Advance by the stride so consecutive chunks overlap.
            start += self.chunk_size - self.overlap
            idx += 1
        return chunks

    def estimate_tokens(self, text: str) -> int:
        # Rough heuristic: ~4 tokens per 3 English words; use the model's
        # tokenizer when exact counts matter.
        return len(text.split()) * 4 // 3
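A quick usage sketch (the file path is a placeholder):

chunker = DocumentChunker(chunk_size=1000, overlap=200)
with open('paper.txt') as f:  # placeholder input document
    chunks = chunker.chunk_text(f.read())
print(len(chunks), chunker.estimate_tokens(chunks[0]['text']))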

Real-World Examples

class MapReduceSummarizer:
    """Summarize long documents by summarizing chunks, then combining."""

    def __init__(self, llm_call, chunker: DocumentChunker):
        self.llm = llm_call  # callable: prompt string -> completion string
        self.chunker = chunker

    def map_step(self, chunks: list[dict]) -> list[str]:
        summaries = []
        for chunk in chunks:
            prompt = (
                'Summarize the following text concisely:\n\n'
                f'{chunk["text"]}'
            )
            summaries.append(self.llm(prompt))
        return summaries

    def reduce_step(self, summaries: list[str]) -> str:
        combined = '\n\n'.join(summaries)
        prompt = (
            'Combine these summaries into a single coherent summary:\n\n'
            f'{combined}'
        )
        return self.llm(prompt)

    def summarize(self, document: str) -> str:
        chunks = self.chunker.chunk_text(document)
        return self.reduce_step(self.map_step(chunks))
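Wiring it together might look like the following, where `call_model` is a hypothetical stand-in for whatever LLM client you use and the file path is a placeholder:

def call_model(prompt: str) -> str:
    # Hypothetical stand-in: send the prompt to your LLM client and
    # return the completion text.
    raise NotImplementedError

summarizer = MapReduceSummarizer(
    llm_call=call_model,
    chunker=DocumentChunker(chunk_size=1500, overlap=150),
)
with open('report.txt') as f:  # placeholder input document
    print(summarizer.summarize(f.read()))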

Advanced Tips

Use overlapping chunks to prevent information loss at segment boundaries where sentences may be split between adjacent chunks. Place the most important context near the beginning and end of prompts since models attend more strongly to these positions. Implement iterative refinement by summarizing chunks then re-summarizing the summaries for very long documents.
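A minimal sketch of the iterative refinement tip, reusing DocumentChunker from Basic Usage; the function name, token budget, and round limit are illustrative assumptions:

def hierarchical_summarize(llm_call, chunker: DocumentChunker,
                           document: str, budget: int = 2000,
                           max_rounds: int = 4) -> str:
    """Repeatedly summarize and re-summarize until the text fits the budget."""
    text = document
    for _ in range(max_rounds):  # guard in case summaries fail to shrink
        if chunker.estimate_tokens(text) <= budget:
            break
        chunks = chunker.chunk_text(text)
        # Each pass compresses the text; re-chunk and repeat if still too long.
        text = '\n\n'.join(
            llm_call(f'Summarize the following text concisely:\n\n{c["text"]}')
            for c in chunks
        )
    return text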

When to Use It?

Use Cases

Summarize a long research paper by chunking it and applying map-reduce summarization. Answer questions about a large codebase by retrieving relevant file chunks based on query similarity. Process a book-length document by splitting into chapters and analyzing each section independently.
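The codebase question-answering case relies on the retriever highlighted earlier, which the usage examples above do not show. Below is a minimal similarity-based sketch; `embed` is an assumed stand-in for any embedding function mapping a string to a vector, and NumPy is assumed available.

import numpy as np

class ChunkRetriever:
    """Index chunk embeddings and return the most query-similar chunks."""

    def __init__(self, embed):
        self.embed = embed  # assumed: str -> 1-D numpy array
        self.chunks: list[dict] = []
        self.vectors: list = []

    def index(self, chunks: list[dict]) -> None:
        for chunk in chunks:
            self.chunks.append(chunk)
            self.vectors.append(self.embed(chunk['text']))

    def retrieve(self, query: str, top_k: int = 3) -> list[dict]:
        q = self.embed(query)
        # Rank chunks by cosine similarity to the query embedding.
        sims = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in self.vectors
        ]
        order = sorted(range(len(sims)), key=sims.__getitem__, reverse=True)
        return [self.chunks[i] for i in order[:top_k]]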

Related Topics

Context windows, document chunking, retrieval-augmented generation, summarization, token management, long documents, and LLM memory.

Important Notes

Requirements

LLM access with known context window limits for token budget planning. Tokenizer for accurate token count estimation. Embedding model for retrieval-based chunk selection.

Usage Recommendations

Do: estimate token counts accurately using the model's tokenizer rather than word count approximations. Use overlap between chunks to maintain context continuity at boundaries. Test chunking strategies with representative documents before production deployment.
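For example, with the tiktoken library (an assumption; substitute your model's own tokenizer, and check which encoding your model actually uses):

import tiktoken

def count_tokens(text: str, encoding_name: str = 'cl100k_base') -> int:
    """Exact token count for models using the given tiktoken encoding."""
    enc = tiktoken.get_encoding(encoding_name)
    return len(enc.encode(text))

# Chunk budget: leave room for the prompt template and the model's reply.
context_limit = 8192  # assumed model window; check your model's actual limit
prompt_overhead = count_tokens('Summarize the following text concisely:\n\n')
max_chunk_tokens = context_limit - prompt_overhead - 512  # output reserve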

Don't: truncate documents without considering which sections contain the most relevant information. Set chunk sizes larger than the model context window minus the prompt template tokens. Assume all content in a long document is equally relevant to every query.

Limitations

Chunking inevitably loses cross-chunk context that spans segment boundaries. Map-reduce summarization may drop details during compression steps. Token estimation without the exact tokenizer introduces counting errors that can cause context overflow.