Atheris

Automate and integrate Atheris fuzzing tools into your testing pipelines

Atheris is a community skill for fuzz testing Python code with Google Atheris. It covers coverage-guided fuzzing of Python functions, native extension testing, custom harness writing, crash reproduction, and integration with continuous fuzzing infrastructure.

What Is This?

Overview

Atheris provides patterns for finding bugs in Python code through coverage-guided fuzz testing. Python function fuzzing generates and mutates inputs to exercise code paths in pure Python code and CPython modules. Native extension testing fuzzes C/C++ code called through Python bindings, with AddressSanitizer (ASan) reporting memory errors. Custom harnesses create targeted fuzz targets for specific parsing functions and data processing pipelines. Crash reproduction saves failing inputs and replays them for debugging, and CI integration runs fuzzing campaigns in continuous integration with corpus management. Together, these let Python developers discover bugs with automatically generated inputs that would be impractical to construct by hand.

Who Should Use This

This skill serves Python developers testing parser and data processing code for robustness, security researchers fuzzing Python libraries with native extensions, and teams integrating fuzz testing into Python CI pipelines. It is particularly valuable for projects that accept untrusted input from external sources.

Why Use It?

Problems It Solves

Unit tests cover expected inputs but miss unexpected edge cases that cause crashes. Python code handling binary data or complex formats can have hidden parsing bugs that only surface with malformed or boundary-condition inputs. Native C extensions called from Python may have memory safety issues invisible to Python tests. Manual test case creation cannot explore the full input space of complex functions, making automated generation essential for thorough coverage.

Core Highlights

A coverage-guided engine (libFuzzer) mutates inputs to reach new Python code paths. Native extension support detects memory errors in C code through AddressSanitizer (ASan) integration. Corpus management saves interesting inputs for reuse as regression tests. Crash reproduction replays the exact failing input for debugging.
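Because an Atheris harness is an ordinary Python callable that takes bytes, a saved crash file or a whole corpus directory can be replayed without the fuzzing engine, for example under pdb or from a pytest regression test. The sketch below assumes a helper named replay_inputs (hypothetical, not part of Atheris) that runs each file once and collects the exceptions that escape the harness. Passing a crash file as a positional argument to the fuzzer script also makes libFuzzer re-run just that input.

```python
import os
from typing import Callable, List, Tuple


def replay_inputs(
  harness: Callable[[bytes], None], paths: List[str]
) -> List[Tuple[str, Exception]]:
  """Run each saved input through the harness once and collect failures.

  `paths` may name individual files (e.g. crash artifacts) or directories
  (e.g. a corpus directory); directories are expanded one level deep.
  """
  files: List[str] = []
  for path in paths:
    if os.path.isdir(path):
      for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
          files.append(full)
    else:
      files.append(path)

  failures: List[Tuple[str, Exception]] = []
  for file_path in files:
    with open(file_path, "rb") as f:
      data = f.read()
    try:
      harness(data)
    except Exception as exc:  # Anything escaping the harness is a finding
      failures.append((file_path, exc))
  return failures
```

For example, replay_inputs(test_one_input, ["corpus"]) returns an empty list when every corpus entry passes, which makes it a cheap regression check after a fix.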

How to Use It?

Basic Usage

import json
import sys

import atheris


# instrument_func adds coverage tracking to a function defined in the harness
@atheris.instrument_func
def parse_config(raw: str) -> dict:
  """Target function: parse a JSON config that must contain 'version'."""
  config = json.loads(raw)
  # Reject non-object JSON (e.g. `5`), which would otherwise raise TypeError
  if not isinstance(config, dict) or 'version' not in config:
    raise KeyError('missing version')
  return config


def test_one_input(data: bytes) -> None:
  fdp = atheris.FuzzedDataProvider(data)

  # Generate a typed input: a Unicode string of up to 1000 characters
  text = fdp.ConsumeUnicode(fdp.ConsumeIntInRange(0, 1000))

  # Call the target; json.JSONDecodeError is a subclass of ValueError
  try:
    parse_config(text)
  except (ValueError, KeyError):
    pass  # Expected errors


if __name__ == '__main__':
  atheris.Setup(sys.argv, test_one_input)
  atheris.Fuzz()

Real-World Examples

import sys

import atheris

# my_native_lib stands for a C/C++ extension; compile it with sanitizer
# flags so that memory errors in native code are reported as crashes
with atheris.instrument_imports():
  import my_native_lib


def fuzz_native(data: bytes) -> None:
  fdp = atheris.FuzzedDataProvider(data)

  buf = fdp.ConsumeBytes(fdp.ConsumeIntInRange(0, 10000))
  mode = fdp.PickValueInList(['strict', 'lenient', 'auto'])

  try:
    my_native_lib.process(buf, mode=mode)
  except (ValueError, RuntimeError):
    pass  # Expected, documented errors from the extension


if __name__ == '__main__':
  atheris.Setup(sys.argv, fuzz_native)
  atheris.Fuzz()

Advanced Tips

Wrap the imports of modules the fuzzer should explore in a with atheris.instrument_imports(): block to enable coverage tracking for them. Combine FuzzedDataProvider methods to generate structured inputs, such as JSON objects or protocol messages, from raw fuzz data. For example, consume an integer to determine an array length, then consume that many string elements to build realistic structured payloads. In CI, run fuzzing with a time limit and save the corpus so each run continues from where the previous one stopped.
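The length-then-elements pattern described above can be sketched as a small builder that turns fuzz bytes into a JSON config document. build_config_payload and its field names are illustrative assumptions; it only calls the FuzzedDataProvider methods ConsumeIntInRange, ConsumeUnicode and ConsumeBool, and takes the provider as a parameter so it can be unit-tested with a stub.

```python
import json
from typing import Any, Dict, List


def build_config_payload(fdp: Any) -> str:
  """Turn raw fuzz bytes into a structured JSON document.

  `fdp` is expected to be an atheris.FuzzedDataProvider (hypothetical
  field names below are for illustration only).
  """
  # Consume a length first, then exactly that many string elements
  count = fdp.ConsumeIntInRange(0, 8)
  items: List[str] = [fdp.ConsumeUnicode(16) for _ in range(count)]
  payload: Dict[str, Any] = {
    "version": fdp.ConsumeIntInRange(0, 3),
    "strict": fdp.ConsumeBool(),
    "items": items,
  }
  # json.dumps escapes non-ASCII and lone surrogates, so this never fails
  return json.dumps(payload)
```

In a harness, pass build_config_payload(atheris.FuzzedDataProvider(data)) to the target, such as parse_config from Basic Usage. Because every generated document is valid JSON, the fuzzer spends its effort past the JSON parser, on the target's own validation logic.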

When to Use It?

Use Cases

Fuzz a Python JSON parser to find inputs that cause unexpected exceptions or infinite loops. Test a C extension image library called from Python for buffer overflows. Run nightly fuzzing in CI with corpus accumulation for a data processing pipeline.
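For a nightly CI job, the fuzzing run can be time-boxed with libFuzzer flags, which Atheris forwards from sys.argv through atheris.Setup. The sketch assumes a harness script path and corpus directory chosen by your pipeline; build_fuzz_command is a hypothetical helper. The flags themselves (-max_total_time, -timeout, -max_len, -artifact_prefix) are standard libFuzzer options, and -timeout is what turns an infinite loop into a reported finding.

```python
import sys
from typing import List


def build_fuzz_command(
  harness_path: str,
  corpus_dir: str,
  max_total_time: int = 600,
  timeout: int = 10,
  max_len: int = 4096,
  artifact_prefix: str = "crashes/",
) -> List[str]:
  """Build the command line for a time-boxed CI fuzzing run.

  The positional corpus directory is read at start-up and new interesting
  inputs are written back to it, so it can be cached between CI runs.
  """
  return [
    sys.executable,
    harness_path,
    corpus_dir,
    f"-max_total_time={max_total_time}",  # Stop the campaign after N seconds
    f"-timeout={timeout}",  # Report any single input running longer than N seconds
    f"-max_len={max_len}",  # Cap generated input size in bytes
    f"-artifact_prefix={artifact_prefix}",  # Where crash/timeout files are saved
  ]
```

Run it with subprocess.run(build_fuzz_command("fuzz_config.py", "corpus")), create the crashes/ directory beforehand, and cache the corpus directory between jobs; a non-zero exit status means the fuzzer found a crash or timeout.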

Related Topics

Fuzz testing, Atheris, Python security, coverage-guided fuzzing, and native extension testing.

Important Notes

Requirements

Python 3.6 or later with the atheris package installed (pip install atheris). For native extension fuzzing, a C compiler with ASan support is required, and the extension must be compiled with sanitizer flags enabled. Also prepare an initial seed corpus of representative valid inputs for the target function.
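A seed corpus is simply a directory of input files. The sketch below uses a hypothetical helper, write_seed_corpus, that writes representative valid inputs named by the SHA-1 of their content. This matches how libFuzzer names the inputs it adds to a corpus, and it collapses duplicate samples into one file.

```python
import hashlib
import os
from typing import Iterable, List


def write_seed_corpus(corpus_dir: str, samples: Iterable[bytes]) -> List[str]:
  """Write seed inputs into corpus_dir, one file per distinct sample."""
  os.makedirs(corpus_dir, exist_ok=True)
  paths: List[str] = []
  for sample in samples:
    # Content-addressed file name: identical samples map to the same file
    path = os.path.join(corpus_dir, hashlib.sha1(sample).hexdigest())
    if not os.path.exists(path):
      with open(path, "wb") as f:
        f.write(sample)
    paths.append(path)
  return paths
```

Pass the directory as the first positional argument to the harness script so the fuzzer starts from these inputs instead of from empty data.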

Usage Recommendations

Do: instrument imported modules with atheris.instrument_imports for full coverage tracking. Catch expected exceptions in the harness to focus fuzzing on unexpected failures. Save the corpus directory between runs for incremental progress.

Don't: catch all exceptions in the harness, as this hides bugs the fuzzer should report. Avoid fuzzing functions with heavy side effects, such as network calls or file writes, without mocking them. Avoid excessively large -max_len values, which slow down fuzzing throughput.

Limitations

Atheris coverage tracking adds overhead, so Python targets execute more slowly than native code fuzzed directly with C tools such as libFuzzer. Fuzzing pure Python code does not surface memory-safety bugs, because those arise only in native code such as C extensions. Some Python builtins and C extensions may not be fully instrumentable.