Fuzzing Dictionary
Fuzzing Dictionary automation and integration for security testing workflows
Fuzzing Dictionary is a community skill for creating and managing fuzzing input dictionaries. It covers token extraction from target programs, format-aware dictionary construction, mutation seed selection, dictionary optimization, and corpus management, all aimed at making coverage-guided fuzzing more effective.
What Is This?
Overview
Fuzzing Dictionary provides patterns for building input dictionaries that improve fuzzing coverage. It covers token extraction, which scans target program source code and binaries for magic values, keywords, and format markers; format-aware construction, which builds dictionaries matching expected input formats such as JSON, XML, or protocol buffers; mutation seed selection, which picks initial inputs most likely to trigger new code paths; dictionary optimization, which removes redundant entries and prioritizes tokens by coverage impact; and corpus management, which maintains a minimized set of inputs that maximizes coverage. Together these help security researchers improve fuzzing campaign effectiveness.
Who Should Use This
This skill serves security researchers running fuzzing campaigns against parsers and protocols, software engineers adding fuzz testing to CI pipelines, and QA teams using fuzzing for robustness testing.
Why Use It?
Problems It Solves
Fuzzing without dictionaries wastes time generating inputs the parser immediately rejects. Random mutation alone struggles to produce valid structure needed to reach deep code paths. Large dictionaries with redundant tokens slow down the fuzzing loop without improving coverage. Initial seed corpora without strategic selection lead to slow coverage growth.
Core Highlights
Token extractor scans binaries for string constants and magic values relevant to the input format. Format builder constructs dictionaries matching target format grammar tokens. Seed selector ranks initial inputs by coverage potential and minimality. Optimizer prunes dictionaries to remove redundant tokens.
How to Use It?
Basic Usage
import re
from pathlib import Path

class DictBuilder:
    """Collects candidate dictionary tokens and exports them in the
    one-quoted-token-per-line format that AFL-style fuzzers read."""

    def __init__(self):
        self.tokens = set()

    def extract_strings(self, source_path: str):
        # Pull short string literals out of the target's source code.
        content = Path(source_path).read_text()
        strings = re.findall(r'"([^"]{1,64})"', content)
        self.tokens.update(strings)

    def add_format_tokens(self, fmt: str):
        # Grammar tokens for common input formats.
        formats = {
            'json': ['{', '}', '[', ']', ':', ',', 'null', 'true', 'false'],
            'xml': ['<?xml', '<!', '</', '/>', '&', 'CDATA'],
            'http': ['GET', 'POST', 'Content-Type', 'Host', 'HTTP/1.1'],
        }
        self.tokens.update(formats.get(fmt, []))

    def export(self, output: str):
        # One quoted token per line, sorted for stable diffs.
        with open(output, 'w') as f:
            for token in sorted(self.tokens):
                f.write(f'"{token}"\n')

Real-World Examples
from pathlib import Path

class CorpusManager:
    """Loads a seed corpus and minimizes it by size and uniqueness."""

    def __init__(self, corpus_dir: str):
        self.dir = Path(corpus_dir)
        self.inputs = []

    def load(self):
        for f in self.dir.glob('*'):
            if f.is_file():
                self.inputs.append({
                    'path': f,
                    'size': f.stat().st_size,
                    'content': f.read_bytes(),
                })

    def minimize(self, max_size: int = 4096) -> list:
        # Prefer smaller inputs, drop exact duplicates, and cap file size
        # so the fuzzer's mutation loop stays fast.
        self.inputs.sort(key=lambda x: x['size'])
        unique = []
        seen = set()
        for inp in self.inputs:
            h = hash(inp['content'])
            if h not in seen and inp['size'] <= max_size:
                seen.add(h)
                unique.append(inp)
        return unique

Advanced Tips
Extract tokens from both source-code string literals and printable strings in the compiled binary, since compiled code may contain tokens that are not visible in the source. Use format-specific dictionary entries as initial seeds, paired with structural mutations, to reach deep parsing logic. Run periodic corpus minimization to keep the seed set small and focused on unique coverage.
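The binary-side extraction mentioned above can be sketched as a scan for printable-ASCII runs, similar to the Unix strings(1) utility. The function name and length thresholds here are illustrative, not part of the skill's API:

```python
import re
from pathlib import Path

def extract_binary_strings(binary_path: str, min_len: int = 4, max_len: int = 64) -> set:
    # Scan raw bytes for runs of printable ASCII so tokens baked in at
    # compile time are recovered even when they never appear in source.
    data = Path(binary_path).read_bytes()
    pattern = rb'[\x20-\x7e]{%d,%d}' % (min_len, max_len)
    return {m.decode('ascii') for m in re.findall(pattern, data)}
```

The minimum-length floor matters in practice: without it, two- and three-byte instruction sequences that happen to be printable flood the dictionary with noise.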
When to Use It?
Use Cases
Build a fuzzing dictionary for a JSON parser by extracting format tokens and string constants. Optimize a fuzzing corpus by removing redundant inputs while preserving coverage. Create format-aware seed inputs for fuzzing an HTTP server request parser.
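The JSON-parser use case can be sketched end to end, mirroring what DictBuilder above does. The sample source line and output path here are illustrative:

```python
import re
import tempfile
from pathlib import Path

# JSON grammar tokens (the same set DictBuilder uses for 'json').
JSON_TOKENS = ['{', '}', '[', ']', ':', ',', 'null', 'true', 'false']

# Toy parser source; in practice this is read from the target's code.
source = 'if (strcmp(key, "timestamp") == 0) return parse_ts(value); // "utf-8"'
extracted = set(re.findall(r'"([^"]{1,64})"', source))

# Merge grammar tokens with extracted constants and export one quoted
# token per line, the plain-dictionary format AFL-style fuzzers read.
tokens = sorted(set(JSON_TOKENS) | extracted)
dict_path = Path(tempfile.mkdtemp()) / 'json.dict'
dict_path.write_text(''.join(f'"{t}"\n' for t in tokens))
```

The resulting file contains both structural tokens like `"{"` and target-specific keys like `"timestamp"`, which is the combination the Do/Don't guidance below recommends.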
Related Topics
Fuzzing, dictionary generation, corpus management, coverage-guided testing, security testing, and input mutation.
Important Notes
Requirements
Target program source code or binary for token extraction. Fuzzing engine such as AFL or libFuzzer that supports dictionary inputs. Coverage measurement tool for evaluating dictionary effectiveness.
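As a sketch of how the exported dictionary plugs into the engines named above, both AFL-style fuzzers and libFuzzer accept a dictionary file at startup; the target names and paths here are hypothetical:

```shell
# AFL/AFL++: -x points at the dictionary file, @@ marks the input file
afl-fuzz -i seeds/ -o findings/ -x tokens.dict -- ./target_parser @@

# libFuzzer: -dict= does the same for a libFuzzer-instrumented binary
./target_fuzzer -dict=tokens.dict seed_corpus/
```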
Usage Recommendations
Do: combine extracted tokens with format-specific grammar tokens for comprehensive dictionaries. Measure coverage improvement when adding dictionary entries to verify they contribute to testing depth. Keep individual dictionary tokens short since the fuzzer combines them during mutation.
Don't: create dictionaries with thousands of entries without measuring coverage impact, since large dictionaries slow the fuzzing loop. Include long multi-word strings as single dictionary entries, since fuzzers work better with atomic tokens. Skip corpus minimization, which lets the seed set grow with redundant inputs.
Limitations
Dictionary-based fuzzing helps reach code behind format checks but cannot replace structure-aware grammar-based fuzzing for deeply nested formats. Token extraction from binaries may include irrelevant strings that add noise to the dictionary. Coverage-guided corpus minimization requires instrumented builds which may not be available for all targets.