Analyzing PDF Malware with PDFiD
Analyzes malicious PDF files using PDFiD, pdf-parser, and peepdf to identify embedded JavaScript, shellcode,
What Is This
"Analyzing PDF Malware with PDFiD" is a technical skill designed to help cybersecurity professionals and analysts identify and triage malicious PDF documents. This skill makes use of Didier Stevens’ PDFiD and pdf-parser tools, along with peepdf, to scan PDF files for suspicious objects, embedded scripts, shellcode, and exploit code. It allows analysts to perform static analysis of PDF files without opening or rendering them, reducing the risk of accidental exploitation. By examining the internal structure of a PDF, this skill helps uncover indicators of compromise, embedded payloads, and exploit techniques commonly used in targeted attacks and spam campaigns.
Why Use It
Malicious PDF files are a common vector for delivering malware, phishing links, and exploits targeting vulnerabilities in PDF readers, especially Adobe Reader. Attackers often embed JavaScript, shellcode, or files within PDFs to trigger exploits or drop additional payloads. Traditional antivirus solutions may miss sophisticated or obfuscated threats within PDFs. Analyzing the file’s structure with PDFiD and related tools enables you to:
- Detect the presence of embedded JavaScript, launch actions, or automatic execution triggers
- Identify suspicious objects such as embedded files, streams, or URLs
- Extract and inspect potential payloads or exploit code
- Perform rapid triage of potentially malicious attachments without executing them
- Reduce risk by analyzing files statically before any dynamic or sandbox-based analysis
How to Use It
Prerequisites
- Python 3.8 or above
- Didier Stevens’ tools installed via pip:
pip install pdfid pdf-parser
- peepdf installed for deeper, interactive analysis:
pip install peepdf
Step 1: Initial Structure Scanning with PDFiD
PDFiD performs a lightweight scan for suspicious keywords and objects within a PDF file. It does not fully parse the document structure; it counts occurrences of selected keywords, which gives a quick overview of potentially dangerous features.
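To illustrate the idea of a keyword scan, here is a minimal sketch in Python. It is not PDFiD's actual implementation: the function names, the keyword list (taken from the sample output in this section), and the sample bytes are assumptions made for illustration. It counts PDF name tokens in raw bytes and decodes `#xx` hex escapes, so a name written as `/J#61vaScript` is still counted as `/JavaScript`.

```python
import re
from collections import Counter

# Keywords of interest, taken from the sample output in this section.
KEYWORDS = ["/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"]

NAME_RE = re.compile(rb"/[A-Za-z0-9#]+")          # a PDF name token such as /JS
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")  # a #xx escape inside a name


def normalize_name(raw: bytes) -> str:
    """Decode #xx hex escapes so that /J#61vaScript becomes /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_keywords(data: bytes) -> dict:
    """Count selected name tokens in raw bytes without parsing the document."""
    names = Counter(normalize_name(m) for m in NAME_RE.findall(data))
    return {kw: names.get(kw, 0) for kw in KEYWORDS}


# Hypothetical fragment of a PDF body, used only to exercise the function.
sample = (
    b"1 0 obj << /OpenAction 2 0 R >> endobj "
    b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj "
    b"3 0 obj << /S /J#61vaScript /JS 4 0 R >> endobj"
)
print(count_keywords(sample))
```

A scan like this is fast because it never builds an object tree, which is also why it cannot see keywords hidden inside compressed streams.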
Example usage:
pdfid malicious.pdf
Sample output:
PDFiD 0.2.7 malicious.pdf
/JavaScript 2
/JS 2
/OpenAction 1
/AA 0
/Launch 0
/EmbeddedFile 1
...
Interpretation:
- /JavaScript and /JS indicate embedded scripts, commonly used in exploits.
- /OpenAction can trigger code execution when the document is opened.
- /EmbeddedFile suggests the presence of additional embedded content.
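When many files need triage, the interpretation can be automated. The sketch below assumes the output has the "keyword count" layout shown in the sample; the function names and the `SUSPICIOUS` table are hypothetical and cover only the keywords discussed in this step.

```python
import re

# One keyword line, e.g. "/JS 2". A trailing "(1)" after the count is ignored.
LINE_RE = re.compile(r"\s*(/\S+)\s+(\d+)")

# Keywords discussed in this step, with the reason they deserve attention.
SUSPICIOUS = {
    "/JavaScript": "embedded script",
    "/JS": "embedded script",
    "/OpenAction": "action runs when the document is opened",
    "/EmbeddedFile": "additional embedded content",
}


def parse_pdfid_output(text: str) -> dict:
    """Turn PDFiD-style text output into a {keyword: count} mapping."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # the banner line does not start with "/" and is skipped
            counts[match.group(1)] = int(match.group(2))
    return counts


def flag_keywords(counts: dict) -> list:
    """List the suspicious keywords that occur at least once."""
    return [
        f"{kw}: {why} (count {counts[kw]})"
        for kw, why in SUSPICIOUS.items()
        if counts.get(kw, 0) > 0
    ]


sample_output = """PDFiD 0.2.7 malicious.pdf
/JavaScript 2
/JS 2
/OpenAction 1
/AA 0
/Launch 0
/EmbeddedFile 1
"""

for finding in flag_keywords(parse_pdfid_output(sample_output)):
    print(finding)
```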
Step 2: Deep Dive with pdf-parser
After identifying suspicious elements, use pdf-parser to inspect or extract specific objects, streams, or scripts.
Example usage:
pdf-parser.py -a malicious.pdf
This prints summary statistics for the document's objects and streams.
To extract a suspicious stream (e.g., object 8):
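pdf-parser.py -o 8 -f -d stream8.bin malicious.pdf
- -o 8 selects object 8
- -f passes the stream through its filters (for example FlateDecode) so the content is decoded
- -d stream8.bin writes the stream content to the named file for further analysis (stream8.bin is an example name)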
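Streams in PDF objects are often compressed with the /FlateDecode filter, so a stream only becomes readable once it has been decompressed. The sketch below shows that decoding step with Python's zlib module. It is an illustration, not pdf-parser's code: the function name and the sample object are hypothetical, and it assumes a direct /Length value, a single FlateDecode filter, and a plain "\n" after the stream keyword.

```python
import re
import zlib


def decode_flate_object(obj: bytes) -> bytes:
    """Decompress the stream of a single PDF object that uses /FlateDecode."""
    # Assumes /Length is a direct integer, not an indirect reference.
    length = int(re.search(rb"/Length\s+(\d+)", obj).group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(obj[start:start + length])


# Build a hypothetical object 8 whose stream holds a short script.
payload = b"app.alert('example');"
compressed = zlib.compress(payload)
obj = (
    b"8 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)

print(decode_flate_object(obj))
```

Real samples frequently chain several filters or use indirect lengths, which is where a dedicated parser is the better choice.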
Step 3: Interactive Analysis with peepdf
For complex or heavily obfuscated PDFs, peepdf offers an interactive shell for navigating objects, streams, and scripts.
Example usage:
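peepdf -i malicious.pdf
The -i option starts the interactive console. Within peepdf, you can:
- List objects:
info
- View JavaScript:
js
- Extract embedded files:
extract
- Search for suspicious keywords:
/keyword
Command names differ between peepdf releases; the console's help command lists the ones available in your version.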
Typical Analysis Workflow
- Run PDFiD to quickly determine if the PDF warrants further investigation.
- Use pdf-parser to extract or analyze suspicious objects revealed by PDFiD.
- If needed, launch peepdf for interactive exploration, deobfuscation, and payload extraction.
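The workflow above can be expressed as a small decision function. This is a sketch under stated assumptions: `plan_triage`, `TRIGGER_KEYWORDS`, and the `heavily_obfuscated` flag are hypothetical names, and whether a file counts as heavily obfuscated remains the analyst's judgment.

```python
# Keywords from the PDFiD sample output that indicate active or embedded content.
TRIGGER_KEYWORDS = ("/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile")


def plan_triage(counts: dict, heavily_obfuscated: bool = False) -> list:
    """Return the ordered list of tools to apply to one PDF."""
    steps = ["pdfid"]  # always start with the lightweight scan
    if any(counts.get(kw, 0) > 0 for kw in TRIGGER_KEYWORDS):
        steps.append("pdf-parser")  # inspect the objects PDFiD pointed to
        if heavily_obfuscated:
            steps.append("peepdf")  # interactive exploration when needed
    return steps


print(plan_triage({"/JS": 2, "/OpenAction": 1}))
print(plan_triage({"/JS": 2}, heavily_obfuscated=True))
print(plan_triage({"/JS": 0}))
```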
When to Use It
This skill is valuable when:
- A suspicious PDF is reported by users or flagged by email security systems
- You need to assess a PDF document for embedded JavaScript, exploits, or payloads before opening it
- You are triaging potentially malicious attachments in a forensic or SOC environment
- You are investigating known PDF exploit kits or targeted attack campaigns
- You need to extract embedded executables, scripts, or suspicious URLs from PDF files
It is not designed for analyzing the visual or rendered content of PDFs. Its focus is on the static, structural analysis of the file format for evidence of malicious activity.
Important Notes
- Never open suspicious PDFs in a standard PDF reader before analysis, as exploits may trigger on load.
- PDFiD provides a high-level overview but does not decode obfuscated scripts or deeply nested objects; always follow up with pdf-parser or peepdf.
- Some malware uses advanced obfuscation or encryption. Manual inspection, scripting, and deeper forensic analysis may be necessary.
- This skill focuses on static analysis. For behavioral analysis, use dynamic sandbox environments after initial triage.
- Keep analysis tools up to date to detect new exploitation techniques.
By systematically applying PDFiD, pdf-parser, and peepdf, analysts can quickly triage and investigate suspicious PDF files, identify embedded threats, and extract malicious payloads for further study. This approach significantly reduces the risk of accidental infection and improves the detection of document-based malware.