Analyzing Malicious PDF with peepdf
Perform static analysis of malicious PDF documents using peepdf, pdfid, and pdf-parser to extract embedded JavaScript, shellcode, and other suspicious objects.
What Is This
"Analyzing Malicious PDF with peepdf" is a cybersecurity skill focused on using peepdf, together with the companion tools pdfid and pdf-parser, to perform static analysis of potentially malicious PDF documents. Analysts, incident responders, and forensic investigators use it to identify and extract embedded JavaScript, shellcode, and other suspicious objects within PDF files. Because these tools parse the file instead of rendering it, you can dissect weaponized PDF documents without executing their payloads.
peepdf is a Python-based tool for detailed PDF analysis. Its interactive shell lets analysts traverse the internal object structure of a PDF, decode streams, and extract embedded content. Didier Stevens’ pdfid and pdf-parser complement it: pdfid provides rapid triage and pdf-parser provides in-depth object parsing. Together they form a robust workflow for static PDF malware analysis.
Why Use It
Malicious PDF documents are commonly used in phishing campaigns and targeted attacks. Attackers often embed harmful JavaScript, exploits, or even executable payloads within PDFs to compromise end-user systems. Traditional antivirus solutions may fail to detect such threats, especially if obfuscation or novel exploitation techniques are used.
Static analysis using peepdf and related tools offers several advantages:
- Safe Inspection: Examine the PDF’s structure and content without triggering any embedded exploits or payloads.
- Detailed Object Analysis: Identify suspicious objects, streams, and encoded data that could harbor malicious code.
- Extraction of Artifacts: Extract and analyze embedded JavaScript, shellcode, and files for further reverse engineering or sandbox execution.
- Signature Development: Gather indicators of compromise (IOCs) and behavioral patterns to enhance detection rules for security tools.
- Forensic Integrity: Maintain evidence integrity by working in a controlled, read-only manner.
This skill is essential for malware analysts, DFIR (Digital Forensics and Incident Response) professionals, and security engineers who need to dissect, understand, and document PDF-based threats.
How to Use It
Prerequisites
Before starting, ensure you have:
- Python 3.8+ with peepdf-3 installed (pip install peepdf-3)
- Didier Stevens’ pdfid.py and pdf-parser.py (download from his GitHub repository)
- An isolated analysis environment (such as a virtual machine or sandbox)
- Optionally, PyV8 for JavaScript emulation and Pylibemu for shellcode analysis
Step 1:
Triage with pdfid
Begin by scanning the suspicious PDF for known exploit indicators using pdfid:
python pdfid.py suspicious.pdf
Look for the presence of suspicious keywords, such as /JavaScript, /JS, /OpenAction, /Launch, and /EmbeddedFile. A nonzero count for any of these keywords suggests the PDF may contain embedded scripts or payloads.
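The idea behind this triage step can be sketched in a few lines of Python. This is a minimal illustration, not pdfid itself: it assumes the keywords of interest appear as plain name tokens in the raw file, it decodes the #xx hex escapes sometimes used to disguise names such as /J#61vaScript, and it cannot see names hidden inside compressed streams. The function names are hypothetical.

```python
import re
from collections import Counter

# Keywords named in the pdfid step above; pdfid itself checks a longer list.
SUSPICIOUS = ["/JavaScript", "/JS", "/OpenAction", "/Launch", "/EmbeddedFile"]

# A PDF name is "/" followed by regular characters (no whitespace or delimiters).
NAME_RE = re.compile(rb"/[^\s/<>\[\]()\{\}%]+")
# Names may hide characters as #xx hex escapes, e.g. /J#61vaScript.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Decode #xx escapes so /J#61vaScript is counted as /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_keywords(data: bytes) -> Counter:
    """Count suspicious name tokens in the raw bytes of a PDF."""
    counts = Counter({keyword: 0 for keyword in SUSPICIOUS})
    for match in NAME_RE.finditer(data):
        name = normalize_name(match.group(0))
        if name in counts:
            counts[name] += 1
    return counts


def scan_file(path: str) -> Counter:
    """Read the file in binary mode (never render it) and count keywords."""
    with open(path, "rb") as handle:
        return count_keywords(handle.read())
```

Because the counts come from raw bytes, anything stored inside compressed object streams is invisible to this sketch, which is one reason to continue with pdf-parser and peepdf.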
Step 2:
Parse Objects with pdf-parser
Use pdf-parser to identify and extract objects of interest:
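python pdf-parser.py suspicious.pdf
To filter for objects containing JavaScript:
python pdf-parser.py suspicious.pdf --search javascript
To decode the stream of a specific object (for example, object 5) and dump it to a file, select the object with -o, apply its stream filters with -f, and name the output file with -d:
python pdf-parser.py suspicious.pdf -o 5 -f -d object5.bin
Step 3: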
Interactive Analysis with peepdf
Invoke peepdf in interactive shell mode:
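peepdf -i suspicious.pdf
Inside the peepdf shell, run info without arguments for a summary of the document, including the object IDs in each version and any suspicious elements peepdf detected (tree shows the logical structure instead):
> info
To inspect a specific object (e.g., object 7):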
> info 7
To dump and decode suspicious streams:
> stream 7
peepdf can automatically decode common encodings (FlateDecode, ASCIIHexDecode, etc.), making it easier to analyze obfuscated content.
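The decoding that peepdf's stream command performs can be approximated as follows: locate each indirect object, then run its stream through the filters named in its dictionary. This is a simplified sketch under stated assumptions: it uses regular expressions instead of a real tokenizer, ignores /Length, /DecodeParms, and objects packed inside /ObjStm streams, and supports only FlateDecode and ASCIIHexDecode. All function names are hypothetical.

```python
import re
import zlib
from typing import Iterator, Optional, Tuple

# "N G obj ... endobj", matched non-greedily. Simplification: objects packed
# in /ObjStm streams, incremental updates, and malformed files are not handled.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.S)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)
# Only these two filters are supported; they are applied in the order listed.
FILTER_RE = re.compile(rb"/(FlateDecode|ASCIIHexDecode)\b")


def iter_objects(pdf: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (object number, generation, body) for each indirect object."""
    for m in OBJ_RE.finditer(pdf):
        yield int(m.group(1)), int(m.group(2)), m.group(3)


def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: whitespace is ignored and '>' marks end of data."""
    digits = re.sub(rb"\s", b"", data.split(b">")[0])
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits.decode("ascii"))


def decode_stream(body: bytes) -> Optional[bytes]:
    """Return the decoded stream of an object body, or None if it has none."""
    m = STREAM_RE.search(body)
    if m is None:
        return None
    data, dictionary = m.group(1), body[: m.start()]
    for name in FILTER_RE.findall(dictionary):
        if name == b"FlateDecode":
            data = zlib.decompress(data)
        else:
            data = ascii_hex_decode(data)
    return data
```

Applying filters in the order they are listed mirrors how a /Filter array is meant to be processed; a production parser would also honor /DecodeParms and use /Length rather than searching for endstream.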
Step 4:
Extract and Analyze Embedded Content
When you encounter objects with embedded JavaScript or files, extract them:
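> js_code 10
This shows the JavaScript code found in object 10. To decode escaped strings inside that code (for example, %u-encoded shellcode), use js_unescape, which takes a string, file, or variable rather than an object ID. To save all JavaScript in the document at once, use extract js > scripts.js. For embedded files, redirect the decoded stream of the object to a file:
> stream 15 > embedded.bin
The extracted artifacts can be further analyzed with external tools (e.g., running JavaScript in a controlled emulator, or examining shellcode with disassemblers).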
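Exploit JavaScript commonly stores shellcode as %uXXXX escapes passed to unescape(). The sketch below shows how such a string maps to raw bytes so it can be saved for a disassembler. It assumes the JavaScript engine stores each character as a little-endian UTF-16 code unit, which is what x86 exploits rely on; it is an illustration, not peepdf's js_unescape implementation, and the function name is hypothetical.

```python
import re

# %uXXXX is one UTF-16 code unit; %XX is a single character below 0x100.
ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def unescape_to_bytes(escaped: str) -> bytes:
    """Turn a JavaScript escaped string into the bytes it occupies in memory,
    where every character is stored as a little-endian UTF-16 code unit."""
    out = bytearray()
    pos = 0
    for m in ESCAPE_RE.finditer(escaped):
        # Literal text between escapes is stored as UTF-16LE as well.
        out += escaped[pos:m.start()].encode("utf-16-le")
        unit = int(m.group(1) or m.group(2), 16)
        out += unit.to_bytes(2, "little")
        pos = m.end()
    out += escaped[pos:].encode("utf-16-le")
    return bytes(out)
```

For example, unescape_to_bytes("%uc031") yields the bytes 31 c0, which a disassembler reads as xor eax, eax. Writing the result to a file opened in "wb" mode gives a blob suitable for a disassembler or a shellcode emulator.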
Step 5:
Optional - Emulate JavaScript and Analyze Shellcode
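If PyV8 is installed, peepdf's js_eval command can run the extracted JavaScript inside its embedded engine rather than in a PDF reader, helping you understand its behavior without exposing a live system. With Pylibemu installed, the sctest command emulates shellcode through libemu, which helps identify its type and intent.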
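When no JavaScript engine is available, a quick static pass can still flag the obfuscation primitives that often precede a payload. This is a heuristic, not emulation: the indicator list below is an assumption chosen for illustration, a match is a lead rather than proof of malice, and scan_js is a hypothetical helper.

```python
import re
from typing import List

# Illustrative indicators often seen in malicious PDF JavaScript.
# Not exhaustive; benign documents can also contain some of these.
INDICATORS = {
    "eval": re.compile(r"\beval\s*\("),
    "unescape": re.compile(r"\bunescape\s*\("),
    "fromCharCode": re.compile(r"String\.fromCharCode\s*\("),
    # 20+ consecutive %uXXXX escapes suggests embedded shellcode or a spray.
    "long %u run": re.compile(r"(?:%u[0-9A-Fa-f]{4}){20,}"),
}


def scan_js(code: str) -> List[str]:
    """Return the names of indicators found in a JavaScript snippet."""
    return [name for name, pattern in INDICATORS.items() if pattern.search(code)]
```

Hits from this pass are a good reason to decode the flagged strings and, where possible, emulate the code as described above.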
When to Use It
This skill is particularly useful in the following scenarios:
- Triaging suspicious PDF attachments received via phishing emails
- Investigating malware campaigns leveraging PDF exploits
- Extracting and analyzing embedded JavaScript or shellcode for threat intelligence
- Forensic examination of weaponized documents in incident response cases
- Developing custom detection signatures for PDF-based malware
Important Notes
- Isolated Environment: Always perform PDF malware analysis in a sandbox or virtual machine to prevent accidental execution of malicious code.
- Legal Considerations: Only analyze PDFs you are authorized to handle. Handling live malware may have legal and ethical implications.
- Tool Updates: Keep peepdf and companion tools updated to handle evolving PDF formats and obfuscation techniques.
- Limitations: While static analysis is powerful, some advanced threats may use encryption or multi-stage payloads that require dynamic analysis for full understanding.
By following this structured approach, you can safely and effectively analyze suspicious PDFs, extract malicious artifacts, and support broader security operations. This skill is a core component of modern malware analysis and digital forensics workflows.