Analyzing PDF Malware with PDFiD

Analyzes malicious PDF files using PDFiD, pdf-parser, and peepdf to identify embedded JavaScript, shellcode,

What Is This

"Analyzing PDF Malware with PDFiD" is a technical skill designed to help cybersecurity professionals and analysts identify and triage malicious PDF documents. This skill makes use of Didier Stevens’ PDFiD and pdf-parser tools, along with peepdf, to scan PDF files for suspicious objects, embedded scripts, shellcode, and exploit code. It allows analysts to perform static analysis of PDF files without opening or rendering them, reducing the risk of accidental exploitation. By examining the internal structure of a PDF, this skill helps uncover indicators of compromise, embedded payloads, and exploit techniques commonly used in targeted attacks and spam campaigns.

Why Use It

Malicious PDF files are a common vector for delivering malware, phishing links, and exploits targeting vulnerabilities in PDF readers, especially Adobe Reader. Attackers often embed JavaScript, shellcode, or files within PDFs to trigger exploits or drop additional payloads. Traditional antivirus solutions may miss sophisticated or obfuscated threats within PDFs. Analyzing the file’s structure with PDFiD and related tools enables you to:

  • Detect the presence of embedded JavaScript, launch actions, or automatic execution triggers
  • Identify suspicious objects such as embedded files, streams, or URLs
  • Extract and inspect potential payloads or exploit code
  • Perform rapid triage of potentially malicious attachments without executing them
  • Reduce risk by analyzing files statically before any dynamic or sandbox-based analysis

How to Use It

Prerequisites

  • Python 3.8 or above
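  • Didier Stevens’ pdfid.py and pdf-parser.py. Both are standalone Python scripts published in the DidierStevensSuite repository. pip packages with these names may be third-party repackagings, so verify them before relying on:
    pip install pdfid pdf-parser
  • peepdf installed for deeper, interactive analysis. The original peepdf was written for Python 2, so confirm that the package you install supports your Python version:
    pip install peepdf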

Step 1: Initial Structure Scanning with PDFiD

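PDFiD performs a lightweight scan of a PDF file: it counts occurrences of names commonly associated with malicious documents, such as /JavaScript or /OpenAction. It is a string scanner rather than a full PDF parser, so it gives a quick overview of potentially dangerous features but does not decode streams or resolve object references.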
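
The keyword-counting idea can be illustrated with a short Python sketch. This is not PDFiD's actual implementation; the function names, the keyword list (taken from the sample output in this section), and the sample bytes are assumptions made for illustration. It also normalizes #xx hex escapes in names, a common way to hide a keyword such as /JavaScript.

```python
import re

# Names of interest, taken from the sample output in this section.
KEYWORDS = ["/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"]


def normalize_names(data: bytes) -> bytes:
    """Replace #xx hex escapes (e.g. /J#61vaScript) with the byte they encode."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


def count_keywords(data: bytes) -> dict:
    """Count each keyword as a whole name in the normalized bytes."""
    normalized = normalize_names(data)
    counts = {}
    for kw in KEYWORDS:
        # The lookahead rejects longer names that merely start with the keyword.
        pattern = re.escape(kw.encode()) + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, normalized))
    return counts


# Hypothetical, hand-written fragment; not a complete PDF.
sample = (
    b"1 0 obj << /OpenAction 2 0 R >> endobj "
    b"2 0 obj << /S /J#61vaScript /JS (app.alert(1)) >> endobj"
)
print(count_keywords(sample))
```

Because the sketch rewrites #xx sequences everywhere, including inside binary stream data, its counts are only a rough signal. Use the real tool for triage decisions.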

Example usage:

pdfid malicious.pdf

Sample output:

PDFiD 0.2.7 malicious.pdf
  /JavaScript        2
  /JS                2
  /OpenAction        1
  /AA                0
  /Launch            0
  /EmbeddedFile      1
...

Interpretation:

  • /JavaScript and /JS indicate embedded scripts, commonly used in exploits.
  • /OpenAction can trigger code execution when the document is opened.
  • /EmbeddedFile suggests the presence of additional embedded content.
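
When many files need triage, the interpretation above can be automated. The following is a minimal sketch that parses PDFiD-style text output and lists the keywords with non-zero counts. The function names and the RISKY set are illustrative assumptions, and the handling of a trailing "(n)" obfuscation count is an assumption about the output format that may differ between PDFiD versions.

```python
# Keywords treated as risky in this sketch (an illustrative choice).
RISKY = {"/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"}


def parse_pdfid_output(text: str) -> dict:
    """Map each '/Keyword  N' line of PDFiD-style output to its integer count."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("/"):
            # Assumption: an obfuscated count may be appended as "2(1)".
            number = parts[1].split("(")[0]
            if number.isdigit():
                counts[parts[0]] = int(number)
    return counts


def flagged(counts: dict) -> list:
    """Return the risky keywords that occur at least once, sorted by name."""
    return sorted(k for k in RISKY if counts.get(k, 0) > 0)


sample_output = """PDFiD 0.2.7 malicious.pdf
  /JavaScript        2
  /JS                2
  /OpenAction        1
  /AA                0
  /Launch            0
  /EmbeddedFile      1
"""

counts = parse_pdfid_output(sample_output)
print(flagged(counts))
```

For the sample output shown earlier, the sketch flags /EmbeddedFile, /JS, /JavaScript, and /OpenAction, which matches the manual reading.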

Step 2: Deep Dive with pdf-parser

After identifying suspicious elements, use pdf-parser to inspect or extract specific objects, streams, or scripts.

Example usage:

pdf-parser.py -a malicious.pdf

This prints statistics for the document, such as the number of objects of each type, which helps locate objects worth a closer look.

To extract a suspicious stream (e.g., object 8):

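pdf-parser.py -o 8 -f -d stream8.bin malicious.pdf
  • -o 8 selects object 8
  • -f passes the stream through its declared filters (for example FlateDecode), so the output is decoded rather than compressed
  • -d stream8.bin writes the stream content to the named file (stream8.bin is an arbitrary example name) for further analysis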
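
Many PDF streams are compressed with FlateDecode (zlib/deflate), so the raw bytes of an extracted stream often need decompressing before any script inside is readable. The sketch below shows that single decoding step in Python. The function name and sample data are illustrative assumptions, and real streams may chain several filters or be encrypted, which this sketch does not handle.

```python
import zlib


def decode_flate(raw: bytes) -> bytes:
    """Decompress a FlateDecode stream, tolerating trailing bytes after the data."""
    decompressor = zlib.decompressobj()
    return decompressor.decompress(raw)


# Build a stand-in for a raw stream: compressed script plus a trailing line ending.
script = b"app.alert('hello');"
raw_stream = zlib.compress(script) + b"\r\n"

decoded = decode_flate(raw_stream)
print(decoded.decode("latin-1"))
```

Using a decompressor object rather than zlib.decompress means stray bytes after the compressed data do not raise an error, which is convenient when a stream was carved out of a file by hand.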

Step 3: Interactive Analysis with peepdf

For complex or heavily obfuscated PDFs, peepdf offers an interactive shell for navigating objects, streams, and scripts.

Example usage (the -i option starts the interactive console):

peepdf -i malicious.pdf

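Within the peepdf console, you can (command names may differ between versions; the help command lists what is available):

  • Summarize the document and its suspicious elements: info
  • Show the object tree: tree
  • Show a decoded object or stream: object <id>, stream <id>
  • View or analyze JavaScript: js_code <id>, js_analyse object <id>
  • Extract URIs or JavaScript: extract uri, extract js
  • Search objects for a string: search <string>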

Typical Analysis Workflow

  1. Run PDFiD to quickly determine if the PDF warrants further investigation.
  2. Use pdf-parser to extract or analyze suspicious objects revealed by PDFiD.
  3. If needed, launch peepdf for interactive exploration, deobfuscation, and payload extraction.

When to Use It

This skill is valuable when:

  • A suspicious PDF is reported by users or flagged by email security systems
  • You need to assess a PDF document for embedded JavaScript, exploits, or payloads before opening it
  • You are triaging potentially malicious attachments in a forensic or SOC environment
  • You are investigating known PDF exploit kits or targeted attack campaigns
  • You need to extract embedded executables, scripts, or suspicious URLs from PDF files

It is not designed for analyzing the visual or rendered content of PDFs. Its focus is on the static, structural analysis of the file format for evidence of malicious activity.
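
As one illustration of structural extraction, URLs stored in /URI actions can sometimes be pulled straight from the raw bytes. The following is a minimal sketch under stated assumptions: the function name, pattern, and sample bytes are hypothetical, and the approach only sees URIs stored as literal strings in uncompressed objects. Anything inside compressed object streams, hex strings, or encrypted content needs a real parser.

```python
import re

# Matches "/URI (...)" with a literal string value; backslash escapes are skipped over.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")


def extract_uris(data: bytes) -> list:
    """Return the literal-string /URI values found in the raw bytes."""
    return [match.decode("latin-1") for match in URI_PATTERN.findall(data)]


# Hypothetical fragment of an uncompressed action dictionary.
sample = b"<< /Type /Action /S /URI /URI (http://example.test/payload) >>"
print(extract_uris(sample))
```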

Important Notes

  • Never open suspicious PDFs in a standard PDF reader before analysis, as exploits may trigger on load.
  • PDFiD provides a high-level overview but does not decode obfuscated scripts or deeply nested objects; always follow up with pdf-parser or peepdf.
  • Some malware uses advanced obfuscation or encryption. Manual inspection, scripting, and deeper forensic analysis may be necessary.
  • This skill focuses on static analysis. For behavioral analysis, use dynamic sandbox environments after initial triage.
  • Keep analysis tools up to date to detect new exploitation techniques.
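
As an example of the kind of scripting mentioned above, malicious JavaScript often stores shellcode as %uXXXX escape sequences. The sketch below converts such a string into raw bytes for inspection. The function name is an assumption, only the %uXXXX form is handled, and the little-endian byte order reflects how these 16-bit values are typically laid out in memory on x86 targets.

```python
import re
import struct


def unescape_u(text: str) -> bytes:
    """Convert %uXXXX sequences into bytes, each 16-bit value little-endian."""
    out = bytearray()
    for hex_word in re.findall(r"%u([0-9A-Fa-f]{4})", text):
        out += struct.pack("<H", int(hex_word, 16))
    return bytes(out)


# Hypothetical obfuscated string; 0x9090 is the classic NOP pair.
encoded = "%u4142%u9090"
print(unescape_u(encoded).hex())
```

The resulting bytes can then be saved to a file and examined with a disassembler or a shellcode emulator, still without executing anything.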

By systematically applying PDFiD, pdf-parser, and peepdf, analysts can quickly triage and investigate suspicious PDF files, identify embedded threats, and extract malicious payloads for further study. This approach significantly reduces the risk of accidental infection and improves the detection of document-based malware.