SARIF Parsing

Automate and integrate SARIF parsing to streamline security analysis results

SARIF Parsing is a community skill for processing Static Analysis Results Interchange Format (SARIF) files, covering SARIF file parsing, result aggregation, rule extraction, code location mapping, and report generation for security and code quality tooling.

What Is This?

Overview

SARIF Parsing provides tools for reading and processing SARIF files, the JSON-based format that standardizes static analysis tool output. It covers SARIF file parsing, which reads the format and extracts results, rules, and tool information from analysis runs; result aggregation, which combines findings from multiple tools into a unified view with deduplication; rule extraction, which maps result identifiers to rule descriptions, severity levels, and help text; code location mapping, which resolves file paths, line numbers, and code regions from SARIF location objects; and report generation, which creates summary reports with severity distributions and trend tracking. The skill helps teams process static analysis results programmatically.

Who Should Use This

This skill serves security engineers processing scanner output from multiple tools, DevOps teams integrating static analysis into CI pipelines, and developers building dashboards for code quality metrics.

Why Use It?

Problems It Solves

Different static analysis tools produce output in different formats, making aggregation and comparison difficult. Extracting actionable information from raw SARIF files requires understanding the complex nested JSON structure. Deduplicating findings across multiple tools that detect the same issue requires matching by location and rule. Tracking analysis trends over time needs structured data extraction from each scan run.

Core Highlights

SARIF reader parses the standardized format and extracts structured results. Result aggregator combines findings across multiple tools and runs. Rule mapper connects result IDs to human-readable descriptions and severity. Location resolver maps findings to specific file paths and code regions.

How to Use It?

Basic Usage

import json

def parse_sarif(filepath: str) -> list[dict]:
    """Flatten a SARIF file into a list of finding dicts."""
    with open(filepath) as f:
        sarif = json.load(f)

    findings = []
    for run in sarif.get('runs', []):
        tool = run['tool']['driver']['name']
        # Map rule IDs to their metadata so results can be enriched.
        rules = {r['id']: r for r in run['tool']['driver'].get('rules', [])}

        for result in run.get('results', []):
            rule_id = result.get('ruleId', '')
            rule = rules.get(rule_id, {})
            locs = result.get('locations', [{}])
            loc = locs[0].get('physicalLocation', {})
            findings.append({
                'tool': tool,
                'rule': rule_id,
                'description': rule.get('shortDescription', {}).get('text', ''),
                'severity': result.get('level', 'warning'),
                'message': result.get('message', {}).get('text', ''),
                'file': loc.get('artifactLocation', {}).get('uri', ''),
                'line': loc.get('region', {}).get('startLine', 0),
            })
    return findings

results = parse_sarif('scan.sarif')
for r in results[:5]:
    print(f'{r["severity"]}: {r["file"]}:{r["line"]} {r["rule"]}')

Real-World Examples

import json
from collections import Counter

class SARIFAggregator:
    """Collects findings from multiple SARIF files into one view."""

    def __init__(self):
        self.findings = []

    def load(self, filepath: str):
        with open(filepath) as f:
            sarif = json.load(f)
        for run in sarif.get('runs', []):
            tool = run['tool']['driver']['name']
            for result in run.get('results', []):
                self.findings.append({
                    'tool': tool,
                    'rule': result.get('ruleId'),
                    'level': result.get('level', 'warning'),
                })

    def summary(self):
        # Count findings by severity level and by originating tool.
        by_sev = Counter(f['level'] for f in self.findings)
        by_tool = Counter(f['tool'] for f in self.findings)
        return {
            'total': len(self.findings),
            'by_severity': dict(by_sev),
            'by_tool': dict(by_tool),
        }

agg = SARIFAggregator()
agg.load('semgrep.sarif')
agg.load('codeql.sarif')
report = agg.summary()
print(f'Total: {report["total"]}')
for sev, count in report['by_severity'].items():
    print(f'  {sev}: {count}')

Advanced Tips

Deduplicate findings by combining file path and line number as a key when multiple tools detect the same issue. Map SARIF severity levels to your organization's priority system for consistent triage. Use the SARIF fingerprint field when available for more reliable deduplication across scan runs.
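The deduplication tips above can be sketched as a small helper. This is a minimal illustration, not part of the skill's API: it assumes findings shaped like the dicts produced by parse_sarif earlier, with an optional, hypothetical 'fingerprint' key carrying a SARIF fingerprint value when the scanner provides one.

```python
def dedupe(findings: list[dict]) -> list[dict]:
    """Collapse findings that refer to the same code location."""
    seen = set()
    unique = []
    for f in findings:
        # Prefer a stable fingerprint when the tool provides one;
        # otherwise fall back to file path + line number.
        key = f.get('fingerprint') or (f['file'], f['line'])
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique
```

Fingerprint-based keys survive code movement between scans, whereas file-plus-line keys break whenever surrounding lines shift, so prefer fingerprints for cross-run tracking.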

When to Use It?

Use Cases

Parse SARIF output from security scanners to extract findings with file locations and severity. Aggregate results from multiple static analysis tools into a unified report. Track the number and severity of findings across builds in a CI pipeline.
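For the CI tracking use case, one possible sketch compares two scans' severity counts. The function name and shape are illustrative assumptions; it expects dicts mapping severity to counts, like the by_severity value returned by SARIFAggregator.summary().

```python
def regression_check(previous: dict, current: dict) -> dict:
    """Return per-severity count deltas between two scan summaries."""
    deltas = {}
    for sev in set(previous) | set(current):
        delta = current.get(sev, 0) - previous.get(sev, 0)
        if delta:  # only report severities whose counts changed
            deltas[sev] = delta
    return deltas
```

A CI job could fail the build when any positive delta appears for 'error'-level findings while tolerating growth in lower severities.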

Related Topics

SARIF, static analysis, code scanning, security tools, code quality, CI integration, and vulnerability management.

Important Notes

Requirements

SARIF version 2.1.0 files from compatible static analysis tools. JSON parsing library for reading the SARIF format structure. Source code repository access for validating file path references in SARIF location data.

Usage Recommendations

Do: validate SARIF file structure before processing since some tools produce non-standard output. Use rule metadata to provide context when displaying findings to developers. Track findings across builds to identify trends and regressions.
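The validation recommendation above might look something like the following sketch. The checks and function name are assumptions, covering only the structural minimum the earlier examples rely on; a stricter validator would check against the full SARIF 2.1.0 schema.

```python
import json

REQUIRED_TOP_LEVEL = ('version', 'runs')

def validate_sarif(filepath: str) -> list[str]:
    """Return a list of structural problems; an empty list means the file looks usable."""
    try:
        with open(filepath) as f:
            sarif = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f'unreadable: {exc}']
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in sarif:
            problems.append(f'missing top-level "{key}"')
    for i, run in enumerate(sarif.get('runs', [])):
        # Each run must identify its tool, or downstream parsing breaks.
        if 'driver' not in run.get('tool', {}):
            problems.append(f'run {i} missing tool.driver')
    return problems
```

Running this before parse_sarif turns malformed tool output into an actionable error list instead of a mid-parse exception.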

Don't: treat all findings as equal severity since SARIF level values carry important priority information. Parse SARIF files without error handling since malformed files from some tools may cause parsing failures. Ignore the tool version information since rule definitions may change between versions.

Limitations

SARIF file structure varies between tools with some using optional fields differently. Deduplication across tools is imperfect since different scanners may report the same issue with different locations. Large SARIF files from comprehensive scans may require streaming parsers for memory-efficient processing.