Analyzing Web Server Logs for Intrusion

Parse Apache and Nginx access logs to detect SQL injection attempts, local file inclusion, directory traversal, web scanner activity, and brute-force login patterns.

Category: development Source: mukul975/Anthropic-Cybersecurity-Skills

What Is This

Analyzing Web Server Logs for Intrusion is a cybersecurity skill focused on parsing and analyzing access logs from web servers, specifically Apache and Nginx, to detect evidence of common web-based attack techniques. This skill leverages regular expressions to match OWASP attack signatures within log entries, enriches data with GeoIP information for source attribution, and applies statistical anomaly detection to uncover patterns indicating brute-force attacks or other abnormal activity. It is particularly useful for identifying attempts at SQL injection, local file inclusion (LFI), directory traversal, web scanner activity, and brute-force login patterns.

The skill is implemented in Python and is designed for use by security operations center (SOC) analysts, incident responders, and threat hunters. By systematically analyzing access logs and highlighting suspicious activity, it helps organizations strengthen their security monitoring and incident response capabilities.

Why Use It

Threat actors frequently target web applications by exploiting vulnerabilities through HTTP requests. Attack methods such as SQL injection, directory traversal, and brute-force password guessing are often visible in access logs as anomalous or signature-bearing requests. However, given the sheer volume and complexity of web server logs, manual analysis is impractical and error-prone.

Using automated techniques to parse and analyze logs provides several advantages:

  • Early Detection: Quickly identify indicators of attack, enabling faster response and mitigation.
  • Comprehensive Coverage: Systematically scan for a wide range of attack techniques using well-known signatures.
  • Attribution: Enrich log entries with GeoIP data to identify the geographic origin of suspicious requests.
  • Pattern Recognition: Detect brute-force patterns and other statistical anomalies that may indicate automated attacks.
  • Incident Investigation: Provide structured evidence and context for post-incident analysis and reporting.

Applying this skill increases the effectiveness of security teams in discovering and responding to web application threats, ultimately reducing the risk of successful exploitation.

How to Use It

Prerequisites

Before using this skill, ensure you have:

  • Basic understanding of security operations and web server log formats
  • Python 3.8 or newer installed on your analysis machine
  • The required Python packages (geoip2 for GeoIP enrichment and user-agents for parsing user agent strings)
  • A GeoLite2 City database file (.mmdb) for the GeoIP lookups in step 5
  • Permission to access and analyze the relevant web server logs

Step-by-Step Instructions

1. Install Dependencies

Install the required packages:

pip install geoip2 user-agents

2. Collect Access Logs

Obtain the web server access logs in a known format. For Apache, this is typically the Combined Log Format. For Nginx, the predefined combined format has the same field layout, so the same parser works for both.

Example Apache Combined Log entry:

192.168.1.100 - - [12/Mar/2024:08:23:45 +0000] "GET /index.php?id=1' OR 1=1-- HTTP/1.1" 200 512 "-" "Mozilla/5.0"

3. Parse Log Entries

Extract key fields from each entry: source IP, timestamp, HTTP method, URI, status code, and response size.

Example Python code to parse an Apache log entry:

import re

log_entry = '''192.168.1.100 - - [12/Mar/2024:08:23:45 +0000] "GET /index.php?id=1' OR 1=1-- HTTP/1.1" 200 512 "-" "Mozilla/5.0"'''

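# The uri group is lazy (.*?) rather than \S+ because malicious request targets
# can contain raw spaces, as in the sample entry above. The size may be "-".
pattern = re.compile(r'(?P<ip>\d+\.\d+\.\d+\.\d+) \S+ \S+ \[(?P<timestamp>.*?)\] "(?P<method>\S+) (?P<uri>.*?) HTTP/[\d.]+" (?P<status>\d+) (?P<size>\d+|-)')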
match = pattern.match(log_entry)
if match:
    print(match.groupdict())
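
To apply the same idea to every line of a log file, the pattern can be wrapped in a small helper that returns typed fields. This is a minimal sketch, assuming the Combined Log Format; the names `LOG_PATTERN` and `parse_line` are illustrative and not part of the skill itself.

```python
import re
from datetime import datetime

# Assumed pattern for the Combined Log Format; adjust it for custom formats.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>.*?) (?P<protocol>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Apache and Nginx write timestamps like 12/Mar/2024:08:23:45 +0000.
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = ('192.168.1.100 - - [12/Mar/2024:08:23:45 +0000] '
          '"GET /index.php?id=1\' OR 1=1-- HTTP/1.1" 200 512 "-" "Mozilla/5.0"')
entry = parse_line(sample)
```

Lines that do not match return None, so they can be counted and reviewed separately instead of being silently dropped.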

4. Apply Attack Pattern Matching

Use regular expressions to detect OWASP-style attack signatures in the URI or query parameters.

Example for SQL injection:

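# Simple illustrative signature; broad keywords such as SELECT can produce false positives.
sql_injection_pattern = re.compile(r"(\bUNION\b|\bSELECT\b|['\"].*?--)", re.IGNORECASE)
# Guard against lines that failed to parse (match is None).
if match and sql_injection_pattern.search(match.group('uri')):
    print("Potential SQL injection attempt detected.")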

Similar patterns can be used for detecting directory traversal (\.\./), local file inclusion (/etc/passwd), and common web scanner fingerprints.
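
The other signature families can be sketched in the same way. The patterns below are illustrative assumptions, not a complete or authoritative OWASP rule set, and the `classify` helper and scanner name list are hypothetical. The URI is percent-decoded once before matching so that payloads such as `..%2F` are visible to the patterns.

```python
import re
from urllib.parse import unquote_plus

# Illustrative signatures only; tune and extend them for your environment.
SIGNATURES = {
    "sql_injection": re.compile(
        r"\bunion\b.+\bselect\b|['\"]\s*or\s+\d+\s*=\s*\d+|['\"].*?--", re.I),
    "directory_traversal": re.compile(r"\.\./|\.\.\\"),
    "local_file_inclusion": re.compile(r"/etc/passwd|/proc/self/environ|php://", re.I),
}
# Substrings that some common scanning tools place in their User-Agent header.
SCANNER_AGENTS = re.compile(r"sqlmap|nikto|nmap|dirbuster|wpscan", re.I)

def classify(uri, user_agent=""):
    """Return the names of all signature families that match the request."""
    decoded = unquote_plus(uri)  # decode %XX escapes and '+' before matching
    hits = [name for name, pattern in SIGNATURES.items() if pattern.search(decoded)]
    if SCANNER_AGENTS.search(user_agent):
        hits.append("scanner")
    return hits
```

Returning a list rather than a single label keeps requests that trip several signatures, such as a traversal payload that targets /etc/passwd, visible under each category.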

5. GeoIP Enrichment

Enrich the IP address with geolocation data to understand the origin of the request.

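import geoip2.database
from geoip2.errors import AddressNotFoundError

# Reader works as a context manager, so the database file is closed afterwards.
with geoip2.database.Reader('/path/to/GeoLite2-City.mmdb') as reader:
    try:
        response = reader.city(match.group('ip'))
        print(f"Country: {response.country.name}, City: {response.city.name}")
    except AddressNotFoundError:
        # Private addresses such as 192.168.1.100 have no database record.
        print("No GeoIP record for this address.")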
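
Private and reserved addresses, like 192.168.1.100 in the sample entry, have no GeoIP record, so it can help to filter them out before any lookup. A minimal sketch using only the standard library follows; the helper name `is_public_ip` is an assumption for illustration.

```python
import ipaddress

def is_public_ip(value):
    """Return True only for globally routable addresses worth a GeoIP lookup."""
    try:
        return ipaddress.ip_address(value).is_global
    except ValueError:
        # Not a valid IPv4 or IPv6 address (for example, a hostname in the log).
        return False
```

Calling this check first avoids lookup errors for internal traffic and keeps the enrichment step focused on external sources.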

6. Statistical Anomaly Detection

Track request frequency and response size per IP or URI to identify brute-force or scanning activity.

Example:

from collections import Counter

ip_counter = Counter()
# For each parsed log entry:
ip_counter[match.group('ip')] += 1

# Later, find IPs with high request counts
for ip, count in ip_counter.items():
    if count > 1000:  # Threshold should be tuned to the site's normal traffic
        print(f"Possible brute-force or scanner activity from {ip} ({count} requests)")
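
A raw request count cannot distinguish a busy client from a password-guessing attempt. One possible refinement, sketched below under stated assumptions, counts failed requests to a login endpoint per IP inside a sliding time window. The function name `find_bruteforce`, the `/login` path, the 401/403 status codes, and the thresholds are all hypothetical; many applications return 200 for a failed login, so the failure test needs adapting to the application.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def find_bruteforce(events, login_path="/login", max_failures=5,
                    window=timedelta(minutes=1)):
    """Flag IPs with at least max_failures failed logins within the window.

    events: iterable of (ip, timestamp, uri, status), chronological per IP.
    """
    recent = defaultdict(deque)  # ip -> timestamps of recent failures
    flagged = set()
    for ip, ts, uri, status in events:
        if not uri.startswith(login_path) or status not in (401, 403):
            continue
        failures = recent[ip]
        failures.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while failures and ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= max_failures:
            flagged.add(ip)
    return flagged

base = datetime(2024, 3, 12, 8, 0, 0)
fast = [("10.0.0.1", base + timedelta(seconds=i), "/login", 401) for i in range(5)]
slow = [("10.0.0.2", base + timedelta(minutes=5 * i), "/login", 401) for i in range(5)]
suspects = find_bruteforce(fast + slow)
```

In this example only 10.0.0.1 is flagged, because its five failures arrive within four seconds while the failures from 10.0.0.2 are five minutes apart.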

7. Review and Investigate

Flagged entries should be further reviewed. Correlate findings with other security telemetry (WAF, EDR, SIEM) for comprehensive incident response.

When to Use It

  • During Incident Response: When investigating post-incident logs to determine the scope and method of an attack.
  • Proactive Threat Hunting: Regularly scanning access logs for signs of attacks missed by other controls.
  • Developing Detection Rules: Building or tuning SIEM queries and detection rules for web-based threats.
  • Security Monitoring Validation: Testing and validating the effectiveness of your web application security coverage.

Important Notes

  • Always obtain authorization before analyzing production logs, especially in environments with sensitive data.
  • Attackers may use evasion techniques, such as encoding payloads or rotating IP addresses, which can limit detection effectiveness.
  • Regularly update your regex signatures based on the latest OWASP and threat intelligence sources.
  • GeoIP databases may not always provide accurate attribution and should be considered supplementary context.
  • High request frequency is not always malicious (e.g., legitimate web crawlers); verify before taking action.
  • Maintain audit logs and document findings for compliance and incident response purposes.
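
As an illustration of the encoding evasion mentioned above, a payload can be percent-encoded more than once so that a single decoding pass still hides it. A minimal sketch of one countermeasure is to decode repeatedly until the value stops changing; the helper name `fully_decode` and the round limit are assumptions, not part of the skill.

```python
from urllib.parse import unquote

def fully_decode(value, max_rounds=5):
    """Percent-decode repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value
```

For example, `%252e%252e%252f` decodes to `%2e%2e%2f` on the first pass and to `../` on the second, at which point a directory traversal signature can match it. The round limit bounds the work done on deliberately nested input.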

By systematically applying these techniques, security teams can greatly enhance their ability to detect and respond to common and emerging web application threats.