Database Optimizer

Database Optimizer automation and integration for query and performance tuning

Database Optimizer is a community skill for improving query performance in relational databases. It covers query plan analysis, index tuning, query rewriting, connection pooling configuration, and slow query identification.

What Is This?

Overview

Database Optimizer provides patterns for diagnosing and resolving database performance bottlenecks. Query plan analysis reads EXPLAIN output to identify full table scans, missing indexes, and inefficient join strategies. Index tuning adds, modifies, or removes indexes based on workload profiling. Query rewriting restructures SQL statements so the optimizer can use capabilities such as index-only scans and predicate pushdown. Connection pooling configuration sizes pools to balance throughput against resource limits. Slow query identification captures and ranks queries by execution time for targeted optimization. Together, these let engineers reduce query latency and increase database throughput systematically.

Who Should Use This

This skill serves backend engineers troubleshooting slow API endpoints, database administrators maintaining production database performance, and platform teams establishing query performance standards and monitoring.

Why Use It?

Problems It Solves

Slow queries degrade application response times and user experience as data volume grows. Missing indexes force full table scans, while redundant indexes waste storage and increase write latency without improving read performance. Connection exhaustion causes application errors during traffic spikes. Performance problems are discovered in production rather than caught during development.

Core Highlights

Plan analyzer reads EXPLAIN output to identify scan types, join methods, and row estimates. Index tuner recommends additions and removals based on query workload analysis. Query rewriter suggests SQL restructuring for better optimizer plan selection. Pool sizer calculates connection pool settings from concurrency and resource data.

How to Use It?

Basic Usage

-- Query plan analysis
EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;

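-- Targeted partial index: the WHERE clause already restricts the index
-- to pending rows, so status does not need to be an index column.
CREATE INDEX idx_orders_pending
  ON orders (created_at DESC)
  WHERE status = 'pending';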

-- Check index usage
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
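
The plan analysis step described above can also be scripted. The following is a minimal sketch, assuming the plan was captured with EXPLAIN (ANALYZE, FORMAT JSON) and parsed into Python objects. The function names and both thresholds (min_rows, misestimate_factor) are illustrative choices, not part of the skill or of PostgreSQL.

```python
from typing import Iterator


def walk_plan(node: dict) -> Iterator[dict]:
    """Yield every node of an EXPLAIN (FORMAT JSON) plan tree, depth first."""
    yield node
    for child in node.get("Plans", []):
        yield from walk_plan(child)


def find_plan_issues(explain_json: list, min_rows: int = 1000,
                     misestimate_factor: float = 10.0) -> list[str]:
    """Flag large sequential scans and badly misestimated row counts.

    min_rows and misestimate_factor are hypothetical thresholds chosen
    for illustration; tune them for the workload at hand.
    """
    issues = []
    root = explain_json[0]["Plan"]
    for node in walk_plan(root):
        node_type = node["Node Type"]
        planned = node.get("Plan Rows", 0)
        # "Actual Rows" is present only when the plan was run with ANALYZE,
        # and is a per-loop average when the node ran more than once.
        actual = node.get("Actual Rows")
        if node_type == "Seq Scan" and planned >= min_rows:
            issues.append(
                f"Seq Scan on {node.get('Relation Name')} (~{planned} rows)")
        if actual is not None and planned > 0 and actual > 0:
            ratio = max(actual / planned, planned / actual)
            if ratio >= misestimate_factor:
                issues.append(
                    f"{node_type}: estimated {planned} rows, got {actual}")
    return issues
```

A flagged sequential scan is a prompt to check for a missing index, and a large estimate mismatch usually suggests stale statistics, so running ANALYZE on the table is a reasonable first step before rewriting the query.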

Real-World Examples

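import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SlowQuery:
    sql: str
    avg_ms: float
    calls: int
    total_ms: float


class QueryAnalyzer:
    def __init__(self, threshold_ms: float = 100):
        self.threshold = threshold_ms
        self.queries: list[SlowQuery] = []

    def analyze_log(self, log_entries: list[dict]) -> list[SlowQuery]:
        # Group durations by normalized SQL so that calls differing only
        # in literal values count as the same query.
        grouped = defaultdict(list)
        for entry in log_entries:
            sql = self._normalize(entry['sql'])
            grouped[sql].append(entry['duration_ms'])

        # Start fresh so repeated calls do not duplicate earlier results.
        self.queries = []
        for sql, times in grouped.items():
            total = sum(times)
            avg = total / len(times)
            if avg > self.threshold:
                self.queries.append(SlowQuery(
                    sql=sql, avg_ms=avg, calls=len(times), total_ms=total))

        # Rank by total time so the most expensive queries come first.
        self.queries.sort(key=lambda q: q.total_ms, reverse=True)
        return self.queries

    def _normalize(self, sql: str) -> str:
        # Replace quoted strings, then standalone numbers, with '?'.
        # The word boundaries keep digits inside identifiers intact.
        sql = re.sub(r"'(?:[^']|'')*'", '?', sql)
        return re.sub(r'\b\d+(?:\.\d+)?\b', '?', sql)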

Advanced Tips

Sort slow queries by total execution time rather than average time to prioritize queries that consume the most overall database resources. Use partial indexes with WHERE clauses matching common filter predicates to reduce index size while covering the most frequent queries. Monitor the buffer cache hit ratio to determine whether performance issues stem from insufficient memory rather than missing indexes.
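
As a rough illustration of the cache hit ratio check, the sketch below assumes PostgreSQL and takes the blks_hit and blks_read counters from pg_stat_database (for example via SELECT sum(blks_hit), sum(blks_read) FROM pg_stat_database). The function name and the 0.99 threshold in the usage note are assumptions for illustration, not values defined by the skill.

```python
from typing import Optional


def cache_hit_ratio(blks_hit: int, blks_read: int) -> Optional[float]:
    """Fraction of block requests served from PostgreSQL shared buffers.

    blks_read counts blocks that were not found in shared buffers; some
    of those may still have been served by the operating system cache.
    Returns None when no blocks have been requested yet.
    """
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total
```

For example, cache_hit_ratio(990, 10) gives 0.99. A ratio that stays well below a target such as 0.99 on a read-heavy workload hints that memory, rather than indexing, may be the limiting factor, though the right target depends on the workload.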

When to Use It?

Use Cases

Analyze slow query logs to identify and optimize the highest-impact queries in a production database. Audit existing indexes to remove unused ones and add missing covering indexes. Configure connection pool settings to handle peak traffic without exhausting database connections.
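
One way to approach the pool sizing use case is Little's Law: the average number of queries in flight equals the arrival rate multiplied by the average time each query holds a connection. The sketch below is a starting estimate under that assumption, not the skill's own calculation. The function name, the headroom factor, and the default limits are hypothetical and should be replaced with measured values.

```python
import math


def estimate_pool_size(peak_qps: float, avg_query_ms: float,
                       headroom: float = 1.5, max_connections: int = 100,
                       app_instances: int = 1) -> int:
    """Estimate a per-instance pool size from peak load.

    peak_qps is the total query rate across all application instances,
    avg_query_ms is how long a query holds a connection on average, and
    max_connections is the database-side connection limit to share.
    headroom is an illustrative safety margin, not a standard value.
    """
    # Little's Law: connections in use = arrival rate * time in system.
    in_flight = peak_qps * avg_query_ms / 1000.0
    wanted = math.ceil(in_flight * headroom / app_instances)
    # Never plan for more than this instance's share of the server limit.
    budget = max_connections // app_instances
    return max(1, min(wanted, budget))
```

With 2000 queries per second averaging 20 ms across 4 instances, estimate_pool_size(2000, 20, app_instances=4) suggests 15 connections per instance. When the estimate hits the budget cap, shortening the queries that hold connections is usually a better first move than raising the database limit.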

Important Notes

Requirements

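Access to database query logs or to query statistics such as those collected by the pg_stat_statements extension. Permission to run EXPLAIN ANALYZE against the target database; because EXPLAIN ANALYZE actually executes the statement, run data-modifying statements inside a transaction that is rolled back. An understanding of the application's query patterns and data distribution.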

Usage Recommendations

Do: profile the actual production workload rather than optimizing queries in isolation with test data. Review EXPLAIN ANALYZE output after adding indexes to confirm the optimizer uses them. Monitor index usage statistics and remove unused indexes that only add write overhead.

Don't: add indexes without understanding the write overhead, since each index slows INSERT and UPDATE operations. Don't optimize queries against small test datasets, which may produce different execution plans than production-scale data. Don't increase connection pool size as the first response to connection errors when the root cause may be long-running queries holding connections.
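
To complement the index guidance above, an index audit can also look for indexes whose columns are a leading prefix of another index on the same table, since the longer index can often serve the same lookups. The sketch below is a simplified check under several assumptions: all indexes belong to one table, use the same index type and sort order, and have no partial predicates or uniqueness constraints. The function name and the index names in the usage note are hypothetical.

```python
def find_prefix_redundant(
        indexes: dict[str, tuple[str, ...]]) -> list[tuple[str, str]]:
    """Return (candidate_to_drop, covering_index) pairs.

    indexes maps an index name to its ordered column list. An index is
    reported when its columns are a strict leading prefix of another
    index, or when two indexes have identical columns (reported once).
    """
    pairs = []
    for name_a, cols_a in indexes.items():
        for name_b, cols_b in indexes.items():
            if name_a == name_b:
                continue
            is_prefix = cols_b[:len(cols_a)] == cols_a
            strictly_shorter = len(cols_a) < len(cols_b)
            # For exact duplicates, report only one direction.
            exact_duplicate = cols_a == cols_b and name_a > name_b
            if (is_prefix and strictly_shorter) or exact_duplicate:
                pairs.append((name_a, name_b))
    return pairs
```

Given idx_orders_status on (status) and idx_orders_status_created on (status, created_at), the function reports the first as a candidate to drop. Treat the result as a list of candidates to review against usage statistics, not as an instruction to drop indexes automatically.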

Limitations

Query plan analysis reflects the current data distribution and statistics, which may change as data grows or skews. Optimization recommendations for one database engine may not apply to another, since PostgreSQL, MySQL, and SQL Server have different optimizer behaviors. Connection pooling configuration depends on hardware resources and workload patterns that vary between environments.
