Tinybird

Tinybird file formats, SQL rules, optimization patterns, and best practices for datasources, pipes, endpoints, and materialized views

What Is This?

Tinybird is a real-time data platform built on top of ClickHouse that allows developers to ingest large volumes of data and expose it as low-latency HTTP API endpoints. It uses a file-based workflow where datasources, pipes, and endpoints are defined as plain text files with specific formats. This skill covers the file formats, SQL rules, optimization patterns, and best practices for working with Tinybird datasources, pipes, endpoints, and materialized views.

Why Use It?

Tinybird solves a common problem in data engineering: turning raw event data into fast, queryable APIs without managing infrastructure. Traditional approaches require setting up databases, writing backend services, and handling scaling manually. Tinybird abstracts all of that.

Key reasons to use Tinybird include:

  • Sub-second query performance on billions of rows using ClickHouse under the hood
  • File-based definitions that work well with version control and CI/CD pipelines
  • Materialized views that pre-aggregate data and dramatically reduce query time at read
  • HTTP endpoints generated directly from SQL pipes with no additional backend code
  • Real-time ingestion via the Events API, Kafka connectors, or batch file uploads

For teams building analytics dashboards, monitoring tools, or user-facing data features, Tinybird reduces the time from raw data to production API from weeks to hours.

How to Use It?

Tinybird projects are organized around three core file types: .datasource files, .pipe files, and the endpoints defined within those pipes.

Datasource files define the schema and storage engine for your data. A basic datasource looks like this:

SCHEMA >
    `timestamp` DateTime,
    `user_id` String,
    `event` String,
    `properties` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, user_id"

Always define a sorting key that matches your most common query filters. ClickHouse uses the sorting key to skip data blocks, so choosing the right columns here is the single most impactful optimization you can make.

Pipe files contain the SQL transformation logic. Each node in a pipe is a named SQL block:

NODE filter_events
SQL >
    SELECT
        timestamp,
        user_id,
        event
    FROM events_datasource
    WHERE event = 'page_view'
      AND timestamp >= now() - INTERVAL 7 DAY

NODE aggregate_by_user
SQL >
    SELECT
        user_id,
        count() AS total_views
    FROM filter_events
    GROUP BY user_id

The final node in a pipe becomes the endpoint response when the pipe is published as an API endpoint. You can add query parameters using template syntax:

WHERE event = {{ String(event_type, 'page_view') }}

Materialized views are pipes that write their output to a datasource instead of serving an API. They run incrementally as new data arrives. Define them by adding TYPE materialized and a DATASOURCE target to the pipe file. Use materialized views to pre-compute aggregations that would otherwise run on every API request.

When to Use It?

Use Tinybird when you need to serve analytical queries over large datasets with low latency and you want to avoid managing your own ClickHouse cluster or building a custom API layer.

Specific scenarios where Tinybird fits well:

  • Building user-facing analytics features where query time must stay under 100ms
  • Processing high-volume event streams from product telemetry, logs, or IoT devices
  • Replacing slow PostgreSQL or MySQL analytical queries that are hurting application performance
  • Creating internal dashboards that need to query hundreds of millions of rows interactively
  • Prototyping data APIs quickly before committing to a full infrastructure build

Tinybird is less suitable for transactional workloads, frequent row-level updates, or use cases that require complex joins across many large tables without careful pre-aggregation.

Important Notes

Follow these rules to avoid common mistakes and performance problems:

  • Never use SELECT * in production pipes. Always name the columns you need to reduce data scanned.
  • Avoid high-cardinality columns in GROUP BY without filtering first. Pre-filter with WHERE before aggregating.
  • Use ReplacingMergeTree for datasources where you need to handle deduplication or upsert patterns.
  • Test sorting keys before loading large datasets. Changing a sorting key requires recreating the datasource and reloading all data.
  • Limit materialized view complexity. Keep each materialized view focused on one aggregation pattern. Chaining multiple transformations in a single materialized pipe makes debugging harder.
  • Use query parameters with defaults. Every template variable should have a sensible default so the endpoint works without required inputs during testing.
  • Monitor pipe performance using the Tinybird UI query analyzer to identify nodes that scan the most data and optimize them first.