Tinybird
Tinybird file formats, SQL rules, optimization patterns, and best practices for datasources, pipes, endpoints, and materialized views
What Is This?
Tinybird is a real-time data platform built on top of ClickHouse that allows developers to ingest large volumes of data and expose it as low-latency HTTP API endpoints. It uses a file-based workflow where datasources, pipes, and endpoints are defined as plain text files with specific formats. This skill covers the file formats, SQL rules, optimization patterns, and best practices for working with Tinybird datasources, pipes, endpoints, and materialized views.
Why Use It?
Tinybird solves a common problem in data engineering: turning raw event data into fast, queryable APIs without managing infrastructure. Traditional approaches require setting up databases, writing backend services, and handling scaling manually. Tinybird abstracts all of that.
Key reasons to use Tinybird include:
- Sub-second query performance on billions of rows using ClickHouse under the hood
- File-based definitions that work well with version control and CI/CD pipelines
- Materialized views that pre-aggregate data and dramatically reduce query time at read
- HTTP endpoints generated directly from SQL pipes with no additional backend code
- Real-time ingestion via the Events API, Kafka connectors, or batch file uploads
For teams building analytics dashboards, monitoring tools, or user-facing data features, Tinybird reduces the time from raw data to production API from weeks to hours.
How to Use It?
Tinybird projects are organized around three core file types: .datasource files, .pipe files, and the endpoints defined within those pipes.
Datasource files define the schema and storage engine for your data. A basic datasource looks like this:
SCHEMA >
`timestamp` DateTime,
`user_id` String,
`event` String,
`properties` String
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, user_id"Always define a sorting key that matches your most common query filters. ClickHouse uses the sorting key to skip data blocks, so choosing the right columns here is the single most impactful optimization you can make.
Pipe files contain the SQL transformation logic. Each node in a pipe is a named SQL block:
NODE filter_events
SQL >
SELECT
timestamp,
user_id,
event
FROM events_datasource
WHERE event = 'page_view'
AND timestamp >= now() - INTERVAL 7 DAY
NODE aggregate_by_user
SQL >
SELECT
user_id,
count() AS total_views
FROM filter_events
GROUP BY user_idThe final node in a pipe becomes the endpoint response when the pipe is published as an API endpoint. You can add query parameters using template syntax:
WHERE event = {{ String(event_type, 'page_view') }}Materialized views are pipes that write their output to a datasource instead of serving an API. They run incrementally as new data arrives. Define them by adding TYPE materialized and a DATASOURCE target to the pipe file. Use materialized views to pre-compute aggregations that would otherwise run on every API request.
When to Use It?
Use Tinybird when you need to serve analytical queries over large datasets with low latency and you want to avoid managing your own ClickHouse cluster or building a custom API layer.
Specific scenarios where Tinybird fits well:
- Building user-facing analytics features where query time must stay under 100ms
- Processing high-volume event streams from product telemetry, logs, or IoT devices
- Replacing slow PostgreSQL or MySQL analytical queries that are hurting application performance
- Creating internal dashboards that need to query hundreds of millions of rows interactively
- Prototyping data APIs quickly before committing to a full infrastructure build
Tinybird is less suitable for transactional workloads, frequent row-level updates, or use cases that require complex joins across many large tables without careful pre-aggregation.
Important Notes
Follow these rules to avoid common mistakes and performance problems:
- Never use
SELECT *in production pipes. Always name the columns you need to reduce data scanned. - Avoid high-cardinality columns in GROUP BY without filtering first. Pre-filter with WHERE before aggregating.
- Use ReplacingMergeTree for datasources where you need to handle deduplication or upsert patterns.
- Test sorting keys before loading large datasets. Changing a sorting key requires recreating the datasource and reloading all data.
- Limit materialized view complexity. Keep each materialized view focused on one aggregation pattern. Chaining multiple transformations in a single materialized pipe makes debugging harder.
- Use query parameters with defaults. Every template variable should have a sensible default so the endpoint works without required inputs during testing.
- Monitor pipe performance using the Tinybird UI query analyzer to identify nodes that scan the most data and optimize them first.
More Skills You Might Like
Explore similar skills to enhance your workflow
Expo UI Jetpack Compose
Build native Android UI components with Jetpack Compose in Expo projects
Gws Sheets
Read and write Google Sheets spreadsheet data via CLI
Code Refactor
Perform bulk code refactoring operations like renaming variables/functions across files, replacing patterns, and updating API calls. Use when users re
Database Schema Designer
Database Schema Designer automation and integration
Sprint Plan
Plan a sprint with capacity estimation, story selection, dependency mapping, and risk identification. Use when preparing for sprint planning,
Market Sizing
Estimate market size using TAM, SAM, and SOM with top-down and bottom-up approaches. Use when sizing a market opportunity, estimating addressable