2.2 - ETL Pipelines in Kestra: Detailed Walkthrough
This week, we're going to build ETL pipelines for Yellow and Green Taxi data from NYC’s Taxi and Limousine Commission (TLC). You will:
Extract data from CSV files.
Load it into Postgres or Google Cloud (GCS + BigQuery).
Explore scheduling and backfilling workflows.
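Here is a rough, plain-Python sketch of the extract and load steps, including why monthly backfills matter. It is not Kestra code. Several things in it are assumptions: the monthly file-name pattern, the hypothetical helper names `monthly_file_name`, `backfill_months`, `extract`, and `load`, the tiny made-up CSV sample, and `sqlite3` standing in for Postgres so that the sketch runs on the standard library alone.

```python
import csv
import io
import sqlite3
from datetime import date


def monthly_file_name(taxi: str, year: int, month: int) -> str:
    """Hypothetical helper: build a monthly file name (the pattern is an assumption)."""
    return f"{taxi}_tripdata_{year}-{month:02d}.csv"


def backfill_months(start: date, end: date) -> list[tuple[int, int]]:
    """List (year, month) pairs from start to end inclusive, one per monthly run."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append((y, m))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return months


def extract(csv_text: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dicts (header row gives the keys)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load(conn: sqlite3.Connection, table: str, rows: list[dict]) -> int:
    """Load: create the table if needed, insert all rows, return the row count.

    sqlite3 is a stand-in for Postgres here; every column is stored as TEXT.
    """
    if not rows:
        return 0
    cols = list(rows[0].keys())
    col_sql = ", ".join(f'"{c}" TEXT' for c in cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(r[c] for c in cols) for r in rows],
    )
    conn.commit()
    return len(rows)


# Usage with a tiny made-up sample (not real TLC data).
sample = "VendorID,passenger_count,trip_distance\n1,1,2.5\n2,3,0.8\n"
conn = sqlite3.connect(":memory:")
n = load(conn, "yellow_tripdata", extract(sample))
runs = backfill_months(date(2019, 11, 1), date(2020, 2, 1))
files = [monthly_file_name("yellow", y, m) for y, m in runs]
```

Each `(year, month)` pair from `backfill_months` stands for one run of a monthly pipeline. A backfill in a scheduler replays those past runs for you instead of requiring a manual loop like this one.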
The introductory flow demonstrates a simple data pipeline that extracts data via an HTTP REST API, transforms it in Python, and then queries it with DuckDB. For the exercises in this stage, a new, separate Postgres database is created.
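The page does not show the flow itself, so what follows is a plain-Python sketch of the same three stages (extract, transform in Python, query with SQL). It is not Kestra's task definitions. Several things in it are assumptions: the JSON shape (a `products` array whose records carry `brand` and `price`), the hypothetical helper names `transform` and `query_avg_price_by_brand`, the average-price query, and `sqlite3` standing in for DuckDB so the sketch runs on the standard library alone. In a real run, the raw JSON would come from the HTTP extract step rather than being built inline.

```python
import json
import sqlite3


def transform(raw_json: str, columns: list[str]) -> list[dict]:
    """Transform: keep only the requested columns from each record.

    Assumes the payload looks like {"products": [{...}, ...]}.
    """
    records = json.loads(raw_json)["products"]
    return [{c: r.get(c) for c in columns} for r in records]


def query_avg_price_by_brand(rows: list[dict]) -> list[tuple]:
    """Query: average price per brand, highest first.

    sqlite3 is used here as a stand-in for DuckDB.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (brand TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (:brand, :price)", rows)
    result = conn.execute(
        "SELECT brand, ROUND(AVG(price), 2) AS avg_price "
        "FROM products GROUP BY brand ORDER BY avg_price DESC"
    ).fetchall()
    conn.close()
    return result


# Stand-in for the HTTP extract step: a small made-up payload.
raw = json.dumps({"products": [
    {"id": 1, "brand": "A", "price": 10.0},
    {"id": 2, "brand": "A", "price": 20.0},
    {"id": 3, "brand": "B", "price": 5.0},
]})
rows = transform(raw, ["brand", "price"])
result = query_avg_price_by_brand(rows)
```

Keeping each stage as a separate function mirrors the separate tasks of a pipeline. Each step can be tested on its own, and the SQL step could be pointed at DuckDB without touching the transform.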