Page cover

2.2.1 - Create an ETL Pipeline with Postgres in Kestra

Last updated Jan 28, 2025

Youtube Video | ~28 min

I recommend going through all 3 videos in one sitting

To get pgAdmin to connect, I use a docker-compose.yml that Bruno put in slack and then needed to modify one line in the video code...

Build postgres_taxi.yaml Flow

First, we need to connect our Kestra, pgAdmin, and Postgres using docker-compose.yml

  1. Open Docker Desktop

  2. Add this docker-compose.yml from Bruno to your module-2 directory:

  3. ▪️Terminal head to the directory of your docker-compose.yml and run

    1. ⚠️ Note that this load may take a bit of time for the first compose

  4. Check that Kestra 8080 and pgAdmin 9000 are running


Build postgres_taxi.yaml

Final YAMLs

1

Goal - View Empty Table in PgAdmin

To get the editor window, click 'Flows on the left hnd side -> 'Editor' tab -> 'Source and topography' as your file view option, similar to his in the video, ours should look like:

Get used to using Kestra and setting things up. I recommend writing this out, understand what it is doing, and then testing the run and output:

Line - url: jdbc:postgresql://kestra-metadata:5432/kestra was changed from video and github so we can use pgAdmin in our docker-compose and not have to download it locally.

Kestra, save and execute - then open pgAdmin and add a new server connection:

You should now see two empty green taxi tables in pgAdmin

2

Add green_copy_in_to_staging_table

Add to tasks the green_copy_in_to_staging_table and re-run in Kestra

You should now see data in the green taxi staging table in pgAdmin

3

Add green_add_unique_id_and_filename

Add to tasks the green_add_unique_id_and_filename and re-run in Kestra

You should now see unique id's and filename columns updated in the green taxi staging table in pgAdmin

⚠️ This appends rows so we have duplicates, thus we need to Truncate our tables first:

4

Add green_truncate_staging_table

Add to tasks green_truncate_staging_table after green_create_staging_table

Re-run in Kestra

You should now see data that is no longer appending itself in the green taxi staging table in pgAdmin

5

Add green_merge_data

Add to tasks green_merge_dataand re-run in Kestra

You should now see data in the main green taxi table in pgAdmin

6

Add purge_files

To keep Kestra clean and not run out of room, we want to purge these files after loading

7

Add yellow taxi's using 'if' statement

I feel like there should be a more efficient way?

Add to tasks and re-run in Kestra

You should now see yellow and green taxi table in pgAdmin

👀 Up next, we will add all the data using the cloud ☁️

Last updated