2.2.1 - Create an ETL Pipeline with Postgres in Kestra
Last updated Jan 28, 2025
Last updated
Last updated Jan 28, 2025
Last updated
Youtube Video | ~28 min
I recommend going through all 3 videos in one sitting
To get pgAdmin to connect, I use a docker-compose.yml that Bruno put in slack and then needed to modify one line in the video code...
First, we need to connect our Kestra, pgAdmin, and Postgres using docker-compose.yml
Open Docker Desktop
Add this docker-compose.yml
from Bruno to your module-2 directory:
Final YAMLs
To get the editor window, click 'Flows on the left hnd side -> 'Editor' tab -> 'Source and topography' as your file view option, similar to his in the video, ours should look like:
Get used to using Kestra and setting things up. I recommend writing this out, understand what it is doing, and then testing the run and output:
Line - url: jdbc:postgresql://kestra-metadata:5432/kestra was changed from video and github so we can use pgAdmin in our docker-compose and not have to download it locally.
Kestra, save and execute - then open pgAdmin and add a new server connection:
Terminal head to the directory of your docker-compose.yml and run
Note that this load may take a bit of time for the first compose
Check that Kestra 8080 and pgAdmin 9000 are running
You should now see two empty green taxi tables in pgAdmin
You should now see data in the green taxi staging table in pgAdmin
You should now see unique id's and filename columns updated in the green taxi staging table in pgAdmin
This appends rows so we have duplicates, thus we need to Truncate our tables first:
You should now see data that is no longer appending itself in the green taxi staging table in pgAdmin
You should now see data in the main green taxi table in pgAdmin
I feel like there should be a more efficient way?
You should now see yellow and green taxi table in pgAdmin
Up next, we will add all the data using the cloud