
2.3.1 - Create an ETL Pipeline with GCS and BigQuery in Kestra

Last updated Jan 31, 2025

YouTube Video | ~20 min

https://youtu.be/nKqjjLJ7YXs

Now that you've learned how to build ETL pipelines locally using Postgres, we are ready to move to the cloud. In this section, we'll load the same Yellow and Green Taxi data to Google Cloud Platform (GCP) using:

  1. Google Cloud Storage (GCS) as a data lake

  2. BigQuery as a data warehouse.


🌐 Head to the GCS site, where we will do some setup work: creating our service account, granting it permissions, and grabbing a JSON key.

⚠️ Be sure that your JSON key is kept private and off GitHub.

To connect Kestra to GCP, we modify and execute our flow 4, which stores our GCP credentials and settings as key-value (KV) pairs in Kestra. You can learn more about KVs here
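As a rough illustration of what that flow does, here is a minimal KV-setup sketch. It assumes Kestra's core KV `Set` task and a handful of key names (`GCP_CREDS`, `GCP_PROJECT_ID`, and so on) that later flows can read back with the `kv()` function; the actual flow 4 in the course materials may differ in the details.

```yaml
# Minimal sketch of a "flow 4"-style KV setup. Key names are assumptions;
# paste your real values, and keep the service-account JSON out of version control.
id: gcp_kv
namespace: zoomcamp

tasks:
  - id: gcp_creds
    type: io.kestra.plugin.core.kv.Set
    key: GCP_CREDS
    value: |
      { "type": "service_account", "...": "paste your JSON key contents here" }

  - id: gcp_project_id
    type: io.kestra.plugin.core.kv.Set
    key: GCP_PROJECT_ID
    value: your-project-id

  - id: gcp_location
    type: io.kestra.plugin.core.kv.Set
    key: GCP_LOCATION
    value: europe-west2

  - id: gcp_bucket_name
    type: io.kestra.plugin.core.kv.Set
    key: GCP_BUCKET_NAME
    value: your-unique-bucket-name

  - id: gcp_dataset
    type: io.kestra.plugin.core.kv.Set
    key: GCP_DATASET
    value: zoomcamp
```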

🐛 I tried to change my location in flow 4, but that caused an error. Maybe the region was too large?

Then we run our flow 5, which creates our bucket and BigQuery dataset on GCP.
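A sketch of what such a setup flow looks like is below, assuming the Kestra GCP plugin's `CreateBucket` and `CreateDataset` tasks and the KV keys from the previous sketch (the real flow 5 may name things differently):

```yaml
# Sketch of a "flow 5"-style setup: create the GCS bucket and BigQuery dataset.
# Credentials and settings are read back from the KV store via pluginDefaults.
id: gcp_setup
namespace: zoomcamp

tasks:
  - id: create_gcs_bucket
    type: io.kestra.plugin.gcp.gcs.CreateBucket
    ifExists: SKIP                       # don't fail if the bucket already exists
    name: "{{kv('GCP_BUCKET_NAME')}}"

  - id: create_bq_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    ifExists: SKIP
    name: "{{kv('GCP_DATASET')}}"

pluginDefaults:
  - type: io.kestra.plugin.gcp
    values:
      serviceAccount: "{{kv('GCP_CREDS')}}"
      projectId: "{{kv('GCP_PROJECT_ID')}}"
      location: "{{kv('GCP_LOCATION')}}"
```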

Your flows should now have connected Kestra to GCP, and you should see a bucket created in your project.

Our end goal is for our flow to send our data (CSV files, in our case) to our GCS data lake bucket, from where we can pass it over to BigQuery. The dataset is large and would crash our computer if we processed it locally.

👀 Run flow 6 to see how it sends our data to the GCS bucket and then to BigQuery.
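Before running it, it can help to see the basic shape. The sketch below is a heavily stripped-down illustration, assuming the GCP plugin's `Upload` and BigQuery `Query` tasks; the real flow 6 handles the full extract-and-load for the taxi data, so treat this as an outline only.

```yaml
# Stripped-down sketch of the "flow 6" idea: push a CSV to GCS,
# then expose it to BigQuery as an external table. Names are illustrative.
id: gcp_taxi_sketch
namespace: zoomcamp

inputs:
  - id: file
    type: FILE              # the CSV produced by an earlier extract step

variables:
  gcs_file: "gs://{{kv('GCP_BUCKET_NAME')}}/green_tripdata_2019-01.csv"

tasks:
  - id: upload_to_gcs
    type: io.kestra.plugin.gcp.gcs.Upload
    from: "{{inputs.file}}"
    to: "{{render(vars.gcs_file)}}"

  - id: bq_external_table
    type: io.kestra.plugin.gcp.bigquery.Query
    sql: |
      CREATE OR REPLACE EXTERNAL TABLE `{{kv('GCP_PROJECT_ID')}}.{{kv('GCP_DATASET')}}.green_tripdata_ext`
      OPTIONS (format = 'CSV', uris = ['{{render(vars.gcs_file)}}'], skip_leading_rows = 1);

pluginDefaults:
  - type: io.kestra.plugin.gcp
    values:
      serviceAccount: "{{kv('GCP_CREDS')}}"
      projectId: "{{kv('GCP_PROJECT_ID')}}"
      location: "{{kv('GCP_LOCATION')}}"
```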
