1.2.2 - Ingesting NY Taxi Data to Postgres
Last updated Jan 19, 2025
YouTube Video | ~29 min
⛔ I recommend going through this entire video without pausing your workflow. You may run into a number of issues related to pgcli, so allow extra time here (maybe hours). Search Slack, search the FAQ, try a new virtual environment, ask for help.
✍️ In this video we will learn how to configure and run Postgres in Docker. We will download the NY taxi dataset as a CSV file and read it into a Jupyter notebook. We will also look at the data using pgcli, but will use other options moving forward.
Datasets
zones_data - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page and click:
Taxi Zone Lookup Table (CSV)
✅ Now you should have 2 .csv files locally
Running Postgres w/ Docker (Mac)
📼 ~minute 6
▪️ Terminal
✅ After running, you should see postgres files in your ny_taxi_postgres_data directory
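A minimal sketch of the Docker invocation for this step, written as a small Python helper that prints the equivalent terminal command. Only the ny_taxi_postgres_data directory comes from these notes. The credentials (root/root), the database name (ny_taxi), the port mapping, and the postgres:13 image tag are assumptions for illustration, so use whatever values the video shows.

```python
import shlex
from pathlib import Path


def postgres_run_command(data_dir, user="root", password="root",
                         db="ny_taxi", host_port=5432, image="postgres:13"):
    """Build a `docker run` argument list for the official Postgres image.

    All default values are assumptions for illustration, not values
    confirmed by these notes.
    """
    return [
        "docker", "run", "-it",
        # Environment variables read by the official postgres image.
        "-e", f"POSTGRES_USER={user}",
        "-e", f"POSTGRES_PASSWORD={password}",
        "-e", f"POSTGRES_DB={db}",
        # Mount a local folder so the database files survive container restarts.
        "-v", f"{data_dir}:/var/lib/postgresql/data",
        # Map a host port to the container's Postgres port (5432).
        "-p", f"{host_port}:5432",
        image,
    ]


cmd = postgres_run_command(Path.cwd() / "ny_taxi_postgres_data")
# Print the command so it can be pasted into a terminal.
print(shlex.join(cmd))
```

The volume mount is what makes the postgres files show up in ny_taxi_postgres_data after the container starts.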
PGCLI
📼 ~minute 7
▪️ Terminal
Using pgcli to connect to Postgres
-h hostname, -p port, -u username, -d database name
✅ You can now explore your dataset in the terminal window (once you have loaded some data)
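A short sketch of how the four pgcli flags fit together, again as a Python helper that prints the terminal command. The host, port, username, and database values are assumptions for illustration and should match whatever you passed to Docker.

```python
import shlex


def pgcli_command(host="localhost", port=5432, user="root", database="ny_taxi"):
    """Build a pgcli argument list; the default values are illustrative assumptions."""
    return [
        "pgcli",
        "-h", host,        # hostname
        "-p", str(port),   # port
        "-u", user,        # username
        "-d", database,    # database name
    ]


pgcli_cmd = pgcli_command()
print(shlex.join(pgcli_cmd))
# prints: pgcli -h localhost -p 5432 -u root -d ny_taxi
```

pgcli prompts for the password after connecting, so it is not part of the command.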
Jupyter Notebook
▪️ Terminal
This should open a Jupyter notebook tab in your web browser. Follow along with the YouTube video to finalize your Jupyter notebook.
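Because the trip CSV is large, loading it in batches keeps memory use low. The sketch below shows that batching idea using only the standard library, with sqlite3 standing in for Postgres so it runs without a server. This is a swapped-in illustration, not the notebook from the video. The table name yellow_taxi_data, the column names, and the sample rows are made up.

```python
import csv
import io
import sqlite3
from itertools import islice


def read_in_chunks(lines, chunksize):
    """Yield lists of up to `chunksize` CSV rows (as dicts) from an iterable of lines."""
    reader = csv.DictReader(lines)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            break
        yield chunk


def ingest(conn, table, lines, chunksize=100_000):
    """Create `table` from the CSV header, then append the rows chunk by chunk."""
    total = 0
    for chunk in read_in_chunks(lines, chunksize):
        columns = list(chunk[0].keys())
        col_sql = ", ".join(f'"{c}"' for c in columns)
        if total == 0:
            # Untyped columns keep the sketch short; values are stored as text.
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
            [tuple(row[c] for c in columns) for row in chunk],
        )
        conn.commit()
        total += len(chunk)
    return total


# Tiny made-up sample standing in for the downloaded trip CSV.
sample = io.StringIO(
    "trip_id,passenger_count,trip_distance\n"
    "1,1,2.5\n"
    "2,2,0.8\n"
    "3,1,5.1\n"
)
conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
rows_loaded = ingest(conn, "yellow_taxi_data", sample, chunksize=2)
row_count = conn.execute('SELECT COUNT(*) FROM "yellow_taxi_data"').fetchone()[0]
print(rows_loaded, row_count)
```

The same loop applies against Postgres, but the connection and placeholder style come from your Postgres driver (psycopg2, for example, uses %s rather than ?).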
⚒️ In future videos we use the zone CSV data as well. I'm unsure if this was done in a video, but I added the steps to the Jupyter notebook in my repo.
👀 In 1.2.4 we convert our Python notebook into a Python script and test loading the data that way as well.
Resources
My repo for this video can be found here
📚 https://www.docker.com/blog/how-to-use-the-postgres-docker-official-image/