1.2.2 - Ingesting NY Taxi Data to Postgres
Last updated Jan 19, 2025
YouTube Video | ~29min
⛔ I recommend working through this entire video without interrupting your workflow. You may run into a number of issues related to pgcli, so allow extra time here (maybe hours). If you get stuck, search Slack, search the FAQ, try a new virtual environment, or ask for help.
⚒️ The Taxi TLC data website now provides data in .parquet format instead of .csv. The website gives directions on how to read .parquet files and convert them to a pandas DataFrame. For this course, we want to use the .csv backup located here: https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz
✍️ In this video we will learn how to configure and run Postgres in Docker. We will download the NY taxi dataset as a .csv file and read it into a Jupyter notebook. We will also look at the data using pgcli, but will use other tools moving forward.
Datasets
zones_data - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page and click:
Taxi Zone Lookup Table (CSV)
✅ Now you should have 2 .csv files locally
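The taxi file can also be fetched from Python instead of the browser. This is a minimal sketch using only the standard library; the helper names local_name and download are made up for illustration, and the URL is the .csv backup link given earlier. The zone lookup file still comes from the TLC page.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Backup .csv link quoted earlier in these notes.
TAXI_URL = (
    "https://github.com/DataTalksClub/nyc-tlc-data/releases/"
    "download/yellow/yellow_tripdata_2021-01.csv.gz"
)


def local_name(url):
    """Return the file name at the end of a URL (hypothetical helper)."""
    return Path(urlparse(url).path).name


def download(url, dest_dir="."):
    """Download url into dest_dir unless the file is already there."""
    target = Path(dest_dir) / local_name(url)
    if not target.exists():
        urlretrieve(url, target)
    return target


# Usage (needs network access):
# download(TAXI_URL)
```

The file stays gzip-compressed on disk; pandas can read a .csv.gz directly, so unpacking it is optional.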
Running Postgres w/ Docker (Mac)
📼 ~minute 6
▪️ Terminal
Run this from your terminal in (base), from the directory that contains your ny_taxi_postgres_data folder, so that the command points to that folder correctly.
✅ After running, you should see Postgres files in your ny_taxi_postgres_data directory.
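As a rough sketch of what this step involves, here is a typical docker run invocation for the official Postgres image, assembled in Python so each piece can be inspected. The root/root credentials, the ny_taxi database name, the postgres:13 tag, and port 5432 are assumptions, not values confirmed by this page, and postgres_run_args is a hypothetical helper. It also shows why the working directory matters: a relative folder name is resolved against the directory you run from.

```python
import shlex
from pathlib import Path


def postgres_run_args(data_dir, user="root", password="root",
                      db="ny_taxi", port=5432, image="postgres:13"):
    """Build a `docker run` argument list (all defaults are assumptions)."""
    # A relative data_dir resolves against the current working directory,
    # which is why the command must be run from the right folder.
    host_dir = Path(data_dir).resolve()
    return [
        "docker", "run", "-it",
        "-e", f"POSTGRES_USER={user}",
        "-e", f"POSTGRES_PASSWORD={password}",
        "-e", f"POSTGRES_DB={db}",
        # Persist the database files on the host.
        "-v", f"{host_dir}:/var/lib/postgresql/data",
        # Map the chosen host port to Postgres' port inside the container.
        "-p", f"{port}:5432",
        image,
    ]


# Print the command so it can be pasted into a terminal.
print(shlex.join(postgres_run_args("ny_taxi_postgres_data")))

# To launch it from Python instead:
# import subprocess
# subprocess.run(postgres_run_args("ny_taxi_postgres_data"), check=True)
```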
PGCLI
📼 ~minute 7
▪️ Terminal
Be sure to run pip install pgcli in (base), not in your project environment.
If you are having issues with the above command, try:
conda install -c conda-forge pgcli
pip install -U mycli
Using pgcli to connect to Postgres
Flags: -h hostname, -p port, -u username, -d database name
✅ You can now explore your dataset in the terminal window (once you have some data loaded).
If you run into issues, check out this video https://www.youtube.com/watch?v=3IkfkTwqHx4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=6
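The four pgcli flags carry the same information that a Python database connection URL needs later in the notebook. This sketch builds both from one set of values; localhost, 5432, root/root, and ny_taxi are assumed values for a local setup, and both function names are hypothetical.

```python
import shlex


def pgcli_args(host="localhost", port=5432, user="root", database="ny_taxi"):
    """Build the pgcli command line from the four connection values."""
    return ["pgcli", "-h", host, "-p", str(port), "-u", user, "-d", database]


def postgres_url(host="localhost", port=5432, user="root",
                 password="root", database="ny_taxi"):
    """Build the matching postgresql:// URL (password is an assumption)."""
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


print(shlex.join(pgcli_args()))
# pgcli -h localhost -p 5432 -u root -d ny_taxi
print(postgres_url())
# postgresql://root:root@localhost:5432/ny_taxi
```

pgcli prompts for the password interactively, while the URL form carries it inline, so avoid committing a real password to a repo.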
Jupyter Notebook
Be sure Jupyter is installed in your environment; if not, run pip install jupyter. Be sure your .csv taxi data is downloaded locally.
▪️ Terminal
jupyter notebook
This should open a Jupyter notebook tab in your web browser. Follow along with the YouTube video to finalize your notebook.
⚒️ In future videos we use the zone .csv data as well. I'm not sure whether this was covered in a video, but I added the steps to the Jupyter notebook in my repo.
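For orientation, the notebook's core job of reading a .csv in chunks and appending each chunk to a database table can be sketched as below. This is not the notebook from the video or the repo; ingest_csv is a hypothetical helper, and the table names, file names, date column names, and connection URL in the usage comments are assumptions.

```python
import pandas as pd


def ingest_csv(csv_path, con, table_name, chunksize=100_000, parse_dates=None):
    """Load a CSV into a SQL table chunk by chunk; return the row count."""
    total = 0
    # With chunksize set, read_csv yields DataFrames instead of one big frame,
    # so the whole file never has to fit in memory at once.
    chunks = pd.read_csv(csv_path, chunksize=chunksize,
                         parse_dates=parse_dates or False)
    for i, chunk in enumerate(chunks):
        # The first chunk replaces any existing table so reruns start clean;
        # later chunks append to it.
        mode = "replace" if i == 0 else "append"
        chunk.to_sql(table_name, con=con, if_exists=mode, index=False)
        total += len(chunk)
    return total


# Usage against Postgres (assumes SQLAlchemy and psycopg2 are installed and
# the container is running with these assumed credentials):
# from sqlalchemy import create_engine
# engine = create_engine("postgresql://root:root@localhost:5432/ny_taxi")
# ingest_csv("yellow_tripdata_2021-01.csv.gz", engine, "yellow_taxi_data",
#            parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"])
# ingest_csv("taxi_zone_lookup.csv", engine, "zones")
```

The same function covers both files: the zone lookup table is small enough to arrive in a single chunk, while the trip data takes many.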
👀 In 1.2.4 we convert our Python notebook into a Python script and test loading the data that way as well.
Resources
My repo for this video can be found here
📚 https://www.docker.com/blog/how-to-use-the-postgres-docker-official-image/