1.2.2 - Ingesting NY Taxi Data to Postgres

Last updated Jan 19, 2025

YouTube video | ~29 min

I recommend going through this entire video without pausing your workflow. You may run into a number of issues related to pgcli, so allow extra time here (possibly hours). If you get stuck, search Slack, search the FAQ, try a new virtual environment, or ask for help.

https://www.youtube.com/watch?v=2JM-ziJt0WI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=5&pp=iAQB
⚒️ The Taxi TLC data website now provides data in .parquet format instead of .csv. The website gives directions on how to read .parquet files and convert them to a pandas DataFrame. For this course, we want to use the .csv backup located here: https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz
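As a rough sketch of working with the backup file, the snippet below rebuilds the release URL shown above and previews the first rows of the gzipped .csv using only the Python standard library. The helper names (backup_url, download, head) are hypothetical, and only the yellow 2021-01 URL is confirmed by this page; other colors or months are an assumption about the naming pattern.

```python
import csv
import gzip
import itertools
import urllib.request

# Release URL prefix, taken from the backup link above.
BASE = "https://github.com/DataTalksClub/nyc-tlc-data/releases/download"


def backup_url(color="yellow", year=2021, month=1):
    """Build the URL of one monthly .csv.gz backup (hypothetical helper)."""
    return f"{BASE}/{color}/{color}_tripdata_{year}-{month:02d}.csv.gz"


def download(url, dest):
    """Fetch url into the local file dest. Requires network access."""
    urllib.request.urlretrieve(url, dest)
    return dest


def head(path, n=5):
    """Return the header and the first n data rows of a gzipped .csv file."""
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(itertools.islice(reader, n))
    return header, rows
```

For example, head(download(backup_url(), "yellow_tripdata_2021-01.csv.gz")) would fetch the file and return its column names plus five sample rows, without loading the whole dataset into memory.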

✍️ In this video we will learn how to configure and run Postgres in Docker. We will download the NY taxi dataset as a .csv file and read it into a Jupyter notebook. We will also look at the data using pgcli, but will use other options moving forward.


Datasets

You should now have two .csv files locally.


https://www.docker.com/blog/how-to-use-the-postgres-docker-official-image/

Running Postgres w/ Docker (Mac)

📼 ~minute 6

▪️ Terminal

Be sure to run this from your terminal in (base), and from the correct directory so that the command points to your ny_taxi_postgres_data folder.

After running it, you should see Postgres files in your ny_taxi_postgres_data directory.
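As a sketch of what launching the official Postgres image can look like, the helper below assembles a docker run argument list. Only the ny_taxi_postgres_data folder name comes from this page; the root/root credentials, the ny_taxi database name, the postgres:13 tag, and the function name are assumptions for illustration. It also shows why the working directory matters: the folder name is resolved against the current directory to get the absolute path used for the bind mount.

```python
from pathlib import Path


def postgres_run_args(data_dir="ny_taxi_postgres_data", user="root",
                      password="root", db="ny_taxi",
                      image="postgres:13", host_port=5432):
    """Build an argument list for `docker run` (all defaults are assumed)."""
    # A bare folder name given to -v is treated as a named volume, so the
    # folder is resolved to an absolute path relative to the current directory.
    host_dir = Path(data_dir).resolve()
    return [
        "docker", "run", "-it",
        "-e", f"POSTGRES_USER={user}",          # superuser name
        "-e", f"POSTGRES_PASSWORD={password}",  # superuser password
        "-e", f"POSTGRES_DB={db}",              # database created on first start
        "-v", f"{host_dir}:/var/lib/postgresql/data",  # persist data on the host
        "-p", f"{host_port}:5432",              # host port : container port
        image,
    ]
```

Passing the list to shlex.join gives a single command line to paste into the terminal, and subprocess.run(postgres_run_args(), check=True) would start the container directly.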


PGCLI

📼 ~minute 7

▪️ Terminal

Be sure to run pip install pgcli in (base), not in your own environment.

If you are having issues with the above command, try:
conda install -c conda-forge pgcli
pip install -U mycli

Using pgcli to connect to Postgres

-h hostname
-p port
-u username
-d database name

You can now explore your dataset in the terminal window (once you have loaded some data).
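To make the pgcli connection options concrete, here is a small sketch that maps them onto a command line and adds a quick check that something is listening on the Postgres port, which can help when pgcli refuses to connect. The defaults (localhost, 5432, root, ny_taxi) and both function names are assumptions, not values taken from this page.

```python
import socket


def pgcli_args(host="localhost", port=5432, user="root", dbname="ny_taxi"):
    """Build a pgcli command line from host, port, user, and database name."""
    return ["pgcli", "-h", host, "-p", str(port), "-u", user, "-d", dbname]


def port_is_open(host="localhost", port=5432, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the assumed defaults, " ".join(pgcli_args()) gives pgcli -h localhost -p 5432 -u root -d ny_taxi. If port_is_open() returns False, the container is probably not running or the port is not published, so the problem is not with pgcli itself.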


Jupyter Notebook

Be sure Jupyter is installed in your environment. Use pip install jupyter if not. Be sure your .csv taxi data is downloaded locally.

▪️ Terminal

Run the command jupyter notebook. This should open a Jupyter notebook tab in your web browser. Follow along with the YouTube video to finalize your Jupyter notebook.
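A minimal sketch of the kind of ingestion loop such a notebook can end up with, assuming pandas is installed: read the CSV in chunks, convert the timestamp columns, and append each chunk to a SQL table. This is not the author's notebook. The function name, the yellow_taxi_data table name, the chunk size, and the two datetime column names are assumptions.

```python
import pandas as pd

# Timestamp columns assumed to be present in the yellow taxi data.
DATETIME_COLS = ("tpep_pickup_datetime", "tpep_dropoff_datetime")


def ingest_csv(csv_path, con, table_name="yellow_taxi_data", chunksize=100_000):
    """Load a CSV (plain or .gz) into a SQL table chunk by chunk.

    con is anything DataFrame.to_sql accepts, such as a SQLAlchemy engine.
    Returns the number of rows written.
    """
    total = 0
    for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunksize)):
        # Parse timestamps so they are not stored as plain text.
        for col in DATETIME_COLS:
            if col in chunk.columns:
                chunk[col] = pd.to_datetime(chunk[col])
        # The first chunk (re)creates the table; later chunks append to it.
        mode = "replace" if i == 0 else "append"
        chunk.to_sql(table_name, con=con, if_exists=mode, index=False)
        total += len(chunk)
    return total
```

With SQLAlchemy and a Postgres driver installed, con could be created with sqlalchemy.create_engine("postgresql://root:root@localhost:5432/ny_taxi"), where the credentials and database name are again assumed values. Reading in chunks keeps memory use bounded, which matters for a file with more than a million rows.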

⚒️ In future videos we use the zone .csv data as well. I'm unsure if this was done in a video, but I added the steps to the Jupyter notebook in my repo.

👀 In 1.2.4 we convert our Python notebook into a Python script and test loading the data that way as well.


Resources

My repo for this video can be found here

📚 https://www.docker.com/blog/how-to-use-the-postgres-docker-official-image/