# 1.2.4 - Dockerizing the Ingestion Script

Youtube Video | \~18 min

{% embed url="<https://www.youtube.com/watch?v=B1WwATwf-vY&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=8>" %}
<https://www.youtube.com/watch?v=B1WwATwf-vY&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=8>
{% endembed %}

:writing\_hand: In this video we will learn how to convert our jupyter notebook into a python script. Then we will learn a second way of how to ingest our data into postgres using our new python script.

:recycle: Recall this information from 1.2.2 where we are loading in our taxi csv data using python notebook. This is another way to acheive the same goal, but likely a more common practice.&#x20;

### Convert Jupyter Notebook&#x20;

:black\_medium\_square:Terminal - Converting the jupyter notebook to a python script

```bash
jupyter nbconvert --to=script {notebook_name.ipynb}
```

{% hint style="info" %}
Run from your project environment in the folder where the ipynb lives
{% endhint %}

:broom: Clean up your code as needed

### Using argparse

The goal is to allow user inputs for different values such as url or password. You can read more about how to use argparse here [/docs.python.org/3/library/argparse.html](https://docs.python.org/3/library/argparse.html). The final python script called ingest\_data.py can be found [here](https://github.com/Tinker0425/de-zoomcamp-my-work/tree/master/module-01/docker/video_4). I **recommend** using this version because at  \~11 min in the youtube Alexey mentions needed to add an exception to the code, which this version has - the `'try-except'` statement. There is a link in resources if you are unsure what a `'try-except'` statement is.&#x20;

***

### Second way to ingest data

:recycle: Recall that our first way was in 1.2.2 using python notebook. To complete the second method you will need to drop your table following the youtube video.

This second way is still a 'manual' method. You need to manually drop your table here  <http://localhost:8080/>  and then run a command.

:hammer\_pick: Note that our url link will look different because the taxi ny website no longer has the csv files

:black\_medium\_square: Terminal

```bash
URL="https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz"

python ingest_data.py \
  --user=root \
  --password=root \
  --host=localhost \
  --port=5432 \
  --db=ny_taxi \
  --table_name=yellow_taxi_trips \
  --url=${URL}
```

:white\_check\_mark: This should print the loops in your terminal window and you should be able to now see the data here <http://localhost:8080/> again (if you dropped your table).

***

### Third way to ingest data

This is 'Dockerizing' the python script. This method will automatically 'replace' the table according to our python script.&#x20;

{% stepper %}
{% step %}

### Update Dockerfile

:pencil: Code editor

```docker
FROM python:3.12

RUN apt-get install wget
RUN pip install pandas sqlalchemy psycopg2

WORKDIR /app
COPY ingest_data.py ingest_data.py

ENTRYPOINT [ "python", "ingest_data.py" ]
```

:black\_medium\_square: Terminal window&#x20;

```bash
docker build -t taxi_ingest:v001 .
```

{% endstep %}

{% step %}

### Docker run

```sh
docker run -it \
  --network=pg-network \
  taxi_ingest:v001 \
    --user=root \
    --password=root \
    --host=pg-database \
    --port=5432 \
    --db=ny_taxi \
    --table_name=yellow_taxi_trips \
    --url=${URL}
```

{% endstep %}
{% endstepper %}

***

### Resources

{% @github-files/github-code-block url="<https://github.com/Tinker0425/de-zoomcamp-my-work/tree/master/module-01/docker/video_4>" %}

:books: <https://stackoverflow.com/questions/62062226/how-to-work-with-try-and-except-in-python>

:link:  <https://docs.python.org/3/library/argparse.html>
