🖥️ DE Zoomcamp Notes
  • Linkedin | Kayla Tinker
  • Github | Tinker0425
  • Blog | From Clouds to Code
  • BlueSky | Cloudy Blue Wave

1.2.1 - Introduction to Docker

Last updated Jan 22, 2025


Youtube Video | ~24 min

To work through this video you will need Docker installed, a terminal window, and a code editor of your choice (PyCharm for me). Please see the '' section if you need more info.

In this video, we learn what Docker is, get an overview of the pipeline, and learn about Docker containers and Docker images. Then we work through an example of building an image with Docker.

What is...

Building a Container Image

Example code you want to deploy

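pipeline.py

import sys

import pandas as pd  # not used yet, but the image must provide it

# sys.argv[0] is the script name; arguments given after the image name in `docker run` follow it
print(sys.argv)

if len(sys.argv) < 2:
    sys.exit("usage: python pipeline.py <day>")
day = sys.argv[1]

# some pandas things

print(f'Finished for day {day}')

hello.py (example app from the Docker docs, used with the second Dockerfile below)

from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello World!"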
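The script above indexes sys.argv directly. If you prefer argparse, here is a minimal sketch of the same entry point. The names parse_args and main and the YYYY-MM-DD format for the day are my own hypothetical choices (the original script accepts any string), and the pandas step is still a placeholder.

```python
import argparse
import datetime as dt


def parse_args(argv=None):
    """Parse the pipeline's arguments (hypothetical CLI for pipeline.py)."""
    parser = argparse.ArgumentParser(description="Toy pipeline")
    # assumption: the day is an ISO date such as 2021-01-15
    parser.add_argument("day", type=dt.date.fromisoformat,
                        help="day to process (YYYY-MM-DD)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # "some pandas things" would go here
    message = f"Finished for day {args.day.isoformat()}"
    print(message)
    return message


if __name__ == "__main__":
    # argv=None makes argparse read sys.argv[1:], i.e. the arguments from `docker run`
    main()
```

An invalid day such as `not-a-date` makes argparse print a usage error and exit with status 2, so a bad `docker run` argument fails early instead of deep inside the pipeline.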

Dockerfile

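Dockerfile for pipeline.py

# base image with Python 3.12.8 preinstalled
FROM python:3.12.8
# install the script's dependency into the image
RUN pip install pandas
# later instructions (and the running container) use /app as the working directory
WORKDIR /app
# copy pipeline.py from the build context into /app
COPY pipeline.py pipeline.py

# run the script when the container starts; `docker run` arguments are appended
ENTRYPOINT ["python", "pipeline.py"]

Dockerfile for hello.py (example from the Docker docs)

# syntax=docker/dockerfile:1
FROM ubuntu:22.04

# install app dependencies
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip install flask==3.0.*

# install app
COPY hello.py /

# final configuration
ENV FLASK_APP=hello
EXPOSE 8000
CMD ["flask", "run", "--host", "0.0.0.0", "--port", "8000"]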
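How the two Dockerfiles above differ at runtime: with an exec-form ENTRYPOINT (the pipeline Dockerfile), anything typed after the image name in `docker run` is appended to it. With only CMD (the Flask example), `docker run` arguments replace CMD. The function below is a small model of that rule written for illustration, not Docker code. It covers exec forms only; shell-form ENTRYPOINT and the `--entrypoint` flag behave differently and are not modelled.

```python
def effective_command(entrypoint, cmd, run_args):
    """Model the argv a container starts with (exec-form ENTRYPOINT/CMD only).

    entrypoint: list from ENTRYPOINT ["..."], or [] if the Dockerfile has none
    cmd:        list from CMD ["..."], or [] if the Dockerfile has none
    run_args:   extra arguments typed after the image name in `docker run`
    """
    # arguments given to `docker run` replace CMD entirely
    args = list(run_args) if run_args else list(cmd)
    # an exec-form ENTRYPOINT is always kept as the prefix
    return list(entrypoint) + args


# pipeline Dockerfile: ENTRYPOINT ["python", "pipeline.py"], no CMD
pipeline = effective_command(["python", "pipeline.py"], [], ["2021-01-15"])
print(pipeline)      # ['python', 'pipeline.py', '2021-01-15']
# Python drops the interpreter name, so this is what the script sees as sys.argv
print(pipeline[1:])  # ['pipeline.py', '2021-01-15']

# Flask Dockerfile: no ENTRYPOINT, only CMD
flask_cmd = ["flask", "run", "--host", "0.0.0.0", "--port", "8000"]
print(effective_command([], flask_cmd, []))        # CMD runs unchanged
print(effective_command([], flask_cmd, ["bash"]))  # ['bash'] replaces CMD
```

This is why the pipeline image needs the day after the image name: without it, the script's `sys.argv[1]` does not exist.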

Docker build

docker build -t {image_name}:{tag_name} .

  • -t : adds a tag to the image
  • {image_name}:{tag_name} : replace these with your own image name and tag name
  • . : uses the current directory as the build context (the files Docker can access during the build, e.g. for COPY)
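The {image_name}:{tag_name} pair is one reference string, and the Docker docs quoted on this page note that if no tag is specified, latest is used. A minimal sketch of that split, as my own simplification: it ignores digests (name@sha256:...) and only treats a colon after the last slash as the tag separator, so a registry port such as localhost:5000 is not mistaken for a tag. The image name test:pandas is just a hypothetical example.

```python
def split_image_ref(ref):
    """Split 'name[:tag]' into (name, tag); the tag defaults to 'latest'.

    Simplified sketch: digests such as name@sha256:... are not handled.
    """
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # a colon before the last slash belongs to a registry host:port, not a tag
    if last_colon > last_slash:
        return ref[:last_colon], ref[last_colon + 1:]
    return ref, "latest"


print(split_image_ref("test:pandas"))          # ('test', 'pandas')
print(split_image_ref("test"))                 # ('test', 'latest')
print(split_image_ref("localhost:5000/test"))  # ('localhost:5000/test', 'latest')
```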


Docker run

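docker run -it {image_name}:{tag_name} {day}

  • -it : run interactively (-i keeps STDIN open, -t allocates a terminal)
  • {day} : for the pipeline image, anything after the image name is appended to the ENTRYPOINT, so it reaches pipeline.py as sys.argv[1]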
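If you want to drive these commands from Python instead of typing them, here is a minimal sketch that builds the same argument lists and only executes them when asked. The helper names docker_build_cmd, docker_run_cmd, and run are hypothetical, and actually executing requires Docker on your PATH.

```python
import shlex
import subprocess


def docker_build_cmd(image_name, tag_name, context="."):
    """Argument list for: docker build -t {image_name}:{tag_name} ."""
    return ["docker", "build", "-t", f"{image_name}:{tag_name}", context]


def docker_run_cmd(image_name, tag_name, *args, interactive=True):
    """Argument list for: docker run -it {image_name}:{tag_name} [args...]"""
    cmd = ["docker", "run"]
    if interactive:
        # -it needs a real terminal; pass interactive=False from scripts or CI
        cmd.append("-it")
    return cmd + [f"{image_name}:{tag_name}", *args]


def run(cmd, dry_run=True):
    """Print the command as a shell line; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


run(docker_build_cmd("test", "pandas"))              # docker build -t test:pandas .
run(docker_run_cmd("test", "pandas", "2021-01-15"))  # docker run -it test:pandas 2021-01-15
```

Passing a list (not a single string) to subprocess.run avoids shell quoting problems; shlex.join is only used to display the equivalent terminal command.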

Resources

My repo for this video can be found here

A supplemental video for those working on WSL can be found here:

Definitions for the 'What is...' section:

Docker: "Docker helps developers build, share, run, and verify applications anywhere — without tedious environment configuration or management." - docker.com

Container: "Simply put, containers are isolated processes for each of your app's components. Each component - the frontend React app, the Python API engine, and the database - runs in its own isolated environment, completely isolated from everything else on your machine." - Docker docs

Image: "If you’re new to container images, think of them as a standardized package that contains everything needed to run an application, including its files, configuration, and dependencies. These packages can then be distributed and shared with others." - Docker docs

Tag: "A tag is a custom, human-readable identifier that's typically used to identify different versions or variants of an image. If no tag is specified, latest is used by default." - Docker docs

Notes on the Dockerfile step: create a new file named 'Dockerfile' (no file extension) in your code editor. I recommend adding the Docker plugin to your editor. For all Dockerfile instruction options ('FROM', 'RUN', etc.), see the Dockerfile reference.
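Each Dockerfile line starts with an instruction such as FROM, RUN, WORKDIR, COPY, or ENTRYPOINT, followed by its arguments, and COPY sources are resolved relative to the build context (the '.' in docker build). The sketch below splits a Dockerfile into (instruction, arguments) pairs and flags COPY sources missing from a context directory. It is my own simplified model, not Docker's parser: it skips comments (parser directives like # syntax=... included), joins backslash continuations, and ignores heredocs, the JSON form of COPY, flags such as --from, and wildcards.

```python
from pathlib import Path


def _split(line):
    """Split one logical line into (INSTRUCTION, arguments)."""
    keyword, _, args = line.strip().partition(" ")
    return keyword.upper(), args.strip()


def parse_dockerfile(text):
    """Split Dockerfile text into (INSTRUCTION, arguments) pairs (simplified)."""
    instructions = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        # skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            # backslash continuation: keep collecting this instruction
            pending += line[:-1].rstrip() + " "
            continue
        instructions.append(_split(pending + line))
        pending = ""
    if pending:
        instructions.append(_split(pending))
    return instructions


def missing_copy_sources(dockerfile_text, context_dir):
    """COPY sources that do not exist inside the build context directory."""
    context = Path(context_dir)
    missing = []
    for keyword, args in parse_dockerfile(dockerfile_text):
        if keyword == "COPY":
            # shell form: COPY <src>... <dest>; the last token is the destination
            sources = args.split()[:-1]
            missing.extend(src for src in sources if not (context / src).exists())
    return missing


PIPELINE_DOCKERFILE = """\
FROM python:3.12.8
RUN pip install pandas
WORKDIR /app
COPY pipeline.py pipeline.py

ENTRYPOINT ["python", "pipeline.py"]
"""

print(parse_dockerfile(PIPELINE_DOCKERFILE))
print(missing_copy_sources(PIPELINE_DOCKERFILE, "."))
```

Running it from the folder you pass to docker build should print an empty list for the second call; a non-empty list means the build would fail at that COPY step.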


Note: we will continue to build on this topic. We still need to cover running and stopping containers, and viewing the front end of our containers.

Cleaning: open Docker Desktop and delete the test containers and images you created to free up space.

I recommend working through the Docker site's introduction and workshop if you're still unsure what Docker is.

Links

  • Youtube video for this lesson: https://www.youtube.com/watch?v=Mv4zFm2AwzQ&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=17
  • Docker: https://www.docker.com/
  • Docker docs, What is a container?: https://docs.docker.com/get-started/docker-concepts/the-basics/what-is-a-container/
  • Docker docs, Build and push your first image: https://docs.docker.com/get-started/introduction/build-and-push-first-image/
  • Docker docs, Build, tag, and publish an image: https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image/
  • Docker docs, Writing a Dockerfile (supplemental info on writing a Dockerfile): https://docs.docker.com/build/concepts/dockerfile/
  • Docker docs, Dockerfile reference (instruction options): https://docs.docker.com/reference/dockerfile/
  • Docker docs, Docker overview: https://docs.docker.com/get-started/docker-overview/
  • Docker cheatsheet (GitHub): https://github.com/HangenYuu/docker-cheatsheet
  • Supplemental video for WSL users: https://www.youtube.com/watch?v=EYNwNlOrpr0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=4&pp=iAQB
  • Dev Genius blog, Docker working and image building: https://blog.devgenius.io/docker-working-and-image-building-2d4901524617
  • My repo for this video: https://github.com/Tinker0425/de-zoomcamp-my-work/tree/master/module-01/docker/video_1