3.1.1 - Data Warehouse and BigQuery

Last updated Feb 2, 2025

🕓 Estimated time spent on this lesson | ~35 min

Youtube Video | ~23 min

✍️ In this video we learn about OLTP, OLAP, Data Warehouse, and BigQuery. We will do some exploring in BigQuery and then will have an introduction to partitioning and clustering.

Definitions

OLTP: Online transaction processing

Online Transaction Processing (OLTP) refers to database systems designed to manage and process real-time transaction data, typically for operational applications. These systems handle tasks like order processing, inventory management, and customer interactions, where quick, reliable processing is crucial. In contrast to OLAP (Online Analytical Processing), which focuses on analyzing large datasets for decision-making, OLTP is all about immediate, transactional data processing.

Example 1: An e-commerce website using OLTP to process customer orders in real-time. Example 2: A bank’s ATM system that instantly processes withdrawals and deposits.

BigQuery

🔎 BigQuery has some public tables and you can search for them:

SELECT station_id, name
FROM bigquery-public-data.new_york_citibike.citibike_stations
LIMIT 100

👀 In Module-2, we used Kestra to get out tables in BigQuery. Another option is you can create a table if your data is in your GCP bucket:

-- Creating external table referring to gcs path
CREATE OR REPLACE EXTERNAL TABLE `taxi-rides-ny.nytaxi.external_yellow_tripdata`
OPTIONS (
  format = 'CSV',
  uris = ['gs://nyc-tl-data/trip data/yellow_tripdata_2019-*.csv', 'gs://nyc-tl-data/trip data/yellow_tripdata_2020-*.csv']
);

Partitioning

-- Check yello trip data
SELECT * FROM taxi-rides-ny.nytaxi.external_yellow_tripdata limit 10;

-- Create a non partitioned table from external table
CREATE OR REPLACE TABLE taxi-rides-ny.nytaxi.yellow_tripdata_non_partitoned AS
SELECT * FROM taxi-rides-ny.nytaxi.external_yellow_tripdata;

-- Create a partitioned table from external table
CREATE OR REPLACE TABLE taxi-rides-ny.nytaxi.yellow_tripdata_partitoned
PARTITION BY
  DATE(tpep_pickup_datetime) AS
SELECT * FROM taxi-rides-ny.nytaxi.external_yellow_tripdata;

In BigQuery Details, it will show you if the table is partitioned. This is very cool, if you are often examining a table based on one column (pick up date in our example), this would save you time and money.

Clustering

❓It looks like you cannot cluster unless you partition first, as clustering is a second grouping. So first group by partitioning (date in our example), and then group by clustering (vendor in our example).

💡 I really like how in BigQuery it gives you a different table icon to show that the table had been partitioned or partitioned & clustered, and then the details tell you by which field.

✅ We should now understand how to create a table from an external source, and how to partition and cluster a table in BigQuery

I really enjoyed this lesson ⭐⭐⭐⭐⭐

Previous3.1 - Data Warehouse, Partitioning and Clustering Next3.1.2 - Partitioning and Clustering

Last updated 4 months ago