3.1.1 - Data Warehouse and BigQuery
Last updated Feb 2, 2025
🕓 Estimated time spent on this lesson | ~35 min
Youtube Video | ~23 min
✍️ In this video we learn about OLTP, OLAP, Data Warehouse, and BigQuery. We will do some exploring in BigQuery and then will have an introduction to partitioning and clustering.
Definitions
OLTP: Online transaction processing
Online Transaction Processing (OLTP) refers to database systems designed to manage and process real-time transaction data, typically for operational applications. These systems handle tasks like order processing, inventory management, and customer interactions, where quick, reliable processing is crucial. In contrast to OLAP (Online Analytical Processing), which focuses on analyzing large datasets for decision-making, OLTP is all about immediate, transactional data processing.
Example 1: An e-commerce website using OLTP to process customer orders in real-time. Example 2: A bank’s ATM system that instantly processes withdrawals and deposits.
Online Analytical Processing (OLAP) is a computing approach designed for quickly answering multi-dimensional analytical queries, often used in business intelligence. Unlike OLTP, which focuses on transaction processing, OLAP helps in complex data analysis for areas like sales reporting, budgeting, and forecasting. It enables users to analyze large datasets and perform tasks such as trend analysis and data mining.
Example 1: A company using OLAP to analyze sales performance across different regions and time periods. Example 2: Financial analysts using OLAP to forecast next quarter's revenues based on historical data.
A data warehouse (DW or EDW) is a centralized system used for reporting and data analysis, playing a key role in business intelligence. It stores both current and historical data from various operational systems, like sales or marketing, and organizes it for easy querying and reporting by analysts and managers. Data in the warehouse often undergoes cleansing and transformation before being used for insights and decision-making.
Examples of popular data warehouses include Amazon Redshift, Google BigQuery, Snowflake, and Microsoft Azure Synapse Analytics.
"BigQuery is a managed, serverless data warehouse product by Google, offering scalable analysis over large quantities of data. It is a Platform as a Service (PaaS) that supports querying using a dialect of SQL. It also has built-in machine learning capabilities." Wiki https://en.wikipedia.org/wiki/BigQuery
BigQuery
🔎 BigQuery has some public tables and you can search for them:

👀 In Module-2, we used Kestra to get out tables in BigQuery. Another option is you can create a table if your data is in your GCP bucket:
Partitioning
In BigQuery Details, it will show you if the table is partitioned. This is very cool, if you are often examining a table based on one column (pick up date in our example), this would save you time and money.
Clustering
❓It looks like you cannot cluster unless you partition first, as clustering is a second grouping. So first group by partitioning (date in our example), and then group by clustering (vendor in our example).
💡 I really like how in BigQuery it gives you a different table icon to show that the table had been partitioned or partitioned & clustered, and then the details tell you by which field.

✅ We should now understand how to create a table from an external source, and how to partition and cluster a table in BigQuery
I really enjoyed this lesson ⭐⭐⭐⭐⭐
Last updated