b5.2.1 - Spark & PySpark

Last updated Feb 23, 2025

🕓 Estimated time spent on this lesson | ~30 min

YouTube video | ~18 min

✍️ Introduction to using Spark / PySpark in a Jupyter notebook (.ipynb).

I changed a few things because of the location of the CSV file and because I am using Google Colab:

In the video, we first load the data with pandas and then create the Spark DataFrame from it, e.g. spark.createDataFrame(df_pandas).schema, which returns the schema Spark infers from the pandas DataFrame.

How to create partitions?

How to save a Parquet file?
