PySpark just got a massive update

Manpreet Singh
Jan 3, 2022

Welcome back! PySpark is an awesome tool that lets us process our data at scale. If you're new to PySpark, check out the link below to learn more about it:

Well, it looks like PySpark 3.2 is bringing an awesome feature: it now officially supports the pandas API! The feature is built natively into PySpark (it's the former Koalas project, now shipped as the pyspark.pandas module). Check out their blog post below, which goes into detail:

Essentially, here's how you can use the pandas API within PySpark:

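# from pandas import read_csv  # plain pandas import, no longer needed
from pyspark.pandas import read_csv  # pandas API on Spark (PySpark 3.2+)
pdf = read_csv("data.csv")  # returns a pandas-on-Spark DataFrame backed by Spark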
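Because the pandas API on Spark mirrors pandas, everyday pandas code should carry over largely unchanged. Here's a minimal sketch, assuming PySpark 3.2+ is installed; the small made-up DataFrame stands in for data.csv, and the alias pd_api and the sample data are purely illustrative. If PySpark isn't available, it falls back to plain pandas, so the exact same lines run on both.

```python
# Assumption: PySpark 3.2+ is installed (e.g. pip install "pyspark>=3.2").
# If it isn't, fall back to plain pandas: the code below is identical either way.
try:
    import pyspark.pandas as pd_api  # pandas API on Spark
except ImportError:
    import pandas as pd_api  # plain, single-machine pandas

# Hypothetical sample data standing in for data.csv
df = pd_api.DataFrame({
    "city": ["Paris", "Paris", "Tokyo", "Tokyo", "Lima"],
    "sales": [10, 20, 5, 15, 7],
})

# Familiar pandas operations: boolean filter, groupby, aggregate, sort
totals = (
    df[df["sales"] > 6]
    .groupby("city")["sales"]
    .sum()
    .sort_index()
)

# A pandas-on-Spark Series has to_pandas(), which collects it to the driver;
# a plain pandas Series has no such method, so only call it when present.
result = totals.to_pandas() if hasattr(totals, "to_pandas") else totals
print(result.to_dict())
```

One design note: to_pandas() pulls the entire result onto the driver machine, so it's best kept for small, already-aggregated results like this one.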

If you'd like to try this out for yourself, the PySpark team has put together a few notebooks. Here's a link to all of them:
