Comparison of DataFrame libraries for parallel processing of large tabular files on CPU and GPU.
Updated Jun 27, 2024 - Jupyter Notebook
Uses PySpark and Spark SQL to run SQL queries against a temporary view created from a DataFrame, then repeats the queries on cached and partitioned data to compare runtimes.
Useful helper functions for PySpark DataFrame operations.