pyspark-dataframes

Here are 3 public repositories matching this topic...

sbl-sdsc / df-parallel

Comparison of DataFrame libraries for parallel processing of large tabular files on CPU and GPU.

gpu-computing dask dataframes cuda-toolkit parallel-processing rapidsai dask-dataframes dask-cudf pyspark-dataframes

Updated Jun 27, 2024
Jupyter Notebook

RJBarker / home_sales

Uses PySpark and SparkSQL to run SQL queries against a temporary view created from a DataFrame, then repeats the queries on cached and partitioned data to compare runtimes.

python big-data cached pyspark sparksql partitioning large-scale big-data-analytics pyspark-dataframes

Updated Jan 10, 2024
Jupyter Notebook

mhaseebtariq / pyspark-helpers

Helper functions for PySpark DataFrame operations.

pyspark pyspark-notebook pyspark-dataframes pyspark-joins pyspark-helpers flexible-joins join-duplicate-columns

Updated May 25, 2022
Jupyter Notebook
