This Visual Studio Code extension lets developers convert PySpark DataFrames to Pandas DataFrames during debugging sessions. Right-click a PySpark DataFrame in the Locals pane to convert it to a Pandas DataFrame for easier inspection and manipulation. The converted Pandas DataFrame can also be viewed in the VSCode Data Viewer (requires the Data Wrangler extension).
- Convert Spark DataFrame to Pandas DataFrame: Simply right-click on any PySpark DataFrame in the debugger and select "Convert Spark DataFrame to Pandas".
- Automatic Variable Creation: A new variable is created for the Pandas DataFrame, so it can be accessed and explored during your debug session.
- View Pandas DataFrame in Data Viewer: After conversion, you can inspect the Pandas DataFrame using the VSCode Data Wrangler extension.
- During a debug session, pause the execution where a PySpark DataFrame is present.
- In the Debugger’s Variables window, right-click on the Spark DataFrame you want to convert.
- Select the option "Convert Spark DataFrame to Pandas" from the context menu.
- The extension converts the Spark DataFrame and assigns it to a new variable with the suffix `_pandas`.
- Optionally, you can view the DataFrame using the Data Wrangler's Data Viewer.
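The conversion described above can be approximated by hand in the Debug Console. The following is a minimal sketch, not the extension's actual implementation: the helper name `convert_in_scope` and its `max_rows` parameter are hypothetical, while `limit()` and `toPandas()` are standard PySpark DataFrame methods. Because `toPandas()` collects every row onto the driver, the optional row cap is included as a precaution for large DataFrames.

```python
from typing import Any, Dict, Optional


def convert_in_scope(
    scope: Dict[str, Any],
    name: str,
    suffix: str = "_pandas",
    max_rows: Optional[int] = None,
) -> str:
    """Convert scope[name] to Pandas and store it under name + suffix.

    `scope` stands in for the debugger's local variables (hypothetical helper).
    Returns the name of the newly created variable.
    """
    spark_df = scope[name]
    if max_rows is not None:
        # Cap the row count before collecting to the driver.
        spark_df = spark_df.limit(max_rows)
    new_name = name + suffix
    # toPandas() pulls the (possibly limited) data into a Pandas DataFrame.
    scope[new_name] = spark_df.toPandas()
    return new_name
```

For example, calling `convert_in_scope(locals(), "df", max_rows=1000)` while paused at a breakpoint would, under these assumptions, produce a `df_pandas` entry holding at most 1000 rows.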
- Build and package the extension: `vsce package`
- Open the generated VSIX file and install it in your VSCode environment.
- Install the Data Wrangler extension if you want to use the "View in Data Viewer" feature.
- VSCode: Ensure you have Visual Studio Code installed.
- VSCode Debugger: A working debugger setup in VSCode for your PySpark projects.
- Data Wrangler: For viewing DataFrames, install the Data Wrangler extension to enable the Data Viewer feature.
This extension currently doesn't require any configuration settings.
- Start a debug session in VSCode with a PySpark script.
- Pause the execution at a breakpoint where a Spark DataFrame is present.
- Right-click the DataFrame in the Locals window.
- Select "Convert Spark DataFrame to Pandas".
- The converted DataFrame will appear in the Locals window with a `_pandas` suffix.
- Right-click the new variable and choose "View Value in Data Viewer" to inspect it (requires Data Wrangler).
Feel free to open issues and submit pull requests if you have suggestions for improvements or new features.
- Refreshing the variables window may take a few moments after converting the DataFrame. In some cases, manually stepping in the debugger might help update the view.
- The "View in Data Viewer" feature requires the Data Wrangler extension to be installed.
MIT License. See LICENSE for more information.