Databricks Inc.

06/09/2025 | News release | Distributed by Public on 06/09/2025 02:12

PySpark Native Plotting

Introduction

We're thrilled to introduce native plotting in PySpark with Databricks Runtime 17.0 (release notes), an exciting leap forward for data visualization. No more jumping between tools just to visualize your data; now, you can create beautiful, intuitive plots directly from your PySpark DataFrames. It's fast, seamless, and built right in. This long-awaited feature makes exploring your data easier and more powerful than ever.

Working with big data in PySpark has always been powerful, especially when it comes to transforming and analyzing large-scale datasets. PySpark DataFrames are built for scale and performance, but users previously had to convert them into Pandas API on Apache Spark™ DataFrames to generate plots. This extra step made visualization workflows more complicated than they needed to be, and the structural differences between PySpark and pandas-style DataFrames often added friction that slowed visual data exploration.

Example

Here's an example of using PySpark Plotting to analyze Sales, Profit, and Profit Margins across various product categories.

We start with a DataFrame containing sales and profit data for different product categories, as shown below:
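A DataFrame of this shape could be built as in the sketch below. Only the Electronics category is named in the text. The other category names, every numeric value, the column names, and the helpers `with_margin` and `make_sales_df` are hypothetical placeholders chosen for illustration, not the data from the original post.

```python
# Hypothetical sample data: only "Electronics" appears in the article;
# the other categories and all numbers are illustrative placeholders.
COLUMNS = ["Category", "Sales", "Profit", "Profit_Margin"]
ROWS = [
    ("Electronics", 20000, 5000),
    ("Furniture", 12000, 1800),
    ("Clothing", 8000, 2400),
    ("Toys", 5000, 1000),
]


def with_margin(rows):
    """Append profit margin (profit as a percentage of sales) to each row."""
    return [
        (category, sales, profit, round(profit / sales * 100, 1))
        for category, sales, profit in rows
    ]


def make_sales_df(spark):
    """Create the sample PySpark DataFrame on an existing SparkSession."""
    return spark.createDataFrame(with_margin(ROWS), schema=COLUMNS)
```

In a Databricks notebook, where a session named `spark` is already available, `make_sales_df(spark).show()` would print the four rows.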

Our goal is to visualize the relationship between Sales and Profit, while also incorporating Profit Margin as an additional visual dimension to make the analysis more meaningful. Here is the code to create the plot:
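A minimal sketch of such a call follows, assuming PySpark 4.0 or later (as shipped in Databricks Runtime 17.0) with Plotly installed, and a DataFrame with the hypothetical columns `Category`, `Sales`, `Profit`, and `Profit_Margin`. The `x` and `y` arguments belong to `DataFrame.plot.scatter`. Passing `size` and `text` relies on the assumption that extra keyword arguments are forwarded to the Plotly backend, so treat those two as unverified against the original listing. The wrapper name `plot_sales_vs_profit` is made up for this example.

```python
def plot_sales_vs_profit(df):
    """Scatter plot of Sales vs. Profit, with marker size set by Profit_Margin.

    `df` is assumed to be a PySpark DataFrame with the hypothetical columns
    Category, Sales, Profit and Profit_Margin.
    """
    # x and y are the core arguments of the scatter accessor; size and text
    # are assumed to pass through to Plotly as additional options.
    return df.plot.scatter(
        x="Sales",
        y="Profit",
        size="Profit_Margin",
        text="Category",
    )
```

Calling `fig = plot_sales_vs_profit(df)` would then give a figure object that a notebook can display directly or that can be customized further.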

Note that "fig" is of type "plotly.graph_objs._figure.Figure". We can enhance its appearance by updating the layout using existing Plotly functionalities. The adjusted figure looks like this:
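Because the result is an ordinary Plotly figure, the kind of layout adjustment described above could look like the following sketch. The title text, axis labels, template, and the helper name `polish_figure` are illustrative choices, not the settings used in the original post. `update_traces` and `update_layout` are standard methods on Plotly figures.

```python
def polish_figure(fig):
    """Apply cosmetic tweaks to a Plotly figure and return it."""
    # Place each point's text label above its marker.
    fig.update_traces(textposition="top center")
    # Add a title, readable axis labels, and a light background template.
    fig.update_layout(
        title_text="Sales vs. Profit by Category",
        xaxis_title="Sales",
        yaxis_title="Profit",
        template="plotly_white",
    )
    return fig
```

The same pattern applies to any other Plotly option, since nothing here is specific to PySpark once the figure exists.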

From the figure, we can observe clear relationships between sales and profits across different categories. For instance, Electronics shows high sales and profits with a relatively moderate profit margin, indicating strong revenue generation but room for improved efficiency.

Databricks Inc. published this content on June 09, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on June 09, 2025 at 08:12 UTC.