Databricks Inc.

07/21/2025 | News release | Distributed by Public on 07/22/2025 07:28

Introducing Recursive Common Table Expressions to Databricks

Recursive Common Table Expressions (CTEs) are now supported in Databricks. This brings a native way to express loops and traversals in SQL, useful for working with hierarchical and graph-structured data. These capabilities are aligned with the SQL standard and follow familiar patterns used in platforms like Teradata. Recursive CTEs have long been part of the SQL standard, so they will be familiar to customers migrating from legacy data warehouses. Databricks has also contributed Recursive CTE support to Apache Spark™, making it fully open source.

Databricks uses standard ANSI SQL syntax for recursive CTEs, including the RECURSIVEkeyword.

This seemingly small feature significantly enhances SQL's expressive capabilities, theoretically making it Turing complete-meaning it can perform any computation a computer can. Recursive CTEs enable composable solutions that previously required procedural code, such as Python or external tools.

Recursive CTEsare now available in Public Preview DBSQL 2025.20 and Databricks Runtime 17.0(coming soon to Lakeflow Declarative Pipelines). In this blog, we'll explore how recursive CTEs work-and how they can help you solve real-world problems using pure SQL.

Key features of recursive CTE support

Databricks' recursive CTE support includes:

  • Traversal of tree- and graph-like structures, such as org charts, folders, and routing networks
  • Fully open source and integrated into Apache Spark™
  • Built-in safeguards for infinite recursion (100 steps, 1M rows)
  • Customizable safeguards usingMAX RECURSION LEVEL
  • Support for controlled infinite recursion using LIMIT

Recursive CTEs work well with both traditional systems that store hierarchical data in normalized tables as well as data coming from modern applications that generate flexible JSON/XML hierarchies. See examples below of each including RCTEs leveraging the Variant data type for JSON hierarchies.

Plus, support for recursive CTEs simplifies migrations from legacy database systems. Teradata and Postgres are two examples of systems whose syntax is identical, while systems like Oracle, which use CONNECT BY syntax, are easily converted.

How recursive CTEs work

Recursive CTEs are common table expressions defined with the RECURSIVEkeyword. They consist of two parts combined by using UNION ALL:

  1. A base case subquery- this runs once and seeds the recursion
  2. A recursive step subquery- this refers to the CTE itself and is repeatedly applied to build new rows.

Execution starts with the base query. Then, on each iteration, the recursive step is run using the output of the previous step. This continues until no new rows are produced.

To prevent infinite recursion from consuming excessive resources, Databricks enforces two safety limits: a maximum recursion depthof 100 steps and a row limitof 1 million. If either threshold is exceeded, the query fails with an error.

If you're confident in your recursion requiring more than 100 steps to produce all the results, you may override the max level by using the MAX RECURSION LEVEL hint:

For more details, refer to the CTE documentation.

"At bp Supply Trading and Shipping - Market Risk, understanding portfolio hierarchy reporting across business units is critical for our business to operate efficiently. By replacing our legacy code with recursive CTEs in Databricks SQL, we reduced a hierarchical data preparation step from ~6 minutes to ~30 seconds, which is a 12× improvement." - Dharmik Prajapati, bp Staff Software Engineer
Databricks Inc. published this content on July 21, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on July 22, 2025 at 13:28 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at [email protected]