07/21/2025 | News release | Distributed by Public on 07/22/2025 07:28
Recursive Common Table Expressions (CTEs) are now supported in Databricks. This brings a native way to express loops and traversals in SQL, useful for working with hierarchical and graph-structured data. These capabilities are aligned with the SQL standard and follow familiar patterns used in platforms like Teradata. Recursive CTEs have long been part of the SQL standard, so they will be familiar to customers migrating from legacy data warehouses. Databricks has also contributed Recursive CTE support to Apache Spark™, making it fully open source.
Databricks uses standard ANSI SQL syntax for recursive CTEs, including the RECURSIVEkeyword.
This seemingly small feature significantly enhances SQL's expressive capabilities, theoretically making it Turing complete-meaning it can perform any computation a computer can. Recursive CTEs enable composable solutions that previously required procedural code, such as Python or external tools.
Recursive CTEsare now available in Public Preview DBSQL 2025.20 and Databricks Runtime 17.0(coming soon to Lakeflow Declarative Pipelines). In this blog, we'll explore how recursive CTEs work-and how they can help you solve real-world problems using pure SQL.
Databricks' recursive CTE support includes:
Recursive CTEs work well with both traditional systems that store hierarchical data in normalized tables as well as data coming from modern applications that generate flexible JSON/XML hierarchies. See examples below of each including RCTEs leveraging the Variant data type for JSON hierarchies.
Plus, support for recursive CTEs simplifies migrations from legacy database systems. Teradata and Postgres are two examples of systems whose syntax is identical, while systems like Oracle, which use CONNECT BY syntax, are easily converted.
Recursive CTEs are common table expressions defined with the RECURSIVEkeyword. They consist of two parts combined by using UNION ALL:
Execution starts with the base query. Then, on each iteration, the recursive step is run using the output of the previous step. This continues until no new rows are produced.
To prevent infinite recursion from consuming excessive resources, Databricks enforces two safety limits: a maximum recursion depthof 100 steps and a row limitof 1 million. If either threshold is exceeded, the query fails with an error.
If you're confident in your recursion requiring more than 100 steps to produce all the results, you may override the max level by using the MAX RECURSION LEVEL hint:
For more details, refer to the CTE documentation.
"At bp Supply Trading and Shipping - Market Risk, understanding portfolio hierarchy reporting across business units is critical for our business to operate efficiently. By replacing our legacy code with recursive CTEs in Databricks SQL, we reduced a hierarchical data preparation step from ~6 minutes to ~30 seconds, which is a 12× improvement." - Dharmik Prajapati, bp Staff Software Engineer