You signed in with another tab or window.
Reload
to refresh your session.
You signed out in another tab or window.
Reload
to refresh your session.
You switched accounts on another tab or window.
Reload
to refresh your session.
SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. It can be used to format SQL or translate between
20 different dialects
like
DuckDB
,
Presto
,
Spark
,
Snowflake
, and
BigQuery
. It aims to read a wide variety of SQL inputs and output syntactically and semantically correct SQL in the targeted dialects.
It is a very comprehensive generic SQL parser with a robust
test suite
. It is also quite
performant
, while being written purely in Python.
You can easily
customize
the parser,
analyze
queries, traverse expression trees, and programmatically
build
SQL.
Syntax
errors
are highlighted and dialect incompatibilities can warn or raise depending on configurations. However, it should be noted that SQL validation is not SQLGlot’s goal, so some syntax errors may go unnoticed.
As another example, let's suppose that we want to read in a SQL query that contains a CTE and a cast to REAL, and then transpile it to Spark, which uses backticks for identifiers and FLOAT instead of REAL:
importsqlglotsql="""WITH baz AS (SELECT a, c FROM foo WHERE a = 1) SELECT f.a, b.b, baz.c, CAST("b"."a" AS REAL) d FROM foo f JOIN bar b ON f.a = b.a LEFT JOIN baz ON f.a = baz.a"""print(sqlglot.transpile(sql, write="spark", identify=True, pretty=True)[0])
Comments are also preserved on a best-effort basis when transpiling SQL code:
sql="""/* multi commentSELECT tbl.cola /* comment 1 */ + tbl.colb /* comment 2 */, CAST(x AS INT), # comment 3 y -- comment 4 bar /* comment 5 */, tbl # comment 6print(sqlglot.transpile(sql, read='mysql', pretty=True)[0])
/*
multi commentSELECTtbl.cola/* comment 1 */+tbl.colb/* comment 2 */,
CAST(x ASINT), /* comment 3 */
y /* comment 4 */FROM bar /* comment 5 */, tbl /* comment 6 */
Metadata
You can explore SQL with expression helpers to do things like find columns and tables:
fromsqlglotimportparse_one, exp# print all column references (a and b)forcolumninparse_one("SELECT a, b + 1 AS c FROM d").find_all(exp.Column):
print(column.alias_or_name)
# find all projections in select statements (a and c)forselectinparse_one("SELECT a, b + 1 AS c FROM d").find_all(exp.Select):
forprojectioninselect.expressions:
print(projection.alias_or_name)
# find all tables (x, y, z)fortableinparse_one("SELECT * FROM x JOIN y JOIN z").find_all(exp.Table):
print(table.name)
Parser Errors
When the parser detects an error in the syntax, it raises a ParseError:
importsqlglotsqlglot.transpile("SELECT foo( FROM bar")
sqlglot.errors.ParseError: Expecting ). Line 1, Col: 13.
select foo( FROM bar
Structured syntax errors are accessible for programmatic use:
importsqlglottry:
sqlglot.transpile("SELECT foo( FROM bar")
exceptsqlglot.errors.ParseErrorase:
print(e.errors)
fromsqlglotimportparse_oneparse_one("SELECT x FROM y").from_("z").sql()
'SELECT x FROM z'
There is also a way to recursively transform the parsed tree by applying a mapping function to each tree node:
fromsqlglotimportexp, parse_oneexpression_tree=parse_one("SELECT a FROM x")
deftransformer(node):
ifisinstance(node, exp.Column) andnode.name=="a":
returnparse_one("FUN(a)")
returnnodetransformed_tree=expression_tree.transform(transformer)
transformed_tree.sql()
'SELECT FUN(a) FROM x'
SQL Optimizer
SQLGlot can rewrite queries into an "optimized" form. It performs a variety of techniques to create a new canonical AST. This AST can be used to standardize queries or provide the foundations for implementing an actual engine. For example:
importsqlglotfromsqlglot.optimizerimportoptimizeprint(
optimize(
sqlglot.parse_one(""" SELECT A OR (B OR (C AND D)) FROM x WHERE Z = date '2021-01-01' + INTERVAL '1' month OR 1 = 0 """),
schema={"x": {"A": "INT", "B": "INT", "C": "INT", "D": "INT", "Z": "STRING"}}
).sql(pretty=True)
SQLGlot can calculate the difference between two expressions and output changes in a form of a sequence of actions needed to transform a source expression into a target one:
fromsqlglotimportdiff, parse_onediff(parse_one("SELECT a + b, c, d"), parse_one("SELECT c, a - b, d"))
One can even interpret SQL queries using SQLGlot, where the tables are represented as Python dictionaries. Although the engine is not very fast (it's not supposed to be) and is in a relatively early stage of development, it can be useful for unit testing and running SQL natively across Python objects. Additionally, the foundation can be easily integrated with fast compute kernels (arrow, pandas). Below is an example showcasing the execution of a SELECT expression that involves aggregations and JOINs:
fromsqlglot.executorimportexecutetables= {
"sushi": [
{"id": 1, "price": 1.0},
{"id": 2, "price": 2.0},
{"id": 3, "price": 3.0},
"order_items": [
{"sushi_id": 1, "order_id": 1},
{"sushi_id": 1, "order_id": 1},
{"sushi_id": 2, "order_id": 1},
{"sushi_id": 3, "order_id": 2},
"orders": [
{"id": 1, "user_id": 1},
{"id": 2, "user_id": 2},
execute(
SELECT o.user_id, SUM(s.price) AS price FROM orders o JOIN order_items i ON o.id = i.order_id JOIN sushi s ON i.sushi_id = s.id GROUP BY o.user_id """,
tables=tables
SQLGlot uses dateutil to simplify literal timedelta expressions. The optimizer will not simplify expressions like the following if the module cannot be found: