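Spark supports the SELECT statement and conforms to the ANSI SQL standard. A query retrieves a result set (rows) from one or more tables according to the specified clauses. This section gives the full syntax and a brief description of each supported clause. The syntax is: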
[ WITH with_query [ , ... ] ]
select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
[ CLUSTER BY { expression [ , ... ] } ]
[ DISTRIBUTE BY { expression [ , ... ] } ]
[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
[ LIMIT { ALL | expression } ]
where select_statement is defined as:
SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...) ] }
FROM { from_item [ , ... ] }
[ PIVOT clause ]
[ LATERAL VIEW clause ] [ ... ]
[ WHERE boolean_expression ]
[ GROUP BY expression [ , ... ] ]
[ HAVING boolean_expression ]
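As a rough illustration of how these clauses fit together, the sketch below combines a CTE, a join hint, DISTINCT, aliased expressions, and the WHERE / GROUP BY / HAVING / ORDER BY / LIMIT clauses in a single query. The views orders and customers, their columns, and the sample rows are hypothetical and exist only for this example; the BROADCAST hint is just one possible hint choice.

```sql
-- Hypothetical sample data, created as temporary views so the query is self-contained.
CREATE OR REPLACE TEMPORARY VIEW orders AS
SELECT * FROM VALUES
    (1, 10, 250.0),
    (2, 10, 900.0),
    (3, 20, 1500.0),
    (4, 30, 50.0)
AS t(order_id, customer_id, amount);

CREATE OR REPLACE TEMPORARY VIEW customers AS
SELECT * FROM VALUES
    (10, 'EU'),
    (20, 'US'),
    (30, 'EU')
AS t(customer_id, region);

-- with_query: a CTE defined before the main query block and referenced in FROM.
WITH big_orders AS (
    SELECT customer_id, amount
    FROM orders
    WHERE amount > 100
)
-- hints: ask the optimizer to broadcast the (small) customers relation in the join.
SELECT /*+ BROADCAST(c) */
       c.region,                                   -- plain column expression
       COUNT(DISTINCT b.customer_id) AS buyers,    -- named_expression with an alias
       SUM(b.amount)                 AS total_amount
FROM big_orders b
JOIN customers c ON b.customer_id = c.customer_id
WHERE c.region IS NOT NULL
GROUP BY c.region
HAVING SUM(b.amount) > 1000
ORDER BY total_amount DESC NULLS LAST
LIMIT 10;
```

The CTE keeps the filtered subquery out of the FROM clause, which is the readability benefit the with_query clause is meant to provide.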
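Parameters:
with_query: Specifies common table expressions (CTEs) before the main query block. These table expressions can then be referenced later in the FROM clause. This helps factor repeated subquery blocks out of the FROM clause and improves the readability of the query.
hints: Hints can be specified to help the Spark optimizer make better planning decisions. Spark currently supports hints that influence the choice of join strategy and the repartitioning of data.
ALL: Selects all matching rows from the relation. This is the default.
DISTINCT: Selects all matching rows from the relation after removing duplicates from the result.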
named_expression: An expression with an assigned name. In general, it denotes a column expression. Syntax: expression [ [ AS ] alias ]
from_item: Specifies a source of input for the query. It can be one of the following: