snowflake.snowpark.DataFrame.groupBy ¶
DataFrame. groupBy ( * cols : Union [ Column , str , Iterable [ Union [ Column , str ] ] ] ) RelationalGroupedDataFrame [source] ¶Groups rows by the columns specified by expressions (similar to GROUP BY in SQL).
This method returns a
RelationalGroupedDataFrame
that you can use to
perform aggregations on each group of data.
*cols – The columns to group by.
Valid inputs are:
Empty input
One or multiple
Column
object(s) or column name(s) (
str
)
A list of
Column
objects or column names (
str
)
Examples
>>> from snowflake.snowpark.functions import col, lit, sum as sum_, max as max_
>>> df = session.create_dataframe([(1, 1),(1, 2),(2, 1),(2, 2),(3, 1),(3, 2)], schema=["a", "b"])
>>> df.group_by().agg(sum_("b")).collect()
[Row(SUM(B)=9)]
>>> df.group_by("a").agg(sum_("b")).collect()
[Row(A=1, SUM(B)=3), Row(A=2, SUM(B)=3), Row(A=3, SUM(B)=3)]
>>> df.group_by("a").agg(sum_("b").alias("sum_b"), max_("b").alias("max_b")).collect()
[Row(A=1, SUM_B=3, MAX_B=2), Row(A=2, SUM_B=3, MAX_B=2), Row(A=3, SUM_B=3, MAX_B=2)]
>>> df.group_by(["a", lit("snow")]).agg(sum_("b")).collect()
[Row(A=1, LITERAL()='snow', SUM(B)=3), Row(A=2, LITERAL()='snow', SUM(B)=3), Row(A=3, LITERAL()='snow', SUM(B)=3)]
>>> df.group_by("a").agg((col("*"), "count"), max_("b")).collect()
[Row(A=1, COUNT(LITERAL())=2, MAX(B)=2), Row(A=2, COUNT(LITERAL())=2, MAX(B)=2), Row(A=3, COUNT(LITERAL())=2, MAX(B)=2)]
>>> df.group_by("a").median("b").collect()
[Row(A=2, MEDIAN(B)=Decimal('1.500')), Row(A=3, MEDIAN(B)=Decimal('1.500')), Row(A=1, MEDIAN(B)=Decimal('1.500'))]
>>> df.group_by("a").function("avg")("b").collect()
[Row(A=1, AVG(B)=Decimal('1.500000')), Row(A=2, AVG(B)=Decimal('1.500000')), Row(A=3, AVG(B)=Decimal('1.500000'))]