Pratibha Sarin •
Last Modified: September 5th, 2023
In today’s digital era, massive data sets are available to be consumed by organizationsfor analysis. The vast amount of data available seems challenging to handle, which is why data aggregation has become more critical than ever. With reduced dimensionality, aggregate data can offer a feature-rich data collection.
Table of Contents
The
PostgreSQL STRING_AGG()
function is one of the aggregate functions that PostgreSQL supports. It is used to concatenate a list of strings and adds a place for a delimiter symbol or a separator between all of the strings. The output string won’t have a separator or a delimiter symbol at the end of it. The PostgreSQL 9.0 version supports STRING_AGG() function. To concatenate the strings, we can employ a variety of separators or delimiter symbols.
The workings of the STRING_AGG() and ARRAY_AGG() functions are identical; however, STRING_AGG() is an
aggregate function
that returns results in string type, whereas ARRAY_AGG() delivers results in array type.
Syntax
The syntax for the STRING_AGG() function is as follows:
Expression:
This character string can be any legitimate expression.
Delimiter/separator:
This specifies the delimiter/separator used when concatenating strings.
The ORDER BY clause:
specifies the order of the concatenated string results and is optional.
The syntax for the
ORDER BY
is as follows:
ORDER BY expression1 {ASC | DESC}, [...]
Note
: PostgreSQL does not presumptively use a default delimiter. It must be mentioned as the second argument. Also, remember that string_agg() requires the usage of the group by clause, much like other aggregate functions.
Scale your data integration effortlessly with Hevo’s Fault-Tolerant No Code Data Pipeline
As the ability of businesses to collect data explodes, data teams have a crucial role in fueling data-driven decisions. Yet, they struggle to consolidate the data scattered across sources into their warehouse to build a single source of truth. Broken pipelines, data quality issues, bugs and errors, and lack of control and visibility over the data flow make data integration a nightmare.
1000+ data teams rely on Hevo’s Data Pipeline Platform to integrate data from over
150+ sources
in a matter of minutes. Billions of data events from sources as varied as SaaS apps, Databases, File Storage and Streaming sources can be replicated in near real-time with Hevo’s fault-tolerant architecture. What’s more – Hevo puts complete control in the hands of data teams with intuitive dashboards for pipeline monitoring, auto-schema management, custom ingestion/loading schedules.
All of this combined with
transparent pricing
and 24×7 support makes us the most loved data pipeline software on review sites.
Take our 14-day free trial to experience a better way to manage data pipelines.
INSERT INTO "players"
VALUES ( 'Virat', 'India', 'Batsman' ), ( 'Rohit', 'India', 'Batsman' ) , ( 'Jasprit', 'India', 'Bowler' );
INSERT INTO "players"
VALUES ( 'Chris', 'West Indies', 'Batsman' ), ( 'Shannon', 'West Indies', 'Bowler'), ('Bravo', 'West Indies', 'Batsman');
INSERT INTO "players"
VALUES ( 'James', 'New Zealand', 'All rounder' );
SELECT * FROM "players" ;
—OUTPUT—
player_name | team_name | player_positon
-------------+-------------+----------------
Virat | India | Batsman
Rohit | India | Batsman
Jasprit | India | Bowler
Chris | West Indies | Batsman
Shannon | West Indies | Bowler
Bravo | West Indies | Batsman
James | New Zealand | All rounder
(7 rows)
The output clearly shows that the values entered using the INSERT INTO command have been successfully inserted into the sample table.
Use Case#1: Using PostgreSQL STRING_AGG() create Comma-Separated Values
We will use the STRING_AGG() function to produce a list of values separated by commas. The syntax to create comma-separated values is as follows:
SELECT "team_name",string_agg("player_name", ',' )
FROM "players" GROUP BY "team_name" ;
—OUTPUT—
team_name | string_agg
-------------+---------------------
West Indies | Chris,Shannon,Bravo
India | Virat,Rohit,Jasprit
New Zealand | James
(3 rows)
The “player_name” column in the SELECT query is separated by commas and displayed alongside the “team_name” as seen in the output obtained. The rows are divided according to the field “team_name” using the GROUP BY command. The expression that needs to be separated is defined in the first parameter of the STRING_AGG() function, and the values are separated in the second parameter by the comma character “,”.
Use Case#2: Using PostgreSQL STRING_AGG() to separate multiple columns with commas
Multiple STRING_AGG() functions can be used in a single SELECT statement. The demonstration of that is provided in the example below.
SELECT "team_name",string_agg("player_name",', '), string_agg("player_positon",',')
FROM "players" GROUP BY "team_name";
—OUTPUT—
team_name | string_agg | string_agg
-------------+-----------------------+------------------------
West Indies | Chris, Shannon, Bravo | Batsman,Bowler,Batsman
India | Virat, Rohit, Jasprit | Batsman,Batsman,Bowler
New Zealand | James | All rounder
(3 rows)
Here, we’ve combined the values of columns separated by commas using two STRING_AGG() functions in two separate columns. The output obtained makes it quite evident that the list was produced with two-column values now separated by commas.
But as seen in the output, the column names are not defined and are set as “string_agg.” To
specify the column names
, we can use
AS
keyword in the following manner:
SELECT "team_name",string_agg ("player_name", ',') AS players_name,
string_agg ("player_positon", ',') AS players_positions
FROM "players" GROUP BY "team_name";
—OUTPUT—
team_name | players_name | players_positions
-------------+---------------------+------------------------
West Indies | Chris,Shannon,Bravo | Batsman,Bowler,Batsman
India | Virat,Rohit,Jasprit | Batsman,Batsman,Bowler
New Zealand | James | All rounder
(3 rows)
Use Case#3: Removing duplicates in our output string
In our sample table, there are multiple players with player_position as “batsman”. We have to use the
DISTINCT
keyword to omit duplicate values in the first parameter.
SELECT "team_name",string_agg(DISTINCT "player_name",', '), string_agg( DISTINCT "player_positon",',')
FROM "players" GROUP BY "team_name";
—OUTPUT—
team_name | string_agg | string_agg
-------------+-----------------------+----------------
India | Jasprit, Rohit, Virat | Batsman,Bowler
New Zealand | James | All rounder
West Indies | Bravo, Chris, Shannon | Batsman,Bowler
(3 rows)
Use Case#4: Ordering the contents within the output string
To obtain the output of the PostgreSQL STRING_AGG() function in an ordered manner(alphabetically), we can use the following command:
SELECT "team_name",string_agg ("player_name", ',' ORDER BY "player_name" asc) AS players_name,
string_agg ("player_positon", ',' ORDER BY "player_positon" asc) AS players_positions
FROM "players" GROUP BY "team_name";
—OUTPUT—
team_name | players_name | players_positions
-------------+---------------------+------------------------
India | Jasprit,Rohit,Virat | Batsman,Batsman,Bowler
New Zealand | James | All rounder
West Indies | Bravo,Chris,Shannon | Batsman,Batsman,Bowler
(3 rows)
How does PostgreSQL STRING_AGG() function works?
Character string data types should be used as the input expression. Other data types are likewise acceptable if we explicitly convert them to the character string data type before using them.
We receive the outcome in string type via PostgreSQL STRING_AGG().
Similar to how we use other PostgreSQL aggregate methods like MIN(), MAX(), AVG(), SUM(), and COUNT(), the STRING_AGG() is typically used with the GROUP BY clause.
Practical Example to Implement PostgreSQL STRING_AGG() function
The CREATE TABLE statement will be used to create the tables “student” and “course” as follows:
CREATE A STUDENT TABLE:
create table student
stud_id serial PRIMARY KEY,
stud_name VARCHAR(80) NOT NULL,
stud_grade CHAR(1) NOT NULL,
stud_country VARCHAR(80) NOT NULL,
course_id int NOT NULL
CREATE A COURSE TABLE:
create table course
course_id serial PRIMARY KEY,
course_name VARCHAR(80) NOT NULL
INSERT VALUES INTO THE COURSE TABLE:
INSERT INTO course(course_name)
VALUES
('MCA'),
('BCA'),
('MBA');
select * from course;
—OUTPUT—
course_id | course_name
-----------+-------------
1 | MCA
2 | BCA
3 | MBA
(3 rows)
INSERT VALUES INTO THE STUDENT TABLE:
INSERT INTO student(stud_name,stud_grade,stud_country,course_id)
VALUES
('Radhika','A','India',1),
('Dev','B','India',2),
('Divanshi','C','India',3),
('Bhavya','A','India',1),
('Bella','B','Canada',2),
('Hannah','A','Canada',3),
('Elena','A','USA',1),
('Damon','B','USA',2),
('Edward','C','USA',3);
select * from student;
—OUTPUT—
stud_id | stud_name | stud_grade | stud_country | course_id
---------+-----------+------------+--------------+-----------
1 | Radhika | A | India | 1
2 | Dev | B | India | 2
3 | Divanshi | C | India | 3
4 | Bhavya | A | India | 1
5 | Bella | B | Canada | 2
6 | Hannah | A | Canada | 3
7 | Elena | A | USA | 1
8 | Damon | B | USA | 2
9 | Edward | C | USA | 3
(9 rows)
Now let us perform a JOIN and in the output obtained, we can see that each row has a separate entry that matches the course with the student’s course.
SELECT c.course_name AS "course name", s.stud_name AS "student name"
FROM course c RIGHT JOIN student s ON c.course_id = s.course_id
ORDER BY 1;
—OUTPUT—
course name | student name
-------------+--------------
BCA | Damon
BCA | Bella
BCA | Dev
MBA | Edward
MBA | Divanshi
MBA | Hannah
MCA | Elena
MCA | Bhavya
MCA | Radhika
(9 rows)
By utilizing PostgreSQL STRING_AGG() function, we can concatenate the student names along with the course_name as follows:
SELECT
crs.course_name AS "course name",
string_agg(stud.stud_name, ', ') AS "student list"
FROM course crs
JOIN student stud ON crs.course_id = stud.course_id
GROUP BY 1
ORDER BY 1;
—OUTPUT—
course name | student list
-------------+--------------------------
BCA | Damon, Bella, Dev
MBA | Edward, Divanshi, Hannah
MCA | Elena, Bhavya, Radhika
(3 rows)
Here, we will employ the PostgreSQL STRING_AGG() function to concatenate the student names and the stud_grade. As seen in the output, a list of students with the same grade concatenated by commas is generated.
SELECT stud_grade, STRING_AGG(stud_name,', ') AS StudentsPerGrade
FROM student
GROUP BY stud_grade
ORDER BY 1 ;
—OUTPUT—
stud_grade | studentspergrade
------------+--------------------------------
A | Radhika, Hannah, Elena, Bhavya
B | Bella, Dev, Damon
C | Edward, Divanshi
(3 rows)
In the example below, a comma separator is used to group and concatenate all students from the same country.
SELECT STRING_AGG(stud_name, ', ') AS "student_names", stud_country
FROM student
GROUP BY stud_country;
—OUTPUT—
student_names | stud_country
--------------------------------+--------------
Elena, Damon, Edward | USA
Bella, Hannah | Canada
Radhika, Dev, Divanshi, Bhavya | India
(3 rows)
Advantages
Using the ORDER BY clause, we may govern the order of the result.
The STRING_AGG PostgreSQL function can concatenate all strings and insert a separator or delimiter between them.
The PostgreSQL STRING_AGG() function does not add delimiter symbols or separators to the end of the string and supports a variety of delimiter symbols and separators.
This approach is appropriate when we must aggregate all data labels into an array format to obtain fewer rows from a denormalized database (or query result).
Because of the enhanced query performance, massive data from object-based schemas and other pre-joined tables may be analyzed more quickly and effectively.
Conclusion
After reading this article, we trust that you now understand the working & usage of the PostgreSQL STRING_AGG() function. We have also explained it through several examples to fully comprehend the PostgreSQL STRING_AGG() function.
This article has taught us to create comma-separated values with one or more columns using the STRING_AGG() function. The names of chosen columns can also be specified using the AS keyword. You will be prepared to employ PostgreSQL STRING_AGG() function in your own PostgreSQL database with the help of this tutorial.
To continue your learning journey, you may check out these amazing articles at Hevo:–
Copying data into a warehouse using ETL for Data Analysis may be time-consuming if you frequently utilize PostgreSQL.
Hevo, No-Code Data Pipeline, is at your disposal to rescue you. You can save your engineering bandwidth by establishing a Data Pipeline and start to replicate your data from PostgreSQL to the desired warehouse using Hevo within minutes.