Data types in Amazon Athena
When you run
CREATE TABLE
, you specify column names
and the data type that each column can contain. Athena supports the data types listed below.
For information about the data type mappings that the JDBC driver supports between Athena,
JDBC, and Java, see
Data types
tinyint
– A 8-bit signed
integer in two's complement format, with a minimum value of
-2
7
and a maximum value of
2
7
-1.
smallint
– A 16-bit signed
integer in two's complement format, with a minimum value of
-2
15
and a maximum value of
2
15
-1.
int
and
integer
– Athena uses different expressions for
integer depending on the type of query.
integer
– In DML
queries like
SELECT * FROM
, use the
integer
data
type.
integer
is represented as a 32-bit signed value in two's
complement format, with a minimum value of -2
31
and a maximum value of 2
31
-1.
To ensure compatibility with business analytics applications, the
JDBC driver returns the
integer
type.
bigint
– A 64-bit signed
integer in two's complement format, with a minimum value of
-2
63
and a maximum value of
2
63
-1.
double
– A 64-bit signed
double-precision floating point number. The range is 4.94065645841246544e-324d to
1.79769313486231570e+308d, positive or negative.
double
follows the
IEEE Standard for Floating-Point Arithmetic (IEEE 754).
float
– A 32-bit signed
single-precision floating point number. The range is 1.40129846432481707e-45 to
3.40282346638528860e+38, positive or negative.
float
follows the IEEE
Standard for Floating-Point Arithmetic (IEEE 754). Equivalent to the
real
in Presto. In Athena, use
float
in DDL statements
like
CREATE TABLE
and
real
in SQL functions like
SELECT CAST
. The AWS Glue crawler returns values in
float
, and Athena translates
real
and
float
types internally (see the
June 5, 2018
release notes).
decimal
(
precision
,
scale
)
–
precision
is the total number of
digits.
scale
(optional) is the number of
digits in fractional part with a default of 0. For example, use these type
definitions:
decimal(11,5)
,
decimal(15)
. The maximum value
for
precision
is 38, and the maximum value for
scale
is 38.
To specify decimal values as literals, such as when selecting rows with a specific
decimal value in a query DDL expression, specify the
decimal
type
definition, and list the decimal value as a literal (in single quotes) in your
query, as in this example:
decimal_value = decimal '0.12'
.
char
– Fixed length character
data, with a specified length between 1 and 255, such as
char(10)
. For
more information, see
CHAR Hive data type
Note
To use the
substr
function to return a substring of specified
length from a
char
data type, you must first cast the
char
value as a
varchar
, as in the following
example.
substr(cast(col1 as varchar), 1, 4)
varchar
– Variable length
character data, with a specified length between 1 and 65535, such as
varchar(10)
. For more information, see
VARCHAR Hive data type
string
– A string literal
enclosed in single or double quotes. For more information, see
STRING Hive data type
Note
Non-string data types cannot be cast to
string
in Athena; cast
them to
varchar
instead.
binary
– Used for data in Parquet.
date
– A date in ISO format,
such as
.
For example,
YYYY
-
MM
-
DD
date '2008-09-15'
. An exception is the OpenCSVSerDe, which
uses the number of days elapsed since January 1, 1970. For more information, see
OpenCSVSerDe for processing CSV
.
timestamp
– Date and time
instant in a
java.sql.Timestamp
.
For example,
yyyy
-
MM
-
dd
HH
:
mm
:
ss
[.f
...
]
timestamp '2008-09-15 03:04:05.324'
. An exception is the
OpenCSVSerDe
, which
uses
timestamp
data in the UNIX numeric format (for example,
1579059880000
).
For more information about working with timestamps, see Working with timestamp data later in this document.
array
<
– An array of the given component type.
data_type
>
Example
CREATE TABLE table array_table (c1 array<integer>) LOCATION '...'; INSERT INTO array_table values(ARRAY[1,2,3]);
map
<
– A map between the
given component types.
primitive_type
,
data_type
>
Example
CREATE TABLE map_table(c1 map<string, integer>) LOCATION '...'; INSERT INTO map_table values(MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]));
struct
<
– A collection of elements of
different component types.
col_name
:
data_type
[comment
col_comment
] ,
...
>
Example
CREATE TABLE struct_table(c1 struct<name:varchar(10), age:integer>) LOCATION '...'; INSERT INTO struct_table SELECT CAST(ROW('Bob', 38) AS ROW(name VARCHAR(10), age INTEGER));
Working with timestamp data
This section describes some considerations for working with timestamp data in Athena.
Note
The treatment of timestamps has changed somewhat between Athena engine version 2 and Athena engine version 3. For information about timestamp-related errors that can occur in Athena engine version 3 and suggested solutions, see Timestamp changes in the Athena engine version 3 reference.
Format for writing timestamp data to Amazon S3 objects
The format in which timestamp data should be written into Amazon S3 objects depends on both the column data type and the SerDe library that you use.
If you have a table column of type
DATE
, Athena expects the
corresponding column or property of the data to be a string in the ISO format
YYYY-MM-DD
, or a built-in date type like those for Parquet or
If you have a table column of type
TIME
, Athena expects the
corresponding column or property of the data to be a string in the ISO format
HH:MM:DD
, or a built-in time type like those for Parquet or
If you have a table column of type
TIMESTAMP
, Athena expects the
corresponding column or property of the data to be a string in the format
YYYY-MM-DD HH:MM:SS.SSS
(note the space between the date and
time), or a built-in time type like those for Parquet, ORC, or Ion.
Note
OpenCSVSerDe timestamps are an exception and must be encoded as millisecond resolution UNIX epochs.
Ensuring that time-partitioned data matches the timestamp field in a record
The producer of the data must make sure partition values align with the data within
the partition. For example, if your data has a
timestamp
property and you
use Kinesis Data Firehose to load the data into Amazon S3, you must use
dynamic partitioning
because the default partitioning of Kinesis Data Firehose is wall-clock-based.
Use string as the data type for partition keys
For performance reasons, it is preferable to use
STRING
as the data type
for partition keys. Even though Athena recognizes partition values in the format
YYYY-MM-DD
as dates when you use the
DATE
type, this can
lead to poor performance. For this reason, we recommend that you use the
STRING
data type for partition keys instead.
How to write queries for timestamp fields that are also time-partitioned
How you write queries for timestamp fields that are time-partitioned depends on the type of table that you want to query.
Hive tables
With the Hive tables most commonly used in Athena, the query engine has no knowledge of relationships between columns and partition keys. For this reason, you must always add predicates in your queries for both the column and the partition
For example, suppose you have an
event_time
column and an
event_date
partition key and want to query events between 23:00 and
03:00. In this case, you must include predicates in your query for both the column
and the partition key, as in the following example.
WHERE event_time BETWEEN
start_time
ANDend_time
AND event_date BETWEENstart_time_date
ANDend_time_date
Iceberg tables
With Iceberg tables, you can use computed partition values, which simplifies your
queries. For example, suppose your Iceberg table was created with a
PARTITIONED BY
clause like the following:
PARTITIONED BY (event_date month(event_time))
In this case, the query engine automatically prunes partitions based on the values
of the
event_time
predicates. Because of this, your query only needs to
specify a predicate for
event_time
, as in the following example.