DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)
Write a DataFrame to the binary parquet format.

This function writes the dataframe as a parquet file. You can choose different parquet backends, and optionally apply compression. See the user guide for more details.
Parameters
path : str or file-like object, default None
    If a string, it will be used as the root directory path when writing
    a partitioned dataset. By file-like object, we refer to objects with
    a write() method, such as a file handle (e.g. via the builtin open
    function) or io.BytesIO. The engine fastparquet does not accept
    file-like objects. If path is None, a bytes object is returned.

    Changed in version 1.2.0: previously this parameter was named "fname".
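    For instance, omitting path returns the serialized content directly
    (a minimal sketch, assuming pyarrow or fastparquet is installed):

    >>> df = pd.DataFrame(data={'col1': [1, 2]})
    >>> raw = df.to_parquet()  # path=None, so the parquet bytes are returned
    >>> isinstance(raw, bytes)
    True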
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
    Parquet library to use. If 'auto', then the option io.parquet.engine
    is used. The default io.parquet.engine behavior is to try 'pyarrow',
    falling back to 'fastparquet' if 'pyarrow' is unavailable.
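    A sketch of pinning a backend explicitly rather than relying on
    'auto' (assuming the named library is installed):

    >>> df.to_parquet('df.parquet', engine='pyarrow')
    >>> pd.set_option('io.parquet.engine', 'pyarrow')  # or set the default globally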
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
    Name of the compression to use. Use None for no compression.
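    For example, to write an uncompressed file (the filename here is
    arbitrary):

    >>> df.to_parquet('df_uncompressed.parquet', compression=None)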
index : bool, default None
    If True, include the dataframe's index(es) in the file output.
    If False, they will not be written to the file. If None, similar to
    True, the dataframe's index(es) will be saved. However, instead of
    being saved as values, a RangeIndex will be stored as a range in the
    metadata so it doesn't require much space and is faster. Other
    indexes will be included as columns in the file output.
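    As a sketch of the index=False case: a non-default index is dropped
    on write, so reading the file back yields a fresh RangeIndex:

    >>> df = pd.DataFrame({'col1': [1, 2]}, index=['a', 'b'])
    >>> df.to_parquet('no_index.parquet', index=False)
    >>> pd.read_parquet('no_index.parquet').index
    RangeIndex(start=0, stop=2, step=1)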
partition_cols : list, optional, default None
    Column names by which to partition the dataset. Columns are
    partitioned in the order they are given. Must be None if path is not
    a string.
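    A minimal sketch of a partitioned write; 'dataset_root' is a
    hypothetical directory name, under which one subdirectory per
    distinct year value is created:

    >>> df = pd.DataFrame({'year': [2020, 2020, 2021], 'val': [1, 2, 3]})
    >>> df.to_parquet('dataset_root', partition_cols=['year'])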
storage_options : dict, optional
    Extra options that make sense for a particular storage connection,
    e.g. host, port, username, password, etc. For HTTP(S) URLs the
    key-value pairs are forwarded to urllib as header options. For other
    URLs (e.g. starting with "s3://" and "gcs://") the key-value pairs
    are forwarded to fsspec. Please see fsspec and urllib for more
    details.

    New in version 1.2.0.
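    For instance, when writing to S3 the options below are forwarded to
    fsspec/s3fs (a sketch; the bucket name and placeholder credentials
    are hypothetical, and s3fs must be installed):

    >>> df.to_parquet(
    ...     's3://my-bucket/df.parquet',
    ...     storage_options={'key': '<access-key>', 'secret': '<secret-key>'},
    ... )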
**kwargs
    Additional arguments passed to the parquet library. See pandas io
    for more details.
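    For example, with the pyarrow engine the extra keywords are
    forwarded to pyarrow.parquet.write_table, so its row_group_size
    argument can be passed through (a sketch, assuming pyarrow is
    installed):

    >>> df.to_parquet('df.parquet', engine='pyarrow', row_group_size=100)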
Returns
bytes if no path argument is provided else None

This function requires either the fastparquet or pyarrow library.
Examples
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')
>>> pd.read_parquet('df.parquet.gzip')
   col1  col2
0     1     3
1     2     4
If you want to get a buffer to the parquet content you can use an io.BytesIO object, as long as you don't use partition_cols, which creates multiple files.
>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()