Close the stream cleanly.
For writable streams, this will attempt to flush any pending data before releasing the underlying resource.
After Close() is called, closed() returns true and the stream is not available for further operations.
virtual Status Abort()
Close the stream abruptly.
This method does not guarantee that any pending data is flushed. It merely releases any underlying resource used by the stream for its operation.
After Abort() is called, closed() returns true and the stream is not available for further operations.
virtual Result<int64_t> Read(int64_t nbytes, void* out) = 0
Read data from current file position.
Read at most nbytes from the current file position into out. The number of bytes read is returned.
virtual Result<std::shared_ptr<Buffer>> Read(int64_t nbytes) = 0
Read data from current file position.
Read at most nbytes from the current file position. Fewer bytes may be read if EOF is reached. This method updates the current file position.
In some cases (e.g. a memory-mapped file), this method may avoid a memory copy.
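Since a read may return fewer bytes than requested, callers that need an exact count typically loop. The following is a minimal sketch of such a loop, not Arrow code: `ChunkedSource` and `ReadFully` are hypothetical names, the stand-in `Read` returns a plain `int64_t` instead of a `Result<int64_t>`, and treating a zero-length read as EOF is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for a readable stream (not an Arrow class).
// It hands out at most `max_chunk` bytes per call to force short reads.
struct ChunkedSource {
  std::string data;
  int64_t max_chunk;
  int64_t pos = 0;

  ChunkedSource(std::string d, int64_t chunk) : data(std::move(d)), max_chunk(chunk) {}

  // Mirrors Read(nbytes, out): copies up to nbytes and returns the count copied.
  int64_t Read(int64_t nbytes, void* out) {
    assert(nbytes >= 0);
    const int64_t remaining = static_cast<int64_t>(data.size()) - pos;
    const int64_t n = std::min({nbytes, max_chunk, remaining});
    std::memcpy(out, data.data() + pos, static_cast<size_t>(n));
    pos += n;  // the read advances the current position
    return n;
  }
};

// Keeps reading until `nbytes` have arrived or a zero-length read is seen,
// which this sketch assumes means EOF.
template <typename Source>
int64_t ReadFully(Source& src, int64_t nbytes, void* out) {
  auto* dst = static_cast<char*>(out);
  int64_t total = 0;
  while (total < nbytes) {
    const int64_t n = src.Read(nbytes - total, dst + total);
    if (n == 0) break;
    total += n;
  }
  return total;
}
```

With a real stream the same loop would also have to propagate the error carried by the returned `Result`.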
virtual const IOContext& io_context() const
EXPERIMENTAL: The IOContext associated with this file.
By default, this is the same as default_io_context(), but it may be overridden by subclasses.
virtual Status Write(const void* data, int64_t nbytes) = 0
Write the given data to the stream.
This method always processes the bytes in full. Depending on the semantics of the stream, the data may be written out immediately, held in a buffer, or written asynchronously. In the case where the stream buffers the data, it will be copied. To avoid potentially large copies, use the Write variant that takes an owned Buffer.
virtual Status Write(const std::shared_ptr<Buffer>& data)
Write the given data to the stream.
Since the Buffer owns its memory, this method can avoid a copy if buffering is required. See Write(const void*, int64_t) for details.
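The difference between the two Write variants can be pictured with a small model. This is only a sketch under stated assumptions: `QueueingSink` is a hypothetical sink that holds pending writes, and `OwnedBytes` (a `std::shared_ptr<const std::string>`) stands in for an owned Buffer. Neither is part of Arrow.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an owned buffer: immutable bytes with shared ownership.
using OwnedBytes = std::shared_ptr<const std::string>;

// Hypothetical sink that keeps writes in a pending queue (not an Arrow class).
class QueueingSink {
 public:
  // Raw-pointer write: the caller may reuse `data` right after the call,
  // so the sink has to copy the bytes to keep them alive.
  void Write(const void* data, int64_t nbytes) {
    assert(nbytes >= 0);
    const char* p = static_cast<const char*>(data);
    pending_.push_back(std::make_shared<std::string>(p, static_cast<size_t>(nbytes)));
    ++copies_;
  }

  // Owned-buffer write: sharing ownership keeps the bytes alive without a copy.
  void Write(const OwnedBytes& data) { pending_.push_back(data); }

  int copies() const { return copies_; }
  const std::vector<OwnedBytes>& pending() const { return pending_; }

 private:
  std::vector<OwnedBytes> pending_;
  int copies_ = 0;
};
```

In this model only the raw-pointer overload increments the copy counter; the owned overload stores the very same allocation the caller passed in.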
class InputStream : public virtual arrow::io::FileInterface, public virtual arrow::io::Readable, public std::enable_shared_from_this<InputStream>
Subclassed by arrow::io::internal::InputStreamConcurrencyWrapper<Derived>, arrow::io::RandomAccessFile, arrow::io::StdinStream, arrow::io::TransformInputStream, arrow::io::internal::InputStreamConcurrencyWrapper<BufferedInputStream>, arrow::io::internal::InputStreamConcurrencyWrapper<CompressedInputStream>, arrow::io::SlowInputStreamBase<InputStream>
Public Functions
Status Advance(int64_t nbytes)
Advance or skip the stream by the indicated number of bytes.
nbytes – [in] the number of bytes to move forward
virtual Result<std::string_view> Peek(int64_t nbytes)
Return zero-copy string_view to upcoming bytes.
Do not modify the stream position. The view becomes invalid after any operation on the stream. May trigger buffering if the requested size is larger than the number of buffered bytes.
May return NotImplemented on streams that don’t support it.
nbytes – [in] the maximum number of bytes to see
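A typical use of a non-consuming view is to inspect a prefix before deciding whether to skip it. The sketch below models that peek-then-advance pattern with a hypothetical in-memory `PeekableSource` (not an Arrow class); the helper `ConsumeMagic` and the sample byte strings are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical in-memory source used to show the peek-then-advance pattern.
class PeekableSource {
 public:
  explicit PeekableSource(std::string data) : data_(std::move(data)) {}

  // Returns a view of up to nbytes upcoming bytes; the position is unchanged.
  std::string_view Peek(int64_t nbytes) const {
    assert(nbytes >= 0);
    const int64_t avail = static_cast<int64_t>(data_.size()) - pos_;
    const int64_t n = std::min(nbytes, avail);
    return std::string_view(data_).substr(static_cast<size_t>(pos_), static_cast<size_t>(n));
  }

  // Skips forward, clamped to the end of the data.
  void Advance(int64_t nbytes) {
    assert(nbytes >= 0);
    pos_ = std::min(pos_ + nbytes, static_cast<int64_t>(data_.size()));
  }

  int64_t Tell() const { return pos_; }

 private:
  std::string data_;
  int64_t pos_ = 0;
};

// Consumes `magic` only if the upcoming bytes match it exactly.
inline bool ConsumeMagic(PeekableSource& src, std::string_view magic) {
  if (src.Peek(static_cast<int64_t>(magic.size())) != magic) return false;
  src.Advance(static_cast<int64_t>(magic.size()));
  return true;
}
```

Note that the view is compared before the position moves, which matches the rule that the view becomes invalid after any further operation on the stream.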
virtual bool supports_zero_copy() const
Return true if InputStream is capable of zero copy Buffer reads.
Zero copy reads imply the use of Buffer-returning Read() overloads.
virtual Result<std::shared_ptr<const KeyValueMetadata>> ReadMetadata()
Read and return stream metadata.
If the stream implementation doesn’t support metadata, empty metadata is returned. Note that it is allowed to return a null pointer rather than an allocated empty metadata.
virtual Future<std::shared_ptr<const KeyValueMetadata>> ReadMetadataAsync(const IOContext& io_context)
Read stream metadata asynchronously.
class RandomAccessFile : public arrow::io::InputStream, public arrow::io::Seekable
Subclassed by arrow::io::HdfsReadableFile, arrow::io::internal::RandomAccessFileConcurrencyWrapper<Derived>, arrow::io::ReadWriteFileInterface, arrow::io::internal::RandomAccessFileConcurrencyWrapper<BufferReader>, arrow::io::internal::RandomAccessFileConcurrencyWrapper<CudaBufferReader>, arrow::io::internal::RandomAccessFileConcurrencyWrapper<ReadableFile>, arrow::io::SlowInputStreamBase<RandomAccessFile>
Public Functions
~RandomAccessFile() override
Necessary because we hold a std::unique_ptr.
virtual Result<int64_t> ReadAt(int64_t position, int64_t nbytes, void* out)
Read data from given file position.
At most nbytes bytes are read. The number of bytes read is returned (it can be less than nbytes if EOF is reached).
This method can be safely called from multiple threads concurrently. It is unspecified whether this method updates the file position or not.
The default RandomAccessFile-provided implementation uses Seek() and Read(), but subclasses may override it with a more efficient implementation that doesn’t depend on implicit file positioning.
position – [in] Where to read bytes from
nbytes – [in] The number of bytes to read
out – [out] The buffer to read bytes into
The number of bytes read, or an error
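To see why a Seek()-then-Read() implementation leaves the file position unspecified, consider the following sketch. `PositionedSource` is a hypothetical in-memory stand-in, not an Arrow class, and serializing the two steps with a `std::mutex` is an assumption made here so that concurrent callers cannot interleave between the seek and the read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical positioned source (not an Arrow class).
class PositionedSource {
 public:
  explicit PositionedSource(std::string data) : data_(std::move(data)) {}

  // Moves the shared cursor, clamped to the end of the data.
  void Seek(int64_t position) {
    assert(position >= 0);
    pos_ = std::min(position, static_cast<int64_t>(data_.size()));
  }

  // Reads from the shared cursor and advances it.
  int64_t Read(int64_t nbytes, void* out) {
    assert(nbytes >= 0);
    const int64_t n = std::min(nbytes, static_cast<int64_t>(data_.size()) - pos_);
    std::memcpy(out, data_.data() + pos_, static_cast<size_t>(n));
    pos_ += n;
    return n;
  }

  // Seek-then-read under a lock. As a side effect the shared cursor moves,
  // which callers of a positional read should not rely on.
  int64_t ReadAt(int64_t position, int64_t nbytes, void* out) {
    std::lock_guard<std::mutex> guard(mutex_);
    Seek(position);
    return Read(nbytes, out);
  }

  int64_t Tell() const { return pos_; }

 private:
  std::string data_;
  int64_t pos_ = 0;
  std::mutex mutex_;
};
```

An implementation backed by a true positional read would leave the cursor untouched, so portable callers should Seek() explicitly before relying on Read() after a ReadAt().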
virtual Result<std::shared_ptr<Buffer>> ReadAt(int64_t position, int64_t nbytes)
Read data from given file position.
At most nbytes bytes are read, but it can be less if EOF is reached.
position – [in] Where to read bytes from
nbytes – [in] The number of bytes to read
A buffer containing the bytes read, or an error
virtual Future<std::shared_ptr<Buffer>> ReadAsync(const IOContext&, int64_t position, int64_t nbytes)
EXPERIMENTAL: Read data asynchronously.
virtual std::vector<Future<std::shared_ptr<Buffer>>> ReadManyAsync(const IOContext&, const std::vector<ReadRange>& ranges)
EXPERIMENTAL: Explicit multi-read.
Request multiple reads at once.
The underlying filesystem may optimize these reads by coalescing small reads into large reads or by breaking up large reads into multiple parallel smaller reads. The reads should be issued in parallel if it makes sense for the filesystem.
One future will be returned for each input read range. Multiple returned futures may correspond to a single read. Or, a single returned future may be a combined result of several individual reads.
ranges – [in] The ranges to read
One future per requested range, each completing when the data from that range is available
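Coalescing small reads into larger ones can be sketched as a merge over sorted ranges. This is an illustration only, not the strategy any particular filesystem uses: `ByteRange` is a hypothetical stand-in for a read range (an offset plus a length), and the `max_gap` threshold is an assumed tuning parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a read range: `length` bytes starting at `offset`.
struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Merges ranges that overlap or are separated by at most `max_gap` bytes.
// Returns the merged ranges sorted by offset.
inline std::vector<ByteRange> CoalesceRanges(std::vector<ByteRange> ranges, int64_t max_gap) {
  assert(max_gap >= 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ByteRange& a, const ByteRange& b) { return a.offset < b.offset; });
  std::vector<ByteRange> merged;
  for (const ByteRange& r : ranges) {
    if (!merged.empty()) {
      ByteRange& last = merged.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset <= last_end + max_gap) {
        // Extend the previous range to cover this one (and the gap, if any).
        last.length = std::max(last_end, r.offset + r.length) - last.offset;
        continue;
      }
    }
    merged.push_back(r);
  }
  return merged;
}
```

Each original range would then be served as a slice of the merged read that covers it, which is how several returned futures can correspond to a single underlying read.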
std::vector<Future<std::shared_ptr<Buffer>>> ReadManyAsync(const std::vector<ReadRange>& ranges)
EXPERIMENTAL: Explicit multi-read, using the file’s IOContext.
virtual Status WillNeed(const std::vector<ReadRange>& ranges)
EXPERIMENTAL: Inform that the given ranges may be read soon.
Some implementations might arrange to prefetch some of the data. However, no guarantee is made and the default implementation does nothing. For robust prefetching, use ReadAt() or ReadAsync().
static Result<std::shared_ptr<InputStream>> GetStream(std::shared_ptr<RandomAccessFile> file, int64_t file_offset, int64_t nbytes)
Create an isolated InputStream that reads a segment of a RandomAccessFile.
Multiple such streams can be created and used independently without interference.
file – [in] a file instance
file_offset – [in] the starting position in the file
nbytes – [in] the extent of bytes to read. The file should have sufficient bytes available
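The isolation property can be modelled by giving each segment stream its own cursor over shared, immutable bytes. The sketch below is an assumption-laden illustration: `SegmentStream` is a hypothetical class, and a `std::shared_ptr<const std::string>` stands in for the underlying file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical view over [file_offset, file_offset + nbytes) of shared bytes.
// Each view keeps its own cursor, so views do not disturb one another.
class SegmentStream {
 public:
  SegmentStream(std::shared_ptr<const std::string> file, int64_t file_offset, int64_t nbytes)
      : file_(std::move(file)), pos_(file_offset), end_(file_offset + nbytes) {
    assert(file_offset >= 0 && nbytes >= 0);
    // The file is expected to hold enough bytes for the whole segment.
    assert(end_ <= static_cast<int64_t>(file_->size()));
  }

  // Reads up to nbytes from this view's cursor, never past the segment end.
  int64_t Read(int64_t nbytes, void* out) {
    assert(nbytes >= 0);
    const int64_t n = std::min(nbytes, end_ - pos_);
    std::memcpy(out, file_->data() + pos_, static_cast<size_t>(n));
    pos_ += n;
    return n;
  }

 private:
  std::shared_ptr<const std::string> file_;
  int64_t pos_;
  int64_t end_;
};
```

Interleaving reads on two such views yields each segment's bytes in order, because neither view touches a shared file position.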
Subclassed by arrow::io::BufferedOutputStream, arrow::io::BufferOutputStream, arrow::io::CompressedOutputStream, arrow::io::FileOutputStream, arrow::io::HdfsOutputStream, arrow::io::MockOutputStream, arrow::io::StderrStream, arrow::io::StdoutStream, arrow::io::WritableFile
class ReadWriteFileInterface : public arrow::io::RandomAccessFile, public arrow::io::WritableFile
Subclassed by arrow::io::MemoryMappedFile
class BufferReader : public arrow::io::internal::RandomAccessFileConcurrencyWrapper<BufferReader>
Random access zero-copy reads on an arrow::Buffer.
Public Functions
explicit BufferReader(std::string_view data)
Instantiate from std::string or std::string_view.
Does not own data.
virtual bool supports_zero_copy() const override
Return true if InputStream is capable of zero copy Buffer reads.
Zero copy reads imply the use of Buffer-returning Read() overloads.
virtual Future<std::shared_ptr<Buffer>> ReadAsync(const IOContext&, int64_t position, int64_t nbytes) override
EXPERIMENTAL: Read data asynchronously.
virtual Status WillNeed(const std::vector<ReadRange>& ranges) override
EXPERIMENTAL: Inform that the given ranges may be read soon.
Some implementations might arrange to prefetch some of the data. However, no guarantee is made and the default implementation does nothing. For robust prefetching, use ReadAt() or ReadAsync().
class MockOutputStream : public arrow::io::OutputStream
A helper class to track the size of allocations.
Writes to this stream do not copy or retain any data; they just bump a size counter that can later be used to determine exactly how much memory needs to be allocated for the actual write.
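The intended usage is a two-pass write: run the serializer once against a counting sink, then again against storage of exactly the measured size. The sketch below models that idea with hypothetical types (`CountingSink`, `PreallocatedSink`) and a made-up length-prefixed record format; none of these names come from Arrow.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical counting sink (not an Arrow class): discards bytes, tracks size.
struct CountingSink {
  int64_t size = 0;
  void Write(const void* /*data*/, int64_t nbytes) { size += nbytes; }
};

// Hypothetical sink that writes into storage sized up front.
struct PreallocatedSink {
  std::vector<char> storage;
  int64_t pos = 0;
  explicit PreallocatedSink(int64_t capacity) : storage(static_cast<size_t>(capacity)) {}
  void Write(const void* data, int64_t nbytes) {
    assert(pos + nbytes <= static_cast<int64_t>(storage.size()));
    std::memcpy(storage.data() + pos, data, static_cast<size_t>(nbytes));
    pos += nbytes;
  }
};

// Example serializer: a 4-byte length prefix followed by the payload.
template <typename Sink>
void WriteRecord(Sink& sink, const std::string& payload) {
  const uint32_t len = static_cast<uint32_t>(payload.size());
  sink.Write(&len, static_cast<int64_t>(sizeof(len)));
  sink.Write(payload.data(), static_cast<int64_t>(payload.size()));
}

// Pass 1 measures the output; pass 2 writes into exactly that much memory.
inline std::vector<char> SerializeExact(const std::string& payload) {
  CountingSink counter;
  WriteRecord(counter, payload);
  PreallocatedSink sink(counter.size);
  WriteRecord(sink, payload);
  return sink.storage;
}
```

The trade-off is that the serializer runs twice, in exchange for a single allocation of the right size and no resizing during the real write.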
Public Functions
virtual Status Close() override
Close the stream cleanly.
For writable streams, this will attempt to flush any pending data before releasing the underlying resource.
After Close() is called, closed() returns true and the stream is not available for further operations.
virtual Status Write(const void* data, int64_t nbytes) override
Write the given data to the stream.
This method always processes the bytes in full. Depending on the semantics of the stream, the data may be written out immediately, held in a buffer, or written asynchronously. In the case where the stream buffers the data, it will be copied. To avoid potentially large copies, use the Write variant that takes an owned Buffer.
class BufferOutputStream : public arrow::io::OutputStream
An output stream that writes to a resizable buffer.
Public Functions
virtual Status Close() override
Close the stream, preserving the buffer (retrieve it with Finish()).
virtual Status Write(const void* data, int64_t nbytes) override
Write the given data to the stream.
This method always processes the bytes in full. Depending on the semantics of the stream, the data may be written out immediately, held in a buffer, or written asynchronously. In the case where the stream buffers the data, it will be copied. To avoid potentially large copies, use the Write variant that takes an owned Buffer.
Status Reset(int64_t initial_capacity = 1024, MemoryPool* pool = default_memory_pool())
Initialize state of OutputStream with newly allocated memory and set position to 0.
initial_capacity – [in] the starting allocated capacity
pool – [inout] the memory pool to use for allocations
Create in-memory output stream with indicated capacity using a memory pool.
initial_capacity – [in] the initial allocated internal capacity of the OutputStream
pool – [inout] a MemoryPool to use for allocations
the created stream
class FixedSizeBufferWriter : public arrow::io::WritableFile
An output stream that writes into a fixed-size mutable buffer.
Public Functions
explicit FixedSizeBufferWriter(const std::shared_ptr<Buffer>& buffer)
Input buffer must be mutable, will abort if not.
virtual Status Close() override
Close the stream cleanly.
For writable streams, this will attempt to flush any pending data before releasing the underlying resource.
After Close() is called, closed() returns true and the stream is not available for further operations.
virtual Status Write(const void* data, int64_t nbytes) override
Write the given data to the stream.
This method always processes the bytes in full. Depending on the semantics of the stream, the data may be written out immediately, held in a buffer, or written asynchronously. In the case where the stream buffers the data, it will be copied. To avoid potentially large copies, use the Write variant that takes an owned Buffer.
class ReadableFile : public arrow::io::internal::RandomAccessFileConcurrencyWrapper<ReadableFile>
An operating system file open in read-only mode.
Reads through this implementation are unbuffered. If many small reads need to be issued, it is recommended to use a buffering layer for good performance.
Public Functions
virtual bool closed() const override
Return whether the stream is closed.
virtual Status WillNeed(const std::vector<ReadRange>& ranges) override
EXPERIMENTAL: Inform that the given ranges may be read soon.
Some implementations might arrange to prefetch some of the data. However, no guarantee is made and the default implementation does nothing. For robust prefetching, use ReadAt() or ReadAsync().
static Result<std::shared_ptr<ReadableFile>> Open(const std::string& path, MemoryPool* pool = default_memory_pool())
Open a local file for reading.
path – [in] with UTF8 encoding
pool – [in] a MemoryPool for memory allocations
ReadableFile instance
static Result<std::shared_ptr<ReadableFile>> Open(int fd, MemoryPool* pool = default_memory_pool())
Open a local file for reading.
The file descriptor becomes owned by the ReadableFile, and will be closed on Close() or destruction.
fd – [in] file descriptor
pool – [in] a MemoryPool for memory allocations
ReadableFile instance
class FileOutputStream : public arrow::io::OutputStream
An operating system file open in write-only mode.
Public Functions
virtual Status Close() override
Close the stream cleanly.
For writable streams, this will attempt to flush any pending data before releasing the underlying resource.
After Close() is called, closed() returns true and the stream is not available for further operations.
virtual Status Write(const void* data, int64_t nbytes) override
Write the given data to the stream.
This method always processes the bytes in full. Depending on the semantics of the stream, the data may be written out immediately, held in a buffer, or written asynchronously. In the case where the stream buffers the data, it will be copied. To avoid potentially large copies, use the Write variant that takes an owned Buffer.
static Result<std::shared_ptr<FileOutputStream>> Open(const std::string& path, bool append = false)
Open a local file for writing, truncating any existing file.
When opening a new file, any existing file with the indicated path is truncated to 0 bytes, deleting any existing data.
path – [in] with UTF8 encoding
append – [in] append to existing file, otherwise truncate to 0 bytes
an open FileOutputStream
static Result<std::shared_ptr<FileOutputStream>> Open(int fd)
Open a file descriptor for writing.
The underlying file isn’t truncated.
The file descriptor becomes owned by the OutputStream, and will be closed on Close() or destruction.
fd – [in] file descriptor
an open FileOutputStream
class MemoryMappedFile : public arrow::io::ReadWriteFileInterface
A file interface that uses memory-mapped files for memory interactions.
This implementation supports zero-copy reads. The same class is used for both reading and writing.
When a file is opened in a writable mode, it is not truncated first, unlike with FileOutputStream.
Public Functions
virtual Status Close() override
Close the stream cleanly.
For writable streams, this will attempt to flush any pending data before releasing the underlying resource.
After Close() is called, closed() returns true and the stream is not available for further operations.
virtual Result<int64_t> Read(int64_t nbytes, void* out) override
Read data from current file position.
Read at most nbytes from the current file position into out. The number of bytes read is returned.
virtual Result<std::shared_ptr<Buffer>> Read(int64_t nbytes) override
Read data from current file position.
Read at most nbytes from the current file position. Fewer bytes may be read if EOF is reached. This method updates the current file position.
In some cases (e.g. a memory-mapped file), this method may avoid a memory copy.
virtual Result<std::shared_ptr<Buffer>> ReadAt(int64_t position, int64_t nbytes) override
Read data from given file position.
At most nbytes bytes are read, but it can be less if EOF is reached.
position – [in] Where to read bytes from
nbytes – [in] The number of bytes to read
A buffer containing the bytes read, or an error
virtual Result<int64_t> ReadAt(int64_t position, int64_t nbytes, void* out) override
Read data from given file position.
At most nbytes bytes are read. The number of bytes read is returned (it can be less than nbytes if EOF is reached).
This method can be safely called from multiple threads concurrently. It is unspecified whether this method updates the file position or not.
The default RandomAccessFile-provided implementation uses Seek() and Read(), but subclasses may override it with a more efficient implementation that doesn’t depend on implicit file positioning.
position – [in] Where to read bytes from
nbytes – [in] The number of bytes to read
out – [out] The buffer to read bytes into
The number of bytes read, or an error
virtual Future<std::shared_ptr<Buffer>> ReadAsync(const IOContext&, int64_t position, int64_t nbytes) override
EXPERIMENTAL: Read data asynchronously.
virtual Status WillNeed(const std::vector<ReadRange>& ranges) override
EXPERIMENTAL: Inform that the given ranges may be read soon.
Some implementations might arrange to prefetch some of the data. However, no guarantee is made and the default implementation does nothing. For robust prefetching, use ReadAt() or ReadAsync().
virtual bool supports_zero_copy() const override
Return true if InputStream is capable of zero copy Buffer reads.
Zero copy reads imply the use of Buffer-returning Read() overloads.
virtual Status WriteAt(int64_t position, const void* data, int64_t nbytes) override
Write data at a particular position in the file. Thread-safe.
static Result<std::shared_ptr<MemoryMappedFile>> Create(const std::string& path, int64_t size)
Create new file with indicated size, return in read/write mode.
class BufferedInputStream : public arrow::io::internal::InputStreamConcurrencyWrapper<BufferedInputStream>
An InputStream that performs buffered reads from an unbuffered InputStream, which can mitigate the overhead of many small reads in some cases.
Public Functions
Status SetBufferSize(int64_t new_buffer_size)
Resize internal read buffer; calls to Read(…) will read at least.
new_buffer_size – [in] the new read buffer size
Release the raw InputStream .
Any data buffered will be discarded. Further operations on this object are invalid.
the underlying raw InputStream
virtual Result<std::shared_ptr<const KeyValueMetadata>> ReadMetadata() override
Read and return stream metadata.
If the stream implementation doesn’t support metadata, empty metadata is returned. Note that it is allowed to return a null pointer rather than an allocated empty metadata.
virtual Future<std::shared_ptr<const KeyValueMetadata>> ReadMetadataAsync(const IOContext& io_context) override
Read stream metadata asynchronously.
static Result<std::shared_ptr<BufferedInputStream>> Create(int64_t buffer_size, MemoryPool* pool, std::shared_ptr<InputStream> raw, int64_t raw_read_bound = -1)
Create a BufferedInputStream from a raw InputStream.
buffer_size – [in] the size of the temporary read buffer
pool – [in] a MemoryPool to use for allocations
raw – [in] a raw InputStream
raw_read_bound – [in] a bound on the maximum number of bytes to read from the raw input stream. The default -1 indicates that it is unbounded
the created BufferedInputStream
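How a buffering layer reduces the number of raw reads can be shown with a small model. This is a sketch under stated assumptions, not the Arrow implementation: `CountingRawSource` and `BufferedReader` are hypothetical classes, and the refill policy (one raw read of the full buffer size whenever the buffer is drained) is chosen for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical unbuffered source (not an Arrow class) that counts raw reads.
struct CountingRawSource {
  std::string data;
  int64_t pos = 0;
  int raw_reads = 0;
  explicit CountingRawSource(std::string d) : data(std::move(d)) {}
  int64_t Read(int64_t nbytes, void* out) {
    ++raw_reads;
    const int64_t n = std::min(nbytes, static_cast<int64_t>(data.size()) - pos);
    std::memcpy(out, data.data() + pos, static_cast<size_t>(n));
    pos += n;
    return n;
  }
};

// Hypothetical buffering layer: refills a fixed-size buffer from the raw
// source and serves small reads out of it.
class BufferedReader {
 public:
  BufferedReader(CountingRawSource* raw, int64_t buffer_size)
      : raw_(raw), buffer_(static_cast<size_t>(buffer_size)) {
    assert(buffer_size > 0);
  }

  int64_t Read(int64_t nbytes, void* out) {
    assert(nbytes >= 0);
    auto* dst = static_cast<char*>(out);
    int64_t total = 0;
    while (total < nbytes) {
      if (begin_ == end_) {  // buffer drained: refill with one raw read
        end_ = raw_->Read(static_cast<int64_t>(buffer_.size()), buffer_.data());
        begin_ = 0;
        if (end_ == 0) break;  // raw source is at EOF
      }
      const int64_t n = std::min(nbytes - total, end_ - begin_);
      std::memcpy(dst + total, buffer_.data() + begin_, static_cast<size_t>(n));
      begin_ += n;
      total += n;
    }
    return total;
  }

 private:
  CountingRawSource* raw_;
  std::vector<char> buffer_;
  int64_t begin_ = 0;
  int64_t end_ = 0;
};
```

In this model, 64 single-byte reads through a 16-byte buffer cost four raw reads instead of 64; larger buffers trade memory for fewer raw reads.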
Result<std::shared_ptr<OutputStream>> Detach()
Flush any buffered writes and release the raw OutputStream.
Further operations on this object are invalid.
the underlying OutputStream
virtual Status Abort() override
Close the stream abruptly.
This method does not guarantee that any pending data is flushed. It merely releases any underlying resource used by the stream for its operation.
After Abort() is called, closed() returns true and the stream is not available for further operations.
virtual Status Write(const void* data, int64_t nbytes) override
Write the given data to the stream.
This method always processes the bytes in full. Depending on the semantics of the stream, the data may be written out immediately, held in a buffer, or written asynchronously. In the case where the stream buffers the data, it will be copied. To avoid potentially large copies, use the Write variant that takes an owned Buffer.
virtual Status Write(const std::shared_ptr<Buffer>& data) override
Write the given data to the stream.
Since the Buffer owns its memory, this method can avoid a copy if buffering is required. See Write(const void*, int64_t) for details.
static Result<std::shared_ptr<BufferedOutputStream>> Create(int64_t buffer_size, MemoryPool* pool, std::shared_ptr<OutputStream> raw)
Create a buffered output stream wrapping the given output stream.
buffer_size – [in] the size of the temporary write buffer
pool – [in] a MemoryPool to use for allocations
raw – [in] another OutputStream
the created BufferedOutputStream
class CompressedInputStream : public arrow::io::internal::InputStreamConcurrencyWrapper<CompressedInputStream>
Public Functions
virtual bool closed() const override
Return whether the stream is closed.
virtual Result<std::shared_ptr<const KeyValueMetadata>> ReadMetadata() override
Read and return stream metadata.
If the stream implementation doesn’t support metadata, empty metadata is returned. Note that it is allowed to return a null pointer rather than an allocated empty metadata.
virtual Future<std::shared_ptr<const KeyValueMetadata>> ReadMetadataAsync(const IOContext& io_context) override
Read stream metadata asynchronously.
static Result<std::shared_ptr<CompressedInputStream>> Make(util::Codec* codec, const std::shared_ptr<InputStream>& raw, MemoryPool* pool = default_memory_pool())
Create a compressed input stream wrapping the given input stream.
virtual Status Abort() override
Close the stream abruptly.
This method does not guarantee that any pending data is flushed. It merely releases any underlying resource used by the stream for its operation.
After Abort() is called, closed() returns true and the stream is not available for further operations.
virtual Status Write(const void* data, int64_t nbytes) override
Write the given data to the stream.
This method always processes the bytes in full. Depending on the semantics of the stream, the data may be written out immediately, held in a buffer, or written asynchronously. In the case where the stream buffers the data, it will be copied. To avoid potentially large copies, use the Write variant that takes an owned Buffer.
static Result<std::shared_ptr<CompressedOutputStream>> Make(util::Codec* codec, const std::shared_ptr<OutputStream>& raw, MemoryPool* pool = default_memory_pool())
Create a compressed output stream wrapping the given output stream.
virtual Status Close() override
Close the stream cleanly.
For writable streams, this will attempt to flush any pending data before releasing the underlying resource.
After Close() is called, closed() returns true and the stream is not available for further operations.
virtual Status Abort() override
Close the stream abruptly.
This method does not guarantee that any pending data is flushed. It merely releases any underlying resource used by the stream for its operation.
After Abort() is called, closed() returns true and the stream is not available for further operations.
virtual Result<int64_t> Read(int64_t nbytes, void* out) override
Read data from current file position.
Read at most nbytes from the current file position into out. The number of bytes read is returned.
virtual Result<std::shared_ptr<Buffer>> Read(int64_t nbytes) override
Read data from current file position.
Read at most nbytes from the current file position. Fewer bytes may be read if EOF is reached. This method updates the current file position.
In some cases (e.g. a memory-mapped file), this method may avoid a memory copy.
virtual Result<std::shared_ptr<const KeyValueMetadata>> ReadMetadata() override
Read and return stream metadata.
If the stream implementation doesn’t support metadata, empty metadata is returned. Note that it is allowed to return a null pointer rather than an allocated empty metadata.
virtual Future<std::shared_ptr<const KeyValueMetadata>> ReadMetadataAsync(const IOContext& io_context) override
Read stream metadata asynchronously.