read.jdbc {SparkR}
R Documentation
Create a SparkDataFrame representing the database table accessible via JDBC URL
Description
Additional JDBC database connection properties can be set (...)
Usage
read.jdbc(url, tableName, partitionColumn = NULL, lowerBound = NULL,
          upperBound = NULL, numPartitions = 0L, predicates = list(), ...)
Arguments
url
JDBC database URL of the form jdbc:subprotocol:subname
tableName
the name of the table in the external database
partitionColumn
the name of a column of integral type that will be used for partitioning
lowerBound
the minimum value of partitionColumn used to decide partition stride
upperBound
the maximum value of partitionColumn used to decide partition stride
numPartitions
the number of partitions. This, along with lowerBound (inclusive) and upperBound (exclusive), forms the partition strides for the generated WHERE clause expressions used to split the column partitionColumn evenly (see the sketch after this argument list). Defaults to SparkContext.defaultParallelism when unset.
predicates
a list of conditions in the where clause; each one defines one partition
...
additional JDBC database connection named properties.
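For illustration, here is a minimal sketch of how the stride arguments interact. This mirrors the documented behavior rather than Spark's exact internals, and the column name "index" plus the bound values are placeholders. Note that lowerBound and upperBound only decide the stride; they do not filter rows, so the first and last partitions also cover values outside the bounds.

# Hypothetical stride computation for lowerBound = 0, upperBound = 10000,
# numPartitions = 4:
stride <- (10000 - 0) / 4   # 2500
# Spark then generates roughly one WHERE clause per partition:
#   index < 2500                       -- partition 1
#   index >= 2500 AND index < 5000     -- partition 2
#   index >= 5000 AND index < 7500     -- partition 3
#   index >= 7500                      -- partition 4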
Details
Only one of partitionColumn or predicates should be set. Partitions of the table will be retrieved in parallel, based either on numPartitions or on the predicates.
Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
your external database systems.
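As a complement to the calls in the Examples section below, a hypothetical sketch of the predicates alternative (the column name "field" and the connection details are placeholders): each condition in the list becomes exactly one partition, so the conditions should be disjoint and together cover all rows of interest.

# Three conditions produce three partitions, fetched in parallel:
parts <- list("field <= 100", "field > 100 AND field <= 200", "field > 200")
df <- read.jdbc(jdbcUrl, "table", predicates = parts, user = "username")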
Value
SparkDataFrame
Note
read.jdbc since 2.0.0
Examples
## Not run:
sparkR.session()
jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
df <- read.jdbc(jdbcUrl, "table", predicates = list("field<=123"), user = "username")
df2 <- read.jdbc(jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
                 upperBound = 10000, user = "username", password = "password")
## End(Not run)