Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
val results = fruits.
map(fruit => Seq(("aaa", "bbb", fruit)).toDF("aCol","bCol","name")).
reduce(_.union(_))
results.show()
Steffen Schmitz's answer is the most concise one I believe.
Below is a more detailed answer if you are looking for more customization (of field types, etc):
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.Row
//initialize DF
val schema = StructType(
StructField("aCol", StringType, true) ::
StructField("bCol", StringType, true) ::
StructField("name", StringType, true) :: Nil)
var initialDF = spark.createDataFrame(sc.emptyRDD[Row], schema)
//list to iterate through
var fruits = List(
"apple"
,"orange"
,"melon"
for (x <- fruits) {
//union returns a new dataset
initialDF = initialDF.union(Seq(("aaa", "bbb", x)).toDF)
//initialDF.show()
references:
How to create an empty DataFrame with a specified schema?
https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/Dataset.html
https://docs.databricks.com/spark/latest/faq/append-a-row-to-rdd-or-dataframe.html
If you have different/multiple dataframes you can use below code, which is efficient.
val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)
–
–
you can first create a sequence and then use toDF
to create Dataframe
.
scala> var dseq : Seq[(String,String,String)] = Seq[(String,String,String)]()
dseq: Seq[(String, String, String)] = List()
scala> for ( x <- fruits){
| dseq = dseq :+ ("aaa","bbb",x)
scala> dseq
res2: Seq[(String, String, String)] = List((aaa,bbb,apple), (aaa,bbb,orange), (aaa,bbb,melon))
scala> val df = dseq.toDF("aCol","bCol","name")
df: org.apache.spark.sql.DataFrame = [aCol: string, bCol: string, name: string]
scala> df.show
+----+----+------+
|aCol|bCol| name|
+----+----+------+
| aaa| bbb| apple|
| aaa| bbb|orange|
| aaa| bbb| melon|
+----+----+------+
–
–
Well... I think your question is a bit mis-guided.
As per my limited understanding of whatever you are trying to do, you should be doing following,
val fruits = List(
"apple",
"orange",
"melon"
val df = fruits
.map(x => ("aaa", "bbb", x))
.toDF("aCol", "bCol", "name")
And this should be sufficient.
–
–
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.