Adding a new column to a DataFrame, where the new column is a UUID generator
Question:
I want to add a new column to a DataFrame that holds a generated UUID.
A UUID value looks like 21534cf7-cff9-482a-a3a8-9e7244240da7
My research:
I've tried the withColumn method in Spark.
val DF2 = DF1.withColumn("newcolname", DF1("existingcolname") + 1)
So DF2 will have an additional column named newcolname, holding the value of existingcolname plus 1 in every row.
But my requirement is a new column that generates a UUID for each row.
Answer
You should try something like this:
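import java.util.UUID
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

val sc: SparkContext = ...
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Zero-argument UDF that returns a fresh random UUID string on each call
val generateUUID = udf(() => UUID.randomUUID().toString)
val df1 = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
// Append the generated value as a new column named "UUID"
val df2 = df1.withColumn("UUID", generateUUID())
df1.show()
df2.show()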
The output will be:
+---+-----+
| id|value|
+---+-----+
|id1|    1|
|id2|    4|
|id3|    5|
+---+-----+

+---+-----+--------------------+
| id|value|                UUID|
+---+-----+--------------------+
|id1|    1|f0cfd0e2-fbbe-40f...|
|id2|    4|ec8db8b9-70db-46f...|
|id3|    5|e0e91292-1d90-45a...|
+---+-----+--------------------+
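
On newer Spark versions the same idea can be written a little differently. The following is only a sketch, assuming Spark 2.3 or later and a local SparkSession (the app name and the variable names withUdf and withBuiltin are made up for illustration). It marks the UDF as nondeterministic so the optimizer does not assume repeated calls return the same value, and it also shows the built-in SQL function uuid() as an alternative to a hand-written UDF.

```scala
import java.util.UUID
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, udf}

// Assumption: Spark 2.3+ running in local mode, for illustration only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("uuid-column-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Same UDF as in the answer, flagged as nondeterministic for the optimizer
val generateUUID = udf(() => UUID.randomUUID().toString).asNondeterministic()

val df1 = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")

// Option 1: the UDF from the answer
val withUdf = df1.withColumn("UUID", generateUUID())

// Option 2: the built-in SQL function uuid(), no UDF needed
val withBuiltin = df1.withColumn("UUID", expr("uuid()"))

// truncate = false prints the full 36-character UUID instead of "f0cfd0e2-fbbe-40f..."
withUdf.show(truncate = false)
withBuiltin.show(truncate = false)
```

Because DataFrames are evaluated lazily, the UUID column is likely to be regenerated every time the DataFrame is recomputed, so two calls to show() may print different values. If the IDs need to stay stable, it is probably safest to write the result out (or otherwise materialize it) once and read it back.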