Split 1 column into 3 columns in Spark Scala

Problem description:

I have a dataframe in Spark using Scala that has a column I need to split.

scala> test.show
+-------------+
|columnToSplit|
+-------------+
|        a.b.c|
|        d.e.f|
+-------------+

I need this column split out to look like this:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   a|   b|   c|
|   d|   e|   f|
+----+----+----+

I'm using Spark 2.0.0.

Thanks.

Attempt:

// sparkObject is the SparkSession you have already initialized
import sparkObject.implicits._
import org.apache.spark.sql.functions.split

// split takes a Java regex, so the literal dot must be escaped as "\\."
test.withColumn("_tmp", split($"columnToSplit", "\\."))
  .select(
    $"_tmp".getItem(0).as("col1"),
    $"_tmp".getItem(1).as("col2"),
    $"_tmp".getItem(2).as("col3")
  )
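Appending .show() to that select should print the layout asked for above (a sketch of the expected output, assuming the test DataFrame from the question):

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   a|   b|   c|
|   d|   e|   f|
+----+----+----+

Note that getItem returns null for any row where the split produces fewer than three parts, rather than failing the job.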

The important point to note here is that sparkObject is the SparkSession you have already initialized. The first import statement must therefore be placed inline in the code, after that SparkSession value exists, rather than at the top of the file before the class definition.
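A minimal self-contained sketch of that ordering (the object name, app name, and main-method scaffolding here are illustrative assumptions, not from the original post):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

object SplitColumnExample {
  def main(args: Array[String]): Unit = {
    // Initialize the SparkSession first...
    val sparkObject = SparkSession.builder()
      .appName("split-example")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // ...then import its implicits inline, after the value exists.
    // This is why the import cannot sit above the class/object definition.
    import sparkObject.implicits._

    // Rebuild the test DataFrame from the question
    val test = Seq("a.b.c", "d.e.f").toDF("columnToSplit")

    test.withColumn("_tmp", split($"columnToSplit", "\\."))
      .select(
        $"_tmp".getItem(0).as("col1"),
        $"_tmp".getItem(1).as("col2"),
        $"_tmp".getItem(2).as("col3")
      )
      .show()

    sparkObject.stop()
  }
}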