Split 1 column into 3 columns in Spark Scala

Problem description:

I have a dataframe in Spark using scala that has a column that I need split.

scala> test.show
+-------------+
|columnToSplit|
+-------------+
|        a.b.c|
|        d.e.f|
+-------------+
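
For reference, this test DataFrame can be reproduced with a minimal sketch, assuming `spark` is an already-initialized SparkSession:

// Assumes `spark` is an existing SparkSession (e.g. the spark-shell's).
import spark.implicits._

// Build the example DataFrame with a single string column.
val test = Seq("a.b.c", "d.e.f").toDF("columnToSplit")
test.show()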

I need this column split out to look like this:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   a|   b|   c|
|   d|   e|   f|
+----+----+----+

I'm using Spark 2.0.0

Thanks!

Attempt:

// sparkObject is your already-initialized SparkSession; its implicits
// bring the $"..." column syntax into scope.
import sparkObject.implicits._
import org.apache.spark.sql.functions.split

// split takes a regular expression, so the literal dot must be escaped.
df.withColumn("_tmp", split($"columnToSplit", "\\.")).select(
  $"_tmp".getItem(0).as("col1"),
  $"_tmp".getItem(1).as("col2"),
  $"_tmp".getItem(2).as("col3")
)

The important point to note here is that sparkObject is the SparkSession object you have already initialized. Because of that, the first import statement has to be placed inline in the code, after that SparkSession exists, rather than at the top of the file before the class definition.
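
For completeness, here is a self-contained, runnable variant of the same approach, assuming a local Spark 2.x setup; the SplitColumnExample object name and the programmatic select over indices are illustrative choices, not part of the original answer:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

object SplitColumnExample {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration; in a real job the SparkSession
    // would typically be created by your application or the shell.
    val spark = SparkSession.builder()
      .appName("SplitColumnExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("a.b.c", "d.e.f").toDF("columnToSplit")

    // Split on the literal dot; the escape is needed because split
    // interprets its second argument as a regular expression.
    val tmp = df.withColumn("_tmp", split($"columnToSplit", "\\."))

    // Generate col1..col3 programmatically instead of writing each
    // getItem call by hand; equivalent to the select in the answer.
    val result = tmp.select(
      (0 until 3).map(i => $"_tmp".getItem(i).as(s"col${i + 1}")): _*
    )

    result.show()
    spark.stop()
  }
}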