在spark数据帧中减去两列为空的列
问题描述:
我是火花新手,我拥有数据框df:
I new to spark, I have dataframe df:
+----------+------------+-----------+
| Column1 | Column2 | Sub |
+----------+------------+-----------+
| 1 | 2 | 1 |
+----------+------------+-----------+
| 4 | null | null |
+----------+------------+-----------+
| 5 | null | null |
+----------+------------+-----------+
| 6 | 8 | 2 |
+----------+------------+-----------+
当减去两列时,一列为空,因此结果列也为空.
when subtracting two columns, one column has null so resulting column also resulting as null.
df.withColumn("Sub", col(A)-col(B))
预期输出应为:
+----------+------------+-----------+
| Column1 | Column2 | Sub |
+----------+------------+-----------+
| 1 | 2 | 1 |
+----------+------------+-----------+
| 4 | null | 4 |
+----------+------------+-----------+
| 5 | null | 5 |
+----------+------------+-----------+
| 6 | 8 | 2 |
+----------+------------+-----------+
我不想将column2替换为0,它只能为null. 有人可以帮我吗?
I don't want to replace the column2 to replace with 0, it should be null only. Can someone help me on this?
答
您可以将when
函数用作
import org.apache.spark.sql.functions._
df.withColumn("Sub", when(col("Column1").isNull, lit(0)).otherwise(col("Column1")) - when(col("Column2").isNull, lit(0)).otherwise(col("Column2")))
您的最终结果应为
+-------+-------+----+
|Column1|Column2| Sub|
+-------+-------+----+
| 1| 2|-1.0|
| 4| null| 4.0|
| 5| null| 5.0|
| 6| 8|-2.0|
+-------+-------+----+