Split JSON values from a CSV file and create new columns from the JSON keys in Spark/Scala
Question:
I have data in a CSV file in the format below. I want to split the JSON from the Desc column and create a new column for each key, using Spark 2 with Scala.
+------+------------+----------------------------------+
| id   | Category   | Desc                             |
+------+------------+----------------------------------+
| 201  | MIS20      | { "Total": 200,"Defective": 21 } |
+------+------------+----------------------------------+
| 202  | MIS30      | { "Total": 740,"Defective": 58 } |
+------+------------+----------------------------------+
So the desired output would be:
+------+------------+---------+-------------+
| id   | Category   | Total   | Defective   |
+------+------------+---------+-------------+
| 201  | MIS20      | 200     | 21          |
+------+------------+---------+-------------+
| 202  | MIS30      | 740     | 58          |
+------+------------+---------+-------------+
Any help is highly appreciated.
Answer:
Create a schema for your inner JSON and apply it with the from_json function as below:
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import spark.implicits._ // for the $"col" column syntax

// Schema of the JSON stored in the Desc column
val schema = new StructType()
  .add(StructField("Total", LongType, false))
  .add("Defective", LongType, false)
// Parse Desc into a struct, then flatten it into top-level columns
d.select($"id", $"Category", from_json($"Desc", schema).as("desc"))
  .select($"id", $"Category", $"desc.*")
  .show(false)
Output:
+---+--------+-----+---------+
|id |Category|Total|Defective|
+---+--------+-----+---------+
|201|MIS20 |200 |21 |
|202|MIS30 |740 |58 |
+---+--------+-----+---------+
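For completeness, here is a minimal sketch of how the DataFrame d used above might be built. The in-memory sample rows and column names are assumptions taken from the question, not part of the original answer; when reading the real CSV, the Desc column would need to be quoted (or a non-comma delimiter used), since the JSON itself contains commas.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("split-json-desc") // hypothetical app name, for illustration only
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample rows copied from the question; in practice d would come from
// spark.read.option("header", "true").csv(...) with the Desc column quoted.
val d = Seq(
  (201, "MIS20", """{ "Total": 200,"Defective": 21 }"""),
  (202, "MIS30", """{ "Total": 740,"Defective": 58 }""")
).toDF("id", "Category", "Desc")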
Hope this helps!