Is there a way to get the column names in order from a JSON file in SparkSQL?

Problem description:

I have a JSON file whose keys become my columns when it is loaded into Spark SQL. When I retrieve the column names, they come back in alphabetical order, but I want them in the order in which they appear in the file.

My input data is:

{"id":1,"name":"Judith","email":"jknight0@google.co.uk","city":"Évry","country":"France","ip":"199.63.123.157"}

Below is how I retrieve the column names and build a single string:

val dataframe = sqlContext.read.json("/virtual/home/587635/users.json")
val columns = dataframe.columns
var query = columns(0) + " STRING"
for (a <- 1 to (columns.length - 1)) {
  query = query + "," + columns(a) + " STRING"
}
println(query)

This gives me the following output:

city STRING,country STRING,email STRING,id STRING,ip STRING,name STRING

But I want my output to be:

id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
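As an aside, the string-building loop above can be collapsed into a one-liner with `mkString`. This sketch uses a plain array as a stand-in for `dataframe.columns`, so it runs without Spark:

```scala
// Hypothetical stand-in for dataframe.columns
val columns = Array("id", "name", "email", "city", "country", "ip")

// Append " STRING" to each name and join with commas,
// producing the same string the for-loop builds
val query = columns.map(_ + " STRING").mkString(",")
println(query) // id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
```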

Add a select with the columns correctly ordered:

val dataframe = 
  sqlContext
    .read
    .json("/tmp/test.jsn")
    .select("id", "name", "email", "city", "country", "ip")

If you try this in the shell, you will notice the correct order:

dataframe: org.apache.spark.sql.DataFrame = [id: bigint, name: string, email: string, city: string, country: string, ip: string]

Executing the rest of your script then produces the expected output:

id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
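If you prefer not to hard-code the column list in the `select`, one possible variant (a sketch, not part of the original answer; it assumes each record is a flat, single-line JSON object and that the first line contains the keys in the desired order) derives the order from the raw file itself:

```scala
import org.apache.spark.sql.functions.col

// Read the raw first line of the file and pull the keys out in file order.
// The simple regex assumes flat values with no nested JSON objects.
val firstLine = sqlContext.sparkContext.textFile("/tmp/test.jsn").first()
val orderedCols = """"([^"]+)"\s*:""".r
  .findAllMatchIn(firstLine)
  .map(_.group(1))
  .toSeq

// Select the inferred columns back into their original order
val dataframe = sqlContext.read.json("/tmp/test.jsn")
  .select(orderedCols.map(col): _*)
```

This keeps the rest of the script unchanged while avoiding a maintenance burden when the JSON schema evolves.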