Is there a way to get the column names in order from a JSON file in SparkSQL?

Problem description:

I have a JSON file whose keys become my columns when it is loaded into Spark SQL. When I retrieve the column names, they come back in alphabetical order, but I want them in the order in which they appear in the file.

My input data is:

{"id":1,"name":"Judith","email":"jknight0@google.co.uk","city":"Évry","country":"France","ip":"199.63.123.157"}

Below is how I retrieve the column names and build a single string:

val dataframe = sqlContext.read.json("/virtual/home/587635/users.json")
val columns = dataframe.columns
var query = columns(0) + " STRING"
for (a <- 1 to (columns.length - 1)) {
  query = query + "," + columns(a) + " STRING"
}
println(query)

This gives me the following output:

city STRING,country STRING,email STRING,id STRING,ip STRING,name STRING

But I want my output to be:

id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
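As an aside, the string-building loop above can be collapsed into a one-liner with `mkString`. This sketch uses a plain array as a stand-in for `dataframe.columns`, so it runs without Spark:

```scala
// Hypothetical stand-in for dataframe.columns
val columns = Array("id", "name", "email", "city", "country", "ip")

// Append " STRING" to each name and join with commas,
// producing the same string the for-loop builds
val query = columns.map(_ + " STRING").mkString(",")
println(query) // id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
```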

Add a select with the columns correctly ordered:

val dataframe = 
  sqlContext
    .read
    .json("/tmp/test.jsn")
    .select("id", "name", "email", "city", "country", "ip")

If you try this in the shell, you will notice the correct order:

dataframe: org.apache.spark.sql.DataFrame = [id: bigint, name: string, email: string, city: string, country: string, ip: string]

Executing the rest of your script then produces the expected output:

id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
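If you prefer not to hard-code the column list in the `select`, one possible variant (a sketch, not part of the original answer; it assumes each record is a flat, single-line JSON object and that the first line contains the keys in the desired order) derives the order from the raw file itself:

```scala
import org.apache.spark.sql.functions.col

// Read the raw first line of the file and pull the keys out in file order.
// The simple regex assumes flat values with no nested JSON objects.
val firstLine = sqlContext.sparkContext.textFile("/tmp/test.jsn").first()
val orderedCols = """"([^"]+)"\s*:""".r
  .findAllMatchIn(firstLine)
  .map(_.group(1))
  .toSeq

// Select the inferred columns back into their original order
val dataframe = sqlContext.read.json("/tmp/test.jsn")
  .select(orderedCols.map(col): _*)
```

This keeps the rest of the script unchanged while avoiding a maintenance burden when the JSON schema evolves.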