Is there a way to get the column names from a JSON file in Spark SQL in their original order?
I have a JSON file whose keys become my columns when it is loaded into Spark SQL. When I retrieve the column names, they come back in alphabetical order, but I want them in the order in which they appear in the file.
My input data is:
{"id":1,"name":"Judith","email":"jknight0@google.co.uk","city":"Évry","country":"France","ip":"199.63.123.157"}
Below is how I retrieve the column names and build them into a single string:
val dataframe = sqlContext.read.json("/virtual/home/587635/users.json")
val columns = dataframe.columns
var query = columns.apply(0)+" STRING"
for (a <- 1 to (columns.length - 1)) {
  query = query + "," + columns.apply(a) + " STRING"
}
println(query)
This gives me output like the following:
city STRING,country STRING,email STRING,id STRING,ip STRING,name STRING
But I want my output to be:
id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
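As an aside, the string-building loop above can be collapsed into a single expression with map and mkString. This is a pure-Scala sketch in which the hard-coded array stands in for dataframe.columns, assuming the names are already in the desired order:

```scala
// Sketch: build the "<name> STRING" list in one expression.
// The hard-coded array is a stand-in for dataframe.columns,
// assuming the names are already in the desired order.
val columns = Array("id", "name", "email", "city", "country", "ip")
val query = columns.map(c => s"$c STRING").mkString(",")
println(query)
// prints id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
```

This also avoids the mutable var and handles the empty-columns case without special-casing the first element.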
Add a select with the columns correctly ordered:
val dataframe =
sqlContext
.read
.json("/tmp/test.jsn")
.select("id", "name", "email", "city", "country", "ip")
If you try this in the shell, you will notice the correct order:
dataframe: org.apache.spark.sql.DataFrame = [id: bigint, name: string, email: string, city: string, country: string, ip: string]
Executing the rest of your script then gives the expected output:
id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
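If you would rather not hard-code the column list in the select, one simple (and admittedly fragile) alternative is to recover the key order from the raw first record of the file and pass it to select. The snippet below is a sketch that assumes flat JSON objects with no nesting and no escaped quotes in keys; firstLine is a stand-in for the first line read from users.json:

```scala
// Sketch: extract keys in their original order from a flat JSON record
// with a regex. Fragile by design: nested objects or escaped quotes in
// values would break it. firstLine stands in for the file's first record.
val firstLine = """{"id":1,"name":"Judith","email":"jknight0@google.co.uk","city":"Évry","country":"France","ip":"199.63.123.157"}"""
val keyPattern = "\"([^\"]+)\"\\s*:".r
val orderedCols = keyPattern.findAllMatchIn(firstLine).map(_.group(1)).toList
println(orderedCols.mkString(","))
// prints id,name,email,city,country,ip
```

The recovered list can then be fed to the DataFrame, e.g. dataframe.select(orderedCols.head, orderedCols.tail: _*), so the query string is built in file order without listing the columns by hand.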