Parsing nested JSON objects in a CSV file with Go

Problem description:

I'm trying to parse a CSV file which contains a JSON object in the last column.
Here is an example with two rows from the input CSV file:

'id','value','createddate','attributes'
524256,CAFE,2018-04-06 16:41:01,{"Att1Numeric": 6, "Att2String": "abc"}
524257,BEBE,2018-04-06 17:00:00,{}

I tried using the parser from the encoding/csv package:

func processFileAsCSV(f *multipart.Part) (int, error) {
  reader := csv.NewReader(f)
  reader.LazyQuotes = true
  reader.Comma = ','
  lineCount := 0
  for {
    line, err := reader.Read()
    if err == io.EOF {
        break
    } else if err != nil {
        fmt.Println("Error:", err)
        return 0, err
    }

    if lineCount%100000 == 0 {
        fmt.Println(lineCount)
    }
    lineCount++
    fmt.Println(lineCount, line)
    processLine(line) // do something with the line
  }

  fmt.Println("done!", lineCount)
  return lineCount, nil
}

But I got an error:

Error: line 2, column 0: wrong number of fields in line,

probably because the parser ignores the JSON scope which starts with {.

Should I be writing my own CSV parser, or is there a library that can handle this?

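Your input doesn't follow normal CSV conventions: the JSON in the last column is unquoted, so the comma inside it is read as a field separator. The first data row therefore splits into five fields instead of the four found in the header, which is what the "wrong number of fields" error reports.

I think the best approach would be to pre-process your input, either in your Go program or in an external script.

If your CSV input is predictable (as indicated in your question, three plain fields followed by the JSON), it should be easy to isolate and properly quote the last element before passing it to the CSV parser, for instance with a strings.SplitN call limited to four parts. A plain strings.Split would also cut the JSON at its inner commas.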