Reading lines from a file with variable line endings in Go
How can I read lines from a file where the line endings are carriage return (CR), newline (NL), or both?
The PDF specification allows lines to end with CR, LF, or CRLF.
- bufio.Reader.ReadString() and bufio.Reader.ReadBytes() allow a single delimiter byte.
- bufio.Scanner.Scan() handles \r (optional) followed by \n, but not a lone \r. Its documentation says: "The end-of-line marker is one optional carriage return followed by one mandatory newline."
Do I need to write my own function that uses bufio.Reader.ReadByte()?
You can write a custom bufio.SplitFunc for bufio.Scanner. For example:
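// Mostly bufio.ScanLines code, extended to accept a lone CR as a line ending.
// Requires the "bytes" package.
func ScanPDFLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			// We have a line terminated by a single newline.
			return i + 1, data[0:i], nil
		}
		// We have a CR. If it is the last byte read so far, the LF of a
		// CRLF pair may still be in the next chunk, so request more data.
		if i+1 == len(data) && !atEOF {
			return 0, nil, nil
		}
		advance = i + 1
		if len(data) > i+1 && data[i+1] == '\n' {
			// CRLF: consume both bytes.
			advance++
		}
		return advance, data[0:i], nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}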
Then use it like this, where r is any io.Reader:
scan := bufio.NewScanner(r)
scan.Split(ScanPDFLines)
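To try this end to end, here is a minimal sketch of a complete program. It repeats the split function so that it compiles on its own, including a guard for a CR that falls at the very end of a buffered chunk. The helper name splitLines and the sample input are only illustrative assumptions, not part of any library.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// ScanPDFLines is a bufio.SplitFunc that treats CR, LF, and CRLF as line endings.
func ScanPDFLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			// Line terminated by a single LF.
			return i + 1, data[:i], nil
		}
		// CR at the end of the chunk: wait for more data so that a CRLF
		// pair split across two reads is not reported as two line endings.
		if i+1 == len(data) && !atEOF {
			return 0, nil, nil
		}
		advance = i + 1
		if i+1 < len(data) && data[i+1] == '\n' {
			advance++ // CRLF: consume both bytes.
		}
		return advance, data[:i], nil
	}
	if atEOF {
		// Final line without a terminator.
		return len(data), data, nil
	}
	// No line ending seen yet: request more data.
	return 0, nil, nil
}

// splitLines (hypothetical helper) collects every line the scanner yields.
func splitLines(input string) ([]string, error) {
	scan := bufio.NewScanner(strings.NewReader(input))
	scan.Split(ScanPDFLines)
	var lines []string
	for scan.Scan() {
		lines = append(lines, scan.Text())
	}
	return lines, scan.Err()
}

func main() {
	// Sample input mixing a lone CR, a lone LF, and a CRLF.
	lines, err := splitLines("one\rtwo\nthree\r\nfour")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for i, line := range lines {
		fmt.Printf("%d: %q\n", i, line)
	}
}
```

With the sample input, the scanner yields four lines: one, two, three, and four. Keep in mind that bufio.Scanner has a default maximum token size (bufio.MaxScanTokenSize), so for very long lines you may need to call scan.Buffer with a larger limit.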