How to read from os.Stdin repeatedly without sharing a bufio.Scanner

Problem description:

In Go, can a single line of input be read from stdin in a simple way, which also meets the following requirements?

  • can be called by disparate parts of a larger interactive application without having to create coupling between these different parts of the application (e.g. by passing a global bufio.Scanner between them)
  • works whether users are running an interactive terminal or using pre-scripted input

I'd like to modify an existing large Go application which currently creates a bufio.Scanner instance every time it asks users for a line of input. Multiple instances work fine when standard input is from a terminal, but when standard input is piped from another process, calls to Scan only succeed on the first instance of bufio.Scanner. Calls from all other instances fail.

Here's some toy code that demonstrates the problem:

package main
import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    // read with 1st scanner -> works for both piped stdin and terminal
    scanner1 := readStdinLine(1)
    // read with 2nd scanner -> fails for piped stdin, works for terminal
    readStdinLine(2)
    // read with 1st scanner -> prints line 2 for piped stdin, line 3 for terminal
    readLine(scanner1, 3)
}

func readStdinLine(lineNum int64) (scanner *bufio.Scanner) {
    scanner = readLine(bufio.NewScanner(os.Stdin), lineNum)
    return
}

func readLine(scannerIn *bufio.Scanner, lineNum int64) (scanner *bufio.Scanner) {
    scanner = scannerIn
    scanned := scanner.Scan()
    fmt.Printf("%d: ", lineNum)
    if scanned {
        fmt.Printf("Text=%s\n", scanner.Text())
        return
    }
    if scanErr := scanner.Err(); scanErr != nil {
        fmt.Printf("Error=%s\n", scanErr)
        return
    }
    fmt.Println("EOF")
    return
}

I build this as print_stdin and run it interactively from a bash shell:

~$ ./print_stdin
ab
1: Text=ab
cd
2: Text=cd
ef
3: Text=ef

But if I pipe in the text, the second bufio.Scanner fails:

~$ echo "ab
> cd
> ef" | ./print_stdin
1: Text=ab
2: EOF
3: Text=cd

The suggestion in the comment by ThunderCat works.

The alternative to a buffered read is reading one byte at a time: read single bytes until \n or some other terminator is found, and return the data up to that point.

Here's my implementation, heavily inspired by Scanner.Scan:

package lineio
import (
    "errors"
    "io"
)

const startBufSize = 4 * 1024
const maxBufSize = 64 * 1024
const maxConsecutiveEmptyReads = 100

var ErrTooLong = errors.New("lineio: line too long")

func ReadLine(r io.Reader) (string, error) {
    lb := &lineBuf {r:r, buf: make([]byte, startBufSize)}
    for {
        lb.ReadByte()
        if lb.err != nil || lb.TrimCrlf() {
            return lb.GetResult()
        }
    }
}

type lineBuf struct {
    r       io.Reader
    buf     []byte
    end     int
    err     error
}

func (lb *lineBuf) ReadByte() {
    if lb.EnsureBufSpace(); lb.err != nil {
        return
    }
    for empties := 0; ; {
        var n int
        n, lb.err = lb.r.Read(lb.buf[lb.end : lb.end+1])
        if n > 0 {
            // Keep the byte even if Read also returned an error such as io.EOF.
            lb.end++
            return
        }
        if lb.err != nil {
            return
        }
        empties++
        if empties > maxConsecutiveEmptyReads {
            lb.err = io.ErrNoProgress
            return
        }
    }
}

func (lb *lineBuf) TrimCrlf() bool {
    if !lb.EndsLf() {
        return false
    }
    lb.end--
    if lb.end > 0 && lb.buf[lb.end-1] == '\r' {
        lb.end--
    }
    return true
}

func (lb *lineBuf) GetResult() (string, error) {
    if lb.err != nil && lb.err != io.EOF {
        return "", lb.err
    }
    return string(lb.buf[0:lb.end]), nil
}

func (lb *lineBuf) EndsLf() bool {
    return lb.err == nil && lb.end > 0 && lb.buf[lb.end-1] == '\n'
}

func (lb *lineBuf) EnsureBufSpace() {
    if lb.end < len(lb.buf) {
        return
    }
    newSize := len(lb.buf) * 2
    if newSize > maxBufSize {
        lb.err = ErrTooLong
        return
    }
    newBuf := make([]byte, newSize)
    copy(newBuf, lb.buf[0:lb.end])
    lb.buf = newBuf
    return
}

TESTING

I compiled lineio with go install, and main (see below) with go build -o read_each_byte.

Tested scripted input:

$ seq 12 22 78 | ./read_each_byte
1: Text: "12"
2: Text: "34"
3: Text: "56"

Tested input from an interactive terminal:

$ ./read_each_byte
abc
1: Text: "abc"
123
2: Text: "123"
x\y"z
3: Text: "x\\y\"z"

Here's main:

package main
import (
    "fmt"
    "lineio"
    "os"
)

func main() {
    for i := 1; i <= 3; i++ {
        text, _ := lineio.ReadLine(os.Stdin)
        fmt.Printf("%d: Text: %q\n", i, text)
    }
}

Your sequence is:

  1. create a scanner
  2. wait for a line read from the terminal
  3. print the result
  4. repeat steps 1 to 3 (creating a new scanner on stdin)
  5. repeat steps 2 and 3 (with the first scanner)
  6. exit the program

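When you run echo in a pipeline, there is only one stdin stream to read from, but you are reading it with two scanners. The first bufio.Scanner reads a whole block (up to its buffer size) in a single call, so it pulls all of echo's output into its own buffer, and the second scanner finds nothing left but EOF.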

UPDATE: The flow of execution for echo is:

  1. read the arguments
  2. process the arguments
  3. write the arguments to stdout
  4. the terminal (or the next program in the pipeline) reads stdout and prints it

Note that this happens when you press ENTER: the whole argument is passed to echo at once, not line by line.

The echo utility writes its arguments to standard output, followed by a <newline>. If there are no arguments, only the <newline> is written.

More here: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html.

Here is how echo's source code writes its arguments:

while (argc > 0) 
{
  fputs (argv[0], stdout);//<-- send args to the same stdout
  argc--;
  argv++;
  if (argc > 0)
    putchar (' ');
}

So your code works fine when each line arrives separately, for example:

$ (n=1; while sleep 1; do echo a$n; n=$((n+1)); done) | ./print_stdin
1: Text=a1
2: Text=a2
3: Text=a3
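A deterministic sketch of the same situation inside Go, using io.Pipe in place of the shell pipe (an assumption made for illustration; describe and demo are hypothetical helpers). io.Pipe has no internal buffering, so each Read receives the data of a single Write, much as the sleep keeps each echo's line separate. With a real OS pipe, a fast writer can still let several lines pile up in the kernel buffer and arrive in one Read, so the sleep makes this likely rather than guaranteed.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// describe performs one Scan and formats the result like the question's readLine.
func describe(s *bufio.Scanner, lineNum int) string {
	if s.Scan() {
		return fmt.Sprintf("%d: Text=%s", lineNum, s.Text())
	}
	if err := s.Err(); err != nil {
		return fmt.Sprintf("%d: Error=%s", lineNum, err)
	}
	return fmt.Sprintf("%d: EOF", lineNum)
}

// demo sends each string with its own Write, then reads with scanner 1,
// a fresh scanner 2, and scanner 1 again.
func demo(lines []string) []string {
	pr, pw := io.Pipe()
	defer pr.Close() // unblocks the writer if any data is left unread
	go func() {
		defer pw.Close()
		for _, l := range lines {
			// Write blocks until Reads have consumed this chunk.
			if _, err := pw.Write([]byte(l)); err != nil {
				return
			}
		}
	}()
	s1 := bufio.NewScanner(pr)
	return []string{
		describe(s1, 1),
		describe(bufio.NewScanner(pr), 2),
		describe(s1, 3),
	}
}

func main() {
	for _, out := range demo([]string{"a1\n", "a2\n", "a3\n"}) {
		fmt.Println(out)
	}
}
```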

If you need to repeat an argument on stdout, use the yes program or an alternative: yes writes its arguments to stdout over and over. More in: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/yes.c

Example:

$ yes a | ./print_stdin
1: Text=a
2: Text=a
3: Text=a