How can I easily get a substring in Go while guarding against "slice bounds out of range" errors?

Problem description:

Using Go, I want to truncate long strings to an arbitrary length (e.g. for logging).

const maxLen = 100

func main() {
    myString := "This string might be longer, so we'll keep all except the first 100 bytes."

    fmt.Println(myString[:10])      // Prints the first 10 bytes
    fmt.Println(myString[:maxLen])  // panic: runtime error: slice bounds out of range
}

For now, I can solve it with an extra variable and if statement, but that seems very long-winded:

const maxLen = 100

func main() {
    myString := "This string might be longer, so we'll keep all except the first 100 bytes."

    limit := len(myString)
    if limit > maxLen {
        limit = maxLen
    }

    fmt.Println(myString[:limit]) // Prints the first 100 bytes, or the whole string if shorter
}

Is there a shorter/cleaner way?

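Use a simple function to hide the implementation details. For example, this one keeps at most max characters (runes) rather than bytes, so it never splits a multi-byte character: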

package main

import "fmt"

func maxString(s string, max int) string {
    if len(s) > max {
        r := 0
        for i := range s {
            r++
            if r > max {
                return s[:i]
            }
        }
    }
    return s
}

func main() {
    s := "日本語"
    fmt.Println(s)
    fmt.Println(maxString(s, 2))
}

Output:

日本語
日本

I'm assuming you want to keep at most the first maxLen bytes, i.e. what your code does, rather than what the text of your string says.

If you don't need the original myString, you can overwrite it like this:

package main

import "fmt"

const maxLen = 100

func main() {
    myString := "This string might be longer, so we'll keep the first 100 bytes."

    if len(myString) > maxLen {
        myString = myString[:maxLen] // slicing is a constant-time operation in Go
    }

    fmt.Println(myString) // Prints the first 100 bytes, or the whole string if shorter
}

This might cut Unicode characters in half, leaving some garbage at the end. If you need to handle multi-byte Unicode, which you probably do, try this version, which extends the cut forward to the end of the character that straddles the limit:

package main

import (
    "fmt"
    "unicode/utf8"
)

const maxLen = 100

func main() {
    myString := "日本語"

    mid := maxLen
    for len(myString) >= mid && !utf8.ValidString(myString[:mid]) {
        mid++ // add another byte from myString until we have a whole multi-byte character
    }
    if len(myString) > mid {
        myString = myString[:mid]
    }

    // Prints the first maxLen bytes (plus up to 3 more if a character
    // straddles the limit), or the whole string if shorter
    fmt.Println(myString)
}

Or, if you can accept removing up to one character from the output, this version is a bit cleaner:

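package main

import (
    "fmt"
    "unicode/utf8"
)

const maxLen = 100

func main() {
    myString := "日本語"

    // Drop trailing bytes until the string fits and ends on a whole character.
    for len(myString) > maxLen || !utf8.ValidString(myString) {
        myString = myString[:len(myString)-1] // remove a byte
    }

    // Prints at most maxLen bytes, never ending mid-character,
    // or the whole string if shorter
    fmt.Println(myString)
}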