Is it possible to zero a Golang string's memory "safely"?

Question:

Recently I've been setting up libsodium in one of my projects by using cgo, in order to use the crypto_pwhash_str and crypto_pwhash_str_verify functions.

This has all gone very smoothly and I now have a small collection of functions that receive a []byte in the form of a plain-text password and either hash it, or compare it against another []byte to verify it.

My reason for using a []byte instead of a string is because, from what I've learnt so far about Go, I can at least loop over the plain-text password and zero all of the bytes, or even pass a pointer into libsodium's sodium_memzero function, in order to not leave it hanging around in memory longer than it needs to.
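As a minimal sketch of the "loop over the bytes" approach mentioned above (the helper name wipe is made up for illustration and is not part of any library):

```go
package main

import "fmt"

// wipe overwrites every byte of b with zero. It only clears this one
// backing array; copies made elsewhere in memory are not affected.
func wipe(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

func main() {
	password := []byte("Correct Horse Battery Staple")
	// ... hash or verify the password here ...
	wipe(password)
	fmt.Println(password) // every element is now 0
}
```

A compiler is in principle free to drop stores it can prove are never read again, which is one reason a dedicated routine such as sodium_memzero can be preferable to a hand-written loop.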

This is fine for applications where I have the ability to read input directly as bytes, but I'm now trying to use it in a small web application where I need to read passwords from a form using the POST method.

From what I can see in the Go source code and documentation, using r.ParseForm in a request handler will parse all of the form values into a map of strings.

The problem is that because strings in Go are immutable I don't think I can do anything about zeroing the memory of a password that was POSTed in the form; at least, using only Go.

So it seems like my only (easy) option would be to pass an unsafe.Pointer into a function in C along with the number of bytes and let C zero the memory for me instead (for example, passing it to the aforementioned sodium_memzero function).

I have tried this, and unsurprisingly it does of course work, but then I'm left with an unsafe string in Go, which, if used in a function like fmt.Println will crash the program.

My questions are as follows:

  • Should I just accept that passwords will be POSTed and parsed as strings and that I shouldn't mess with it and just wait for the GC to kick in? (not ideal)
  • Is zeroing the memory of a string using cgo ok, provided it's obviously documented in the code that the string variable should not be used again?
  • Will zeroing the memory of a string using cgo ever do something like crashing the GC?
  • Is it worth writing a sort of decorator for http.Request that adds a function to parse form values directly as []byte so I have complete control over the values when they arrive?
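To make the last option more concrete, here is a rough sketch of pulling one value out of a urlencoded request body without ever converting it to a string. The helpers formValue, unescapeInPlace and unhex are hypothetical names invented for this illustration, not part of net/http, and the sketch assumes the key itself contains no escapes:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// unhex converts one hexadecimal digit to its numeric value.
func unhex(c byte) (byte, bool) {
	switch {
	case '0' <= c && c <= '9':
		return c - '0', true
	case 'a' <= c && c <= 'f':
		return c - 'a' + 10, true
	case 'A' <= c && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}

// unescapeInPlace decodes urlencoded data inside b itself and returns the
// shortened slice, so no second copy of the value is allocated. Stale
// bytes remain after the returned slice, so wipe the whole body later.
func unescapeInPlace(b []byte) ([]byte, error) {
	w := 0
	for r := 0; r < len(b); r++ {
		switch b[r] {
		case '+':
			b[w] = ' '
		case '%':
			if r+2 >= len(b) {
				return nil, errors.New("truncated escape")
			}
			hi, ok1 := unhex(b[r+1])
			lo, ok2 := unhex(b[r+2])
			if !ok1 || !ok2 {
				return nil, errors.New("invalid escape")
			}
			b[w] = hi<<4 | lo
			r += 2
		default:
			b[w] = b[r]
		}
		w++
	}
	return b[:w], nil
}

// formValue returns the decoded value for key. The result aliases body,
// so zeroing body also zeroes the returned value.
func formValue(body []byte, key string) ([]byte, error) {
	for len(body) > 0 {
		pair := body
		if i := bytes.IndexByte(body, '&'); i >= 0 {
			pair, body = body[:i], body[i+1:]
		} else {
			body = nil
		}
		eq := bytes.IndexByte(pair, '=')
		if eq < 0 || string(pair[:eq]) != key {
			continue
		}
		return unescapeInPlace(pair[eq+1:])
	}
	return nil, errors.New("key not found")
}

func main() {
	body := []byte("user=alice&password=Correct+Horse%21")
	pwd, err := formValue(body, "password")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s\n", pwd)
	for i := range body { // wipe the whole buffer, including stale bytes
		body[i] = 0
	}
	fmt.Println(pwd)
}
```

In a real handler the body would have to be read from r.Body into a single preallocated buffer; helpers that grow their buffer while reading can leave partial copies behind, and the HTTP server's own read buffers are out of the handler's control in any case.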

Edit: To clarify, the web app and form POST is just a convenient example of a case where I might be handed sensitive data in the form of a string just from using Go's standard library. I'm mostly interested in whether the things I ask about are possible/worthwhile in a case where cleaning up data in memory as quickly as possible is more of a security concern.

Given that there doesn't seem to be much activity on this question, I'm going to just assume that most people haven't needed/wanted to look into this before, or haven't thought it was worth the time. As such I will just post my own findings as an answer despite my ignorance regarding the inner-workings of Go.

I should preface this answer with a disclaimer that since Go is a Garbage Collected language and I do not know how it works internally the following information may not actually guarantee any memory to actually be cleared to zero at all, but that won't stop me from trying; after all, the fewer plain-text passwords in memory the better, in my opinion.

With that in mind this is everything I have found to work (as far as I can tell) in conjunction with libsodium; so far none of it has crashed any of my programs at least.

First of all, as you probably already know strings in Go are immutable, so technically their value shouldn't be changed, but if we use an unsafe.Pointer to the string in Go or in C via Cgo, we can actually overwrite the data stored in the string value; we just can't guarantee there aren't any other copies of the data anywhere else in memory.
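For comparison, here is a pure-Go sketch of overwriting the bytes a string points at. This is not the code from this answer; zeroStringData is a made-up helper, and it assumes Go 1.20 or later for unsafe.StringData. It deliberately breaks the language's immutability guarantee, and it would fault on strings whose bytes live in read-only memory (string literals, for example), so the string is built at run time here:

```go
package main

import (
	"fmt"
	"unsafe"
)

// zeroStringData overwrites the bytes that s points at. The string header
// (data pointer and length) is left untouched. This is outside what the
// unsafe package documents as allowed, so treat it as an experiment.
func zeroStringData(s string) {
	if len(s) == 0 {
		return
	}
	b := unsafe.Slice(unsafe.StringData(s), len(s))
	for i := range b {
		b[i] = 0
	}
}

func main() {
	// Build the string at run time so its bytes sit in writable memory.
	raw := []byte("Correct Horse Battery Staple")
	pwd := string(raw) // the conversion copies raw into new memory

	zeroStringData(pwd)
	fmt.Printf("%q\n", pwd) // same length, contents overwritten
	fmt.Printf("%s\n", raw) // the other copy is untouched
}
```

The second print is the point of the exercise: wiping one copy says nothing about the others.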

For this reason I made my password related functions deal with []byte variables exclusively to cut down on the number of possible plain-text passwords being copied around memory.

I also return the []byte reference for the plain text password that gets passed into all password functions, since converting a string into a []byte will allocate new memory and copy the contents over. This way, at least if you convert your string to a []byte in-place without assigning it to a variable first you can still get access to the new []byte after the function call has finished and zero that memory as well.

Below is the gist of what I came up with. You can fill in the blanks, include the libsodium C library and compile it to see the results for yourself.

For me it output this before the MemZero* functions were called:

pwd     : Correct Horse Battery Staple
pwdBytes: [67 111 114 114 101 99 116 32 72 111 114 115 101 32 66 97 116 116 101 114 121 32 83 116 97 112 108 101]

Then this after the MemZero* functions were called:

pwd     :
pwdBytes: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Hash: $argon2i$v=19$m=131072,t=6,p=1$N05osI8nuTjftzfAYBIcbA$3yb92yt9S9dRmPtlSV/J8jY4DG3reqm+2eV+fi54Its

So it looks like a success, but since we can't guarantee there are no copies of the plain-text password elsewhere in memory I think that is as far as we can go with it.

The code below simply passes an unsafe.Pointer with the number of bytes to the sodium_memzero function in C to achieve this. So the actual zeroing of memory is left up to libsodium.

I apologise if I left any typos or anything in the code that doesn't work, but I didn't want to paste in too much, only the relevant parts.

For example, you could also employ the use of functions like mlock if you really needed to, but since this question was focused on zeroing a string I will just show that here.

package sodium

// Various imports, other functions and <sodium.h> here...

func init() {
    if err := sodium.Init(); err != nil {
        log.Fatalf("sodium: %s", err)
    }
}

func PasswordHash(pwd []byte, opslimit, memlimit int) ([]byte, []byte, error) {
    pwdPtr := unsafe.Pointer(&pwd[0])
    hashPtr := unsafe.Pointer(&make([]byte, C.crypto_pwhash_STRBYTES)[0])

    res := C.crypto_pwhash_str(
        (*C.char)(hashPtr),
        (*C.char)(pwdPtr),
        C.ulonglong(len(pwd)),
        C.ulonglong(opslimit),
        C.size_t(memlimit),
    )
    if res != 0 {
        return nil, pwd, fmt.Errorf("sodium: passwordhash: out of memory")
    }
    return C.GoBytes(hashPtr, C.crypto_pwhash_STRBYTES), pwd, nil
}

func MemZero(p unsafe.Pointer, size int) {
    if p != nil && size > 0 {
        C.sodium_memzero(p, C.size_t(size))
    }
}

func MemZeroBytes(bytes []byte) {
    if size := len(bytes); size > 0 {
        MemZero(unsafe.Pointer(&bytes[0]), size)
    }
}

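// NOTE: unsafe.Pointer(str) is the address of the string header (data
// pointer + length), not of the string's bytes. This call therefore zeroes
// the header (and, for longer strings, memory next to it), which makes the
// string print as empty; the original bytes may still exist elsewhere.
func MemZeroStr(str *string) {
    if size := len(*str); size > 0 {
        MemZero(unsafe.Pointer(str), size)
    }
}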

And then to use it all:

package main

// Imports etc here...

func main() {
    // Unfortunately there is no guarantee that this won't be
    // stored elsewhere in memory, but we will try to remove it anyway
    pwd := "Correct Horse Battery Staple"

    // I convert the pwd string to a []byte in place here
    // Because of this I have no reference to the new memory, with yet
    // another copy of the plain password hanging around
    // The function always returns the new []byte as the second value
    // though, so we can still zero it anyway
    hash, pwdBytes, err := sodium.PasswordHash([]byte(pwd), 6, 134217728)

    // Byte slice and string before MemZero* functions
    fmt.Println("pwd     :", pwd)
    fmt.Println("pwdBytes:", pwdBytes)

    // No need to keep a plain-text password in memory any longer than required
    sodium.MemZeroStr(&pwd)
    sodium.MemZeroBytes(pwdBytes)
    if err != nil {
        log.Fatal(err)
    }

    // Byte slice and string after MemZero* functions
    fmt.Println("pwd     :", pwd)
    fmt.Println("pwdBytes:", pwdBytes)

    // We've done our best to make sure we only have the hash in memory now
    fmt.Println("Hash:", string(hash))
}

Handling secure values in memory is harder in Go than it would be in something like C or C++. That's because of the GC, which goes around copying and messing with whatever memory it feels like.

So, the first step is to get some memory that the GC cannot mess with. For this, we'd either spin up cgo and malloc whatever we want, or use system calls like mmap and VirtualAlloc, then pass around the resulting slice as normal.

The next step is to tell the OS that you don't want this memory being swapped out to disk, so you mlock or VirtualLock it.

Before exiting, zero out the slice with either libsodium or by simply iterating over it, setting each element to zero. This wouldn't be possible with a string, and I'm not sure that I would recommend manually wiping the string's memory. I mean, I can't immediately spot anything wrong with it but... It just doesn't feel right. No one uses strings for secure values anyway.
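The three steps above can be sketched roughly as follows for Unix-like systems, using only the standard syscall package. The function names allocLocked, wipe and release are invented for this sketch; this is not how memguard is implemented, and a Windows version would need VirtualAlloc and VirtualLock instead:

```go
package main

import (
	"fmt"
	"syscall"
)

// allocLocked maps n bytes of anonymous memory outside the Go heap and
// asks the kernel to keep those pages out of swap.
func allocLocked(n int) ([]byte, error) {
	buf, err := syscall.Mmap(-1, 0, n,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, fmt.Errorf("mmap: %w", err)
	}
	if err := syscall.Mlock(buf); err != nil {
		syscall.Munmap(buf)
		return nil, fmt.Errorf("mlock: %w", err)
	}
	return buf, nil
}

// wipe sets every byte of buf to zero.
func wipe(buf []byte) {
	for i := range buf {
		buf[i] = 0
	}
}

// release wipes the buffer, unlocks it and hands it back to the OS.
// The slice must not be used after this call.
func release(buf []byte) error {
	wipe(buf)
	if err := syscall.Munlock(buf); err != nil {
		return err
	}
	return syscall.Munmap(buf)
}

func main() {
	buf, err := allocLocked(4096)
	if err != nil {
		fmt.Println("could not allocate locked memory:", err)
		return
	}
	n := copy(buf, "Correct Horse Battery Staple")
	fmt.Printf("holding %d secret bytes outside the Go heap\n", n)
	if err := release(buf); err != nil {
		fmt.Println("release:", err)
	}
}
```

mlock can fail when the process is over its locked-memory limit, so the error path matters in practice.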

There's a library (mine) that is designed specifically for storing secure values, and it does what I've described above alongside a few other things. You might find it useful: https://github.com/awnumar/memguard

"No one uses strings for secure values anyway."

Except for the passwords used in a KDF to unlock a ciphertext or decrypt directly.

When a string's bytes live in read-only memory (as they do for string literals), attempting to mutate the underlying buffer triggers a segmentation fault:

https://medium.com/kokster/mutable-strings-in-golang-298d422d01bc

Same as memguard immutable buffers.

I have tried using unix.Mprotect on the address given but I think the trick is I have to find the actual memory page address where the string buffer is stored, not the pointer to the start of the buffer, to do this effectively.
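The page-address arithmetic hinted at here could look something like the sketch below. pageSpan is a hypothetical helper, and it assumes the page size is a power of two. Note that changing the protection of these pages affects everything else stored on them, which for Go-managed memory can include unrelated objects:

```go
package main

import (
	"fmt"
	"os"
)

// pageSpan returns the page-aligned start address and the byte length of
// the smallest run of whole pages covering [addr, addr+length).
// pageSize must be a power of two.
func pageSpan(addr, length, pageSize uintptr) (start, size uintptr) {
	start = addr &^ (pageSize - 1)
	end := (addr + length + pageSize - 1) &^ (pageSize - 1)
	return start, end - start
}

func main() {
	ps := uintptr(os.Getpagesize())
	// 0x10234 stands in for the address of a 28-byte string buffer.
	start, size := pageSpan(0x10234, 28, ps)
	fmt.Printf("page size %d: start %#x, size %d\n", ps, start, size)
}
```

The resulting start and size are what a call like mprotect expects, rather than the raw pointer to the first byte of the buffer.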

It's a little too much work for me to find the proper solution for the time being. But knowing that strings are immutable and pile up copies from here to kingdom come in memory, I think it should be a rule: if you are using memguard and have to process passwords, put them inside a memguard buffer at the first opportunity and only work with the data in that form thereafter.

It's exactly for reasons like this that Qubes was devised, to put a stronger boundary between applications. If your program is boxed inside a VM container, it cannot reach outside that box at all. The only attack vector then is if your program runs malicious code.

Since network packets arrive as []byte, anything sensitive in them can be zeroed out as needed. Since the keyboard input side is controlled by the OS, one simply needs to find (or maybe write) a console text input function that reads directly into mutable byte slices, and then the statement I quoted at the top applies.

Bearing this in mind I am now altering my code to not use a string variable anywhere I need to zero out data after using it.

I do not believe that your scheme will work in general if you want to accept passwords with multibyte characters.

Handling passwords with multibyte characters requires that you normalize them first: there are multiple different byte sequences that may underlie something like "Å", and which one you get as input will vary with the keyboard, the operating system, and perhaps the phase of the moon.

So unless you want to rewrite all of Go's Unicode normalization code to work on your byte arrays, you will run into problems.

Given that there doesn't seem to be much activity on this question, I'm going to just assume that most people haven't needed/wanted to look into this before, or haven't thought it was worth the time.

Actually, I hadn't noticed this question until today. Believe me, I've thought about this.