Best way to convert string to bytes in Python 3?

Question:

There appear to be two different ways to convert a string to bytes, as seen in the answers to TypeError: 'str' does not support the buffer interface

Which of these methods is better or more Pythonic? Or is it just a matter of personal preference?

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

If you look at the docs for bytes, they point you to bytearray:

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

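If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.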
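As a rough illustration of the cases listed above, here is a minimal sketch that applies each kind of source argument to bytes(), which accepts the same arguments as bytearray(). The variable names and sample values are made up for the example.

```python
# Each form of the optional source argument, applied to bytes().
from_text = bytes("héllo", "utf-8")       # string: an encoding is required
zero_filled = bytes(3)                    # integer: that many null bytes
from_buffer = bytes(bytearray(b"abc"))    # buffer-protocol object: contents copied
from_ints = bytes([104, 105])             # iterable of ints in range 0 <= x < 256
empty = bytes()                           # no argument: size 0

print(from_text)    # b'h\xc3\xa9llo'
print(zero_filled)  # b'\x00\x00\x00'
print(from_buffer)  # b'abc'
print(from_ints)    # b'hi'
print(empty)        # b''

# A string without an encoding is rejected rather than guessed at.
try:
    bytes("héllo")
except TypeError as exc:
    print("TypeError:", exc)
```

Note that the string case is the only one where the encoding argument is mandatory.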

So bytes can do much more than just encode a string. It is Pythonic for the constructor to accept any type of source parameter that makes sense.

For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where there is no explicit verb.

I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.
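A quick way to check this from Python itself is to compare the two spellings directly. The sketch below uses an arbitrary sample string; the timeit figures vary by machine and Python version, so no particular numbers are claimed here.

```python
import timeit

text = "naïve café"  # arbitrary sample string for the comparison

# Both spellings produce identical bytes.
via_constructor = bytes(text, "utf-8")
via_method = text.encode("utf-8")
assert via_constructor == via_method

# The optional errors argument is passed through in the same way.
assert bytes("é", "ascii", "replace") == "é".encode("ascii", "replace")

# Rough timing of each spelling; results are machine-dependent.
constructor_time = timeit.timeit(lambda: bytes(text, "utf-8"), number=100_000)
method_time = timeit.timeit(lambda: text.encode("utf-8"), number=100_000)
print(f"bytes(text, 'utf-8'): {constructor_time:.4f}s")
print(f"text.encode('utf-8'): {method_time:.4f}s")
```

Any difference between the two timings should be small, so readability is the better reason to pick one over the other.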

Also, see Serdalis' comment: unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding), and symmetry is nice.
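To make the symmetry concrete, here is a small round-trip sketch. The sample text is arbitrary, and the last line shows the constructor-style counterpart, str(byte_string, encoding), for comparison.

```python
unicode_string = "Grüße"  # arbitrary sample text

# encode() and decode() are inverses when given the same encoding.
byte_string = unicode_string.encode("utf-8")
round_tripped = byte_string.decode("utf-8")
assert round_tripped == unicode_string

# The constructor-style pair also works, but reads less like a verb.
assert str(bytes(unicode_string, "utf-8"), "utf-8") == unicode_string
```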