Convert Text to a Unicode escape sequence
I have a Text object that contains some Latin characters, which need to be converted to Unicode escape sequences of the format \u####, where each # is a hex digit.
As described here, Haskell easily converts strings to escape sequences and vice versa. However, it only produces the decimal representation. For example,
> let s = "Ñ"
> s
"\209"
Is there a way to specify the escape sequence encoding to force it to output the correct format? i.e.
> let s = encodeUnicode16 "Ñ"
> s
"\u00d1"
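For reference, the decimal escape and the requested hex escape encode the same code point: 'Ñ' is U+00D1, and 0xD1 is 209 in decimal. A small standalone check (a sketch, not part of the original question) using Numeric.showHex from base makes the correspondence visible:

```haskell
import Numeric (showHex)

main :: IO ()
main = do
  let c = 'Ñ'                          -- code point U+00D1
  print (fromEnum c)                   -- 209: the number in ghci's "\209"
  putStrLn (showHex (fromEnum c) "")   -- d1: the same number in hex
  putStrLn (show [c])                  -- "\209": what ghci's implicit print shows
```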
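How about this:
import Text.Printf (printf)

encodeUnicode16 :: String -> String
encodeUnicode16 = concatMap escapeChar
  where
    -- keep characters in the range ' '..'z' as they are; escape everything else
    -- as \u followed by the code point in (at least) four lowercase hex digits
    escapeChar c
      | ' ' <= c && c <= 'z' = [c]
      | otherwise =
          printf "\\u%04x" (fromEnum c)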
In ghci, you can use it as follows:
> putStrLn $ encodeUnicode16 "Ñ"
\u00d1
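If you would rather not go through printf, the same escaping can be built on Numeric.showHex with manual zero-padding. This is a minimal sketch; the name encodeUnicode16' is purely illustrative, and it keeps the same ' '..'z' pass-through range as the code above:

```haskell
import Numeric (showHex)

-- Hypothetical printf-free variant: characters in ' '..'z' pass through,
-- everything else becomes \u plus the code point in lowercase hex.
encodeUnicode16' :: String -> String
encodeUnicode16' = concatMap escapeChar
  where
    escapeChar c
      | ' ' <= c && c <= 'z' = [c]
      | otherwise = "\\u" ++ pad4 (showHex (fromEnum c) "")
    -- left-pad with zeros to four digits; longer hex strings are left unchanged
    pad4 s = replicate (4 - length s) '0' ++ s

main :: IO ()
main = putStrLn (encodeUnicode16' "Ñandú")   -- \u00d1and\u00fa
```

Like the printf version, code points above U+FFFF come out with five or six hex digits rather than four.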
Note that if you don't use putStrLn, it will get escaped twice:
> encodeUnicode16 "Ñ"
"\\u00d1"
This is because ghci implicitly adds a print in front of the command, and print uses show, which escapes the backslash a second time.
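The result string itself contains a single backslash; only its show representation doubles it. A small standalone program (repeating the String version of encodeUnicode16 so that it compiles on its own) shows the difference between putStrLn and print:

```haskell
import Text.Printf (printf)

encodeUnicode16 :: String -> String
encodeUnicode16 = concatMap escapeChar
  where
    escapeChar c
      | ' ' <= c && c <= 'z' = [c]
      | otherwise = printf "\\u%04x" (fromEnum c)

main :: IO ()
main = do
  let s = encodeUnicode16 "Ñ"
  print (length s)    -- 6: one backslash, 'u', and four hex digits
  putStrLn s          -- \u00d1   (the raw characters)
  print s             -- "\\u00d1" (print = putStrLn . show)
  putStrLn (show s)   -- "\\u00d1" (same as the line above)
```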
Edit: I missed the part where you have a Text and not a String. Here's the same code for Text:
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import Text.Printf (printf)

encodeUnicode16 :: Text -> Text
encodeUnicode16 = T.concatMap escapeChar
  where
    escapeChar c
      | ' ' <= c && c <= 'z' = T.singleton c
      | otherwise =
          T.pack $ printf "\\u%04x" (fromEnum c)
Again, you want to use T.putStrLn to avoid double-escaping everything.
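One case the \u#### format cannot express directly is a code point above U+FFFF, for which %04x prints five or six digits. If the consumer expects strict four-digit escapes (as JSON and Java do), such characters are normally written as a UTF-16 surrogate pair. Below is a sketch for Text, assuming that is what you need; encodeUnicode16Surrogates is a hypothetical name, not a library function:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Bits (shiftR, (.&.))
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import Text.Printf (printf)

-- Hypothetical variant of encodeUnicode16 that always emits four hex digits per
-- escape: code points above U+FFFF become a UTF-16 surrogate pair.
encodeUnicode16Surrogates :: Text -> Text
encodeUnicode16Surrogates = T.concatMap escapeChar
  where
    escapeChar c
      | ' ' <= c && c <= 'z' = T.singleton c
      | n <= 0xFFFF = hex4 n
      | otherwise = hex4 hi <> hex4 lo
      where
        n = fromEnum c
        n' = n - 0x10000                -- 20-bit offset into the supplementary planes
        hi = 0xD800 + (n' `shiftR` 10)  -- high (leading) surrogate: top 10 bits
        lo = 0xDC00 + (n' .&. 0x3FF)    -- low (trailing) surrogate: bottom 10 bits
    -- \u plus four lowercase hex digits (all values passed here are <= 0xFFFF)
    hex4 :: Int -> Text
    hex4 = T.pack . printf "\\u%04x"

main :: IO ()
main = T.putStrLn (encodeUnicode16Surrogates "Ñ and \x1F600")
```

With the sample input this should print \u00d1 and \ud83d\ude00. Characters at or below U+FFFF are escaped exactly as in the answer's Text version.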