What is the correct way to check whether a rune is in the Basic Multilingual Plane?

Question:

I want to check whether a given rune is in the Basic Multilingual Plane or not.

That is, what should go in this function: https://play.golang.org/p/3szTn8pP7xe

package main

import (
    "fmt"
)

func isBMP(r rune) bool {
    // ???
    return false
}

func main() {
    fmt.Println(isBMP(rune('պ'))) // expect true
    fmt.Println(isBMP(rune('


The Basic Multilingual Plane has the following code point ranges allocated:

0000–0FFF    8000–8FFF
1000–1FFF    9000–9FFF
2000–2FFF    A000–AFFF
3000–3FFF    B000–BFFF
4000–4FFF    C000–CFFF
5000–5FFF    D000–DFFF
6000–6FFF    E000–EFFF
7000–7FFF    F000–FFFF

So to tell whether a rune falls in the Basic Multilingual Plane, check whether it falls inside any of these ranges. Since the ranges together cover every value from 0 to 0xffff (both inclusive), a single range check is enough:

func isBMP(r rune) bool {
    return r >= 0 && r <= 0xffff
}

Note that since rune is an alias for int32, it may hold negative values, so it is important to also check that it is not negative.

This will output your expected result. Try it on the Go Playground.
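
For reference, a complete program built around this check might look like the sketch below. The sample values are assumptions chosen for illustration: 'պ' (U+057A) comes from the question, U+1F600 is an arbitrary code point outside the BMP, and -1 shows the negative case.

```go
package main

import "fmt"

// isBMP reports whether r lies in the Basic Multilingual Plane,
// that is, in the range U+0000 to U+FFFF inclusive.
func isBMP(r rune) bool {
    return r >= 0 && r <= 0xffff
}

func main() {
    fmt.Println(isBMP('պ'))          // U+057A, inside the BMP: true
    fmt.Println(isBMP('\U0001F600')) // illustrative code point outside the BMP: false
    fmt.Println(isBMP(rune(-1)))     // negative value, not a code point: false
}
```

Writing the non-BMP rune as a `\U` escape keeps the source file readable even in editors or pages that cannot display characters outside the BMP.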

Note #2: when iterating over the runes of a string that contains invalid UTF-8 bytes, you get the Unicode replacement character, 0xfffd, for the invalid bytes. If you want to exclude those from your test, you could modify it like this:

func isBMP(r rune) bool {
    return r >= 0 && r <= 0xffff && r != 0xfffd
}
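
To see where the replacement character comes from, the sketch below ranges over a string containing one invalid byte. The sample string and the helper name `runesOf` are assumptions for illustration; `utf8.RuneError` is the standard library's constant for U+FFFD.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// runesOf collects the runes produced by ranging over s.
// (Hypothetical helper, used only to make the behavior easy to inspect.)
func runesOf(s string) []rune {
    var out []rune
    for _, r := range s {
        out = append(out, r)
    }
    return out
}

func main() {
    // "\xff" is not valid UTF-8, so ranging yields utf8.RuneError (U+FFFD) for it.
    for _, r := range runesOf("a\xffb") {
        fmt.Printf("%U replacement=%v\n", r, r == utf8.RuneError)
    }
}
```

Keep in mind that U+FFFD is itself a BMP code point, so a string may legitimately contain it. If that distinction matters, `utf8.DecodeRuneInString` returns `utf8.RuneError` with a width of 1 for an invalid byte, whereas a correctly encoded U+FFFD has a width of 3.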

I'm not that familiar with Go. However, a bit of Googling suggests that a rune is in fact an int32, so since anything in the Basic Multilingual Plane has a code point between 0 and 65535, you should be able to do this:

func isBMP(r rune) bool {
    // rune is an alias for int32, so negative values must be rejected too.
    return r >= 0 && r <= 65535
}