How to find any non-ASCII characters in a Redshift database

Question:

I have a database table, and I'd like to return all rows where a column contains a non-ASCII character anywhere in the string.

Is there an easy way to do this?

I have tried this:

select col_name,  regexp_instr(col_name,'[^[:ascii:]]') 
from test_table s
where created > sysdate - 1 
and regexp_instr(col_name,'[^[:ascii:]]') > 0 
limit 5;

But I get this error:

error:  Invalid character class name, collating name, or character range.  The error occured while parsing the regular expression: '[^[:>>>HERE>>>ascii:]]'.
  code:      8002
  context:   T_regexp_init
  query:     5059536
  location:  funcs_expr.cpp:130
  process:   query20_31 [pid=7903]

Thanks!

Answer

I was trying to accomplish something similar recently, but another suggested solution (writing '[^\x00-\x7F]' in the regex) does not work on Redshift.
Usually, a backslash combined with a literal character creates a regex token with a special meaning. In this case, \x means "the character whose hexadecimal value is", and 00 and 7F are the hex values, so the pattern matches any character outside the range 0x00 to 0x7F.
While Postgres supports this (see 9.7.3.3. Regular Expression Escapes), it seems that Redshift's regex engine does not. You can check exactly what Redshift supports in the Redshift documentation.
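To make the intent of that pattern concrete, here is a minimal Python sketch (not Redshift SQL) that runs it in an engine that does understand \x escapes. The helper name first_non_ascii_position is hypothetical. It imitates the convention of regexp_instr: a 1-based position for the first match, or 0 when there is none.

```python
import re

# The same character class that Redshift rejects; Python's re module
# accepts \x escapes, so it can be used here to show what it matches.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def first_non_ascii_position(value: str) -> int:
    """Return the 1-based position of the first non-ASCII character.

    Returns 0 when the string is pure ASCII. The name and the return
    convention are illustrative, chosen to resemble regexp_instr.
    """
    match = NON_ASCII.search(value)
    return match.start() + 1 if match else 0


if __name__ == "__main__":
    for sample in ["plain ascii", "caf\u00e9", "\u4e2d\u6587 text"]:
        print(repr(sample), first_non_ascii_position(sample))
```

This kind of check can be handy for spot-checking rows after they have been fetched from Redshift, but it does not replace filtering inside the database.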

The shortest and cleanest workaround I've found for this is:

SELECT column_name,
       CASE
           WHEN regexp_instr(column_name, '[^[:print:][:cntrl:]]') > 0 THEN TRUE
           ELSE FALSE END AS has_non_ascii_char
FROM table_name
WHERE has_non_ascii_char;
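A likely reason this class works: assuming Redshift's POSIX classes follow the usual C-locale ASCII definitions (an assumption, not something stated in the post), [:print:] covers 0x20 through 0x7E and [:cntrl:] covers 0x00 through 0x1F plus 0x7F. Together they span every ASCII code point, so the negated class can only match non-ASCII characters. A small Python sketch of that reasoning, with hypothetical names:

```python
# ASSUMPTION: C-locale definitions of the two POSIX classes.
PRINT = set(range(0x20, 0x7F))             # space through '~'
CNTRL = set(range(0x00, 0x20)) | {0x7F}    # control characters and DEL

# The union of the two classes is exactly the ASCII range 0..127.
COVERED = PRINT | CNTRL


def has_non_ascii_char(value: str) -> bool:
    """True if any character falls outside [:print:] and [:cntrl:]."""
    return any(ord(ch) not in COVERED for ch in value)


if __name__ == "__main__":
    print(COVERED == set(range(128)))
    print(has_non_ascii_char("tab\tand newline\n"))
    print(has_non_ascii_char("na\u00efve"))
```

Under that assumption, '[^[:print:][:cntrl:]]' is equivalent to '[^\x00-\x7F]' without needing the \x escape.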
