Can I improve this regex for checking valid domain names?
So, I have been working on this domain name regular expression. So far, it seems to pick up domain names with SLDs and TLDs (with the optional ccTLD), but there is duplication of the TLD listing. Can this be refactored any further?
params[:domain_name].downcase.strip.match(/^[a-z0-9\-]{2,63}
\.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|
(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|
(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|
(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|
(m[acdghklmnopqrstuvwxyz]|me|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|
(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|
(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])
(\.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|
(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|
(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|
(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|
(m[acdghklmnopqrstuvwxyz]|me|mil|mobi|museum)|
(n[acefgilopruz]|name|net)|(om|org)|
(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|
(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]))?$/x)
Please, please, please don't use a fixed and horribly complicated regex like this to match for known domain names.
The list of TLDs is not static, particularly with ICANN looking at a streamlined process for new gTLDs. Even the list of ccTLDs changes sometimes!
Have a look at the list available from http://publicsuffix.org/ and write some code that's able to download and parse that list instead.
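Something along these lines might do as a starting point. It is only a rough sketch: the method names (fetch_suffix_list, parse_suffix_rules, public_suffix, known_domain?) and the LABEL pattern are made up for illustration, and I'm assuming the raw list is fetched from https://publicsuffix.org/list/public_suffix_list.dat. It handles plain rules, wildcard rules (`*.ck`) and exception rules (`!www.ck`), but unlike the published algorithm it treats an unlisted TLD as unknown rather than falling back to the implicit `*` rule, since the goal here is validation.

```ruby
require "net/http"
require "set"
require "uri"

# Assumed location of the raw list file.
PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"

# Loose label check (hypothetical): letters, digits, hyphens, 1 to 63 chars.
LABEL = /\A[a-z0-9-]{1,63}\z/

# Download the raw list text. Never called automatically here;
# in real use you would cache the result instead of fetching per request.
def fetch_suffix_list(url = PSL_URL)
  Net::HTTP.get(URI(url))
end

# Build a Set of rules from the list text, skipping blank lines and
# "//" comments. Only the text up to the first whitespace counts as a rule.
def parse_suffix_rules(text)
  rules = Set.new
  text.each_line do |line|
    rule = line.strip.split(/\s+/).first
    next if rule.nil? || rule.start_with?("//")
    rules << rule.downcase
  end
  rules
end

# Return the longest public suffix matching the domain, or nil if no
# rule matches (a deliberate simplification, see the note above).
def public_suffix(domain, rules)
  labels = domain.downcase.strip.chomp(".").split(".")
  labels.each_index do |i|
    candidate = labels[i..-1].join(".")
    rest = labels[(i + 1)..-1]
    # An exception rule wins: the suffix is the candidate minus its first label.
    return rest.join(".") if rules.include?("!#{candidate}")
    return candidate if rules.include?(candidate)
    # A wildcard rule matches any single label in the leftmost position.
    return candidate if !rest.empty? && rules.include?("*.#{rest.join('.')}")
  end
  nil
end

# True if the name ends in a listed suffix and has at least one more
# label in front of it.
def known_domain?(domain, rules)
  name = domain.downcase.strip.chomp(".")
  suffix = public_suffix(name, rules)
  return false if suffix.nil? || suffix == name
  name.split(".").all? { |label| label.match?(LABEL) }
end
```

You would then call parse_suffix_rules(fetch_suffix_list) once at startup (or from a cached copy) and replace the big regex with known_domain?(params[:domain_name], rules).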