How to use an awk variable in a regular expression?
I have a file called domain which contains some domains. For example:
google.com
facebook.com
...
yahoo.com
And I have another file called site, which contains some site URLs and numbers. For example:
image.google.com 10
map.google.com 8
...
photo.facebook.com 22
game.facebook.com 15
...
Now I want to count the number of URLs each domain has. For example, google.com has 10 + 8. So I wrote an awk script like this:
BEGIN{
while(getline dom < "./domain" > 0) {
domain[dom]=0;
}
for(dom in domain) {
while(getline < "./site" > 0) {
if($1 ~/$dom$) #if $1 end with $dom {
domain[dom]+=$2;
}
}
}
}
But the test if($1 ~/$dom$) doesn't work the way I want, because the variable $dom inside the regular expression is interpreted literally. So, the first question is:
Is there any way to use the variable $dom in a regular expression?
Also, I'm new to writing scripts. Is there a better way to solve my problem?
First of all, the variable is dom, not $dom. Think of $ as an operator that extracts the value of the field (column) whose number is stored in the variable dom.
Secondly, awk does not interpolate variables between the slashes of a regex literal /.../: whatever is written there is taken as the pattern itself.
You want the match() function, where the 2nd argument can be a string that is treated as the regular expression:
if (match($1, dom "$")) {...}
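As a side note, the ~ operator also accepts a string expression on its right-hand side and treats it as a dynamic regex, so the same test can be written without match(). A minimal sketch, where the two input lines and the value passed with -v are made up for illustration:

```shell
# dom is an ordinary awk variable, set here from the command line with -v.
# (dom "$") concatenates to the string "google.com$", which awk uses as a regex.
result=$(printf '%s\n' 'image.google.com 10' 'photo.facebook.com 22' |
    awk -v dom='google.com' '$1 ~ (dom "$") { print $1, $2 }')
printf '%s\n' "$result"
```

Only the line whose first field ends with the value of dom is printed.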
I would code a solution like:
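awk '
    # First file (domain): create a counter for every domain
    FNR == NR {domain[$1] = 0; next}
    # Second file (site): add $2 to the first domain that $1 ends with
    {
        for (dom in domain) {
            if (match($1, dom "$")) {
                domain[dom] += $2
                break
            }
        }
    }
    # After all input is read, print the total for each domain
    END {for (dom in domain) {print dom, domain[dom]}}
' domain site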
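One caveat with building the regex from the domain string: the dot is a regex metacharacter, so "google.com$" would also match a name such as googlexcom. If that matters, a plain string comparison against the end of $1 avoids regexes altogether. The following is only a sketch of that idea, not a drop-in replacement; the temporary directory, the sample data, and the extra googlexcom line are assumptions added for the demo:

```shell
# Hypothetical sample data, written to a temporary directory for the demo.
tmp=$(mktemp -d)
printf '%s\n' 'google.com' 'facebook.com' > "$tmp/domain"
printf '%s\n' 'image.google.com 10' 'map.google.com 8' \
    'photo.facebook.com 22' 'game.facebook.com 15' 'googlexcom 99' > "$tmp/site"

totals=$(awk '
    # First file: one counter per domain
    FNR == NR { domain[$1] = 0; next }
    # Second file: compare the last length(dom) characters of $1 with dom literally
    {
        for (dom in domain) {
            n = length(dom)
            if (length($1) >= n && substr($1, length($1) - n + 1) == dom) {
                domain[dom] += $2
                break
            }
        }
    }
    END { for (dom in domain) print dom, domain[dom] }
' "$tmp/domain" "$tmp/site" | sort)   # sort only to get a stable order

printf '%s\n' "$totals"
rm -rf "$tmp"
```

With this data the googlexcom line is not counted, because the comparison is literal. Like the regex version, it still treats any name that merely ends with the domain string as a match.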