Awk: match file1 names in file2 and average field 7
Question:
I am trying to match all the names in file1 against file2 and average them if there is a match. The field to match on is $5, using the part before the | symbol, and the average is the sum of $7 over the lines that share the same $4, divided by the number of those lines. Thank you :)
file1
AGRN
CYP2J2
file2
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 1 2
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 2 2
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 3 2
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 1 148
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 2 149
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 3 151
chr1 60381600 60381782 chr1:60381600 CYP2J2-1596|gc=40.7 153 274
chr1 60381600 60381782 chr1:60381600 CYP2J2-1596|gc=40.7 154 273
Desired output (tab-separated):
chr1:955543 AGRN-6 2
chr1:957571 AGRN 149.3
chr1:60381600 CYP2J2-1596 153.5
I have tried so far:
awk '
FNR==NR { d[$0]; next }
{
    for (k in d) {
        pat = "(^|;)" k ":"
        if ($5 ~ pat) {
            print
            break
        }
    }
}' file 1 file2 > output.bed
The awk command does run, but the output file is currently 0 bytes. Thank you :)
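A likely reason nothing is printed: the attempt builds the pattern `(^|;)NAME:`, which requires a colon right after the name, but with awk's default FS the fifth field looks like `AGRN-6|gc=75`, so the test never succeeds. (It is also worth double-checking `file 1`: with the space, the shell passes two separate arguments, `file` and `1`, to awk.) This small sketch reproduces only the regex test on one sample value from file2:

```shell
#!/bin/sh
# Reproduce the regex test from the attempt on one sample value of $5.
# With the default FS, $5 of the first file2 line is "AGRN-6|gc=75".
result=$(awk 'BEGIN {
    k = "AGRN"
    pat = "(^|;)" k ":"       # the same pattern the attempt builds
    field = "AGRN-6|gc=75"
    if (field ~ pat) { print "match" } else { print "no match" }
}')
echo "$result"   # no match
```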
Answer
The script should look like this:
test.awk:
BEGIN {
    # Split on runs of spaces, tabs and "|" so that the gene name
    # is $5 and the value to average is $8
    FS = "[ \t|]+"
}
# Read search terms from file1 into 's'
FNR == NR {
    s[$0]
    next
}
{
    # Check if $5 matches one of the search terms
    for (i in s) {
        if ($5 ~ i) {
            # Store the first two fields for later use
            a[$5] = $1
            b[$5] = $2
            # Add $8 to the running total of $8 per $5
            t[$5] += $8
            # Increment the count of occurrences of $5
            c[$5]++
            next
        }
    }
}
END {
    # Calculate the average and print output for every search term
    # that has been found
    for (i in t) {
        avg = t[i] / c[i]
        printf("%s:%s\t%s\t%s\n", a[i], b[i], i, avg)
    }
}
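Because `|` is part of the field separator, the columns shift relative to the question's numbering: the gene name is `$5`, `gc=...` becomes `$6`, and the value the question calls `$7` becomes `$8`, which is why the script sums `$8`. A small sketch that splits the first line of file2 with the same separator set (written with `+` so every separator is non-empty):

```shell
#!/bin/sh
# Split one file2 line on runs of spaces, tabs and "|".
line='chr1 955543 955763 chr1:955543 AGRN-6|gc=75 1 2'
fields=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ \t|]+" } { print $4, $5, $6, $8 }')
echo "$fields"   # chr1:955543 AGRN-6 gc=75 2
```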
Call it like this:
awk -f test.awk file1 file2
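To try this end to end, here is a sketch that recreates the sample file1 and file2 from the question in a temporary directory, writes a compact copy of the script's logic to test.awk, and runs it. awk's `for (i in t)` visits keys in an unspecified order, so the output is piped through `sort` here to make it stable; the script's own output order may differ.

```shell
#!/bin/sh
# End-to-end run of the answer's logic on the question's sample data.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' AGRN CYP2J2 > file1
cat > file2 <<'EOF'
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 1 2
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 2 2
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 3 2
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 1 148
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 2 149
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 3 151
chr1 60381600 60381782 chr1:60381600 CYP2J2-1596|gc=40.7 153 274
chr1 60381600 60381782 chr1:60381600 CYP2J2-1596|gc=40.7 154 273
EOF

cat > test.awk <<'EOF'
BEGIN { FS = "[ \t|]+" }
FNR == NR { s[$0]; next }          # search terms from file1
{
    for (i in s) {
        if ($5 ~ i) {              # regex match on the gene/exon name
            a[$5] = $1; b[$5] = $2
            t[$5] += $8            # running total per $5
            c[$5]++                # line count per $5
            next
        }
    }
}
END { for (i in t) printf("%s:%s\t%s\t%s\n", a[i], b[i], i, t[i] / c[i]) }
EOF

out=$(awk -f test.awk file1 file2 | LC_ALL=C sort)
printf '%s\n' "$out"
```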
By the way, the third average in your expected output is wrong: the two $7 values for CYP2J2-1596 are 274 and 273, and (274 + 273) / 2 = 273.5. The output should look like this:
chr1:955543 AGRN-6 2
chr1:957571 AGRN-7 149.333
chr1:60381600 CYP2J2-1596 273.5
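One thing to keep in mind: `$5 ~ i` treats each file1 entry as a regular expression and matches it anywhere in `$5`, so an entry like `AGRN` would also match a hypothetical name such as `XAGRN-1`, and every line of file2 loops over all search terms. If only an exact gene name (the part of `$5` before the first `-`) should count, a stricter variant can look the name up with `in`, which also removes the loop. This is an assumption-based alternative, not the answer's method; it assumes file1 entries contain no `-`, and it groups by `$4` plus the exon name, as the question describes.

```shell
#!/bin/sh
# Variant sketch: exact lookup of the gene name instead of a regex match.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' AGRN CYP2J2 > file1
cat > file2 <<'EOF'
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 1 2
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 2 2
chr1 955543 955763 chr1:955543 AGRN-6|gc=75 3 2
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 1 148
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 2 149
chr1 957571 957852 chr1:957571 AGRN-7|gc=61.2 3 151
chr1 60381600 60381782 chr1:60381600 CYP2J2-1596|gc=40.7 153 274
chr1 60381600 60381782 chr1:60381600 CYP2J2-1596|gc=40.7 154 273
EOF

out=$(awk '
BEGIN { FS = "[ \t|]+" }
FNR == NR { want[$1]; next }      # gene names from file1
{
    split($5, part, "-")          # "AGRN-6" gives part[1] = "AGRN"
    if (part[1] in want) {
        key = $4 "\t" $5          # group by position and exon name
        sum[key] += $8
        n[key]++
    }
}
END { for (k in sum) print k "\t" sum[k] / n[k] }
' file1 file2 | LC_ALL=C sort)
printf '%s\n' "$out"
```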