Conditional awk hashmap match lookup

Problem description:

I have 2 tabular files. One, called lookup_file.txt, contains only a mapping of 50 key/value pairs. The other, data.txt, has the actual tabular data with 30 columns and millions of rows. I would like to replace the id column of the second file with the values from lookup_file.txt.

How can I do this? I would prefer using awk in a bash script. Also, is there a hashmap data structure I can use in bash for storing the 50 key/value pairs rather than another file?

Assuming your files have comma-separated fields and the "id column" is field 3:

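awk '
BEGIN { FS = OFS = "," }           # read and write comma-separated fields
NR == FNR { map[$1] = $2; next }   # first file (lookup_file.txt): store key -> value
{ $3 = map[$3]; print }            # data.txt: swap the id; ids missing from map become empty
' lookup_file.txt data.txt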

If any of those assumptions are wrong, clue us in if the fix isn't obvious...
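One assumption worth spelling out: the script above blanks field 3 for any row whose id has no entry in the lookup file. If some ids might be unmapped and should be left alone, a guarded variant could look like the sketch below. The sample rows and the temporary directory are made up purely so the snippet runs on its own; only the awk program matters.

```shell
# Hypothetical sample data, created only so this sketch is self-contained.
tmpdir=$(mktemp -d)
printf '%s\n' 'A1,alpha' 'B2,beta' > "$tmpdir/lookup_file.txt"
printf '%s\n' 'x,y,A1,z' 'x,y,C3,z' 'x,y,B2,z' > "$tmpdir/data.txt"

result=$(awk '
BEGIN { FS = OFS = "," }
NR == FNR { map[$1] = $2; next }   # first file: build the map
$3 in map { $3 = map[$3] }         # replace only ids that have a mapping
{ print }                          # print every row, changed or not
' "$tmpdir/lookup_file.txt" "$tmpdir/data.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Here the row with id C3 has no mapping and is printed unchanged, while A1 and B2 are replaced.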

Edit: if you want to avoid the (IMHO negligible) performance impact of the NR==FNR test, this would be one of those very rare cases when use of getline is appropriate:

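awk '
BEGIN{
   FS=OFS=","
   # Load the whole lookup file up front, before any data row is read.
   while ( (getline line < "lookup_file.txt") > 0 ) {
      split(line,f)        # splits on FS, i.e. on commas
      map[f[1]] = f[2]     # key -> replacement value
   }
}
{ $3 = map[$3]; print }    # ids missing from map become empty
' data.txt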
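As for the hashmap question: bash 4 and later does have associative arrays (declare -A), but looping over millions of rows in bash itself would be slow, and awk arrays are already hash maps. So if the goal is just to avoid a second file, one option is to write the pairs straight into the BEGIN block. A rough sketch, where the two pairs and the sample rows are placeholders rather than real data:

```shell
# Hypothetical sample data, created only so this sketch is self-contained.
tmpdir=$(mktemp -d)
printf '%s\n' 'x,y,A1,z' 'x,y,B2,z' > "$tmpdir/data.txt"

inline_result=$(awk '
BEGIN {
   FS = OFS = ","
   # Placeholder pairs; the real 50 would be listed the same way.
   map["A1"] = "alpha"
   map["B2"] = "beta"
}
$3 in map { $3 = map[$3] }   # replace only ids that have a mapping
{ print }
' "$tmpdir/data.txt")

printf '%s\n' "$inline_result"
rm -rf "$tmpdir"
```

With only 50 pairs this stays readable, though keeping them in lookup_file.txt is easier to maintain if the mapping changes.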