Read line by line and parse variables with awk
I have a script that reads log files and parses the data to insert it into a MySQL table.
My script looks like:
while read x;do
var=$(echo ${x}|cut -d+ -f1)
var2=$(echo ${x}|cut -d_ -f3)
...
echo "$var,$var2,.." >> mysql.infile
done<logfile
The problem is that the log files are thousands of lines long and processing them takes hours.
I read that awk is better. I tried it, but I don't know the syntax to parse the variables.
The inputs are structured firewall logs, so they are pretty large files, with lines like:
@timestamp $HOST reason="idle Timeout" source-address="x.x.x.x" source-port="19219" destination-address="x.x.x.x" destination-port="53" service-name="dns-udp" application="DNS"....
So I'm using a lot of grep calls for ~60 variables, e.g.:
sourceaddress=$(echo ${x}|grep -P -o '.{0,0}source-address=\".{0,50}'|cut -d\" -f2)
If you think perl would be better, I'm open to suggestions, and maybe a hint on how to script it.
To answer your question, I assume the following rules of the game:
- each line contains various variables
- each variable can be found by a different delimiter.
This gives you the following awk script:
awk 'BEGIN{OFS=","}
{ FS="+"; $0=$0; var1=$1;
FS="_"; $0=$0; var2=$3;
...
print var1,var2,... >> "mysql.infile"
}' logfile
It basically does the following :
- set the output separator to ','
- read the line
- set the field separator to '+', re-parse the line ($0=$0) and determine the first variable
- set the field separator to '_', re-parse the line ($0=$0) and determine the second variable
- ... continue for all variables
- print the line to the output file.
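
To try the steps above on something concrete, here is a minimal runnable sketch. The function name parse_fields and the sample input line are made up for illustration; only the '+' and '_' delimiters and the field positions come from the script in the question. It assumes an awk where assigning to $0 re-splits the record with the current FS, which is how gawk behaves and, as far as I know, other common implementations too.

```shell
#!/bin/sh
# parse_fields is a hypothetical helper name, not part of the original script.
# It reads lines from stdin (or from files given as arguments) and prints
# two values per line, joined by a comma.
parse_fields() {
    awk 'BEGIN { OFS = "," }
    {
        # Split on "+" and keep the first field.
        FS = "+"; $0 = $0; var1 = $1
        # Split the same line again on "_" and keep the third field.
        FS = "_"; $0 = $0; var2 = $3
        print var1, var2
    }' "$@"
}

# Made-up sample line, only to show the mechanics.
printf '%s\n' '2024-01-01+host_a_dns-udp_rest' | parse_fields
# prints: 2024-01-01,dns-udp
```

Since the whole file goes through a single awk process, there is no echo, cut or grep spawned per variable and per line, which is where most of the time in the shell loop is likely spent.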