How to parse HTTP headers using Bash?
I need to get 2 values from a web page header that I am getting using curl. I have been able to get the values individually using:
response1=$(curl -I -s http://www.example.com | grep HTTP/1.1 | awk {'print $2'})
response2=$(curl -I -s http://www.example.com | grep Server: | awk {'print $2'})
But I cannot figure out how to grep the values separately using a single curl request like:
response=$(curl -I -s http://www.example.com)
http_status=$response | grep HTTP/1.1 | awk {'print $2'}
server=$response | grep Server: | awk {'print $2'}
Every attempt leads to either an error message or empty values. I am sure it is just a syntax issue.
A full bash solution. It demonstrates how to easily parse other headers without requiring awk:
shopt -s extglob # Required to trim whitespace; see below
while IFS=':' read -r key value; do
    # trim leading and trailing whitespace (including the trailing CR) in "value"
    value=${value##+([[:space:]])}; value=${value%%+([[:space:]])}
    case "$key" in
        Server) SERVER="$value"
                ;;
        Content-Type) CT="$value"
                ;;
        # status line: re-join key and value in case the reason phrase contained a colon
        HTTP*) read -r PROTO STATUS MSG <<< "$key${value:+:$value}"
                ;;
    esac
done < <(curl -sI http://www.google.com)
echo "$STATUS"
echo "$SERVER"
echo "$CT"
Produces:
302
GFE/2.0
text/html; charset=UTF-8
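For comparison, the approach from the question can also work with a single request: capture the response once, then feed it to each filter through a here-string inside a command substitution. This is a minimal sketch, not the only way to do it. It uses a canned response in place of the `curl -sI` output so that it runs offline, and the sample header values are made up for illustration.

```bash
#!/usr/bin/env bash
# Canned response standing in for: response=$(curl -sI http://www.example.com)
# (the header values below are illustrative assumptions, not real output)
response=$'HTTP/1.1 200 OK\r\nServer: ExampleServer/1.0\r\nContent-Type: text/html\r\n\r\n'

# $(...) captures the filter's output; <<< feeds the saved response to awk.
http_status=$(awk '/^HTTP/ {print $2}' <<< "$response")

# Header lines end in CR LF, so strip the trailing CR from the value.
server=$(awk -F': ' '/^Server:/ {print $2}' <<< "$response" | tr -d '\r')

echo "$http_status"
echo "$server"
```

The attempt in the question fails because `http_status=$response | grep ...` assigns the variable in one pipeline stage and pipes nothing useful into `grep`; nothing captures the filter's output.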
According to RFC-2616, HTTP headers are modeled as described in "Standard for the Format of ARPA Internet Text Messages" (RFC822), which states clearly in section 3.1.2:
The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon). The field-body may be composed of any ASCII characters, except CR or LF. (While CR and/or LF may be present in the actual text, they are removed by the action of unfolding the field.)
So the above script should catch any RFC-[2]822 compliant header with the notable exception of folded headers.
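If folded headers do need handling, one option is to unfold the response before parsing it: a line that starts with a space or tab is treated as a continuation and joined onto the previous line. The following is a rough sketch under that assumption. The `X-Example` header and the canned response are hypothetical, and the variable names are illustrative.

```bash
#!/usr/bin/env bash
# Canned response with one folded header (names and values are made up).
response=$'HTTP/1.1 200 OK\r\nX-Example: first part\r\n second part\r\nServer: ExampleServer/1.0\r\n\r\n'

nl=$'\n'
unfolded=""
while IFS= read -r line; do
    line=${line%$'\r'}                              # drop the trailing CR
    if [[ $line == [[:blank:]]* ]]; then
        # Continuation line: strip leading blanks, join with a single space
        unfolded+=" ${line#"${line%%[![:blank:]]*}"}"
    else
        # Ordinary line: start a new line in the unfolded text
        unfolded+="${unfolded:+$nl}$line"
    fi
done <<< "$response"

printf '%s\n' "$unfolded"
```

The resulting `$unfolded` text can then be fed to the parsing loop with `done <<< "$unfolded"` in place of the process substitution.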