Solaris 8-10: host2ip conversion issue
I am working on a very peculiar problem. I have code compiled with an old compiler (gcc 2.95 or older) on the Solaris 8/SPARC platform. It runs fine on Solaris 8/SPARC but crashes on Solaris 10/SPARC. (Solaris 10 is supposedly backward compatible with Solaris 8.)
While debugging, I see that the problem comes up when the app tries to convert a hostname into its corresponding IP address. It uses gethostbyname_r, followed by inet_ntoa, to obtain the IPv4 dotted-quad string. Stepping through the code in gdb, I see that the in_addr returned by gethostbyname_r holds the correct integer for the IP address, but the inet_ntoa call returns a malformed string. One difficulty in confirming that it is really inet_ntoa that fails is that the code is written like this:
strcpy(hostaddr, inet_ntoa(*((struct in_addr *) hostdata.h_addr)));
So technically I cannot see the value returned by inet_ntoa. But I can run
print (char*)inet_ntoa(*((struct in_addr *) hostdata.h_addr_list[0]))
in gdb to see it (which is close enough, I presume), and it prints malformed IP addresses, for example "0.0.". (The hostname has a valid IP address and can be resolved from that machine, so an address starting with 0.0. is not the right value either.)
As you can see, combining the unsafe strcpy with inet_ntoa introduces an unknown, and the result is a segmentation fault.
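To make the intermediate value visible, the one-line strcpy/inet_ntoa expression can be split into steps. The following is only a minimal sketch for a standalone test program, not a change to the application itself; the helper name format_ipv4 and its return convention are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: convert an IPv4 address to text in a caller-supplied
 * buffer. Returns 0 on success, -1 if the text does not fit. */
int format_ipv4(struct in_addr addr, char *out, size_t outlen)
{
    const char *text;
    size_t len;

    assert(out != NULL);

    /* Keep the return value in a named variable so a debugger can print it
     * before anything is copied. */
    text = inet_ntoa(addr);
    if (text == NULL) {
        return -1;
    }

    /* Check the length first, so a malformed or overlong string cannot
     * overflow the destination the way a bare strcpy could. */
    len = strlen(text);
    if (len >= outlen) {
        return -1;
    }
    memcpy(out, text, len + 1);  /* copy includes the terminating NUL */
    return 0;
}
```

Calling this with hostdata.h_addr cast to struct in_addr, as in the original line, and a breakpoint after the inet_ntoa call would show the string before the copy happens.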
It would be great to hear from someone who has experienced something similar about what might cause inet_ntoa to fail like this. Somehow the system is playing a role, and I cannot pinpoint how, let alone think about what can be done to fix it.
All comments are much appreciated.
Constraints: I cannot modify the code to make it work (otherwise this would be trivial to solve). So despite knowing that strcpy is a very unsafe function with respect to segfaults, and that inet_ntoa is deprecated, I am helpless on that front.
I have a feeling that it is a parallel processing issue. I am not sure, but I don't think the app is multi-threaded. However, the new Solaris 10 machine has 64 cores. The reason for this line of thought is that the only real issue with inet_ntoa is its static buffer, and the code does make this call in a loop.
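The static-buffer concern can be checked in isolation with a small sketch like the one below. It assumes an implementation where inet_ntoa reuses one buffer between calls (the standard allows this but does not require it); the function name and the two documentation-range addresses are made up for the illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical check: returns 1 if a second inet_ntoa call changed the text
 * seen through the pointer returned by the first call, 0 otherwise. */
int second_call_overwrites_first(void)
{
    struct in_addr a, b;
    const char *first;
    char saved[INET_ADDRSTRLEN];

    a.s_addr = htonl(0xC0000201UL);  /* 192.0.2.1 */
    b.s_addr = htonl(0xC6336407UL);  /* 198.51.100.7 */

    first = inet_ntoa(a);
    strcpy(saved, first);            /* copy taken before the next call */

    (void)inet_ntoa(b);              /* may reuse the same static buffer */

    /* If the buffer is shared, 'first' now shows the second address. */
    return strcmp(saved, first) != 0;
}
```

Note that a copy taken immediately after each call, as the strcpy line in the application does, stays intact, so in a single-threaded loop the static buffer on its own would not be expected to corrupt the result.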
I found out that the linker (and bad code, of course) was the issue. Some super nice person thought it was a great idea to have a function with the exact same name as a standard library function (inet_ntoa_r). When I tried the -static option while linking my code with the libraries, the linker started complaining about the presence of this symbol in the user library file. Once I removed that function from the user library, the program got past that crash (and on to some other issue which I am now trying to fix. Expect another question :) ). Hope someone finds this useful.