Why does this DOM-replaceNode function sometimes crash?
The first function (below) works fine in a loop over many nodes of the same DOMDocument... but it sometimes crashes (no error message, but the server stops).
When we use the second one (replace_innerXML_secure) in the same node loop, it never crashes. Why? What is wrong with the first?
- The first uses $e->nodeValue='' to delete all childNodes (is that OK?);
- The second preserves one (arbitrary) childNode and uses removeChild to delete the others... a strange workaround to avoid full deletion when some tag is there.
The "equivalent" functions #1 and #2:
// 1. What is wrong with THIS function??
function replace_innerXML(DOMNode $e,$innerXML='') {
    if ($e && ($innerXML>'' || $e->nodeValue>'')) {
        $e->nodeValue='';
        if ($innerXML>'') {
            $tmp = $this->dom->createDocumentFragment();
            $tmp->appendXML($innerXML);
            $e->appendChild( $tmp );
        }
        return true;
    }
    return false;
}
// 2. Here is a workaround... slower, but it does NOT crash (!). WHY??
function replace_innerXML_secure(DOMNode $e,$innerXML='') {
    if ($e) {
        $tmp = $e->ownerDocument->createDocumentFragment();
        $tmp->appendXML($innerXML);
        $once=null;
        // keep the first element child in $once, remove every other child
        foreach(iterator_to_array($e->childNodes) as $e2)
            if (!$once && $e2->nodeType===1) $once=$e2;
            else $e->removeChild($e2);
        if ($once)
            $once->parentNode->replaceChild( $tmp, $once );
        else {
            $e->nodeValue='';
            $e->appendChild( $tmp );
        }
        return true;
    }
    return false;
}
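For comparison, here is a minimal sketch of a third variant that avoids $e->nodeValue='' entirely: it detaches the children one at a time with removeChild and then appends the parsed fragment. The name replace_innerXML_explicit and the sample document are assumptions made for illustration, not part of the original code, and this is not a confirmed fix for the crash; it only shows the same replacement done with explicit child removal.

```php
<?php
// Hypothetical helper (not from the original post): empties $e by detaching
// its children one by one, then appends a fragment parsed from $innerXML.
function replace_innerXML_explicit(DOMNode $e, $innerXML = '') {
    $doc = $e->ownerDocument;
    if ($doc === null) {
        return false; // $e is a document node or has no owner document
    }
    $tmp = null;
    if ($innerXML !== '') {
        $tmp = $doc->createDocumentFragment();
        // appendXML() returns false when the string is not well-formed XML
        if (!@$tmp->appendXML($innerXML)) {
            return false; // leave $e untouched
        }
    }
    // Remove children explicitly instead of assigning to nodeValue
    while ($e->firstChild !== null) {
        $e->removeChild($e->firstChild);
    }
    if ($tmp !== null) {
        $e->appendChild($tmp); // moves the fragment's children into $e
    }
    return true;
}

// Sample input chosen for illustration only
$dom = new DOMDocument();
$dom->loadXML('<root><p>old <b>bold</b> text</p></root>');
$p = $dom->getElementsByTagName('p')->item(0);
replace_innerXML_explicit($p, '<new x="1">text</new>');
echo $dom->saveXML($p), "\n";
```

Parsing the fragment before touching $e means a malformed $innerXML leaves the node unchanged, and skipping appendChild for an empty string avoids appending an empty fragment.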
NOTES
EDIT2: an example, as requested by @Prix.
The loop is very complex, but it can be simulated as
// use this with ANY (and a lot of) BIG HTML files from the web...
// I get ~1 error per 100 samples
$dom = new DOMDocument();
$dom->load($file); // any XML, or loadHTMLFile()
$plst = array(); // you can take off the rand()
foreach ($dom->getElementsByTagName('*') as $node) if (1 || rand(1,3)==1) {
    $plst[] = $node->getNodePath();
}
rsort($plst); // from leaves to root
foreach ($plst as $p) {
    $xp = new DOMXpath($dom); // refresh for each $p
    $node = $xp->query($p);
    if ($node->length && $node=$node->item(0))
        // USING HERE the function #1 or #2:
        replace_innerXML($node,'<new x="1">text</new>');
}
$dom->normalizeDocument();
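To try the simulation without an external file, the same loop can be run against a small inline document. The sketch below is an assumption-laden harness, not the original test: replace_innerXML_standalone is a hypothetical standalone copy of function #1 that uses $e->ownerDocument in place of $this->dom, and the inline XML is made up. A document this small may well not reproduce the crash; it only shows the loop's intended effect.

```php
<?php
// Standalone copy of function #1 (hypothetical name); it uses ownerDocument
// instead of $this->dom so that it can run outside a class.
function replace_innerXML_standalone(DOMNode $e, $innerXML = '') {
    if ($innerXML !== '' || $e->nodeValue !== '') {
        $e->nodeValue = '';
        if ($innerXML !== '') {
            $tmp = $e->ownerDocument->createDocumentFragment();
            $tmp->appendXML($innerXML);
            $e->appendChild($tmp);
        }
        return true;
    }
    return false;
}

// Inline sample document, made up for this sketch
$dom = new DOMDocument();
$dom->loadXML('<article><body><p>one</p><p>two</p></body></article>');

// Collect the XPath of every element, then sort so descendants come first
$plst = array();
foreach ($dom->getElementsByTagName('*') as $node) {
    $plst[] = $node->getNodePath();
}
rsort($plst);

$done = 0;
foreach ($plst as $p) {
    $xp = new DOMXPath($dom); // fresh XPath context for each path
    $found = $xp->query($p);
    if ($found->length && ($node = $found->item(0))) {
        if (replace_innerXML_standalone($node, '<new x="1">text</new>')) {
            $done++;
        }
    }
}
echo $done, " nodes replaced\n";
echo $dom->saveXML($dom->documentElement), "\n";
```

Because the paths are processed from the deepest elements up to the root, the last call replaces the whole content of the root element.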
Here is some sample XML for $dom, but you can use any $dom->loadHTMLFile($file) to test (!).
<?xml version="1.0" encoding="utf-8"?>
<article dtd-version="3.0" article-type="research-article" xml:lang="en">
<front><journal-meta>
<journal-title-group><journal-title>text text text</journal-title>
<abbrev-journal-title abbrev-type="acronym">aaaa</abbrev-journal-title>
<abbrev-journal-title abbrev-type="publisher">aaabbb aaa</abbrev-journal-title>
</journal-title-group>
<etc>....</etc>
<history><date date-type="received"><label>Received</label> 9 July 2014</date>
<date date-type="accepted"><label>Accepted</label> 25 July 2014</date>
</history>
</journal-meta></front>
<body>
<p>Nonnnononn onononono nonono</p>
<fn><p><label>XXXXX yyyyy</label>: xxxx@aaa.com</p></fn>
<p>Nonnnononn onononono nonono nonono </p>
</body>
</article>
EDIT1 (versions and logs)
Versions:
- libxml2: 2.8.0+dfsg1-7+wheezy1
- php5: 5.4.4-14+deb7u14
- apache2: 2.2.22-13+deb7u3
Logs: where? I only know of /var/log/apache2/error.log, but there is no error there (only the usual "File does not exist" for a PNG, which also appears on successful HTTP requests).
... on this machine, running again today, no major error was reported after the HTTP server crashed, only "File does not exist: /var/www/favicon.ico" before the crash... But I was also running it on an Ubuntu machine, where I did find (!) a report matching the date and time of a crash:
[Wed Oct 15 20:16:16.840578 2014] [core:notice] [pid 1770] AH00051: child pid 14873 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Wed Oct 15 20:16:16.840684 2014] [core:notice] [pid 1770] AH00051: child pid 14879 exit signal Segmentation fault (11), possible coredump in /etc/apache2
*** Error in `/usr/sbin/apache2': corrupted double-linked list: 0x00007f457b81af70 ***
[Wed Oct 15 20:16:56.886473 2014] [core:notice] [pid 1770] AH00051: child pid 14844 exit signal Aborted (6), possible coredump in /etc/apache2
[Wed Oct 15 20:16:57.887638 2014] [core:notice] [pid 1770] AH00051: child pid 14894 exit signal Segmentation fault (11), possible coredump in /etc/apache2
Yes, a big crash, with no clue as to why. (I remember that the "standard coredump problem" in LibXML2 is deleting or writing to nodes that do not exist.)
While I didn't find anything odd about the code (I tested it on my machine with a few XML files and found no problems), I suspect that something uses it in a way that leads to infinite recursion.
Functions that recurse too deeply are known to cause PHP to SEGFAULT. [1, 2]
Either that, or a serious PHP/libxml2 bug.
Perhaps the problem lies elsewhere?