PHP DOMDocument: extract links with their anchor text or image alt
Problem description:
I want to extract all the links on a page, each with its anchor text, or with the alt attribute of an image inside the link if that image comes first.
$html = '<a href="lien.fr">Anchor</a>';
Must return "lien.fr;Anchor"
$html = '<a href="lien.fr"><img alt="Alt Anchor">Anchor</a>';
Must return "lien.fr;Alt Anchor"
$html = '<a href="lien.fr">Anchor<img alt="Alt Anchor"></a>';
Must return "lien.fr;Anchor"
I did:
$doc = new DOMDocument();
$doc->loadHTML($html);
$out = [];
$n = 0;
$links = $doc->getElementsByTagName('a');
foreach ($links as $element) {
    $href = $img_alt = $anchor = "";
    $href = $element->getAttribute('href');
    $n++;
    // Skip shopping-cart links ("panier" is French for "cart")
    if (strpos($href, "panier?") === false) {
        // Prefer the alt text of an <img> if it is the link's first child
        if ($element->firstChild->nodeName == "img") {
            $imgs = $element->getElementsByTagName('img');
            foreach ($imgs as $img) {
                if ($anchor = $img->getAttribute('alt')) {
                    break;
                }
            }
        }
        // Otherwise fall back to the link's text content
        if (($anchor == "") && ($element->nodeValue)) {
            $anchor = $element->nodeValue;
        }
        $out[$n]['link'] = $href;
        $out[$n]['anchor'] = $anchor;
    }
}
This seems to work, but it fails when there is whitespace or indentation inside the link, as in:
$html = '<a href="link.fr">
<img src="ceinture-gris" alt="alt anchor"/>
</a>';
Here $element->firstChild is the whitespace text node (the newline and indentation before <img>), so its nodeName is "#text" rather than "img".
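The newline and indentation after <a ...> are parsed as a DOMText node, which is why the firstChild check misses the image. Below is a minimal sketch that reproduces this and keeps the original firstChild-style logic. It assumes only whitespace-only text nodes and comments should be ignored. firstSignificantChild() is a hypothetical helper, not a DOM method:

```php
<?php
// firstSignificantChild() is a hypothetical helper (not part of PHP's DOM
// extension): it returns the first child that is neither a whitespace-only
// text node nor a comment, or null if there is none.
function firstSignificantChild(DOMNode $parent): ?DOMNode
{
    foreach ($parent->childNodes as $child) {
        // Newlines/indentation become text nodes that trim() empties
        if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) === '') {
            continue;
        }
        if ($child->nodeType === XML_COMMENT_NODE) {
            continue;
        }
        return $child;
    }
    return null;
}

// The whitespace case from the question
$html = '<a href="link.fr">
    <img src="ceinture-gris" alt="alt anchor"/>
</a>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$link = $doc->getElementsByTagName('a')->item(0);

$rawFirst = $link->firstChild->nodeName;              // "#text" (the whitespace)
$realFirst = firstSignificantChild($link)->nodeName;  // "img"

echo $rawFirst, "\n", $realFirst, "\n";
```

In the question's loop, firstSignificantChild($element) would then take the place of $element->firstChild in the nodeName comparison.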
Answer
Something like this:
$doc = new DOMDocument();
$doc->loadHTML($html);
// One "href;anchor" string per link
$results = [];
// List of img tag attributes that will be tried by the loop below
// (in the order specified in this array!)
$img_attributes = ['alt', 'src', 'title'];
$links = $doc->getElementsByTagName('a');
foreach ($links as $element) {
    $href = trim($element->getAttribute('href'));
    if ($href === '') {
        continue; // skip links without an href
    }
    $anchor = '';
    // Walk the direct children in document order: the first non-empty text
    // node or img attribute wins. Whitespace-only text nodes (newlines,
    // indentation) are skipped because trim() turns them into ''.
    foreach ($element->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $anchor = trim($child->nodeValue);
        } elseif ($child->nodeName == 'img') {
            foreach ($img_attributes as $attr_name) {
                $anchor = trim($child->getAttribute($attr_name));
                if ($anchor !== '') {
                    break;
                }
            }
        }
        if ($anchor !== '') {
            break;
        }
    }
    $results[] = $href . ';' . $anchor;
}
// One line per link, e.g. "lien.fr;Alt Anchor"
echo implode("\n", $results);
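The loop above only inspects the direct children of each <a>. An anchor wrapped in another element, for example <a href="x.fr"><span>Anchor</span></a>, would therefore be missed. If your pages can contain such markup (an assumption, since the question's examples do not show it), a recursive, depth-first variant could look like the sketch below. findAnchor() and extractLinks() are hypothetical names. libxml_use_internal_errors() is only there to keep parse warnings from sloppy real-world HTML out of the output:

```php
<?php
// Hypothetical helpers (not part of PHP's DOM extension). Assumption: the
// anchor text or <img> may be nested inside wrappers such as <span>.

// Depth-first search in document order: returns the first non-empty text
// node, or the first non-empty attribute (tried in $imgAttributes order)
// of an <img>, whichever comes first. Returns '' if nothing is found.
function findAnchor(DOMNode $node, array $imgAttributes): string
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                return $text;
            }
        } elseif ($child instanceof DOMElement) {
            if ($child->nodeName === 'img') {
                foreach ($imgAttributes as $attr) {
                    $value = trim($child->getAttribute($attr));
                    if ($value !== '') {
                        return $value;
                    }
                }
            } else {
                // Recurse into inline wrappers such as <span> or <strong>
                $found = findAnchor($child, $imgAttributes);
                if ($found !== '') {
                    return $found;
                }
            }
        }
    }
    return '';
}

// Returns one "href;anchor" string per <a> that has a non-empty href.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = trim($link->getAttribute('href'));
        if ($href !== '') {
            $results[] = $href . ';' . findAnchor($link, ['alt', 'src', 'title']);
        }
    }
    return $results;
}

print_r(extractLinks('<a href="lien.fr">Anchor</a><a href="x.fr"><span><img alt="Nested"></span></a>'));
```

For the question's four inputs this sketch gives lien.fr;Anchor, lien.fr;Alt Anchor, lien.fr;Anchor and link.fr;alt anchor. The recursion only changes the result when the first text or image is nested inside another element.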