How can I use PHP to search an XML file for multiple keywords and return the containing tag?
I have an XML file like this, which stores video subtitles:
<videos>
<video>
<id>1</id>
<enSub>Hello Foo! Good morning!</enSub>
<cnSub>你好 Foo! 早上好!</cnSub>
</video>
<video>
<id>2</id>
<enSub>Hello Bar! Good afternoon!</enSub>
<cnSub>你好 Bar! 下午好!</cnSub>
</video>
</videos>
I want to search this XML for certain keywords. For example, if I enter "hello morning" in the search box, the search should find the video element with id "1".
As far as I can tell, PHP's XPath support can only look for a single keyword at a time, and it has to iterate through the whole tree. I'm not confident that I can write a function that performs well.
I tried an external service, Google Custom Search, to search my site, but that doesn't work because I don't have a separate page for each video. I pass the video id as a parameter to a single video-play page.
I also thought of using a regular expression, but I don't know how to handle the keywords appearing in a different order.
So, is there a search engine I can use to search for multiple keywords and pinpoint a video? I designed this to help my users quickly find a video they have watched.
I googled a lot, but Google is really slow from where I am in China, and sometimes I can't access it at all. I tried "multiple keywords search xml" as the search terms. Maybe my English isn't good enough for Google to understand my intent. I hope you understand my question.
Thank you so much!!
Please see my example code below on how to accomplish this.
<?php
$xml = <<<XML
<videos>
<video>
<id>1</id>
<enSub>Hello Foo! Good morning!</enSub>
<cnSub>你好 Foo! 早上好!</cnSub>
</video>
<video>
<id>2</id>
<enSub>Hello Bar! Good afternoon!</enSub>
<cnSub>你好 Bar! 下午好!</cnSub>
</video>
</videos>
XML;
// Lowercase the XML so we can do a case-insensitive search. The search text
// must be lowercase too, and note that tag names are lowercased as well.
$xml = strtolower($xml);
// Create a DOMDocument based on the xml.
$dom = new DOMDocument;
$dom->loadXML($xml);
// Create an xpath based on the dom document so we can search it.
$xpath = new DOMXpath($dom);
// Search for any video tag that contains the text good morning.
$nodes = $xpath->query('//video[contains(.,\'good morning\')]');
// Iterate all nodes
foreach($nodes as $node){
// find the ID node and print its content.
var_dump($xpath->query('id',$node)->item(0)->textContent);
}
-- Edit
I reread your post and it looks like you're using keywords and not strings. If that's the case, then try this snippet on for size:
<?php
$xml = <<<XML
<videos>
<video>
<id>1</id>
<enSub>Hello Foo! Good morning!</enSub>
<cnSub>你好 Foo! 早上好!</cnSub>
</video>
<video>
<id>2</id>
<enSub>Hello Bar! Good afternoon!</enSub>
<cnSub>你好 Bar! 下午好!</cnSub>
</video>
</videos>
XML;
// Lowercase the XML so we can do a case-insensitive search. The keywords
// below must be lowercase too, and note that tag names are lowercased as well.
$xml = strtolower($xml);
// Create a DOMDocument based on the xml.
$dom = new DOMDocument;
$dom->loadXML($xml);
// Create an xpath based on the dom document so we can search it.
$xpath = new DOMXpath($dom);
// Define the search keywords
$searchKeywords = array('good','hello');
// Iterate all of them to make them into valid xpath
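$searchKeywords = array_map(
function($keyword){
    // XPath 1.0 has no escape character for quotes inside a string literal,
    // so pick a delimiter the keyword does not contain.
    if (strpos($keyword, "'") === false) {
        return "contains(.,'".$keyword."')";
    }
    if (strpos($keyword, '"') === false) {
        return 'contains(.,"'.$keyword.'")';
    }
    // The keyword contains both quote types: split it on single quotes and
    // glue the pieces back together with concat().
    return "contains(.,concat('".str_replace("'", "',\"'\",'", $keyword)."'))";
},
$searchKeywords
);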
// Implode all the keywords using "and"; you could change this to
// an "or" condition if you so desire.
$searchKeywords = implode(' and ',$searchKeywords);
// The search keywords now look like contains(.,'good') and contains(.,'hello')
// Search for any video tag that contains all of the keywords.
$nodes = $xpath->query('//video['.$searchKeywords.']');
// Iterate all nodes
foreach($nodes as $node){
// find the ID node and print its content.
var_dump($xpath->query('id',$node)->item(0)->textContent);
}
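If the keywords arrive as one string from a search box, the same idea could be wrapped in a small helper. This is only a sketch under a few assumptions: the function name searchVideoIds is made up for illustration, quote characters are simply stripped from keywords, and instead of lowercasing the whole document it uses XPath's translate() to fold ASCII letters to lower case, so tag names and the original text stay untouched.

```php
<?php
// Hypothetical helper: returns the ids of all <video> elements whose text
// contains every whitespace-separated keyword in $query (ASCII case-insensitive).
function searchVideoIds(string $xml, string $query): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);

    // Split the raw input on whitespace and drop empty pieces.
    $keywords = preg_split('/\s+/u', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
    if (!$keywords) {
        return array();
    }

    // XPath 1.0 has no lower-case(), so translate() maps A-Z to a-z instead.
    $lowerText = "translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')";

    $conditions = array();
    foreach ($keywords as $keyword) {
        // Assumption for this sketch: quotes are removed from keywords,
        // because XPath 1.0 string literals cannot escape them.
        $keyword = str_replace(array("'", '"'), '', $keyword);
        if ($keyword !== '') {
            $conditions[] = "contains($lowerText, '$keyword')";
        }
    }
    if (count($conditions) === 0) {
        return array();
    }

    // Every condition must hold, so the keywords may appear in any order.
    $ids = array();
    foreach ($xpath->query('//video[' . implode(' and ', $conditions) . ']') as $node) {
        $ids[] = $xpath->query('id', $node)->item(0)->textContent;
    }
    return $ids;
}
```

Called as searchVideoIds($xml, 'hello morning') with the original, non-lowercased XML from the question, this should match only the first video, because each keyword becomes its own contains() condition.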
First of all, your XML is messy: the opening and closing tags have to match. You can use DOMDocument to manipulate XML.
<?php
// The raw search input; it is split into individual keywords on spaces.
$searchStr ="hello afternoon";
$searchArr = explode(" ",$searchStr);
$result = array();
$xmlData = "<videos>
<video>
<id>1</id>
<enSub>Hello Foo! Good morning!</enSub>
<cnSub>你好 Foo! 早上好!</cnSub>
</video>
<video>
<id>2</id>
<enSub>Hello Bar! Good afternoon!</enSub>
<cnSub>你好 Bar! 下午好!</cnSub>
</video>
</videos>";
$dom = new DOMDocument();
$dom->loadXML($xmlData);
foreach ($dom->documentElement->childNodes as $node) {
if($node->nodeType==1){
$enSub = $node->getElementsByTagName('enSub')->Item(0)->nodeValue;
$cnSub = $node->getElementsByTagName('cnSub')->Item(0)->nodeValue;
$id = $node->getElementsByTagName('id')->Item(0)->nodeValue;
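// Keep the video only if every keyword occurs in the English subtitle.
// stripos() is case-insensitive, and the check must be against false with
// a strict comparison because a match at position 0 is also falsy.
$matchesAll = true;
foreach($searchArr as $val){
if( stripos($enSub,$val) === false ){
$matchesAll = false;
break;
}
}
if($matchesAll){
$result[$id] = array(
'id'=>$id,
'enSub'=>$enSub,
'cnSub'=>$cnSub
);
}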
}
}
echo "<pre>";
print_r($result);
I guess you could use a search server like ElasticSearch. It uses Lucene to index any kind of content, and the indexed content can then be queried via a JSON API.
Of course, this only makes sense if you are constantly working with a large amount of data.
The other approach would be to parse the xml and build up an array which has each term in the sub-tag as an index. The value would then be an array containing the ids of the movies which have that term in their respective tag. Basically you are building up a simple data index of your own.
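Building such an index might look roughly like the sketch below. The function name buildIndex is made up for illustration, and it makes two assumptions: only the enSub text is indexed, and words are lower-cased and split on anything that is not an ASCII letter or digit, so search terms would need to be lower-cased the same way before lookup.

```php
<?php
// Hypothetical helper: maps each lower-cased word of every <enSub> element
// to the list of video ids whose subtitle contains that word.
function buildIndex(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $index = array();
    foreach ($dom->getElementsByTagName('video') as $video) {
        $id = (int) $video->getElementsByTagName('id')->item(0)->textContent;
        $text = strtolower($video->getElementsByTagName('enSub')->item(0)->textContent);

        // Split on anything that is not a letter or digit, dropping empty pieces.
        $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

        // array_unique() keeps an id from being listed twice for one word.
        foreach (array_unique($words) as $word) {
            $index[$word][] = $id;
        }
    }
    return $index;
}
```

The XML is traversed once, and the resulting array could be cached (for example serialized to a file) so that later searches never touch the XML at all.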
You could then query your index like this:
<?php
$index = array(
'Hello' => array(1,3),
'World' => array(1),
'Good' => array(2),
'Morning' => array(2),
'Vietnam' => array(2,3),
);
$searchTerms = array('Hello', 'World');
$found = null;
foreach($searchTerms as $term){
if(array_key_exists($term, $index)){
if(is_null($found)){
$found = $index[$term];
} else {
$found = array_intersect($found, $index[$term]);
}
} else {
$found = array();
break;
}
}
print_r($found);
The main benefit of this approach is that you would only have to traverse the xml document once while having a rather fast search. BTW - if you want to treat the search terms with OR instead of AND you can use array_merge and array_unique instead of array_intersect.
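A minimal sketch of that OR variant, reusing the sample index from the snippet above (the search term 'Missing' is added only to show that unknown terms are skipped):

```php
<?php
$index = array(
    'Hello' => array(1,3),
    'World' => array(1),
    'Good' => array(2),
    'Morning' => array(2),
    'Vietnam' => array(2,3),
);
$searchTerms = array('Hello', 'Vietnam', 'Missing');

$found = array();
foreach ($searchTerms as $term) {
    if (array_key_exists($term, $index)) {
        // Collect the ids of every term that exists in the index.
        $found = array_merge($found, $index[$term]);
    }
}
// Drop ids that were collected more than once; sort() also reindexes from 0.
$found = array_unique($found);
sort($found);
print_r($found);
```

Unlike the AND version, a term that is missing from the index does not empty the result here; it just contributes nothing.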
Somewhere in the middle would be the approach to set up a real database like MySQL and do the above search in a query.
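For the database route, the AND search could be assembled as a parameterised query along these lines. This is a sketch only: the table name videos, the column names id and en_sub, and the helper name buildSearchQuery are all assumptions, since no schema is given here, and it uses plain LIKE matching rather than any full-text feature.

```php
<?php
// Hypothetical helper: builds a parameterised query that requires every
// keyword to appear in the en_sub column. Table and column names are assumed.
function buildSearchQuery(array $keywords): array
{
    $conditions = array();
    $params = array();
    foreach ($keywords as $keyword) {
        $conditions[] = 'en_sub LIKE ?';
        // Escape the LIKE wildcards (and the escape character itself) so they
        // are matched literally, then wrap the keyword in % for "contains".
        $escaped = str_replace(
            array('\\', '%', '_'),
            array('\\\\', '\\%', '\\_'),
            $keyword
        );
        $params[] = '%' . $escaped . '%';
    }
    // With no keywords, return a condition that matches nothing.
    $where = $conditions ? implode(' AND ', $conditions) : '1 = 0';
    return array('SELECT id FROM videos WHERE ' . $where, $params);
}

list($sql, $params) = buildSearchQuery(array('hello', 'morning'));
echo $sql, "\n";
print_r($params);
```

The returned pair could then be passed to a PDO connection with prepare($sql) and execute($params). Whether LIKE matches case-insensitively depends on the column's collation.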
It really depends on what you want to accomplish.