如何从同一列中包含3个以上所需单词的MySQL数据库行中进行选择?

如何从同一列中包含3个以上所需单词的MySQL数据库行中进行选择?

问题描述:

I have a large database (over 2 million rows) where I store statistics from users. Users have Unique IDs which contain parts (for example hashed MAC address). The example Unique ID string is:

"AAAAAA-BBBBBB-CCCCCC-DDDDDD-EEEEEE-FFFFFFF"

Sometimes a part of the Unique ID changes and then it's for example:

"ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX"

I want to identify users and select all rows, where at least 3 (or different value) Unique ID parts match, using PHP. Is there any useful syntax which I could use to do that? I know that I can select all rows containing at least 1 Unique ID part, then explode to an array by the "-" character and check if contains at least 3 same words, but I believe that it isn't the fastest possible way.

Basically, I would like to use something like:

WHERE `UniqueID` REGEXP 'AAAAAA|BBBBBB|CCCCCC|DDDDDD|EEEEEE|FFFFFFF';

But selecting only if 3 of 6 words match.

我有一个大型数据库(超过200万行),我存储用户的统计信息。 用户具有包含部分的唯一ID(例如散列的MAC地址)。 示例唯一ID字符串为: p>

“AAAAAA-BBBBBB-CCCCCC-DDDDDD-EEEEEE-FFFFFFF” p >

有时,唯一ID的一部分会发生变化,例如: p>

“ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX” p>

我想识别用户并使用PHP选择所有行,其中至少3个(或不同值)唯一ID部分匹配。 是否有任何有用的语法可以用来做到这一点? 我知道我可以选择包含至少1个唯一ID部分的所有行,然后通过“ - ”字符爆炸到数组并检查是否包含至少3个相同的单词,但我相信这不是最快的方法。 p>

基本上,我想使用类似的东西: p>

  WHERE`NitidID`REGESP'AAAAAA | BBBBBB | CCCCCC | DDDDDD | EEEEEE  | FFFFFFF'; 
  code>  pre> 
 
 

但只选择6个单词中的3个匹配。 p> div>

YOU MIGHT WANT TO CHECK PERFORMANCE

Consider this query which uses inStr to check if a string is available within your id string. i used substring_index to extract the parts. just in case if you are interested in splitting or extracting part of the id string to create a temporary table but for this answer you can ignore it.

Ideally create your own stored procedure with keyword1, keyword2 & keyword3 as in parameter and then you can perform the search and return results.

Where condition checks whether the keyword1, 2 & 3 are found in your id string.

select 
    'ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX' as Id,
    substring_index('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','-', 1) as part1,
    substring_index(substring_index('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','-', 2),'-', -1) as part2,
    substring_index(substring_index('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','-', 3),'-', -1) as part3,
    substring_index(substring_index('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','-', 4),'-', -1) as part4,
    substring_index(substring_index('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','-', 5),'-', -1) as part5,
    substring_index(substring_index('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','-', 6),'-', -1) as part6
from dual
WHERE

    instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','BBBBBB') >= 1 -- keyword1
    and instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','CCCCCC') >= 1 -- keyword2
    and instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX','DDDDDD') >= 1 -- keyword3
;

EDIT if the above query is working you can add your logic.

select 
    'ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX' as Id


from dual
WHERE
    (
    -- below logic gives true when 3 or more keywords are found. change this accordingly
        (instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX',ik1) >0)+
        (instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX',ik2) >0)+
        (instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX',ik3) >0)+
        (instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX',ik4) >0)+
        (instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX',ik5) >0)+
        (instr('ZZZZZZ-BBBBBB-CCCCCC-DDDDDD-EEEEEE-XXXXXXX',ik6) >0)
    ) >=3

;

Thanks to @krishKM I found the best solution for my Unique IDs selection using PHP.

First I made a function which puts Unique ID parts to an array, but keeps "delimeters" on both sides. It helps to prevent incorrect selections, for example when the ID part is very short.

function SplitUniqueIDKeepDelimeters($UniqueID, $Delimeter = "-"){
    $UniqueIDParts = array();
    $Remaining = $UniqueID;
    $pos = strpos($Remaining, $Delimeter);
    $firstPart = true;
    while ($pos !== false) {
        $Part = substr($Remaining,0,$pos + 1);
        if(!$firstPart){
            $Part = $Delimeter.$Part;
        }
        $firstPart = false;
        $UniqueIDParts[] = $Part;
        $Remaining = substr($Remaining,$pos+1);
        $pos = strpos($Remaining, $Delimeter);
    }
    if(!$firstPart){
        $Remaining = $Delimeter.$Remaining;
    }
    $UniqueIDParts[] = $Remaining;
    return $UniqueIDParts;
}

Then based on @krishKM's answer I created the selection syntax:

function GenerateSelectingString($UniqueIDParts, $RowName = "UniqueID", $HowManyMatches = 3){
    $First = true;
    $FinishedString = " (";
    foreach ($UniqueIDParts as $key => $value){
        if(!$First){
            $FinishedString .= "+";
        }
        $First = false;
        $FinishedString .= "(instr(`".$RowName."`,'".$value."') > 0)";

    }
    $FinishedString .= ") >=".$HowManyMatches;
    return $FinishedString;
}

Test:

$TestArray = SplitUniqueIDKeepDelimeters("21680-7886-2761736077-72316693-753974682-041548389245");
$TestSelectString = GenerateSelectingString($TestArray, "UniqueID");

echo "Unique ID parts array:<br>";
var_dump($TestArray);
echo "<br><br>Query String:<br>";
echo $TestSelectString;
echo "<br>";

$sql2 = "SELECT * FROM `Statistics`.`Statistics` WHERE ".$TestSelectString. " order by `lognumber` DESC;";
    $result2 = mysqli_query($conn, $sql2);
    if (mysqli_num_rows($result2) > 0) {
    while($row2 = mysqli_fetch_assoc($result2)) {
    echo "<br>";
    var_dump($row2);
    echo "<br>";
    }
}
else{
    echo "<br>".mysqli_errno($conn) . ": " . mysqli_error($conn) . "
";
}

Output:

Unique ID parts array: array(6) { [0]=> string(6) "21680-" [1]=> string(6) "-7886-" [2]=> string(12) "-2761736077-" [3]=> string(10) "-72316693-" [4]=> string(11) "-753974682-" [5]=> string(13) "-041548389245" } 

Query String: ((instr(`UniqueID`,'21680-') > 0)+(instr(`UniqueID`,'-7886-') > 0)+(instr(`UniqueID`,'-2761736077-')
> 0)+(instr(`UniqueID`,'-72316693-') > 0)+(instr(`UniqueID`,'-753974682-') > 0)+(instr(`UniqueID`,'-041548389245') > 0)) >=3

array(60) {  ...