Parsing the HTML text between <a>, <p>, <pre> and <h1>...<h6> tags using regular expressions

Problem description:

Hello everyone, I need some help. Here is my code:

    private void button1_Click(object sender, EventArgs e)
    {
        // KaynakKodunuCek (Turkish for "fetch source code") is a helper not shown here; it returns the page's HTML.
        string s = KaynakKodunuCek("http://tr.wikipedia.org/wiki/Lale");
        // Matches everything between <a ...> and </a> (tags included).
        Regex regex = new Regex("(?i)<a([^>]+)>(.+?)</a>");
        string gelen = s;
        string inside = null;
        Match match = regex.Match(gelen);
        if (match.Success)
        {
            inside = match.Value;
            richTextBox2.Text = inside;
        }
        string outputStr = "";
        foreach (Match ItemMatch in regex.Matches(gelen))
        {
            Console.WriteLine(ItemMatch);
            inside = ItemMatch.Value;
            // Append a line break after each match.
            outputStr += inside + "\r\n";
        }
        richTextBox2.Text = outputStr;
    }

When I click button1, it parses the HTML into richTextBox2, but the result looks like this:

    <a class="external text" href="//tr.wikipedia.org/w/index.php?title=%C3%96zel:G%C3%BCnl%C3%BCk&amp;page=
    <a class="external text" href="//tr.wikipedia.org/w/index.php?title=Lale&oldid=13373007&amp;diff=cur">1 değişiklik</a>
    <a href="#mw-navigation">kullan</a>

But I want the output to contain only the text between the tags, for example >kontrol edilmiş<.
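The tags appear in the output because ItemMatch.Value is the whole match; the text between the tags is captured separately by the pattern's second group, (.+?). A minimal sketch of reading that group instead, assuming .NET's System.Text.RegularExpressions and widening the pattern to the tags named in the title (TagTextExtractor and ExtractInnerText are hypothetical names, not part of the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagTextExtractor
{
    // Group 1 captures the tag name (a, p, pre, h1..h6); \b keeps it from
    // also matching tags such as <abbr> or <param>. Group 2 captures the
    // content, and \1 requires the closing tag to have the same name.
    // Singleline lets "." cross line breaks inside the content.
    private static readonly Regex TagRegex = new Regex(
        @"<(a|p|pre|h[1-6])\b[^>]*>(.*?)</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the text between the opening and closing tags (group 2),
    // rather than match.Value, which includes the tags themselves.
    public static List<string> ExtractInnerText(string html)
    {
        var results = new List<string>();
        foreach (Match m in TagRegex.Matches(html))
        {
            results.Add(m.Groups[2].Value);
        }
        return results;
    }

    public static void Main()
    {
        string html = "<h1>Lale</h1><p><a href=\"#x\">kontrol edilmiş</a></p>";
        foreach (string text in ExtractInnerText(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Running Main prints Lale and then <a href="#x">kontrol edilmiş</a>: the <p> match swallows the nested <a> element whole, because a regular expression has no notion of element nesting. That is exactly the kind of case where this approach starts to break down.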

HTML wasn't designed to be parsed with regex. You're better off using something like the HTML Agility Pack.
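A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced: it loads the HTML into a DOM, selects the same tags with an XPath union, and reads each element's InnerText. HapTextExtractor and ExtractText are hypothetical names; to load the Wikipedia page directly, HtmlWeb.Load(url) also returns an HtmlDocument.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class HapTextExtractor
{
    // XPath union selecting every <a>, <p>, <pre> and <h1>..<h6> element.
    private const string TagXPath =
        "//a | //p | //pre | //h1 | //h2 | //h3 | //h4 | //h5 | //h6";

    // Parses the HTML into a DOM and returns the text content of each
    // selected element; InnerText drops any nested markup.
    public static List<string> ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(TagXPath);
        if (nodes == null)
        {
            return results;
        }
        foreach (HtmlNode node in nodes)
        {
            // InnerText may still contain entities such as &amp;; DeEntitize decodes them.
            results.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return results;
    }

    public static void Main()
    {
        string html = "<h1>Lale</h1><p><a href=\"#x\">kontrol edilmiş</a></p>";
        foreach (string text in ExtractText(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Unlike the regex, the parser understands nesting: for <p><a>kontrol edilmiş</a></p> both the <p> and the <a> yield plain kontrol edilmiş. Inside a WinForms form, qualify the type as HtmlAgilityPack.HtmlDocument, since System.Windows.Forms also defines an HtmlDocument class.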