A Discussion of Multithreaded Parallel Web Page Fetching

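Until now my understanding of multithreading was mostly conceptual, and I had done very little hands-on coding with it. Over the past couple of days I ran into a scraping problem with very strict real-time requirements, so I tried multithreaded programming and evaluated how it performs.
   1. Using the open-source thread pool SmartThreadPool
   The code looks roughly like this: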
   

      static void Main(string[] args)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            Console.WriteLine("Start time: " + DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff"));
            Random rnd=new Random();
            var urls = new List<string>();
            urls.Add("http://www.cnblogs.com/chenmh/p/3944116.html?rnd="+rnd.Next(1,999));
            urls.Add("http://www.cnblogs.com/dolphin0520/p/3932906.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/beatless/p/3944101.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/html5tricks/p/3944054.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/hookjoy/p/3944077.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/superlcr/p/3944045.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/fangkm/p/3943896.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/wunaozai/p/3936295.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/lienhua34/p/3943362.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/xiaoqiang001/p/3942412.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/zeusro/p/nopcommerce_002.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/yexiaochai/p/3942194.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/kaituorensheng/p/3941580.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/knowledgesea/p/3942169.html?rnd=" + rnd.Next(1, 999));
            urls.Add("http://www.cnblogs.com/en-heng/p/3941759.html?rnd=" + rnd.Next(1, 999));

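            SmartThreadPool smartThreadPool = new SmartThreadPool();
            // Note: this queues a single work item, so the loop below
            // visits the URLs one after another on one pool thread.
            IWorkItemResult wir = smartThreadPool.QueueWorkItem(() =>
            {
                foreach (var url in urls)
                {
                    Console.WriteLine("Elapsed: " + sw.Elapsed + " " + GetTitle(url));
                    // System.Threading.Thread.Sleep(1000);
                }
            });
            // Block until the pool has no queued or running work items.
            smartThreadPool.WaitForIdle();
            Console.Read();
        }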
        public static string GetTitle(string url)
        {
            Console.WriteLine("Entered page fetch at: " + DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff"));
            // Fetch the page title
        }
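The body of GetTitle is left out of the listing above. Purely as an illustration, here is a minimal sketch of what it might look like, assuming the page is downloaded with WebClient and the title is pulled out with a regular expression. The names TitleFetcher, TitleRegex and ExtractTitle are hypothetical and do not come from the original code, and the UTF-8 encoding is an assumption about the target pages.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class TitleFetcher
{
    // Matches the first <title>...</title> element, including across line breaks.
    private static readonly Regex TitleRegex = new Regex(
        @"<title[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Pure helper (hypothetical): extracts the title text from an HTML string.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;
        Match m = TitleRegex.Match(html);
        return m.Success
            ? WebUtility.HtmlDecode(m.Groups[1].Value).Trim()
            : string.Empty;
    }

    // Assumed shape of GetTitle: log the entry time, download the page,
    // then extract the title from the downloaded HTML.
    public static string GetTitle(string url)
    {
        Console.WriteLine("Entered page fetch at: "
            + DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff"));
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8; // assumption: pages are UTF-8
            string html = client.DownloadString(url);
            return ExtractTitle(html);
        }
    }
}
```

Keeping the extraction in a separate helper makes it possible to check the parsing logic without touching the network, which also keeps the timing measurements focused on the download itself.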


  


Results of the experiment above:
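It may just be that I am unfamiliar with SmartThreadPool, but it does not seem to live up to its reputation for balancing load and getting the most performance out of a program.
Looking at the monitoring log:
  During the loop, the number of threads entering GetTitle(url) starts at roughly one, a second thread is created after about 20 ms, and from then on there is no visible sign of concurrent execution at all, even though my CPU has 4 cores.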
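One possible explanation, offered tentatively: the listing above hands the pool a single work item that loops over every URL, so the pool has only one job to schedule. In addition, on .NET Framework the HTTP stack limits concurrent connections per host (2 by default for client applications), and every URL here points at the same host. The sketch below queues one work item per URL and raises that limit. ParallelFetch, FetchAll and the fetch parameter are hypothetical names standing in for the article's GetTitle, and I have not measured whether this changes the timings reported here.

```csharp
using System.Collections.Generic;
using System.Net;
using Amib.Threading; // SmartThreadPool library, as used in the article

public static class ParallelFetch
{
    // Queues one work item per URL so the pool can run them on separate threads.
    // "fetch" is a stand-in for GetTitle(url); maxThreads must be at least 1.
    public static string[] FetchAll(
        IList<string> urls,
        System.Func<string, string> fetch,
        int maxThreads)
    {
        // Allow as many simultaneous connections per host as worker threads.
        ServicePointManager.DefaultConnectionLimit = maxThreads;

        var startInfo = new STPStartInfo { MaxWorkerThreads = maxThreads };
        var pool = new SmartThreadPool(startInfo);
        var results = new string[urls.Count];
        try
        {
            for (int i = 0; i < urls.Count; i++)
            {
                int index = i; // copy so each lambda captures its own index
                pool.QueueWorkItem(() => { results[index] = fetch(urls[index]); });
            }
            // Block until every queued work item has finished.
            pool.WaitForIdle();
        }
        finally
        {
            pool.Shutdown();
        }
        return results;
    }
}
```

Called as ParallelFetch.FetchAll(urls, GetTitle, 8), this would let up to eight downloads overlap. Since fetching pages is mostly waiting on the network rather than using the CPU, the number of cores matters less here than the number of requests in flight.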
--------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Using .NET's built-in thread pool, ThreadPool
 

      static void Main(string[] args)