Understanding the Python GIL: I/O bound vs CPU bound
From the Python threading documentation:
In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
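Here is a minimal sketch of the I/O-bound case the documentation describes. It uses `time.sleep` as a stand-in for network waits, because sleeping, like blocking socket I/O, releases the GIL. Four threads that each wait 0.2 s should finish in roughly 0.2 s of wall-clock time, not 0.8 s. The function names here are illustrative only.

```python
import threading
import time


def fake_io_task(delay: float) -> None:
    # time.sleep releases the GIL, so other threads can run while this one
    # waits, much like a thread blocked on a network read.
    time.sleep(delay)


def run_io_bound(n_threads: int = 4, delay: float = 0.2) -> float:
    """Run n_threads sleeping tasks concurrently and return the elapsed wall-clock time."""
    threads = [
        threading.Thread(target=fake_io_task, args=(delay,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_io_bound()
    # Roughly 0.2 s on a typical machine, because the four waits overlap.
    print(f"4 x 0.2 s waits took {elapsed:.2f} s")
```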
I currently have a thread worker like this:
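from queue import Empty  # Python 3 name; Python 2 used `from Queue import Empty`

def worker(queue):
    # Drain the shared queue without blocking; stop once it is empty.
    queue_full = True
    while queue_full:
        try:
            url = queue.get(False)  # non-blocking get; raises Empty when drained
            w = Wappalyzer(url)     # the asker's analyzer class, defined elsewhere
            w.analyze()             # 1) fetch the page, 2) analyze the HTML
            queue.task_done()
        except Empty:
            queue_full = False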
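To try this worker pattern without network access, here is a self-contained sketch. `FakeAnalyzer` is a hypothetical stand-in for the asker's `Wappalyzer` class, and `run` is an illustrative driver. Several threads drain one shared `queue.Queue`. `task_done()` sits in a `finally` so that `q.join()` cannot hang if `analyze()` raises, and a lock guards the shared results list.

```python
import queue
import threading


class FakeAnalyzer:
    """Hypothetical stand-in for Wappalyzer; analyze() does no real work."""

    def __init__(self, url):
        self.url = url

    def analyze(self):
        return f"analyzed {self.url}"


def worker(q, results, lock):
    while True:
        try:
            url = q.get_nowait()  # raises queue.Empty once the queue is drained
        except queue.Empty:
            return
        try:
            outcome = FakeAnalyzer(url).analyze()
            with lock:  # guard the shared results list
                results.append(outcome)
        finally:
            q.task_done()  # mark the item done even if analyze() raises


def run(urls, n_threads=4):
    q = queue.Queue()
    for url in urls:
        q.put(url)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(q, results, lock))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    q.join()  # returns once every queued URL has been marked done
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run(["http://a.example", "http://b.example"])))
    # ['analyzed http://a.example', 'analyzed http://b.example']
```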
Here, w.analyze() does two things:
1. Scrapes the URL using the requests library
2. Analyzes the scraped HTML using the pyv8 JavaScript library
As far as I know, step 1 is I/O bound and step 2 is CPU bound.
Does that mean the GIL applies to step 2 and my program won't work properly?
The GIL description does not say anything about correctness, only about efficiency.
If step 2 is CPU bound, you will not get multicore performance out of threading, but your program will still run correctly.
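A small check of that claim follows, using a pure-Python loop as a hypothetical stand-in for the CPU-bound analysis step. Running the work in threads gives exactly the same results as a sequential run. On a standard CPython build with the GIL, the threads take turns executing bytecode, so you should not expect a speedup. This is only a sketch and the function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    # Pure-Python arithmetic: the thread holds the GIL while this runs.
    total = 0
    for i in range(n):
        total += i * i
    return total


def threaded(inputs):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(cpu_bound, inputs))  # map preserves input order


def sequential(inputs):
    return [cpu_bound(n) for n in inputs]


if __name__ == "__main__":
    inputs = [100_000, 200_000, 300_000, 400_000]
    assert threaded(inputs) == sequential(inputs)
    print("threads and sequential run agree")
```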
If you care about CPU parallelism, you should use Python's multiprocessing library.
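The sketch below shows one way to apply that advice to the worker. It keeps threads for the I/O step and hands the CPU step to `concurrent.futures.ProcessPoolExecutor`, which is built on `multiprocessing`. `fetch` and `analyze` are hypothetical placeholders for the `requests` download and the pyv8 analysis. With the real code, both the analysis function and its inputs and outputs would need to be picklable. The `if __name__ == "__main__":` guard is required on platforms that start worker processes by spawning a fresh interpreter.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch(url: str) -> str:
    # Hypothetical placeholder for the I/O step (e.g. requests.get(url).text).
    return f"<html>{url}</html>"


def analyze(html: str) -> int:
    # Hypothetical placeholder for the CPU step (the pyv8 analysis).
    # Defined at module level so it can be pickled and sent to worker processes.
    return sum(ord(c) for c in html)


def process(urls):
    with ThreadPoolExecutor(max_workers=8) as io_pool:
        pages = list(io_pool.map(fetch, urls))  # threads overlap the network waits
    with ProcessPoolExecutor() as cpu_pool:
        return list(cpu_pool.map(analyze, pages))  # processes can use all cores


if __name__ == "__main__":
    print(process(["http://a.example", "http://b.example"]))
```

Splitting the stages this way means only the HTML strings and the analysis results cross process boundaries. Threads stay responsible for the network waits, where the GIL is not a bottleneck.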