Gunicorn workers and threads

Question:

In terms of Gunicorn, I am aware there are various worker classes, but for this conversation I am just looking at the sync and async types.

As far as I understand...

sync
workers = (2 * cpu) + 1
worker_class = sync

async (gevent)
workers = 1
worker_class = gevent
worker_connections = a value (let's say 2000)

So, on a 4-core system, using sync workers I can have a maximum of 9 connections being processed in parallel. With async I can have up to 2000, with the caveats that come with async.
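The arithmetic behind those two ceilings can be written out as a short sketch. The helper names are hypothetical, and the (2 * cpu) + 1 rule is only the sizing rule of thumb quoted above, not something Gunicorn enforces.

```python
def sync_capacity(cpu_cores):
    """Requests in flight with sync workers: one request per worker process.

    Uses the (2 * cpu) + 1 rule of thumb from the question (an assumption,
    not a hard limit).
    """
    return (2 * cpu_cores) + 1


def async_capacity(workers, worker_connections):
    """Upper bound on simultaneous connections with gevent-style workers."""
    return workers * worker_connections


print(sync_capacity(4))         # 9
print(async_capacity(1, 2000))  # 2000
```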

Questions

  • So where do threads fit in? Can I add threads to both the sync and async worker types?
  • What is the best option for gunicorn workers if I want to place gunicorn in front of a Django API that must process hundreds of requests in parallel?
  • Are the gevent and sync worker classes thread-safe?

Let me attempt an answer. Assume that at the beginning my deployment has only a single gunicorn worker, so I can handle only one request at a time. The worker's job is just to call google.com and get the search results for a query. Now I want to increase my throughput. I have the options below.

Adding threads is the easiest option. Since threads are more lightweight (less memory consumption) than processes, I keep only one worker and add several threads to it. The worker can then accept more than one request at a time. If it has, say, 4 threads, it can handle 4 requests at once. Fantastic. So why would I ever need more workers?
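To make the threading argument concrete, here is a minimal sketch that uses only the standard library. It is not how Gunicorn is implemented internally; `fake_search` is a hypothetical stand-in for the call to google.com, with `time.sleep` simulating the network wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_search(query, delay=0.2):
    """Hypothetical stand-in for an I/O-bound call such as an HTTP request."""
    time.sleep(delay)  # sleeping releases the GIL, much like waiting on a socket
    return f"results for {query}"


def handle_concurrently(queries, threads=4):
    """Serve all queries on a pool of threads and report the wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(fake_search, queries))
    return results, time.perf_counter() - start


results, elapsed = handle_concurrently(["a", "b", "c", "d"])
print(results)
print(f"4 requests on 4 threads took {elapsed:.2f}s")
```

Handled one after another, the four simulated requests would need about 0.8 seconds. On four threads the waits overlap, so the total stays close to a single request's delay.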

To answer that, assume that I need to do some work on the search results that Google returned. For instance, I might also want to calculate a prime number for each result. Now my workload is compute-bound and I hit the problem with Python's global interpreter lock (GIL). Even though I have 4 threads, only one thread can actually process results at a time. This means that to get true parallel performance I need more than one worker.

So I need more workers when I need true parallel processing. Each worker can call google.com, get the results and do any processing, all in parallel with the other workers. Fantastic. But the downside is that processes are heavier, and my system might not keep up if I keep adding workers to get more parallelism. So the best solution is to increase the number of workers and also add more threads to each worker.
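A combined setup could look roughly like the following gunicorn.conf.py sketch. The specific numbers are assumptions chosen for illustration and should be tuned against your own workload. `workers`, `threads` and `worker_class` are Gunicorn settings, and `gthread` is the worker class that the `threads` setting applies to.

```python
# gunicorn.conf.py (illustrative values, not a recommendation)
import multiprocessing

# One process per core for compute-bound work (an assumed starting point).
workers = multiprocessing.cpu_count()

# Threads inside each worker process, for overlapping I/O waits.
threads = 4

# The threaded worker class; the threads setting applies to this class.
worker_class = "gthread"
```

Assuming a hypothetical project module named myproject.wsgi, this would be started with `gunicorn -c gunicorn.conf.py myproject.wsgi`.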

I guess this needs no further explanation.

Now why would I ever want to switch to an async worker? To answer, remember that even threads consume memory. The gevent library implements coroutines (a construct that you can look up), which give you thread-like concurrency without having to create threads. So if you configure gunicorn to use the gevent worker class, you get the benefit of NOT having to create threads in your workers. Think of it as getting threads without having to create them explicitly.
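The idea of getting concurrency without creating threads can be illustrated with the standard library. Note the swap: this sketch uses asyncio coroutines as a stand-in for gevent's coroutines, and it is not what Gunicorn's gevent worker runs internally. `fake_request` is a hypothetical placeholder for an I/O-bound request.

```python
import asyncio
import time


async def fake_request(i, delay=0.1):
    """Hypothetical I/O-bound request; awaiting yields control to other coroutines."""
    await asyncio.sleep(delay)
    return i


async def serve(n):
    """Run n simulated requests concurrently on a single thread."""
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def run_many(n=2000):
    """Return the results and the wall-clock time for n concurrent requests."""
    start = time.perf_counter()
    results = asyncio.run(serve(n))
    return results, time.perf_counter() - start


results, elapsed = run_many(2000)
print(f"{len(results)} simulated requests finished in {elapsed:.2f}s")
```

All 2000 simulated requests wait at the same time on one thread, which is the same property that lets a single async worker hold a large number of open connections.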

So, to answer your question: if you are using a worker_class other than sync, you do not need to increase the number of threads in your gunicorn configuration. You can do it, by all means, but it kind of defeats the purpose.

Hope this helps.

I will also attempt to answer the specific questions.

  • No, the threads option is not available for the async worker classes. This actually needs to be made clearer in the documentation. I wonder why that has not happened.

  • This is a question that needs more knowledge of your specific application. If processing these hundreds of parallel requests only involves I/O operations, like fetching from a DB, saving, or collecting data from some other application, then you can make use of the threaded worker. But if that is not the case and you want to execute on an n-core CPU because the tasks are extremely compute-bound, maybe like calculating primes, you need to make use of the sync worker. The reasoning for async is slightly different. To use async, you need to be sure that your processing is not compute-bound, because an async worker will not be able to make use of multiple cores. The advantage you get is that the memory multiple threads would take is not needed. But you have other issues, like libraries that are not monkey-patched. Move to async only if the threaded worker does not meet your requirements.

  • Sync, non-threaded workers are the best option if you want absolute thread safety among your libraries.