How do I enforce that only one instance of a process runs in the Python Django framework?
I have a Python Django management command that should be called whenever an input file is received, but the command is not safe to run in parallel. An input file should therefore be processed only when no other file is being processed.
One solution I have is to use a lock file: create a lock file at the start of the process and delete it at the end.
I'm worried that if the process crashes, the lock file won't be deleted, and consequently no other files will be processed until we manually remove it.
The solution doesn't need to be specific to Django or even Python. What is the best practice for enforcing that only one instance of this process runs at a time?
As KlausD mentions in his comment, the canonical (and language-agnostic) solution is to use a lock file containing the pid of the running process, so the code responsible for acquiring the lock can check whether that process is still running.
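A minimal sketch of the pid lock file idea, assuming a POSIX system. The function names (`acquire_pid_lock`, `release_pid_lock`) and the lock path are hypothetical, chosen only for illustration. The sketch ignores the race where two processes detect the same stale lock at the same moment, and it does not protect against pid reuse.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this pid currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_holder_pid(path):
    """Return the pid stored in the lock file, or None if missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def acquire_pid_lock(path):
    """Return True if the lock was taken, False if a live process holds it."""
    for _ in range(2):  # the second attempt only runs after removing a stale lock
        try:
            # O_EXCL makes creation atomic: only one caller can create the file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            holder = read_holder_pid(path)
            if holder is not None and pid_is_running(holder):
                return False
            # Assumption: an unreadable or dead-pid lock file is treated as stale.
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False


def release_pid_lock(path):
    """Remove the lock file; call this from a finally block."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```

In a management command, the processing would be wrapped in `if acquire_pid_lock(path): try: ... finally: release_pid_lock(path)`, so a normal exit or an exception removes the file and a hard crash leaves a lock that the next run can recognise as stale.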
An alternative, if you use Redis in your project, is to store the lock in Redis with a TTL that's a bit longer than the worst-case runtime of the task. This ensures the lock is always freed eventually, and also makes it easy to share the lock between multiple servers if needed.
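The Redis variant could look roughly like the sketch below. It assumes `client` is a redis-py `redis.Redis` instance (or anything exposing the same `set`, `get` and `delete` methods). The function names and the random token are illustrative additions, not part of the answer above.

```python
import uuid


def acquire_lock(client, key, ttl_seconds):
    """Return an ownership token if the lock was taken, else None."""
    token = uuid.uuid4().hex
    # SET key token NX EX ttl: create the key only if it is absent,
    # and let Redis expire it automatically after ttl_seconds.
    if client.set(key, token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client, key, token):
    """Delete the lock only if we still own it.

    This check-then-delete is not atomic; a stricter version would
    run the comparison and the delete in a single Lua script.
    """
    value = client.get(key)
    if isinstance(value, bytes):  # redis-py returns bytes by default
        value = value.decode()
    if value == token:
        client.delete(key)
        return True
    return False
```

Storing a token rather than a fixed value means a task that overran its TTL does not delete a lock that another run has acquired in the meantime.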
Is it possible that the process crashes and another process picks up the same pid?
Yes, of course, and that's even rather likely (and this is an understatement) on a server that has been running for a month or more without a reboot, even more so if the server runs a lot of short-lived processes. You will not only have to check whether there's a running process matching this pid, but also get the process stats to inspect the start time, the command line, the parent, etc., and decide how likely it is that it's the same process rather than a new one.
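One way to guard against pid reuse is to record the process start time next to the pid and compare both later. The sketch below is Linux-only: it reads the start time (field 22, in clock ticks since boot) from `/proc/<pid>/stat`. The function names and the "pid start" lock file layout are assumptions made for illustration.

```python
import os


def process_start_ticks(pid):
    """Return the start time of pid in clock ticks since boot, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name (field 2) may contain spaces or parentheses,
    # so split after the last ')'. The remaining list starts at field 3,
    # which puts field 22 (starttime) at index 19.
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[19])


def write_lock(path):
    """Record our pid and start time in the lock file as 'pid start'."""
    pid = os.getpid()
    with open(path, "w") as f:
        f.write(f"{pid} {process_start_ticks(pid)}")


def lock_holder_alive(path):
    """Return True only if the recorded pid exists with the same start time."""
    try:
        with open(path) as f:
            pid_text, start_text = f.read().split()
        pid, start = int(pid_text), int(start_text)
    except (FileNotFoundError, ValueError):
        return False
    return process_start_ticks(pid) == start
```

A recycled pid will almost always have a different start time, so the comparison fails and the lock can be treated as stale. Comparing the command line or the parent as well would tighten the check further.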
Note that this is nothing new: most process monitoring tools face the same problem, so you may want to check how they solved it (gunicorn might be a good starting point here).