Best practice for processing a large amount of data while the user waits (in Rails)?

Problem description:

I have a bookmarklet that, when used, submits all of the URLs on the current browser page to a Rails 3 app for processing. Behind the scenes I'm using Typhoeus to check that each URL returns a 2XX status code. Currently I initiate this process via an AJAX request to the Rails server and simply wait while it processes and returns the results. For a small set, this is very quick, but when the number of URLs is quite large, the user can be waiting for up to, say, 10-15 seconds.
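
For reference, here is a minimal sketch of the per-URL check itself. It swaps Typhoeus for Ruby's standard-library `Net::HTTP` so it runs without extra gems. The helper names `two_xx?` and `url_ok?` are hypothetical, and sending a `HEAD` request rather than a `GET` is an assumption: it keeps responses small, but a few servers reject `HEAD`.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: true when an HTTP status code is in the 2XX range.
def two_xx?(code)
  (200..299).cover?(Integer(code))
end

# Hypothetical stdlib stand-in for the Typhoeus check (not the asker's code).
# Sends a HEAD request; a malformed or non-HTTP URL, a timeout, or a
# refused connection all count as a failed check.
def url_ok?(url, timeout: 5)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host # URI::HTTPS is a subclass

  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: timeout, read_timeout: timeout) do |http|
    two_xx?(http.head(uri.request_uri).code)
  end
rescue StandardError
  false
end
```

If checks like this run one after another, the total time grows with the sum of all the round trips, which is consistent with the long waits described above for large pages.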

I've considered using Delayed Job to process this outside the user's request thread, but it doesn't seem like quite the right use case. The user has to wait until processing finishes to see the results, and Delayed Job may take up to five seconds just to pick up the job, so I can't guarantee that the processing will start as soon as possible. Unfortunately, that extra delay isn't acceptable in this case.

Ideally, what I think should happen is this:

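  • The user clicks the bookmarklet
  • The data is sent to the server for processing
  • A waiting page is returned immediately, while a separate thread is spun off to do the processing
  • The waiting page periodically polls for the processing results via AJAX and updates itself (e.g. "4 of 567 URLs processed...")
  • Once the results are ready, the waiting page is updated with them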
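
The flow above could be sketched as an in-memory job tracker: `start` returns a job id immediately and does the work on a background thread, and `status` returns what the waiting page's AJAX poll would display. The class `UrlCheckJobs`, its methods, and the injected checker block are all hypothetical; the checker stands in for the real 2XX check (Typhoeus in the question).

```ruby
require "securerandom"

# Hypothetical in-memory tracker for the bookmarklet flow (not a real gem).
class UrlCheckJobs
  # checker: a block that takes a URL and returns true for a 2XX response.
  def initialize(&checker)
    @checker = checker
    @jobs = {}
    @lock = Mutex.new
  end

  # Record the job, spin off a thread to do the checks, and return the
  # job id right away so the waiting page can be rendered immediately.
  def start(urls)
    id = SecureRandom.hex(8)
    @lock.synchronize { @jobs[id] = { total: urls.size, done: 0, results: {} } }
    Thread.new do
      urls.each do |url|
        ok = begin
          @checker.call(url)
        rescue StandardError
          false # treat a failed check as "not 2XX" so the job still finishes
        end
        @lock.synchronize do
          job = @jobs[id]
          job[:results][url] = ok
          job[:done] += 1
        end
      end
    end
    id
  end

  # What the AJAX polling endpoint would return for the waiting page.
  def status(id)
    @lock.synchronize do
      job = @jobs.fetch(id)
      finished = job[:done] == job[:total]
      {
        done: job[:done],
        total: job[:total],
        finished: finished,
        message: "#{job[:done]} of #{job[:total]} URLs processed",
        results: finished ? job[:results].dup : nil
      }
    end
  end
end
```

Because the state lives in one process's memory, this only works when the polling requests reach the same process that started the job. On Heroku, with several dynos or after a dyno restart, that is not guaranteed, so the progress would need to live in the database or another shared store instead.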

Some extra details:

  • I'm using Heroku (long-running requests are killed after 30 seconds)
  • Both logged-in and anonymous users can use this feature

Is this a typical way to do this, or is there a better way? Should I just roll my own off-thread processing that updates the DB as it goes, or is there something like Delayed Job that I can use for this (and that works on Heroku)? Any push in the right direction would be much appreciated.

Answer:

I think your latter idea makes the most sense. I would just offload each URL check to its own thread, so all the checks run concurrently, which should be a lot faster than checking them sequentially anyway. As each check finishes, it updates the database (making sure the threads don't step on each other's writes). An AJAX endpoint, which, as you said, the client polls regularly, reads the count of completed checks from the database and returns it. This approach is simple enough that I don't really see the need for any extra components.
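
A minimal sketch of this approach, assuming a mutex-protected in-memory counter stands in for the database row: one thread per URL, each recording its result under a lock, plus a `completed_count` reader for the polling endpoint. `ConcurrentUrlCheck` and its methods are hypothetical names, and the checker block stands in for the real 2XX check.

```ruby
# Hypothetical sketch of the answer's approach (names are illustrative).
class ConcurrentUrlCheck
  attr_reader :total

  # checker: a block that takes a URL and returns true for a 2XX response.
  def initialize(urls, &checker)
    @urls = urls
    @checker = checker
    @total = urls.size
    @completed = 0
    @results = {}
    @lock = Mutex.new # stands in for the DB: serializes every write
  end

  # Start one thread per URL and return the threads (callers may join them).
  def run
    @urls.map do |url|
      Thread.new do
        ok = begin
          @checker.call(url)
        rescue StandardError
          false # a failed request counts as "not 2XX"
        end
        @lock.synchronize do
          @results[url] = ok
          @completed += 1
        end
      end
    end
  end

  # What the polling endpoint would read: the number of finished checks.
  def completed_count
    @lock.synchronize { @completed }
  end

  # A copy of the results recorded so far (URL => true/false).
  def results
    @lock.synchronize { @results.dup }
  end
end
```

With a real database, the lock's job would shift to the database, for example by bumping the counter with a single atomic SQL update (ActiveRecord's `increment_counter` issues one) instead of a read-modify-write in Ruby. For hundreds of URLs, a fixed pool of worker threads, or Typhoeus's own `Hydra` with `max_concurrency`, may be gentler than one thread per URL.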