服务器发事件(SSE)需要数据库操作时,Rails应用程序挂起

问题描述:

我正在学习&第一次在铁轨上做SSE!我的控制器代码:

I'm learning & doing SSE for the first time in rails! My controller code:

  def update
    response.headers['Content-Type'] = 'text/event-stream'
    sse = SSE.new(response.stream, event: 'notice')
    begin
      User.listen_to_creation do |user_id|
        sse.write({id: user_id})
      end
    rescue ClientDisconnected
    ensure
      sse.close
    end
  end

前端:

  var source = new EventSource('/site_update');
  source.addEventListener('notice', function(event) {
    var data = JSON.parse(event.data)
    console.log(data)
  });

模型发布/订阅

class User
  after_commit :notify_creation, on: :create

  def notify_creation
      ActiveRecord::Base.connection_pool.with_connection do |connection|
        self.class.execute_query(connection, ["NOTIFY user_created, '?'", id])
      end
  end

  def self.listen_to_creation
    ActiveRecord::Base.connection_pool.with_connection do |connection|
      begin
        execute_query(connection, ["LISTEN user_created"])
        connection.raw_connection.wait_for_notify do |event, pid, id|
          yield id
        end
      ensure
        execute_query(connection, ["UNLISTEN user_created"])
      end
    end
  end

  def self.clean_sql(query)
    sanitize_sql(query)
  end

  private

  def self.execute_query(connection, query)
    sql = self.clean_sql(query)
    connection.execute(sql)
  end
end

我注意到,如果我正在写SSE,就像在sse.write({time_now: Time.now})这样的教程中那样微不足道,那么一切都很好.在命令行中,CTRL+C成功关闭了本地服务器.

I've noticed that if I'm writing to SSE, something trivial like in a tutorial like... sse.write({time_now: Time.now}), everything works great. In command line, CTRL+C successfully shuts down the local server.

但是,每当我需要编写需要某种数据库操作的东西时,例如当我在进行postgres pub/sub时,例如

However, whenever I need to write something that requires some kind of database action, for example when I'm doing a postgres pub/sub as in this tutorial, then CTRL+C doesn't shut down the local server, it's just stuck and hangs and requires me to manually kill the PID.

在实际启动的服务器上,有时页面刷新也将永远挂起.其他时候,它将引发超时错误:

On the actual spun up server, sometimes a page refresh will hang forever as well. Other times, it will throw a timeout error:

ActiveRecord::ConnectionTimeoutError (could not obtain a connection from the pool within 5.000 seconds (waited 5.001 seconds); all pooled connections were in use):

不幸的是,这个问题在我使用Heroku的生产环境中也仍然存在.我只是收到很多超时错误.但是我想我已经正确配置了Heroku,还配置了本地设置……我的理解是,我只需要有一个较大的池(我有5)来拉连接并允许多个线程.在下面,您将找到一些配置代码.

Unfortunately this issue persists in production as well, where i'm using Heroku. I just get lots of timeout errors. But I think I have Heroku properly configured, and also local settings... my understanding is I just need to have a sizable pool (I have 5) to pull connections from and allow multiple threads. Below you'll find some config code.

没有环境变量,已使用默认值!

# config/database.yml
default: &default
  adapter: postgresql
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  <<: *default
  database: proper_development


# config/puma.rb
workers Integer(ENV['WEB_CONCURRENCY'] || 1)

threads_count = Integer(ENV['MAX_THREADS'] || 5)
threads threads_count, threads_count

preload_app!

rackup      DefaultRackup
port        ENV['PORT']     || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  ActiveRecord::Base.establish_connection
end

如果有帮助,这是我运行rails s

If it's helpful here's the output when I run rails s

=> Booting Puma
=> Rails 5.0.2 application starting in development on http://localhost:3000
=> Run `rails server -h` for more startup options
Puma starting in single mode...
* Version 4.3.3 (ruby 2.4.0-p0), codename: Mysterious Traveller
* Min threads: 0, max threads: 16
* Environment: development
* Listening on tcp://127.0.0.1:3000
* Listening on tcp://[::1]:3000
Use Ctrl-C to stop

这里的问题似乎是puma线程与数据库连接之间缺乏一致性.如果某个连接是由中间件等通过AR启动的,则您编写的代码可能导致两个连接保持在同一请求周期中,直到您收到通知并且线程完成其工作为止. AR缓存每个线程的连接,因此如果发出请求并从池中检出连接,连接将由该连接保留.请查看此问题以了解更多详细信息.如果最终使用连接池检查另外一个连接并使该连接等待,直到您收到Postgres的通知,则可能由正在等待的同一Puma线程保留两个连接.

The issue here seems to be the lack of consistency between the puma threads and the database connections. If some connection was initiated by middleware etc through AR, the code you have written can lead to two connections being held in the same request cycle until you receive a notification and the thread finishes its job. AR caches connections per thread, so if a request was made and connection was checked out from the pool it will be held by that. Look at this issue for more details. If you end up using the connection pool to check out one more connection and make that connection wait till you get a notification from Postgres, potentially two connections can be held by the same Puma thread that is waiting.

要实际执行此操作,请在开发中启动一个新的Rails服务器实例,并向您的SSE端点发出请求.如果您在遇到超时之前才对新启动的服务器发出一个请求,则可能会看到与Postgres的两个连接.因此,即使线程数和连接池大小相同,也可能耗尽了池中的可用连接.一种更简单的方法可能是在签出连接以查看当前正在保留多少个高速缓存的连接之后,才在开发中添加此行.

To see this in action, start a new Rails server instance in development and make a request to your SSE endpoint. If you were getting timeouts before you might see two connections to Postgres while you made just one request to a newly launched server. So, even though your number of threads and connection pool size was same, you might run out of free connections from the pool. An easier way might be to just add this line in development after you checkout a connection to see how many cached connections are being held right now.

def self.listen_to_creation
    ActiveRecord::Base.connection_pool.with_connection do |connection|
      # Print or log this array
      p ActiveRecord::Base.connection_pool.instance_variable_get(:@thread_cached_conns).keys.map(&:object_id)

      begin
        execute_query(connection, ["LISTEN user_created"])
.........
.........

此外,您发布的摘录似乎表明您在开发环境中的大小为5的连接池上正在运行多达16个线程,所以这是一个不同的问题.

Also, the snippets you have posted seem to indicate you are running up to 16 threads on a connection pool of size 5 in development environment, so that is a different issue.

要解决此问题,您需要调查哪个线程正在保持连接,以及是否可以将其重新用于通知或只是增加数据库池大小.

To fix this, you would want to investigate which thread is holding the connection and if you can reuse it for your notification or just increase the DB pool size.

现在,进入SSE本身.由于SSE连接会阻塞并在您当前的设置中保留一个线程.如果对此端点有多个请求,您可能会很快饿死Puma线程本身,从而使请求等待.如果您不希望对此端点有很多请求,这可能会起作用,但是如果您希望,则需要更多的空闲线程,因此您甚至可能希望增加Puma线程数.理想情况下,尽管非阻塞服务器在这里可以更好地工作.

Now, coming to SSE itself here. Since a SSE connection blocks and reserves a thread in your current setup. If you have multiple requests to this endpoint you might quickly starve out of Puma threads itself making requests wait. This might work in case you are not expecting a lot of requests to this endpoint but if you are, you would need more free threads so you might even want to increase the Puma thread count. Ideally, though a non blocking server would work better here.

另外,忘记在 rails中存在问题的情况下添加SSE,它不知道连接已死.您可能让线程无休止地以这种方式等待,直到有一些数据出现,他们才意识到连接不再有效.

Also, forgot to add that SSE in rails has an issue of keeping alive dead connections if it doesn't know the connection is dead. You might have threads endlessly waiting this way until some data comes and they realize the connection is no longer valid.