使用Java Google App Engine标准在随机缓存中的内存缓存中出现服务错误

问题描述:

在过去的一个月中,我们的Java Google App Engine Standard Web App在看似随机的时间开始出现奇怪的错误(请参见下面的堆栈跟踪).在这段时间左右,我们进行了以下更改:

In the past month, our Java Google App Engine Standard Web App started getting strange errors at seemingly random times (see stack trace below). Around this time we made the following changes:

  1. 从Java7运行时切换到Java8/Jetty9运行时(这使我们可以更灵活地链接到第三方支付库).

  1. Switch from Java7 runtime to Java8/Jetty9 runtime (which allowed us more flexibility in linking to a 3rd party payments library).

切换到使用Google Cloud SDK(而不是单独的Google App Engine SDK)进行部署.

Switch to deploying with the Google Cloud SDK, instead of the separate Google App Engine SDK.

昨天我们经历了3个周期的错误.其中一种发生在2月28日的0530PST到大约0600PST.突然所有从数据库加载的尝试都开始失败:

Yesterday we experienced 3 periods with errors. One of these occurred from 0530PST to about 0600PST on Feb 28. Suddenly all attempts to load from the database started failing:

com.google.appengine.api.memcache.LogAndContinueErrorHandler handleServiceError: Service error in memcache (LogAndContinueErrorHandler.java:50)
com.google.appengine.api.memcache.MemcacheServiceException: Memcache getIdentifiables: exception getting multiple keys
    at com.google.appengine.api.memcache.MemcacheServiceApiHelper$RpcResponseHandler.handleApiProxyException(MemcacheServiceApiHelper.java:77)
    at com.google.appengine.api.memcache.MemcacheServiceApiHelper$1.absorbParentException(MemcacheServiceApiHelper.java:122)
    at com.google.appengine.api.utils.FutureWrapper.handleParentException(FutureWrapper.java:56)
    at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:95)
    at com.google.appengine.api.memcache.MemcacheServiceImpl.quietGet(MemcacheServiceImpl.java:28)
    at com.google.appengine.api.memcache.MemcacheServiceImpl.getIdentifiables(MemcacheServiceImpl.java:61)
    at com.googlecode.objectify.cache.EntityMemcache.getAll(EntityMemcache.java:215)
    at com.googlecode.objectify.cache.CachingAsyncDatastoreService.get(CachingAsyncDatastoreService.java:253)
    at com.googlecode.objectify.impl.engine.LoadEngine$Round.fetchPending(LoadEngine.java:172)
    at com.googlecode.objectify.impl.engine.LoadEngine$Round.execute(LoadEngine.java:118)
    at com.googlecode.objectify.impl.engine.LoadEngine.execute(LoadEngine.java:258)
    at com.googlecode.objectify.impl.cmd.LoaderImpl.refs(LoaderImpl.java:110)
    at com.googlecode.objectify.impl.cmd.LoaderImpl.refs(LoaderImpl.java:98)
    at com.googlecode.objectify.impl.cmd.LoaderImpl.ref(LoaderImpl.java:88)
    at com.googlecode.objectify.impl.cmd.LoadTypeImpl.refOf(LoadTypeImpl.java:92)
    at com.googlecode.objectify.impl.cmd.LoadTypeImpl.id(LoadTypeImpl.java:86)
    at com.codeavengers.LoginServlet.doPost(LoginServlet.java:62)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
    at com.codeavengers.UrlFilter.doFilter(UrlFilter.java:143)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.codeavengers.payments.PermissionsFilter.doFilter(PermissionsFilter.java:114)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.googlecode.objectify.cache.AsyncCacheFilter.doFilter(AsyncCacheFilter.java:59)
    at com.googlecode.objectify.ObjectifyFilter.doFilter(ObjectifyFilter.java:49)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.google.apphosting.utils.servlet.ParseBlobUploadFilter.doFilter(ParseBlobUploadFilter.java:125)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.google.apphosting.runtime.jetty9.SaveSessionFilter.doFilter(SaveSessionFilter.java:43)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.google.apphosting.utils.servlet.JdbcMySqlConnectionCleanupFilter.doFilter(JdbcMySqlConnectionCleanupFilter.java:60)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter(TransactionCleanupFilter.java:48)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at com.google.apphosting.runtime.jetty9.AppVersionHandlerMap.handle(AppVersionHandlerMap.java:297)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at com.google.apphosting.runtime.jetty9.RpcConnection.handle(RpcConnection.java:202)
    at com.google.apphosting.runtime.jetty9.RpcConnector.serviceRequest(RpcConnector.java:81)
    at com.google.apphosting.runtime.jetty9.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:108)
    at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.dispatchServletRequest(JavaRuntime.java:680)
    at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.dispatchRequest(JavaRuntime.java:642)
    at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:612)
    at com.google.apphosting.runtime.JavaRuntime$NullSandboxRequestRunnable.run(JavaRuntime.java:806)
    at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:274)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.apphosting.api.ApiProxy$UnknownException: An error occurred for the API request memcache.Get().
    at com.google.apphosting.utils.runtime.ApiProxyUtils.statusException(ApiProxyUtils.java:207)
    at com.google.apphosting.utils.runtime.ApiProxyUtils.getRpcError(ApiProxyUtils.java:128)
    at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.setRpcError(ApiProxyImpl.java:601)
    at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.failure(ApiProxyImpl.java:589)
    at com.google.apphosting.runtime.http.HttpApiHostClient.communicationFailure(HttpApiHostClient.java:170)
    at com.google.apphosting.runtime.http.JettyHttpApiHostClient$Listener.onComplete(JettyHttpApiHostClient.java:141)
    at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:193)
    at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:185)
    at org.eclipse.jetty.client.HttpExchange.notifyFailureComplete(HttpExchange.java:269)
    at org.eclipse.jetty.client.HttpExchange.abort(HttpExchange.java:240)
    at org.eclipse.jetty.client.HttpConversation.abort(HttpConversation.java:141)
    at org.eclipse.jetty.client.HttpRequest.abort(HttpRequest.java:735)
    at org.eclipse.jetty.client.HttpDestination.abort(HttpDestination.java:267)
    at org.eclipse.jetty.client.PoolingHttpDestination.failed(PoolingHttpDestination.java:90)
    at org.eclipse.jetty.client.DuplexConnectionPool$1.failed(DuplexConnectionPool.java:159)
    at org.eclipse.jetty.util.Promise$Wrapper.failed(Promise.java:84)
    at org.eclipse.jetty.client.HttpClient$1$1.failed(HttpClient.java:587)
    at org.eclipse.jetty.client.AbstractHttpClientTransport.connectFailed(AbstractHttpClientTransport.java:152)
    at org.eclipse.jetty.client.AbstractHttpClientTransport.connect(AbstractHttpClientTransport.java:141)
    at org.eclipse.jetty.client.HttpClient$1.connect(HttpClient.java:592)
    at org.eclipse.jetty.client.HttpClient$1.succeeded(HttpClient.java:569)
    at org.eclipse.jetty.client.HttpClient$1.succeeded(HttpClient.java:562)
    at org.eclipse.jetty.util.SocketAddressResolver$Async.lambda$resolve$1(SocketAddressResolver.java:181)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    ... 1 more

根据我到目前为止所做的调查,我认为这可能与我们使用的Objectify版本有关.我们在objectify-4.0b1.jar上. 我尝试升级以实现4.1的客观化,并且该站点的版本正在正常运行,而4.0的版本却无法正常工作.那么也许与它有关?

From the investigation I have done so far, I believe it may have something to do with the version of Objectify we were using. We were on objectify-4.0b1.jar. I tried upgrading to objectify 4.1, and that version of the site is working at the same time that the version with 4.0 is not working. So perhaps that has something to do with it?

我不愿意将使用objectify 4.1的版本切换到生产环境,因为一旦这样做,我们将无法切换回该站点的较旧的稳定版本.因此,我想确保升级能够真正解决我们的问题.

I am hesitant to switch the version that uses objectify 4.1 over to production because once you do that, we can't switch back to an older stable version of the site. So I want to make sure that the upgrade will actually solve our problems.

更新1

应用程序正常运行22个小时后,该应用程序再次因出现更多Memcache错误而开始失败.我们尝试清除Memcache,但没有做任何事情. 24小时后,该应用程序的版本仍无法正常工作.

After 22 hours of the app functioning normally, the app started failing again with more Memcache errors. We tried clearing Memcache but that didn't do anything. 24 hours later, that version of the app is still not working.

我们通过使用完全相同的代码重新部署了新版本来解决了该问题,并且它再次开始正常工作. 24小时后,该版本仍然可以正常运行.

We overcame the problem by redeploying a new version with the exact same code and it started working fine again. 24 hours later that version is still functioning fine.

更新2

我们切换到使用Objectify 4.1.3的版本,到目前为止,该版本在24小时后运行良好.

We switched to a version that uses Objectify 4.1.3 and so far after 24 hours that version is doing fine.

相关问题的更多信息

自从改用Java8 + CloudSDK以来,过去一个月我们还有其他奇怪的问题.

We had other strange issues over the past month since the switch to Java8+CloudSDK.

例如,在随机时间(主要是在太平洋标准时间星期五下午),该站点将在某些计算机上停止正常工作.然后,大约30分钟后,它将重新开始工作.然后一周后,发生了同样的问题.

For example, at random times (mostly on Friday afternoon PST) the site would stop working correctly on some computers. Then after 30 minutes or so it would work again. Then a week later the same problem happened.

在仔细阅读了Google Cloud SDK的文档后,它说他们更改了delete-jsps的Cloud SDK的默认登台选项为true,而对于App Engine SDK则为false.我们的服务器正在从几个JSP文件中读取数据...因此此更改破坏了我们的网站.

After carefully reading the documentation for Google Cloud SDK, it said they changed the default staging option of Cloud SDK for the delete-jsps was true, whereas with App Engine SDK it was false. Our server was reading data from a couple of JSP files... so this change broke our site.

问题是,它仍然可以在本地服务器上正常运行,而在实时服务器上,它也可以在99%的时间内正常运行.好像他们正在AB测试随机时间在某些百分比的实例上更改该默认参数的效果.这似乎很疯狂,因为他们本可以检查我们的数据日志并看到我们的站点正在从jsp文件加载和读取值.

The thing was, it still worked fine on local server, and on the live server it also worked fine 99% of the time. It was as if they were AB testing the effect of changing that default parameter on some percentage of instances at random times. Which seems crazy, since they could have checked our data logs and seen that our site was loading and reading values from jsp files.

一旦我们从JSP文件中删除了这些数据并将其放在单独的配置文件中,该问题就消失了.

Once we removed that data from the JSP files and put it in separate config files, that problem went away.

似乎Google再次在随机时间尝试对Memcache进行一些更改,并且在使用Objectify 4.0b2时,这完全破坏了我们的网站吗?那可行吗?

It seems that once again Google is experimenting with some changes to Memcache at random times, and that is completely breaking our site when using Objectify 4.0b2? Is that feasible?

IIRC,Objectify使用内存缓存的方式在4.0b2和4.1之间进行了一些更改.尽管这似乎不太可能,但使用模式可能会引起内存缓存中某些潜在的问题,这是可能.

IIRC there were some changes in the way Objectify uses memcache between 4.0b2 and 4.1. While it seems unlikely, it's possible that the usage pattern may tickle some sort of underlying issue in memcache.

4.0b2确实很老-已有超过4年的历史-自那时以来,已经解决了许多问题.最好将其迁移到当前版本.

4.0b2 is really old - more than 4 years - and there have been a large number of issues fixed since then. It would be wise to migrate up to the current version.