Cassandra timing out when querying a key with more than 10,000 rows, even with the timeout raised to 10 seconds

Problem description:

I'm using DataStax Community v2.1.2-1 (AMI v2.5) with the preinstalled default settings, and I have this table:

CREATE TABLE notificationstore.note (
  user_id text,
  real_time timestamp,
  insert_time timeuuid,
  read boolean,
  PRIMARY KEY (user_id, real_time, insert_time))
WITH CLUSTERING ORDER BY (real_time DESC, insert_time ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}
AND **default_time_to_live** = 20160

The other configuration is:

I have 2 nodes, on m3.large instances with 1 x 32 GB SSD storage. I'm facing timeouts even though consistency is set to ONE on this particular table.


  1. I increased the heap space to 3 GB [RAM size is 8 GB].
  2. I increased the read timeout to 10 seconds (see the cassandra.yaml sketch after this list).

    select count(*) from note where user_id = 'xxx' limit 2; // errors={}, last_host=127.0.0.1
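
For reference, a 10-second read timeout is normally configured in cassandra.yaml on each node; the snippet below is only a sketch of what that change typically looks like, not an excerpt from this cluster's config:

    # cassandra.yaml -- raise the coordinator-side read timeouts to 10 seconds
    read_request_timeout_in_ms: 10000
    range_request_timeout_in_ms: 10000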

I am wondering whether the problem could be with the time to live, or whether there is any other configuration or tuning that matters here.

The data in the database is pretty small.

Also, this problem does not occur as soon as you insert; it happens after some time (more than 6 hours).

Thanks.

[Copying my answer from here because it's the same environment/problem: amazon ec2 - Cassandra Timing out because of TTL expiration.]

You're running into a problem where the number of tombstones (deleted values) is passing a threshold, and then timing out.
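
(Worth noting: default_time_to_live = 20160 is in seconds, roughly 5.6 hours, which lines up with the timeouts only showing up after about 6 hours of writes.) The threshold involved is a cassandra.yaml setting; the values below are the stock 2.1 defaults rather than anything specific to this cluster:

    # cassandra.yaml -- tombstone thresholds (2.1 defaults)
    tombstone_warn_threshold: 1000       # warn in the log after this many tombstones in a single read
    tombstone_failure_threshold: 100000  # abort the read once this many tombstones are scanned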

You can see this if you turn on tracing and then try your select statement, for example:

cqlsh> tracing on;
cqlsh> select count(*) from test.simple;

 activity                                                                        | timestamp    | source       | source_elapsed
---------------------------------------------------------------------------------+--------------+--------------+----------------
...snip...
 Scanned over 100000 tombstones; query aborted (see tombstone_failure_threshold) | 23:36:59,324 |  172.31.0.85 |         123932
                                                    Scanned 1 rows and matched 1 | 23:36:59,325 |  172.31.0.85 |         124575
                           Timed out; received 0 of 1 responses for range 2 of 4 | 23:37:09,200 | 172.31.13.33 |       10002216

You're kind of running into an anti-pattern for Cassandra, where data is stored for just a short time before being deleted. There are a few options for handling this better, including revisiting your data model if needed (a data-model sketch follows the resource list below). Here are some resources:

  • The cassandra.yaml configuration file - See section on tombstone settings
  • Cassandra anti-patterns: Queues and queue-like datasets
  • About deletes
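
As one illustration of the data-model angle, a time-bucketed variant of the table keeps each partition short-lived, so a read for the current bucket no longer scans past tombstones left by older, already-expired rows, because those rows live in other partitions. This is only a sketch; the table name note_by_day and the day column are hypothetical, not part of the original schema:

-- Hypothetical day-bucketed variant of notificationstore.note
CREATE TABLE notificationstore.note_by_day (
  user_id text,
  day text,                       -- e.g. '2015-01-21', computed by the client at write time
  real_time timestamp,
  insert_time timeuuid,
  read boolean,
  PRIMARY KEY ((user_id, day), real_time, insert_time)
) WITH CLUSTERING ORDER BY (real_time DESC, insert_time ASC)
  AND default_time_to_live = 20160;

-- Reads then target one day's partition at a time:
select count(*) from notificationstore.note_by_day where user_id = 'xxx' and day = '2015-01-21';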

For your sample problem, I tried lowering the gc_grace_seconds setting to 300 (5 minutes). That causes the tombstones to be cleaned up much more frequently than the default 10 days, but that may or may not be appropriate based on your application. Read up on the implications of deletes and you can adjust as needed for your application.
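
Applied to the table from the question, that change would look something like this (a sketch, assuming you want it on notificationstore.note specifically):

ALTER TABLE notificationstore.note WITH gc_grace_seconds = 300;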