What do you call the data model of DynamoDB and Cassandra?

Question:

The DynamoDB Wikipedia article says that DynamoDB is a "key-value" database. However, calling it a "key-value" database completely misses an extremely fundamental feature of DynamoDB, that of the sort key: Keys have two parts (partition key and sort key) and items with the same partition key can be efficiently retrieved together sorted by the sort key.

Cassandra also has exactly the same sorting-items-inside-a-partition feature (which it calls "clustering key"), and the Cassandra Wikipedia article uses the term wide column store to describe it. However, while this term "wide column" is better than "key-value", it is still somewhat inappropriate because it describes the more general situation where an item can have a very large number of unrelated columns - not necessarily a sorted list of separate items.

So my question is whether there is a more appropriate term to describe the data model of databases like DynamoDB and Cassandra: databases which, like a key-value store, can efficiently retrieve items for individual keys, but can also efficiently retrieve items sorted by the key or by just a part of it (DynamoDB's sort key or Cassandra's clustering key).
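To make the two access patterns concrete, here is a minimal in-memory sketch of the model being described. It is only a toy: the class name `PartitionedSortedStore` and its methods are made up for illustration and are not the API of DynamoDB or Cassandra.

```python
import bisect
from collections import defaultdict


class PartitionedSortedStore:
    """Toy model (hypothetical): partition key -> items kept sorted by sort key."""

    def __init__(self):
        self._sort_keys = defaultdict(list)  # partition -> sorted list of sort keys
        self._values = defaultdict(dict)     # partition -> {sort_key: value}

    def put(self, partition, sort_key, value):
        # Insert the sort key in order only the first time it is seen.
        if sort_key not in self._values[partition]:
            bisect.insort(self._sort_keys[partition], sort_key)
        self._values[partition][sort_key] = value

    def get(self, partition, sort_key):
        # Key-value style lookup by the full (partition, sort) key.
        return self._values.get(partition, {}).get(sort_key)

    def query(self, partition, lo=None, hi=None):
        # Range read inside one partition, returned in sort-key order.
        keys = self._sort_keys.get(partition, [])
        start = 0 if lo is None else bisect.bisect_left(keys, lo)
        end = len(keys) if hi is None else bisect.bisect_right(keys, hi)
        values = self._values.get(partition, {})
        return [(k, values[k]) for k in keys[start:end]]


store = PartitionedSortedStore()
store.put("user#1", "2024-01-03", "c")
store.put("user#1", "2024-01-01", "a")
store.put("user#1", "2024-01-02", "b")
store.put("user#2", "2024-01-01", "x")

print(store.get("user#1", "2024-01-02"))       # single-item lookup
print(store.query("user#1", lo="2024-01-02"))  # sorted slice of one partition
```

A plain key-value store would only offer `get`; the `query` method is the part that the term "key-value" leaves out.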

Before CQL was introduced, Cassandra adhered more strictly to the wide column store data model, where you only had rows identified by a row key and containing sorted key/value columns. With the introduction of CQL, rows became known as partitions, and columns could optionally be grouped into logical rows via clustering keys.

Up until Cassandra 3.0, CQL was simply an abstraction on top of the original Thrift data model, and there was no concept of CQL rows within the storage engine. A partition was just a sorted set of columns, each with a compound key consisting of the concatenated values of the clustering keys. More details are given in this article. Now there is native support for CQL in the storage engine, which allows CQL data models to be stored more efficiently.
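As a rough illustration of that older layout, the sketch below flattens CQL-style rows of one partition into a single sorted list of cells whose names are the clustering values followed by the column name, and then regroups them into logical rows. It is a simplification under my own assumptions: the function names are hypothetical, and the real on-disk format carried extra details that are omitted here.

```python
from itertools import groupby


def flatten_rows(rows, clustering_cols):
    """Turn CQL-style rows (dicts) of one partition into sorted (cell_name, value) pairs.

    Each cell name is a tuple: clustering values first, then the column name.
    """
    cells = []
    for row in rows:
        prefix = tuple(row[c] for c in clustering_cols)
        for col, value in row.items():
            if col not in clustering_cols:
                cells.append((prefix + (col,), value))
    # Sorting by the compound name keeps each logical row's cells contiguous.
    return sorted(cells, key=lambda cell: cell[0])


def regroup_rows(cells, clustering_cols):
    """Rebuild logical rows by grouping cells that share a clustering prefix."""
    n = len(clustering_cols)
    rows = []
    for prefix, group in groupby(cells, key=lambda cell: cell[0][:n]):
        row = dict(zip(clustering_cols, prefix))
        for name, value in group:
            row[name[-1]] = value
        rows.append(row)
    return rows


# Hypothetical partition with one clustering column, "ts".
partition_rows = [
    {"ts": 2, "temp": 21.5, "humidity": 40},
    {"ts": 1, "temp": 20.0, "humidity": 42},
]
cells = flatten_rows(partition_rows, ["ts"])
for name, value in cells:
    print(name, value)
print(regroup_rows(cells, ["ts"]))
```

Seen this way, a CQL row is a contiguous run of columns inside one wide, sorted partition.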

However, if you think of a CQL row as a logical grouping of columns within the same partition, Cassandra could still be considered a wide column store. In any case, there isn't, to my knowledge, another well-established term to describe this kind of database.