App Engine Datastore denormalization: index properties in the main entity or the denormalized entity?

Problem description:

Consider the classic example of blog data modelling, where we have a BlogPost entity with many properties, and we want to list the latest blog posts on a page.

It makes sense to denormalize the BlogPost entity into a BlogPostSummary entity that is shown in the list view, which avoids fetching and deserializing many unneeded properties.

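from google.appengine.ext import db

# Main entity: holds the full post.
class BlogPost(db.Model):
  title = db.StringProperty()
  content = db.TextProperty()
  created = db.DateProperty()
  ...

# Denormalized entity: only what the list view needs.
class BlogPostSummary(db.Model):
  title = db.StringProperty()
  content_excerpt = db.TextProperty()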
The question is: which entity should hold the indexed properties? There are three options:

  • Option 1: index the properties in both entities
    • Pros:
      • Queries are easy on both entities.
    • Cons:
      • Maintaining the denormalized indexes is expensive.
  • Option 2: index the properties in the main entity only
    • Pros:
      • Indexing properties in the main entity is safer, as the denormalized entity is treated as redundant data.
    • Cons:
      • Querying the list view needs a double round trip to the datastore: a keys-only query for BlogPost entities, followed by a batch get for BlogPostSummary.
  • Option 3: index the properties in the denormalized entity only
    • Pros:
      • The list view can be built easily with a single query.
    • Cons:
      • The main entity can no longer be queried by those properties.
      • The indexes occupy more space when the denormalized entity is a child of the main entity.

Which option would work better? Are there other options?

Would the double round trip to the datastore in option 2 be a problem?
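To make the round-trip difference concrete, here is a minimal sketch that counts datastore calls for the "index on the main entity" and "index on the summary" layouts. It does not use the real App Engine API: `FakeDatastore`, `list_view_option2` and `list_view_option3` are hypothetical names, the store is an in-memory dict, and it assumes a summary shares its post's ID. With the real `db` API the first layout would roughly correspond to a keys-only query (`BlogPost.all(keys_only=True)`) followed by a batch `db.get(keys)`.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FakeDatastore:
    """Hypothetical in-memory stand-in for the Datastore that counts round trips."""
    entities: dict = field(default_factory=dict)  # (kind, id) -> properties
    round_trips: int = 0

    def put(self, kind, entity_id, **props):
        self.entities[(kind, entity_id)] = props

    def query(self, kind, order_by, limit, keys_only=False):
        """One round trip: newest-first query on an indexed property."""
        self.round_trips += 1
        rows = [(key, props) for key, props in self.entities.items()
                if key[0] == kind and order_by in props]
        rows.sort(key=lambda row: row[1][order_by], reverse=True)
        rows = rows[:limit]
        if keys_only:
            return [key for key, _ in rows]
        return [props for _, props in rows]

    def get_multi(self, keys):
        """One round trip: batch get by key."""
        self.round_trips += 1
        return [self.entities[key] for key in keys]


def list_view_option2(ds, limit):
    # "created" is indexed on BlogPost only: keys-only query, then batch get.
    post_keys = ds.query("BlogPost", "created", limit, keys_only=True)
    # Assumption: each summary is stored under the same ID as its post.
    summary_keys = [("BlogPostSummary", post_id) for _, post_id in post_keys]
    return ds.get_multi(summary_keys)


def list_view_option3(ds, limit):
    # "created" is indexed on BlogPostSummary: a single query is enough.
    return ds.query("BlogPostSummary", "created", limit)


def build_demo_store():
    # For the demo, "created" is stored on both kinds so either layout can run.
    ds = FakeDatastore()
    for i in range(1, 4):
        created = date(2013, 1, i)
        ds.put("BlogPost", i, title="Post %d" % i, content="...", created=created)
        ds.put("BlogPostSummary", i, title="Post %d" % i,
               content_excerpt="...", created=created)
    return ds
```

In this sketch the second layout costs one call and the first costs two, regardless of page size, because the second call is a single batch get. Whether that extra call matters in practice depends on latency requirements and on how often the list view is served from a cache.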

This is an abstract question that does not have a "correct" answer. The choice of data model depends on the specific requirements of a project, including:

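  • Usage patterns (how often you need to access different pieces of data)
  • Update patterns (e.g. separating frequently updated properties from stable ones to reduce write costs)
  • Average and extreme performance requirements (e.g. an average blog may have 10 posts, while a very popular blog may have 10,000)
  • The ability to use Memcache to reduce datastore trips and improve performance
  • Data complexity (i.e. how many different child entities depend on this particular entity kind)
  • Transaction requirements
  • Security and access-role considerations (e.g. not exposing private data by mistake)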
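As an illustration of the Memcache point, the following is a minimal cache-aside sketch for the list view. It is not the App Engine Memcache API: `CachedListView` is a hypothetical name, a plain dict stands in for the cache, and `load_from_datastore` is an assumed callable that performs the actual query. With the real service the dict operations would map onto `memcache.get` and `memcache.set`.

```python
class CachedListView:
    """Hypothetical cache-aside wrapper; a dict stands in for Memcache."""

    def __init__(self, load_from_datastore):
        self._load = load_from_datastore  # callable(limit) -> list of summaries
        self._cache = {}

    def latest(self, limit):
        key = "latest_summaries:%d" % limit
        cached = self._cache.get(key)
        if cached is not None:
            return cached  # cache hit: no datastore trip
        result = self._load(limit)  # cache miss: one datastore trip
        self._cache[key] = result
        return result

    def invalidate(self):
        # Call after a post is created or edited so the list is rebuilt.
        self._cache.clear()
```

If most list-view requests are cache hits, the number of datastore trips per request matters less, which weakens the case against a layout that needs two trips on a miss.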

By the way, there is another way to model data in the Datastore: using child entities. For example, blog posts may be child entities of a blog entity. This way you can retrieve all of a blog's posts with a single query by providing the parent key, without storing post IDs or keys in the blog entity or a blog ID/key in the post entities.
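The parent/child idea can be sketched without the SDK by treating a key as a path of (kind, id) pairs, so that a child's key starts with its parent's key. The helper names below (`make_key`, `child_key`, `ancestor_query`) are hypothetical and the store is a plain dict; in the real `db` API the equivalent would roughly be creating the post with `parent=blog` and querying with `BlogPost.all().ancestor(blog)`.

```python
def make_key(kind, entity_id):
    """Root key: a one-element path of (kind, id) pairs."""
    return ((kind, entity_id),)


def child_key(parent_key, kind, entity_id):
    """Child key: the parent's path plus one more (kind, id) pair."""
    return parent_key + ((kind, entity_id),)


def ancestor_query(entities, ancestor_key, kind):
    """Return entities of `kind` whose key path starts with ancestor_key."""
    n = len(ancestor_key)
    return [props for key, props in entities.items()
            if key[:n] == ancestor_key and key[-1][0] == kind]


# Demo data: two blogs, with posts stored as children of their blog.
entities = {}
blog_a = make_key("Blog", "a")
blog_b = make_key("Blog", "b")
entities[blog_a] = {"name": "Blog A"}
entities[blog_b] = {"name": "Blog B"}
entities[child_key(blog_a, "BlogPost", 1)] = {"title": "First"}
entities[child_key(blog_a, "BlogPost", 2)] = {"title": "Second"}
entities[child_key(blog_b, "BlogPost", 1)] = {"title": "Other"}

posts_of_a = ancestor_query(entities, blog_a, "BlogPost")
```

Note that neither the blog nor the post stores the other's ID as a property here; the relationship lives entirely in the key path.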