Moving data from the production datastore to the local development datastore in Google App Engine (Python)

Problem description:

TL;DR: I need to find a working way to download my data from the production datastore and load it into the local development environment.

Detailed problem:

I need to test my app on the local development server with real data (not real-time data) from the production server's datastore. The documentation and other resources offer three options:

  1. Using appcfg.py to download the data from the production server and then loading it into the local development environment. When I use this method I get a 'bad request' error due to an OAuth problem. Besides, this method is deprecated; the official documentation advises using the second method:
  2. Using gcloud via managed export and import. The epic documentation of this method explains how to back up all the data from the console (at https://console.cloud.google.com/). I have tried this method: the backup data is generated in a Cloud Storage bucket and I downloaded it. It is in LevelDB format, and I need to load it into the local development server, but there is no official explanation for that step. The loading method from the first option is not compatible with the LevelDB format, and I couldn't find an official way to solve the problem. There is a StackOverflow entry, but it did not work for me because it just returns all entities as dicts; converting those dict objects back into ndb entities becomes the tricky problem. (One way to combine this export with the emulator from option 3 is sketched after this list.)
  3. Having lost hope with the first two methods, I decided to use the Cloud Datastore Emulator (beta), which can emulate the real data in the local development environment. It is still in beta and has several problems; when I ran the command I hit a DATASTORE_EMULATOR_HOST problem anyway.
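
For reference, the Cloud Datastore Emulator documents an HTTP import endpoint that can load a managed export, which ties options 2 and 3 together. A hedged sketch, where the project ID my-project, the bucket my-bucket, and the export path are assumptions, the emulator's default port is 8081, and the exact name of the .overall_export_metadata file depends on the export path:

gcloud datastore export gs://my-bucket/prod-export
gsutil -m cp -r gs://my-bucket/prod-export .
gcloud beta emulators datastore start &
$(gcloud beta emulators datastore env-init)  # sets DATASTORE_EMULATOR_HOST
curl -X POST "http://localhost:8081/v1/projects/my-project:import" \
    -H 'Content-Type: application/json' \
    -d '{"input_url": "'"$PWD"'/prod-export/prod-export.overall_export_metadata"}'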

It sounds like you should be using a remote sandbox

Even if you get this to work, the localhost datastore still behaves differently than the actual datastore.

If you want to truly simulate your production environment, then I would recommend setting up a clone of your App Engine project as a remote sandbox. You could deploy your app to a new GAE project id (appcfg.py update . -A sandbox-id), use Datastore Admin to create a backup of production in Google Cloud Storage, and then use Datastore Admin in your sandbox to restore this backup there.

I do prime my localhost datastore with some production data, but it is not a complete clone; just the core required objects and a few test users.

To do this I wrote a Google Dataflow job that exports selected models and saves them in Google Cloud Storage in jsonl format. Then on my localhost I have an endpoint called /init/ which launches a taskqueue job to download these exports and import them.
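
That flow looks roughly like the sketch below; the MODELS registry, the /tasks/import URL, and the /my-bucket/exports/<Kind>.jsonl layout are illustrative assumptions, not the code this answer actually uses:

import json

import webapp2
import cloudstorage as gcs  # GoogleAppEngineCloudStorageClient
from google.appengine.api import taskqueue

# Hypothetical registry mapping an exported kind name to its model class.
MODELS = {'User': User}

class InitHandler(webapp2.RequestHandler):
    def get(self):
        # Queue one import task per exported kind, so each import runs
        # in its own request with its own deadline.
        for kind in MODELS:
            taskqueue.add(url='/tasks/import', params={'kind': kind})

class ImportTaskHandler(webapp2.RequestHandler):
    def post(self):
        kind = self.request.get('kind')
        model = MODELS[kind]
        # One JSON object per line (jsonl), as written by the dataflow job.
        with gcs.open('/my-bucket/exports/%s.jsonl' % kind) as fh:
            for line in fh:
                obj = model()
                obj.set_from_dto(json.loads(line))  # defined on BaseModel below
                obj.put()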

To do this I reuse my JSON REST handler code, which is able to convert any model to JSON and vice versa.

In theory you could do this for your entire datastore; a sketch of that idea follows.
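
A hedged sketch of that idea: ndb.Model._kind_map is a private registry of kind name to model class, so this is illustrative only, and it assumes to_dto() produces JSON-serializable values for every property involved.

import json

from google.appengine.ext import ndb

def export_all_kinds(write_line):
    # write_line is any callback that persists one (kind, json_line) pair.
    # Dump every entity of every registered BaseModel kind as one JSON
    # line each, using the BaseModel.to_dto() shown below.
    for kind, model in sorted(ndb.Model._kind_map.iteritems()):
        if isinstance(model, type) and issubclass(model, BaseModel):
            for entity in model.query():
                write_line(kind, json.dumps(entity.to_dto()))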

EDIT - This is what my to-json/from-json code looks like:

All of my ndb.Model subclasses extend my BaseModel, which has generic conversion code:

from google.appengine.ext import ndb

# dt_to_timestamp, key_to_dto, str_to_dto, timestamp_to_dt, dto_to_key,
# float_from_dto and strip are small converter helpers defined elsewhere
# in the project (they are not shown in the answer).
get_dto_typemap = {
    ndb.DateTimeProperty: dt_to_timestamp,
    ndb.KeyProperty: key_to_dto,
    ndb.StringProperty: str_to_dto,
    ndb.EnumProperty: str,
}
set_from_dto_typemap = {
    ndb.DateTimeProperty: timestamp_to_dt,
    ndb.KeyProperty: dto_to_key,
    ndb.FloatProperty: float_from_dto,
    ndb.StringProperty: strip,
    ndb.BlobProperty: str,
    ndb.IntegerProperty: int,
}

class BaseModel(ndb.Model):

    def to_dto(self):
        # Serialize this entity to a plain dict, converting property
        # types that are not JSON-friendly via get_dto_typemap.
        dto = {'key': key_to_dto(self.key)}
        for name, obj in self._properties.iteritems():
            key = obj._name
            value = getattr(self, obj._name)
            if obj.__class__ in get_dto_typemap:
                if obj._repeated:
                    value = [get_dto_typemap[obj.__class__](v) for v in value]
                else:
                    value = get_dto_typemap[obj.__class__](value)
            dto[key] = value
        return dto

    def set_from_dto(self, dto):
        # Populate this entity from a dict, converting values back to
        # their property types via set_from_dto_typemap.
        for name, obj in self._properties.iteritems():
            if isinstance(obj, ndb.ComputedProperty):
                continue  # computed properties cannot be set
            key = obj._name
            if key in dto:
                value = dto[key]
                if not obj._repeated and obj.__class__ in set_from_dto_typemap:
                    try:
                        value = set_from_dto_typemap[obj.__class__](value)
                    except Exception as e:
                        raise Exception("Error setting "+self.__class__.__name__+"."+str(key)+" to '"+str(value)+"': "+e.message)
                try:
                    setattr(self, obj._name, value)
                except Exception as e:
                    print dir(obj)  # debugging aid (Python 2 print statement)
                    raise Exception("Error setting "+self.__class__.__name__+"."+str(key)+" to '"+str(value)+"': "+e.message)

class User(BaseModel):
    # user fields, etc

My request handlers then use set_from_dto & to_dto like this (BaseHandler also provides some convenience methods for converting JSON payloads to Python dicts and the like):

class RestHandler(BaseHandler):
    MODEL = None

    def put(self, resource_id=None):
        if resource_id:
            obj = ndb.Key(urlsafe=resource_id).get()
            if obj:
                obj.set_from_dto(self.json_body)
                obj.put()
                return obj.to_dto()
            else:
                self.abort(422, "Unknown id")
        else:
            self.abort(405)

    def post(self, resource_id=None):
        if resource_id:
            self.abort(405)
        else:
            obj = self.MODEL()
            obj.set_from_dto(self.json_body)
            obj.put()
            return obj.to_dto()

    def get(self, resource_id=None):
        if resource_id:
            obj = ndb.Key(urlsafe=resource_id).get()
            if obj:
                return obj.to_dto()
            else:
                self.abort(422, "Unknown id")
        else:
            cursor_key = self.request.GET.pop('$cursor', None)
            cursor = ndb.Cursor(urlsafe=cursor_key) if cursor_key else None
            limit = max(min(200, int(self.request.GET.pop('$limit', 200))), 10)
            qs = self.MODEL.query()
            # ... other code that handles query params
            results, next_cursor, more = qs.fetch_page(limit, start_cursor=cursor)
            return {
                '$cursor': next_cursor.urlsafe() if more else None,
                'results': [result.to_dto() for result in results],
            }

class UserHandler(RestHandler):
    MODEL = User