A better way to move a MongoDB collection into another collection

Problem description:

In my web scraping project I need to move the previous day's scraped data from mongo_collection to mongo_his_collection.

I am using this query to move the data:

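# collection and his_collection are PyMongo Collection objects for
# mongo_collection and mongo_his_collection respectively.
for record in collection.find():
    his_collection.insert_one(record)  # one round trip per document

collection.delete_many({})  # empty the source collection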

It works, but it sometimes breaks when the collection contains more than 10k documents.

Can you suggest an optimized query that uses fewer resources and does the same task?

You can use a MapReduce job.

MapReduce lets you specify an output collection to store the results in.

If the map function emits each document with its own _id as the key, and the reduce function returns the first entry of the values array (which, because _id values are unique, is also the only entry), the MapReduce job is essentially a copy operation from the source collection to the output collection. One caveat: each output document has the form { _id: <key>, value: <original document> }, so the original fields end up nested under value rather than at the top level.
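To see why this amounts to a copy, here is a toy, in-memory Python model of the map/emit/reduce flow. It is not MongoDB's implementation: map_reduce, copy_map and copy_reduce are hypothetical names, and the model only assumes the documented behavior that emitted values are grouped by key, reduce is skipped for keys with a single value, and each result is stored as {_id: key, value: ...}.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Toy model of mapReduce: group emitted values by key, reduce groups
    with more than one value, wrap each result as {"_id": key, "value": ...}."""
    emitted = defaultdict(list)

    def emit(key, value):
        emitted[key].append(value)

    for doc in docs:
        map_fn(doc, emit)

    out = []
    for key, values in emitted.items():
        # Like MongoDB, skip reduce when a key has only one emitted value.
        value = values[0] if len(values) == 1 else reduce_fn(key, values)
        out.append({"_id": key, "value": value})
    return out


def copy_map(doc, emit):
    # Mirrors the answer's map: emit each document keyed by its own _id.
    emit(doc["_id"], doc)


def copy_reduce(key, values):
    # Mirrors the answer's reduce: return the first value.
    return values[0]


source = [{"_id": 1, "title": "a"}, {"_id": 2, "title": "b"}]
result = map_reduce(source, copy_map, copy_reduce)
print(result)
```

Every source document comes back unchanged, but wrapped under value; that wrapping is what distinguishes a MapReduce "copy" from a plain insert.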

Untested code:

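db.runCommand({
    mapReduce: "mongo_collection",
    // map takes no arguments; the current document is `this`.
    // Emit each document keyed by its own _id.
    map: function () {
        emit(this._id, this);
    },
    // Required by the command; each _id key has exactly one value.
    reduce: function (key, values) {
        return values[0];
    },
    // Add the results to mongo_his_collection, overwriting documents with
    // the same _id, instead of replacing the whole collection.
    out: { merge: "mongo_his_collection" }
})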
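The MapReduce command above only copies the documents; mongo_collection still has to be emptied afterwards. As an alternative that stays in the question's Python, here is a minimal sketch that moves documents in fixed-size batches with insert_many and delete_many, so each round trip handles up to batch_size documents instead of one. It assumes PyMongo 3 or later; move_in_batches, batch_size and the database name in the usage comment are hypothetical, and the sketch has not been run against a live server.

```python
from itertools import islice


def move_in_batches(src, dst, batch_size=1000):
    """Move every document from src to dst, batch_size documents at a time.

    src and dst are assumed to behave like PyMongo Collection objects
    (find, insert_many, delete_many). Returns the number of documents moved.
    """
    cursor = src.find({})
    moved = 0
    while True:
        # Pull at most batch_size documents from the cursor into memory.
        batch = list(islice(cursor, batch_size))
        if not batch:
            break
        # One round trip inserts the whole batch into the history collection.
        dst.insert_many(batch)
        # Delete only the documents that were just copied, by _id.
        ids = [doc["_id"] for doc in batch]
        src.delete_many({"_id": {"$in": ids}})
        moved += len(batch)
    return moved


# Usage sketch (hypothetical database name "scraping_db"):
#   from pymongo import MongoClient
#   db = MongoClient()["scraping_db"]
#   move_in_batches(db["mongo_collection"], db["mongo_his_collection"])
```

Deleting by the copied _ids, rather than calling delete_many({}) at the end, avoids removing documents a scraper inserts while the move is running. If the script stops between insert_many and delete_many, that batch exists in both collections and a rerun raises a duplicate-key BulkWriteError for it. On MongoDB 4.2 and later, the $merge aggregation stage can also do the copy entirely on the server, and mapReduce is deprecated as of MongoDB 5.0.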