Best way to merge small, already-sorted lists into one large sorted list?

Problem description:

I have a list (an array, not a linked list) of sorted objects. It is not a long list: anywhere from 3 to 20 elements, though most of the time it will probably be toward the shorter end. This list, and all the others, will come from HTTP requests. There will be about 30-50 requests, each producing an array with the same number of elements. Right now my code makes the requests synchronously. I realize this is not efficient, and it will probably change to some form of multi-threading soon, but for now I'm still in the initial stages. What would be the best way to join all these arrays into one big, sorted array? Should I merge each array into the result as it comes back from its request? Or sort everything once all requests are done? Since there are not that many elements, relatively speaking, does it even matter? Would multi-threading make a difference to the solution?

I am not sure whether it makes any difference that the arrays will hold similar values. For example: [100, 200, 300], [99, 105, 290], [115, 215, 280]

The fastest way to merge multiple sorted lists into a single list is to do a k-way merge.

Start with an empty priority queue (usually a min-heap if you're sorting in ascending order) and push the first item from each of the lists onto the heap. The structure you store in the heap must hold the value (i.e. the number) and also a reference to the list it came from. Then:

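  1. Pop the first item from the heap and add its value to the output.
  2. Take the next item, if there is one, from the list that the just-popped item came from, and push it onto the heap.
  3. Continue until the heap is empty.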

In short, the min-heap always holds one item from each list that still has items left, and the lowest of those items is always the first item on the heap. Since the individual lists are in order to begin with, this guarantees that the item at the top of the heap is always the smallest remaining item across all lists, so it's the one that gets output next.
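The steps above can be sketched in Python as follows. This is only an illustration, not a definitive implementation: the question does not name a language, the function name `k_way_merge` is made up for the example, and it assumes the items can be compared directly with `<` (for real objects you would store a sort key in the heap entry instead). Each heap entry is a tuple of the value, the index of the list it came from, and its position in that list.

```python
import heapq


def k_way_merge(lists):
    """Merge already-sorted lists into one sorted list using a min-heap."""
    heap = []
    # Seed the heap with the first item of every non-empty list.
    # Each entry is (value, list_index, item_index): the indices record where
    # the value came from and also break ties between equal values.
    for list_index, items in enumerate(lists):
        if items:
            heap.append((items[0], list_index, 0))
    heapq.heapify(heap)

    merged = []
    while heap:
        # Step 1: pop the smallest remaining item and output its value.
        value, list_index, item_index = heapq.heappop(heap)
        merged.append(value)
        # Step 2: push the next item from the same source list, if any.
        next_index = item_index + 1
        if next_index < len(lists[list_index]):
            heapq.heappush(
                heap, (lists[list_index][next_index], list_index, next_index)
            )
    # Step 3: the loop ends once the heap is empty.
    return merged


print(k_way_merge([[100, 200, 300], [99, 105, 290], [115, 215, 280]]))
# → [99, 100, 105, 115, 200, 215, 280, 290, 300]
```

If Python happens to be the language in use, the standard library's `heapq.merge(*lists)` performs the same kind of merge lazily and could replace the hand-written loop.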