Count the occurrences of all items of a list in a tuple

Problem description:

I have a tuple (1,5,2,3,4,5,6,7,3,2,2,4,3) and a list [1,2,3], and I want to count how often the items of the list occur in the tuple in total (so it should return 7).

I could loop over the list, count each item in the tuple and then sum up the results, but I bet there is a better way in Python.
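
For reference, the loop described above can be sketched as a one-liner with tuple.count (the helper name count_with_loop is hypothetical). It rescans the whole tuple once per list item, so its cost grows with len(list) * len(tuple), which is what the answers below try to avoid.

```python
def count_with_loop(items, tup):
    """Baseline: one full tuple.count scan per list item, then sum the counts."""
    return sum(tup.count(item) for item in items)


t = (1, 5, 2, 3, 4, 5, 6, 7, 3, 2, 2, 4, 3)
a = [1, 2, 3]
print(count_with_loop(a, t))  # 7
```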

That's not a duplicate of "How to count the occurrences of a list item?", because I explicitly said that I am not just asking for list.count(item_of_list) (which would need to be done in a loop), but for a better method.

Since the question is tagged NumPy, here's a NumPy solution -

In [846]: import numpy as np

In [847]: t = (1,5,2,3,4,5,6,7,3,2,2,4,3)

In [848]: a = [1,2,3]

In [849]: np.in1d(t,a).sum()
Out[849]: 7

# Alternatively with np.count_nonzero for summing booleans
In [850]: np.count_nonzero(np.in1d(t,a))
Out[850]: 7
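
To make the intermediate step visible: np.in1d(t, a) returns one boolean per tuple element telling whether that element appears in a, and summing the mask (or calling np.count_nonzero on it) counts the matches. In NumPy 2.0, np.in1d is deprecated in favor of np.isin, which gives the same mask for 1-D input; this sketch assumes a NumPy version that provides np.isin (1.13 or later).

```python
import numpy as np

t = (1, 5, 2, 3, 4, 5, 6, 7, 3, 2, 2, 4, 3)
a = [1, 2, 3]

# One boolean per element of t: True where the element is one of a's values
mask = np.isin(t, a)
print(mask.astype(int))  # [1 0 1 1 0 0 0 0 1 1 1 0 1]

# Counting the True entries gives the total number of occurrences
total = int(np.count_nonzero(mask))
print(total)  # 7
```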

Another NumPy approach uses np.bincount and works for the specific case of non-negative integer elements in the inputs. It uses the numbers themselves as bins, counts the occurrences per bin, indexes into those counts with the list elements, and sums the result for the final output -

In [856]: np.bincount(t)[a].sum()
Out[856]: 7
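
Broken into steps, np.bincount(t) builds an array whose index i holds the number of times i occurs in t, and indexing it with a picks out the counts for the list items. One caveat: if a list item is larger than max(t), the plain indexing raises an IndexError. The sketch below (bincount_total is a hypothetical helper) passes minlength so every list value gets a bin; it still assumes all values are non-negative integers.

```python
import numpy as np


def bincount_total(items, tup):
    """Total occurrences of items in tup; assumes non-negative integers only."""
    items = np.asarray(items, dtype=np.intp)
    tup = np.asarray(tup, dtype=np.intp)
    # Give every list value a bin, even one that never occurs in tup
    size = int(max(items.max(initial=0), tup.max(initial=0))) + 1
    counts = np.bincount(tup, minlength=size)
    return int(counts[items].sum())


t = (1, 5, 2, 3, 4, 5, 6, 7, 3, 2, 2, 4, 3)
print(np.bincount(t))                    # [0 1 3 3 2 2 1 1]
print(bincount_total([1, 2, 3], t))      # 7
print(bincount_total([1, 2, 3, 42], t))  # 7 (42 never occurs in t)
```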

Other approaches -

from collections import Counter

# @Brad Solomon's solution: count every tuple item once, then look up each list item
def collections_counter(tgt, tup):
    counts = Counter(tup)
    return sum(counts[t] for t in tgt)

# @timgeb's solution: one pass over the tuple with set membership tests
def set_sum(l, t):
    l = set(l)
    return sum(1 for x in t if x in l)

# @Amit Tripathi's solution: build the per-item counts by hand in a plain dict
def dict_sum(l, t):
    dct = {}
    for i in t:
        if not dct.get(i):
            dct[i] = 0
        dct[i] += 1
    return sum(dct.get(i, 0) for i in l)
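
One behavioral difference is worth checking before picking an approach: if the list contains duplicates, set_sum (like np.in1d) counts each matching tuple element once, while collections_counter and dict_sum (like np.bincount(t)[a].sum()) add an item's count once per list entry. The sketch below copies two of the functions above to show this; which behavior is wanted depends on how the question's "count" is interpreted.

```python
from collections import Counter


# Copies of two of the functions above, unchanged in behavior
def collections_counter(tgt, tup):
    counts = Counter(tup)
    return sum(counts[t] for t in tgt)


def set_sum(l, t):
    l = set(l)
    return sum(1 for x in t if x in l)


t = (1, 5, 2, 3, 4, 5, 6, 7, 3, 2, 2, 4, 3)

print(collections_counter([1, 2, 3], t), set_sum([1, 2, 3], t))  # 7 7
# With a repeated list item they disagree: 2 occurs 3 times in t
print(collections_counter([1, 2, 2], t), set_sum([1, 2, 2], t))  # 7 4
```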

Runtime tests

Case #1: Timings on a tuple of 10,000 random integers and a list of 100 unique random values below 1000 -

In [905]: a = np.random.choice(1000, 100, replace=False).tolist()

In [906]: t = tuple(np.random.randint(1,1000,(10000)))

In [907]: %timeit dict_sum(a, t)
     ...: %timeit set_sum(a, t)
     ...: %timeit collections_counter(a, t)
     ...: %timeit np.in1d(t,a).sum()
     ...: %timeit np.bincount(t)[a].sum()
100 loops, best of 3: 2 ms per loop
1000 loops, best of 3: 437 µs per loop
100 loops, best of 3: 2.44 ms per loop
1000 loops, best of 3: 1.18 ms per loop
1000 loops, best of 3: 503 µs per loop

set_sum from @timgeb's solution is the fastest for such inputs, with np.bincount a close second.
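
set_sum makes a single pass over the tuple with average constant-time set lookups, which is why it holds up well here. A possible variant, not from the original answers and not benchmarked here (set_sum_map is a hypothetical name), keeps the same logic but lets map call the set's __contains__ directly and sums the resulting booleans; whether it is actually faster should be checked with %timeit on your own inputs.

```python
def set_sum_map(items, tup):
    """Same result as set_sum: number of tup elements that appear in items."""
    lookup = set(items)
    # map yields True/False per tuple element; sum treats True as 1
    return sum(map(lookup.__contains__, tup))


t = (1, 5, 2, 3, 4, 5, 6, 7, 3, 2, 2, 4, 3)
print(set_sum_map([1, 2, 3], t))  # 7
```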

Case #2: Timings on a tuple of 100,000 elements drawn from 10,000 possible values, and a list of 1000 unique random values from that range -

In [916]: t = tuple(np.random.randint(0,10000,(100000)))

In [917]: a = np.random.choice(10000, 1000, replace=False).tolist()

In [918]: %timeit dict_sum(a, t)
     ...: %timeit set_sum(a, t)
     ...: %timeit collections_counter(a, t)
     ...: %timeit np.in1d(t,a).sum()
     ...: %timeit np.bincount(t)[a].sum()
10 loops, best of 3: 21.1 ms per loop
100 loops, best of 3: 5.33 ms per loop
10 loops, best of 3: 24.2 ms per loop
100 loops, best of 3: 13.4 ms per loop
100 loops, best of 3: 5.05 ms per loop
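
The %timeit lines above are IPython magics. Outside IPython, a similar comparison can be sketched with the standard timeit module; the setup below roughly mirrors Case #1 using the random module instead of NumPy and times two of the pure-Python functions. Absolute numbers depend on the machine and Python version, so this only shows how to measure, not what the result will be.

```python
import random
import timeit
from collections import Counter


def collections_counter(tgt, tup):
    counts = Counter(tup)
    return sum(counts[t] for t in tgt)


def set_sum(l, t):
    l = set(l)
    return sum(1 for x in t if x in l)


# Roughly mirrors Case #1: 10,000 values in 1..999, 100 unique list values below 1000
random.seed(0)
t = tuple(random.randint(1, 999) for _ in range(10_000))
a = random.sample(range(1000), 100)

for func in (set_sum, collections_counter):
    # Best of 3 repeats of 100 calls each, reported per call
    best = min(timeit.repeat(lambda: func(a, t), repeat=3, number=100)) / 100
    print(f"{func.__name__}: {best * 1e6:.0f} µs per call")
```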