Counting the frequency of items in a list of tuples

Question:

I have a list of tuples, shown below, and I need to find the items that occur more than once. The code I have written so far is very slow, even with only around 10K tuples. In the list below, the string example appears twice, so it is one of the strings I need to collect. What is the best way to count the strings here by iterating over a generator?

List:

 b_data=[('example',123),('example-one',456),('example',987),.....]

My code so far:

blockslst=[]
for line in b_data:
    blockslst.append(line[0])

blocklstgtone=[]
for item in blockslst:
    if(blockslst.count(item)>1):
        blocklstgtone.append(item)

Answer

You have the right idea in extracting the first item from each tuple. You can make the code more concise with a list or generator comprehension, as shown below.

From there, the most idiomatic way to find the frequency counts of elements is a collections.Counter object.

Extract the first element of each tuple in your list (using a comprehension)
Pass the result to Counter
Query the count of example

from collections import Counter

counts = Counter(x[0] for x in b_data)
print(counts['example'])
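
As a self-contained sketch of the three steps, assuming a small made-up b_data (the list in the question is truncated, so the values here are only illustrative):

```python
from collections import Counter

# Hypothetical sample data standing in for the truncated list in the question.
b_data = [('example', 123), ('example-one', 456), ('example', 987)]

# One pass over the tuples; the generator yields only the first element of each.
counts = Counter(name for name, _ in b_data)

print(counts['example'])      # 2
print(counts['example-one'])  # 1
```

The generator expression avoids building an intermediate list of names before counting.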

Sure, you can use list.count if there is only one item whose frequency you want, but in the general case a Counter is the way to go.
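
For the single-item case, a minimal sketch with list.count might look like this (the sample data is again made up for illustration):

```python
# Hypothetical sample data; only the first element of each tuple matters here.
b_data = [('example', 123), ('example-one', 456), ('example', 987)]

# Build the list of names once, then scan it once for the one item of interest.
names = [name for name, _ in b_data]
example_count = names.count('example')

print(example_count)  # 2
```

Each call to count scans the whole list, so this is reasonable for one lookup but not for one lookup per element.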

The advantage of a Counter is that it computes the frequency counts of all elements (not just example) in linear (O(N)) time. By contrast, calling list.count once per item, as in the question, rescans the whole list each time, which is O(N^2) overall. Say you also wanted the count of another element, say foo. That would be done with:

print(counts['foo'])

If 'foo' does not exist in the list, 0 is returned.
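
A short sketch of that behaviour, using made-up names; note that looking up an absent key does not add it to the counter:

```python
from collections import Counter

counts = Counter(['example', 'example-one', 'example'])

missing = counts['foo']   # lookup of a key that was never counted
print(missing)            # 0
print('foo' in counts)    # False: the lookup did not insert the key
```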

If you want to find the most common elements, call counts.most_common:

print(counts.most_common(n))

Here n is the number of elements you want to display. To see everything, omit n.
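
For instance, with a small made-up list of letters, the two call forms could be compared like this:

```python
from collections import Counter

counts = Counter(['a', 'b', 'a', 'c', 'a', 'b'])

top_two = counts.most_common(2)    # the two highest counts
everything = counts.most_common()  # all elements, highest count first

print(top_two)     # [('a', 3), ('b', 2)]
print(everything)  # [('a', 3), ('b', 2), ('c', 1)]
```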

To retrieve the elements with counts over 1, one efficient way is to call most_common and then take entries from the front with itertools.takewhile for as long as the count exceeds 1:

from itertools import takewhile

l = [1, 1, 2, 2, 3, 3, 1, 1, 5, 4, 6, 7, 7, 8, 3, 3, 2, 1]
c = Counter(l)

print(list(takewhile(lambda x: x[-1] > 1, c.most_common())))
# [(1, 5), (3, 4), (2, 3), (7, 2)]

(OP edit) Alternatively, use a list comprehension to get the items with count > 1:

[item[0] for item in counts.most_common() if item[-1] > 1]

Keep in mind that this isn't as efficient as the itertools.takewhile solution. For example, if you have one item with count > 1 and a million items with count equal to 1, the comprehension checks all one million and one entries when it doesn't have to (because most_common returns frequency counts in descending order). With takewhile that isn't the case, because iteration stops as soon as the condition count > 1 becomes false. Note that most_common() itself still sorts every entry, so the saving applies only to the filtering pass.
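
If the descending order is not needed, one alternative (not part of the original answer) is to filter counts.items() directly, which avoids sorting altogether. A minimal sketch with made-up data, shown next to the takewhile route for comparison:

```python
from collections import Counter
from itertools import takewhile

# Hypothetical sample data; 'example' and 'other' repeat, 'example-one' does not.
b_data = [('example', 123), ('example-one', 456), ('example', 987),
          ('other', 1), ('other', 2), ('other', 3)]
counts = Counter(name for name, _ in b_data)

# Sorted route: most_common() orders by count, takewhile stops at the first count <= 1.
via_takewhile = [name for name, _ in
                 takewhile(lambda pair: pair[1] > 1, counts.most_common())]

# Unsorted route: a single pass over the counter, in first-seen order.
via_items = [name for name, count in counts.items() if count > 1]

print(via_takewhile)  # ['other', 'example']
print(via_items)      # ['example', 'other']
```

Both routes return the same set of names; they differ only in order and in whether a sort is performed.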
