Counting the frequency of items in a list of tuples
I have a list of tuples, as shown below. I have to count how many items occur more than once. The code I have written so far is very slow, even with only around 10K tuples. In the list below, the string example appears twice, so that is the kind of string I have to find. What is the best way to count the strings here by iterating over a generator?
List:
b_data=[('example',123),('example-one',456),('example',987),.....]
My code so far:
blockslst=[]
for line in b_data:
    blockslst.append(line[0])
blocklstgtone=[]
for item in blockslst:
    if(blockslst.count(item)>1):
        blocklstgtone.append(item)
You've got the right idea in extracting the first item from each tuple. You can make your code more concise with a list comprehension or generator expression, as shown below.
From that point on, the most idiomatic way to find the frequency counts of elements is to use a collections.Counter object.
- Extract the first elements from your list of tuples (using a comprehension)
- Pass this to Counter
- Query the count of example
from collections import Counter
counts = Counter(x[0] for x in b_data)
print(counts['example'])
Sure, you can use list.count if you only want the frequency count of a single item, but in the general case, a Counter is the way to go.
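As a minimal sketch of how this applies to the original question (finding the strings that occur more than once), assuming a small made-up b_data in the same shape as the list in the question; the sample values and the name repeated are illustrative only:

```python
from collections import Counter

# Hypothetical sample data in the same shape as the question's list.
b_data = [('example', 123), ('example-one', 456), ('example', 987)]

# Count how often each first element occurs, in a single pass.
counts = Counter(name for name, _ in b_data)

# Keep only the names that occur more than once.
repeated = [name for name, count in counts.items() if count > 1]
print(repeated)  # ['example']
```

Unlike calling list.count inside a loop, which rescans the whole list for every item, this walks the data once to build the counts and once more over the distinct names.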
The advantage of a Counter is that it computes the frequency counts of all elements (not just example) in linear (O(N)) time. Say you also wanted to query the count of another element, say foo. That would be done with:
print(counts['foo'])
If 'foo' doesn't exist in the list, 0 is returned.
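A short sketch of that behaviour, using a made-up list of names rather than the question's data. It also shows that looking up a missing key does not add it to the Counter:

```python
from collections import Counter

# Illustrative data only.
counts = Counter(['example', 'example-one', 'example'])

missing = counts['foo']
print(missing)          # 0, no KeyError is raised
print('foo' in counts)  # False, the lookup did not insert the key
```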
If you want to find the most common elements, call counts.most_common:
print(counts.most_common(n))
Here, n is the number of elements you want to display. If you want to see everything, don't pass n.
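For instance, with a small illustrative Counter (the letters here are made up for the example), passing n limits the result, while omitting it returns every element ordered from most to least common:

```python
from collections import Counter

# Illustrative data only.
counts = Counter(['a', 'b', 'a', 'c', 'a', 'b'])

top_two = counts.most_common(2)    # the two most frequent elements
everything = counts.most_common()  # all elements, most frequent first

print(top_two)     # [('a', 3), ('b', 2)]
print(everything)  # [('a', 3), ('b', 2), ('c', 1)]
```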
To retrieve the elements whose count is over 1, one efficient approach is to query most_common and then keep taking entries for as long as the count stays over 1, using itertools.takewhile:
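from collections import Counter
from itertools import takewhile
l = [1, 1, 2, 2, 3, 3, 1, 1, 5, 4, 6, 7, 7, 8, 3, 3, 2, 1]
c = Counter(l)
# most_common() is ordered by count, so stop at the first count of 1
print(list(takewhile(lambda x: x[-1] > 1, c.most_common())))
# [(1, 5), (3, 4), (2, 3), (7, 2)]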
(OP edit) Alternatively, use a list comprehension to get a list of the items with count > 1:
[item[0] for item in counts.most_common() if item[-1] > 1]
Keep in mind that this isn't as efficient as the itertools.takewhile solution. For example, if you have one item with count > 1 and a million items with count equal to 1, the comprehension checks all million and one entries when it doesn't have to (because most_common returns frequency counts in descending order). With takewhile that isn't the case, because you stop iterating as soon as the condition count > 1 becomes false.