Pandas: select unique values from a column

Problem description:

I was able to ingest a CSV in a Jupyter notebook by doing this:

import pandas as pd
csvData = pd.read_csv("logfile.csv")

My data looks like this:

event_timestamp   ip               url
2018-01-10 00:00  111.111.111.111  http://webpage1.com
2018-01-10 00:00  222.222.222.222  http://webpage2.com
...
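
To reproduce the problem without the original logfile.csv, here is a minimal sketch that builds an equivalent frame in memory. It makes a few assumptions: the file is comma-separated, and the third row, which repeats the first IP with a made-up timestamp, is hypothetical. That extra row is there only so that unique() has a duplicate to remove.

```python
import io

import pandas as pd

# Hypothetical stand-in for logfile.csv: the two rows from the question plus an
# invented third row that repeats the first IP, so duplicates actually exist.
sample_csv = io.StringIO(
    "event_timestamp,ip,url\n"
    "2018-01-10 00:00,111.111.111.111,http://webpage1.com\n"
    "2018-01-10 00:00,222.222.222.222,http://webpage2.com\n"
    "2018-01-10 00:01,111.111.111.111,http://webpage1.com\n"
)
csvData = pd.read_csv(sample_csv)

print(csvData.shape)  # (3, 3): three rows, three columns
```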

I got a list of IPs:

list_ips = csvData[["ip"]]

What I'm trying to do is get the unique values. Normally I would do:

list_ips.unique()

But in this case I get this error:

AttributeError: 'DataFrame' object has no attribute 'unique'

(I can use list_ips.head() and it lists a few IPs, but it's not a unique list.)

Thanks

EDIT: My problem was that I actually had:

list_ips = csvData[["ip"]]

So I removed one set of brackets, and it became:

list_ips = csvData["ip"]

Then I was able to follow Wen's example and do:

list_ips.unique().tolist()

Output:

['111.111.111.111','222.222.222.222'...]
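
Removing one set of brackets works because of how pandas indexing behaves. Indexing a DataFrame with a list of labels, as in csvData[["ip"]], returns a one-column DataFrame. Indexing with a single label, as in csvData["ip"], returns a Series, and only Series has unique(). The sketch below shows the difference on a small hypothetical frame; the IP values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with an "ip" column; the first IP appears twice.
csvData = pd.DataFrame(
    {"ip": ["111.111.111.111", "222.222.222.222", "111.111.111.111"]}
)

double = csvData[["ip"]]  # list of labels -> one-column DataFrame
single = csvData["ip"]    # single label -> Series

print(type(double).__name__)      # DataFrame
print(type(single).__name__)      # Series
print(hasattr(double, "unique"))  # False: calling it raises the AttributeError
print(single.unique().tolist())   # ['111.111.111.111', '222.222.222.222']
```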

You need to select the column correctly (single brackets, which give a Series) and then apply unique():

csvData['ip'].unique().tolist()
Out[677]: ['111.111.111.111', '222.222.222.222']
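
For comparison, here are a couple of other standard pandas ways to reach the same result. The sample data is hypothetical. drop_duplicates() is useful if you are already holding the one-column DataFrame from csvData[["ip"]]. nunique() covers the case where only the number of distinct IPs matters. Both unique() and drop_duplicates() keep values in order of first appearance.

```python
import pandas as pd

# Hypothetical sample data; the third row repeats the first IP.
csvData = pd.DataFrame(
    {"ip": ["111.111.111.111", "222.222.222.222", "111.111.111.111"]}
)

# The answer's approach: select the column as a Series, then take unique values.
ips = csvData["ip"].unique().tolist()

# Starting from a one-column DataFrame (double brackets): drop duplicate rows,
# then pull the column back out as a Series before converting to a list.
ips_from_frame = csvData[["ip"]].drop_duplicates()["ip"].tolist()

# When only the count of distinct IPs is needed.
n_ips = csvData["ip"].nunique()

print(ips)             # ['111.111.111.111', '222.222.222.222']
print(ips_from_frame)  # ['111.111.111.111', '222.222.222.222']
print(n_ips)           # 2
```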