如何在powershell中合并两个具有相同标题的csv文件并丢弃重复的行
我正在从 NetApp Performance Manager 软件 (OPM) 收集性能计数器.OPM 在 MySQL 数据库中保存了 30 天的数据.所以我必须输入两个查询来检索数据:
I am collecting performance counters from NetApp Performance Manager software (OPM). OPM saves 30 days worth of data in MySQL database. So i have to put two queries to retrieve the data:
- 每月 30 日第一次查询并保存在 csv 文件中.
- 每月 1 号进行第二次查询并保存在 csv 文件中.
然后合并两个csv文件,如果一个月有31天就得到数据.
Then merge the two csv files to get data if there are 31 days in a month.
两个文件如下所示:
"Time","objid","cpuBusy","cifsOps","avgLatency"
"2016:06:04 00:04","72","50.6196","2069.11","7622.1"
"2016:06:04 00:09","72","30.2233","2075.94","7633.27"
"2016:06:04 00:14","72","35.2559","1980.64","8352.17"
当我使用以下代码合并两个 csv 文件时.我从相同的数据/时间得到重复的行.
When i merge the two csv files with below code. I get duplicate rows with data from same data/time.
@(Import-Csv au2004npsa003-mm-business.csv) + @(Import-Csv au2004npsa003-nn-business.csv) | export-csv joined.csv -NoTypeInformation
如何在不获取重复数据的情况下合并两个 csv 文件?我试过 select -unique
但是,它只给出一行.
How can i merge the two csv files without getting duplicate data?
I have tried select -unique
however, it gives just one row.
至于 为什么 Select-Object -Unique
不起作用:
As for why Select-Object -Unique
didn't work:
Select-Object -Unique
,当给定引用类型(字符串除外)的实例时,比较它们的.ToString()
值 以确定唯一性.
Select-Object -Unique
, when given instances of reference types (other than strings), compares their.ToString()
values in order to determine uniqueness.
[pscustomobject]
实例,例如 Import-Csv
创建的实例,遗憾的是 返回 空字符串来自他们的.ToString()
方法.
[pscustomobject]
instances, such as the ones Import-Csv
creates, regrettably return the empty string from their .ToString()
method.
因此,所有输入对象都比较相同,并且只会返回第一个输入对象.
Thus, all input objects compare the same, and only the first input object is ever returned.
S9uare 有用的Select-Object -Property * -Unique
方法通过强制所有属性单独比较,但附带一个性能警告:
输入对象被有效地重新创建,在这种情况下比较所有属性值是多余的,因为比较Time
值就足够了;对于大型输入文件,处理可能需要很长时间.
S9uare's helpful Select-Object -Property * -Unique
approach overcomes this problem by forcing all properties to be compared invidually, but comes with a performance caveat:
The input objects are effectively recreated, and comparing all property values is overkill in this case, because comparing Time
values would suffice; with large input files, processing can take a long time.
由于手头的数据来自 CSV 文件,性能问题可以通过字符串处理来解决,使用 Get-Content
而不是导入-CSV
:
Since the data at hand comes from CSV files, the performance problem can be helped with string processing, using Get-Content
rather than Import-Csv
:
Get-Content au2004npsa003-mm-business.csv, au2004npsa003-nn-business.csv |
Select-Object -Unique |
Set-Content -Encoding ASCII joined.csv
请注意,我使用 -Encoding ASCII
来模仿 Export-Csv
的默认行为;根据需要更改.
Note that I'm using -Encoding ASCII
to mimic Export-Csv
's default behavior; change as needed.
对于字符串形式的输入对象,Select-Object -Unique
按预期工作 - 并且速度更快.
但是请注意,对于大型输入文件,您可能会耗尽内存,因为 Select-Object
需要构建一个包含所有行的内存数据结构以确定唯一性.
With input objects that are strings, Select-Object -Unique
works as expected - and is faster.
Note, however, that with large input files that you may run out of memory, given that Select-Object
needs to build up an in-memory data structure containing all rows in order to determine uniqueness.