How do I compare one row in a CSV file with all rows in another CSV file?

Problem description:

I have two CSV files:

  1. Identity(no,name,Age), which has 10 rows
  2. Location(Address,no,City), which has 100 rows

I need to extract each row from the Identity file and check its no column against the Location CSV file.

That is, take a single row from the Identity CSV file and compare Identity.no with Location.no across all 100 rows of the Location CSV file.

If there is a match, then in Identity, Location ...

Note: I need to take the 1st row from Identity and compare it with the 100 rows in the Location CSV file, then take the 2nd row and compare it with the same 100 rows. This continues for all 10 rows in the Identity CSV file.

The overall results should then be converted to JSON and loaded into SQL Server.

Is this possible in Apache NiFi?

Any help is appreciated.

You can do this in NiFi with the DistributedMapCache feature, which implements a key/value store for lookups. The setup requires a distributed map cache plus two flows: one to populate the cache with your Location (address) records, and one to look up the address by the no field.

  1. The DistributedMapCache is defined by two controller services, a DistributedMapCacheServer and a DistributedMapCacheClientService. If your data set is small, you can just use "localhost" as the server.

Populating the cache requires reading the Location file, splitting it into records, extracting the no key from each record, and putting the key/value pairs into the cache. An approximate flow might be GetFile -> SplitText -> ExtractText -> UpdateAttribute -> PutDistributedMapCache.
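To make the populate step concrete, here is a rough Python sketch of the same logic outside NiFi. It is not NiFi configuration: a plain dict stands in for the distributed map cache, and the sample rows and the name populate_cache are made up for illustration. It also assumes each no value appears only once in the Location file; if it repeats, the last row wins.

```python
import csv
import io

# Hypothetical sample of the Location file (Address,no,City); not from the question.
LOCATION_CSV = """Address,no,City
12 Main St,1,Springfield
34 Oak Ave,2,Shelbyville
56 Pine Rd,7,Ogdenville
"""

def populate_cache(location_text):
    """Return a dict mapping each `no` value to its full Location row."""
    cache = {}
    reader = csv.DictReader(io.StringIO(location_text))
    for row in reader:            # like SplitText: handle one record at a time
        key = row["no"].strip()   # like ExtractText: pull out the lookup key
        cache[key] = row          # like PutDistributedMapCache: store key -> value
    return cache

cache = populate_cache(LOCATION_CSV)
print(sorted(cache))
```

Keying the store by no is what lets the second flow avoid scanning all 100 Location rows for every Identity row.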

Looking up your Identity records is fairly similar to the flow above: it requires reading the Identity file, splitting it into records, extracting the no key from each record, and then fetching the matching address record from the cache. The processor flow might be GetFile -> SplitText -> ExtractText -> UpdateAttribute -> FetchDistributedMapCache.
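The lookup side can be sketched the same way. Again this is only an illustration in Python, not NiFi configuration: location_cache is a hand-built dict standing in for an already populated cache, and the sample rows and the name lookup_identities are assumptions. Identity rows with no matching no are simply skipped here.

```python
import csv
import io

# Hypothetical sample of the Identity file (no,name,Age); not from the question.
IDENTITY_CSV = """no,name,Age
1,Alice,30
3,Bob,25
7,Carol,41
"""

# Stand-in for a cache that was already populated from the Location file.
location_cache = {
    "1": {"Address": "12 Main St", "City": "Springfield"},
    "7": {"Address": "56 Pine Rd", "City": "Ogdenville"},
}

def lookup_identities(identity_text, cache):
    """Merge each Identity row with the Location entry that shares its `no`."""
    matched = []
    for row in csv.DictReader(io.StringIO(identity_text)):
        location = cache.get(row["no"].strip())  # one keyed fetch per Identity row
        if location is None:
            continue                             # no Location row with this `no`
        matched.append({**row, **location})      # combine both records
    return matched

matches = lookup_identities(IDENTITY_CSV, location_cache)
print(matches)
```

Each Identity row costs one keyed fetch instead of a pass over every Location row, which gives the same result as the row-by-row comparison described in the question.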

You can convert all or part of each record from CSV to JSON with AttributesToJSON, or possibly ExecuteScript.
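As a rough picture of what the JSON step produces, the sketch below turns merged records into one flat JSON object per record. It uses Python's json module rather than any NiFi processor, and the sample record and the name to_json_lines are made up for illustration.

```python
import json

# Hypothetical merged record (Identity fields plus Location fields).
merged = [
    {"no": "1", "name": "Alice", "Age": "30",
     "Address": "12 Main St", "City": "Springfield"},
]

def to_json_lines(records):
    """Serialize each merged record as its own flat JSON object."""
    return [json.dumps(record) for record in records]

json_lines = to_json_lines(merged)
for line in json_lines:
    print(line)
```

One object per record keeps each result independent, which suits inserting the results into SQL Server one row at a time.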