How do I correctly parse a tab/space-delimited file with quoted strings in Perl?


Problem description:


I need to parse tab/space-delimited files with many columns in Perl. Some of the values are long strings enclosed in double quotes. These strings can contain any characters, including tabs and spaces.


When I try to parse them with the split function, it splits these quoted strings as well. How can I make Perl treat everything inside a pair of double quotes as a single column entry?

A simple example:

12  345546.67677   "Hello World!!!" -567.55656 0.5465767 "Hello_Again;   "


Use the Text::CSV library, which handles all the edge cases for you. It lets you set the delimiter:

use Text::CSV;
my $csv = Text::CSV->new({ sep_char => "\t" });
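
As a minimal sketch of how that constructor might be used, the snippet below parses one line shaped like the example from the question. It assumes the fields are separated by exactly one tab each (the sample line here is rewritten with tabs for that reason), and the `binary => 1` option is an addition not present in the answer above.

```perl
use strict;
use warnings;
use Text::CSV;

# Assumption: columns are separated by a single tab character.
# binary => 1 is added here so quoted fields may hold arbitrary bytes.
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Hypothetical input: the question's sample line, with tabs between columns.
my $line = qq{12\t345546.67677\t"Hello World!!!"\t-567.55656\t0.5465767\t"Hello_Again;   "};

# parse() returns false on malformed input; error_diag() describes the failure.
$csv->parse($line)
    or die "Parse failed: " . $csv->error_diag();

# fields() returns the columns with the surrounding quotes removed.
my @fields = $csv->fields();

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

The spaces inside the quoted strings survive because only the tab is treated as a separator. If the columns are instead separated by runs of spaces, as in the original sample line, a single-character `sep_char` would yield empty fields between consecutive spaces, so the input would need normalizing first or a different approach.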