Split a CSV file into multiple files with a header and a given number of records

Question:

I have a huge CSV file that I need to split into smaller CSV files, keeping the header in each file and making sure that no records are lost. For example, here is the original file:

 ID    Date   
 1     01/01/2010
 1     02/01/2010
 2     01/01/2010 
 2     05/01/2010
 2     06/01/2010
 3     06/01/2010
 3     07/01/2010
 4     08/01/2010
 4     09/01/2010

If I split the file correctly, I should see the first 5 records in data_1.csv and the last 4 records in data_2.csv.

The code I have only splits by rows and does not keep the header. I don't know how to modify it:

 @echo off
 setLocal EnableDelayedExpansion

 set limit=5
 set file=data.csv
 set lineCounter=1
 set filenameCounter=1

 set name=
 set extension=

 for %%a in (%file%) do (
     set "name=%%~na"
     set "extension=%%~xa"
 )

 for /f "tokens=*" %%a in (%file%) do (
     set splitFile=!name!-part!filenameCounter!!extension!
     if !lineCounter! gtr !limit! (
         set /a filenameCounter=!filenameCounter! + 1
         set lineCounter=1
         echo Created !splitFile!.
     )
     echo %%a>> !splitFile!

     set /a lineCounter=!lineCounter! + 1
 )

Answer:

Here is a method similar to yours that uses a for /F loop to read the input file. Its performance is not very good, however, because the output file is opened and closed for every single line written:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1"   & rem // (first command line argument is input file)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)

rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1"    & rem // (file name extension)

rem // Split file into multiple ones:
set "HEADER=" & set /A "INDEX=0, COUNT=0"
rem // Read file once:
for /F "usebackq delims=" %%L in ("%_FILE%") do (
    rem // Read header if not done yet:
    if not defined HEADER (
        set "HEADER=%%L"
    ) else (
        set "LINE=%%L"
        rem // Compute line index, previous and current file count:
        set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1, INDEX+=1"
        rem // Write header once per output file:
        setlocal EnableDelayedExpansion
        if !PREV! lss !COUNT! (
            > "!NAME!_!COUNT!!EXT!" echo/!HEADER!
        )
        rem // Write line:
        >> "!NAME!_!COUNT!!EXT!" echo/!LINE!
        endlocal
    )
)

endlocal
exit /B
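
The set /A line in the script above decides which output file a record belongs to by integer division of the zero-based record index by the limit. The following is only a small sketch of that arithmetic in isolation, using the 9 records and the limit of 5 from the question as assumed values; it does not read or write any file:

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem // Assumed value, taken from the example in the question:
set /A "_LIMIT=5"

rem // Record indexes 0..8 stand for the 9 data rows (header excluded):
for /L %%I in (0,1,8) do (
    rem // Integer division truncates, so indexes 0..4 give 1 and 5..8 give 2:
    set /A "COUNT=%%I/_LIMIT+1"
    echo record index %%I goes to file number !COUNT!
)

endlocal
exit /B
```

Indexes 0 to 4 are reported for file number 1 and indexes 5 to 8 for file number 2, which matches the expected 5 + 4 split. Whenever the computed number is larger than the previous one, the script starts a new file and writes the header first.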

To accomplish your task, you do not even need a for /F loop; instead, you can use set /P together with input redirection inside a for /L loop, like this (see the explanatory comments):

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1"   & rem // (first command line argument is input file)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)

rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1"    & rem // (file name extension)

rem // Determine number of lines excluding header:
for /F %%I in ('^< "%_FILE%" find /V /C ""') do set /A "COUNT=%%I-1"

rem // Split file into multiple ones:
setlocal EnableDelayedExpansion
rem // Read file once:
< "!_FILE!" (
    rem // Read header (first line):
    set /P HEADER=""
    rem // Calculate number of output files:
    set /A "DIV=(COUNT-1)/_LIMIT+1"
    rem // Iterate over output files:
    for /L %%J in (1,1,!DIV!) do (
        rem // Write an output file:
        > "!NAME!_%%J!EXT!" (
            rem // Write header:
            echo/!HEADER!
            rem // Write as many lines as specified:
            for /L %%I in (1,1,%_LIMIT%) do (
                set "LINE=" & set /P LINE=""
                if defined LINE echo/!LINE!
            )
        )
    )
)
endlocal

endlocal
exit /B

The advantage of this method is that the input file and each output file are opened only once.
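
To illustrate the mechanism the second script relies on, here is a minimal sketch, assuming a file named data.csv exists in the current directory (the file name and the variable names FIRST and SECOND are chosen only for this illustration). When a parenthesized block has its input redirected from a file, each set /P inside the block reads the next line, so the file is opened once for the whole block:

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Clear the variables, because set /P keeps the old value on an empty line:
set "FIRST=" & set "SECOND="

rem // Assumption: "data.csv" exists in the current directory.
< "data.csv" (
    rem // Each set /P consumes one line of the redirected input:
    set /P "FIRST="
    set /P "SECOND="
)

rem // The block has finished, so normal expansion sees the new values:
echo First line:  %FIRST%
echo Second line: %SECOND%

endlocal
exit /B
```

With the sample file from the question, this should print the header line followed by the first record. Since set /P leaves the variable untouched when it reads an empty line, the full script clears LINE before every read and only writes it if it is defined afterwards. As far as I know, set /P also truncates very long lines (at roughly 1 KB), so this approach is best suited to files with short records. If the full script is saved under a hypothetical name such as split-csv.bat, calling it as split-csv.bat "data.csv" on the sample file would create data_1.csv and data_2.csv next to the input file.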
