Split a CSV file into multiple files, each with the header and a given number of records
I have a huge CSV file that I need to split into smaller CSV files, keeping the header in each file and making sure that no records are lost. For example, here is the original file:
ID Date
1 01/01/2010
1 02/01/2010
2 01/01/2010
2 05/01/2010
2 06/01/2010
3 06/01/2010
3 07/01/2010
4 08/01/2010
4 09/01/2010
If I split the file right, I should see the first 5 records in data_1.csv and the last 4 records in data_2.csv.
The code I have only splits by rows and does not keep the header. I don't know how to modify it:
@echo off
setLocal EnableDelayedExpansion
set limit=5
set file=data.csv
set lineCounter=1
set filenameCounter=1
set name=
set extension=
for %%a in (%file%) do (
    set "name=%%~na"
    set "extension=%%~xa"
)
for /f "tokens=*" %%a in (%file%) do (
    set splitFile=!name!-part!filenameCounter!!extension!
    if !lineCounter! gtr !limit! (
        set /a filenameCounter=!filenameCounter! + 1
        set lineCounter=1
        echo Created !splitFile!.
    )
    echo %%a>> !splitFile!
    set /a lineCounter=!lineCounter! + 1
)
Here is a method similar to yours that uses a for /F loop to read the input file. The performance is not very good, however, because each output file is opened and closed for every single line written:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (first command line argument is input file)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)
rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1" & rem // (file name extension)
rem // Split file into multiple ones:
set "HEADER=" & set /A "INDEX=0, COUNT=0"
rem // Read file once:
for /F "usebackq delims=" %%L in ("%_FILE%") do (
    rem // Read header if not done yet:
    if not defined HEADER (
        set "HEADER=%%L"
    ) else (
        set "LINE=%%L"
        rem // Compute line index, previous and current file count:
        set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1, INDEX+=1"
        rem // Write header once per output file:
        setlocal EnableDelayedExpansion
        if !PREV! lss !COUNT! (
            > "!NAME!_!COUNT!!EXT!" echo/!HEADER!
        )
        rem // Write line:
        >> "!NAME!_!COUNT!!EXT!" echo/!LINE!
        endlocal
    )
)
endlocal
exit /B
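As a usage sketch, assuming the script above is saved under the hypothetical name split.bat in the same folder as data.csv, it could be run from a Command Prompt as shown here. With the sample data from the question and _LIMIT left at 5, this should yield data_1.csv (header plus 5 records) and data_2.csv (header plus 4 records):

```batch
rem // "split.bat" is a hypothetical file name for the script above:
split.bat data.csv
rem // List the output files that were created next to the input file:
dir /B data_*.csv
```

Note that for /F skips empty lines, so blank lines in the input are not carried over to the output files.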
To accomplish your task you do not even need a for /F loop; rather, you could use set /P, together with input redirection, in a for /L loop, like this (see the explanatory comments):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (first command line argument is input file)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)
rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1" & rem // (file name extension)
rem // Determine number of lines excluding header:
for /F %%I in ('^< "%_FILE%" find /V /C ""') do set /A "COUNT=%%I-1"
rem // Split file into multiple ones:
setlocal EnableDelayedExpansion
rem // Read file once:
< "!_FILE!" (
    rem // Read header (first line):
    set /P HEADER=""
    rem // Calculate number of output files:
    set /A "DIV=(COUNT-1)/_LIMIT+1"
    rem // Iterate over output files:
    for /L %%J in (1,1,!DIV!) do (
        rem // Write an output file:
        > "!NAME!_%%J!EXT!" (
            rem // Write header:
            echo/!HEADER!
            rem // Write as many lines as specified:
            for /L %%I in (1,1,%_LIMIT%) do (
                set "LINE=" & set /P LINE=""
                if defined LINE echo/!LINE!
            )
        )
    )
)
endlocal
endlocal
exit /B
The advantage of this method is that the input file, as well as each output file, is opened only once.
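The number of output files in the set /P script comes from an integer ceiling division, DIV=(COUNT-1)/_LIMIT+1. A minimal sketch of that arithmetic, using the 9 records from the question as an assumed value for COUNT:

```batch
@echo off
rem // Assumed example values: 9 data records, 5 records per output file:
set /A "COUNT=9, _LIMIT=5"
rem // set /A uses integer division, so (9-1)/5 is 1, plus 1 gives 2:
set /A "DIV=(COUNT-1)/_LIMIT+1"
echo Number of output files: %DIV%
```

Subtracting 1 before dividing keeps an exact multiple of _LIMIT (for example 10 records) from producing an extra output file that holds only the header.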