Problem exporting a large amount of data from a database to .csv using Java

Problem description:

Hi, and thanks for your attention.

I want to export a lot of data, really a lot of data (6 million rows), to a .csv file using Java. The app is a Swing application with JPA, using TopLink (ojdbc14).

I tried to use:

BufferedWriter
RandomAccessFile
FileChannel

etc., but the memory consumption remains very high, causing a Java heap space OutOfMemoryError, even though I set the maximum heap size to 800m (-Xmx800m).

My last version of the source code:

...(more lines of code)

FileChannel channel = getRandomAccessFile(tempFile).getChannel();
Object[][] data = pag.getRawData(); // database data in a multidimensional array

for (int j = 0; j < data.length; j++) {
    write(data[j], channel); // write data[j] (one row array) into the channel
    freeStringLine(data[j]); // data[j] is an array; this method sets all positions to null
    data[j] = null;          // drop the reference to the row
}

channel.force(false); // force writing to the file system (HD)
channel.close();      // close the channel
pag = null;

...(more lines of code)

 private void write(Object[] row, FileChannel channel) throws DatabaseException {
    if (byteBuff == null) {
        byteBuff = ByteBuffer.allocateDirect(1024 * 1024);
    }
    for (int j = 0; j < row.length; j++) {
        if (j < row.length - 1) {
            if (row[j] != null) {
                byteBuff.put(row[j].toString().getBytes());
            }
            byteBuff.put(SPLITER_BYTES);
        } else {
            if (row[j] != null) {
                byteBuff.put(row[j].toString().getBytes());
            }
        }
    }
    byteBuff.put("\n".toString().getBytes());        
    byteBuff.flip();
    try {
        channel.write(byteBuff);
    } catch (IOException ex) {
        throw new DatabaseException("Imposible escribir en archivo temporal de exportación : " + ex.getMessage(), ex.getCause());
    }
    byteBuff.clear();
}

With 6 million rows, I don't want to keep all that data in memory while the file is created. I made many temp files (with 5000 rows each) and, at the end of the process, I append all those temp files into a single one using two FileChannels. However, the out-of-memory exception is thrown before the joining step.
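For the merge step described above, a minimal sketch of appending temp files with two FileChannels might look like the following; the CsvMerger class name and the list of temp files are assumptions, not the original code:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.util.List;

    public class CsvMerger {

        // Appends every temp file to the target CSV using channel-to-channel transfers,
        // so the copied bytes never pass through the Java heap.
        public static void merge(List<File> tempFiles, File target) throws IOException {
            FileChannel out = new FileOutputStream(target).getChannel();
            try {
                for (File tempFile : tempFiles) {
                    FileChannel in = new FileInputStream(tempFile).getChannel();
                    try {
                        long position = 0;
                        long size = in.size();
                        // transferTo may copy fewer bytes than requested, so loop until done
                        while (position < size) {
                            position += in.transferTo(position, size - position, out);
                        }
                    } finally {
                        in.close();
                    }
                }
            } finally {
                out.close();
            }
        }
    }

With transferTo the operating system copies the bytes between the files, so the merge itself should not be the part that exhausts the heap.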

Do you know another strategy for exporting this much data?

Thanks a lot for any answer. Sorry for my English, I'm improving xD

The answer is to use a "stream" approach, i.e. read one row and write one row as you scroll through the dataset. You'll need to get the query result as a cursor and iterate through it, rather than fetching the whole result set at once.

In JPA, use code something like this (the ScrollableResults cursor here comes from Hibernate's Session API):

// scroll() returns a cursor over the results instead of loading them all into a List
ScrollableResults cursor = session.createQuery("from SomeEntity x").scroll();

while (cursor.next()) {
    writeToFile(cursor); // write the current row, then move on
}

cursor.close();
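The writeToFile call above is left as a placeholder in the answer. A minimal sketch of what it might do, assuming the entity sits in position 0 of the cursor, that SomeEntity has getId/getName/getAmount getters (hypothetical names), and that a BufferedWriter is opened once before the loop:

    import java.io.BufferedWriter;
    import java.io.IOException;

    import org.hibernate.ScrollableResults;

    public class CsvRowWriter {

        private final BufferedWriter out; // opened once before the scroll loop, closed after it

        public CsvRowWriter(BufferedWriter out) {
            this.out = out;
        }

        // Hypothetical writeToFile: turn the current row into one CSV line and write it
        // immediately, so only that row is ever held in memory.
        public void writeToFile(ScrollableResults cursor) throws IOException {
            // with "from SomeEntity x" the whole entity comes back in position 0
            SomeEntity e = (SomeEntity) cursor.get(0);
            out.write(e.getId() + ";" + e.getName() + ";" + e.getAmount());
            out.newLine();
        }
    }

If the rows come back as managed entities, it is also worth clearing the session every few thousand rows so the persistence context itself does not grow while you scroll.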

This means you only have one row in memory at a time, which scales to any number of rows and uses minimal memory (it's faster anyway).

Getting all rows at once in a result set is a convenience approach that works for small result sets (which is most of the time), but as usual, convenience comes at a cost and it doesn't work in all situations.
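ScrollableResults belongs to Hibernate's own API. If only the portable JPA interfaces are available (as with TopLink in the question), a comparable effect can be approximated by paging the query with setFirstResult/setMaxResults and flushing each page to the file before fetching the next one. A rough sketch, where SomeEntity, its getters and the page size are assumptions:

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.List;

    import javax.persistence.EntityManager;
    import javax.persistence.Query;

    public class PagedCsvExport {

        private static final int PAGE_SIZE = 5000;

        public void export(EntityManager em, File csvFile) throws IOException {
            BufferedWriter out = new BufferedWriter(new FileWriter(csvFile));
            try {
                int first = 0;
                while (true) {
                    Query q = em.createQuery("select x from SomeEntity x order by x.id");
                    q.setFirstResult(first);
                    q.setMaxResults(PAGE_SIZE);
                    List<?> page = q.getResultList();
                    if (page.isEmpty()) {
                        break;
                    }
                    for (Object row : page) {
                        SomeEntity e = (SomeEntity) row;
                        out.write(e.getId() + ";" + e.getName());
                        out.newLine();
                    }
                    out.flush();
                    em.clear(); // detach the page so the persistence context stays small
                    first += PAGE_SIZE;
                }
            } finally {
                out.close();
            }
        }
    }

Ordering by a stable key (the id here) keeps the pages consistent between queries, and em.clear() detaches each page so memory use stays roughly constant no matter how many rows are exported.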