How to load a JTable with 2 million rows
I am writing an application that uses a JTable to display lines of a log file. I have the data parsed, but when I try to add the rows to my AbstractTableModel I receive either a "GC overhead limit exceeded" or a "java.lang.OutOfMemoryError: Java heap space" error. Is there a way to configure the garbage collector, or to change my AbstractTableModel, to allow me to load the needed rows?
package gui;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import javax.swing.table.AbstractTableModel;
import saxxmlparse.logEvent;

/**
 *
 * @author David.Crosser
 */
public class MyTableModel extends AbstractTableModel {

    private String[] columnNames = new String[]{"Type", "Time", "TID", "LID", "User", "Message", "Query", "Protocol", "Port", "IP", "Error"};
    private List<logEvent> data;

    public MyTableModel() {
        data = new ArrayList<>(25);
    }

    @Override
    public Class<?> getColumnClass(int columnIndex) {
        if (columnIndex == 1) {
            //return Date.class;
            return String.class;
        } else {
            return String.class;
        }
    }

    @Override
    public String getColumnName(int col) {
        return columnNames[col];
    }

    @Override
    public int getColumnCount() {
        return columnNames.length;
    }

    @Override
    public int getRowCount() {
        return data.size();
    }

    @Override
    public Object getValueAt(int row, int col) {
        logEvent value = data.get(row);
        Object retObj = null;
        switch (col) {
            case 0:
                retObj = value.getType();
                break;
            case 1:
                retObj = value.getTime();
                break;
            case 2:
                retObj = value.getTid();
                break;
            case 3:
                retObj = value.getLid();
                break;
            case 4:
                retObj = value.getUser();
                break;
            case 5:
                retObj = value.getMsg();
                break;
            case 6:
                retObj = value.getQuery();
                break;
            case 7:
                retObj = value.getProtocol();
                break;
            case 8:
                retObj = value.getPort();
                break;
            case 9:
                retObj = value.getIp();
                break;
            case 10:
                retObj = "N";
                break;
        }
        return retObj;
    }

    public void addRow(logEvent value) {
        int rowCount = getRowCount();
        data.add(value);
        fireTableRowsInserted(rowCount, rowCount);
    }

    public void addRows(logEvent... value) {
        addRows(Arrays.asList(value));
    }

    public void addRows(List<logEvent> rows) {
        int rowCount = getRowCount();
        data.addAll(rows);
        fireTableRowsInserted(rowCount, getRowCount() - 1);
    }
}
package gui;

import java.sql.ResultSet;
import java.util.List;
import javax.swing.SwingWorker;
import saxxmlparse.logEvent;

/**
 *
 * @author David.Crosser
 */
public class TableSwingWorker extends SwingWorker<MyTableModel, logEvent> {

    private final MyTableModel tableModel;
    String query;
    dataBase.Database db;
    int totalRows = 0;

    public TableSwingWorker(dataBase.Database db, MyTableModel tableModel, String query) {
        this.tableModel = tableModel;
        this.query = query;
        this.db = db;
    }

    @Override
    protected MyTableModel doInBackground() throws Exception {
        // This is a deliberate pause to allow the UI time to render
        Thread.sleep(2000);

        ResultSet rs = db.queryTable(query);
        System.out.println("Start populating");

        while (rs.next()) {
            logEvent data = new logEvent();
            // Must be i < getColumnCount(), not i <=; the extra iteration
            // falls outside every case label and does nothing.
            for (int i = 0; i < tableModel.getColumnCount(); i++) {
                switch (i) {
                    case 0:
                        data.setType((String) rs.getObject(i + 1));
                        break;
                    case 1:
                        data.setTime((String) rs.getObject(i + 1));
                        break;
                    case 2:
                        data.setTid((String) rs.getObject(i + 1));
                        break;
                    case 3:
                        data.setLid((String) rs.getObject(i + 1));
                        break;
                    case 4:
                        data.setUser((String) rs.getObject(i + 1));
                        break;
                    case 5:
                        data.setMsg((String) rs.getObject(i + 1));
                        break;
                    case 6:
                        data.setQuery((String) rs.getObject(i + 1));
                        break;
                    case 7:
                        data.setProtocol((String) rs.getObject(i + 1));
                        break;
                    case 8:
                        data.setPort((String) rs.getObject(i + 1));
                        break;
                    case 9:
                        data.setIp((String) rs.getObject(i + 1));
                        break;
                    case 10:
                        data.setError((String) rs.getObject(i + 1));
                        break;
                }
            }
            // publish() hands rows to process(), which runs on the EDT in batches.
            publish(data);
            Thread.yield();
        }
        return tableModel;
    }

    @Override
    protected void process(List<logEvent> chunks) {
        totalRows += chunks.size();
        System.out.println("Adding " + chunks.size() + " rows --- Total rows:" + totalRows);
        tableModel.addRows(chunks);
    }
}
My answer will be applicable to the general type of problem where you need to work on a very large data set, not just your specific "2 million rows in a table" problem.
When the data you need to operate on is larger than its container (in your case, larger than the memory your system physically has, but this applies to any data larger than its container - physical, virtual, logical, or otherwise), you need a mechanism that streams in only the data you need at any given time, plus possibly slightly more if you want a buffer.
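In code terms, that mechanism can be as small as a single interface: something that knows how big the whole data set is and can hand back any window of it on demand. A hypothetical sketch - the name RowSource and both method signatures are mine, not from the original post:

import java.util.List;

/** Hypothetical on-demand backing store for a data set too large for memory. */
public interface RowSource {

    /** Full size of the underlying data set. */
    int totalRowCount();

    /**
     * Read one window of rows, as plain Object[] tuples. Implementations
     * should clamp the requested range to the end of the data set.
     */
    List<Object[]> fetchRows(int first, int count);
}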
For example, if you want to show 10 rows in your table and the data set is far too large, then you need a table model which knows about the 10 rows currently being displayed, and which swaps that data out for what it needs when the view changes. So create a table model which holds those 10 records, not 2 million. Or, for the optional buffering I mentioned, make the model hold 30 records: the 10 in the view, plus the 10 before and the 10 after, so the data can change immediately as the user scrolls and small scrollbar increments stay highly responsive. Then the cost of streaming data on the fly only shows when the user scrolls very far very fast (i.e., clicks the scrollbar "thumb" and drags it straight from top to bottom). A minimal sketch of such a model follows.
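Here is a minimal sketch of that windowed model, built on the hypothetical RowSource above; none of the names below come from the original post. Only WINDOW_SIZE rows ever live in memory, and the window is re-centered whenever the view asks for a row outside it:

import java.util.List;
import javax.swing.table.AbstractTableModel;

public class WindowedTableModel extends AbstractTableModel {

    private static final int WINDOW_SIZE = 30; // 10 visible + 10 above + 10 below

    private final RowSource source;
    private final String[] columnNames;
    private List<Object[]> window;   // the only rows held in memory
    private int windowStart = 0;

    public WindowedTableModel(RowSource source, String[] columnNames) {
        this.source = source;
        this.columnNames = columnNames;
        this.window = source.fetchRows(0, WINDOW_SIZE);
    }

    @Override
    public int getRowCount() {
        // Report the full size so the scrollbar reflects the whole data set.
        return source.totalRowCount();
    }

    @Override
    public int getColumnCount() {
        return columnNames.length;
    }

    @Override
    public String getColumnName(int col) {
        return columnNames[col];
    }

    @Override
    public Object getValueAt(int row, int col) {
        if (row < windowStart || row >= windowStart + window.size()) {
            // The requested row fell outside the buffered window: re-center
            // the window on it and let the old rows be garbage-collected.
            windowStart = Math.max(0, row - WINDOW_SIZE / 2);
            window = source.fetchRows(windowStart, WINDOW_SIZE);
        }
        return window.get(row - windowStart)[col];
    }
}

In a production version, getValueAt would return a placeholder, fetch the new window asynchronously (e.g. on a SwingWorker, as in the question), and then call fireTableRowsUpdated, so the event dispatch thread never blocks on the data source; the synchronous fetch here just keeps the sketch short.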
This is the same way compression algorithms compress/decompress 100GB of data: it is not all in memory at once. It is also how software backed by tape drives works (it has no choice, since tape is not random-access). Or, the example almost everyone is familiar with: this is how online video streaming works. Think of YouTube and the loading bar at the bottom of the video with its grey buffer zone; if you "fast-forward" to a time inside that buffer zone it usually switches immediately, but if you jump to a time past it, the video may stop for a second while it loads the next frames (and then it buffers more). An enormous table works the same way, except that you are "streaming" to the data model from memory or from disk, with the stream source and destination in the same process. Otherwise, same idea.
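Since the question already pulls its rows from a database, one natural implementation of the hypothetical RowSource above is SQL paging, so that only one window of rows ever crosses the JDBC boundary per call. In this sketch the table name log_events, the ordering column id, and LIMIT/OFFSET support (H2, PostgreSQL, SQLite, and similar dialects) are all illustrative assumptions, not details from the original post:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class JdbcRowSource implements RowSource {

    private final Connection connection;
    private final int columnCount;

    public JdbcRowSource(Connection connection, int columnCount) {
        this.connection = connection;
        this.columnCount = columnCount;
    }

    @Override
    public int totalRowCount() {
        // JTable calls getRowCount() constantly; a real implementation
        // would cache this value instead of re-querying every time.
        try (PreparedStatement ps = connection.prepareStatement(
                "SELECT COUNT(*) FROM log_events");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getInt(1);
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public List<Object[]> fetchRows(int first, int count) {
        // A stable ORDER BY is required so that OFFSET paging is deterministic.
        try (PreparedStatement ps = connection.prepareStatement(
                "SELECT * FROM log_events ORDER BY id LIMIT ? OFFSET ?")) {
            ps.setInt(1, count);
            ps.setInt(2, first);
            try (ResultSet rs = ps.executeQuery()) {
                List<Object[]> rows = new ArrayList<>(count);
                while (rs.next()) {
                    Object[] row = new Object[columnCount];
                    for (int col = 0; col < columnCount; col++) {
                        row[col] = rs.getObject(col + 1);
                    }
                    rows.add(row);
                }
                return rows;
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}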