Handling Large Data Volumes in Hibernate

As you probably know, Hibernate has a first-level cache (scoped to the Session) and a second-level cache (which must be configured separately, for example with ehcache).
The code below throws an OutOfMemoryError at around 50,000 records, because Hibernate keeps every newly created MiniMessage object in the Session-level (first-level) cache.

Session session = null;
Transaction tx = null;
try {
    session = HibernateUtil.getSessionFactory().openSession();
    tx = session.beginTransaction();
    for (int i = 0; i < 300000; i++) {
        System.out.println(i + ".................");
        // Every saved object stays attached to the Session's first-level
        // cache until the Session is flushed and cleared.
        MiniMessage message = new MiniMessage("Hello World" + i);
        session.save(message);
    }
    tx.commit();
} catch (HibernateException he) {
    if (tx != null) {
        tx.rollback();
    }
    throw he;
} finally {
    if (session != null) {
        session.close();
    }
}
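
For context, the MiniMessage entity is never shown in this article; the listings only reveal a constructor taking a String. A minimal sketch of what such an entity might look like (the id and text fields, and the accompanying MiniMessage.hbm.xml mapping, are assumptions):

public class MiniMessage {

    private Long id;        // assumed surrogate key, generated by Hibernate
    private String text;    // assumed payload, e.g. "Hello World" + i

    MiniMessage() {
        // no-argument constructor required by Hibernate
    }

    public MiniMessage(String text) {
        this.text = text;
    }

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
}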

The solution:

Use batch processing (flush and clear the Session after each batch of inserts).

Session session = null;

try {
    session = HibernateUtil.getSessionFactory().getCurrentSession();
    session.beginTransaction();

    for (int i = 0; i < 200000; i++) {
        log.debug(i + ".................");

        MiniMessage message = new MiniMessage("Hello World" + (i + 1));
        session.save(message);

        if (i % 100 == 0) {
            // 100, same as the JDBC batch size set in hibernate.cfg.xml:
            // <property name="hibernate.jdbc.batch_size">100</property>
            // Flush this batch of inserts and release first-level cache memory:
            log.debug("flush at: " + i + ".................");
            session.flush();
            session.clear();
        }
    }

    session.getTransaction().commit();
} catch (HibernateException he) {
    if (session != null) {
        session.getTransaction().rollback();
    }
    throw he;
}
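
Both listings obtain their SessionFactory through a HibernateUtil helper that is not part of this article. A minimal sketch in the classic Hibernate 3 style, assuming the configuration lives in hibernate.cfg.xml on the classpath:

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateUtil {

    private static final SessionFactory sessionFactory;

    static {
        try {
            // Build the SessionFactory once, from hibernate.cfg.xml on the classpath.
            sessionFactory = new Configuration().configure().buildSessionFactory();
        } catch (Throwable ex) {
            throw new ExceptionInInitializerError(ex);
        }
    }

    public static SessionFactory getSessionFactory() {
        return sessionFactory;
    }
}

Note that the batch version above uses getCurrentSession() instead of openSession(); for that to work, hibernate.cfg.xml also needs a current-session context, for example <property name="hibernate.current_session_context_class">thread</property>.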



For this to work well, a few parameters should be set in hibernate.cfg.xml:

1. Set the JDBC batch size:

   <property name="hibernate.jdbc.batch_size">100</property>

2. Disable the second-level cache:

   <!-- Disable the second-level cache, because the batch process is a one-off job. -->
   <property name="hibernate.cache.provider_class">org.hibernate.cache.NoCacheProvider</property>
   <property name="hibernate.cache.use_second_level_cache">false</property>
   <property name="hibernate.cache.use_query_cache">false</property>
   <property name="hibernate.cache.use_minimal_puts">false</property>


With this configuration, Hibernate synchronizes with the database after every 100 inserts and clears the corresponding entity objects from the first-level cache, so memory consumption stays bounded no matter how many records are inserted.
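
If you would rather not edit hibernate.cfg.xml just for a one-off batch job, the same properties can also be applied programmatically when the SessionFactory is built. A sketch using the Hibernate 3 Configuration API (the BatchSessionFactoryBuilder class name is made up for this example):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class BatchSessionFactoryBuilder {

    // Builds a SessionFactory dedicated to the one-off batch job, applying the
    // same settings as the XML snippets above on top of hibernate.cfg.xml.
    public static SessionFactory build() {
        Configuration cfg = new Configuration().configure();  // still reads hibernate.cfg.xml
        cfg.setProperty("hibernate.jdbc.batch_size", "100");
        cfg.setProperty("hibernate.cache.provider_class",
                "org.hibernate.cache.NoCacheProvider");
        cfg.setProperty("hibernate.cache.use_second_level_cache", "false");
        cfg.setProperty("hibernate.cache.use_query_cache", "false");
        cfg.setProperty("hibernate.cache.use_minimal_puts", "false");
        return cfg.buildSessionFactory();
    }
}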