Java HashMap 内部兑现机制学习

Java HashMap 内部实现机制学习

HashMap 主要属性
1. transient Entry[] table; Entry数组,Entry这个对象自身是个链表数据结构. 当hash发生冲撞时,则根据key放入的先后顺序放到链表中去.比如在java中, Aa和BB这两个字符串的hash code 是一样的.这样当先后调用map.put("Aa")和map.put("BB")后,map的size会变成2,但是这两个元素会放在同一个Entry[i]中,
2. transient int size ; map集合的元素数目
3. int threshold ; 当你不断地从map中put 元素后, map 的size 会不断增大;这样就会存在一个问题,假如这样无限制put下去,最终会超过map集合的最大容量导致程序出错.所以,在HashMap的实现中,当map的size 大到一定值后,会自动将map的容量扩大.而这里的"当map的size 大到一定值后","一定值"即为 threshold,map容量扩大的阈值,该值默认是 capacity(容量) * load factor(负载因子)
4. static final int DEFAULT_INITIAL_CAPACITY = 16; //MUST be a power of two. capacity 该值必须是2的指数,主要是用于static int indexFor(int h, int length),具体分析到该方法时再详细阐述

HashMap 主要方法

get 方法

public V get(Object key) { 
        if (key == null) 
            return getForNullKey(); 
        int hash = hash(key.hashCode());//1.对key的hash code计算hash值 
        for (Entry<K,V> e = table[indexFor(hash, table.length)]; //2.调用indexFor 方法获取对应数组元素下标 
             e != null; 
             e = e.next) { //3.获取数组里面的指定元素后,然后对该链表进行遍历 
            Object k; 
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) //4.找到"相同"的key后,则返回 
                return e.value; 
        } 
        return null; 
    }

1. 存在2个疑问,请大家赐教.
2. 在if (e.hash == hash && ((k = e.key) == key || key.equals(k))) 为什么要重复判断 e.hash == hash ,因为put方法已经保证了目标查找元素与Entry[i]中的hash都是相等的; 写完这个问题,发现原因应该是:如果仅留一个key.equals(k), 可以覆写 key的 equals 方法,返回意料之外的值
3. 为什么需要判断(k = e.key) == key?同样,写完这个问题后,发现原因应该是:存在 Map m = new HashMap();String x = "Aa";m.put(x, 1);
4. 这样的代码,这样就可以避免调用对象的equals方法,直接返回指定value即可.
5. hash方法
  static int hash(int h) {
```
        // This function ensures that hashCodes that differ only by 
        // constant multiples at each bit position have a bounded 
        // number of collisions (approximately 8 at default load factor). 
        h ^= (h >>> 20) ^ (h >>> 12); 
        return h ^ (h >>> 7) ^ (h >>> 4); 
    } 
```
6. 目前对算法还是没有形成比较深刻的理解,等更加深入的理解再写吧
7. static int indexFor(int h, int length) {
```
        return h & (length-1); 
    }//通过位与,16-1的二进制为1111 1111 然后 去 h的后8个bit 作为 数组的下标,这样可以迅速找到该hash值对应的数组索引
```
8. put方法
10. ```
public V put(K key, V value) { 
       if (key == null) 
            return putForNullKey(value); 
        int hash = hash(key.hashCode()); 
        int i = indexFor(hash, table.length); 
        for (Entry<K,V> e = table[i]; e != null; e = e.next) { 
            Object k; 
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { 
                V oldValue = e.value; 
                e.value = value; 
                e.recordAccess(this); 
                return oldValue; //返回被替换的值 
            } 
        } 


        modCount++; 
        addEntry(hash, key, value, i);//当前面一直没有匹配的key后,则把该k/v键值对放到map中
        return null; 
    }
```
12. ```
void addEntry(int hash, K key, V value, int bucketIndex) { 
Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e); //此时如果发生hash冲撞时,链表中的顺序和放入元素的顺序相反 
        if (size++ >= threshold)  //当大于阈值时,即 容量*因子 
            resize(2 * table.length); 
    }
```
15. ```
void resize(int newCapacity) { 
        Entry[] oldTable = table; 
        int oldCapacity = oldTable.length; 
        if (oldCapacity == MAXIMUM_CAPACITY) { 
            threshold = Integer.MAX_VALUE; 
            return; 
        } 


        Entry[] newTable = new Entry[newCapacity]; 
        transfer(newTable); //需要将当前map中的元素复制到新的数组中 
        table = newTable;  
        threshold = (int)(newCapacity * loadFactor); 
    }
```
18. ```
 

void transfer(Entry[] newTable) { 
        Entry[] src = table; 
        int newCapacity = newTable.length; 
        for (int j = 0; j < src.length; j++) {//遍历每个数组 
            Entry<K,V> e = src[j]; //找到第i个数组元素 
            if (e != null) { 
                src[j] = null; //及时将不需要的元素置为空,对GC友好. 
                do { 
                    Entry<K,V> next = e.next; //假设该链表中的元素是 "Aa"->"BB" ,那么next值为"BB" 
                    int i = indexFor(e.hash, newCapacity); //获取新数组元素位置,因为newCapacity变了,所以该元素在newTable中的顺序一般也会发生改变 
                    e.next = newTable[i]; //将e.next置为null 
                    newTable[i] = e; //然后将 newTable[i] 为 "Aa","Aa".next的值是null 
                    e = next; //此时e为"BB" TIPS:在实际分析时,可以从源码复制HashMap和AbstractMap 进行debug,便于看到HashMap内部的数据变化 
                } while (e != null); 
            } 
        } 
    }
```
19. HashMap 冲撞
21. static int hash(int h) {
          // This function ensures that hashCodes that differ only by
          // constant multiples at each bit position have a bounded
          // number of collisions (approximately 8 at default load factor).
          h ^= (h >>> 20) ^ (h >>> 12);
          return h ^ (h >>> 7) ^ (h >>> 4);
  }
  
  1）在Java里， Aa和BB这两个字符串的hash code(或hash key) 是一样的，也就是Collision 。
  2）于是，我们就可以通过这两个种子生成更多的拥有同一个hash key的字符串。如：”AaAa”, “AaBB”, “BBAa”, “BBBB”。这是第一次迭代。其实就是一个排列组合，写个程序就搞定了。
  3）然后，我们可以用这4个长度的字符串，构造8个长度的字符串，如下所示：
  "AaAaAaAa", "AaAaBBBB", "AaAaAaBB", "AaAaBBAa", "BBBBAaAa", "BBBBBBBB", "BBBBAaBB", "BBBBBBAa", "AaBBAaAa", "AaBBBBBB", "AaBBAaBB", "AaBBBBAa", "BBAaAaAa", "BBAaBBBB", "BBAaAaBB", "BBAaBBAa",
  4）同理，我们就可以生成16个长度的，以及256个长度的字符串，总之，很容易生成N多的这样的值。
  在攻击时，我只需要把这些数据做成一个HTTP POST 表单，然后写一个无限循环的程序，不停地提交这个表单。这样在大量请求下容易导致系统响应缓慢
22. CPU100%
  之所以出现了CPU100%,是因为在并发情况下出现了死循环.下文产生死循环的原因
  假设map中的容量为2,并且存在一个"Aa"元素,然后2个线程 T1,T2,同时调用了 map.put("BB"),然后会接着transfer方法 ,同时进入了下面的for循环.
  这里有个小细节,map.put方法在调用resize方法后,会将链表中的元素逆向后放入.如原来是 Aa->BB,resize后变成BB->Aa假设当前map元素是 BB->Aa
23. T1因为某种原因在for循环口暂停,T2顺利执行完毕,此时在执行Entry<K,V> e = src[j];暂停(T1线程内的元素顺序是BB ->Aa ,e 为 BB,next 为 AA);
  T2顺利执行完毕,此时这个被多线程共享的map元素顺序依此是 Aa->BB
  T1恢复执行,在执行e.next = newTable[i];这句话时,BB的next为新table种的Aa元素,而此时新table的next元素是BB,这样
  "环"就形成了. BB->Aa-BB .
  (在分析的时候,要注意他们之间都是引用传递.这样要好理解些.)
    void transfer(Entry[] newTable) {
          Entry[] src = table;
          int newCapacity = newTable.length; // newTable 在2个线程内互相独立
          for (int j = 0; j < src.length; j++) {
              Entry<K,V> e = src[j];
              if (e != null) {
                  src[j] = null;
                  do {
                      Entry<K,V> next = e.next;
                      int i = indexFor(e.hash, newCapacity);
                      e.next = newTable[i];
                      newTable[i] = e;
                      e = next;
                  } while (e != null);
              }
          }
  }

1. 参考:
2. http://coolshell.cn/articles/6424.html Hash Collision DoS 问题
3. http://www.iteye.com/topic/962172 HashMap 死循环的探究
4. http://www.iteye.com/topic/754887?page=2#1658753 HashMap深度分析

1 楼 vavi 2012-11-06

要上班了,等回来有空再排下版.. 这个编辑器真难用.. Java HashMap 内部兑现机制学习

Java HashMap 内部兑现机制学习

相关推荐