Hadoop Source Code Analysis: How the NameNode and DataNode Handle File Reads

Reposted from: http://blog.csdn.net/workformywork/article/details/21783861

Obtaining Block Locations from the NameNode

Before the client can open streaming-interface TCP connections to datanodes and read file data, it must first locate that data. It therefore starts by invoking the remote method ClientProtocol.getBlockLocations() from DFSClient.callGetBlockLocations(). The call returns a LocatedBlocks object containing a series of LocatedBlock instances, which tell the client exactly which datanodes to contact for each block. On the server side, the RPC arrives at NameNode.getBlockLocations(), which delegates to the method of the same name in FSNamesystem. FSNamesystem defines three overloads of getBlockLocations(); the one below takes the client's address so that it can sort each block's replicas by proximity to the client (a client-side sketch of the call follows the listing):

LocatedBlocks getBlockLocations(String clientMachine, String src,
      long offset, long length) throws IOException {
    LocatedBlocks blocks = getBlockLocations(src, offset, length, true, true,
        true);
    if (blocks != null) { // if blocks is non-null, sort each block's datanodes
      //sort the blocks
      // In some deployment cases, cluster is with separation of task tracker 
      // and datanode which means client machines will not always be recognized 
      // as known data nodes, so here we should try to get node (but not 
      // datanode only) for locality based sort.
      Node client = host2DataNodeMap.getDatanodeByHost(
          clientMachine);
      if (client == null) {
        List<String> hosts = new ArrayList<String> (1);
        hosts.add(clientMachine);
        String rName = dnsToSwitchMapping.resolve(hosts).get(0);
        if (rName != null)
          client = new NodeBase(clientMachine, rName);
      }   

      DFSUtil.StaleComparator comparator = null;
      if (avoidStaleDataNodesForRead) {
        comparator = new DFSUtil.StaleComparator(staleInterval);
      }
      // Note: the last block is also included and sorted
      for (LocatedBlock b : blocks.getLocatedBlocks()) {
        clusterMap.pseudoSortByDistance(client, b.getLocations());
        if (avoidStaleDataNodesForRead) {
          Arrays.sort(b.getLocations(), comparator);
        }
      }
    }
    return blocks;
  }
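
The listing finishes with a two-stage sort: clusterMap.pseudoSortByDistance() orders each block's replicas by network distance from the client, and then, when avoidStaleDataNodesForRead is enabled, a second sort moves stale datanodes to the end of the array. The sketch below is only an illustration of what such a comparator could look like, not the actual DFSUtil.StaleComparator source; it assumes staleness is judged by comparing the node's last heartbeat (DatanodeInfo.getLastUpdate()) against staleInterval.

import java.util.Comparator;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Hypothetical stand-in for DFSUtil.StaleComparator (illustration only).
// A node whose last heartbeat is older than staleIntervalMs is treated as
// stale and ordered after fresh nodes.
class StaleNodeComparator implements Comparator<DatanodeInfo> {
  private final long staleIntervalMs;

  StaleNodeComparator(long staleIntervalMs) {
    this.staleIntervalMs = staleIntervalMs;
  }

  private boolean isStale(DatanodeInfo node) {
    // getLastUpdate() returns the timestamp of the node's last heartbeat
    return System.currentTimeMillis() - node.getLastUpdate() > staleIntervalMs;
  }

  @Override
  public int compare(DatanodeInfo a, DatanodeInfo b) {
    boolean aStale = isStale(a);
    boolean bStale = isStale(b);
    if (aStale == bStale) {
      return 0; // equal rank: keep the existing distance-based order
    }
    return aStale ? 1 : -1; // fresh nodes sort before stale ones
  }
}

Returning 0 for nodes of equal staleness matters here: Arrays.sort() is stable for objects, so the order produced by pseudoSortByDistance() is preserved within the fresh group and within the stale group.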
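
Back on the client side, here is a minimal sketch of how the LocatedBlocks returned by ClientProtocol.getBlockLocations() might be consumed. The class BlockLayoutPrinter and its printBlockLayout() method are hypothetical helpers for illustration; the ClientProtocol, LocatedBlocks, LocatedBlock, and DatanodeInfo APIs are the ones used in the listing above.

import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

public class BlockLayoutPrinter {
  // Ask the NameNode for the blocks covering [0, length) of the file and
  // print each block's replica list; because of the sorting shown above,
  // the nearest (and non-stale) replicas come first.
  static void printBlockLayout(ClientProtocol namenode, String src, long length)
      throws IOException {
    LocatedBlocks blocks = namenode.getBlockLocations(src, 0, length);
    if (blocks == null) {
      return; // no such file, or no blocks in the requested range
    }
    for (LocatedBlock b : blocks.getLocatedBlocks()) {
      System.out.println("block " + b.getBlock().getBlockId()
          + " at offset " + b.getStartOffset());
      for (DatanodeInfo dn : b.getLocations()) {
        System.out.println("  replica on " + dn.getName());
      }
    }
  }
}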