Difference between 0 reducer and identity reducer

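  • 0 reducers: the reduce phase is skipped entirely, and the mapper output is written directly as the final job output (no shuffle, no sort).
  • Identity reducer: the reduce phase still runs, so the map output is still shuffled and sorted by key; the reducer then writes every key/value pair through unchanged.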
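The two modes can be contrasted with a small in-memory simulation. This is a sketch, not Hadoop code: the class `ReducerModes`, the `KV` record and both methods are hypothetical names, each inner list stands for one output file, and the partitioning only mimics the formula of Hadoop's default `HashPartitioner`. In an actual Hadoop driver, a map-only job is requested with `job.setNumReduceTasks(0)`. In the `org.apache.hadoop.mapreduce` API the base `Reducer` class already behaves as an identity reducer, and the older `mapred` API provides `org.apache.hadoop.mapred.lib.IdentityReducer`.

```java
import java.util.*;

/** Hypothetical in-memory model of the two job shapes (not the Hadoop API). */
public class ReducerModes {
    /** One key/value record emitted by a mapper. */
    record KV(String key, int value) {}

    /** 0 reducers (map-only): each mapper's output becomes its own output "file",
     *  records kept in emission order; nothing is shuffled or sorted. */
    static List<List<KV>> mapOnly(List<List<KV>> mapperOutputs) {
        List<List<KV>> files = new ArrayList<>();
        for (List<KV> out : mapperOutputs) files.add(new ArrayList<>(out));
        return files;
    }

    /** Identity reducer: records are partitioned by key hash across numReducers,
     *  sorted by key inside each partition (shuffle/sort), then written unchanged. */
    static List<List<KV>> identityReduce(List<List<KV>> mapperOutputs, int numReducers) {
        List<List<KV>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) partitions.add(new ArrayList<>());
        for (List<KV> out : mapperOutputs) {
            for (KV kv : out) {
                // Modeled on HashPartitioner's formula; String.hashCode differs from Text's.
                int p = (kv.key().hashCode() & Integer.MAX_VALUE) % numReducers;
                partitions.get(p).add(kv);
            }
        }
        // List.sort is stable, so in this model values of one key keep arrival order;
        // Hadoop itself gives no such guarantee without a secondary sort.
        for (List<KV> p : partitions) p.sort(Comparator.comparing(KV::key));
        return partitions;  // the identity "reduce" adds no aggregation step
    }

    public static void main(String[] args) {
        List<List<KV>> mappers = List.of(
            List.of(new KV("pear", 1), new KV("apple", 2)),
            List.of(new KV("fig", 3), new KV("apple", 4)));
        System.out.println("map-only: " + mapOnly(mappers));
        // [[KV[key=apple, value=2], KV[key=apple, value=4], KV[key=fig, value=3], KV[key=pear, value=1]]]
        System.out.println("identity: " + identityReduce(mappers, 1));
    }
}
```

The map-only result keeps one list per mapper in emission order, while the identity-reducer result is sorted by key with every value passed through as-is.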

If you do not need the map output sorted, set the number of reducers to 0; such a job is called a map-only job.

If you need the map output sorted by key but do not need any aggregation, use the identity reducer.

Another use case for the identity reducer is to consolidate all the results into as many output files as there are reducers. This is handy when writing directly to Amazon S3 on Amazon Web Services, especially when the mapper output is small (e.g. a grep/search for a record) and there are many mappers (e.g. thousands), because a map-only job writes a separate output file for every mapper.
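The file-count effect can be sketched the same way. This is again a hypothetical simulation rather than Hadoop code: `GrepOutputFiles.grep` models a grep-style job whose mappers emit each matching line as the key, and each inner list stands for one output file. It assumes, matching Hadoop's default output behavior, that a map-only job produces one output file per map task even when that task matched nothing.

```java
import java.util.*;

/** Hypothetical model of output-file counts for a grep-style job (not the Hadoop API). */
public class GrepOutputFiles {
    static List<List<String>> grep(List<List<String>> splits, String needle, int numReducers) {
        // Map phase: each split is handled by one mapper that keeps only matching lines.
        List<List<String>> mapOut = new ArrayList<>();
        for (List<String> split : splits) {
            List<String> hits = new ArrayList<>();
            for (String line : split) if (line.contains(needle)) hits.add(line);
            mapOut.add(hits);
        }
        // Map-only: one (often empty) output file per mapper.
        if (numReducers == 0) return mapOut;
        // Identity reducer: matches are hash-partitioned into numReducers files,
        // sorted within each file, and written unchanged.
        List<List<String>> files = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) files.add(new ArrayList<>());
        for (List<String> hits : mapOut)
            for (String line : hits)
                files.get((line.hashCode() & Integer.MAX_VALUE) % numReducers).add(line);
        for (List<String> f : files) Collections.sort(f);
        return files;
    }

    public static void main(String[] args) {
        // 1000 hypothetical input splits (one per mapper), 100 records each.
        List<List<String>> splits = new ArrayList<>();
        for (int m = 0; m < 1000; m++) {
            List<String> split = new ArrayList<>();
            for (int i = 0; i < 100; i++) split.add("record-" + (m * 100 + i));
            splits.add(split);
        }
        for (int reducers : new int[] {0, 2}) {
            List<List<String>> files = grep(splits, "record-4242", reducers);
            int lines = files.stream().mapToInt(List::size).sum();
            // prints "0 reducer(s): 1000 files, 11 matching lines", then "2 reducer(s): 2 files, 11 matching lines"
            System.out.println(reducers + " reducer(s): " + files.size() + " files, " + lines + " matching lines");
        }
    }
}
```

The same 11 matches end up either scattered across 1000 mostly empty files or collected into 2 files, which is the difference that matters when each file written to S3 carries overhead.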

References

http://*.com/questions/10630447/hadoop-difference-between-0-reducer-and-identity-reducer