Difference between 0 reducers and the identity reducer
- 0 reducers means the reduce step is skipped and the mapper output is the final output.
- Identity reducer means shuffling and sorting still take place, but the reducer writes each record through unchanged.
If you do not need the map results sorted, set the number of reducers to 0; the job is then called map-only.
If you need the map results sorted but do not need any aggregation, choose the identity reducer.
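To make the two cases concrete, here is a small pure-Python simulation. It does not use the Hadoop API. The function names are made up for illustration, and the CRC-based partitioner is only a stand-in for Hadoop's default hash partitioning.

```python
import zlib
from collections import defaultdict

def run_mappers(splits):
    # One mapper per input split; each emits (word, 1) in input order.
    return [[(word, 1) for word in split] for split in splits]

def map_only(splits):
    # 0 reducers: no shuffle and no sort, mapper output is the final output.
    return run_mappers(splits)

def with_identity_reducer(splits, num_reducers):
    # Shuffle: route every pair to a partition chosen from its key.
    # (Assumption: crc32 stands in for the real partitioner.)
    partitions = defaultdict(list)
    for mapper_output in run_mappers(splits):
        for key, value in mapper_output:
            index = zlib.crc32(key.encode()) % num_reducers
            partitions[index].append((key, value))
    # Sort each partition by key; the identity reducer then passes
    # every record through unchanged, so nothing is aggregated.
    return [sorted(partitions[i], key=lambda kv: kv[0])
            for i in range(num_reducers)]

if __name__ == "__main__":
    splits = [["pear", "apple"], ["fig", "apple"]]
    print(map_only(splits))
    print(with_identity_reducer(splits, 1))
```

With one reducer the second call returns a single key-sorted list in which both `apple` records are still present, while the map-only result keeps each mapper's records in their original order.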
Another use case for the identity reducer is to combine all the results into <# of reducers> output files. This can be handy if you are writing directly to S3 on Amazon Web Services, especially when each mapper's output is small (e.g. a grep/search for a record) and there are many mappers (e.g. thousands).
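The effect on the number of output files can be sketched the same way. This is again a plain Python simulation under assumptions, not Hadoop code: `grep_mapper` and `output_files` are hypothetical names, the partitioner is a stand-in, and the `part-m-NNNNN` / `part-r-NNNNN` names only mimic the usual naming of map and reduce output files.

```python
import zlib

def grep_mapper(lines, pattern):
    # Emit each matching line as a key with an empty value.
    return [(line, "") for line in lines if pattern in line]

def output_files(splits, pattern, num_reducers):
    mapper_outputs = [grep_mapper(split, pattern) for split in splits]
    if num_reducers == 0:
        # Map-only: one output file per mapper, even if it is tiny or empty.
        return {f"part-m-{i:05d}": out for i, out in enumerate(mapper_outputs)}
    # Identity reducer: records are funneled into one file per reducer.
    files = {f"part-r-{r:05d}": [] for r in range(num_reducers)}
    for out in mapper_outputs:
        for key, value in out:
            r = zlib.crc32(key.encode()) % num_reducers  # stand-in partitioner
            files[f"part-r-{r:05d}"].append((key, value))
    for records in files.values():
        records.sort(key=lambda kv: kv[0])
    return files

if __name__ == "__main__":
    # 1000 splits, only 4 of which contain a matching line.
    splits = [[f"row {i}", f"ERROR disk {i}"] if i % 250 == 0 else [f"row {i}"]
              for i in range(1000)]
    print(len(output_files(splits, "ERROR", 0)))
    print(len(output_files(splits, "ERROR", 2)))
```

In this sketch the same four matching records end up spread over 1000 files in the map-only case and over 2 files when two identity reducers are used.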
References
http://stackoverflow.com/questions/10630447/hadoop-difference-between-0-reducer-and-identity-reducer