Data processed by the mapper and the reducer takes the form of (k, v) pairs
Example: word count
Mapper: makes each word in a sentence a key and assigns it the value 1, indicating that the word appeared once in the sentence. If a word appears twice, two pairs are emitted for it.
Reducer: aggregates all (k, v) pairs that share a key, i.e. combines them (here, by summing the values to get each word's count)
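The word count example can be sketched in plain Python as below. This is a minimal single-machine illustration, not a real MapReduce framework; the function names map_words and reduce_counts and the sample sentence are assumptions made for the example.

```python
from collections import defaultdict

def map_words(sentence):
    """Mapper: emit one (word, 1) pair per word occurrence."""
    return [(word, 1) for word in sentence.split()]

def reduce_counts(pairs):
    """Reducer: sum the values of all pairs that share a key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Hypothetical input sentence; "the" appears twice, so it yields two pairs.
pairs = map_words("the cat saw the dog")
word_counts = reduce_counts(pairs)
```

Note that the mapper does no counting itself: a repeated word simply produces repeated pairs, and the reducer does the aggregation.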
Example 2: calculate the mean score
Mapper: processes the original values:
Key: Student ID
Value: Score
Reducer: processes (k, v) pairs:
Operation: Average
Key: Student ID
Value: Score
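A minimal sketch of the mean score example, assuming the input is a list of (student ID, score) records. The names map_score and reduce_mean, the grouping step, and the sample records are all illustrative assumptions; a real framework would perform the grouping (shuffle) between the two phases itself.

```python
from collections import defaultdict

def map_score(record):
    """Mapper: turn an original record into a (student_id, score) pair."""
    student_id, score = record
    return (student_id, score)

def reduce_mean(student_id, scores):
    """Reducer: average all scores that share the same student ID."""
    return (student_id, sum(scores) / len(scores))

# Hypothetical input records.
records = [("A", 80), ("B", 70), ("A", 90), ("B", 60)]

# Group values by key, standing in for the framework's shuffle step.
grouped = defaultdict(list)
for key, value in map(map_score, records):
    grouped[key].append(value)

means = dict(reduce_mean(key, values) for key, values in grouped.items())
```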
k and v need not be scalars; each can be a matrix
“Parallel” because each student’s operation can be done on a separate machine. Machine A can calculate student A’s average score, Machine B can calculate student B’s average score, etc. Each machine can be assigned more than one student.
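One way to picture how students are spread across machines is to assign each key to a machine by hashing it. The sketch below is only an assumption about how such an assignment could work, not a description of any particular framework; assign_machine, the student IDs, and the machine count are hypothetical.

```python
import zlib
from collections import defaultdict

def assign_machine(student_id, num_machines):
    """Pick a machine index for a key; the same key always gets the same machine."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(student_id.encode("utf-8")) % num_machines

student_ids = ["A", "B", "C", "D", "E"]
num_machines = 2

assignment = defaultdict(list)
for sid in student_ids:
    assignment[assign_machine(sid, num_machines)].append(sid)
```

With five students and two machines, at least one machine necessarily handles several students, while all of a given student's scores still end up on one machine.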
Multiple mapper-reducer pairs can be chained together, with the output of one job feeding the next; this is called “Chaining Jobs”
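Chaining can be sketched as feeding one job's output pairs into the next job's mapper. The helper run_job below is a hypothetical single-machine stand-in for a framework, and the two jobs (a word count, followed by counting how many words share each count) are chosen only for illustration.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Run one mapper-reducer pair: map, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return [reducer(key, values) for key, values in grouped.items()]

def map_words(sentence):
    """Job 1 mapper: emit (word, 1) for every word occurrence."""
    return [(word, 1) for word in sentence.split()]

def map_count(pair):
    """Job 2 mapper: take a (word, count) pair and emit (count, 1)."""
    word, count = pair
    return [(count, 1)]

def reduce_sum(key, values):
    """Reducer shared by both jobs: sum the values for a key."""
    return (key, sum(values))

# Job 1 output becomes Job 2 input.
job1_output = run_job(["a b a", "b c"], map_words, reduce_sum)
job2_output = run_job(job1_output, map_count, reduce_sum)
```

Here job 2 never sees the original sentences; it works only on the (k, v) pairs that job 1 produced.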
Limitations: see P18
Cannot control the number of reducers (<= number of keys)
Cannot control when a mapper/reducer starts or ends