Cloud computing: MapReduce

May 28, 2024 | 14:12

Map and Reduce

Basics:
- Map = divide
- Reduce = aggregate
- Data processed by the mapper and the reducer is in (k, v) pairs
- Example: word count
  - Mapper: emits each word in a sentence as a key with the value 1, indicating that the word appeared once in the sentence. If a word appears twice, two pairs are emitted for it.
  - Reducer: aggregates all (k, v) pairs that share a key, i.e. combines them by summing the 1s into a count per word
- Example 2: calculate the mean score
  - Mapper: processes the original values and emits:
    - Key: Student ID
    - Value: Score
  - Reducer: processes the (k, v) pairs:
    - Operation: Average
    - Key: Student ID
    - Value: Score (input), mean score (output)
- k and v can each be a matrix
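The two examples above can be sketched in plain Python. This is a minimal single-process simulation, not a real MapReduce framework: `run_job` and the mapper/reducer function names are hypothetical, and the sample sentences and scores are invented for illustration.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in a single process (illustration only)."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (k, v) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    # Reduce phase: each key is reduced independently of the others.
    return {key: reducer(key, values) for key, values in groups.items()}


def wc_mapper(sentence):
    # One (word, 1) pair per occurrence, so a repeated word yields two pairs.
    for word in sentence.lower().split():
        yield word, 1


def wc_reducer(word, ones):
    return sum(ones)


def mean_mapper(row):
    student_id, score = row
    yield student_id, score


def mean_reducer(student_id, scores):
    return sum(scores) / len(scores)


# Invented sample data.
word_counts = run_job(["the cat saw the dog", "the dog ran"], wc_mapper, wc_reducer)
mean_scores = run_job(
    [("A", 80), ("B", 70), ("A", 90), ("B", 60)], mean_mapper, mean_reducer
)
print(word_counts)
print(mean_scores)
```

Only the mapper and reducer differ between the two jobs; the grouping step in the middle is the same.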
“Parallel” because each student’s computation can run on a separate machine: Machine A can calculate student A’s average score, Machine B can calculate student B’s, and so on. Each machine can be assigned more than one student.
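A rough sketch of this per-key parallelism, assuming threads stand in for machines and a CRC32-based rule assigns students to workers. `NUM_WORKERS`, `partition`, `reduce_partition` and `parallel_means` are hypothetical names, and a real framework decides the assignment itself.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_WORKERS = 2  # hypothetical number of "machines"


def partition(student_id, num_workers=NUM_WORKERS):
    # Deterministic key -> worker assignment (crc32 is stable across runs,
    # unlike Python's built-in hash() for strings).
    return crc32(student_id.encode("utf-8")) % num_workers


def reduce_partition(grouped_scores):
    # One worker averages every student assigned to it.
    return {sid: sum(s) / len(s) for sid, s in grouped_scores.items()}


def parallel_means(rows, num_workers=NUM_WORKERS):
    # Route each (student ID, score) pair to its worker's bucket.
    buckets = [defaultdict(list) for _ in range(num_workers)]
    for student_id, score in rows:
        buckets[partition(student_id, num_workers)][student_id].append(score)
    # Buckets share no keys, so they can be reduced concurrently.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(reduce_partition, buckets))
    merged = {}
    for result in partial_results:
        merged.update(result)  # keys never overlap between workers
    return merged


# Invented sample data: three students, two workers.
rows = [("A", 80), ("B", 70), ("C", 100), ("A", 90), ("B", 60)]
means = parallel_means(rows)
print(means)
```

With three students and two workers, at least one worker handles more than one student, which matches the note that a machine can be assigned several students.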
Multiple mapper-reducer pairs can be chained together; this is called “chaining jobs”.
Limitations: see P18
- Cannot control the number of reducers (<= number of keys)
- Cannot control when a mapper/reducer starts or ends
- Cannot control which input a mapper is processing
- Cannot control which key a reducer is processing
Market basket:
- Mapper key: item pair
- Mapper value: 1
- Reducer key: item pair
- Reducer value: support (total count)
- Reducer operation: sum
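A minimal sketch of the pair-counting job, assuming each transaction is a list of item names. `pair_mapper` and `support_counts` are hypothetical names and the baskets are invented. Sorting the items before pairing is an added assumption so that the same two items always produce the same key.

```python
from collections import Counter
from itertools import combinations


def pair_mapper(transaction):
    # Sort and de-duplicate so ("bread", "milk") and ("milk", "bread")
    # become the same key.
    items = sorted(set(transaction))
    for pair in combinations(items, 2):
        yield pair, 1


def support_counts(transactions):
    support = Counter()
    for transaction in transactions:
        for pair, one in pair_mapper(transaction):
            support[pair] += one  # reducer operation: sum per key
    return dict(support)


# Invented sample baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
]
support = support_counts(baskets)
print(support)
```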
P35 example:
- Transaction: sentence
- Item: article
- Mapper 1
  - key: sentence ID
  - value: article ID
- Reducer 1
  - key: sentence ID
  - value: list of articles
- Mapper 2
  - key: article pairs
  - value: 1
- Reducer 2
  - key: article pairs
  - value: sum
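The two jobs above can be chained as in the sketch below, where the output of Reducer 1 becomes the input of Mapper 2. It is a single-process simulation under stated assumptions: `run_job`, `mapper1`, `reducer1`, `mapper2` and `reducer2` are hypothetical names, and the (sentence ID, article ID) records are invented.

```python
from collections import defaultdict
from itertools import combinations


def run_job(records, mapper, reducer):
    """Single-process stand-in for one MapReduce job (illustration only)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [(key, reducer(key, values)) for key, values in groups.items()]


# Job 1: (sentence ID, article ID) -> (sentence ID, list of articles)
def mapper1(record):
    sentence_id, article_id = record
    yield sentence_id, article_id


def reducer1(sentence_id, article_ids):
    # Sorted and de-duplicated so pairs come out in a canonical order.
    return sorted(set(article_ids))


# Job 2: (sentence ID, list of articles) -> (article pair, count)
def mapper2(record):
    _sentence_id, articles = record
    for pair in combinations(articles, 2):
        yield pair, 1


def reducer2(pair, ones):
    return sum(ones)


# Invented sample records.
records = [
    (1, "a1"), (1, "a2"),
    (2, "a1"), (2, "a2"), (2, "a3"),
    (3, "a2"), (3, "a3"),
]
stage1 = run_job(records, mapper1, reducer1)           # first job
pair_counts = dict(run_job(stage1, mapper2, reducer2))  # second job, chained
print(stage1)
print(pair_counts)
```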
