Cloud computing: MapReduce

Map and Reduce

  • Basics:
    • Map = divide
    • Reduce = aggregate
    • Data processed by the mapper and the reducer takes the form of (key, value) pairs, written (k, v)
    • Example: word count
      • Mapper: emits each word in a sentence as a key with the value ‘1’, indicating that the word appeared once. If a word appears twice in the sentence, two pairs are emitted for it.
      • Reducer: aggregates all (k, v) pairs that share a key, i.e. sums the 1s to get the total count of each word
    • Example 2: calculating each student's mean score
      • Mapper: reads the original records and emits:
        • Key: student ID
        • Value: score
      • Reducer: receives the (k, v) pairs grouped by key:
        • Key: student ID
        • Value: that student's scores
        • Operation: average
    • k and v are not limited to scalars; either can be a matrix
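The two examples above can be sketched in a few lines of Python. This is a single-process stand-in, not a real framework: `run_mapreduce`, the mapper and reducer names, and the sample data are all made up for illustration. The point is only to show the three steps (map, group by key, reduce).

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process stand-in for one MapReduce job."""
    groups = defaultdict(list)
    # Map phase: every input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect values by key
    # Reduce phase: each key is reduced independently of all other keys.
    return {key: reducer(key, values) for key, values in groups.items()}

# Example 1: word count
def wc_mapper(sentence):
    for word in sentence.lower().split():
        yield word, 1  # one pair per occurrence

def wc_reducer(word, ones):
    return sum(ones)

# Example 2: mean score per student
def score_mapper(record):
    student_id, score = record
    yield student_id, score

def mean_reducer(student_id, scores):
    return sum(scores) / len(scores)

# Sample inputs are invented for this sketch.
word_counts = run_mapreduce(["the cat saw the dog", "the dog ran"],
                            wc_mapper, wc_reducer)
mean_scores = run_mapreduce([("A", 80), ("B", 70), ("A", 90)],
                            score_mapper, mean_reducer)
print(word_counts)
print(mean_scores)
```

Because each key is reduced on its own, the reduce step is the part that a real cluster could spread across machines.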
  • “Parallel” because each student’s computation is independent and can run on a separate machine: Machine A can calculate student A’s average score, Machine B can calculate student B’s, and so on. A single machine can also be assigned more than one student.
  • Multiple mapper-reducer pairs can be chained together, with the output of one job feeding the next; this is called “chaining jobs”
  • Limitations: see P18
    • Cannot control the number of reducers (<= number of keys)
    • Cannot control when a mapper or reducer starts or ends
    • Cannot control which input a given mapper is processing
    • Cannot control which key a given reducer is processing
  • Market basket:
    • Mapper key: item pair
    • Mapper value: 1
    • Reducer key: item pair
    • Reducer value: support (total count of the pair)
    • Reducer operation: sum
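A minimal sketch of the market-basket job, assuming each input record is one basket given as a list of item names. The function names and the sample baskets are invented for illustration. Sorting the items first is a choice made here so that (a, b) and (b, a) end up under the same key.

```python
from collections import defaultdict
from itertools import combinations

def pair_mapper(basket):
    # De-duplicate and sort so each unordered pair has one canonical key.
    for pair in combinations(sorted(set(basket)), 2):
        yield pair, 1

def support_reducer(pair, ones):
    # Support = number of baskets that contain both items.
    return sum(ones)

def pair_support(baskets):
    groups = defaultdict(list)
    for basket in baskets:
        for pair, one in pair_mapper(basket):
            groups[pair].append(one)  # group the 1s by item pair
    return {pair: support_reducer(pair, ones) for pair, ones in groups.items()}

# Hypothetical sample data.
baskets = [["milk", "bread", "eggs"], ["bread", "milk"], ["eggs", "milk"]]
support = pair_support(baskets)
print(support)
```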
  • P35 example:
    • Transaction: sentence
    • Item: article
    • Mapper 1
      • key: sentence ID
      • value: article ID
    • Reducer 1
      • key: sentence ID
      • value: list of articles
    • Mapper 2
      • key: article pairs
      • value: 1
    • Reducer 2
      • key: article pairs
      • value: sum
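The two chained jobs above could look roughly like this. It is a sketch under assumptions: the input is taken to be a flat list of (sentence ID, article ID) records, and `run_job`, the mapper and reducer names, and the sample records are hypothetical. Job 1 builds the list of articles per sentence; job 2 consumes that output and counts article pairs.

```python
from collections import defaultdict
from itertools import combinations

def run_job(records, mapper, reducer):
    """Hypothetical single-process stand-in for one MapReduce job."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Return (key, result) pairs so the output can be fed to another job.
    return [(key, reducer(key, values)) for key, values in groups.items()]

def mapper1(record):
    sentence_id, article_id = record
    yield sentence_id, article_id

def reducer1(sentence_id, article_ids):
    # List of distinct articles in this sentence, in a fixed order.
    return sorted(set(article_ids))

def mapper2(record):
    _sentence_id, articles = record
    for pair in combinations(articles, 2):
        yield pair, 1

def reducer2(pair, ones):
    return sum(ones)

# Invented sample input: which article appears in which sentence.
mentions = [("s1", "a1"), ("s1", "a2"),
            ("s2", "a1"), ("s2", "a2"), ("s2", "a3"),
            ("s3", "a3")]
articles_per_sentence = run_job(mentions, mapper1, reducer1)          # job 1
pair_counts = dict(run_job(articles_per_sentence, mapper2, reducer2))  # job 2
print(pair_counts)
```

A sentence with only one article, such as s3 here, produces no pairs and so contributes nothing to job 2.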
