Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data in parallel, on large clusters (thousands of nodes) of commodity hardware, in a reliable, fault-tolerant manner.
The term MapReduce refers to the two distinct tasks that Hadoop programs perform:
- The Map Task: This is the first task. It takes a set of input data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
- The Reduce Task: This task takes the output of a map task as its input and combines those data tuples into a smaller set of tuples. The reduce task always runs after the map task. Both tasks are illustrated in the sketch after this list.
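The canonical illustration of these two tasks is word counting. The sketch below follows the standard WordCount example shipped with Hadoop (the org.apache.hadoop.mapreduce API): the map task emits a (word, 1) pair for every word it encounters, and the reduce task sums those pairs into a single (word, total) pair per word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map task: break each line of input into (word, 1) key/value pairs.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);   // emit (word, 1)
            }
        }
    }

    // Reduce task: combine all (word, 1) pairs for the same word
    // into one smaller (word, total) pair.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();           // add up the 1s from the mappers
            }
            result.set(sum);
            context.write(key, result);     // emit (word, total)
        }
    }
}
```

Note that the framework groups all values sharing a key before calling reduce, which is why the reducer receives an Iterable of counts for each word.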
The MapReduce framework consists of a single master, the JobTracker, and one slave, the TaskTracker, per cluster node. The master is responsible for resource management, tracking resource consumption and availability, and scheduling the job's component tasks on the slaves, monitoring them, and re-executing failed tasks. The TaskTracker slaves execute the tasks as directed by the master and periodically report task status back to it.
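To see where the JobTracker fits in, here is a minimal driver, modeled on the stock Hadoop WordCount tutorial, that submits the job defined above to the cluster; the input and output paths are illustrative. Submitting the job hands it to the master, which splits it into map and reduce tasks and farms them out to the TaskTrackers.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");   // classic (MRv1) constructor
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class); // optional map-side pre-aggregation
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        // Submitting the job hands it to the master (the JobTracker), which
        // schedules the component tasks on the TaskTracker slaves;
        // waitForCompletion polls their status until the job finishes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```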
The JobTracker is a single point of failure for the Hadoop MapReduce service: if the JobTracker goes down, all running jobs are halted.