Data-Intensive Text Processing with MapReduce (Jimmy Lin et al.)

Data-driven methodologies are revolutionizing our world; access to vast volumes of data has led to fresh discoveries and unlocked intriguing new possibilities in business, science, and computing applications.

Processing the volumes of data behind these advances requires large clusters, which makes distributed computing concepts more important than ever.

MapReduce is both a programming model for expressing distributed computations on large datasets and an execution framework for large-scale data processing on clusters of commodity computers. The programming model offers an accessible abstraction for designing scalable algorithms, while the execution framework transparently handles system-level concerns such as scheduling, synchronization, and fault tolerance.
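To make the programming model concrete, here is a minimal single-process sketch of word counting in Python. It is not taken from the book and does not use any real MapReduce or Hadoop API: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the grouping step only imitates, in memory, the shuffle that a real execution framework would perform across a cluster.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Mapper (hypothetical name): emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reducer (hypothetical name): sum all partial counts for one word."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Toy in-memory driver; a real framework distributes these phases."""
    # Map phase: apply the mapper to every input record and group the
    # intermediate values by key (this stands in for the shuffle).
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: call the reducer once per key, in sorted key order.
    output = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            output[out_key] = out_value
    return output


docs = [("d1", "a rose is a rose"), ("d2", "is a rose red")]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

The point of the sketch is the division of labor: the programmer writes only the mapper and the reducer, while everything inside `run_mapreduce` corresponds to work the execution framework takes over.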

This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general, reusable solutions to problems that occur frequently across a variety of problem domains.

Ebook Details

About the Authors
Jimmy Lin is a Professor and holds the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo.
Published Date
April 30, 2010
178 pages
eBook Format

