During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data from a range of fields, including medicine, biology, finance, and marketing.
The challenge of understanding these data has led to the development of new tools in statistics and given rise to new fields such as data mining, machine learning, and bioinformatics. Many of these tools share common foundations but are often expressed in different terminology.
This book describes the important ideas in these fields within a common conceptual framework. Although the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with liberal use of color graphics. It is a valuable resource for statisticians and for anyone interested in data mining in science or industry. The book's coverage is broad, ranging from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting, which receives its first comprehensive treatment in any book.
This major new edition covers many topics that were not in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. It also adds a chapter on methods for "wide" data (p greater than n), which covers multiple testing and false discovery rates.