
Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS

Sam Alapati

Low price

CHF 49.20

Usually ships within 2 to 3 weeks.

Paperback

Description

About the author
Sam R. Alapati has been working with various aspects of the Hadoop environment for the past six years. He is currently the principal Hadoop administrator at Sabre Corporation in Westlake, Texas, and works on a daily basis with multiple large Hadoop 2 clusters. In addition to being the point person for all Hadoop administration at Sabre, Sam manages multiple critical data-science- and data-analysis-related Hadoop job flows and is also an expert Oracle Database Administrator. His vast knowledge of relational databases and SQL contributes to his work with Hadoop-related projects. Sam’s recognition in the database and middleware area includes having published 18 well-received books over the past 14 years, mostly on Oracle Database Administration and Oracle WebLogic Server. His experience dealing with numerous configuration, architectural, and performance-related Hadoop issues over the years led him to the realization that many working Hadoop administrators and developers would appreciate having a handy reference such as this book to turn to when creating, managing, securing, and optimizing their Hadoop infrastructure.

Jacket copy

In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples.

Alapati demystifies complex Hadoop environments, helping readers understand exactly what happens behind the scenes when they administer their cluster. Students will gain unprecedented insight as they walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes.

Summary

The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference

“Sam Alapati has worked with production Hadoop clusters for six years. His unique depth of experience has enabled him to write the go-to resource for all administrators looking to spec, size, expand, and secure production Hadoop clusters of any size.”

–Paul Dix, Series Editor

Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You’ll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run.

Understand Hadoop’s architecture from an administrator’s standpoint
Create simple and fully distributed clusters
Run MapReduce and Spark applications in a Hadoop cluster
Manage and protect Hadoop data and high availability
Work with HDFS commands, file permissions, and storage management
Move data, and use YARN to allocate resources and schedule jobs
Manage job workflows with Oozie and Hue
Secure, monitor, log, and optimize Hadoop
Benchmark and troubleshoot Hadoop

Contents

Foreword

Preface

Acknowledgments

About the Author

Part I: Introduction to Hadoop—Architecture and Hadoop Clusters

Chapter 1: Introduction to Hadoop and Its Environment

Hadoop—An Introduction

Cluster Computing and Hadoop Clusters

Hadoop Components and the Hadoop Ecosphere

What Do Hadoop Administrators Do?

Key Differences between Hadoop 1 and Hadoop 2

Distributed Data Processing: MapReduce and Spark, Hive and Pig

Data Integration: Apache Sqoop, Apache Flume and Apache Kafka

Key Areas of Hadoop Administration

Summary

Chapter 2: An Introduction to the Architecture of Hadoop

Distributed Computing and Hadoop

Hadoop Architecture

Data Storage—The Hadoop Distributed File System

Data Processing with YARN, the Hadoop Operating System

Summary

Chapter 3: Creating and Configuring a Simple Hadoop Cluster

Hadoop Distributions and Installation Types

Setting Up a Pseudo-Distributed Hadoop Cluster

Performing the Initial Hadoop Configuration

Operating the New Hadoop Cluster

Summary

Chapter 4: Planning for and Creating a Fully Distributed Cluster

Planning Your Hadoop Cluster

Going from a Single Rack to Multiple Racks

Creating a Multinode Cluster

Modifying the Hadoop Configuration

Starting Up the Cluster

Configuring Hadoop Services, Web Interfaces and Ports

Summary

Part II: Hadoop Application Frameworks

Chapter 5: Running Applications in a Cluster—The MapReduce Framework (and Hive and Pig)

The MapReduce Framework

Apache Hive

Apache Pig

Summary

Chapter 6: Running Applications in a Cluster—The Spark Framework

What Is Spark?

Why Spark?

The Spark Stack

Installing Spark

Spark Run Modes

Understanding the Cluster Managers

Spark and Data Access

Summary

Chapter 7: Running Spark Applications

The Spark Programming Model

Spark Applications

Architecture of a Spark Application

Running Spark Applications Interactively

Creating and Submitting Spark Applications

Configuring Spark Applications

Monitoring Spark Applications

Handling Streaming Data with Spark Streaming

Using Spark SQL for Handling Structured Data

Summary

Part III: Managing and Protecting Hadoop Data and High Availability

Chapter 8: The Role of the NameNode and How HDFS Works

HDFS—The Interaction between the NameNode and the DataNodes

Rack Awareness and Topology

HDFS Data Replication

How Clients Read and Write HDFS Data

Understanding HDFS Recovery Processes

Centralized Cache Management in HDFS

Hadoop Archival Storage, SSD and Memory (Heterogeneous Storage)

Summary

Chapter 9: HDFS Commands, HDFS Permissions and HDFS Storage

Managing HDFS through the HDFS Shell Commands

Using the dfsadmin Utility to Perform HDFS Operations

Managing HDFS Permissions and Users

Managing HDFS Storage

Rebalancing HDFS Data

Reclaiming HDFS Space

Summary

Chapter 10: Data Protection, File Formats and Accessing HDFS

Safeguarding Data

Data Compression

Hadoop File Formats

Using Hadoop WebHDFS and HttpFS

Summary

Chapter 11: NameNode Operations, High Availability and Federation

Understanding NameNode Operations

The Checkpointing Process

NameNode Safe Mode Operations

Configuring HDFS High Availability

HDFS Federation

Summa…

Product information

Title: Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS

Subtitle: Managing, Tuning, and Securing Spark, YARN, and HDFS

Author: Sam Alapati

EAN: 9780134597195

ISBN: 978-0-13-459719-5

Format: Paperback

Publisher: Pearson Academic

Genre: Networks

Publication date: 19 January 2017

Number of pages: 848

Weight: 1326 g

Dimensions: H 236 mm x W 172 mm x D 40 mm

Year: 2017

Language: English
