Foundations of Data Intensive Applications

Supun Kamburugamuve

Saliya Ekanayake

eBook (pdf)

CHF 35.65

Download available immediately

Description

PEEK "UNDER THE HOOD" OF BIG DATA ANALYTICS

The authors discuss foundational components of large-scale data systems and walk readers through the major software design decisions that define performance, application type, and usability. You'll learn how to recognize problems in your applications resulting in performance and distributed operation issues, diagnose them, and effectively eliminate them by relying on the bedrock big data principles explained within.

Moving beyond individual frameworks and APIs for data processing, this book unlocks the theoretical ideas that operate under the hood of every big data processing system.

Ideal for data scientists, data architects, DevOps engineers, and developers, Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood shows readers how to:

Identify the foundations of large-scale, distributed data processing systems
Make major software design decisions that optimize performance
Diagnose performance problems and distributed operation issues
Understand state-of-the-art research in big data
Explain and use the major big data frameworks and understand what underpins them
Use big data analytics in the real world to solve practical problems

Authors

SUPUN KAMBURUGAMUVE, PhD, is a computer scientist researching and designing large scale data analytics tools. He received his doctorate in Computer Science from Indiana University, Bloomington and architected the data processing systems Twister2 and Cylon.

SALIYA EKANAYAKE, PhD, is a Senior Software Engineer at Microsoft working at the intersection of scaling deep learning systems and parallel computing. He is also a research affiliate at Berkeley Lab. He received his doctorate in Computer Science from Indiana University, Bloomington.

Jacket Copy
PEEK UNDER THE HOOD OF BIG DATA ANALYTICS

The world of big data analytics grows ever more complex. And while many people can work superficially with specific frameworks, far fewer understand the fundamental principles of large-scale, distributed data processing systems and how they operate. In Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood, renowned big-data experts and computer scientists Drs. Supun Kamburugamuve and Saliya Ekanayake deliver a practical guide to applying the principles of big data to software development for optimal performance.

Contents
Introduction

Chapter 1 Data Intensive Applications
Anatomy of a Data-Intensive Application
A Histogram Example
Program
Process Management
Communication
Execution
Data Structures
Putting It Together
Application
Resource Management
Messaging
Data Structures
Tasks and Execution
Fault Tolerance
Remote Execution
Parallel Applications
Serial Applications
Lloyd's K-Means Algorithm
Parallelizing Algorithms
Decomposition
Task Assignment
Orchestration
Mapping
K-Means Algorithm
Parallel and Distributed Computing
Memory Abstractions
Shared Memory
Distributed Memory
Hybrid (Shared + Distributed) Memory
Partitioned Global Address Space Memory
Application Classes and Frameworks
Parallel Interaction Patterns
Pleasingly Parallel
Dataflow
Iterative
Irregular
Data Abstractions
Data-Intensive Frameworks
Components
Workflows
An Example
What Makes It Difficult?
Developing Applications
Concurrency
Data Partitioning
Debugging
Diverse Environments
Computer Networks
Synchronization
Thread Synchronization
Data Synchronization
Ordering of Events
Faults
Consensus
Summary
References

Chapter 2 Data and Storage
Storage Systems
Storage for Distributed Systems
Direct-Attached Storage
Storage Area Network
Network-Attached Storage
DAS or SAN or NAS?
Storage Abstractions
Block Storage
File Systems
Object Storage
Data Formats
XML
JSON
CSV
Apache Parquet
Apache Avro
Avro Data Definitions (Schema)
Code Generation
Without Code Generation
Avro File
Schema Evolution
Protocol Buffers, Flat Buffers, and Thrift
Data Replication
Synchronous and Asynchronous Replication
Single-Leader and Multileader Replication
Data Locality
Disadvantages of Replication
Data Partitioning
Vertical Partitioning
Horizontal Partitioning (Sharding)
Hybrid Partitioning
Considerations for Partitioning
NoSQL Databases
Data Models
Key-Value Databases
Document Databases
Wide Column Databases
Graph Databases
CAP Theorem
Message Queuing
Message Processing Guarantees
Durability of Messages
Acknowledgments
Storage First Brokers and Transient Brokers
Summary
References

Chapter 3 Computing Resources
A Demonstration
Computer Clusters
Anatomy of a Computer Cluster
Data Analytics in Clusters
Dedicated Clusters
Classic Parallel Systems
Big Data Systems
Shared Clusters
OpenMPI on a Slurm Cluster
Spark on a Yarn Cluster
Distributed Application Life Cycle
Life Cycle Steps
...

Product Information

Title: Foundations of Data Intensive Applications
Subtitle: Large Scale Data Analytics under the Hood
Authors: Supun Kamburugamuve, Saliya Ekanayake
EAN: 9781119713036
Format: eBook (pdf)
Publisher: Wiley
Genre: Computer Science
Publication date: 5 August 2021
Copy protection: Adobe DRM
File size: 6.82 MB
Number of pages: 416