CHF32.70
Download available immediately
Jump-start your career as a data scientist: learn to develop datasets for exploration, analysis, and machine learning
SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis is dedicated to the Structured Query Language (SQL) and dataset design skills that data scientists use most. Aspiring data scientists will learn how to construct datasets for exploration, analysis, and machine learning. You will also discover how to approach query design and develop SQL code to extract data insights while avoiding common pitfalls.
You may be one of many people who are entering the field of data science from a range of professions and educational backgrounds, such as business analytics, social science, physics, economics, and computer science. Like many of them, you may have conducted analyses using spreadsheets as data sources, but never retrieved and engineered datasets from a relational database using SQL, a programming language designed for managing databases and extracting data.
This guide for data scientists differs from other instructional guides on the subject. It doesn't cover SQL broadly. Instead, you'll learn the subset of SQL skills that data analysts and data scientists use frequently. You'll also gain practical advice and direction on "how to think about constructing your dataset."
Gain an understanding of relational database structure, query design, and SQL syntax
Develop queries to construct datasets for use in applications like interactive reports and machine learning algorithms
Review strategies and approaches so you can design analytical datasets
Practice your techniques with the provided database and SQL code
In this book, author Renée Teate shares knowledge gained during a 15-year career working with data, in roles ranging from database developer to data analyst to data scientist. She guides you through SQL code and dataset design concepts from an industry practitioner's perspective to help move your data science career forward.
Author
RENÉE M. P. TEATE is the Director of Data Science at HelioCampus, a higher ed tech startup based in the Washington, DC area. She prepares datasets with SQL, develops predictive models with Python, and designs interactive dashboards in Tableau for university decision-makers. She created the Becoming a Data Scientist podcast, helped build the data science learning community on Twitter, and is a sought-after speaker at industry conferences.
Jacket copy
Jumpstart your data science career with crucial SQL skills
Today, many organizations expect their data scientists to be able to design and generate their own datasets by extracting and combining raw data from the company's data warehouses without the assistance of data engineers.
In SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis, experienced data scientist and database developer Renée M. P. Teate delivers a singular guide to the SQL skills and techniques every data scientist should know. You'll discover how to approach query design and develop SQL code to construct datasets for exploration, analysis, and data science.
SQL for Data Scientists shows you how to create datasets for use in applications like interactive reports and dashboards, as well as in machine learning algorithms. You'll skip right to the subset of SQL skills that data scientists and analysts use most frequently, and receive expert advice on extracting insights from data while avoiding common pitfalls.
Contents
Introduction
Chapter 1 Data Sources
Data Sources
Tools for Connecting to Data Sources and Editing SQL
Relational Databases
Dimensional Data Warehouses
Asking Questions About the Data Source
Introduction to the Farmer's Market Database
A Note on Machine Learning Dataset Terminology
Exercises
Chapter 2 The SELECT Statement
The SELECT Statement
The Fundamental Syntax Structure of a SELECT Query
Selecting Columns and Limiting the Number of Rows Returned
The ORDER BY Clause: Sorting Results
Introduction to Simple Inline Calculations
More Inline Calculation Examples: Rounding
More Inline Calculation Examples: Concatenating Strings
Evaluating Query Output
SELECT Statement Summary
Exercises Using the Included Database
Chapter 3 The WHERE Clause
The WHERE Clause
Filtering SELECT Statement Results
Filtering on Multiple Conditions
Multi-Column Conditional Filtering
More Ways to Filter
BETWEEN
IN
LIKE
IS NULL
A Warning About Null Comparisons
Filtering Using Subqueries
Exercises Using the Included Database
Chapter 4 CASE Statements
CASE Statement Syntax
Creating Binary Flags Using CASE
Grouping or Binning Continuous Values Using CASE
Categorical Encoding Using CASE
CASE Statement Summary
Exercises Using the Included Database
Chapter 5 SQL JOINs
Database Relationships and SQL JOINs
A Common Pitfall when Filtering Joined Data
JOINs with More than Two Tables
Exercises Using the Included Database
Chapter 6 Aggregating Results for Analysis
GROUP BY Syntax…