  • Main Menu
GreaterHeight Technologies LLC ~ GreaterHeight Academy
  • All Courses
    • BI and Visualization
      • Mastering Data and Business Analytics
        • Basic Excel for Data Analysis
        • Intermediate and Advanced Excel for Data Analysis
        • Excel for Business Analysis & Analyst
        • PivotTable, PowerPivot, PowerQuery & DAX for Data Analysis
        • Data Analytics and Visualization with Tableau
        • Data Analytics with Power-BI
        • Data Analytics and Visualisation with SQL
      • Mastering Python for Data Analytics and Visualization
        • Python Foundation for Data Analytics
        • Data Analysis Using Python With Numpy and Pandas
        • Data Visualization Using Python with Matplotlib and Seaborn
        • Data Science with SQL Server and Azure SQL Database
        • Data Analytics and Visualisation with PowerBI
      • Complete Microsoft Excel Master Program
        • Basic Excel for Data Analysis
        • Excel Interactive Dashboard for Data Analysis
        • Intermediate and Advanced Excel for Data Analysis
        • PivotTable, PowerPivot, PowerQuery & DAX for Data Analysis
        • Excel for Data Analysis and Visualization
        • Excel for Business Analysis & Analyst
      • Data Analytics With SQL Master Program
      • Master Data Analytics With PowerBI
      • Financial Reporting with PowerBI
      • Data Analysts with Power-BI
      • Data Analytics and Visualization with Excel
    • Mastering Python
      • Python Developer Masters Program
        • Python Programming Certification Course
        • Data Science With Python Certification
        • Artificial Intelligence Certification Course
        • PySpark Certification Training Course
        • Python Statistics for Data Science
      • The Complete Python Developer
      • Data Analysis and Visualization with Python
      • Complete Data Scientist with Python
      • Data Engineer with SQL and Python
      • Machine Learning Engineer with Python
    • Azure Cloud Computing
      • DevOps Engineer and Solutions Architect Master Program
      • Greaterheight Azure GH-602 Cloud Solution Architect Master
      • Greaterheight Azure GH-601 Cloud DevOps Master
      • Microsoft Azure az-900 Fundamentals
      • Microsoft Azure az-104 Administrator
      • Microsoft Azure az-204 Developer
      • Microsoft Azure az-305 Solutions Architect
      • Microsoft Azure az-400 DevOps Engineer
      • Microsoft Azure AI-900 Fundamentals
      • Microsoft Azure DP-100 Data Science
    • SQL and SQL-Server Database
      • Mastering SQL Server Development
      • Data Analytics With SQL Master Program
      • Data Engineer Course Online Masters Program
      • Data Science with SQL Server and Azure SQL Database
    • DevOps Development Program
      • DevOps Engineer & Solution Architect Expert Program
    • Data Science
      • Data Science With Python Certification
      • Python Statistics for Data Science
      • Data Science with SQL Server and Azure SQL Database
      • Complete Data Scientist with Python
  • Who We Serve
    • Individuals
    • Business
    • Universities
  • Partners
    • Employer Networks
    • Community Partnership
    • Opportunity Funds
    • Future Finance
    • Scholarships
  • Resources
    • Webinars
    • Blog
    • Tutorials
    • White Papers
    • Podcast
    • Events
  • Get Advice



Complete Data Engineer With SQL & Python


The first course teaches you the fundamental concepts of data engineering, including the Extract-Transform-Load (ETL) and Extract-Load-Transform (ELT) workflows. The second course takes you on the advanced journey to becoming a Data Engineer, and is ideal for those with foundational SQL knowledge from our Fundamental Data Engineer course.


Get Advice

Complete Data Engineer With SQL & Python Courses


Who this course is for:

  • Computer Science or IT students, or other graduates with a passion to get into IT.
  • Data Warehouse Developers who want to transition to Data Engineering roles.
  • ETL Developers who want to transition to Data Engineering roles.
  • Database or PL/SQL Developers who want to transition to Data Engineering roles.
  • BI Developers who want to transition to Data Engineering roles.
  • QA Engineers who want to learn about Data Engineering.
  • Application Developers who want to gain Data Engineering skills.


What you will Learn:

  • Set up an environment to learn SQL and Python essentials for Data Engineering.
  • Database essentials for Data Engineering using Postgres, such as creating tables and indexes, running SQL queries, and using important predefined functions.
  • Data Engineering programming essentials using Python, such as basic programming constructs, collections, Pandas, and database programming.
  • Data Engineering using Spark DataFrame APIs (PySpark) on Databricks. Learn all the important Spark DataFrame APIs, such as select, filter, groupBy, and orderBy.
  • Data Engineering using Spark SQL (PySpark and Spark SQL). Learn how to write high-quality Spark SQL queries using SELECT, WHERE, GROUP BY, ORDER BY, etc.
  • The relevance of the Spark Metastore, and the integration of DataFrames and Spark SQL.
  • The ability to build Data Engineering pipelines using Spark, leveraging Python as the programming language.
  • Use of different file formats such as Parquet, JSON, and CSV in building Data Engineering pipelines.
  • Setting up Hadoop and Spark clusters on GCP using Dataproc.
  • Understanding the complete Spark application development life cycle to build Spark applications using PySpark, and reviewing applications using the Spark UI.
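The SELECT / WHERE / GROUP BY / ORDER BY pattern named above works the same way in Spark SQL as in standard SQL. A minimal sketch, shown here with Python's built-in sqlite3 rather than a Spark cluster so it runs anywhere; the orders table and its rows are hypothetical illustration data:

```python
import sqlite3

# Hypothetical sales data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0), ("west", 75.0)],
)

# Total sales per region, largest first -- the same clause order
# (SELECT, WHERE, GROUP BY, ORDER BY) applies in a Spark SQL query.
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM orders
    WHERE amount > 0
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
print(rows)  # [('west', 325.0), ('east', 150.0)]
```

In PySpark the equivalent would be expressed either as the same SQL string passed to `spark.sql(...)` or as chained DataFrame calls (`groupBy`, `agg`, `orderBy`).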

Course Benefits & Key Features

Complete Data Engineer with SQL & Python’s benefits and key features:

  • Modules: 30+ modules
  • Lessons: 80+ lessons
  • Practical: 40+ hands-on labs
  • Live Projects: 5+ projects
  • Resume: CV preparation
  • Job: Job references
  • Recording: Session recordings
  • Interviews: Mock interviews
  • Support: On-the-job support
  • Membership: Membership access
  • Networks: Networking
  • Certification: Certificate of completion


INSTRUCTOR-LED LIVE ONLINE CLASSES

Our learn-by-building-projects method enables you to build practical coding experience that sticks. 95% of our learners say they are more confident and remember more when they learn by building the real-world projects their jobs require.


  • Get step-by-step guidance to practice your skills without getting stuck
  • Validate your technical problem-solving skills in a real environment
  • Troubleshoot complex scenarios to practice what you learned
  • Develop production experience that translates to the real world

Python Developer Program Job Outlook

Ranked #1 Programming Language

TIOBE and PYPL rank Python as the most popular programming language in the world.

Python Salary Trend

The average salary for a Python Developer is $114,489 per year in the United States.

44.8% Compound Annual Growth Rate (CAGR)

The global Python market size is expected to reach USD 100.6 million by 2030.

Why Data Engineer with SQL & Python?

The Backbone of Data Science

Data engineers are on the front lines of data strategy so that others don’t need to be. They are the first people to tackle the influx of structured and unstructured data that enters a company’s systems.


Technically Challenging

One of the Python functions data analysts and scientists use the most is read_csv — from the pandas library. It reads tabular data stored in a text file into Python, so that it can be explored and manipulated.
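A minimal `read_csv` sketch of the idea above. The column names and values are invented for illustration, and the data is read from an in-memory string rather than a file on disk (`pd.read_csv` accepts any file-like object as well as a path):

```python
import io
import pandas as pd

# Hypothetical tabular text that would normally live in a .csv file.
csv_text = "city,population\nLagos,15000000\nAccra,2500000\n"

# read_csv turns the text into a DataFrame ready for exploration.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                   # (2, 2)
print(int(df["population"].sum()))  # 17500000
```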


It's Rewarding

Every day, we create 2.5 quintillion bytes of data. Business Insider reports that there will be more than 64 billion IoT devices by 2025, up from about 10 billion in 2018 and 9 billion in 2017.


It Pays Well

According to IBM’s The Quant Crunch: “Jobs specifying machine learning skills pay an average of $114,000. Advertised data scientist jobs pay an average of $105,000 and advertised data engineering jobs pay an average of $117,000.”

Become a Leader

Being a central part of an organization’s decision-making processes, analytics experts often pick up strong leadership skills as well.


It’s Valuable Even If You Don’t Want to Be a Data Engineer

Even if you don’t want to pursue a career as a data engineer, if you want to work in data science, it can be very useful to have some knowledge of data engineering.




GreaterHeight Certificates holders are prepared to work at companies like these.

Some Alumni Testimonies

Investing in the course "Become a Data Analyst" with GreaterHeight Academy is great value for the money, and I highly recommend it. The trainer is very knowledgeable and very engaging, provided us with quality training sessions on all courses, and was easily accessible for queries. We also had access to the course materials, and the timely availability of the recorded videos made learning easy and aided the process.

QUEEN OBIWULU

Team Lead, Customer Success

The training was fantastic; the instructor is an awesome lecturer, relentless and tireless in his delivery. He obviously enjoys teaching; it comes naturally to him. We got more than we expected. He extended my knowledge of Excel beyond what I knew, and the courses were brilliantly delivered. They reach out, follow up, and ask questions; in fact, the support has been great. They are highly recommended, and I would definitely subscribe to other training programs from them.

BISOLA OGUNRO

Fraud Analytics Risk Oversight Manager

It's one thing to look for just a Data Analysis training, and it's another to get the knowledge transferred through certified professional trainers. No matter your initial level of proficiency in any of the Data Analysis tools, GreaterHeight Academy will meet you there and take you up to a highly proficient and confident level in a short time, at a reasonable pace. I learnt a lot of Data Analysis tools and skills at GreaterHeight from patient and resourceful teachers.

TUNDE MEREDITH

Operation Director - Abbfem Technology

The Data Analysis training program was one of the best I have attended. The way GreaterHeight took off with Excel and concluded the four courses with Excel was mind-blowing. I concluded that I'm on the right path with the right mentor to take me from a novice to a professional. GreaterHeight is the best as far as imparting Data Analysis knowledge is concerned. I would shout it from the rooftop to recommend GreaterHeight to any trainee who really wants to learn.

JOHN OSI PETER

Greaterheight

I wanted to take a moment to express my deepest gratitude for the opportunity to study data analytics at GreaterHeight Academy. I am truly impressed by the level of dedication and support that the sponsor and CEO have put into this program. GreaterHeight Academy is without a doubt the best tech institution out there, providing top-notch education and resources for its students. One of the advantages of studying at GreaterHeight Academy is the access to the best tools and technologies in the field. 

AYODELE PAYNE

Sales/Data Analyst

It is an unforgettable experience that will surely stand the test of time, learning to become a Data Analyst with GreaterHeight Academy. The lecture delivery was impactful, and the trainer is vastly knowledgeable in using the applicable tools for the sessions, always ready to go the extra mile with you. The support you get during and after the lectures is top-notch, with materials and resources available to build your confidence on and off the job.

ADEBAYO OLADEJO

Customer Service Advisor (Special Operations)

Fundamental Data Engineer with SQL Course


In this course, you'll learn the fundamental concepts of data engineering, including the Extract-Transform-Load (ETL) and Extract-Load-Transform (ELT) workflows. You'll discover how to interact with relational databases such as PostgreSQL to store, modify, and query data. Moving through the track, you'll pick up techniques for querying structured data using SQL, including joining multiple tables, calculating aggregated statistics, filtering, grouping, and writing subqueries. Switching gears, you'll go on to discover database design principles such as star and snowflake schemas, and normalization. You'll use this knowledge to perform typical data engineering tasks such as creating, altering, and deleting tables, and enforcing data consistency by casting data to different data types. The track also shows how you can download PostgreSQL to your operating system, along with setting up and modifying users. Conclude by learning about data warehouse technologies and familiarizing yourself with Snowflake, a popular cloud technology for data engineering!


Understanding Data Engineering
Understand the Basics of Data Engineering

In this course, you’ll learn about a data engineer’s core responsibilities, how they differ from data scientists, and how they facilitate the flow of data through an organization. Through hands-on exercises, you’ll follow Spotflix, a fictional music streaming company, to understand how its data engineers collect, clean, and catalog their data.

Apply in Personal Cases

By the end of the course, you’ll understand what your company's data engineers do, be ready to have a conversation with a data engineer, and have a solid foundation to start your own data engineer journey.

3 Modules | 4+ Hours | 3+ Skills

Course Modules 


In this module, you’ll learn what data engineering is and why demand for data engineers is increasing. You’ll then discover where data engineering sits in relation to the data science lifecycle, how data engineers differ from data scientists, and get an introduction to your first complete data pipeline.


  1. Data engineering and big data
  2. Go with the flow
  3. Not responsible
  4. Big time
  5. Data engineers vs. data scientists
  6. Tell me the truth
  7. Who is it
  8. The data pipeline
  9. It's not true
  10. Pipeline

It’s time to talk about data storage—one of the main responsibilities for a data engineer. In this module, you’ll learn how data engineers manage different data structures, work in SQL—the programming language of choice for querying and storing data, and implement appropriate data storage solutions with data lakes and data warehouses.


  1. Data structures
  2. Structures
  3. What's the difference
  4. SQL databases
  5. We can work it out
  6. Columns
  7. Different breeds
  8. Data warehouses and data lakes
  9. Tell the truth
  10. Our warehouse (in the middle of our street)

Data engineers make life easy for data scientists by preparing raw data for analysis using different processing techniques at different steps. These steps need to be combined to create pipelines, which is when automation comes into play. Finally, data engineers use parallel and cloud computing to keep pipelines flowing smoothly.


  1. Processing data
  2. Connect the dots
  3. Scheduling data
  4. Schedules
  5. One or the other
  6. Parallel computing
  7. Whenever, whenever
  8. Parallel universe
  9. Cloud computing
  10. Obscured by clouds
  11. Somewhere I belong
  12. We are the champions
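The processing, scheduling, and pipeline ideas in this module can be sketched as a toy extract-transform-load chain in plain Python. All the function names and data here are hypothetical; a real pipeline would swap these stubs for database reads, cleaning logic, and warehouse writes, with a scheduler running the chain automatically:

```python
# A toy data pipeline: each step is a function, and the pipeline is the
# composition of the three. The data mimics listening events from a
# fictional streaming app.

def extract():
    # Stand-in for pulling raw events from an application database.
    return [{"user": "a", "ms_played": 1000}, {"user": "b", "ms_played": 2500}]

def transform(rows):
    # Clean and reshape: convert milliseconds to seconds.
    return [{"user": r["user"], "seconds": r["ms_played"] / 1000} for r in rows]

def load(rows, warehouse):
    # Stand-in for writing to a data warehouse table.
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'user': 'a', 'seconds': 1.0}, {'user': 'b', 'seconds': 2.5}]
```

Parallel computing then means running `transform` on many chunks of rows at once, and cloud computing means renting the machines that do it.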


Introduction to SQL
Learn how Relational Databases are Organized
SQL is an essential language for building and maintaining relational databases, which opens the door to a range of careers in the data industry and beyond. You’ll start this course by covering data organization, tables, and best practices for database construction.

Write Your First SQL Queries
The second half of this course looks at creating SQL queries for selecting data that you need from your database. You’ll have the chance to practice your querying skills before moving on to customizing and saving your results.

Understand the Difference Between PostgreSQL and SQL Server
PostgreSQL and SQL Server are two of the most popular SQL flavors. You’ll finish off this course by looking at the differences, benefits, and applications of each. By the end of the course, you’ll have some hands-on experience in learning SQL and the grounding to start applying it to projects or continue your learning in a more specialized direction.

5 Modules | 3+ Hours | 4 Skills

Course Modules 


Before writing any SQL queries, it’s important to understand the underlying data. In this module, we’ll discover the role of SQL in creating and querying relational databases. Using a database for a local library, we will explore database and table organization, data types and storage, and best practices for database construction.


  1. Introduction to Database Management System
  2. What are the advantages of databases?
  3. Data organization
  4. Introduction to SQL
  5. Tables in SQL
  6. Views in SQL
  7. Table vs Views
  8. Picking a unique ID
  9. Setting the table in style
  10. Finding data types
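The table-versus-view distinction from this module can be shown in a few lines. A minimal sketch using Python's built-in sqlite3; the books table, its columns, and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table stores the data itself; the primary key gives each row a unique ID.
conn.execute(
    "CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)
conn.execute("INSERT INTO books VALUES (1, 'Dune', 1965), (2, 'Emma', 1815)")

# A view stores only the query: it is re-evaluated against books each
# time it is selected from, so it always reflects the current data.
conn.execute(
    "CREATE VIEW modern_books AS SELECT title FROM books WHERE year > 1900"
)
titles = [t for (t,) in conn.execute("SELECT title FROM modern_books")]
print(titles)  # ['Dune']
```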

  1. Introduction
  2. Entity Relationship Model
  3. Relationships in SQL
  4. Recap


  1. Introduction
  2. Downloading SQL Developer Edition
  3. Installing SQL Developer Edition
  4. Connecting to SQL Server
  5. Downloading Sample SQL Database in SQL Management Studio (SSMS)
  6. Configuring SQL Server, and SSMS
  7. Recap


  1. Database Manipulation in SQL
  2. SQL Storage Engines
  3. Creating and Managing Tables in SQL
  4. Creating and Managing Tables in SQL: CREATE, DESCRIBE, and SHOW Tables
  5. Creating and Managing Tables in SQL: ALTER, TRUNCATE, and DROP Tables
  6. Inserting and Querying Data in Tables
  7. Filtering Data From Tables in SQL
  8. Filtering Data From Tables in SQL: WHERE and DISTINCT Clauses
  9. Filtering Data From Tables in SQL: AND and OR Operators
  10. Filtering Data From Tables in SQL: IN and NOT IN Operators
  11. Filtering Data From Tables in SQL: BETWEEN and LIKE Operators
  12. Filtering Data From Tables in SQL: TOP, IS NULL, and IS NOT NULL Operators
  13. Sorting Table Data
  14. Recap

Learn your first SQL keywords for selecting relevant data from database tables! After practicing querying skills in a database of books, you’ll customize query results using aliasing and save them as views so they can be shared. Finally, you’ll explore the differences between SQL flavors and databases such as SQL Server.


  1. Introducing queries
  2. SQL strengths
  3. Developing SQL style
  4. Querying the books table
  5. Writing queries
  6. Comments in SQL
  7. Making queries DISTINCT
  8. Aliasing
  9. Viewing your query
  10. SQL flavors
  11. Comparing flavors
  12. Limiting results
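The first-query keywords in this module (SELECT, DISTINCT, aliasing with AS) can be tried in a few lines. A minimal sketch using Python's built-in sqlite3; the books table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Dune", "sci-fi"), ("Emma", "romance"), ("Foundation", "sci-fi")],
)

# DISTINCT removes the duplicate 'sci-fi'; AS renames the output column.
genres = conn.execute(
    "SELECT DISTINCT genre AS category FROM books ORDER BY category"
).fetchall()
print(genres)  # [('romance',), ('sci-fi',)]
```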


Intermediate SQL
SQL is widely recognized as the most popular language for turning raw data stored in a database into actionable insights. This course uses a films database to teach how to navigate and extract insights from the data using SQL.

Discover Filtering with SQL
You'll discover techniques for filtering and comparing data, enabling you to extract specific information to gain insights and answer questions about the data.

Get Acquainted with Aggregation
Next, you'll get a taste of aggregate functions, essential for summarizing data effectively and gaining valuable insights from large datasets. You'll also combine this with sorting and grouping data, adding another layer of meaning to your insights and analysis.

Write Clean Queries
Finally, you'll be shown some tips and best practices for presenting your data and queries neatly. Throughout the course, you'll have hands-on practice queries to solidify your understanding of the concepts. By the end of the course, you'll have everything you need to know to analyze data using your own SQL code today!

7 Modules | 8+ Hours | 5+ Skills

Course Modules 


In this first module, you’ll learn how to query a films database and select the data needed to answer questions about the movies and actors. You'll also understand how SQL code is executed and formatted.


  1. SELECT Statement
  2. SELECT DISTINCT
  3. Query execution
  4. Order of execution
  5. SQL style
  6. SQL best practices
  7. Formatting
  8. Non-standard fields

  1. Arithmetic Operators: +, -, *, /, %
  2. Comparison Operators: =, >, <, >=, <=, <>, !=
  3. Logical Operators: AND, OR, NOT
  4. Special Operators: LIKE, IN, NOT, NOT EQUAL, IS NULL, BETWEEN, ALL and ANY, EXISTS
  5. Set Operators: UNION, UNION ALL, INTERSECT, EXCEPT

Learn about how you can filter numerical and textual data with SQL. Filtering is an important use for this language. You’ll learn how to use new keywords and operators to help you narrow down your query to get results that meet your desired criteria and gain a better understanding of NULL values and how to handle them.


  1. Filtering numbers
  2. Filtering results
  3. Using WHERE with numbers
  4. Using WHERE with text
  5. Multiple criteria
  6. Using AND
  7. Using OR
  8. Using BETWEEN
  9. Filtering text
  10. LIKE and NOT LIKE
  11. WHERE IN
  12. Combining filtering and selecting
  13. Understanding NULL values
  14. Practice with NULLs
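The filtering keywords above can be combined in one small session. A minimal sketch using Python's built-in sqlite3; the films table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, year INTEGER, gross REAL)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Alien", 1979, 104.9), ("Arrival", 2016, 203.4), ("Avatar", 2009, None)],
)

# BETWEEN: numeric range filtering.
seventies = conn.execute(
    "SELECT title FROM films WHERE year BETWEEN 1970 AND 1979"
).fetchall()

# LIKE for text patterns, IN for a list of allowed values, AND to combine.
a_titles = conn.execute(
    "SELECT title FROM films WHERE title LIKE 'A%' AND year IN (2009, 2016) "
    "ORDER BY title"
).fetchall()

# NULL is not equal to anything, so it needs IS NULL rather than = NULL.
no_gross = conn.execute("SELECT title FROM films WHERE gross IS NULL").fetchall()

print(seventies, a_titles, no_gross)
```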

Here, we will teach you how to sort and group data. These skills will take your analyses to a new level by helping you uncover critical business insights and identify trends and performance. You'll get hands-on experience to determine which films performed the best and how movie durations and budgets changed over time.


  1. Sorting results
  2. Sorting text
  3. The SQL ORDER BY
  4. ORDER BY - ascending
  5. ORDER BY - descending
  6. Sorting single fields
  7. Sorting multiple fields

  1. Data Definition Language (DDL): CREATE, DROP, ALTER, TRUNCATE
  2. Data Query Language (DQL): SELECT, WHERE
  3. Data Manipulation Language (DML): INSERT, UPDATE, DELETE

  1. NOT NULL Constraints
  2. UNIQUE Constraints
  3. Primary Key Constraints
  4. Foreign Key Constraints
  5. Composite Key
  6. Unique Constraints
  7. Alternate Key
  8. CHECK Constraints
  9. DEFAULT Constraints
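Several of the constraints listed above can sit on one table definition. A minimal sketch using Python's built-in sqlite3; the accounts table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOT NULL, UNIQUE, CHECK, DEFAULT, and a PRIMARY KEY on one table.
conn.execute(
    """
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        email      TEXT NOT NULL UNIQUE,
        balance    REAL NOT NULL DEFAULT 0 CHECK (balance >= 0)
    )
    """
)
conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")

# Violating UNIQUE raises an error instead of silently corrupting the data.
try:
    conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")
    violated = False
except sqlite3.IntegrityError:
    violated = True

balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
print(violated, balance)  # True 0.0 (DEFAULT filled in the balance)
```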

SQL allows you to zoom in and out to better understand an entire dataset, its subsets, and its individual records. You'll learn to summarize data using aggregate functions and perform basic arithmetic calculations inside queries to gain insights into what makes a successful film.


  1. COUNT, SUM, AVG, MIN, MAX
  2. Summarizing data
  3. Aggregate functions and data types
  4. Practice with aggregate functions
  5. Summarizing subsets
  6. Grouping data
  7. GROUP BY single fields
  8. GROUP BY multiple fields
  9. Answering business questions
  10. Filtering grouped data
  11. Filter with HAVING
  12. HAVING and sorting
  13. Combining aggregate functions with WHERE
  14. Using ROUND()
  15. ROUND() with a negative parameter
  16. Aliasing and arithmetic
  17. Using arithmetic
  18. Aliasing with functions
  19. Rounding results
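The aggregation ideas in this module (COUNT and AVG with GROUP BY, HAVING for filtering groups, ROUND for presentation) fit in one query. A minimal sketch using Python's built-in sqlite3; the films table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (genre TEXT, gross REAL)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("sci-fi", 100.0), ("sci-fi", 250.0), ("drama", 80.0)],
)

# Average gross per genre, keeping only genres with more than one film.
# HAVING filters groups after aggregation; WHERE would filter rows before it.
summary = conn.execute(
    """
    SELECT genre, COUNT(*) AS n, ROUND(AVG(gross), 1) AS avg_gross
    FROM films
    GROUP BY genre
    HAVING COUNT(*) > 1
    """
).fetchall()
print(summary)  # [('sci-fi', 2, 175.0)]
```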


Joining Data In SQL
Joining data is an essential skill in data analysis, enabling you to draw information from separate tables together into a single, meaningful set of results. In this comprehensive course on joining data, you'll delve into the intricacies of table joins and relational set theory, learning how to optimize your queries for efficient data retrieval.

Understand Data Joining Fundamentals
You will learn how to work with multiple tables in SQL by navigating and extracting data from various tables within a SQL database using various join types, including inner joins, outer joins, and cross joins. With practice, you'll gain the knowledge of how to select the appropriate join method.

Explore Advanced Data Manipulation Techniques
Next up, you'll explore set theory principles such as unions, intersects, and except clauses, as well as discover the power of nested queries in SQL. Every step is accompanied by exercises and opportunities to apply the theory and grow your confidence in SQL.

5 Modules | 5+ Hours | 4+ Skills

Course Modules 


  1. Introduction to Alias
  2. Introduction to JOINS
  3. Right Cross and Self Join
  4. Operators in SQL
  5. Operators in SQL Updated
  6. Intersect and Emulation
  7. Minus and Emulation
  8. Subquery in SQL
  9. Subqueries with Statements and Operators
  10. Subqueries with Commands
  11. Derived Tables in SQL
  12. EXISTS Operator
  13. NOT EXISTS Operator
  14. EXISTS vs IN Operators
  15. Recap

In this closing Module, you’ll begin by investigating semi-joins and anti-joins. Next, you'll learn how to use nested queries. Last but not least, you’ll wrap up the course with some challenges!

  1. Subquerying with semi joins and anti joins
  2. Multiple WHERE clauses
  3. Semi join
  4. Diagnosing problems using anti join
  5. Subqueries inside WHERE and SELECT
  6. Subquery inside WHERE
  7. WHERE do people live?
  8. Subquery inside SELECT
  9. Subqueries inside FROM
  10. Subquery inside FROM
  11. Subquery challenge
  12. Final challenge
  13. The finish line
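Semi joins and anti joins, as covered above, are usually written as subqueries with IN and NOT IN. A minimal sketch using Python's built-in sqlite3; the countries and presidents tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (name TEXT)")
conn.execute("CREATE TABLE presidents (country TEXT, president TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?)", [("Chile",), ("Ghana",), ("Japan",)]
)
conn.executemany(
    "INSERT INTO presidents VALUES (?, ?)", [("Chile", "X"), ("Ghana", "Y")]
)

# Semi join: keep countries that DO appear in the presidents table.
semi = conn.execute(
    "SELECT name FROM countries "
    "WHERE name IN (SELECT country FROM presidents) ORDER BY name"
).fetchall()

# Anti join: keep countries that do NOT appear in the presidents table.
anti = conn.execute(
    "SELECT name FROM countries "
    "WHERE name NOT IN (SELECT country FROM presidents)"
).fetchall()

print(semi, anti)  # [('Chile',), ('Ghana',)] [('Japan',)]
```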

In this module, you’ll be introduced to the concept of joining tables and will explore all the ways you can enrich your queries using joins—beginning with inner joins.

  1. The ins and outs of INNER JOIN
  2. Your first join
  3. Joining with aliased tables
  4. USING in action
  5. Defining relationships
  6. Relationships in our database
  7. Inspecting a relationship
  8. Multiple joins
  9. Joining multiple tables
  10. Checking multi-table joins

After familiarizing yourself with inner joins, you will come to grips with different kinds of outer joins. Next, you will learn about cross joins. Finally, you will learn about situations in which you might join a table with itself.

  1. LEFT and RIGHT JOINs
  2. Remembering what is LEFT
  3. This is a LEFT JOIN, right?
  4. Building on your LEFT JOIN
  5. Is this RIGHT?
  6. FULL JOINs
  7. Comparing joins
  8. Chaining FULL JOINs
  9. Crossing into CROSS JOIN
  10. Histories and languages
  11. Choosing your join
  12. Self joins
  13. Comparing a country to itself
  14. All joins on deck

In this module, you will learn about using set theory operations in SQL, with an introduction to UNION, UNION ALL, INTERSECT, and EXCEPT clauses. You’ll explore the predominant ways in which set theory operations differ from join operations.

  1. Set theory for SQL Joins
  2. UNION vs. UNION ALL
  3. Comparing global economies
  4. Comparing two set operations
  5. At the INTERSECT
  6. INTERSECT
  7. Review UNION and INTERSECT
  8. EXCEPT
  9. You've got it, EXCEPT...
  10. Calling all set operators
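The set operations above can be compared side by side. A minimal sketch using Python's built-in sqlite3; the two single-column tables are hypothetical stand-ins for the economies of two years:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE econ2019 (country TEXT)")
conn.execute("CREATE TABLE econ2020 (country TEXT)")
conn.executemany("INSERT INTO econ2019 VALUES (?)", [("A",), ("B",)])
conn.executemany("INSERT INTO econ2020 VALUES (?)", [("B",), ("C",)])

# UNION: rows in either table, duplicates removed (UNION ALL keeps them).
union = conn.execute(
    "SELECT country FROM econ2019 UNION SELECT country FROM econ2020 "
    "ORDER BY country"
).fetchall()

# INTERSECT: rows present in both tables.
inter = conn.execute(
    "SELECT country FROM econ2019 INTERSECT SELECT country FROM econ2020"
).fetchall()

# EXCEPT: rows in the first table but not the second.
exc = conn.execute(
    "SELECT country FROM econ2019 EXCEPT SELECT country FROM econ2020"
).fetchall()

print(union, inter, exc)  # [('A',), ('B',), ('C',)] [('B',)] [('A',)]
```

Unlike joins, set operations stack whole rows vertically rather than combining columns from matching rows.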


Introduction to Relational Databases in SQL
Explore the Role of SQL in Relational Database Management
There are a lot of reasons why SQL is the go-to query language for relational database management. The main one is that SQL is a powerful language that can handle large amounts of data in complex ways and solve tricky analytical questions. In this course, you will gain an introduction to relational databases in SQL.

Learn how to create tables and specify their relationships, as well as how to enforce data integrity. Additionally, discover other unique features of database systems, such as constraints.

Create Your First Database
You begin the section by creating your first database with simple SQL commands. Next, you’ll learn how to update your database as the structure changes by migrating data and deleting tables.

In the final module, you will glue tables in foreign keys together and establish relationships that greatly benefit your data quality. Finally, you will run ad hoc analyses on your new database.

Understand the Basics of Relational Databases
By the end of the course, you will gain a basic yet essential understanding of SQL relational databases. They are widely used in various data science fields (from healthcare to finance) and have consequently become one of the crucial languages for data scientists. If you're interested in deepening your knowledge further, you may be interested in our SQL for Database Administrators, SQL Server Developer, and SQL Server for Database Administrators Tracks!

4 Modules | 5+ Hours | 4+ Skills

Course Modules 


In this module, you'll create your very first database with a set of simple SQL commands. Next, you'll migrate data from existing flat tables into that database. You'll also learn how meta-information about a database can be queried.


  1. Introduction to relational databases
  2. Attributes of relational databases
  3. Query information_schema with SELECT
  4. Tables: At the core of every database
  5. CREATE your first few TABLEs
  6. ADD a COLUMN with ALTER TABLE
  7. Update your database as the structure changes
  8. RENAME and DROP COLUMNs in affiliations
  9. Migrate data with INSERT INTO SELECT DISTINCT
  10. Delete tables with DROP TABLE

After building a simple database, it's now time to make use of the features. You'll specify data types in columns, enforce column uniqueness, and disallow NULL values in this module.


  1. Better data quality with constraints
  2. Types of database constraints
  3. Conforming with data types
  4. Type CASTs
  5. Working with data types
  6. Change types with ALTER COLUMN
  7. Convert types USING a function
  8. The not-null and unique constraints
  9. Disallow NULL values with SET NOT NULL
  10. What happens if you try to enter NULLs?
  11. Make your columns UNIQUE with ADD CONSTRAINT
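The effect of not-null and unique constraints can be sketched with sqlite3. One caveat for this stand-in: SQLite only accepts these constraints at CREATE time, whereas PostgreSQL (used in the course) can add them later with ALTER COLUMN ... SET NOT NULL and ADD CONSTRAINT ... UNIQUE. The table and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Declare the constraints up front (a PostgreSQL ALTER TABLE could add them later).
cur.execute("CREATE TABLE students (ssn INTEGER UNIQUE, name TEXT NOT NULL)")

cur.execute("INSERT INTO students VALUES (1, 'Maria')")

errors = []
try:
    cur.execute("INSERT INTO students VALUES (1, 'Omar')")   # duplicate ssn
except sqlite3.IntegrityError as e:
    errors.append(str(e))
try:
    cur.execute("INSERT INTO students VALUES (2, NULL)")     # NULL name
except sqlite3.IntegrityError as e:
    errors.append(str(e))

print(errors)  # both bad inserts were rejected by the database itself
```

This is the point of the module: the database, not your application code, guarantees data quality.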

Now let’s get into the best practices of database engineering. It's time to add primary and foreign keys to the tables. These are two of the most important concepts in databases, and are the building blocks you’ll use to establish relationships between tables.


  1. Keys and superkeys
  2. Get to know SELECT COUNT DISTINCT
  3. Identify keys with SELECT COUNT DISTINCT
  4. Primary keys
  5. Identify the primary key
  6. ADD key CONSTRAINTs to the tables
  7. Surrogate keys
  8. Add a SERIAL surrogate key
  9. CONCATenate columns to a surrogate key
  10. Test your knowledge before advancing
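The key-hunting workflow above can be sketched in a few lines of sqlite3. Note that SERIAL is PostgreSQL syntax; SQLite's equivalent auto-incrementing surrogate is INTEGER PRIMARY KEY. Table and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cars (make TEXT, model TEXT, mpg REAL)")
cur.executemany("INSERT INTO cars VALUES (?, ?, ?)",
                [("Ford", "Focus", 34.0), ("Ford", "Fiesta", 36.0), ("Mazda", "3", 35.0)])

# A candidate key must have as many distinct values as there are rows;
# concatenating columns tests a composite key, as in the CONCAT exercise.
rows, = cur.execute("SELECT COUNT(*) FROM cars").fetchone()
distinct_combos, = cur.execute("SELECT COUNT(DISTINCT make || model) FROM cars").fetchone()
print(rows == distinct_combos)  # (make, model) uniquely identifies each row

# Add an auto-incrementing surrogate key instead.
cur.execute("""CREATE TABLE cars2 (
                 id INTEGER PRIMARY KEY,
                 make TEXT, model TEXT, mpg REAL)""")
cur.execute("INSERT INTO cars2 (make, model, mpg) SELECT * FROM cars")
print(cur.execute("SELECT COUNT(*) FROM cars2").fetchone()[0])  # 3 rows, each with an id
```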

In the final module, you'll leverage foreign keys to connect tables and establish relationships that will greatly benefit your data quality. And you'll run ad hoc analyses on your new database.


  1. Model 1:N relationships with foreign keys
  2. REFERENCE a table with a FOREIGN KEY
  3. Explore foreign key constraints
  4. JOIN tables linked by a foreign key
  5. Model more complex relationships
  6. Add foreign keys to the "affiliations" table
  7. Populate the "professor_id" column
  8. Drop "firstname" and "lastname"
  9. Referential integrity
  10. Referential integrity violations
  11. Change the referential integrity behavior of a key
  12. Roundup
  13. Count affiliations per university
  14. Join all the tables together
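A foreign key in action, again sketched with sqlite3 (which, unlike PostgreSQL, only enforces foreign keys after an explicit PRAGMA). Table names mirror the module's professor/university theme but the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this; PostgreSQL enforces FKs by default
cur = conn.cursor()

cur.execute("CREATE TABLE universities (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE professors (
                 id INTEGER PRIMARY KEY,
                 name TEXT,
                 university_id INTEGER REFERENCES universities (id))""")

cur.execute("INSERT INTO universities VALUES (1, 'ETH')")
cur.execute("INSERT INTO professors VALUES (1, 'Karl', 1)")    # OK: university 1 exists

violated = False
try:
    cur.execute("INSERT INTO professors VALUES (2, 'Greta', 99)")  # no university 99
except sqlite3.IntegrityError:
    violated = True
print(violated)  # referential integrity was defended

# Tables linked by the key can be JOINed back together for ad hoc analysis.
row = cur.execute("""SELECT p.name, u.name
                     FROM professors AS p
                     JOIN universities AS u ON p.university_id = u.id""").fetchone()
print(row)  # ('Karl', 'ETH')
```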


Database Design
A good database design is crucial for a high-performance application. Just as you wouldn't start building a house without a blueprint, you need to think about how your data will be stored beforehand. Taking the time to design a database saves time and frustration later on, and a well-designed database ensures ease of access and retrieval of information. Choosing a design means weighing many considerations. In this course, you'll learn how to process, store, and organize data in an efficient way. You'll see how to structure data through normalization and present your data with views. Finally, you'll learn how to manage your database, working with a variety of datasets ranging from book sales and car rentals to music reviews.


4 Modules | 5+ Hours | 4 Skills

Course Modules 


Start your journey into database design by learning about the two approaches to data processing, OLTP and OLAP. In this first module, you'll also get familiar with the different forms data can be stored in and learn the basics of data modeling.


  1. OLTP and OLAP
  2. OLAP vs. OLTP
  3. Which is better?
  4. Storing data
  5. Name that data type!
  6. Ordering ETL Tasks
  7. Recommend a storage solution
  8. Database design
  9. Classifying data models
  10. Deciding fact and dimension tables
  11. Querying the dimensional model

In this module, you will take your data modeling skills to the next level. You'll learn to implement star and snowflake schemas, recognize the importance of normalization and see how to normalize databases to different extents.


  1. Star and snowflake schema
  2. Running from star to snowflake
  3. Adding foreign keys
  4. Extending the book dimension
  5. Normalized and denormalized databases
  6. Querying the star schema
  7. Querying the snowflake schema
  8. Updating countries
  9. Extending the snowflake schema
  10. Normal forms
  11. Converting to 1NF
  12. Converting to 2NF
  13. Converting to 3NF
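A star schema query can be sketched with sqlite3: one fact table of measurements joined to a denormalized dimension table. In a snowflake schema, the genre column below would be split into its own table and reached with one more join. Tables and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: a fact table referencing a denormalized dimension table.
cur.execute("CREATE TABLE dim_book (book_id INTEGER PRIMARY KEY, title TEXT, genre TEXT)")
cur.execute("CREATE TABLE fact_sales (book_id INTEGER, quantity INTEGER)")
cur.executemany("INSERT INTO dim_book VALUES (?, ?, ?)",
                [(1, "Dune", "sci-fi"), (2, "Emma", "classic")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)", [(1, 3), (1, 2), (2, 4)])

# Querying the star schema: a single join from fact to dimension.
result = cur.execute("""SELECT d.genre, SUM(f.quantity)
                        FROM fact_sales AS f
                        JOIN dim_book AS d USING (book_id)
                        GROUP BY d.genre
                        ORDER BY d.genre""").fetchall()
print(result)  # [('classic', 4), ('sci-fi', 5)]
```

The trade-off the module explores: the snowflake's extra normalization removes redundancy (each genre stored once) at the cost of longer join chains.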

Get ready to work with views! In this module, you will learn how to create and query views. On top of that, you'll master more advanced capabilities to manage them and end by identifying the difference between materialized and non-materialized views.


  1. Database views
  2. Tables vs. views
  3. Viewing views
  4. Creating and querying a view
  5. Managing views
  6. Creating a view from other views
  7. Granting and revoking access
  8. Updatable views
  9. Redefining a view
  10. Materialized views
  11. Materialized versus non-materialized
  12. Creating and refreshing a materialized view
  13. Managing materialized views
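A (non-materialized) view stores the query, not the data, so it re-runs on every access. The sqlite3 sketch below shows this; SQLite has no materialized views, which in PostgreSQL would instead store the result set until refreshed. Data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE reviews (album TEXT, score REAL)")
cur.executemany("INSERT INTO reviews VALUES (?, ?)",
                [("Blue", 9.0), ("Kid A", 9.5), ("Pablo Honey", 5.0)])

# The view is a saved query; it can be queried like a table.
cur.execute("CREATE VIEW high_scores AS SELECT * FROM reviews WHERE score >= 9")
print(cur.execute("SELECT COUNT(*) FROM high_scores").fetchone()[0])  # 2

# New rows matching the condition appear in the view automatically.
cur.execute("INSERT INTO reviews VALUES ('Loveless', 9.1)")
print(cur.execute("SELECT COUNT(*) FROM high_scores").fetchone()[0])  # 3
```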

This final module covers several database management topics. You will learn how to grant database access based on user roles, how to partition tables into smaller pieces, what to keep in mind when integrating data, and which DBMS best fits your business needs.


  1. Database roles and access control
  2. Create a role
  3. GRANT privileges and ALTER attributes
  4. Add a user role to a group role
  5. Table partitioning
  6. Reasons to partition
  7. Partitioning and normalization
  8. Creating vertical partitions
  9. Creating horizontal partitions
  10. Data integration
  11. Data integration dos and don'ts
  12. Analyzing a data integration plan
  13. Picking a Database Management System (DBMS)
  14. SQL versus NoSQL
  15. Choosing the right DBMS
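Horizontal partitioning splits a table's rows across smaller tables with identical columns. PostgreSQL automates this with PARTITION BY; the sqlite3 sketch below does it by hand to show the idea, with invented sales data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One partition per year, all with the same columns.
for year in (2023, 2024):
    cur.execute(f"CREATE TABLE sales_{year} (sold_on TEXT, amount REAL)")

# Route each row to its partition by the partitioning rule (here, the year).
data = [("2023-05-01", 10.0), ("2024-01-15", 20.0), ("2024-06-30", 5.0)]
for sold_on, amount in data:
    cur.execute(f"INSERT INTO sales_{sold_on[:4]} VALUES (?, ?)", (sold_on, amount))

# A UNION ALL view reassembles the partitions for whole-table queries.
cur.execute("""CREATE VIEW sales AS
               SELECT * FROM sales_2023 UNION ALL SELECT * FROM sales_2024""")
print(cur.execute("SELECT COUNT(*) FROM sales_2024").fetchone()[0])  # 2: one year's slice
print(cur.execute("SELECT COUNT(*) FROM sales").fetchone()[0])       # 3: the full table
```

Queries that filter on the partition key can then touch only the relevant slice, which is the main reason to partition.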


Data Warehousing Concepts
This introductory and conceptual course will help you understand the fundamentals of data warehousing. You’ll gain a strong understanding of data warehousing basics through industry examples and real-world datasets. Some forecasts expect the global data warehousing market to exceed $50 billion by 2028. The industry has continued to evolve over the years and has been a critical component of the data revolution for many organizations. There has never been a better time to learn about data warehousing.

4 Modules | 6+ Hours | 4+ Skills

Course Modules 


Prepare for your data warehouse learning journey by grounding yourself in some foundational concepts. To begin this course, you’ll learn what a data warehouse is and how it compares and contrasts to similar-sounding technologies, data marts and data lakes. You’ll also learn how different personas help support the various stages of a data warehouse project.


  1. What is a data warehouse?
  2. Knowing the what and why
  3. Possible use cases for a data warehouse for Zynga
  4. What's the difference between data warehouses and data lakes?
  5. Data warehouses vs. data lakes
  6. Data warehouses vs. data marts
  7. Deciding between a data lake, warehouse, and mart
  8. Data warehouses support organizational analysis
  9. Data warehouse life cycle
  10. Support where needed
  11. Who does what?

Now, you’ll gain a better understanding of data warehouse architecture by learning the typical layers of a data warehouse and how the presentation layer supports analysts. Additionally, you’ll learn about Bill Inmon and his top-down approach and how it compares to Ralph Kimball and his bottom-up approach. Finally, you’ll understand the difference between OLAP and OLTP systems.


  1. What are the different layers of a data warehouse?
  2. Ordering data warehouse layers
  3. Understanding ETL
  4. Pick the correct layer
  5. The presentation layer
  6. Stepping into a consultant's shoes
  7. Supporting analysts and data scientist users
  8. Data warehouse architectures
  9. Top-down vs bottom-up
  10. Characteristics of top-down and bottom-up
  11. Choosing a top-down approach
  12. OLAP and OLTP systems
  13. The OLAP data cube
  14. OLAP vs. OLTP scenarios
  15. Understanding OLTP

Here, you’ll learn how to organize the data in your data warehouse with an excellent data model. First, you’ll cover the basics of data modeling by learning what a fact and a dimension table are and how you use them in the star and snowflake schemas. Then, you’ll review how to create a data model using Kimball's four-step process and how to deal with slowly changing dimensions.


  1. Data warehouse data modeling
  2. Understanding facts and dimensional tables
  3. One starry and snowy night
  4. Fact or dimension?
  5. Kimball's four-step process
  6. Ordering Kimball's steps
  7. Deciding on the grain
  8. Selecting reasonable facts
  9. Slowly changing dimensions
  10. Pop-quiz on slow changes
  11. Difference between type I, II, and III
  12. Row vs. column data store
  13. Categorizing row and column store scenarios
  14. Why is column store faster?
  15. Which queries are faster?
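The row-versus-column-store distinction comes down to access patterns. This toy sketch (invented data, plain Python lists standing in for storage layouts) shows why an analytical aggregate fits a column store: summing one column reads a single contiguous sequence instead of picking a field out of every row:

```python
# Row store: each record keeps all its fields together.
row_store = [(i, f"name{i}", i * 1.5) for i in range(1000)]   # (id, name, amount)

# Column store: each column is stored as its own sequence.
column_store = {
    "id":     [r[0] for r in row_store],
    "name":   [r[1] for r in row_store],
    "amount": [r[2] for r in row_store],
}

# Row store: visit every row, extract one field each time.
total_rows = sum(r[2] for r in row_store)
# Column store: the needed column is already one sequential list.
total_cols = sum(column_store["amount"])

print(total_rows == total_cols)  # True: same answer, very different access pattern
```

On disk the difference is dramatic: the column store reads only the `amount` bytes, while the row store must scan past every `id` and `name` too. OLTP workloads, which touch whole records, favor the row layout instead.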

You’ll wrap up the course by learning the pros and cons of ETL and ELT processes and on-premise versus an in-cloud implementation. You’ll conclude by walking through an example, making key decisions on warehouse design and implementation.


  1. ETL and ELT
  2. ETL compared to ELT
  3. Differences between ETL and ELT
  4. Selecting ELT
  5. Data cleaning
  6. Cleaning operations
  7. Finding truth in data transformations
  8. Understanding data governance
  9. On premise and cloud data warehouses
  10. Knowing the differences between on-premise and cloud
  11. Matching implementation to justification
  12. Data warehouse design example
  13. Connecting it all
  14. Selecting bottom-up
  15. Do you know it all?
  16. Wrap-up!


Introduction to Snowflake
Dive into Snowflake's universe! This course will take you from its foundational architecture to mastering advanced SQL techniques. In our data-driven era, data warehousing is crucial. Snowflake, a cloud-native platform, is redefining scalability and performance. You will dive deep into its significance and learn what differentiates it from competitors like Google BigQuery, Amazon Redshift, Databricks, and Postgres.


Snowflake Basics

You'll start by uncovering Snowflake's distinct architecture. Grasp fundamental database concepts, including DDL (Data Definition Language) and DML (Data Manipulation Language). Dive deeper into the importance of data types, their conversions, and the specifics of Snowflake's functionality.


Advanced Techniques

Once you have the basics, it's time to elevate your skills. You'll delve into joins, subqueries, and query optimization. Play with semi-structured data, focusing on `JSON`.


Seal Your Snowflake Expertise

By the end of this course, you'll have a strong Snowflake understanding, ready to handle data and conduct deep SQL analyses. Whether you're an analyst, data engineer, or a curious tech enthusiast, this course offers a comprehensive view of Snowflake's capabilities, preparing you for the ever-evolving data-driven landscape!

3 Modules | 4+ Hours | 3+ Skills

Course Modules 


In this module, you will learn about Snowflake, a cloud-based data warehouse that offers a unique architecture. We will discuss its key features, use cases, architecture, and how it compares to its competitors. You will also get started with Snowflake SQL, exploring its basic syntax and similarities with PostgreSQL.


  1. What is Snowflake?
  2. Traditional vs. cloud data warehouse
  3. Row versus column oriented database
  4. Snowflake use cases
  5. Introduction to Snowflake SQL
  6. Snowflake Architecture
  7. Decoupling Compute & Storage
  8. Snowflake Architecture Layers
  9. Virtual Warehouse
  10. Snowflake Competitors and why use Snowflake
  11. Data warehousing platforms
  12. Features: Snowflake & its competitors
  13. Snowflake SQL: Using SELECT and WHERE in Snowflake

In this module, you'll embark on a journey through Snowflake SQL. You'll start by discovering various methods to connect and interface with Snowflake. As you delve deeper, you'll grasp the significance of Snowflake Staging. Navigate the vast landscapes of Snowflake's databases using essential commands, and broaden your understanding of its data types, learning to convert them and drawing comparisons with Postgres. Conclude your exploration by mastering Snowflake's functions and honing data sorting and grouping techniques.


  1. Connecting to Snowflake and DDL commands
  2. Snowflake connections and DDL commands
  3. Snowflake Staging
  4. Snowflake database structures and DML
  5. Loading data
  6. DESCRIBE & SHOW
  7. Snowflake data type and data type conversion
  8. Data types
  9. Datatype conversion
  10. Functions, sorting, and grouping
  11. String functions
  12. Functions & Grouping
  13. DATE & TIME

In module 3, you'll advance your skills in Snowflake SQL. You'll begin by exploring diverse join methods and building complex queries with subqueries and CTEs. We'll emphasize query optimization, showing you ways to enhance the speed and efficiency of your SQL tasks. At the end, we'll delve into handling semi-structured data like JSON.


  1. Joining in Snowflake
  2. NATURAL JOIN
  3. The world of JOINS
  4. Subquerying and Common Table Expressions
  5. Subqueries
  6. Understanding CTE
  7. CTEs
  8. Snowflake Query Optimization
  9. Essentials of query optimization
  10. Early filtering
  11. Query history
  12. Handling semi-structured data
  13. PARSE_JSON & OBJECT_CONSTRUCT
  14. Querying JSON data
  15. JSONified
  16. Wrap-up!


Understanding Data Visualization
Visualizing data using charts, graphs, and maps is one of the most impactful ways to communicate complex data. In this course, you’ll learn how to choose the best visualization for your dataset, and how to interpret common plot types like histograms, scatter plots, line plots and bar plots. You'll also learn about best practices for using colors and shapes in your plots, and how to avoid common pitfalls. Through hands-on exercises, you'll visually explore over 20 datasets including global life expectancies, Los Angeles home prices, ESPN's 100 most famous athletes, and the greatest hip-hop songs of all time.

4 Modules | 5+ Hours | 4+ Skills

Course Modules 


In this module you’ll learn the value of visualizations, using real-world data on British monarchs, Australian salaries, Panamanian animals, and US cigarette consumption, to graphically represent the spread of a variable using histograms and box plots.


  1. A plot tells a thousand words
  2. Motivating visualization
  3. Continuous vs. categorical variables
  4. Histograms
  5. Interpreting histograms
  6. Adjusting bin width
  7. Box plots
  8. Interpreting box plots
  9. Ordering box plots
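Two ideas from this module can be computed by hand: how bin width drives a histogram's shape, and the five-number summary a box plot draws. A small sketch on an invented sample (no plotting library needed):

```python
import statistics

data = [1, 2, 2, 3, 4, 4, 4, 5, 7, 9]

def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's left edge."""
    counts = {}
    for v in values:
        left = (v // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

print(histogram(data, 2))   # narrow bins: more, finer bars
print(histogram(data, 5))   # wide bins: fewer, coarser bars

# A box plot summarizes the same data as five numbers.
q1, median, q3 = statistics.quantiles(data, n=4)
print(min(data), q1, median, q3, max(data))
```

Adjusting `bin_width` here mirrors the course exercise: too wide hides structure, too narrow shows noise.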

You’ll learn how to interpret data plots and understand core data visualization concepts such as correlation, linear relationships, and log scales. Through interactive exercises, you’ll also learn how to explore the relationship between two continuous variables using scatter plots and line plots. You'll explore data on life expectancies, technology adoption, COVID-19 coronavirus cases, and Swiss juvenile offenders. Next, you’ll be introduced to two other popular visualizations—bar plots and dot plots—often used to examine the relationship between categorical variables and continuous variables. Here, you'll explore famous athletes, health survey data, and the price of a Big Mac around the world.


  1. Scatter plots
  2. Interpreting scatter plots
  3. Trends with scatter plots
  4. Line plots
  5. Interpreting line plots
  6. Logarithmic scales for line plots
  7. Line plots without dates on the x-axis
  8. Bar plots
  9. Interpreting bar plots
  10. Interpreting stacked bar plots
  11. Dot plots
  12. Interpreting dot plots
  13. Sorting dot plots

It’s time to make your insights even more impactful. Discover how you can add color and shape to make your data visualizations clearer and easier to understand, especially when you find yourself working with more than two variables at the same time. You'll explore Los Angeles home prices, technology stock prices, math anxiety, the greatest hip-hop songs, scotch whisky preferences, and fatty acids in olive oil.


  1. Higher dimensions
  2. Another dimension for scatter plots
  3. Another dimension for line plots
  4. Using color
  5. Eye-catching colors
  6. Qualitative, sequential, diverging
  7. Highlighting data
  8. Plotting many variables at once
  9. Interpreting pair plots
  10. Interpreting correlation heatmaps
  11. Interpreting parallel coordinates plots

In this final module, you’ll learn how to identify and avoid the most common plot problems. For example, how can you avoid creating misleading or hard-to-interpret plots, and will your audience understand what it is you’re trying to tell them? All will be revealed! You'll explore wind directions, asthma incidence, and seats in the German Federal Council.


  1. Polar coordinates
  2. Pie plots
  3. Rose plots
  4. Axes of evil
  5. Bar plot axes
  6. Dual axes
  7. Sensory overload
  8. Chartjunk
  9. Multiple plots
  10. Wrap-up!


How to Install PostgreSQL on Windows
In this tutorial, you will learn how to install PostgreSQL on two different operating systems: Windows and Mac.

PostgreSQL is an open-source, lightweight relational database management system (RDBMS). It is widely popular among developers and has been well accepted by the industry. This tutorial shows you how to install a specific version of PostgreSQL on either Windows or Mac.



Data Engineer with Python Course


Advance your journey to becoming a Data Engineer with our Python-focused track, which is ideal for those with foundational SQL knowledge from our Associate Data Engineer track. This track dives deeper into the world of data engineering, emphasizing Python's role in automating and optimizing data processes. Starting with an understanding of cloud computing, you'll progress through Python programming from basics to advanced topics, including data manipulation, cleaning, and analysis. Engage in hands-on projects to apply what you've learned in real-world scenarios. You'll explore efficient coding practices, software engineering principles, and version control with Git, preparing you for professional data engineering challenges. An introduction to data pipelines and Airflow will equip you with the skills to design, schedule, and monitor complex data workflows!


Understanding Cloud Computing
Learn About Cloud Computing
Every day, we interact with the cloud—whether it’s using Google Drive, apps like Salesforce, or accessing our favorite websites. Cloud computing has become the norm for many companies, but what exactly is the cloud, and why is everyone rushing to adopt it?

Designed for complete novices, this cloud computing course breaks down what the cloud is and explains terminology such as scalability, latency, and high-availability.

Understand the Cloud Computing Basics
You’ll start by looking at the very basics of cloud computing, learning why it’s growing in popularity, and what makes it such a powerful option. You’ll explore the different service models businesses can choose from and how they're implemented in different situations.

As this is a no-code course, you can learn about cloud computing at a more conceptual level, exploring ideas of data protection, the various cloud providers, and how organizations can use cloud deployment.

Discover the Advantages of Cloud Computing
This course will demonstrate the many advantages of cloud computing, including ease of remote collaboration, freedom from hardware limitations, and reliable disaster recovery.

As you progress, you'll also discover the range of tools provided by major cloud providers and look at cloud computing examples from Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. By the end of this course, you'll be able to confidently explain how cloud tools can increase productivity and save money, as well as ask the right questions about how to optimize your use of cloud tools.

3 Modules | 4+ Hours | 3+ Skills

Course Modules 


In this module, you’ll learn why cloud computing is growing in popularity, how it compares to an on-premise solution, and what makes it so powerful. Next, you’ll learn about the three different service models—IaaS, PaaS, and SaaS—and how they each satisfy a unique set of business requirements.


  1. What is cloud computing?
  2. Understanding the cloud
  3. Cloud vs. on-premise
  4. Cloud computing services
  5. The power of the cloud
  6. Primary cloud services
  7. Key characteristics
  8. Cloud service models
  9. Outsourcing IT services
  10. IaaS, PaaS, or SaaS?
  11. Level of abstraction

Now that you understand the power of cloud computing, it’s time to discover how it’s implemented using one of three deployment methods—private, public, and hybrid. You'll then find out how data protection regulations can affect cloud infrastructure. Lastly, you’ll meet the important roles within an organization that can make your cloud deployment a reality.


  1. Cloud deployment models
  2. Private or public?
  3. Pick the best model
  4. Regulations on the cloud
  5. Time limits on storing data
  6. Personal data
  7. Cloud computing roles
  8. Microsoft cloud skills report
  9. Cloud roles
  10. In other tracks

In the final module, you’ll be introduced to the major cloud infrastructure players, including AWS, Microsoft Azure, and Google Cloud. You’ll become more familiar with their market positioning, the products they offer, their main customers, and how those customers use cloud computing.


  1. An overview of providers
  2. The big three
  3. The risk of vendor lock-in
  4. Amazon Web Services
  5. AWS or not AWS!
  6. NerdWallet
  7. Microsoft Azure
  8. Which service to pick?
  9. The Ottawa hospital
  10. Google Cloud
  11. Lush migration
  12. True or false?
  13. Cloud providers and their services
  14. Wrap-up!


Introduction to Python for Developers
What is Python and why use it?
Learn all about Python, a versatile and powerful language, perfect for software development. No prior experience required!

Learn the fundamentals
Perform calculations, store and manipulate information in variables using various data structures, and write comments describing your code to others.

Build your workflow
Use comparison operators in combination with for and while loops to execute code based on conditions being met, enabling a fully customizable workflow.

3 Modules | 4+ Hours | 3+ Skills

Course Modules 


Discover the wonders of Python - why it is popular and how to use it. No prior knowledge required!


  1. What is Python?
  2. The benefits of Python
  3. Use-cases for Python
  4. How to run Python code
  5. Working with Python files
  6. Python as a calculator
  7. Advanced calculations
  8. Variables and data types
  9. Naming conventions
  10. Checking data types
  11. Working with variables
  12. Checking and updating conditions

Learn how and when to use Python's built-in data structures, including lists, dictionaries, sets, and tuples!


  1. Working with strings
  2. Multi-line strings
  3. Modifying string variables
  4. Lists
  5. Building a party playlist
  6. Subsetting lists
  7. Dictionaries
  8. Building a playlist dictionary
  9. Working with dictionaries
  10. Sets and tuples
  11. Last quarter's revenue
  12. DJ Sets
  13. Choosing a data structure

Conditional statements and operators, for and while loops all combine to enable customized workflows for your needs!


  1. Conditional statements and operators
  2. Conditional statements
  3. Checking inflation
  4. On the rental market
  5. For loops
  6. Looping through a list
  7. Updating a variable with for loops
  8. Conditional looping with a dictionary
  9. While loops
  10. Breaking a while loop
  11. Converting to a while loop
  12. Conditional while loops
  13. Building a workflow
  14. Appending to a list
  15. Book genre popularity
  16. Working with keywords
  17. Recap!


Intermediate Python for Developers
Elevate your Python skills to the next level
This course will delve deeper into Python's rich ecosystem, focusing on essential aspects such as built-in functions, modules, and packages. You'll learn how to harness the power of Python's built-in functions effectively, enabling you to streamline your code. The course will introduce you to the power of Python's modules, empowering you to develop quicker by reusing existing code rather than writing your own from scratch every time! You'll see how people have extended modules to create their own open-source software, known as packages, discovering how to download, import, and work with packages in your programs.

Master custom functions
You'll learn best practices for defining functions, including comprehensive knowledge of how to write user-friendly docstrings to ensure clarity and maintainability. You'll dive into advanced concepts such as default arguments, enabling you to create versatile functions with predefined values. The course will equip you with the knowledge and skills to handle arbitrary positional and keyword arguments effectively, enhancing the flexibility and usability of your functions. By understanding how to work with these arguments, you'll be able to create more robust and adaptable solutions to various programming challenges.

Debug your code and use error handling techniques
You'll learn to interpret error messages, including tracebacks from incorrectly using functions from packages. You'll use keywords and techniques to adapt your custom functions, effectively handling errors and providing bespoke feedback messages to developers who misuse your code!

3 Modules | 3+ Hours | 3+ Skills

Course Modules 


Discover Python's rich ecosystem of built-in functions and modules, plus how to download and work with packages.


  1. Built-in functions
  2. Get some assistance
  3. Counting the elements
  4. Performing calculations
  5. Modules
  6. What is a module?
  7. Working with the string module
  8. Importing from a module
  9. Packages
  10. Package or module?
  11. Working with pandas
  12. Performing calculations with pandas
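The module's built-in-function and string-module exercises look roughly like this minimal sketch (the text and variable names are invented; the `string` module and `len` are part of the standard library):

```python
import string
from string import punctuation   # importing a single name from a module

text = "Hello, world!"

# string.punctuation is a ready-made constant of punctuation characters.
cleaned = "".join(ch for ch in text if ch not in punctuation)

# Built-in functions work on the result directly.
print(len(cleaned), cleaned)  # 11 Hello world
```

The same import pattern (`from module import name`) is what later lets you pull one function out of a large package instead of importing all of it.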

Learn the fundamentals of functions, from Python's built-in functions to creating your own from scratch!


  1. Defining a custom function
  2. Custom function syntax
  3. Cleaning text data
  4. Building a password checker
  5. Default and keyword arguments
  6. Positional versus keyword arguments
  7. Adding a keyword argument
  8. Data structure converter function
  9. Docstrings
  10. Single-line docstrings
  11. Multi-line docstrings
  12. Arbitrary arguments
  13. Adding arbitrary arguments
  14. Arbitrary keyword arguments
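The module's pieces fit together in one small sketch: a docstring, a default keyword argument, arbitrary positional arguments (`*args`), and arbitrary keyword arguments (`**kwargs`). The function name and data are made up for illustration:

```python
def average(*scores, precision=2, **metadata):
    """Return the mean of any number of scores, rounded to `precision`.

    Extra keyword arguments are collected into `metadata` and returned
    alongside the result, untouched.
    """
    mean = round(sum(scores) / len(scores), precision)
    return mean, metadata

# Any number of positional scores; precision overrides its default;
# subject is swept into **metadata.
result, meta = average(80, 90, 99, precision=1, subject="math")
print(result)  # 89.7
print(meta)    # {'subject': 'math'}
```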

Build lambda functions on the fly, and discover how to error-proof your code!


  1. Lambda functions
  2. Adding tax
  3. Calling lambda in-line
  4. Lambda functions with iterables
  5. Introduction to errors
  6. Debugging code
  7. Module and package tracebacks
  8. Fixing an issue
  9. Error handling
  10. Avoiding errors
  11. Returning errors
  12. Recap!
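The lambda and error-handling ideas above can be sketched together; the function names and tax rate are invented for illustration:

```python
# A lambda with a default argument, assigned to a name...
add_tax = lambda amount, rate=0.2: round(amount * (1 + rate), 2)
print(add_tax(10.0))  # 12.0

# ...or called in-line over an iterable with map().
print(list(map(lambda x: x ** 2, [1, 2, 3])))  # [1, 4, 9]

def safe_divide(a, b):
    """Return a / b, translating the raw error into bespoke feedback."""
    try:
        return a / b
    except ZeroDivisionError:
        raise ValueError("b must be non-zero") from None

try:
    safe_divide(1, 0)
except ValueError as e:
    print(e)  # b must be non-zero
```

Re-raising a friendlier exception is the "bespoke feedback" pattern: developers misusing your function see your message, not an internal traceback.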


Introduction to Importing Data in Python
As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.

3 Modules | 4+ Hours | 3+ Skills

Course Modules 


In this module, you'll learn how to import data into Python from all types of flat files, which are a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas; here you'll use these packages to import flat files and customize your imports.


  1. Welcome to the course!
  2. Importing entire text files
  3. Importing text files line by line
  4. The importance of flat files in data science
  5. Pop quiz: what exactly are flat files?
  6. Why we like flat files and the Zen of Python
  7. Importing flat files using NumPy
  8. Using NumPy to import flat files
  9. Customizing your NumPy import
  10. Importing different datatypes
  11. Importing flat files using pandas
  12. Using pandas to import flat files as DataFrames (1)
  13. Using pandas to import flat files as DataFrames (2)
  14. Customizing your pandas import
  15. Final thoughts on data import
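The course does this with NumPy and pandas; the stdlib-only sketch below shows the same core moves — reading a flat file, customizing the delimiter, and handling the header row — on an invented in-memory "file":

```python
import csv
import io

flat = "name;height\nAda;1.65\nAlan;1.78\n"   # stand-in for a flat file on disk

# Customize the import: this file is semicolon-delimited, not comma-delimited.
reader = csv.reader(io.StringIO(flat), delimiter=";")

header = next(reader)                          # consume (but keep) the header row
rows = [(name, float(height)) for name, height in reader]  # cast the numeric column

print(header)  # ['name', 'height']
print(rows)    # [('Ada', 1.65), ('Alan', 1.78)]
```

In the course itself, the equivalent knobs are `np.loadtxt(..., delimiter=..., skiprows=...)` and `pd.read_csv(..., sep=...)`.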

You've learned how to import flat files, but there are many other file types you will potentially have to work with as a data scientist. In this module, you'll learn how to import data into Python from a wide array of important file types. These include pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files, a file type for storing large quantities of numerical data, and MATLAB files.


  1. Introduction to other file types
  2. Not so flat any more
  3. Loading a pickled file
  4. Listing sheets in Excel files
  5. Importing sheets from Excel files
  6. Customizing your spreadsheet import
  7. Importing SAS/Stata files using pandas
  8. How to import SAS7BDAT
  9. Importing SAS files
  10. Using read_stata to import Stata files
  11. Importing Stata files
  12. Importing HDF5 files
  13. Using File to import HDF5 files
  14. Using h5py to import HDF5 files
  15. Extracting data from your HDF5 file
  16. Importing MATLAB files
  17. Loading .mat files
  18. The structure of .mat in Python
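Of all the formats in this module, pickled files are the one you can demo with the standard library alone (Excel, SAS, Stata, HDF5, and MATLAB files all need third-party readers). A pickled file simply holds a serialized Python object; the data here is invented:

```python
import pickle

data = {"June": 50.9, "Aug": 85.0}

blob = pickle.dumps(data)        # the bytes a .pkl file would contain
restored = pickle.loads(blob)    # loading reconstructs the object intact

print(restored == data)  # True
print(type(restored))    # <class 'dict'>
```

The same round trip works through `pickle.dump`/`pickle.load` with a file opened in binary mode (`'wb'`/`'rb'`), which is how the course's exercise loads a pickled file from disk.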

In this module, you'll learn how to extract meaningful data from relational databases, an essential skill for any data scientist. You will learn about relational models, how to create SQL queries, how to filter and order your SQL records, and how to perform advanced queries by joining database tables.


  1. Introduction to relational databases
  2. Pop quiz: The relational model
  3. Creating a database engine in Python
  4. Creating a database engine
  5. What are the tables in the database?
  6. Querying relational databases in Python
  7. The Hello World of SQL Queries!
  8. Customizing the Hello World of SQL Queries
  9. Filtering your database records using SQL's WHERE
  10. Ordering your SQL records with ORDER BY
  11. Querying relational databases directly with pandas
  12. Pandas and The Hello World of SQL Queries!
  13. Pandas for more complex querying
  14. Advanced querying: exploiting table relationships
  15. The power of SQL lies in relationships between tables: INNER JOIN
  16. Filtering your INNER JOIN
  17. Final Thoughts
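The course builds a SQLAlchemy engine for this; the sketch below uses the standard library's sqlite3 instead so it runs anywhere, with an invented two-table schema to show the "Hello World" query and an INNER JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE album (title TEXT, artist_id INTEGER)")
conn.execute("INSERT INTO artist VALUES (1, 'AC/DC')")
conn.execute("INSERT INTO album VALUES ('Back in Black', 1)")

# The power of SQL lies in relationships between tables: INNER JOIN.
rows = conn.execute("""SELECT album.title, artist.name
                       FROM album
                       INNER JOIN artist ON album.artist_id = artist.id""").fetchall()
print(rows)  # [('Back in Black', 'AC/DC')]
```

With pandas, the equivalent one-liner is `pd.read_sql_query(query, engine)`, which returns the result directly as a DataFrame.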


Intermediate Importing Data in Python

As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In the prequel to this course, you learned many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL. In this course, you'll extend this knowledge base by learning to import data from the web and by pulling data from Application Programming Interfaces— APIs—such as the Twitter streaming API, which allows us to stream real-time tweets.

3 Modules | 4+ Hours | 3+ Skills

Course Modules 


The web is a rich source of data from which you can extract various types of insights and findings. In this module, you will learn how to get data from the web, whether it is stored in files or in HTML. You'll also learn the basics of scraping and parsing web data.


  1. Importing flat files from the web
  2. Importing flat files from the web: your turn!
  3. Opening and reading flat files from the web
  4. Importing non-flat files from the web
  5. HTTP requests to import files from the web
  6. Performing HTTP requests in Python using urllib
  7. Printing HTTP request results in Python using urllib
  8. Performing HTTP requests in Python using requests
  9. Scraping the web in Python
  10. Parsing HTML with BeautifulSoup
  11. Turning a webpage into data using BeautifulSoup: getting the text
  12. Turning a webpage into data using BeautifulSoup: getting the hyperlinks
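The module does its parsing with BeautifulSoup; the stdlib `html.parser` sketch below shows the same step — turning a webpage into data by extracting its hyperlinks — on an invented inline page, so no install or network access is needed:

```python
from html.parser import HTMLParser

page = '<html><body><h1>Title</h1><a href="https://example.com">link</a></body></html>'

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

parser = LinkCollector()
parser.feed(page)
print(parser.links)  # ['https://example.com']
```

The BeautifulSoup equivalent is `[a.get('href') for a in soup.find_all('a')]`, which is what the course's hyperlink exercise builds.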

In this module, you will gain a deeper understanding of how to import data from the web. You will learn the basics of extracting data from APIs, gain insight on the importance of APIs, and practice extracting data by diving into the OMDB and Library of Congress APIs.


  1. Introduction to APIs and JSONs
  2. Pop quiz: What exactly is a JSON?
  3. Loading and exploring a JSON
  4. Pop quiz: Exploring your JSON
  5. APIs and interacting with the world wide web
  6. Pop quiz: What's an API?
  7. API requests
  8. JSON–from the web to Python
  9. Checking out the Wikipedia API
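As a taste of working with JSONs, here is a minimal sketch of loading and exploring one. The payload below is made up for illustration; in the module you would receive it as the text of an API response (for example, from the OMDb API):

```python
import json

# A JSON payload like the ones returned by movie APIs (contents are illustrative)
response_text = (
    '{"Title": "The Social Network", "Year": "2010", '
    '"Ratings": [{"Source": "IMDB", "Value": "7.8/10"}]}'
)

# json.loads turns a JSON string into a Python dict
movie = json.loads(response_text)

# Explore the JSON key by key
for key, value in movie.items():
    print(key + ":", value)

# Nested structures become nested dicts and lists
print(movie["Ratings"][0]["Value"])
```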

In this module, you will consolidate your knowledge of interacting with APIs in a deep dive into the Twitter streaming API. You'll learn how to stream real-time Twitter data, and how to analyze and visualize it.


  1. The Twitter API and Authentication
  2. Streaming tweets
  3. Load and explore your Twitter data
  4. Twitter data to DataFrame
  5. A little bit of Twitter text analysis
  6. Plotting your Twitter data
  7. Final Thoughts


Data Cleaning in Python
Discover How to Clean Data in Python
It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. Data cleaning is an essential step for every data scientist, as analyzing dirty data can lead to inaccurate conclusions.

In this course, you will learn how to identify, diagnose, and treat various data cleaning problems in Python, ranging from simple to advanced. You will deal with improper data types, check that your data is in the correct range, handle missing data, perform record linkage, and more!

Learn How to Clean Different Data Types
The first module of the course explores common data problems and how you can fix them. You will first understand basic data types and how to deal with them individually. After, you'll apply range constraints and remove duplicated data points.

The last module explores record linkage, a powerful tool to merge multiple datasets. You'll learn how to link records by calculating the similarity between strings. Finally, you'll use your new skills to join two restaurant review datasets into one clean master dataset.

Gain Confidence in Cleaning Data
By the end of the course, you will gain the confidence to clean data from various types and use record linkage to merge multiple datasets. Cleaning data is an essential skill for data scientists. If you want to learn more about cleaning data in Python and its applications, check out the following tracks: Data Scientist with Python and Importing & Cleaning Data with Python.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


In this module, you'll learn how to overcome some of the most common dirty data problems. You'll convert data types, apply range constraints to remove future data points, and remove duplicated data points to avoid double-counting.


  1. Data type constraints
  2. Common data types
  3. Numeric data or ... ?
  4. Summing strings and concatenating numbers
  5. Data range constraints
  6. Tire size constraints
  7. Back to the future
  8. Uniqueness constraints
  9. How big is your subset?
  10. Finding duplicates
  11. Treating duplicates
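The three fixes above can be sketched in a few lines of pandas. The ride-sharing data below is hypothetical, and the tire-size cutoff of 29 is an illustrative range constraint:

```python
import pandas as pd

# Hypothetical ride data with a string-typed column, an out-of-range
# value, and a duplicated record
rides = pd.DataFrame({
    "ride_id": [1, 2, 2, 3],
    "duration": ["12", "7", "7", "31"],   # stored as strings
    "tire_size": [26, 26, 26, 49],        # 49 is outside the valid range
})

rides["duration"] = rides["duration"].astype(int)    # data type constraint
rides = rides[rides["tire_size"] <= 29]              # range constraint
rides = rides.drop_duplicates(subset="ride_id")      # uniqueness constraint
print(rides)
```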

Categorical and text data can often be some of the messiest parts of a dataset due to their unstructured nature. In this module, you’ll learn how to fix whitespace and capitalization inconsistencies in category labels, collapse multiple categories into one, and reformat strings for consistency.


  1. Membership constraints
  2. Members only
  3. Finding consistency
  4. Categorical variables
  5. Categories of errors
  6. Inconsistent categories
  7. Remapping categories
  8. Cleaning text data
  9. Removing titles and taking names
  10. Keeping it descriptive
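A minimal sketch of the category fixes this module covers, using made-up survey responses: strip whitespace, normalize capitalization, then collapse alternate spellings into one category with a remapping dictionary:

```python
import pandas as pd

# Hypothetical survey data with inconsistent category labels
df = pd.DataFrame({
    "marriage_status": [" married", "MARRIED", "single ", "div", "divorced"]
})

# Fix whitespace and capitalization inconsistencies
df["marriage_status"] = df["marriage_status"].str.strip().str.lower()

# Collapse multiple spellings into a single category
df["marriage_status"] = df["marriage_status"].replace({"div": "divorced"})

print(df["marriage_status"].unique())
```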

In this module, you’ll dive into more advanced data cleaning problems, such as ensuring that weights are all written in kilograms instead of pounds. You’ll also gain invaluable skills that will help you verify that values have been added correctly and that missing values don’t negatively impact your analyses.


  1. Uniformity
  2. Ambiguous dates
  3. Uniform currencies
  4. Uniform dates
  5. Cross field validation
  6. Cross field or no cross field?
  7. How's our data integrity?
  8. Completeness
  9. Is this missing at random?
  10. Missing investors
  11. Follow the money
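Cross field validation and completeness checks can be sketched as below. The flight data is hypothetical: the sum of the class columns should equal the total column, and rows with missing values get flagged before any treatment is chosen:

```python
import pandas as pd

# Hypothetical flights: economy + business + first should equal total
flights = pd.DataFrame({
    "economy": [100, 90, 80],
    "business": [30, 20, None],   # one missing value
    "first": [10, 10, 5],
    "total": [140, 125, 95],
})

# Cross field validation: do the parts add up to the whole?
class_sum = flights[["economy", "business", "first"]].sum(axis=1)
flights["consistent"] = class_sum == flights["total"]

# Completeness: flag rows with missing values for later treatment
flights["has_missing"] = flights[["economy", "business", "first"]].isna().any(axis=1)

print(flights[["consistent", "has_missing"]])
```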

Record linkage is a powerful technique for merging multiple datasets when values have typos or different spellings. In this module, you'll learn how to link records by calculating the similarity between strings, and you'll then use your new skills to join two restaurant review datasets into one clean master dataset.


  1. Comparing strings
  2. Minimum edit distance
  3. The cutoff point
  4. Remapping categories II
  5. Generating pairs
  6. To link or not to link?
  7. Pairs of restaurants
  8. Similar restaurants
  9. Linking DataFrames
  10. Getting the right index
  11. Linking them together!
  12. Wrap-up!
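The core idea of record linkage is comparing strings and linking pairs above a cutoff. A minimal sketch using the standard library's difflib (the course works with dedicated string-matching packages; the restaurant names and the 0.8 cutoff here are illustrative):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a similarity ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Pairs of restaurant listings that an exact join would treat as different
pairs = [
    ("Burger Kng", "Burger King"),    # typo
    ("Cafe Flore", "Sushi Palace"),   # genuinely different
]

for a, b in pairs:
    score = similarity(a, b)
    # Link the records only above a chosen cutoff (0.8 is arbitrary here)
    verdict = "link" if score >= 0.8 else "skip"
    print(a, "<->", b, round(score, 2), verdict)
```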


Writing Efficient Python Code
As a Data Scientist, the majority of your time should be spent gleaning actionable insights from data -- not waiting for your code to finish running. Writing efficient Python code can help reduce runtime and save computational resources, ultimately freeing you up to do the things you love as a Data Scientist. In this course, you'll learn how to use Python's built-in data structures, functions, and modules to write cleaner, faster, and more efficient code. We'll explore how to time and profile code in order to find bottlenecks. Then, you'll practice eliminating these bottlenecks, and other bad design patterns, using Python's Standard Library, NumPy, and pandas. After completing this course, you'll have the necessary tools to start writing efficient Python code!

4 Modules | 5+ Hours | 4 Skills

Course Modules 


In this module, you'll learn what it means to write efficient Python code. You'll explore Python's Standard Library, learn about NumPy arrays, and practice using some of Python's built-in tools. This module builds a foundation for the concepts covered ahead.


  1. Welcome!
  2. Pop quiz: what is efficient
  3. A taste of things to come
  4. Zen of Python
  5. Building with built-ins
  6. Built-in practice: range()
  7. Built-in practice: enumerate()
  8. Built-in practice: map()
  9. The power of NumPy arrays
  10. Practice with NumPy arrays
  11. Bringing it all together: Festivus!
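A quick sketch of the built-ins and NumPy tools this module introduces. The Pokémon names and stats are made-up sample data:

```python
import numpy as np

names = ["bulbasaur", "charmander", "squirtle"]

# enumerate() and map() replace manual index bookkeeping and loops
indexed = list(enumerate(names, start=1))
upper = list(map(str.upper, names))

# NumPy arrays support vectorized arithmetic: no explicit loop needed
hps = np.array([45, 39, 44])
boosted = hps * 2   # applied element-wise to the whole array

print(indexed)
print(upper)
print(boosted)
```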

In this module, you will learn how to gather and compare runtimes between different coding approaches. You'll practice using the line_profiler and memory_profiler packages to profile your code base and spot bottlenecks. Then, you'll put your learnings to practice by replacing these bottlenecks with efficient Python code.


  1. Examining runtime
  2. Using %timeit: your turn!
  3. Using %timeit: specifying number of runs and loops
  4. Using %timeit: formal name or literal syntax
  5. Using cell magic mode (%%timeit)
  6. Code profiling for runtime
  7. Pop quiz: steps for using %lprun
  8. Using %lprun: spot bottlenecks
  9. Using %lprun: fix the bottleneck
  10. Code profiling for memory usage
  11. Pop quiz: steps for using %mprun
  12. Using %mprun: Hero BMI
  13. Using %mprun: Hero BMI 2.0
  14. Bringing it all together: Star Wars profiling
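%timeit and %lprun are IPython magics, but the same runtime comparison can be sketched with the standard library's timeit module. The statements compared here (a loop versus a list comprehension) are illustrative:

```python
import timeit

# Building a list with an explicit loop...
loop_stmt = """
nums = []
for i in range(1000):
    nums.append(i * 2)
"""

# ...versus a list comprehension
comp_stmt = "nums = [i * 2 for i in range(1000)]"

# Each statement is executed 1000 times and the total runtime is returned
loop_time = timeit.timeit(loop_stmt, number=1000)
comp_time = timeit.timeit(comp_stmt, number=1000)

print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```

On most runs the comprehension comes out faster, which is the kind of bottleneck-spotting this module practices.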

This module covers more complex efficiency tips and tricks. You'll learn a few useful built-in modules for writing efficient code and practice using set theory. You'll then learn about looping patterns in Python and how to make them more efficient.


  1. Efficiently combining, counting, and iterating
  2. Combining Pokémon names and types
  3. Counting Pokémon from a sample
  4. Combinations of Pokémon
  5. Set theory
  6. Comparing Pokédexes
  7. Searching for Pokémon
  8. Gathering unique Pokémon
  9. Eliminating loops
  10. Gathering Pokémon without a loop
  11. Pokémon totals and averages without a loop
  12. Writing better loops
  13. One-time calculation loop
  14. Holistic conversion loop
  15. Bringing it all together: Pokémon z-scores
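The set-theory portion of this module can be sketched as follows; the Pokédex contents are made-up sample data. Sets give O(1) membership tests and one-line comparisons that would otherwise need nested loops:

```python
# Hypothetical Pokédexes owned by two players
ash_pokedex = {"pikachu", "bulbasaur", "koffing"}
misty_pokedex = {"pikachu", "psyduck", "squirtle"}

both = ash_pokedex & misty_pokedex        # intersection: shared Pokémon
ash_only = ash_pokedex - misty_pokedex    # difference: unique to Ash
either = ash_pokedex | misty_pokedex      # union: all unique Pokémon

# Membership testing on a set is O(1), vs scanning a list in O(n)
print("pikachu" in both)
print(sorted(ash_only))
print(len(either))
```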

This module offers a brief introduction on how to efficiently work with pandas DataFrames. You'll learn the various options you have for iterating over a DataFrame. Then, you'll learn how to efficiently apply functions to data stored in a DataFrame.


  1. Intro to pandas DataFrame iteration
  2. Iterating with .iterrows()
  3. Run differentials with .iterrows()
  4. Another iterator method: .itertuples()
  5. Iterating with .itertuples()
  6. Run differentials with .itertuples()
  7. pandas alternative to looping
  8. Analyzing baseball stats with .apply()
  9. Settle a debate with .apply()
  10. Optimal pandas iterating
  11. Replacing .iloc with underlying arrays
  12. Bringing it all together: Predict win percentage
  13. Wrap up!
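The iteration options this module compares can be sketched side by side. The baseball run data is hypothetical; the point is that .iterrows(), .itertuples(), and vectorized arithmetic on the underlying arrays all produce the same run differentials, at very different speeds:

```python
import pandas as pd

# Hypothetical team stats: runs scored (rs) and runs allowed (ra)
df = pd.DataFrame({"rs": [891, 705], "ra": [750, 805]})

# Slowest: row-by-row with .iterrows(), which yields (index, Series) pairs
diffs_iterrows = [row["rs"] - row["ra"] for _, row in df.iterrows()]

# Faster: .itertuples() yields lightweight namedtuples
diffs_itertuples = [row.rs - row.ra for row in df.itertuples()]

# Fastest here: vectorized arithmetic on the underlying NumPy arrays
df["run_diff"] = df["rs"].to_numpy() - df["ra"].to_numpy()

print(df)
```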


Streamlined Data Ingestion with pandas
Before you can analyze data, you first have to acquire it. This course teaches you how to build pipelines to import data kept in common storage formats. You’ll use pandas, a major Python library for analytics, to get data from a variety of sources, from spreadsheets of survey responses, to a database of public service requests, to an API for a popular review site. Along the way, you’ll learn how to fine-tune imports to get only what you need and to address issues like incorrect data types. Finally, you’ll assemble a custom dataset from a mix of sources.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Practice using pandas to get just the data you want from flat files, learn how to wrangle data types and handle errors, and look into some U.S. tax data along the way.


  1. Introduction to flat files
  2. Get data from CSVs
  3. Get data from other flat files
  4. Modifying flat file imports
  5. Import a subset of columns
  6. Import a file in chunks
  7. Handling errors and missing data
  8. Specify data types
  9. Set custom NA values
  10. Skip bad data
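Several of the read_csv fine-tuning options above can be sketched in one call. An inline string stands in for a flat file on disk, and the column names are made up:

```python
import io
import pandas as pd

# Inline CSV standing in for a flat file (note the leading zero in Bob's zipcode)
csv_data = """name,zipcode,income
Alice,12345,50000
Bob,00501,NA
Carol,99999,62000
"""

df = pd.read_csv(
    io.StringIO(csv_data),
    usecols=["name", "zipcode", "income"],  # import a subset of columns
    dtype={"zipcode": str},                 # keep zip codes as strings (leading zeros)
    na_values=["NA"],                       # treat a custom marker as missing
)

print(df.dtypes)
print(df)
```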

Automate data imports from that staple of office life, Excel files. Import part or all of a workbook and ensure boolean and datetime data are properly loaded, all while learning about how other people are learning to code.


  1. Introduction to spreadsheets
  2. Get data from a spreadsheet
  3. Load a portion of a spreadsheet
  4. Getting data from multiple worksheets
  5. Select a single sheet
  6. Select multiple sheets
  7. Work with multiple spreadsheets
  8. Modifying imports: true/false data
  9. Set Boolean columns
  10. Set custom true/false values
  11. Modifying imports: parsing dates
  12. Parse simple dates
  13. Get datetimes from multiple columns
  14. Parse non-standard date formats

Combine pandas with the powers of SQL to find out just how many problems New Yorkers have with their housing. This module features introductory SQL topics like WHERE clauses, aggregate functions, and basic joins.


  1. Introduction to databases
  2. Connect to a database
  3. Load entire tables
  4. Refining imports with SQL queries
  5. Selecting columns with SQL
  6. Selecting rows
  7. Filtering on multiple conditions
  8. More complex SQL queries
  9. Getting distinct values
  10. Counting in groups
  11. Working with aggregate functions
  12. Loading multiple tables with joins
  13. Joining tables
  14. Joining and filtering
  15. Joining, filtering, and aggregating
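Combining pandas with SQL looks like the sketch below. An in-memory SQLite database with made-up complaint records stands in for the course's housing data, and pandas runs the query and returns a DataFrame:

```python
import sqlite3
import pandas as pd

# Build a small in-memory database (contents are illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE complaints (borough TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO complaints VALUES (?, ?)",
    [("Brooklyn", "HEAT"), ("Brooklyn", "NOISE"), ("Queens", "HEAT")],
)

# A query with an aggregate function and grouping, run through pandas
query = """
SELECT borough, COUNT(*) AS n
FROM complaints
GROUP BY borough
"""
counts = pd.read_sql(query, conn)
print(counts)
```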

Learn how to work with JSON data and web APIs by exploring a public dataset and getting cafe recommendations from Yelp. End by learning some techniques to combine datasets once they have been loaded into data frames.


  1. Introduction to JSON
  2. Load JSON data
  3. Work with JSON orientations
  4. Introduction to APIs
  5. Get data from an API
  6. Set API parameters
  7. Set request headers
  8. Working with nested JSONs
  9. Flatten nested JSONs
  10. Handle deeply nested data
  11. Combining multiple datasets
  12. Concatenate dataframes
  13. Merge dataframes
  14. Wrap-up!
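Flattening nested JSON and combining the result with another dataset can be sketched as below. The cafe records imitate the shape of an API response; the names and zip codes are made up:

```python
import pandas as pd

# Nested records like those returned by a review-site API (made-up data)
cafes = [
    {"name": "Cafe A", "rating": 4.5, "location": {"city": "NYC", "zip": "10003"}},
    {"name": "Cafe B", "rating": 4.0, "location": {"city": "NYC", "zip": "11201"}},
]

# Flatten the nested "location" dict into its own columns
flat = pd.json_normalize(cafes, sep="_")

# Combine with a second dataset via merge
zips = pd.DataFrame({
    "location_zip": ["10003", "11201"],
    "borough": ["Manhattan", "Brooklyn"],
})
combined = flat.merge(zips, on="location_zip")

print(combined)
```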


Learn Git
This course introduces learners to version control using Git. You will discover the importance of version control when working on data science projects and explore how you can use Git to track files, compare differences, modify and save files, undo changes, and enable collaborative development through the use of branches. You will gain an introduction to the structure of a repository, learn how to create new repositories and clone existing ones, and see how Git stores data. By working through typical data science tasks, you will gain the skills to handle conflicting files!


4 Modules | 5+ Hours | 4 Skills

Course Modules

In the first module, you’ll learn what version control is and why it is essential for data projects. Then, you’ll discover what Git is and how to use it for a version control workflow.


  1. Introduction to version control with Git
  2. Using the shell
  3. Checking the version of Git
  4. Saving files
  5. Where does Git store information?
  6. The Git workflow
  7. Adding a file
  8. Adding multiple files
  9. Comparing files
  10. What has changed?
  11. What is going to be committed?
  12. What's in the staging area?
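The add-then-commit workflow above can be sketched in a few shell commands, assuming Git is installed. The directory, file name, and identity settings are illustrative:

```shell
# A minimal Git workflow in a throwaway directory
set -e
repo_dir=$(mktemp -d)
cd "$repo_dir"

git init -q                              # create the repository; .git stores history
git config user.email "you@example.com"  # identity for commits (illustrative)
git config user.name "Your Name"

echo "draft analysis" > report.md
git status --short                       # shows report.md as untracked
git add report.md                        # move the file into the staging area
git diff --staged --stat                 # what is going to be committed?
git commit -q -m "Add first draft of report"
git log --oneline                        # the history now holds one commit
```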

Next, you’ll examine how Git stores data, learn essential commands to compare files and repositories at different times, and understand the process for restoring earlier versions of files in your data projects.


  1. Storing data with Git
  2. Interpreting the commit structure
  3. Viewing a repository's history
  4. Viewing a specific commit
  5. Viewing changes
  6. Comparing to the second most recent commit
  7. Comparing commits
  8. Who changed what?
  9. Undoing changes before committing
  10. How to unstage a file
  11. Undoing changes to unstaged files
  12. Undoing all changes
  13. Restoring and reverting
  14. Restoring an old version of a repo
  15. Deleting untracked files
  16. Restoring an old version of a file

In this module, you'll learn tips and tricks for configuring Git to make you more efficient! You'll also discover branches, identify how to create and switch to different branches, compare versions of files between branches, merge branches together, and deal with conflicting files across branches.


  1. Configuring Git
  2. Modifying your email address in Git
  3. Creating an alias
  4. Ignoring files
  5. Branches
  6. Branching and merging
  7. Creating new branches
  8. Checking the number of branches
  9. Comparing branches
  10. Working with branches
  11. Switching branches
  12. Merging two branches
  13. Handling conflict
  14. Recognizing conflict syntax
  15. Resolving a conflict

This final module is all about collaboration! You'll gain an introduction to remote repositories and learn how to work with them to synchronize content between the cloud and your local computer. You'll also see how to create new repositories and clone existing ones, along with discovering a workflow to minimize the risk of conflicts between local and remote repositories.


  1. Creating repos
  2. Setting up a new repo
  3. Converting an existing project
  4. Working with remotes
  5. Cloning a repo
  6. Defining and identifying remotes
  7. Gathering from a remote
  8. Fetching from a remote
  9. Pulling from a remote
  10. Pushing to a remote
  11. Pushing to a remote repo
  12. Handling push conflicts
  13. Wrap up!
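The remote workflow in this module can be sketched locally: a bare repository stands in for a cloud remote such as GitHub, so clone and push behave the same way without a network. All paths and names are illustrative:

```shell
set -e
work=$(mktemp -d)

# A local bare repository stands in for a remote on the cloud
git init -q --bare "$work/remote.git"

# Clone it, commit locally, then synchronize with push
git clone -q "$work/remote.git" "$work/local"
cd "$work/local"
git config user.email "you@example.com"
git config user.name "Your Name"

echo "results" > results.txt
git add results.txt
git commit -q -m "Add results"
git push -q origin HEAD       # upload the local commit to the remote

# The remote now advertises one branch containing the commit
git ls-remote --heads origin
```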


Software Engineering Principles in Python
Data scientists can experience huge benefits by learning concepts from the field of software engineering, allowing them to more easily reuse their code and share it with collaborators. In this course, you'll learn all about the important ideas of modularity, documentation, and automated testing, and you'll see how they can help you solve data science problems more quickly and in a way that will make future you happy. You'll even get to use your acquired software engineering chops to write your very own Python package for performing text analytics.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Why should you as a Data Scientist care about Software Engineering concepts? Here we'll cover specific Software Engineering concepts and how these important ideas can revolutionize your Data Science workflow!


  1. Python, data science, & software engineering
  2. The big ideas
  3. Python modularity in the wild
  4. Introduction to packages & documentation
  5. Installing packages with pip
  6. Leveraging documentation
  7. Conventions and PEP 8
  8. Using pycodestyle
  9. Conforming to PEP 8
  10. PEP 8 in documentation

Become a fully fledged Python package developer by writing your first package! You'll learn how to structure and write Python code that can be installed, used, and distributed just like famous packages such as NumPy and pandas.


  1. Writing your first package
  2. Minimal package requirements
  3. Naming packages
  4. Recognizing packages
  5. Adding functionality to packages
  6. Adding functionality to your package
  7. Using your package's new functionality
  8. Making your package portable
  9. Writing requirements.txt
  10. Installing package requirements
  11. Creating setup.py
  12. Listing requirements in setup.py
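A portable package declares its dependencies so others can install it. As a sketch, a minimal setup.py might look like the following; the package name, version, and requirements are all made up for illustration:

```python
# setup.py -- a minimal sketch (all metadata here is illustrative)
from setuptools import setup, find_packages

setup(
    name="text_analyzer",
    version="0.1.0",
    description="A toy text-analytics package",
    # find_packages discovers the package directories automatically
    packages=find_packages(include=["text_analyzer", "text_analyzer.*"]),
    # mirrors requirements.txt so `pip install .` pulls dependencies too
    install_requires=["matplotlib>=3.0", "numpy"],
)
```

With this in place, `pip install .` from the project root installs the package along with its listed requirements.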

Object Oriented Programming is a staple of Python development. By leveraging classes and inheritance your Python package will become a much more powerful tool for your users.


  1. Adding classes to a package
  2. Writing a class for your package
  3. Using your package's class
  4. Adding functionality to classes
  5. Writing a non-public method
  6. Using your class's functionality
  7. Classes and the DRY principle
  8. Using inheritance to create a class
  9. Adding functionality to a child class
  10. Using your child class
  11. Multilevel inheritance
  12. Exploring with dir and help
  13. Creating a grandchild class
  14. Using inherited methods
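The class-and-inheritance pattern this module builds toward can be sketched as below. The class names and the text-analysis behavior are illustrative, not the course's exact package:

```python
class Document:
    """Base class: stores text and counts words (shared by all children, DRY)."""

    def __init__(self, text):
        self.text = text
        self.word_counts = self._count_words()

    def _count_words(self):
        # Non-public helper, marked by the leading underscore convention
        counts = {}
        for word in self.text.split():
            counts[word] = counts.get(word, 0) + 1
        return counts


class SocialMedia(Document):
    """Child class: inherits the parsing, adds hashtag extraction."""

    def __init__(self, text):
        super().__init__(text)
        self.hashtags = [w for w in self.text.split() if w.startswith("#")]


post = SocialMedia("learning #python with #python and pandas")
print(post.word_counts)
print(post.hashtags)
```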

You've now written a fully functional Python package for text analysis! To make maintaining your project as easy as possible we'll leverage best practices around concepts such as documentation and unit testing.


  1. Documentation
  2. Identifying good comments
  3. Identifying proper docstrings
  4. Writing docstrings
  5. Readability counts
  6. Using good function names
  7. Using good variable names
  8. Refactoring for readability
  9. Unit testing
  10. Using doctest
  11. Using pytest
  12. Documentation & testing in practice
  13. Documenting classes for Sphinx
  14. Identifying tools
  15. Final Thoughts
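Documentation and testing meet in doctest: a properly written docstring can include an example that doubles as a test. A small sketch (the function is illustrative):

```python
def count_words(text):
    """Return the number of whitespace-separated words in text.

    Args:
        text (str): the string to count words in.

    Returns:
        int: the word count.

    >>> count_words("to be or not to be")
    6
    """
    return len(text.split())


if __name__ == "__main__":
    import doctest
    # doctest runs the docstring example and reports any mismatch
    print(doctest.testmod())
```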


ETL and ELT in Python
Empowering Analytics with Data Pipelines
Data pipelines are at the foundation of every strong data platform. Building these pipelines is an essential skill for data engineers, who provide incredible value to a business ready to step into a data-driven future. This introductory course will help you hone the skills to build effective, performant, and reliable data pipelines.

Building and Maintaining ETL Solutions

Throughout this course, you'll dive into the complete process of building a data pipeline. You'll grow skills leveraging Python libraries such as pandas and json to extract data from structured and unstructured sources before it's transformed and persisted for downstream use. Along the way, you'll develop confidence with tools and techniques such as architecture diagrams, unit tests, and monitoring that will help set your data pipelines apart from the rest. As you progress, you'll put your newfound skills to the test with hands-on exercises.


Supercharge Data Workflows
After completing this course, you’ll be ready to design, develop and use data pipelines to supercharge your data workflow in your job, new career, or personal project.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Get ready to discover how data is collected, processed, and moved using data pipelines. You will explore the qualities of the best data pipelines, and prepare to design and build your own.


  1. Introduction to ETL and ELT Pipelines
  2. Running an ETL Pipeline
  3. ELT in Action
  4. ETL and ELT Pipelines
  5. Building ETL and ELT Pipelines
  6. Building an ETL Pipeline
  7. The "T" in ELT
  8. Extracting, Transforming, and Loading Student Scores Data

Dive into leveraging pandas to extract, transform, and load data as you build your first data pipelines. Learn how to make your ETL logic reusable, and apply logging and exception handling to your pipelines.


  1. Extracting data from structured sources
  2. Extracting data from parquet files
  3. Pulling data from SQL databases
  4. Building functions to extract data
  5. Transforming data with pandas
  6. Filtering pandas DataFrames
  7. Transforming sales data with pandas
  8. Validating data transformations
  9. Persisting data with pandas
  10. Loading sales data to a CSV file
  11. Customizing a CSV file
  12. Persisting data to files
  13. Monitoring a data pipeline
  14. Logging within a data pipeline
  15. Handling exceptions when loading data
  16. Monitoring and alerting within a data pipeline
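The extract-transform-load steps with logging and exception handling can be sketched as below. The sales data, column names, and output path are all illustrative:

```python
import io
import logging
import os
import tempfile

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


def extract(raw_csv):
    """Extract: read raw CSV text into a DataFrame."""
    return pd.read_csv(io.StringIO(raw_csv))


def transform(df):
    """Transform: keep completed sales and add a total column."""
    df = df[df["status"] == "complete"].copy()
    df["total"] = df["price"] * df["quantity"]
    return df


def load(df, path):
    """Load: persist to CSV, logging failures rather than failing silently."""
    try:
        df.to_csv(path, index=False)
        logger.info("Loaded %d rows to %s", len(df), path)
    except OSError:
        logger.exception("Failed to load data")
        raise


# Inline CSV stands in for an extracted source file
raw = "status,price,quantity\ncomplete,10.0,2\npending,5.0,1\ncomplete,3.5,4\n"
out_path = os.path.join(tempfile.gettempdir(), "sales_clean.csv")

sales = transform(extract(raw))
load(sales, out_path)
print(sales)
```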

Supercharge your workflow with advanced data pipelining techniques, such as working with non-tabular data and persisting DataFrames to SQL databases. Discover tooling to tackle advanced transformations with pandas, and uncover best-practices for working with complex data.


  1. Extracting non-tabular data
  2. Ingesting JSON data with pandas
  3. Reading JSON data into memory
  4. Transforming non-tabular data
  5. Iterating over dictionaries
  6. Parsing data from dictionaries
  7. Transforming JSON data
  8. Transforming and cleaning DataFrames
  9. Advanced data transformation with pandas
  10. Filling missing values with pandas
  11. Grouping data with pandas
  12. Applying advanced transformations to DataFrames
  13. Loading data to a SQL database with pandas
  14. Loading data to a Postgres database
  15. Validating data loaded to a Postgres Database
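The non-tabular-to-SQL path can be sketched end to end: parse nested JSON, flatten it by iterating over the dictionaries, then persist with to_sql and validate the load. SQLite stands in for Postgres here, and the school records are made up:

```python
import json
import sqlite3

import pandas as pd

# Non-tabular input: nested JSON text (illustrative records)
raw = (
    '[{"school": "PS 1", "scores": {"math": 657, "reading": 601}},'
    ' {"school": "PS 2", "scores": {"math": 613, "reading": 589}}]'
)
records = json.loads(raw)

# Iterate over the dicts, flattening each nested "scores" entry
rows = [{"school": r["school"], **r["scores"]} for r in records]
df = pd.DataFrame(rows)

# Load to a SQL database (an in-memory SQLite DB stands in for Postgres)
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)

# Validate what was loaded by reading it back
loaded = pd.read_sql("SELECT * FROM scores", conn)
print(loaded)
```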

In this final module, you’ll create frameworks to validate and test data pipelines before shipping them into production. After you’ve tested your pipeline, you’ll explore techniques to run your data pipeline end-to-end, all while allowing for visibility into pipeline performance.


  1. Manually testing a data pipeline
  2. Testing data pipelines
  3. Validating a data pipeline at "checkpoints"
  4. Testing a data pipeline end-to-end
  5. Unit-testing a data pipeline
  6. Validating a data pipeline with assert
  7. Writing unit tests with pytest
  8. Creating fixtures with pytest
  9. Unit testing a data pipeline with fixtures
  10. Running a data pipeline in production
  11. Orchestration and ETL tools
  12. Data pipeline architecture patterns
  13. Running a data pipeline end-to-end
  14. Wrap-Up!
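The checkpoint-validation style this module teaches can be sketched with plain asserts; under pytest the input DataFrame would come from an @pytest.fixture instead. The transform step and data are illustrative:

```python
import pandas as pd


def transform(df):
    """The pipeline step under test: drop rows with missing scores."""
    return df.dropna(subset=["score"])


def test_transform_removes_missing():
    # Fixture-style input (pytest would provide this via @pytest.fixture)
    raw = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, None, 3.0]})
    clean = transform(raw)

    # "Checkpoint" validations on the transformed output
    assert clean["score"].notna().all()
    assert len(clean) == 2
    assert list(clean.columns) == ["name", "score"]


test_transform_removes_missing()
print("all checks passed")
```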


Introduction to Apache Airflow in Python
Now Updated to Apache Airflow 2.7 - Delivering data on a schedule can be a manual process. You write scripts, add complex cron tasks, and try various ways to meet an ever-changing set of requirements—and it's even trickier to manage everything when working with teammates. Apache Airflow can remove this headache by adding scheduling, error handling, and reporting to your workflows. In this course, you'll master the basics of Apache Airflow and learn how to implement complex data engineering pipelines in production. You'll also learn how to use Directed Acyclic Graphs (DAGs), automate data engineering workflows, and implement data engineering tasks in an easy and repeatable fashion—helping you to maintain your sanity.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


In this module, you’ll gain a complete introduction to the components of Apache Airflow and learn how and why you should use them.


  1. Introduction to Apache Airflow
  2. Testing a task in Airflow
  3. Examining Airflow commands
  4. Airflow DAGs
  5. Defining a simple DAG
  6. Working with DAGs and the Airflow shell
  7. Troubleshooting DAG creation
  8. Airflow web interface
  9. Starting the Airflow webserver
  10. Navigating the Airflow UI
  11. Examining DAGs with the Airflow UI
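To preview the pieces introduced here, a minimal DAG definition sketch is shown below. It assumes an Airflow 2.x installation; the dag_id, schedule, and task are illustrative, not taken from the course:

```python
# A minimal Airflow DAG file sketch (requires Apache Airflow 2.x installed;
# all names and the schedule below are illustrative)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # run once per day
    catchup=False,       # skip backfilling past runs
) as dag:
    # A single task that runs a shell command on each scheduled run
    cleanup = BashOperator(task_id="cleanup", bash_command="echo cleaning")
```

Placed in the DAGs folder, this file is picked up by the scheduler and appears in the Airflow web interface.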

What’s up DAG? Now it’s time to learn the basics of implementing Airflow DAGs. Through hands-on activities, you’ll learn how to set up and deploy operators, tasks, and scheduling.


  1. Airflow operators
  2. Defining a BashOperator task
  3. Multiple BashOperators
  4. Airflow tasks
  5. Define order of BashOperators
  6. Determining the order of tasks
  7. Troubleshooting DAG dependencies
  8. Additional operators
  9. Using the PythonOperator
  10. More PythonOperators
  11. EmailOperator and dependencies
  12. Airflow scheduling
  13. Schedule a DAG via Python
  14. Deciphering Airflow schedules
  15. Troubleshooting DAG runs

In this module, you’ll learn how to save yourself time using Airflow components such as sensors and executors while monitoring and troubleshooting Airflow workflows.


  1. Airflow sensors
  2. Sensors vs operators
  3. Sensory deprivation
  4. Airflow executors
  5. Determining the executor
  6. Executor implications
  7. Debugging and troubleshooting in Airflow
  8. DAGs in the bag
  9. Missing DAG
  10. SLAs and reporting in Airflow
  11. Defining an SLA
  12. Defining a task SLA
  13. Generate and email a report
  14. Adding status emails

Put it all together. In this final module, you’ll apply everything you've learned to build a production-quality workflow in Airflow.


  1. Working with templates
  2. Creating a templated BashOperator
  3. Templates with multiple arguments
  4. More templates
  5. Using lists with templates
  6. Understanding parameter options
  7. Sending templated emails
  8. Branching
  9. Define a BranchPythonOperator
  10. Branch troubleshooting
  11. Creating a production pipeline
  12. Creating a production pipeline #1
  13. Creating a production pipeline #2
  14. Adding the final changes to your pipeline
  15. Wrap-up!

COMPLETE DATA ENGINEER WITH SQL & PYTHON COST


United States

$899.99

United Kingdom

£799.99

Career and Certifications


GreaterHeight Academy's Certificate Holders are also prepared to work at companies like:



Our Advisor is just a CALL away

+1 5169831065                                    +447474275645
Available 24x7 for your queries


Talk to our advisors

Our advisors will get in touch with you in the next 24 hours.


Get Advice


FAQs

Complete Data Analysis & Visualization with Python Course

  • Python, created by Guido van Rossum in 1991, is a high-level, readable programming language known for its simplicity. It's versatile, with applications in web development, data analysis, AI, and more. Python's extensive standard library and rich ecosystem enhance its capabilities. It's cross-platform compatible and supported by a large community. Python's popularity has grown, making it widely used in diverse industries.

  • A Python developer is a software developer or programmer who specializes in using the Python programming language for creating applications, software, or solutions. They have expertise in writing Python code, understanding the language's syntax, libraries, and frameworks. Python developers are skilled in utilizing Python's features to develop web applications, data analysis tools, machine learning models, automation scripts, and other software solutions.
  • They work in various industries, collaborating with teams or independently to design, implement, test, and maintain Python-based projects. Python developers often possess knowledge of related technologies and tools to enhance their development process.

  • Python Developer Masters Program is a structured learning path recommended by leading industry experts that ensures you transform into a proficient Python developer. Being a full-fledged Python developer requires you to master multiple technologies, and this program aims at providing you with in-depth knowledge of the entire range of Python programming practices. Individual courses at GreaterHeight Academy focus on specialization in one or two specific skills; however, if you intend to become a master in Python programming, then this is your go-to path to follow.

  • Yes. But you can also raise a ticket with the dedicated support team at any time. If your query does not get resolved through email, we can also arrange one-on-one sessions with our support team. However, our support is provided for a period of twelve weeks from the start date of your course.

There are several reasons why becoming a Python developer can be a rewarding career choice. Here are a few:

  • Versatility and Popularity: Python is a versatile programming language that can be used for various purposes, such as web development, data analysis, machine learning, artificial intelligence, scientific computing, and more. It has gained immense popularity in recent years due to its simplicity, readability, and extensive library ecosystem. Python is widely used in both small-scale and large-scale projects, making it a valuable skill in the job market.
  • Ease of Learning: Python has a clean and intuitive syntax that emphasizes readability, which makes it relatively easy to learn compared to other programming languages. Its simplicity allows beginners to grasp the fundamentals quickly and start building useful applications in a relatively short amount of time. This accessibility makes Python an attractive choice for both novice and experienced programmers.
  • Rich Ecosystem and Libraries: Python offers a vast collection of libraries and frameworks that can accelerate development and simplify complex tasks. For example, Django and Flask are popular web development frameworks that provide robust tools for building scalable and secure web applications. NumPy, Pandas, and Matplotlib are widely used libraries for data analysis and visualization. TensorFlow and PyTorch are prominent libraries for machine learning and deep learning. These libraries, among many others, contribute to Python's efficiency and effectiveness as a development language.
  • Job Opportunities: The demand for Python developers has been steadily growing in recent years. Many industries, including technology, finance, healthcare, and academia, rely on Python for various applications. By becoming a Python developer, you open up a wide range of career opportunities, whether you choose to work for a large corporation, a startup, or even as a freelancer. Additionally, Python's versatility allows you to explore different domains and switch roles if desired.
  • Community and Support: Python has a vibrant and supportive community of developers worldwide. This community actively contributes to the language's development, creates open-source libraries, and provides assistance through forums, online communities, and resources.

  • There are no prerequisites for enrollment to this Masters Program. Whether you are an experienced professional working in the IT industry or an aspirant planning to enter the world of Python programming, this masters program is designed and developed to accommodate various professional backgrounds.

  • Python Developer Masters Program has been curated after thorough research and recommendations from industry experts. It will help you differentiate yourself with multi-platform fluency and have real-world experience with the most important tools and platforms. GreaterHeight Academy will be by your side throughout the learning journey - We’re Ridiculously Committed.

  • The recommended duration to complete this Python Developer Masters Program is about 20 weeks, however, it is up to the individual to complete this program at their own pace.

The roles and responsibilities of a Python developer may vary depending on the specific job requirements and industry. However, here are some common tasks and responsibilities associated with the role:

  1. Developing Applications: Python developers are responsible for designing, coding, testing, and debugging applications using Python programming language. This includes writing clean, efficient, and maintainable code to create robust software solutions.
  2. Web Development: Python is widely used for web development. As a Python developer, you may be involved in building web applications, using frameworks like Django or Flask. This includes developing backend logic, integrating databases, handling data processing, and ensuring the smooth functioning of the web application.
  3. Data Analysis and Visualization: Python offers powerful libraries like NumPy, Pandas, and Matplotlib, which are extensively used for data analysis and visualization. Python developers may be responsible for manipulating and analyzing large datasets, extracting insights, and presenting them visually.
  4. Machine Learning and AI: Python is a popular choice for machine learning and artificial intelligence projects. Python developers may work on implementing machine learning algorithms, training models, and integrating them into applications. This involves using libraries like TensorFlow, PyTorch, or scikit-learn.
  5. Collaborating and Teamwork: Python developers often work as part of a development team. They collaborate with other team members, including designers, frontend developers, project managers, and stakeholders. Effective communication and teamwork skills are crucial to ensure smooth project execution.
  6. Documentation: Python developers are expected to document their code, providing clear explanations and instructions for others who may work on or maintain the codebase in the future. Documentation helps in understanding the code and facilitating collaboration.
  7. Continuous Learning: Technology is constantly evolving, and as a Python developer, you need to stay updated with the latest advancements, libraries, frameworks, and best practices. Continuous learning and self-improvement are essential to excel in this role.
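The data-analysis responsibilities above (point 3) can be illustrated with a short sketch using pandas; the dataset, column names, and figures below are invented purely for demonstration:

```python
# A minimal sketch of the data-analysis workflow described above,
# using pandas to aggregate a small, made-up sales dataset.
import pandas as pd

# Hypothetical dataset for illustration only
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "revenue": [120.0, 80.0, 150.0, 95.0],
})

# Aggregate revenue by region to extract a simple insight
totals = sales.groupby("region")["revenue"].sum()
print(totals.to_dict())  # {'North': 270.0, 'South': 175.0}
```

In practice, the data would come from a database, CSV file, or API rather than an inline dictionary, and the aggregation would feed a Matplotlib chart or a report.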

The Python Developer training course is for those who want to fast-track their Python programming career. This Python Developer Masters Program will benefit people working in the following roles:

  1. Freshers
  2. Engineers
  3. IT professionals
  4. Data Scientists
  5. Machine Learning Engineers
  6. AI Engineers
  7. Business Analysts
  8. Data Analysts

  • Top companies such as Microsoft, Google, Meta, Citibank, Wells Fargo, and many more are actively hiring certified Python professionals for various positions.

  • On completing this Python Developer Masters Program, you’ll be eligible for roles such as Python Developer, Web Developer, Data Analyst, Data Scientist, Software Engineer, and many more.

  • There is undoubtedly great demand for data analytics, with as many as 96% of organizations reportedly seeking to hire Data Analysts. Significant employers of data analyst graduates include Manthan, SAP, Oracle, Accenture Analytics, Alteryx, Qlik, Mu Sigma Analytics, Fractal Analytics, and Tiger Analytics. Professional Data Analyst training equips you to become a key asset to any organization, turning big data into actionable insights.

A successful data analyst possesses a combination of technical skills and leadership skills.

  • Technical skills include knowledge of query and programming languages such as SQL, R, or Python; spreadsheet tools such as Microsoft Excel or Google Sheets for statistical analysis; and data visualization software such as Tableau or Qlik. Mathematical and statistical skills are also valuable to help gather, measure, organize, and analyze data while using these common tools.
  • Leadership skills prepare a data analyst to complete decision-making and problem-solving tasks. These abilities allow analysts to think strategically about the information that will help stakeholders make data-driven business decisions and to communicate the value of this information effectively. For example, project managers rely on data analysts to track the most important metrics for their projects, to diagnose problems that may be occurring, and to predict how different courses of action could address a problem.
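As one illustration of the SQL skills and project-tracking tasks described above, the following sketch uses Python's built-in sqlite3 module with an in-memory database; the table, columns, and figures are invented for demonstration:

```python
# A small illustration of the kind of SQL query a data analyst might run,
# using Python's built-in sqlite3 module with an in-memory database.
# The table schema and numbers are made up for demonstration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, budget REAL, spent REAL)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [("Alpha", 100.0, 90.0), ("Beta", 50.0, 65.0), ("Gamma", 80.0, 40.0)],
)

# Flag projects that are over budget -- the kind of metric a
# project manager might ask an analyst to track.
over_budget = conn.execute(
    "SELECT name FROM projects WHERE spent > budget"
).fetchall()
print(over_budget)  # [('Beta',)]
```

A real engagement would query a production database such as SQL Server or PostgreSQL, but the query pattern is the same.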

Career openings are available in practically every industry, from telecommunications to retail, banking, healthcare, and even fitness. However, reaping the benefits of a data analyst career is difficult without extensive training and effort, so earning our Data Analyst certification will help you keep up to date with recent trends in the industry.

  • Yes, we do. We will discuss all possible technical interview questions and answers during the training program so that you can prepare yourself for interviews.

  • No. Any abuse of copyright is taken seriously. Thanks for your understanding on this one.

  • Yes, we will provide you with a certificate of completion for the program once you have successfully submitted all the assessments and they have been verified by our subject matter experts.

  • GreaterHeight offers the most updated, relevant, and high-value real-world projects as part of the training program, so you can apply what you have learned in a real-world industry setting. All training comes with multiple projects that thoroughly test your skills, learning, and practical knowledge, making you completely industry ready.
  • You will work on highly exciting projects in domains such as high technology, ecommerce, marketing, sales, networking, banking, and insurance. After completing the projects successfully, your skills will be equivalent to six months of rigorous industry experience.

All our mentors are highly qualified and experienced professionals, with 15 to 20 years of development experience across various technologies, and they are trained by GreaterHeight Academy to deliver interactive training to participants.

Yes, we do. As technology evolves, we update our content and deliver training on the latest version of that technology.

  • All online training classes are recorded. You will receive the recorded sessions so that you can watch them whenever you want. You can also join another batch to make up any missed classes.

OUR POPULAR COURSES

Data Analytics and Visualization With Python

Develop advanced expertise in cleaning, transforming, and modelling data to obtain insights for corporate decision making as a Senior Data Analyst - using Python.

View Details
Data Science Training Masters Program

Learn Python, Statistics, Data Preparation, Data Analysis, Querying Data, Machine Learning, Clustering, Text Processing, Collaborative Filtering, Image Processing, etc.

View Details
Microsoft Azure DP-100 Data Science

You will Optimize & Manage Models, Perform Administration by using T-SQL, Run Experiments & Train Models, Deploy & Consume Models, and Automate Tasks.

View Details
Machine Learning using Python

Learn Data Science and Machine Learning from scratch, get hired, and have fun along the way with the most modern, up-to-date Data Science course available.

View Details
Microsoft Azure PL-300 Data Analysis

You will learn how to Design a Data Model in Power BI, Optimize Model Performance, Manage Datasets in Power BI, and Create Paginated Reports.

View Details
Microsoft Azure DP-203 Data Engineer

You will learn Batch & Real-Time Analytics, Azure Synapse Analytics, Azure Databricks, Implementing Security, and ETL & ELT Pipelines.

View Details

The GreaterHeight Advantage

Accredited Courseware

Most of our training courses are accredited by the respective governing bodies.

Assured Classes

All our training courses are assured to run, and scheduled dates are confirmed by our subject matter experts.

Expert Instructor Led Programs

We have well-equipped and highly experienced instructors to train professionals.

OUR CLIENTS

We Have Worked With Some Amazing Companies Around The World

Some of the awesome clients we've had the pleasure to work with!

