GreaterHeight Technologies LLC ~ GreaterHeight Academy



The Complete Data Scientist With Python


Learn Python for data science and gain the career-building skills you need to succeed as a data scientist, from data manipulation to machine learning. Master the skills you need to pass the Data Scientist in Python certification and prepare yourself for success in the field of data science.



Complete Data Scientist with Python 


Who this course is for:

  • Python developers curious about the data analysis libraries.
  • Python developers curious about the data visualization libraries.
  • Anyone interested in learning Python.
  • Data analysts.
  • Anyone working with data.


What you will learn:

  • Python (we will be using Python 3 in this course).
  • Data Analysis Libraries in Python such as NumPy and Pandas.
  • Data Visualization.
  • Data Visualization Libraries in Python such as Matplotlib and Seaborn.
  • How to use Python to manipulate & process data.
  • Data analysis & data visualization using Python.
  • How to analyze data.
  • Jupyter Notebooks IDE / Anaconda Distribution.

Course Benefits & Key Features

Key benefits and features of the Data Scientist with Python program:

  • Modules: 30+ modules
  • Lessons: 80+ lessons
  • Practical: 40+ hands-on labs
  • Live Projects: 5+ projects
  • Resume: CV preparation
  • Job: job references
  • Recording: session recordings
  • Interviews: mock interviews
  • Support: on-the-job support
  • Membership: membership access
  • Networks: networking
  • Certification: certificate of completion


INSTRUCTOR-LED LIVE ONLINE CLASSES

Our learn-by-building-projects method enables you to build practical coding experience that sticks. 95% of our learners say they have more confidence and remember more when they learn by building real-world projects, the kind of experience required in real work.


  • Get step-by-step guidance to practice your skills without getting stuck
  • Validate your technical problem-solving skills in a real environment
  • Troubleshoot complex scenarios to practice what you learned
  • Develop production experience that translates into real-world work

Complete Python Data Scientist Job Outlook



Ranked #1 Programming Language

TIOBE and PYPL rank Python as the most popular programming language in the world.

Python Salary Trend

The average salary for a Python Developer is $114,489 per year in the United States.

44.8% Compound Annual Growth Rate (CAGR)

The global Python market is expected to reach USD 100.6 million by 2030.

Why Data Scientist with Python?

Learn In-demand Skills

Those with careers in data analysis learn relevant in-demand skills that span industries and add value to every digital-enabled organization.


Earn a Higher Salary

Experienced data analysts can earn up to $112,000 per year and transition into higher-paying jobs as Senior Data Analysts, Data Scientists, or Analytics Managers.

Positive Job Outlook

The data analytics market is predicted to hit USD 132.90 billion by 2026. The COVID-19 pandemic accelerated the adoption of data analytics solutions and services.

Shape the Future

Data analysts transform organizations by capitalizing on data to improve their business decisions and solve critical real-world problems.

Become a Leader

Being a central part of an organization’s decision-making processes, analytics experts often pick up strong leadership skills as well.

Data Analysis Is Constantly Evolving

Data analysis moves quickly, and data analysts are constantly learning and advancing in their careers.




GreaterHeight Certificates holders are prepared to work at companies like these.

Some Alumni Testimonies

Investing in the course "Become a Data Analyst" with GreaterHeight Academy is great value for the money and I highly recommend it. The trainer is very knowledgeable, very engaging, provided us with quality training sessions on all courses, and was easily accessible for queries. We also had access to the course materials, and the timely availability of the recorded videos made it easy and aided the learning process.

QUEEN OBIWULU

Team Lead, Customer Success

The training was fantastic, the instructor is an awesome lecturer, relentless and not tired in his delivery. He obviously enjoys teaching, it comes natural to him. We got more than we expected. He extended my knowledge of Excel beyond what I knew, and the courses were brilliantly delivered. They reach out, follow up, ask questions, and in fact the support has been great. They are highly recommended and I would definitely subscribe to other training programs from them.

BISOLA OGUNRO

Fraud Analytics Risk Oversight Manager

It's one thing to look for just a Data Analysis training, and it's another to get the knowledge transferred through certified professional trainers. No matter your initial level of proficiency in any of the Data Analysis tools, GreaterHeight Academy would meet you there and take you up to a highly proficient and confident level in a short time at a reasonable pace. I learnt a lot of Data Analysis tools and skills at GreaterHeight from patient and resourceful teachers.

TUNDE MEREDITH

Operation Director - Abbfem Technology

The Data Analysis training program was one of the best I have attended. The way GreaterHeight took off with Excel and concluded the four courses with Excel was mind-blowing - it was WOW!! I concluded that I'm on the right path with the right mentor to take me from novice to professional. GreaterHeight is the best as far as imparting Data Analysis knowledge is concerned. I would shout it from the rooftop to recommend GreaterHeight to any trainee that really wants to learn.

JOHN OSI PETER

GreaterHeight

I wanted to take a moment to express my deepest gratitude for the opportunity to study data analytics at GreaterHeight Academy. I am truly impressed by the level of dedication and support that the sponsor and CEO have put into this program. GreaterHeight Academy is without a doubt the best tech institution out there, providing top-notch education and resources for its students. One of the advantages of studying at GreaterHeight Academy is the access to the best tools and technologies in the field. 

AYODELE PAYNE

Sales/Data Analyst

It is an unforgettable experience that will surely stand the test of time, learning to become a Data Analyst with GreaterHeight Academy. The lecture delivery was impactful, and the trainer is versatile and knowledgeable in using the applicable tools for the sessions. Always ready to go the extra mile with you. The support you get during and after the lectures is top-notch, with materials and resources available to build your confidence on and off the job.

ADEBAYO OLADEJO

Customer Service Advisor (Special Operations)

Complete Data Scientist with Python Courses


Learn Python for data science and gain the career-building skills you need to succeed as a data scientist, from data manipulation to machine learning! In this track, you’ll learn how this versatile language allows you to import, clean, manipulate, and visualize data—all integral skills for any aspiring data professional or researcher. Starting with the Python essentials for data science, you’ll work through interactive exercises that test your abilities. You’ll get hands-on with some of the most popular Python libraries for data science, including pandas, Seaborn, Matplotlib, scikit-learn, and many more. As you progress, you’ll work with real-world datasets to learn the statistical and machine learning techniques you need to perform hypothesis testing and build predictive models. You’ll also get an introduction to supervised learning with scikit-learn and apply your skills to various projects. Start this track, grow your data science skills, and begin your journey to confidently pass the Associate Data Scientist in Python certification and thrive as a data scientist.


Master the skills you need to pass the Data Scientist in Python certification and prepare yourself for success in the field of data science. Throughout this track, you will focus on using Python for data science, starting with the basics and progressing to more advanced topics such as machine learning. You’ll cover a broad range of areas, including data manipulation, visualization, and analysis, using popular Python libraries such as pandas, Seaborn, Matplotlib, and scikit-learn. As you progress, you’ll work through interactive exercises using real-world datasets to help you test your abilities and develop your skills. These examples will help you explore various statistical and machine learning techniques, including hypothesis testing and predictive modeling. You’ll also gain an understanding of package development, data preprocessing, SQL for relational databases, Git for data science projects, and more. Complete this track to gain the knowledge and experience necessary to confidently pass the Data Scientist in Python certification and thrive as a data scientist.


Introduction to Python
An Introduction to Python
Python has grown to become the market leader in programming languages and the language of choice for data analysts and data scientists. Demand for data skills is rising because companies want to gain actionable insights from their data.

Discover the Python Basics
This is a Python course for beginners, and we designed it for people with no prior Python experience. It is even suitable if you have no coding experience at all. You will cover the basics of Python, helping you understand common, everyday functions and applications, including how to use Python as a calculator, understanding variables and types, and building Python lists. The first half of this course prepares you to use Python interactively and teaches you how to store, access, and manipulate data using one of the most popular programming languages in the world.

Explore Python Functions and Packages
The second half of the course starts with a view of how you can use functions, methods, and packages to use code that other Python developers have written. As an open-source language, Python has plenty of existing packages and libraries that you can use to solve your problems.

Get Started with NumPy
NumPy is an essential Python package for data science. You’ll finish this course by learning to use some of NumPy's most popular tools and start exploring data in Python.

4 Modules | 6+ Hours | 4 Skills

Course Modules 


An introduction to the basic concepts of Python. Learn how to use Python interactively and by using a script. Create your first variables and acquaint yourself with Python's basic data types.


  1. Hello Python!
  2. Your first Python code
  3. Any comments?
  4. Python as a calculator
  5. Variables and Types
  6. Variable Assignment
  7. Calculations with variables
  8. Other variable types
  9. Operations with other types
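
For a taste of what these lessons cover, here is a minimal sketch (illustrative only, not taken from the course materials) of Python as a calculator, variable assignment, and basic types:

    # Python as a calculator: basic arithmetic operators
    print(7 + 3, 7 / 3, 7 // 3, 7 % 3, 2 ** 10)

    # Variable assignment and basic types
    savings = 100               # int
    growth_multiplier = 1.1     # float
    desc = "compound interest"  # str
    profitable = True           # bool

    # Calculations with variables; type() reveals a value's type
    result = savings * growth_multiplier ** 7
    print(result, type(result))

    # Operations with other types: + concatenates strings
    print(desc + "!")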

Learn to store, access, and manipulate data in lists: the first step toward efficiently working with huge amounts of data.


  1. Python Lists
  2. Create a list
  3. Create lists with different types
  4. List of lists
  5. Subsetting Lists
  6. Subset and conquer
  7. Slicing and dicing
  8. Subsetting lists of lists
  9. Manipulating Lists
  10. Replace list elements
  11. Extend a list
  12. Delete list elements
  13. Inner workings of lists
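
The following short sketch (hypothetical room data, for illustration only) shows the list operations these lessons practice:

    # Create a list, a list with mixed types, and a list of lists
    areas = ["hallway", 11.25, "kitchen", 18.0, "bedroom", 10.75]
    house = [["hallway", 11.25], ["kitchen", 18.0], ["bedroom", 10.75]]

    # Subsetting: zero-based indexing; negative indexes count from the end
    print(areas[1], areas[-1])

    # Slicing and dicing: [start:end], where end is exclusive
    downstairs = areas[:4]

    # Subsetting lists of lists
    print(house[1][1])  # 18.0

    # Manipulating lists: replace, extend, delete
    areas[4] = "master bedroom"
    areas = areas + ["bathroom", 9.50]
    del areas[0:2]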

You'll learn how to use functions, methods, and packages to efficiently leverage the code that brilliant Python developers have written. The goal is to reduce the amount of code you need to solve challenging problems!


  1. Functions
  2. Familiar functions
  3. Help!
  4. Multiple arguments
  5. Methods
  6. String Methods
  7. List Methods
  8. List Methods (2)
  9. Packages
  10. Import package
  11. Selective import
  12. Different ways of importing
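
A compact illustration of the functions, methods, and import styles covered here (names and values are hypothetical):

    # Familiar built-in functions
    fam = [1.73, 1.68, 1.71, 1.89]
    print(max(fam), round(1.68, 1), len(fam))

    # String and list methods
    sister = "liz"
    print(sister.capitalize(), sister.replace("z", "sa"))
    fam.append(1.79)
    print(fam.index(1.89), fam.count(1.73))

    # Different ways of importing a package
    import math
    from math import pi, radians
    print(math.sqrt(2), 2 * pi, radians(180))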

NumPy is a fundamental Python package to efficiently practice data science. Learn to work with powerful tools in the NumPy array, and get started with data exploration.


  1. NumPy
  2. Your First NumPy Array
  3. Baseball players' height
  4. NumPy Side Effects
  5. Subsetting NumPy Arrays
  6. 2D NumPy Arrays
  7. Your First 2D NumPy Array
  8. Baseball data in 2D form
  9. Subsetting 2D NumPy Arrays
  10. 2D Arithmetic
  11. NumPy: Basic Statistics
  12. Average versus median
  13. Explore the baseball data
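
As a quick preview (a sketch with made-up numbers, not the course's baseball dataset), this is the kind of NumPy work the module builds toward:

    import numpy as np

    # Element-wise arithmetic, unlike plain Python lists
    height_m = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
    weight_kg = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
    bmi = weight_kg / height_m ** 2

    # Subsetting with a boolean condition
    print(bmi[bmi > 23])

    # 2D arrays: shape and row/column subsetting
    np_2d = np.array([height_m, weight_kg])
    print(np_2d.shape)    # (2, 5)
    print(np_2d[0, 2])    # row 0, column 2
    print(np_2d[:, 1:3])  # all rows, columns 1-2

    # Basic statistics: average versus median
    print(np.mean(height_m), np.median(height_m), np.std(height_m))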


Intermediate Python
Improve Your Python Skills
Learning Python is crucial for any aspiring data science practitioner. Learn to visualize real data with Matplotlib’s functions and get acquainted with data structures such as the dictionary and pandas DataFrame. This four-hour intermediate course will help you to build on your existing Python skills and explore new Python applications and functions that expand your repertoire and help you work more efficiently.

Learn to Use Python Dictionaries and pandas
Dictionaries offer an alternative to Python lists, while the pandas DataFrame is the most popular way of working with tabular data. In the second module of this course, you’ll find out how you can create and manipulate datasets, and how to access them using these structures. Hands-on practice throughout the course will build your confidence in each area.

Explore Python Boolean Logic and Python Loops
In the second half of this course, you’ll look at logic, control flow, filtering and loops. These functions work to control decision-making in Python programs and help you to perform more operations with your data, including repeated statements. You’ll finish the course by applying all of your new skills by using hacker statistics to calculate your chances of winning a bet.

Once you’ve completed all of the modules, you’ll be ready to apply your new skills in your job, new career, or personal project, and be prepared to move on to more advanced Python learning!

5 Modules | 6+ Hours | 5 Skills

Course Modules 


Data visualization is a key skill for aspiring data scientists. Matplotlib makes it easy to create meaningful and insightful plots. In this module, you’ll learn how to build various types of plots, and customize them to be more visually appealing and interpretable.


  1. Basic plots with Matplotlib
  2. Line plot (1)
  3. Line Plot (2): Interpretation
  4. Line plot (3)
  5. Scatter Plot (1)
  6. Scatter plot (2)
  7. Histogram
  8. Build a histogram (1)
  9. Build a histogram (2): bins
  10. Build a histogram (3): compare
  11. Choose the right plot (1)
  12. Choose the right plot (2)
  13. Customization
  14. Labels
  15. Ticks
  16. Sizes
  17. Colors
  18. Additional Customizations
  19. Interpretation
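
A minimal sketch of the plot types and customizations listed above, using made-up population figures:

    import matplotlib.pyplot as plt

    year = [1950, 1970, 1990, 2010]
    pop = [2.52, 3.69, 5.26, 6.97]  # hypothetical values, in billions

    # Line plot with labels, title, and custom ticks
    plt.plot(year, pop)
    plt.xlabel("Year")
    plt.ylabel("Population (billions)")
    plt.title("Population Growth")
    plt.yticks([0, 2, 4, 6, 8], ["0", "2B", "4B", "6B", "8B"])
    plt.show()

    # Histogram: the distribution of a list of values, with chosen bins
    values = [1.2, 1.9, 2.1, 2.4, 2.4, 3.1, 3.3, 4.0]
    plt.hist(values, bins=4)
    plt.show()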

Learn about the dictionary, an alternative to the Python list, and the pandas DataFrame, the de facto standard to work with tabular data in Python. You will get hands-on practice with creating and manipulating datasets, and you’ll learn how to access the information you need from these data structures.


  1. Dictionaries, Part 1
  2. Motivation for dictionaries
  3. Create dictionary
  4. Access dictionary
  5. Dictionaries, Part 2
  6. Dictionary Manipulation (1)
  7. Dictionary Manipulation (2)
  8. Dictionariception
  9. Pandas, Part 1
  10. Dictionary to DataFrame (1)
  11. Dictionary to DataFrame (2)
  12. CSV to DataFrame (1)
  13. CSV to DataFrame (2)
  14. Pandas, Part 2
  15. Square Brackets (1)
  16. Square Brackets (2)
  17. loc and iloc (1)
  18. loc and iloc (2)
  19. loc and iloc (3)
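
The dictionary and DataFrame techniques above look roughly like this in practice (a self-contained sketch with invented country data):

    import pandas as pd

    # A dictionary maps keys to values
    europe = {"spain": "madrid", "france": "paris", "norway": "oslo"}
    europe["italy"] = "rome"    # add a key-value pair
    print(europe["france"])     # access by key

    # Dictionary of lists to DataFrame, with custom row labels
    data = {"country": ["Brazil", "India", "China"],
            "capital": ["Brasilia", "New Delhi", "Beijing"],
            "population": [211.0, 1380.0, 1439.0]}
    df = pd.DataFrame(data, index=["BR", "IN", "CH"])

    # Square brackets, label-based .loc, position-based .iloc
    print(df["country"])            # a single column (Series)
    print(df.loc["IN", "capital"])  # label-based access
    print(df.iloc[0:2, 0:2])        # position-based slice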

Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. You'll also learn to filter data in pandas DataFrames using logic.


  1. Comparison Operators
  2. Equality
  3. Greater and less than
  4. Compare arrays
  5. Boolean Operators
  6. and, or, not (1)
  7. and, or, not (2)
  8. Boolean operators with NumPy
  9. if, elif, else
  10. Warmup
  11. if
  12. Add else
  13. Customize further: elif
  14. Filtering pandas DataFrames
  15. Driving right (1)
  16. Driving right (2)
  17. Cars per capita (1)
  18. Cars per capita (2)
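
In code, the comparison, Boolean, and filtering ideas from this module look something like this (hypothetical data):

    import numpy as np
    import pandas as pd

    # Comparison and Boolean operators
    area = 8.5
    print(area > 5 and area < 10)

    # With NumPy arrays, use logical_and/logical_or instead of and/or
    areas = np.array([8.5, 11.0, 4.75, 9.5])
    print(np.logical_and(areas > 5, areas < 10))

    # Filtering a pandas DataFrame with a Boolean Series
    cars = pd.DataFrame({"country": ["US", "JPN", "IN"],
                         "cars_per_cap": [809, 588, 18]})
    print(cars[cars["cars_per_cap"] > 500])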

There are several techniques you can use to repeatedly execute Python code. While loops are like repeated if statements, the for loop iterates over all kinds of data structures. Learn all about them in this module.


  1. while loop
  2. while: warming up
  3. Basic while loop
  4. Add conditionals
  5. for loop
  6. Loop over a list
  7. Indexes and values (1)
  8. Indexes and values (2)
  9. Loop over list of lists
  10. Loop Data Structures Part 1
  11. Loop over dictionary
  12. Loop over NumPy array
  13. Loop Data Structures Part 2
  14. Loop over DataFrame (1)
  15. Loop over DataFrame (2)
  16. Add column (1)
  17. Add column (2)
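
A short sketch of the loop constructs above (illustrative values only):

    import numpy as np
    import pandas as pd

    # while: repeat as long as a condition holds
    offset = 8
    while offset > 0:
        offset -= 2

    # for with enumerate: indexes and values
    for index, height in enumerate([1.73, 1.68, 1.71]):
        print(index, height)

    # Loop over a dictionary with .items()
    for country, capital in {"spain": "madrid", "france": "paris"}.items():
        print(country, capital)

    # Loop over every element of a 2D NumPy array
    for val in np.nditer(np.array([[1, 2], [3, 4]])):
        print(val)

    # Loop over DataFrame rows with .iterrows(), adding a column
    df = pd.DataFrame({"country": ["Brazil", "India"]})
    for label, row in df.iterrows():
        df.loc[label, "name_length"] = len(row["country"])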

This module will allow you to apply all the concepts you've learned in this course. You will use hacker statistics to calculate your chances of winning a bet. Use random number generators, loops, and Matplotlib to gain a competitive edge!


  1. Random Numbers
  2. Random float
  3. Roll the dice
  4. Determine your next move
  5. Random Walk
  6. The next step
  7. How low can you go?
  8. Visualize the walk
  9. Distribution
  10. Simulate multiple walks
  11. Visualize all walks
  12. Implement clumsiness
  13. Plot the distribution
  14. Calculate the odds
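
The case study boils down to a simulation along these lines (a sketch under the module's dice-game premise; the exact rules used in the course may differ):

    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(123)  # reproducible "hacker statistics"

    all_walks = []
    for _ in range(500):              # simulate 500 random walks
        walk = [0]
        for _ in range(100):          # 100 dice rolls per walk
            step = walk[-1]
            dice = np.random.randint(1, 7)
            if dice <= 2:
                step = max(0, step - 1)   # can't go below step 0
            elif dice <= 5:
                step += 1
            else:
                step += np.random.randint(1, 7)
            walk.append(step)
        all_walks.append(walk)

    # Distribution of final steps, and the odds of reaching step 60
    ends = np.array(all_walks)[:, -1]
    plt.hist(ends)
    plt.show()
    print("P(end >= 60) is roughly", np.mean(ends >= 60))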


Data Manipulation With Pandas
Discover Data Manipulation with pandas
With this course, you’ll learn why pandas is the world's most popular Python library, used for everything from data manipulation to data analysis. You’ll explore how to manipulate DataFrames, as you extract, filter, and transform real-world datasets for analysis.

With pandas, you’ll explore all the core data science concepts. Using real-world data, including Walmart sales figures and global temperature time series, you’ll learn how to import, clean, calculate statistics, and create visualizations—using pandas to add to the power of Python.

Work with pandas Data to Explore Core Data Science Concepts
You’ll start by mastering the pandas basics, including how to inspect DataFrames and perform some fundamental manipulations. You’ll also learn about aggregating DataFrames, before moving on to slicing and indexing.

You’ll wrap up the course by learning how to visualize the contents of your DataFrames, working with a dataset that contains weekly US avocado sales.

Learn to Manipulate DataFrames
By completing this pandas course, you’ll understand how to use this Python library for data manipulation. You’ll have an understanding of DataFrames and how to use them, as well as be able to visualize your data in Python.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Let’s master the pandas basics. Learn how to inspect DataFrames and perform fundamental manipulations, including sorting rows, subsetting, and adding new columns.


  1. Introducing DataFrames
  2. Inspecting a DataFrame
  3. Parts of a DataFrame
  4. Sorting and subsetting
  5. Sorting rows
  6. Subsetting columns
  7. Subsetting rows
  8. Subsetting rows by categorical variables
  9. New columns
  10. Adding new columns
  11. Combo-attack!

In this module, you’ll calculate summary statistics on DataFrame columns, and master grouped summary statistics and pivot tables.


  1. Summary statistics
  2. Mean and median
  3. Summarizing dates
  4. Efficient summaries
  5. Cumulative statistics
  6. Counting
  7. Dropping duplicates
  8. Counting categorical variables
  9. Grouped summary statistics
  10. What percent of sales occurred at each store type?
  11. Calculations with .groupby()
  12. Multiple grouped summaries
  13. Pivot tables
  14. Pivoting on one variable
  15. Fill in missing values and sum values with pivot tables
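
Grouped summaries and pivot tables follow this pattern (a sketch with invented sales rows, in the spirit of the module's Walmart example):

    import pandas as pd

    sales = pd.DataFrame({
        "store_type": ["A", "A", "B", "B", "C"],
        "is_holiday": [False, True, False, True, False],
        "weekly_sales": [1000.0, 1500.0, 800.0, 950.0, 600.0],
    })

    # Summary statistics on one column
    print(sales["weekly_sales"].agg(["mean", "median"]))

    # Grouped summary statistics with .groupby()
    print(sales.groupby("store_type")["weekly_sales"].sum())

    # The same idea as a pivot table, filling missing combinations with 0
    print(sales.pivot_table(values="weekly_sales", index="store_type",
                            columns="is_holiday", aggfunc="mean",
                            fill_value=0))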

Indexes are supercharged row and column names. Learn how they can be combined with slicing for powerful DataFrame subsetting.


  1. Explicit indexes
  2. Setting and removing indexes
  3. Subsetting with .loc[]
  4. Setting multi-level indexes
  5. Sorting by index values
  6. Slicing and subsetting with .loc and .iloc
  7. Slicing index values
  8. Slicing in both directions
  9. Slicing time series
  10. Subsetting by row/column number
  11. Working with pivot tables
  12. Pivot temperature by city and year
  13. Subsetting pivot tables
  14. Calculating on a pivot table
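
Index-based subsetting works along these lines (hypothetical temperature data):

    import pandas as pd

    temps = pd.DataFrame({
        "city": ["Cairo", "Cairo", "Lagos", "Lagos"],
        "year": [2019, 2020, 2019, 2020],
        "temp_c": [22.0, 22.4, 27.5, 27.9],
    })

    # Set a multi-level index, then sort it so slicing works
    temps_idx = temps.set_index(["city", "year"]).sort_index()

    # Subsetting with .loc[] on index labels
    print(temps_idx.loc["Cairo"])          # outer level
    print(temps_idx.loc[("Lagos", 2020)])  # both levels

    # Slicing index values (.loc is inclusive on both ends)
    print(temps_idx.loc["Cairo":"Lagos"])

    # Subsetting by row/column number with .iloc
    print(temps_idx.iloc[0:2, 0])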

Learn to visualize the contents of your DataFrames, handle missing data values, and import data from and export data to CSV files.


  1. Visualizing your data
  2. Which avocado size is most popular?
  3. Changes in sales over time
  4. Avocado supply and demand
  5. Price of conventional vs. organic avocados
  6. Missing values
  7. Finding missing values
  8. Removing missing values
  9. Replacing missing values
  10. Creating DataFrames
  11. List of dictionaries
  12. Dictionary of lists
  13. Reading and writing CSVs
  14. CSV to DataFrame
  15. DataFrame to CSV
  16. Wrap-up


Joining Data with Pandas

Being able to combine and work with multiple datasets is an essential skill for any aspiring Data Scientist. pandas is a crucial cornerstone of the Python data science ecosystem, with Stack Overflow recording 5 million views for pandas questions. Learn to handle multiple DataFrames by combining, organizing, joining, and reshaping them using pandas. You'll work with datasets from the World Bank and the City of Chicago. You will finish the course with a solid skill set for data joining in pandas.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Learn how you can merge disparate data using inner joins. By combining information from multiple sources you’ll uncover compelling insights that may have previously been hidden. You’ll also learn how the relationship between those sources, such as one-to-one or one-to-many, can affect your result.


  1. Inner join
  2. What column to merge on?
  3. Your first inner join
  4. Inner joins and number of rows returned
  5. One-to-many relationships
  6. One-to-many classification
  7. One-to-many merge
  8. Merging multiple DataFrames
  9. Total riders in a month
  10. Three table merge
  11. One-to-many merge with multiple tables
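
An inner join in pandas looks like this (a sketch with invented transit tables, loosely echoing the module's ridership examples):

    import pandas as pd

    stations = pd.DataFrame({"station_id": [1, 2, 3],
                             "station_name": ["Loop", "Midway", "Austin"]})
    rides = pd.DataFrame({"station_id": [1, 1, 2, 4],
                          "riders": [500, 450, 300, 120]})

    # Inner join keeps only station_ids present in BOTH tables;
    # station 1 matches two ride rows (one-to-many), station 4 is dropped
    inner = rides.merge(stations, on="station_id")
    print(inner)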

Take your knowledge of joins to the next level. In this module, you’ll work with TMDb movie data as you learn about left, right, and outer joins. You’ll also discover how to merge a table to itself and merge on a DataFrame index.


  1. Left join
  2. Counting missing rows with left join
  3. Enriching a dataset
  4. How many rows with a left join?
  5. Other joins
  6. Right join to find unique movies
  7. Popular genres with right join
  8. Using outer join to select actors
  9. Merging a table to itself
  10. Self join
  11. How does pandas handle self joins?
  12. Merging on indexes
  13. Index merge for movie ratings
  14. Do sequels earn more?

In this module, you’ll leverage powerful filtering techniques, including semi joins and anti joins, as shown in the sketch after this module's lesson list. You’ll also learn how to glue DataFrames together by vertically combining them with the pandas concat function to create new datasets. Finally, because data is rarely clean, you’ll also learn how to validate your newly combined data structures.


  1. Filtering joins
  2. Steps of a semi join
  3. Performing an anti join
  4. Performing a semi join
  5. Concatenate DataFrames together vertically
  6. Concatenation basics
  7. Concatenating with keys
  8. Verifying integrity
  9. Validating a merge
  10. Concatenate and merge to find common songs
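
pandas has no dedicated semi-join or anti-join function, so filtering joins like those above are typically built from .isin(), as in this sketch (invented song data):

    import pandas as pd

    songs = pd.DataFrame({"song_id": [1, 2, 3], "title": ["a", "b", "c"]})
    plays = pd.DataFrame({"song_id": [1, 1, 3]})

    # Semi join: rows of `songs` that DO appear in `plays`
    semi = songs[songs["song_id"].isin(plays["song_id"])]

    # Anti join: rows of `songs` that do NOT appear in `plays`
    anti = songs[~songs["song_id"].isin(plays["song_id"])]
    print(semi, anti, sep="\n")

    # Vertical concatenation with keys to label each source
    both = pd.concat([songs, songs], keys=["jan", "feb"])
    print(both)

    # validate= raises an error if the relationship isn't as expected
    songs.merge(plays, on="song_id", validate="one_to_many")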

In this final module, you’ll step up a gear and learn to apply pandas' specialized methods for merging time-series and ordered data together with real-world financial and economic data from the city of Chicago. You’ll also learn how to query resulting tables using a SQL-style format, and unpivot data using the melt method.


  1. Using merge_ordered()
  2. Correlation between GDP and S&P500
  3. Phillips curve using merge_ordered()
  4. merge_ordered() caution, multiple columns
  5. Using merge_asof()
  6. Using merge_asof() to study stocks
  7. Using merge_asof() to create dataset
  8. merge_asof() and merge_ordered() differences
  9. Selecting data with .query()
  10. Explore financials with .query()
  11. Subsetting rows with .query()
  12. Reshaping data with .melt()
  13. Select the right .melt() arguments
  14. Using .melt() to reshape government data
  15. Using .melt() for stocks vs bond performance
  16. Course wrap-up
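
The ordered merges and reshaping tools above can be sketched as follows (made-up economic figures):

    import pandas as pd

    gdp = pd.DataFrame({"date": pd.to_datetime(["2020-01-01", "2020-04-01"]),
                        "gdp": [100.0, 95.0]})
    sp500 = pd.DataFrame({"date": pd.to_datetime(["2020-02-01", "2020-05-01"]),
                          "close": [3200.0, 3000.0]})

    # merge_ordered(): an ordered merge with optional forward-filling
    print(pd.merge_ordered(gdp, sp500, on="date", fill_method="ffill"))

    # merge_asof(): match each left row to the nearest earlier right row
    print(pd.merge_asof(sp500, gdp, on="date"))

    # SQL-style filtering with .query()
    print(gdp.query("gdp > 96"))

    # Unpivot wide data into long format with .melt()
    wide = pd.DataFrame({"year": [2020], "q1": [1.0], "q2": [0.9]})
    print(wide.melt(id_vars="year", var_name="quarter", value_name="value"))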


Introduction to Statistics in Python
Statistics is the study of how to collect, analyze, and draw conclusions from data. It’s a hugely valuable tool that you can use to bring the future into focus and infer the answer to tons of questions. For example, what is the likelihood of someone purchasing your product, how many calls will your support team receive, and how many jeans sizes should you manufacture to fit 95% of the population? In this course, you'll discover how to answer questions like these as you grow your statistical skills and learn how to calculate averages, use scatterplots to show the relationship between numeric values, and calculate correlation. You'll also tackle probability, the backbone of statistical reasoning, and learn how to use Python to conduct a well-designed study to draw your own conclusions from data.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Summary statistics give you the tools you need to boil down massive datasets to reveal the highlights. In this module, you'll explore summary statistics including mean, median, and standard deviation, and learn how to accurately interpret them. You'll also develop your critical thinking skills, allowing you to choose the best summary statistics for your data.


  1. What is statistics?
  2. Descriptive and inferential statistics
  3. Data type classification
  4. Measures of center
  5. Mean and median
  6. Mean vs. median
  7. Measures of spread
  8. Quartiles, quantiles, and quintiles
  9. Variance and standard deviation
  10. Finding outliers using IQR

In this module, you'll learn how to generate random samples and measure chance using probability. You'll work with real-world sales data to calculate the probability of a salesperson being successful. Finally, you’ll use the binomial distribution to model events with binary outcomes.


  1. What are the chances?
  2. With or without replacement?
  3. Calculating probabilities
  4. Sampling deals
  5. Discrete distributions
  6. Creating a probability distribution
  7. Identifying distributions
  8. Expected value vs. sample mean
  9. Continuous distributions
  10. Which distribution?
  11. Data back-ups
  12. Simulating wait times
  13. The binomial distribution
  14. Simulating sales deals
  15. Calculating binomial probabilities
  16. How many sales will be won?
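
Using SciPy (an assumption on tooling; the invented deal numbers are hypothetical), binomial probabilities like those above can be computed as follows:

    import numpy as np
    from scipy.stats import binom

    n, p = 10, 0.3  # 10 deals a week, 30% win rate (hypothetical)

    # Simulate 52 weeks of won-deal counts
    np.random.seed(42)
    weekly_wins = binom.rvs(n, p, size=52)
    print("sample mean:", weekly_wins.mean(), "| expected value:", n * p)

    # P(exactly 3 wins) and P(3 or fewer wins)
    print(binom.pmf(3, n, p))  # probability mass function
    print(binom.cdf(3, n, p))  # cumulative distribution function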

It’s time to explore one of the most important probability distributions in statistics: the normal distribution. You’ll create histograms to plot normal distributions and gain an understanding of the central limit theorem, before expanding your knowledge of statistical functions by adding the Poisson, exponential, and t-distributions to your repertoire.


  1. The normal distribution
  2. Distribution of Amir's sales
  3. Probabilities from the normal distribution
  4. Simulating sales under new market conditions
  5. Which market is better?
  6. The central limit theorem
  7. Visualizing sampling distributions
  8. The CLT in action
  9. The mean of means
  10. The Poisson distribution
  11. Identifying lambda
  12. Tracking lead responses
  13. More probability distributions
  14. Distribution dragging and dropping
  15. Modeling time between leads
  16. The t-distribution
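
A sketch of normal-distribution calculations and the central limit theorem in action (SciPy and the made-up sales figures are assumptions, not the course's own data):

    import numpy as np
    from scipy.stats import norm

    mean, sd = 5000, 2000  # hypothetical sales ~ Normal(5000, 2000)

    print(norm.cdf(7000, mean, sd))      # P(sale < 7000)
    print(1 - norm.cdf(1000, mean, sd))  # P(sale > 1000)
    print(norm.ppf(0.9, mean, sd))       # 90th percentile

    # Central limit theorem: means of many samples look normal
    np.random.seed(0)
    die_rolls = np.random.randint(1, 7, size=(1000, 30))
    sample_means = die_rolls.mean(axis=1)
    print(sample_means.mean())  # close to 3.5, the die's expected value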

In this module, you'll learn how to quantify the strength of a linear relationship between two variables, and explore how confounding variables can affect the relationship between two other variables. You'll also see how a study’s design can influence its results, change how the data should be analyzed, and potentially affect the reliability of your conclusions.


  1. Correlation
  2. Guess the correlation
  3. Relationships between variables
  4. Correlation caveats
  5. What can't correlation measure?
  6. Transforming variables
  7. Does sugar improve happiness?
  8. Confounders
  9. Design of experiments
  10. Study types
  11. Longitudinal vs. cross-sectional studies
  12. Course Wrap up!


Introduction to Data Visualization with Matplotlib 
Visualizing data in plots and figures exposes the underlying patterns in the data and provides insights. Good visualizations also help you communicate your data to others, and are useful to data analysts and other consumers of the data. In this course, you will learn how to use Matplotlib, a powerful Python data visualization library. Matplotlib provides the building blocks to create rich visualizations of many different kinds of datasets. You will learn how to create visualizations for different kinds of data and how to customize, automate, and share these visualizations.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


This module introduces the Matplotlib visualization library and demonstrates how to use it with data.


  1. Introduction to data visualization with Matplotlib
  2. Using the matplotlib.pyplot interface
  3. Adding data to an Axes object
  4. Customizing your plots
  5. Customizing data appearance
  6. Customizing axis labels and adding titles
  7. Small multiples
  8. Creating a grid of subplots
  9. Creating small multiples with plt.subplots
  10. Small multiples with shared y axis

Time series data is data that is recorded over time. Visualizing this type of data helps clarify trends and illuminates relationships within the data.


  1. Plotting time-series data
  2. Read data with a time index
  3. Plot time-series data
  4. Using a time index to zoom in
  5. Plotting time-series with different variables
  6. Plotting two variables
  7. Defining a function that plots time-series data
  8. Using a plotting function
  9. Annotating time-series data
  10. Annotating a plot of time-series data
  11. Plotting time-series: putting it all together

Visualizations can be used to compare data in a quantitative manner. This module explains several methods for quantitative visualizations.


  1. Quantitative comparisons: bar-charts
  2. Bar chart
  3. Stacked bar chart
  4. Quantitative comparisons: histograms
  5. Creating histograms
  6. "Step" histogram
  7. Statistical plotting
  8. Adding error-bars to a bar chart
  9. Adding error-bars to a plot
  10. Creating boxplots
  11. Quantitative comparisons: scatter plots
  12. Simple scatter plot
  13. Encoding time by color

This module shows you how to share your visualizations with others: how to save your figures as files, how to adjust their look and feel, and how to automate their creation based on input data.


  1. Preparing your figures to share with others
  2. Selecting a style for printing
  3. Switching between styles
  4. Saving your visualizations
  5. Saving a file several times
  6. Save a figure with different sizes
  7. Automating figures from data
  8. Unique values of a column
  9. Automate your visualization
  10. Where to go next


Introduction to Data Visualization with Seaborn
Create Your Own Seaborn Plots
Seaborn is a powerful Python library that makes it easy to create informative and attractive data visualizations. This 4-hour course provides an introduction to how you can use Seaborn to create a variety of plots, including scatter plots, count plots, bar plots, and box plots, and how you can customize your visualizations.

Turn Real Datasets into Custom Seaborn Visualizations
You’ll explore this library and create your Seaborn plots based on a variety of real-world data sets, including exploring how air pollution in a city changes through the day and looking at what young people like to do in their free time. This data will give you the opportunity to find out about Seaborn’s advantages first hand, including how you can easily create subplots in a single figure and how to automatically calculate confidence intervals.

Improve Your Data Communication Skills
By the end of this course, you’ll be able to use Seaborn in various situations to explore your data and effectively communicate the results of your data analysis to others. These skills are highly sought-after for data analysts, data scientists, and any other job that may involve creating data visualizations. If you’d like to continue your learning, this course is part of several tracks, including the Data Visualization track, where you can add more libraries and techniques to your skillset.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


What is Seaborn, and when should you use it? In this module, you will find out! Plus, you will learn how to create scatter plots and count plots with both lists of data and pandas DataFrames. You will also be introduced to one of the big advantages of using Seaborn - the ability to easily add a third variable to your plots by using color to represent different subgroups.


  1. Introduction to Seaborn
  2. Making a scatter plot with lists
  3. Making a count plot with a list
  4. Using pandas with Seaborn
  5. "Tidy" vs. "untidy" data
  6. Making a count plot with a DataFrame
  7. Adding a third variable with hue
  8. Hue and scatter plots
  9. Hue and count plots
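
In code, the Seaborn basics above look roughly like this (invented study-habits data):

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # "Tidy" data: one row per observation
    df = pd.DataFrame({
        "hours_studied": [1, 2, 3, 4, 5, 6],
        "score": [52, 58, 65, 70, 74, 81],
        "location": ["urban", "rural", "urban", "rural", "urban", "rural"],
    })

    # Scatter plot from a DataFrame; hue adds a third variable by color
    sns.scatterplot(data=df, x="hours_studied", y="score", hue="location")
    plt.show()

    # Count plot of a categorical column
    sns.countplot(data=df, x="location")
    plt.show()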

In this module, you will create and customize plots that visualize the relationship between two quantitative variables. To do this, you will use scatter plots and line plots to explore how the level of air pollution in a city changes over the course of a day and how horsepower relates to fuel efficiency in cars. You will also see another big advantage of using Seaborn - the ability to easily create subplots in a single figure!


  1. Introduction to relational plots and subplots
  2. Creating subplots with col and row
  3. Creating two-factor subplots
  4. Customizing scatter plots
  5. Changing the size of scatter plot points
  6. Changing the style of scatter plot points
  7. Introduction to line plots
  8. Interpreting line plots
  9. Visualizing standard deviation with line plots
  10. Plotting subgroups in line plots

Categorical variables are present in nearly every dataset, but they are especially prominent in survey data. In this module, you will learn how to create and customize categorical plots such as box plots, bar plots, count plots, and point plots. Along the way, you will explore survey data from young people about their interests, students about their study habits, and adult men about their feelings about masculinity.


  1. Count plots and bar plots
  2. Count plots
  3. Bar plots with percentages
  4. Customizing bar plots
  5. Box plots
  6. Create and interpret a box plot
  7. Omitting outliers
  8. Adjusting the whiskers
  9. Point plots
  10. Customizing point plots
  11. Point plots with subgroups

In this final module, you will learn how to add informative plot titles and axis labels, which are one of the most important parts of any data visualization! You will also learn how to customize the style of your visualizations in order to more quickly orient your audience to the key takeaways. Then, you will put everything you have learned together for the final exercises of the course!


  1. Changing plot style and color
  2. Changing style and palette
  3. Changing the scale
  4. Using a custom palette
  5. Adding titles and labels: Part 1
  6. FacetGrids vs. AxesSubplots
  7. Adding a title to a FacetGrid object
  8. Adding titles and labels: Part 2
  9. Adding a title and axis labels
  10. Rotating x-tick labels
  11. Putting it all together
  12. Box plot with subgroups
  13. Bar plot with subgroups and subplots
  14. Wrap up!


Introduction to Functions in Python
It's time to push forward and develop your Python chops even further. Python has tons of fantastic functions and a rich module ecosystem. However, as a data professional or developer, you'll constantly need to write your own functions to solve problems that are dictated by your data. You will learn the art of function writing in this course. You'll come out of this course being able to write your very own custom functions, complete with multiple parameters and multiple return values, along with default arguments and variable-length arguments. You'll gain insight into scoping in Python, be able to write lambda functions, and handle errors in your function writing practice. You'll wrap up each module by using your new skills to write functions that analyze Twitter data.

3 Modules | 4+ Hours | 3 Skills

Course Modules 


You'll learn how to write simple functions, as well as functions that accept multiple arguments and return multiple values. You'll also have the opportunity to apply these new skills to questions commonly encountered by data professionals and developers.


  1. User-defined functions
  2. Strings in Python
  3. Recapping built-in functions
  4. Write a simple function
  5. Single-parameter functions
  6. Functions that return single values
  7. Multiple parameters and return values
  8. Functions with multiple parameters
  9. A brief introduction to tuples
  10. Functions that return multiple values
  11. Bringing it all together
  12. Bringing it all together (1)
  13. Bringing it all together (2)

You'll learn to write functions with default arguments so that the user doesn't always need to specify them, and variable-length arguments so they can pass an arbitrary number of arguments on to your functions. You'll also learn about the essential concept of scope.


  1. Scope and user-defined functions
  2. Pop quiz on understanding scope
  3. The keyword global
  4. Python's built-in scope
  5. Nested functions
  6. Nested Functions I
  7. Nested Functions II
  8. The keyword nonlocal and nested functions
  9. Default and flexible arguments
  10. Functions with one default argument
  11. Functions with multiple default arguments
  12. Functions with variable-length arguments (*args)
  13. Functions with variable-length keyword arguments (**kwargs)
  14. Bringing it all together
  15. Bringing it all together (1)
  16. Bringing it all together (2)
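
Default arguments, flexible arguments, and scope can be sketched like this (the function names are invented for illustration):

    # A default argument plus *args and **kwargs
    def report(*args, sep=", ", **kwargs):
        """Join positional args; append key=value pairs from kwargs."""
        parts = [str(a) for a in args]
        parts += [f"{k}={v}" for k, v in kwargs.items()]
        return sep.join(parts)

    print(report(1, 2, 3))                    # 1, 2, 3
    print(report("a", "b", sep=" | ", x=10))  # a | b | x=10

    # Nested functions and scope: nonlocal rebinds the enclosing variable
    def make_counter():
        n = 0
        def tick():
            nonlocal n
            n += 1
            return n
        return tick

    counter = make_counter()
    print(counter(), counter())  # 1 2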

Learn about lambda functions, which allow you to write functions quickly and on the fly. You'll also practice handling errors in your functions, which is an essential skill. Then, apply your new skills to answer data science questions.


  1. Lambda functions
  2. Pop quiz on lambda functions
  3. Writing a lambda function you already know
  4. Map() and lambda functions
  5. Filter() and lambda functions
  6. Reduce() and lambda functions
  7. Introduction to error handling
  8. Pop quiz about errors
  9. Error handling with try-except
  10. Error handling by raising an error
  11. Bringing it all together
  12. Bringing it all together (1)
  13. Bringing it all together (2)
  14. Bringing it all together (3)
  15. Bringing it all together: testing your error handling skills
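
A minimal sketch of lambdas and error handling as covered above:

    # Lambda functions with map() and filter()
    nums = [1, 2, 3, 4, 5]
    squares = list(map(lambda x: x ** 2, nums))
    evens = list(filter(lambda x: x % 2 == 0, nums))
    print(squares, evens)

    # reduce() lives in functools in Python 3
    from functools import reduce
    print(reduce(lambda a, b: a * b, nums))  # 120

    # Error handling: try/except, and raising your own errors
    def sqrt(x):
        if x < 0:
            raise ValueError("x must be non-negative")
        return x ** 0.5

    try:
        sqrt(-1)
    except ValueError as err:
        print("caught:", err)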


Python Toolbox
In this Python Toolbox course, you'll continue to build more advanced Python skills. First, you'll learn about iterators, objects you have already encountered in the context of for loops. You'll then learn about list comprehensions, which are extremely handy tools for all data professionals and developers working in Python. You'll end the course by working through a case study in which you'll apply all the techniques you learned in both parts of this course.

3 Modules | 4+ Hours | 3 Skills

Course Modules 


You'll learn all about iterators and iterables, which you have already worked with when writing for loops. You'll learn some handy functions that will allow you to effectively work with iterators. And you’ll finish the module with a use case that is pertinent to the world of data science and dealing with large amounts of data—in this case, data from Twitter that you will load in chunks using iterators.


  1. Introduction to iterators
  2. Iterators vs. Iterables
  3. Iterating over iterables (1)
  4. Iterating over iterables (2)
  5. Iterators as function arguments
  6. Playing with iterators
  7. Using enumerate
  8. Using zip
  9. Using * and zip to 'unzip'
  10. Using iterators to load large files into memory
  11. Processing large amounts of Twitter data
  12. Extracting information for large amounts of Twitter data

In this module, you'll build on your knowledge of iterators and be introduced to list comprehensions, which allow you to create complicated lists—and lists of lists—in one line of code! List comprehensions can dramatically simplify your code and make it more efficient, and will become a vital part of your Python toolbox. You'll then learn about generators, which are extremely helpful when working with large sequences of data that you may not want to store in memory, but instead generate on the fly.


  1. List comprehensions
  2. Write a basic list comprehension
  3. List comprehension over iterables
  4. Writing list comprehensions
  5. Nested list comprehensions
  6. Advanced comprehensions
  7. Using conditionals in comprehensions (1)
  8. Using conditionals in comprehensions (2)
  9. Dict comprehensions
  10. Introduction to generator expressions
  11. List comprehensions vs. generators
  12. Write your own generator expressions
  13. Changing the output in generator expressions
  14. Build a generator
  15. Wrapping up comprehensions and generators.
  16. List comprehensions for time-stamped data
  17. Conditional list comprehensions for time-stamped data
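
The comprehension and generator patterns above, in brief (illustrative values):

    # List comprehension: squares of 0..9
    squares = [n ** 2 for n in range(10)]

    # Conditionals: filter the iterable, or branch on the output
    evens = [n for n in range(10) if n % 2 == 0]
    labels = ["even" if n % 2 == 0 else "odd" for n in range(5)]

    # Dict comprehension
    lengths = {word: len(word) for word in ["data", "science"]}

    # Generator expression: values are produced lazily, one at a time
    lazy_squares = (n ** 2 for n in range(10))
    print(next(lazy_squares), next(lazy_squares))  # 0 1

    # A generator function with yield
    def countdown(n):
        while n > 0:
            yield n
            n -= 1

    print(list(countdown(3)))  # [3, 2, 1]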

This module will allow you to apply your newly acquired skills toward wrangling and extracting meaningful information from a real-world dataset—the World Bank's World Development Indicators. You'll have the chance to write your own functions and list comprehensions as you work with iterators and generators to solidify your Python chops.


  1. Welcome to the case study!
  2. Zipping dictionaries
  3. Writing a function to help you
  4. Using a list comprehension
  5. Turning this all into a DataFrame
  6. Using Python generators for streaming data
  7. Processing data in chunks (1)
  8. Writing a generator to load data in chunks (2)
  9. Writing a generator to load data in chunks (3)
  10. Using pandas' read_csv iterator for streaming data
  11. Writing an iterator to load data in chunks (1)
  12. Writing an iterator to load data in chunks (2)
  13. Writing an iterator to load data in chunks (3)
  14. Writing an iterator to load data in chunks (4)
  15. Writing an iterator to load data in chunks (5)
  16. Final thoughts
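
Chunked loading with pandas follows this pattern (the sketch writes a tiny hypothetical CSV first so it is self-contained):

    import pandas as pd

    pd.DataFrame({"country": ["A", "B", "C", "D"],
                  "value": [1, 2, 3, 4]}).to_csv("sample.csv", index=False)

    # read_csv with chunksize returns an iterator of DataFrames, so a
    # large file never has to fit in memory all at once
    total = 0
    for chunk in pd.read_csv("sample.csv", chunksize=2):
        total += chunk["value"].sum()
    print(total)  # 10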


Exploratory Data Analysis in Python
So you’ve got some interesting data - where do you begin your analysis? This course will cover the process of exploring and analyzing data, from understanding what’s included in a dataset to incorporating exploration findings into a data science workflow.

Using data on unemployment figures and plane ticket prices, you’ll leverage Python to summarize and validate data, calculate, identify and replace missing values, and clean both numerical and categorical values. Throughout the course, you’ll create beautiful Seaborn visualizations to understand variables and their relationships.

For example, you’ll examine how alcohol use and student performance are related. Finally, the course will show how exploratory findings feed into data science workflows by creating new features, balancing categorical features, and generating hypotheses from findings.

By the end of this course, you’ll have the confidence to perform your own exploratory data analysis (EDA) in Python. You’ll be able to explain your findings visually to others and suggest the next steps for gathering insights from your data!

4 Modules | 5+ Hours | 4 Skills

Course Modules 


What's the best way to approach a new dataset? Learn to validate and summarize categorical and numerical data and create Seaborn visualizations to communicate your findings.


  1. Initial exploration
  2. Functions for initial exploration
  3. Counting categorical values
  4. Global unemployment in 2021
  5. Data validation
  6. Detecting data types
  7. Validating continents
  8. Validating range
  9. Data summarization
  10. Summaries with .groupby() and .agg()
  11. Named aggregations
  12. Visualizing categorical summaries

Exploring and analyzing data often means dealing with missing values, incorrect data types, and outliers. In this module, you’ll learn techniques to handle these issues and streamline your EDA processes!


  1. Addressing missing data
  2. Dealing with missing data
  3. Strategies for remaining missing data
  4. Imputing missing plane prices
  5. Converting and analyzing categorical data
  6. Finding the number of unique values
  7. Flight duration categories
  8. Adding duration categories
  9. Working with numeric data
  10. Flight duration
  11. Adding descriptive statistics
  12. Handling outliers
  13. What to do with outliers
  14. Identifying outliers
  15. Removing outliers

Variables in datasets don't exist in a vacuum; they have relationships with each other. In this module, you'll look at relationships across numerical, categorical, and even DateTime data, exploring the direction and strength of these relationships as well as ways to visualize them.


  1. Patterns over time
  2. Importing DateTime data
  3. Updating data type to DateTime
  4. Visualizing relationships over time
  5. Correlation
  6. Interpreting a heatmap
  7. Visualizing variable relationships
  8. Visualizing multiple variable relationships
  9. Factor relationships and distributions
  10. Categorical data in scatter plots
  11. Exploring with KDE plots

Exploratory data analysis is a crucial step in the data science workflow, but it isn't the end! Now it's time to learn techniques and considerations you can use to successfully move forward with your projects after you've finished exploring!


  1. Considerations for categorical data
  2. Checking for class imbalance
  3. Cross-tabulation
  4. Generating new features
  5. Extracting features for correlation
  6. Calculating salary percentiles
  7. Categorizing salaries
  8. Generating hypotheses
  9. Comparing salaries
  10. Choosing a hypothesis
  11. Recap!


Working with Categorical Data in Python
Being able to understand, use, and summarize non-numerical data—such as a person’s blood type or marital status—is a vital component of being a data scientist. In this course, you’ll learn how to manipulate and visualize categorical data using pandas and seaborn. Through hands-on exercises, you’ll get to grips with pandas' categorical data type, including how to create, delete, and update categorical columns. You’ll also work with a wide range of datasets including the characteristics of adoptable dogs, Las Vegas trip reviews, and census data to develop your skills at working with categorical data.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Almost every dataset contains categorical information—and often it’s an unexplored goldmine of information. In this module, you’ll learn how pandas handles categorical columns using the data type category. You’ll also discover how to group data by categories to unearth great summary statistics.


  1. Course introduction
  2. Categorical vs. numerical
  3. Exploring a target variable
  4. Ordinal categorical variables
  5. Categorical data in pandas
  6. Setting dtypes and saving memory
  7. Creating a categorical pandas Series
  8. Setting dtype when reading data
  9. Grouping data by category in pandas
  10. Create lots of groups
  11. Setting up a .groupby() statement
  12. Using pandas functions effectively
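
The category dtype and grouped summaries can be sketched as follows (invented dog breeds):

    import pandas as pd

    dogs = pd.DataFrame({"breed": ["poodle", "beagle", "poodle", "pug"] * 1000})

    # Converting to the category dtype can drastically cut memory usage
    print(dogs["breed"].nbytes)
    dogs["breed"] = dogs["breed"].astype("category")
    print(dogs["breed"].nbytes)

    # An ordered categorical encodes a ranking, e.g. sizes
    size_type = pd.CategoricalDtype(categories=["small", "medium", "large"],
                                    ordered=True)
    sizes = pd.Series(["small", "large", "medium"], dtype=size_type)
    print(sizes.sort_values())

    # Grouping by a categorical column
    print(dogs.groupby("breed", observed=True).size())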

Now it’s time to learn how to set, add, and remove categories from a Series. You’ll also explore how to update, rename, collapse, and reorder categories, before applying your new skills to clean and access other data within your DataFrame.


  1. Setting category variables
  2. Setting categories
  3. Adding categories
  4. Removing categories
  5. Updating categories
  6. Collapsing categories knowledge check
  7. Renaming categories
  8. Collapsing categories
  9. Reordering categories
  10. Reordering categories in a Series
  11. Using .groupby() after reordering
  12. Cleaning and accessing data
  13. Cleaning variables
  14. Accessing and filtering data

In this module, you’ll use the seaborn Python library to create informative visualizations using categorical data—including categorical plots (catplot), box plots, bar plots, point plots, and count plots. You’ll then learn how to visualize categorical columns and split data across categorical columns to visualize summary statistics of numerical columns.


  1. Introduction to categorical plots using Seaborn
  2. Boxplot understanding
  3. Creating a box plot
  4. Seaborn bar plots
  5. Creating a bar plot
  6. Ordering categories
  7. Bar plot using hue
  8. Point and count plots
  9. Creating a point plot
  10. Creating a count plot
  11. Review catplot() types
  12. Additional catplot() options
  13. One visualization per group
  14. Updating categorical plots

Lastly, you’ll learn how to overcome the common pitfalls of using categorical data. You’ll also grow your data encoding skills as you are introduced to label encoding and one-hot encoding—perfect for helping you prepare your data for use in machine learning algorithms.


  1. Categorical pitfalls
  2. Memory usage knowledge check
  3. Overcoming pitfalls: string issues
  4. Overcoming pitfalls: using NumPy arrays
  5. Label encoding
  6. Create a label encoding and map
  7. Using saved mappings
  8. Creating a Boolean encoding
  9. One-hot encoding
  10. One-hot knowledge check
  11. One-hot encoding specific columns
  12. Wrap-up!
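
Label and one-hot encoding boil down to a few lines (hypothetical coat data):

    import pandas as pd

    dogs = pd.DataFrame({"coat": ["short", "long", "medium", "long"]})
    dogs["coat"] = dogs["coat"].astype("category")

    # Label encoding: each category becomes an integer code
    dogs["coat_code"] = dogs["coat"].cat.codes
    code_map = dict(enumerate(dogs["coat"].cat.categories))
    print(code_map)  # a saved mapping from codes back to labels

    # One-hot encoding: one indicator column per category
    print(pd.get_dummies(dogs[["coat"]], prefix="coat"))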


Data Communication Concepts 
No one enjoys looking at spreadsheets! Bring your data to life. Improve your presentation and learn how to translate technical data into actionable insights.

Learn the Basics of Data Communication
You’ve analyzed your data, run your model, and made your predictions. Now, it's time to bring your data to life! Presenting findings to stakeholders so they can make data-driven decisions is an essential skill for all data scientists. In this course, you’ll learn how to use storytelling to connect with your audience and help them understand the content of your presentation—so they can make the right decisions.

Explore Formats of Data Communication
Through hands-on exercises, you’ll learn the advantages and disadvantages of oral and written formats. You’ll also improve how you translate technical results into compelling stories, using the correct data, visualizations, and in-person presentation techniques. Start learning and improve your data storytelling today.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Let's start with the importance of data storytelling and the elements you need to tell stories with data. You'll learn best practices to influence how decisions are made before learning how to translate technical results into stories for non-technical stakeholders.


  1. Fundamentals of storytelling
  2. The story begins
  3. Building a story
  4. Translating technical results
  5. A non-tech story
  6. Be aware
  7. Impacting the decision-making process
  8. Is it a true story?
  9. Structured to impact
  10. A story to compare

Deepen your storytelling knowledge. Learn how to avoid common mistakes when telling stories with data by tailoring your presentations to your audience. Then learn best practices for including visualizations and choosing between oral or written formats to make sure your presentations pack a punch!


  1. Selecting the right data
  2. The truth about salaries
  3. Earning interests
  4. Showing relevant statistics
  5. Salary variation
  6. On a payroll
  7. It's not significant
  8. Visualizations for different audiences
  9. Salary development
  10. Salary on demand
  11. Choosing the appropriate format
  12. A communication problem
  13. Should we meet?
  14. When in doubt

Now that you understand how to prepare for communicating findings, it’s time to learn how to structure your reports. You'll also learn the importance of reproducibility (work smarter, not harder) and how to get to the point when describing your findings. You’ll then get to apply all you’ve learned to a real-world use case as you create a compelling report on credit risk.


  1. Types of reports
  2. Something to report
  3. In summary
  4. Reproducibility and references
  5. Replicate me
  6. Same results
  7. Write precise and clear reports
  8. Half-empty glass
  9. Strong words
  10. Case study: report on credit risk
  11. Credit me
  12. Report my credit

You'll finish by learning simple techniques to structure a presentation, communicate insights, and inspire your audience to take action. Lastly, you'll learn how to improve your communication style and prepare to handle questions from your audience.


  1. Planning an oral presentation
  2. Is this the plan?
  3. An effective plan!
  4. Building presentation slides
  5. A color building
  6. Too much text
  7. The right building
  8. Delivering the presentation
  9. Put it into practice
  10. Best practice
  11. Avoiding common errors
  12. The true mistake
  13. Do's and don'ts
  14. Congratulations!


Introduction to Importing Data in Python 
As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.

3 Modules | 4+ Hours | 3 Skills

Course Modules 


In this module, you'll learn how to import data into Python from all types of flat files, which are a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas—now you'll use these packages to import flat files and customize your imports.
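
As a flavor of the two import styles, assuming a local file named data.csv with a header row:

    import numpy as np
    import pandas as pd

    # NumPy handles purely numeric flat files
    arr = np.loadtxt("data.csv", delimiter=",", skiprows=1)

    # pandas handles mixed dtypes and infers the header
    df = pd.read_csv("data.csv")
    print(df.head())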


  1. Welcome to the course!
  2. Importing entire text files
  3. Importing text files line by line
  4. The importance of flat files in data science
  5. Pop quiz: what exactly are flat files?
  6. Why we like flat files and the Zen of Python
  7. Importing flat files using NumPy
  8. Using NumPy to import flat files
  9. Customizing your NumPy import
  10. Importing different datatypes
  11. Importing flat files using pandas
  12. Using pandas to import flat files as DataFrames (1)
  13. Using pandas to import flat files as DataFrames (2)
  14. Customizing your pandas import
  15. Final thoughts on data import

You've learned how to import flat files, but there are many other file types you will potentially have to work with as a data scientist. In this module, you'll learn how to import data into Python from a wide array of important file types. These include pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files, a file type for storing large quantities of numerical data, and MATLAB files.
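
Here is a sketch of the import patterns this module practices; all filenames below are placeholders:

    import pickle
    import pandas as pd

    with open("data.pkl", "rb") as file:       # pickled Python objects
        obj = pickle.load(file)

    xls = pd.ExcelFile("spreadsheet.xlsx")     # Excel: list sheets, then parse one
    print(xls.sheet_names)
    df = xls.parse(xls.sheet_names[0])

    sas_df = pd.read_sas("records.sas7bdat")   # SAS
    stata_df = pd.read_stata("records.dta")    # Stata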


  1. Introduction to other file types
  2. Not so flat any more
  3. Loading a pickled file
  4. Listing sheets in Excel files
  5. Importing sheets from Excel files
  6. Customizing your spreadsheet import
  7. Importing SAS/Stata files using pandas
  8. How to import SAS7BDAT
  9. Importing SAS files
  10. Using read_stata to import Stata files
  11. Importing Stata files
  12. Importing HDF5 files
  13. Using File to import HDF5 files
  14. Using h5py to import HDF5 files
  15. Extracting data from your HDF5 file
  16. Importing MATLAB files
  17. Loading .mat files
  18. The structure of .mat in Python

In this module, you'll learn how to extract meaningful data from relational databases, an essential skill for any data scientist. You will learn about relational models, how to create SQL queries, how to filter and order your SQL records, and how to perform advanced queries by joining database tables.
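
A minimal sketch of the workflow, assuming a local SQLite file and an illustrative table name:

    import pandas as pd
    from sqlalchemy import create_engine, inspect

    # The database file and the "Album" table are assumptions for this sketch
    engine = create_engine("sqlite:///example.sqlite")
    print(inspect(engine).get_table_names())

    # pandas can run a query and return the result as a DataFrame
    df = pd.read_sql_query("SELECT * FROM Album ORDER BY Title", engine)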


  1. Introduction to relational databases
  2. Pop quiz: The relational model
  3. Creating a database engine in Python
  4. Creating a database engine
  5. What are the tables in the database?
  6. Querying relational databases in Python
  7. The Hello World of SQL Queries!
  8. Customizing the Hello World of SQL Queries
  9. Filtering your database records using SQL's WHERE
  10. Ordering your SQL records with ORDER BY
  11. Querying relational databases directly with pandas
  12. Pandas and The Hello World of SQL Queries!
  13. Pandas for more complex querying
  14. Advanced querying: exploiting table relationships
  15. The power of SQL lies in relationships between tables: INNER JOIN
  16. Filtering your INNER JOIN
  17. Final Thoughts


Cleaning Data in Python 
Discover How to Clean Data in Python
It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. Data cleaning is an essential step for every data scientist, as analyzing dirty data can lead to inaccurate conclusions.

In this course, you will learn how to identify, diagnose, and treat various data cleaning problems in Python, ranging from simple to advanced. You will deal with improper data types, check that your data is in the correct range, handle missing data, perform record linkage, and more!

Learn How to Clean Different Data Types
The first module of the course explores common data problems and how you can fix them. You will first understand basic data types and how to deal with them individually. After, you'll apply range constraints and remove duplicated data points.

The last module explores record linkage, a powerful tool to merge multiple datasets. You'll learn how to link records by calculating the similarity between strings. Finally, you'll use your new skills to join two restaurant review datasets into one clean master dataset.

Gain Confidence in Cleaning Data
By the end of the course, you will gain the confidence to clean data from various types and use record linkage to merge multiple datasets. Cleaning data is an essential skill for data scientists. If you want to learn more about cleaning data in Python and its applications, check out the following tracks: Data Scientist with Python and Importing & Cleaning Data with Python.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


In this module, you'll learn how to overcome some of the most common dirty data problems. You'll convert data types, apply range constraints to remove future data points, and remove duplicated data points to avoid double-counting.
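
Here's a compact sketch of all three fixes on invented data:

    import pandas as pd

    df = pd.DataFrame({
        "revenue": ["10$", "25$", "25$"],
        "signup": ["2030-01-01", "2020-05-02", "2020-05-02"],
    })

    # Fix the data type: strip the symbol, then convert
    df["revenue"] = df["revenue"].str.strip("$").astype("int")

    # Range constraint: drop rows with dates in the future
    df["signup"] = pd.to_datetime(df["signup"])
    df = df[df["signup"] <= pd.Timestamp.today()]

    # Remove exact duplicates to avoid double-counting
    df = df.drop_duplicates()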


  1. Data type constraints
  2. Common data types
  3. Numeric data or ... ?
  4. Summing strings and concatenating numbers
  5. Data range constraints
  6. Tire size constraints
  7. Back to the future
  8. Uniqueness constraints
  9. How big is your subset?
  10. Finding duplicates
  11. Treating duplicates

Categorical and text data can often be some of the messiest parts of a dataset due to their unstructured nature. In this module, you’ll learn how to fix whitespace and capitalization inconsistencies in category labels, collapse multiple categories into one, and reformat strings for consistency.


  1. Membership constraints
  2. Members only
  3. Finding consistency
  4. Categorical variables
  5. Categories of errors
  6. Inconsistent categories
  7. Remapping categories
  8. Cleaning text data
  9. Removing titles and taking names
  10. Keeping it descriptive

In this module, you’ll dive into more advanced data cleaning problems, such as ensuring that weights are all written in kilograms instead of pounds. You’ll also gain invaluable skills that will help you verify that values have been added correctly and that missing values don’t negatively impact your analyses.


  1. Uniformity
  2. Ambiguous dates
  3. Uniform currencies
  4. Uniform dates
  5. Cross field validation
  6. Cross field or no cross field?
  7. How's our data integrity?
  8. Completeness
  9. Is this missing at random?
  10. Missing investors
  11. Follow the money

Record linkage is a powerful technique used to merge multiple datasets together, used when values have typos or different spellings. In this module, you'll learn how to link records by calculating the similarity between strings—you’ll then use your new skills to join two restaurant review datasets into one clean master dataset.
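
As a taste of the idea, here is a string-similarity score built from Python's standard difflib; the course itself uses dedicated record-linkage tooling, so treat this as an illustration of the concept only:

    from difflib import SequenceMatcher

    # A similarity score between 0 and 1; linkage keeps pairs above a cutoff
    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    print(similarity("Burger King", "burger king restaurant"))  # high
    print(similarity("Burger King", "Taco Bell"))               # low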


  1. Comparing strings
  2. Minimum edit distance
  3. The cutoff point
  4. Remapping categories II
  5. Generating pairs
  6. To link or not to link?
  7. Pairs of restaurants
  8. Similar restaurants
  9. Linking DataFrames
  10. Getting the right index
  11. Linking them together!
  12. Course Wrap up!


Working with Dates and Times in Python

You'll probably never have a time machine, but how about a machine for analyzing time? As soon as time enters any analysis, things can get weird. It's easy to get tripped up on day and month boundaries, time zones, daylight saving time, and all sorts of other things that can confuse the unprepared. If you're going to do any kind of analysis involving time, you'll want to use Python to sort it out. Working with datasets on hurricanes and bike trips, we'll cover counting events, figuring out how much time has elapsed between events, and plotting data over time. You'll work in both standard Python and Pandas, and we'll touch on the dateutil library, the only timezone library endorsed by the official Python documentation. After this course, you'll confidently handle date and time data in any format like a champion.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Hurricanes (also known as cyclones or typhoons) hit the U.S. state of Florida several times per year. To start off this course, you'll learn how to work with date objects in Python, starting with the dates of every hurricane to hit Florida since 1950. You'll learn how Python handles dates, common date operations, and the right way to format dates to avoid confusion.
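
A quick sketch of these basics, using an illustrative landfall date:

    from datetime import date

    landfall = date(2016, 10, 7)
    print(landfall.weekday())                   # 0 = Monday ... 6 = Sunday
    print((date(2017, 6, 21) - landfall).days)  # subtracting dates gives a timedelta
    print(landfall.isoformat())                 # unambiguous YYYY-MM-DD formatting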


  1. Dates in Python
  2. Which day of the week?
  3. How many hurricanes come early?
  4. Math with dates
  5. Subtracting dates
  6. Counting events per calendar month
  7. Putting a list of dates in order
  8. Turning dates into strings
  9. Printing dates in a friendly format
  10. Representing dates in different ways

Bike sharing programs have swept through cities around the world -- and luckily for us, every trip gets recorded! Working with all of the comings and goings of one bike in Washington, D.C., you'll practice working with dates and times together. You'll parse dates and times from text, analyze peak trip times, calculate ride durations, and more.
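
The core parsing and formatting moves look like this (the timestamps are invented):

    from datetime import datetime

    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime("2017-10-01 15:23:25", fmt)  # parse from text
    end = datetime.strptime("2017-10-01 15:49:55", fmt)

    print((end - start).total_seconds())    # ride duration in seconds
    print(start.strftime("%a %d %b %Y"))    # format back to a friendly string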


  1. Dates and times
  2. Creating datetimes by hand
  3. Counting events before and after noon
  4. Printing and parsing datetimes
  5. Turning strings into datetimes
  6. Parsing pairs of strings as datetimes
  7. Recreating ISO format with strftime()
  8. Unix timestamps
  9. Working with durations
  10. Turning pairs of datetimes into durations
  11. Average trip time
  12. The long and the short of why time is hard

In this module, you'll learn to confidently tackle the time-related topic that causes people the most trouble: time zones and daylight saving. Continuing with our bike data, you'll learn how to compare clocks around the world, how to gracefully handle "spring forward" and "fall back," and how to get up-to-date timezone data from the dateutil library.
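
A small sketch of UTC offsets and named time zones via dateutil:

    from datetime import datetime, timedelta, timezone
    from dateutil import tz

    # A timezone-aware datetime with a fixed UTC offset
    ride = datetime(2017, 10, 1, 15, 23, 25,
                    tzinfo=timezone(timedelta(hours=-4)))
    print(ride.astimezone(timezone.utc))

    # dateutil looks up full rules (including daylight saving) by zone name
    et = tz.gettz("America/New_York")
    print(ride.astimezone(et))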


  1. UTC offsets
  2. Creating timezone aware datetimes
  3. Setting timezones
  4. What time did the bike leave in UTC?
  5. Time zone database
  6. Putting the bike trips into the right time zone
  7. What time did the bike leave? (Global edition)
  8. Starting daylight saving time
  9. How many hours elapsed around daylight saving?
  10. March 29, throughout a decade
  11. Ending daylight saving time
  12. Finding ambiguous datetimes
  13. Cleaning daylight saving data with fold

To conclude this course, you'll apply everything you've learned about working with dates and times in standard Python to working with dates and times in Pandas. With additional information about each bike ride, such as what station it started and stopped at and whether or not the rider had a yearly membership, you'll be able to dig much more deeply into the bike trip data. In this module, you'll cover powerful Pandas operations, such as grouping and plotting results by time.
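
For example, assuming a hypothetical CSV with "Start date" and "End date" columns:

    import pandas as pd

    # Filename and column names are placeholders for a bike-trip dataset
    rides = pd.read_csv("bike_trips.csv",
                        parse_dates=["Start date", "End date"])
    rides["Duration"] = rides["End date"] - rides["Start date"]

    # Average duration per month using resample()
    print(rides.resample("M", on="Start date")["Duration"].mean())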


  1. Reading date and time data in Pandas
  2. Loading a csv file in Pandas
  3. Making timedelta columns
  4. Summarizing datetime data in Pandas
  5. How many joyrides?
  6. It's getting cold outside, W20529
  7. Members vs casual riders over time
  8. Combining groupby() and resample()
  9. Additional datetime methods in Pandas
  10. Timezones in Pandas
  11. How long per weekday?
  12. How long between rides?
  13. Wrap-up


Writing Functions in Python

You've done your analysis, built your report, and trained a model. What's next? Well, if you want to deploy your model into production, your code will need to be more reliable than exploratory scripts in a Jupyter notebook. Writing Functions in Python will give you a strong foundation in writing complex and beautiful functions so that you can contribute research and engineering skills to your team. You'll learn useful tricks, like how to write context managers and decorators. You'll also learn best practices around how to write maintainable reusable functions with good documentation. They say that people who can do good research and write high-quality code are unicorns. Take this course and discover the magic.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


The goal of this course is to transform you into a Python expert, and so the first module starts off with best practices when writing functions. You'll cover docstrings, why they matter, and how to know when you need to turn a chunk of code into a function. You will also learn the details of how Python passes arguments to functions, as well as some common gotchas that can cause debugging headaches when calling functions.
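
A docstring in the Google style covered here might look like this sketch:

    def count_words(text):
        """Count the words in a string.

        Args:
            text (str): The string to split into words.

        Returns:
            int: The number of whitespace-separated words.
        """
        return len(text.split())

    # Docstrings travel with the function and are retrievable at runtime
    print(count_words.__doc__)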


  1. Docstrings
  2. Crafting a docstring
  3. Retrieving docstrings
  4. Docstrings to the rescue!
  5. DRY and "Do One Thing"
  6. Extract a function
  7. Split up a function
  8. Pass by assignment
  9. Mutable or immutable?
  10. Best practice for default arguments

If you've ever seen the "with" keyword in Python and wondered what its deal was, then this is the module for you! Context managers are a convenient way to provide connections in Python and guarantee that those connections get cleaned up when you are done using them. This module will show you how to use context managers, as well as how to write your own.
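
Here is a minimal example of writing your own context manager with contextlib (a timer, echoing the module's exercises):

    import time
    from contextlib import contextmanager

    @contextmanager
    def timer():
        start = time.time()
        yield                                   # the with-block body runs here
        print(f"Elapsed: {time.time() - start:.2f}s")

    with timer():
        sum(range(10_000_000))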


  1. Using context managers
  2. The number of cats
  3. The speed of cats
  4. Writing context managers
  5. The timer() context manager
  6. A read-only open() context manager
  7. Advanced topics
  8. Context manager use cases
  9. Scraping the NASDAQ
  10. Changing the working directory

Decorators are an extremely powerful concept in Python. They allow you to modify the behavior of a function without changing the code of the function itself. This module will lay the foundational concepts needed to thoroughly understand decorators (functions as objects, scope, and closures), and give you a good introduction into how decorators are used and defined. This deep dive into Python internals will set you up to be a superstar Pythonista.
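
The essential pattern, in a toy sketch:

    def double_args(func):
        # The wrapper calls the original function with modified arguments
        def wrapper(a, b):
            return func(a * 2, b * 2)
        return wrapper

    @double_args
    def multiply(a, b):
        return a * b

    print(multiply(1, 5))  # 20: both arguments were doubled first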


  1. Functions are objects
  2. Building a command line data app
  3. Reviewing your co-worker's code
  4. Returning functions for a math game
  5. Scope
  6. Understanding scope
  7. Modifying variables outside local scope
  8. Closures
  9. Checking for closure
  10. Closures keep your values safe
  11. Decorators
  12. Using decorator syntax
  13. Defining a decorator

Now that you understand how decorators work under the hood, this module gives you a bunch of real-world examples of when and how you would write decorators in your own code. You will also learn advanced decorator concepts like how to preserve the metadata of your decorated functions and how to write decorators that take arguments.
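
For instance, a run_n_times()-style decorator factory (named after the exercise above; the implementation here is a sketch) that also preserves metadata:

    from functools import wraps

    def run_n_times(n):
        def decorator(func):
            @wraps(func)                 # keep the decorated function's metadata
            def wrapper(*args, **kwargs):
                for _ in range(n):
                    result = func(*args, **kwargs)
                return result
            return wrapper
        return decorator

    @run_n_times(3)
    def greet():
        """Print a greeting."""
        print("hello")

    greet()                      # prints "hello" three times
    print(greet.__doc__)         # metadata survived thanks to @wraps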


  1. Real-world examples
  2. Print the return type
  3. Counter
  4. Decorators and metadata
  5. Preserving docstrings when decorating functions
  6. Measuring decorator overhead
  7. Decorators that take arguments
  8. Run_n_times()
  9. HTML Generator
  10. Timeout(): a real world example
  11. Tag your functions
  12. Check the return type
  13. Great job!


Introduction to Regression with statsmodels in Python

Use Python statsmodels For Linear and Logistic Regression
Linear regression and logistic regression are two of the most widely used statistical models. They act like master keys, unlocking the secrets hidden in your data. In this course, you’ll gain the skills to fit simple linear and logistic regressions.

Through hands-on exercises, you’ll explore the relationships between variables in real-world datasets, including motor insurance claims, Taiwan house prices, fish sizes, and more.

Discover How to Make Predictions and Assess Model Fit
You’ll start this 4-hour course by learning what regression is, how linear and logistic regression differ, and how to apply both. Next, you’ll learn how to use linear regression models to make predictions on data while also understanding model objects.

As you progress, you’ll learn how to assess the fit of your model, and how to know how well your linear regression model fits. Finally, you’ll dig deeper into logistic regression models to make predictions on real data.

Learn the Basics of Python Regression Analysis
By the end of this course, you’ll know how to make predictions from your data, quantify model performance, and diagnose problems with model fit. You’ll understand how to use Python statsmodels for regression analysis and be able to apply the skills to real-life data sets.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


You’ll learn the basics of this popular statistical model, what regression is, and how linear and logistic regressions differ. You’ll then learn how to fit simple linear regression models with numeric and categorical explanatory variables, and how to describe the relationship between the response and explanatory variables using model coefficients.
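
A minimal sketch of the formula interface, with invented fish measurements:

    import pandas as pd
    from statsmodels.formula.api import ols

    fish = pd.DataFrame({"mass_g": [110, 390, 725, 1000],
                         "length_cm": [20, 29, 35, 40]})

    # "response ~ explanatory" is the key idea of the formula string
    model = ols("mass_g ~ length_cm", data=fish).fit()
    print(model.params)   # intercept and slope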


  1. A tale of two variables
  2. Which one is the response variable?
  3. Visualizing two numeric variables
  4. Fitting a linear regression
  5. Estimate the intercept
  6. Estimate the slope
  7. Linear regression with ols()
  8. Categorical explanatory variables
  9. Visualizing numeric vs. categorical
  10. Calculating means by category
  11. Linear regression with a categorical explanatory variable

In this module, you’ll discover how to use linear regression models to make predictions on Taiwanese house prices and Facebook advert clicks. You’ll also grow your regression skills as you get hands-on with model objects, understand the concept of "regression to the mean", and learn how to transform variables in a dataset.


  1. Making predictions
  2. Predicting house prices
  3. Visualizing predictions
  4. The limits of prediction
  5. Working with model objects
  6. Extracting model elements
  7. Manually predicting house prices
  8. Regression to the mean
  9. Home run!
  10. Plotting consecutive portfolio returns
  11. Modeling consecutive returns
  12. Transforming variables
  13. Transforming the explanatory variable
  14. Transforming the response variable too
  15. Back transformation

In this module, you’ll learn how to ask questions of your model to assess fit. You’ll learn how to quantify how well a linear regression model fits, diagnose model problems using visualizations, and understand each observation's leverage and influence to create the model.


  1. Quantifying model fit
  2. Coefficient of determination
  3. Residual standard error
  4. Visualizing model fit
  5. Residuals vs. fitted values
  6. Q-Q plot of residuals
  7. Scale-location
  8. Drawing diagnostic plots
  9. Outliers, leverage, and influence
  10. Leverage
  11. Influence
  12. Extracting leverage and influence

Learn to fit logistic regression models. Using real-world data, you’ll predict the likelihood of a customer closing their bank account as probabilities of success and odds ratios, and quantify model performance using confusion matrices.
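
logit() mirrors the ols() formula interface; a sketch with invented churn data:

    import pandas as pd
    from statsmodels.formula.api import logit

    churn = pd.DataFrame({
        "has_churned": [0, 1, 0, 1, 0, 1, 0, 1],
        "time_since_last_purchase": [1, 3, 5, 8, 4, 6, 2, 9],
    })

    model = logit("has_churned ~ time_since_last_purchase", data=churn).fit()
    print(model.params)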


  1. Why you need logistic regression
  2. Exploring the explanatory variables
  3. Visualizing linear and logistic models
  4. Logistic regression with logit()
  5. Predictions and odds ratios
  6. Probabilities
  7. Most likely outcome
  8. Odds ratio
  9. Log odds ratio
  10. Quantifying logistic regression fit
  11. Calculating the confusion matrix
  12. Drawing a mosaic plot of the confusion matrix
  13. Accuracy, sensitivity, specificity
  14. Measuring logistic model performance
  15. Recap


Sampling in Python
Sampling in Python is the cornerstone of inference statistics and hypothesis testing. It's a powerful skill used in survey analysis and experimental design to draw conclusions without surveying an entire population. In this Sampling in Python course, you’ll discover when to use sampling and how to perform common types of sampling—from simple random sampling to more complex methods like stratified and cluster sampling. Using real-world datasets, including coffee ratings, Spotify songs, and employee attrition, you’ll learn to estimate population statistics and quantify uncertainty in your estimates by generating sampling distributions and bootstrap distributions.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Learn what sampling is and why it is so powerful. You’ll also learn about the problems caused by convenience sampling and the differences between true randomness and pseudo-randomness.
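
A sketch of simple random sampling with pandas, on an invented population:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    population = pd.DataFrame({"rating": rng.normal(75, 10, 1000)})

    # A simple random sample of 50 rows, seeded for reproducibility
    sample = population.sample(n=50, random_state=2022)
    print(sample["rating"].mean(), population["rating"].mean())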


  1. Sampling and point estimates
  2. Reasons for sampling
  3. Simple sampling with pandas
  4. Simple sampling and calculating with NumPy
  5. Convenience sampling
  6. Are findings from the sample generalizable?
  7. Are these findings generalizable?
  8. Pseudo-random number generation
  9. Generating random numbers
  10. Understanding random seeds

It’s time to get hands-on and perform the four random sampling methods in Python: simple, systematic, stratified, and cluster.


  1. Simple random and systematic sampling
  2. Simple random sampling
  3. Systematic sampling
  4. Is systematic sampling OK?
  5. Stratified and weighted random sampling
  6. Which sampling method?
  7. Proportional stratified sampling
  8. Equal counts stratified sampling
  9. Weighted sampling
  10. Cluster sampling
  11. Benefits of clustering
  12. Performing cluster sampling
  13. Comparing sampling methods
  14. 3 kinds of sampling
  15. Comparing point estimates

Let’s test your sampling. In this module, you’ll discover how to quantify the accuracy of sample statistics using relative errors, and measure variation in your estimates by generating sampling distributions.


  1. Relative error of point estimates
  2. Calculating relative errors
  3. Relative error vs. sample size
  4. Creating a sampling distribution
  5. Replicating samples
  6. Replication parameters
  7. Approximate sampling distributions
  8. Exact sampling distribution
  9. Generating an approximate sampling distribution
  10. Exact vs. approximate
  11. Standard errors and the Central Limit Theorem
  12. Population & sampling distribution means
  13. Population & sampling distribution variation

You’ll get to grips with resampling to perform bootstrapping and estimate variation in an unknown population. You’ll learn the difference between sampling distributions and bootstrap distributions using resampling.
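
The core bootstrap loop, sketched on invented data:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    sample = pd.Series(rng.normal(3.5, 0.5, 300))   # the one sample you have

    # Resample WITH replacement, same size as the original, many times
    boot_means = [sample.sample(frac=1, replace=True).mean()
                  for _ in range(1000)]

    # A 95% confidence interval from the bootstrap distribution
    print(np.quantile(boot_means, [0.025, 0.975]))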


  1. Introduction to bootstrapping
  2. Principles of bootstrapping
  3. With or without replacement?
  4. Generating a bootstrap distribution
  5. Comparing sampling and bootstrap distributions
  6. Bootstrap statistics and population statistics
  7. Sampling distribution vs. bootstrap distribution
  8. Compare sampling and bootstrap means
  9. Compare sampling and bootstrap standard deviations
  10. Confidence intervals
  11. Confidence interval interpretation
  12. Calculating confidence intervals
  13. Recap!


Hypothesis Testing in Python
Hypothesis testing lets you answer questions about your datasets in a statistically rigorous way. In this course, you'll grow your Python analytical skills as you learn how and when to use common tests like t-tests, proportion tests, and chi-square tests. Working with real-world data, including Stack Overflow user feedback and supply-chain data for medical supply shipments, you'll gain a deep understanding of how these tests work and the key assumptions that underpin them. You'll also discover how non-parametric tests can be used to go beyond the limitations of traditional hypothesis tests.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


How does hypothesis testing work and what problems can it solve? To find out, you’ll walk through the workflow for a one sample proportion test. In doing so, you'll encounter important concepts like z-scores, p-values, and false negative and false positive errors.
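
The arithmetic behind a one-sample proportion test, with made-up numbers:

    import numpy as np
    from scipy.stats import norm

    # H0: p = 0.50; we observed 224 successes out of 400
    p_hat, p_0, n = 224 / 400, 0.50, 400
    std_error = np.sqrt(p_0 * (1 - p_0) / n)
    z_score = (p_hat - p_0) / std_error

    # Two-tailed p-value from the standard normal distribution
    p_value = 2 * (1 - norm.cdf(abs(z_score)))
    print(z_score, p_value)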


  1. Hypothesis tests and z-scores
  2. Uses of A/B testing
  3. Calculating the sample mean
  4. Calculating a z-score
  5. p-values
  6. Criminal trials and hypothesis tests
  7. Left tail, right tail, two tails
  8. Calculating p-values
  9. Statistical significance
  10. Decisions from p-values
  11. Calculating a confidence interval
  12. Type I and type II errors

In this module, you’ll learn how to test for differences in means between two groups using t-tests and extend this to more than two groups using ANOVA and pairwise t-tests.
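
A sketch using scipy.stats (the course may use other packages; the groups below are invented):

    from scipy import stats

    group_a = [12.1, 14.3, 13.8, 12.9, 15.0]
    group_b = [11.2, 12.5, 11.9, 13.1, 12.0]
    print(stats.ttest_ind(group_a, group_b))   # two-sample t-test

    group_c = [10.8, 11.1, 12.2, 10.5, 11.7]
    print(stats.f_oneway(group_a, group_b, group_c))  # one-way ANOVA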


  1. Performing t-tests
  2. Hypothesis testing workflow
  3. Two sample mean test statistic
  4. Calculating p-values from t-statistics
  5. Why is t needed?
  6. The t-distribution
  7. From t to p
  8. Paired t-tests
  9. Is pairing needed?
  10. Visualizing the difference
  11. Using ttest()
  12. ANOVA tests
  13. Visualizing many categories
  14. Conducting an ANOVA test
  15. Pairwise t-tests

Now it’s time to test for differences in proportions between two groups using proportion tests. Through hands-on exercises, you’ll extend your proportion tests to more than two groups with chi-square independence tests, and return to the one sample case with chi-square goodness of fit tests.
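
A sketch of both tests with invented counts:

    import numpy as np
    from scipy.stats import chi2_contingency
    from statsmodels.stats.proportion import proportions_ztest

    # Two-sample proportion test: successes and sample sizes per group
    stat, p = proportions_ztest(count=np.array([45, 30]),
                                nobs=np.array([100, 100]))
    print(stat, p)

    # Chi-square test of independence on a 2x2 contingency table
    table = np.array([[45, 55], [30, 70]])
    chi2, p, dof, expected = chi2_contingency(table)
    print(chi2, p)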


  1. One-sample proportion tests
  2. t for proportions?
  3. Test for single proportions
  4. Two-sample proportion tests
  5. Test of two proportions
  6. proportions_ztest() for two samples
  7. Chi-square test of independence
  8. The chi-square distribution
  9. How many tails for chi-square tests?
  10. Performing a chi-square test
  11. Chi-square goodness of fit tests
  12. Visualizing goodness of fit
  13. Performing a goodness of fit test

Finally, it’s time to learn about the assumptions made by parametric hypothesis tests, and see how non-parametric tests can be used when those assumptions aren't met.
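
Two of the non-parametric tests covered here, sketched on invented measurements:

    from scipy import stats

    before = [4.2, 3.9, 5.1, 4.8, 4.4, 4.0]
    after = [4.0, 3.5, 4.9, 4.6, 4.1, 3.8]

    print(stats.wilcoxon(before, after))        # paired, non-parametric
    print(stats.mannwhitneyu(before, after))    # two independent samples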


  1. Assumptions in hypothesis testing
  2. Common assumptions of hypothesis tests
  3. Testing sample size
  4. Non-parametric tests
  5. Which parametric test?
  6. Wilcoxon signed-rank test
  7. Non-parametric ANOVA and unpaired t-tests
  8. Wilcoxon-Mann-Whitney
  9. Kruskal-Wallis
  10. Recap!


Experimental Design in Python
Implement Experimental Design Setups
Learn how to implement the most appropriate experimental design setup for your use case. Learn about how randomized block designs and factorial designs can be implemented to measure treatment effects and draw valid and precise conclusions.

Conduct Statistical Analyses on Experimental Data
Deep-dive into performing statistical analyses on experimental data, including selecting and conducting statistical tests such as t-tests, ANOVA tests, and chi-square tests of association. Conduct post-hoc analysis following ANOVA tests to discover precisely which pairwise comparisons are significantly different.

Conduct Power Analysis
Learn to measure the effect size to determine the amount by which groups differ, beyond being significantly different. Conduct a power analysis using an assumed effect size to determine the minimum sample size required to obtain a required statistical power. Use Cohen's d formulation to measure the effect size for some sample data, and test whether the effect size assumptions used in the power analysis were accurate.

Address Complexities in Experimental Data
Extract insights from complex experimental data and learn best practices for communicating findings to different stakeholders. Address complexities such as interactions, heteroscedasticity, and confounding in experimental data to improve the validity of your conclusions. When data doesn't meet the assumptions of parametric tests, you'll learn to choose and implement an appropriate nonparametric test.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Building knowledge in experimental design allows you to test hypotheses with best-practice analytical tools and quantify the risk of your work. You’ll begin your journey by setting the foundations of what experimental design is and different experimental design setups such as blocking and stratification. You’ll then learn and apply visual and analytical tests for normality in experimental data.


  1. Setting up experiments
  2. Non-random assignment of subjects
  3. Random assignment of subjects
  4. Experimental data setup
  5. Blocking experimental data
  6. Stratifying an experiment
  7. Which was stratified?
  8. Normal data
  9. Visual normality in an agricultural experiment
  10. Analytical normality in an agricultural experiment

You'll delve into sophisticated experimental design techniques, focusing on factorial designs, randomized block designs, and covariate adjustments. These methodologies are instrumental in enhancing the accuracy, efficiency, and interpretability of experimental results. Through a combination of theoretical insights and practical applications, you'll acquire the skills needed to design, implement, and analyze complex experiments in various fields of research.


  1. Factorial designs: principles and applications
  2. Understanding marketing campaign effectiveness
  3. Heatmap of campaign interactions
  4. Factorial designs and randomized block designs
  5. Randomized block design: controlling variance
  6. Implementing a randomized block design
  7. Visualizing productivity within blocks by incentive
  8. ANOVA within blocks of employees
  9. Covariate adjustment in experimental design
  10. Importance of covariates
  11. Covariate adjustment with chick growth

Master statistical tests like t-tests, ANOVA, and Chi-Square, and dive deep into post-hoc analyses and power analysis essentials. Learn to select the right test, interpret p-values and errors, and skillfully conduct power analysis to determine sample and effect sizes, all while leveraging Python's powerful libraries to bring your data insights to life.
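
For example, a post-hoc Tukey HSD following an ANOVA, on invented group scores:

    import pandas as pd
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    df = pd.DataFrame({
        "score": [5, 6, 7, 9, 10, 11, 4, 5, 5],
        "group": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    })

    # Compares every pair of groups while controlling the family-wise error rate
    tukey = pairwise_tukeyhsd(endog=df["score"], groups=df["group"], alpha=0.05)
    print(tukey.summary())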


  1. Choosing the right statistical test
  2. Choosing the right test: petrochemicals
  3. Choosing the right test: human resources
  4. Choosing the right test: finance
  5. Post-hoc analysis following ANOVA
  6. Anxiety treatments ANOVA
  7. Applying Tukey's HSD
  8. Applying Bonferroni correction
  9. P-values, alpha, and errors
  10. Analyzing toy durability
  11. Visualizing durability differences
  12. Role of significance levels
  13. Power analysis: sample and effect size
  14. Effect size purpose
  15. Estimating required sample size for energy study

Hop into the complexities of experimental data analysis. Learn to synthesize insights using pandas, address data issues like heteroscedasticity with scipy.stats, and apply nonparametric tests like Mann-Whitney U. Learn additional techniques for transforming, visualizing, and interpreting complex data, enhancing your ability to conduct robust analyses in various experimental settings.


  1. Synthesizing insights from complex experiments
  2. Visualizing loan approval yield
  3. Exploring customer satisfaction
  4. Effectively communicating experimental data
  5. Addressing complexities in experimental data
  6. Check for heteroscedasticity in shelf life
  7. Exploring and transforming shelf life data
  8. Applying nonparametric tests in experimental analysis
  9. Visualizing and testing preservation methods
  10. Further analyzing food preservation techniques
  11. Recap!


Supervised Learning with scikit-learn
Grow your machine learning skills with scikit-learn and discover how to use this popular Python library to train models using labeled data. In this course, you'll learn how to make powerful predictions, such as whether a customer will churn from your business, whether an individual has diabetes, and even the genre of a song. Using real-world datasets, you'll find out how to build predictive models, tune their parameters, and determine how well they will perform with unseen data.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


In this module, you'll be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll learn how to split data into training and test sets, fit a model, make predictions, and evaluate accuracy. You’ll discover the relationship between model complexity and performance, applying what you learn to a churn dataset, where you will classify the churn status of a telecom company's customers.
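
The fit/predict/score workflow in miniature, using a dataset bundled with scikit-learn as a stand-in for the course's churn data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)

    knn = KNeighborsClassifier(n_neighbors=6)
    knn.fit(X_train, y_train)
    print(knn.score(X_test, y_test))   # accuracy on unseen data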


  1. Machine learning with scikit-learn
  2. Binary classification
  3. The supervised learning workflow
  4. The classification challenge
  5. k-Nearest Neighbors: Fit
  6. k-Nearest Neighbors: Predict
  7. Measuring model performance
  8. Train/test split + computing accuracy
  9. Overfitting and underfitting
  10. Visualizing model complexity

In this module, you will be introduced to regression, and build models to predict sales values using a dataset on advertising expenditure. You will learn about the mechanics of linear regression and common performance metrics such as R-squared and root mean squared error. You will perform k-fold cross-validation, and apply regularization to regression models to reduce the risk of overfitting.
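
A sketch of regularized regression under cross-validation, on a bundled dataset:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)

    # Ridge adds regularization; alpha controls its strength
    ridge = Ridge(alpha=0.1)
    print(cross_val_score(ridge, X, y, cv=5))   # R-squared per fold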


  1. Introduction to regression
  2. Creating features
  3. Building a linear regression model
  4. Visualizing a linear regression model
  5. The basics of linear regression
  6. Fit and predict for regression
  7. Regression performance
  8. Cross-validation
  9. Cross-validation for R-squared
  10. Analyzing cross-validation metrics
  11. Regularized regression
  12. Regularized regression: Ridge
  13. Lasso regression for feature importance

Having trained models, now you will learn how to evaluate them. In this module, you will be introduced to several metrics along with a visualization technique for analyzing classification model performance using scikit-learn. You will also learn how to optimize classification and regression models through the use of hyperparameter tuning.
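
A minimal GridSearchCV sketch (the hyperparameter grid is illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = load_breast_cancer(return_X_y=True)

    # Try each candidate value of C with cross-validation
    param_grid = {"C": [0.01, 0.1, 1, 10]}
    grid = GridSearchCV(LogisticRegression(max_iter=5000), param_grid, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)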


  1. How good is your model?
  2. Deciding on a primary metric
  3. Assessing a diabetes prediction classifier
  4. Logistic regression and the ROC curve
  5. Building a logistic regression model
  6. The ROC curve
  7. ROC AUC
  8. Hyperparameter tuning
  9. Hyperparameter tuning with GridSearchCV
  10. Hyperparameter tuning with RandomizedSearchCV

Learn how to impute missing values, convert categorical data to numeric values, scale data, evaluate multiple supervised learning models simultaneously, and build pipelines to streamline your workflow!
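
A sketch of chaining those steps into a Pipeline, on tiny invented data:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Imputation, scaling, and a model become a single estimator
    pipe = Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression()),
    ])

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [2.5, 1.5], [3.0, np.nan]])
    y = np.array([0, 1, 0, 1])
    pipe.fit(X, y)                  # every step runs in order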


  1. Preprocessing data
  2. Creating dummy variables
  3. Regression with categorical features
  4. Handling missing data
  5. Dropping missing data
  6. Pipeline for song genre prediction: I
  7. Pipeline for song genre prediction: II
  8. Centering and scaling
  9. Centering and scaling for regression
  10. Centering and scaling for classification
  11. Evaluating multiple models
  12. Visualizing regression model performance
  13. Predicting on the test set
  14. Visualizing classification model performance
  15. Pipeline for predicting song popularity
  16. Wrap-up


Unsupervised Learning in Python

Say you have a collection of customers with a variety of characteristics such as age, location, and financial history, and you wish to discover patterns and sort them into clusters. Or perhaps you have a set of texts, such as Wikipedia pages, and you wish to segment them into categories based on their content. This is the world of unsupervised learning, so called because you are not guiding, or supervising, the pattern discovery by some prediction task, but instead uncovering hidden structure from unlabeled data. Unsupervised learning encompasses a variety of techniques in machine learning, from clustering to dimension reduction to matrix factorization. In this course, you'll learn the fundamentals of unsupervised learning and implement the essential algorithms using scikit-learn and SciPy. You will learn how to cluster, transform, visualize, and extract insights from unlabeled datasets, and end the course by building a recommender system to recommend popular musical artists.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Learn how to discover the underlying groups (or "clusters") in a dataset. By the end of this module, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.
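
k-means in a few lines, using scikit-learn's bundled iris measurements as a stand-in:

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    X, _ = load_iris(return_X_y=True)

    model = KMeans(n_clusters=3, n_init=10, random_state=0)
    labels = model.fit_predict(X)       # a cluster label per sample
    print(model.inertia_)               # lower inertia = tighter clusters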


  1. Unsupervised Learning
  2. How many clusters?
  3. Clustering 2D points
  4. Inspect your clustering
  5. Evaluating a clustering
  6. How many clusters of grain?
  7. Evaluating the grain clustering
  8. Transforming features for better clusterings
  9. Scaling fish data for clustering
  10. Clustering the fish data
  11. Clustering stocks using KMeans
  12. Which stocks move together?

In this module, you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2d space so that the proximity of the samples to one another can be visualized.
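
A sketch of building and drawing a hierarchy with SciPy, on random placeholder samples:

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage

    rng = np.random.default_rng(0)
    samples = rng.normal(size=(20, 4))      # placeholder measurements

    # Merge samples into ever-coarser clusters, then draw the tree
    mergings = linkage(samples, method="complete")
    dendrogram(mergings)
    plt.show()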


  1. Visualizing hierarchies
  2. How many merges?
  3. Hierarchical clustering of the grain data
  4. Hierarchies of stocks
  5. Cluster labels in hierarchical clustering
  6. Which clusters are closest?
  7. Different linkage, different hierarchical clustering!
  8. Intermediate clusterings
  9. Extracting the cluster labels
  10. t-SNE for 2-dimensional maps
  11. t-SNE visualization of grain dataset
  12. A t-SNE map of the stock market

Dimension reduction summarizes a dataset using its commonly occurring patterns. In this module, you'll learn about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization. It can also be useful for unsupervised learning. For example, you'll employ a variant of PCA that will allow you to cluster Wikipedia articles by their content!
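
PCA in miniature, on a bundled dataset:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)

    pca = PCA(n_components=2)           # keep the 2 highest-variance directions
    transformed = pca.fit_transform(X)
    print(pca.explained_variance_ratio_)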


  1. Visualizing the PCA transformation
  2. Correlated data in nature
  3. Decorrelating the grain measurements with PCA
  4. Principal components
  5. Intrinsic dimension
  6. The first principal component
  7. Variance of the PCA features
  8. Intrinsic dimension of the fish data
  9. Dimension reduction with PCA
  10. Dimension reduction of the fish measurements
  11. A tf-idf word-frequency array
  12. Clustering Wikipedia part I
  13. Clustering Wikipedia part II

In this module, you'll learn about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns. You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!
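
A small NMF sketch on a toy document collection (the documents are invented):

    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["cats purr and meow", "dogs bark loudly", "cats and dogs play"]
    tfidf = TfidfVectorizer().fit_transform(docs)   # non-negative word features

    nmf = NMF(n_components=2, init="nndsvda", max_iter=500)
    features = nmf.fit_transform(tfidf)   # one row of topic weights per document
    print(features.round(2))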


  1. Non-negative matrix factorization (NMF)
  2. Non-negative data
  3. NMF applied to Wikipedia articles
  4. NMF features of the Wikipedia articles
  5. NMF reconstructs samples
  6. NMF learns interpretable parts
  7. NMF learns topics of documents
  8. Explore the LED digits dataset
  9. NMF learns the parts of images
  10. PCA doesn't learn parts
  11. Building recommender systems using NMF
  12. Which articles are similar to 'Cristiano Ronaldo'?
  13. Recommend musical artists part I
  14. Recommend musical artists part II
  15. Final thoughts


Machine Learning with Tree-Based Models in Python

Decision trees are supervised learning models used for problems involving classification and regression. Tree models offer high flexibility, which comes at a price: on one hand, trees are able to capture complex non-linear relationships; on the other hand, they are prone to memorizing the noise present in a dataset. By aggregating the predictions of trees that are trained differently, ensemble methods take advantage of the flexibility of trees while reducing their tendency to memorize noise. Ensemble methods are used across a variety of fields and have a proven track record of winning many machine learning competitions. In this course, you'll learn how to use Python to train decision trees and tree-based models with the user-friendly scikit-learn machine learning library. You'll understand the advantages and shortcomings of trees and demonstrate how ensembling can alleviate these shortcomings, all while practicing on real-world datasets. Finally, you'll also understand how to tune the most influential hyperparameters in order to get the most out of your models.

5 Modules | 6+ Hours | 5 Skills

Course Modules 


Classification and Regression Trees (CART) are a set of supervised learning models used for problems involving classification and regression. In this module, you'll be introduced to the CART algorithm.
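
Training and evaluating a first classification tree, on a bundled dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # max_depth limits how far the tree can grow, curbing overfitting
    dt = DecisionTreeClassifier(max_depth=4, random_state=1)
    dt.fit(X_train, y_train)
    print(dt.score(X_test, y_test))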


  1. Decision tree for classification
  2. Train your first classification tree
  3. Evaluate the classification tree
  4. Logistic regression vs classification tree
  5. Classification tree learning
  6. Growing a classification tree
  7. Using entropy as a criterion
  8. Entropy vs Gini index
  9. Decision tree for regression
  10. Train your first regression tree
  11. Evaluate the regression tree
  12. Linear regression vs regression tree

The bias-variance tradeoff is one of the fundamental concepts in supervised machine learning. In this module, you'll understand how to diagnose the problems of overfitting and underfitting. You'll also be introduced to the concept of ensembling where the predictions of several models are aggregated to produce predictions that are more robust.


  1. Generalization Error
  2. Complexity, bias and variance
  3. Overfitting and underfitting
  4. Diagnose bias and variance problems
  5. Instantiate the model
  6. Evaluate the 10-fold CV error
  7. Evaluate the training error
  8. High bias or high variance?
  9. Ensemble Learning
  10. Define the ensemble
  11. Evaluate individual classifiers
  12. Better performance with a Voting Classifier

Bagging is an ensemble method involving training the same algorithm many times using different subsets sampled from the training data. In this module, you'll understand how bagging can be used to create a tree ensemble. You'll also learn how the random forests algorithm can lead to further ensemble diversity through randomization at the level of each split in the trees forming the ensemble.
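
A random forest sketch, including the out-of-bag score this module covers:

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor

    X, y = load_diabetes(return_X_y=True)

    # An ensemble of randomized trees; oob_score evaluates on out-of-bag samples
    rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=2)
    rf.fit(X, y)
    print(rf.oob_score_)              # generalization estimate without a test set
    print(rf.feature_importances_)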


  1. Bagging
  2. Define the bagging classifier
  3. Evaluate Bagging performance
  4. Out of Bag Evaluation
  5. Prepare the ground
  6. OOB Score vs Test Set Score
  7. Random Forests (RF)
  8. Train an RF regressor
  9. Evaluate the RF regressor
  10. Visualizing feature importances

Boosting refers to an ensemble method in which several models are trained sequentially with each model learning from the errors of its predecessors. In this module, you'll be introduced to the two boosting methods of AdaBoost and Gradient Boosting.
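
A gradient boosting sketch on a bundled dataset (the hyperparameter values are illustrative):

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

    # Each new tree fits the residual errors of the ensemble so far
    gb = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                   random_state=3)
    gb.fit(X_train, y_train)
    print(gb.score(X_test, y_test))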


  1. Adaboost
  2. Define the AdaBoost classifier
  3. Train the AdaBoost classifier
  4. Evaluate the AdaBoost classifier
  5. Gradient Boosting (GB)
  6. Define the GB regressor
  7. Train the GB regressor
  8. Evaluate the GB regressor
  9. Stochastic Gradient Boosting (SGB)
  10. Regression with SGB
  11. Train the SGB regressor
  12. Evaluate the SGB regressor

The hyperparameters of a machine learning model are parameters that are not learned from data. They should be set prior to fitting the model to the training set. In this module, you'll learn how to tune the hyperparameters of a tree-based model using grid search cross validation.


  1. Tuning a CART's Hyperparameters
  2. Tree hyperparameters
  3. Set the tree's hyperparameter grid
  4. Search for the optimal tree
  5. Evaluate the optimal tree
  6. Tuning a RF's Hyperparameters
  7. Random forests hyperparameters
  8. Set the hyperparameter grid of RF
  9. Search for the optimal forest
  10. Evaluate the optimal forest
  11. Congratulations!


Intermediate Importing Data in Python

As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In the prequel to this course, you learned many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL. In this course, you'll extend this knowledge base by learning to import data from the web and by pulling data from Application Programming Interfaces (APIs), such as the Twitter streaming API, which allows us to stream real-time tweets.

3 Modules | 4+ Hours | 3 Skills

Course Modules 


The web is a rich source of data from which you can extract various types of insights and findings. In this module, you will learn how to get data from the web, whether it is stored in files or in HTML. You'll also learn the basics of scraping and parsing web data.
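
The request-and-parse pattern in miniature; the URL below is a placeholder:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://example.com")
    soup = BeautifulSoup(r.text, "html.parser")

    print(soup.title.get_text())            # parsed page title
    for link in soup.find_all("a"):         # extract the hyperlinks
        print(link.get("href"))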


  1. Importing flat files from the web
  2. Importing flat files from the web: your turn!
  3. Opening and reading flat files from the web
  4. Importing non-flat files from the web
  5. HTTP requests to import files from the web
  6. Performing HTTP requests in Python using urllib
  7. Printing HTTP request results in Python using urllib
  8. Performing HTTP requests in Python using requests
  9. Scraping the web in Python
  10. Parsing HTML with BeautifulSoup
  11. Turning a webpage into data using BeautifulSoup: getting the text
  12. Turning a webpage into data using BeautifulSoup: getting the hyperlinks

In this module, you will gain a deeper understanding of how to import data from the web. You will learn the basics of extracting data from APIs, gain insight on the importance of APIs, and practice extracting data by diving into the OMDB and Library of Congress APIs.
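
The request-and-decode pattern for a JSON API, with a hypothetical endpoint:

    import requests

    # api.example.com and the "q" parameter are assumptions for this sketch
    r = requests.get("https://api.example.com/search", params={"q": "python"})
    json_data = r.json()            # decode the JSON payload into a dict

    for key, value in json_data.items():
        print(key + ":", value)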


  1. Introduction to APIs and JSONs
  2. Pop quiz: What exactly is a JSON?
  3. Loading and exploring a JSON
  4. Pop quiz: Exploring your JSON
  5. APIs and interacting with the world wide web
  6. Pop quiz: What's an API?
  7. API requests
  8. JSON–from the web to Python
  9. Checking out the Wikipedia API

In this module, you will consolidate your knowledge of interacting with APIs in a deep dive into the Twitter streaming API. You'll learn how to stream real-time Twitter data, and how to analyze and visualize it.


  1. The Twitter API and Authentication
  2. Streaming tweets
  3. Load and explore your Twitter data
  4. Twitter data to DataFrame
  5. A little bit of Twitter text analysis
  6. Plotting your Twitter data
  7. Final Thoughts


Preprocessing for Machine Learning in Python
This course covers the basics of how and when to perform data preprocessing. This essential step in any machine learning project comes between importing and cleaning your data and fitting your model: it's when you get your data ready for modeling. You'll learn how to standardize your data so that it's in the right form for your model, create new features to best leverage the information in your dataset, and select the best features to improve your model fit. Finally, you'll have some practice preprocessing by getting a dataset on UFO sightings ready for modeling.

5 Modules | 6+ Hours | 5 Skills

Course Modules 


In this module you'll learn exactly what it means to preprocess data. You'll take the first steps in any preprocessing journey, including exploring data types and dealing with missing data.


  1. Introduction to preprocessing
  2. Exploring missing data
  3. Dropping missing data
  4. Working with data types
  5. Exploring data types
  6. Converting a column type
  7. Training and test sets
  8. Class imbalance
  9. Stratified sampling

This module is all about standardizing data. Often a model will make some assumptions about the distribution or scale of your features. Standardization is a way to make your data fit these assumptions and improve the algorithm's performance.
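
A sketch of log normalization and standardization on an invented skewed column:

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    wine = pd.DataFrame({"proline": [1000.0, 1500.0, 500.0, 2500.0]})

    # Log normalization tames high-variance, right-skewed features
    wine["proline_log"] = np.log(wine["proline"])

    # Standardization rescales to mean 0, variance 1
    wine["proline_scaled"] = StandardScaler().fit_transform(wine[["proline"]])
    print(wine)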


  1. Standardization
  2. When to standardize
  3. Modeling without normalizing
  4. Log normalization
  5. Checking the variance
  6. Log normalization in Python
  7. Scaling data for feature comparison
  8. Scaling data - investigating columns
  9. Scaling data - standardizing columns
  10. Standardized data and modeling
  11. KNN on non-scaled data
  12. KNN on scaled data

In this section you'll learn about feature engineering. You'll explore different ways to create new, more useful, features from the ones already in your dataset. You'll see how to encode, aggregate, and extract information from both numerical and textual features.
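
Two of the module's moves, one-hot encoding and datetime extraction, sketched on invented data:

    import pandas as pd

    df = pd.DataFrame({"fav_color": ["blue", "green"],
                       "date": ["2021-04-09", "2021-07-14"]})

    # Encode a categorical column as indicator variables
    print(pd.get_dummies(df["fav_color"]))

    # Extract a datetime component as a new numeric feature
    df["date"] = pd.to_datetime(df["date"])
    df["month"] = df["date"].dt.month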


  1. Feature engineering
  2. Feature engineering knowledge test
  3. Identifying areas for feature engineering
  4. Encoding categorical variables
  5. Encoding categorical variables - binary
  6. Encoding categorical variables - one-hot
  7. Engineering numerical features
  8. Aggregating numerical features
  9. Extracting datetime components
  10. Engineering text features
  11. Extracting string patterns
  12. Vectorizing text
  13. Text classification using tf/idf vectors

This module goes over a few different techniques for selecting the most important features from your dataset. You'll learn how to drop redundant features, work with text vectors, and reduce the number of features in your dataset using principal component analysis (PCA).


  1. Feature selection
  2. When to use feature selection
  3. Identifying areas for feature selection
  4. Removing redundant features
  5. Selecting relevant features
  6. Checking for correlated features
  7. Selecting features using text vectors
  8. Exploring text vectors, part 1
  9. Exploring text vectors, part 2
  10. Training Naive Bayes with feature selection
  11. Dimensionality reduction
  12. Using PCA
  13. Training a model with PCA

Now that you've learned all about preprocessing you'll try these techniques out on a dataset that records information on UFO sightings.


  1. UFOs and preprocessing
  2. Checking column types
  3. Dropping missing data
  4. Categorical variables and standardization
  5. Extracting numbers from strings
  6. Identifying features for standardization
  7. Engineering new features
  8. Encoding categorical variables
  9. Features from dates
  10. Text vectorization
  11. Feature selection and modeling
  12. Selecting the ideal dataset
  13. Modeling the UFO dataset, part 1
  14. Modeling the UFO dataset, part 2
  15. Recap!


Developing Python Packages
Do you find yourself copying and pasting the same code between files, wishing it was easier to reuse and share your awesome snippets? Wrapping your code into Python packages can help! In this course, you’ll learn about package structure and the extra files needed to turn loose code into convenient packages. You'll also learn about import structure, documentation, and how to maintain code style using flake8. You’ll then speed up your package development by building templates, using cookiecutter to create package skeletons. Finally, you'll learn how to use setuptools and twine to build and publish your packages to PyPI—the world stage for Python packages!

4 Modules | 5+ Hours | 4+ Skills

Course Modules 


Get your package started by converting scripts you have already written. You'll create a simple package which you can use on your own computer.


  1. Starting a package
  2. Modules, packages and subpackages
  3. From script to package
  4. Putting your package to work
  5. Documentation
  6. Writing function documentation with pyment
  7. Writing function documentation with pyment II
  8. Package and module documentation
  9. Structuring imports
  10. Sibling imports
  11. Importing from parents
  12. Exposing functions to users

Make your package installable for yourself and others. In this module, you'll learn to deal with dependencies, write READMEs, and include licenses. You'll also complete all the steps to publish your package on PyPI—the main home of Python packages.
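
A minimal setup script might look like this sketch; the package name and dependency pin are placeholders:

    # setup.py
    from setuptools import setup, find_packages

    setup(
        name="mypackage",
        version="0.1.0",
        packages=find_packages(),
        install_requires=["pandas>=1.0"],
    )

With this file in place, running pip install -e . gives you an editable local install while you develop.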


  1. Installing your own package
  2. Adding the setup script
  3. Installing your package locally
  4. Utilizing editable installs
  5. Dealing with dependencies
  6. User dependencies
  7. Development dependencies
  8. Including licences and writing READMEs
  9. Writing a README
  10. MANIFEST - Including extra files with your package
  11. Publishing your package
  12. Building a distribution
  13. Uploading distributions

Bring your package up to a professional standard. Discover how to use pytest to guard against errors, tox to test if your package functions with multiple versions of Python, and flake8 to maintain great code style.


  1. Testing your package
  2. Creating the test directory
  3. Writing some basic tests
  4. Running your tests
  5. Testing your package with different environments
  6. Setting up tox
  7. Running tox
  8. Keeping your package stylish
  9. Appropriate style filtering
  10. Using flake8 to tidy up a file
  11. Ignoring specific errors
  12. Configuring flake8

Create your packages more quickly. In this final module, you’ll learn how to use cookiecutter to generate all the supporting files your package needs and Makefiles to simplify releasing new versions, and you’ll be introduced to the last few files your package needs to attract users and contributors.


  1. Faster package development with templates
  2. Using package templates
  3. Version numbers and history
  4. CONTRIBUTING.md
  5. History file
  6. Tracking version number with bumpversion
  7. Makefiles and classifiers
  8. PyPI classifiers
  9. Using makefiles
  10. Wrap-up


Machine Learning for Business
Learn the Basics of Machine Learning
This course introduces the key elements of machine learning to business leaders. We will focus on key insights and best practices for structuring business questions as modeling projects with machine learning teams.

Dive into the Model Specifics
You will understand the different types of models and what kinds of business questions they help answer or opportunities they can uncover. You will also learn to identify situations where machine learning should NOT be applied, which is equally important. Finally, you will understand the difference between inference and prediction and between predicting probabilities and amounts, and see how unsupervised learning can help build a meaningful customer segmentation strategy.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Machine learning is used in many different industries and fields, and it can fundamentally improve a business if applied correctly. This module outlines machine learning use cases, job roles, and how they fit into the data needs pyramid.


  1. Machine learning and data pyramid
  2. Terminology clarification
  3. Order data pyramid needs
  4. Match tasks in data pyramid
  5. Machine learning principles
  6. Modeling types
  7. Find supervised and unsupervised cases
  8. Job roles, tools and technologies
  9. Job role responsibilities
  10. Match data projects with job roles
  11. Team structure types

This module overviews different machine learning types. We will look into differences between causal and prediction models, explore supervised and unsupervised learning, and finally understand the sub-types of supervised learning: classification and regression.


  1. Prediction vs. inference dilemma
  2. Inference and prediction differences
  3. Identify inference vs. prediction use cases
  4. Inference (causal) models
  5. Experiments and causal models
  6. Identify non-actionable variables
  7. Prediction models (supervised learning)
  8. Supervised modeling principles
  9. Identify classification and regression models
  10. Prediction models (unsupervised learning)
  11. Unsupervised modeling use cases
  12. Classification, regression or unsupervised models

This module reviews key steps in scoping out business requirements, identifying and sizing machine learning opportunities, assessing the model performance, and identifying any performance risks in the process.


  1. Business requirements
  2. Identify situation, opportunity and action
  3. Identify successful experiments
  4. Model training
  5. Model training process
  6. Training, validation and test
  7. Model performance measurement
  8. Poor performance examples
  9. Identify performance metrics
  10. Machine learning risks
  11. Fixing non-performing models
  12. Non-actionable models
  13. Identify actionable recommendations

This module will look into the best and worst practices of managing machine learning projects. We will identify the most common machine learning mistakes, learn how to manage communication between the business and ML teams, and finally address the challenges of deploying machine learning models to production.


  1. Machine learning mistakes
  2. Identify machine learning mistakes
  3. Data needs pyramid
  4. Match ML mistakes by their types
  5. Communication management
  6. Business communication focus
  7. Market testing
  8. Machine learning in production
  9. Production systems
  10. Production systems ML use cases
  11. ML in production launch
  12. Wrap-up


Introduction to SQL
Learn how Relational Databases are Organized
SQL is an essential language for building and maintaining relational databases, which opens the door to a range of careers in the data industry and beyond. You’ll start this course by covering data organization, tables, and best practices for database construction.

Write Your First SQL Queries
The second half of this course looks at creating SQL queries for selecting data that you need from your database. You’ll have the chance to practice your querying skills before moving on to customizing and saving your results.

Understand the Difference Between PostgreSQL and SQL Server
PostgreSQL and SQL Server are two of the most popular SQL flavors. You’ll finish off this course by looking at the differences, benefits, and applications of each. By the end of the course, you’ll have some hands-on experience in learning SQL and the grounding to start applying it to projects or continue your learning in a more specialized direction.

2 Modules | 3+ Hours | 2 Skills

Course Modules 


Before writing any SQL queries, it’s important to understand the underlying data. In this module, we’ll discover the role of SQL in creating and querying relational databases. Using a database for a local library, we will explore database and table organization, data types and storage, and best practices for database construction.


  1. Introduction to Database Management System
  2. What are the advantages of databases?
  3. Data organization
  4. Introduction to SQL
  5. Tables in SQL
  6. Views in SQL
  7. Table vs Views
  8. Picking a unique ID
  9. Setting the table in style
  10. Finding data types

  1. Introduction
  2. Entity Relationship Model
  3. Relationships in SQL
  4. Recap


  1. Introduction
  2. Downloading SQL Developer Edition
  3. Installing SQL Developer Edition
  4. Connecting to SQL Server
  5. Downloading a Sample SQL Database in SQL Server Management Studio (SSMS)
  6. Configuring SQL Server, and SSMS
  7. Recap


  1. Database Manipulation in SQL
  2. SQL Storage Engines
  3. Creating and Managing Tables in SQL
  4. Creating and Managing Tables in SQL: CREATE, DESCRIBE, and SHOW Tables
  5. Creating and Managing Tables in SQL: ALTER, TRUNCATE, and DROP Tables
  6. Inserting and Querying Data in Tables
  7. Filtering Data From Tables in SQL
  8. Filtering Data From Tables in SQL: WHERE and DISTINCT Clauses
  9. Filtering Data From Tables in SQL: AND and OR Operators
  10. Filtering Data From Tables in SQL: IN and NOT IN Operators
  11. Filtering Data From Tables in SQL: BETWEEN and LIKE Operators
  12. Filtering Data From Tables in SQL: TOP, IS NULL, and IS NOT NULL Operators
  13. Sorting Table Data
  14. Recap
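
To give a feel for this module's workflow, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the course itself works in SQL Server, where some keywords (such as TOP) differ, and the table and rows below are illustrative:

    import sqlite3

    conn = sqlite3.connect(":memory:")   # throwaway in-memory database
    cur = conn.cursor()

    # CREATE, INSERT, then filter and sort -- the pattern this module drills
    cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    cur.executemany(
        "INSERT INTO books (title, year) VALUES (?, ?)",
        [("SQL Basics", 2020), ("Deep SQL", 2022), ("Data Tales", 2020)],
    )
    cur.execute("SELECT title FROM books WHERE year = 2020 AND title LIKE 'S%' ORDER BY title")
    print(cur.fetchall())                # [('SQL Basics',)]
    conn.close()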

Learn your first SQL keywords for selecting relevant data from database tables! After practicing querying skills in a database of books, you’ll customize query results using aliasing and save them as views so they can be shared. Finally, you’ll explore the differences between SQL flavors and databases such as SQL Server.


  1. Introducing queries
  2. SQL strengths
  3. Developing SQL style
  4. Querying the books table
  5. Writing queries
  6. Comments in SQL
  7. Making queries DISTINCT
  8. Aliasing
  9. Viewing your query
  10. SQL flavors
  11. Comparing flavors
  12. Limiting results
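
A short sketch of the querying ideas above (DISTINCT, aliasing, and views), again using sqlite3 with illustrative data:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE books (title TEXT, genre TEXT)")
    cur.executemany("INSERT INTO books VALUES (?, ?)",
                    [("A", "scifi"), ("B", "scifi"), ("C", "fantasy")])

    # DISTINCT de-duplicates result rows
    cur.execute("SELECT DISTINCT genre FROM books")
    print(cur.fetchall())                # e.g. [('scifi',), ('fantasy',)]

    # AS renames a field in the output; a view saves a query for reuse
    cur.execute("CREATE VIEW fantasy_books AS "
                "SELECT title AS book_title FROM books WHERE genre = 'fantasy'")
    cur.execute("SELECT book_title FROM fantasy_books")
    print(cur.fetchall())                # [('C',)]
    conn.close()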


Intermediate SQL
SQL is widely recognized as the most popular language for turning raw data stored in a database into actionable insights. This course uses a films database to teach how to navigate and extract insights from the data using SQL.

Discover Filtering with SQL
You'll discover techniques for filtering and comparing data, enabling you to extract specific information to gain insights and answer questions about the data.

Get Acquainted with Aggregation
Next, you'll get a taste of aggregate functions, essential for summarizing data effectively and gaining valuable insights from large datasets. You'll also combine this with sorting and grouping data, adding another layer of meaning to your insights and analysis.

Write Clean Queries
Finally, you'll be shown some tips and best practices for presenting your data and queries neatly. Throughout the course, you'll have hands-on practice queries to solidify your understanding of the concepts. By the end of the course, you'll have everything you need to know to analyze data using your own SQL code today!

7 Modules | 8+ Hours | 5+ Skills

Course Modules 


In this first module, you’ll learn how to query a films database and select the data needed to answer questions about the movies and actors. You'll also understand how SQL code is executed and formatted.


  1. SELECT Statement
  2. SELECT DISTINCT
  3. Query execution
  4. Order of execution
  5. SQL style
  6. SQL best practices
  7. Formatting
  8. Non-standard fields

  1. Arithmetic Operators: +, -, *, /, %
  2. Comparison Operators: =, >, <, >=, <=, <>, !=
  3. Logical Operators: AND, OR, NOT
  4. Special Operators: LIKE, IN, NOT IN, BETWEEN, IS NULL, ALL and ANY, EXISTS
  5. Set Operators: UNION, UNION ALL, INTERSECT, EXCEPT

Learn how you can filter numerical and textual data with SQL. Filtering is one of the most important uses of this language. You'll learn how to use new keywords and operators to narrow down your query so that the results meet your desired criteria, and you'll gain a better understanding of NULL values and how to handle them.


  1. Filtering numbers
  2. Filtering results
  3. Using WHERE with numbers
  4. Using WHERE with text
  5. Multiple criteria
  6. Using AND
  7. Using OR
  8. Using BETWEEN
  9. Filtering text
  10. LIKE and NOT LIKE
  11. WHERE IN
  12. Combining filtering and selecting
  13. Understanding NULL values
  14. Practice with NULLs
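
The filtering keywords in this module, sketched with sqlite3 and illustrative data:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE films (title TEXT, release_year INTEGER, budget REAL)")
    cur.executemany("INSERT INTO films VALUES (?, ?, ?)",
                    [("Alpha", 1999, 10.0), ("Beta", 2005, None), ("Gamma", 2012, 55.5)])

    # BETWEEN is inclusive at both ends
    cur.execute("SELECT title FROM films WHERE release_year BETWEEN 2000 AND 2012")
    print(cur.fetchall())                # [('Beta',), ('Gamma',)]

    # NULL never matches = or !=; it needs IS NULL / IS NOT NULL
    cur.execute("SELECT title FROM films WHERE budget IS NULL")
    print(cur.fetchall())                # [('Beta',)]
    conn.close()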

Here, we will teach you how to sort and group data. These skills will take your analyses to a new level by helping you uncover critical business insights and identify trends in performance. You'll get hands-on experience determining which films performed best and how movie durations and budgets changed over time.


  1. Sorting results
  2. Sorting text
  3. The SQL ORDER BY
  4. ORDER BY - ascending
  5. ORDER BY - descending
  6. Sorting single fields
  7. Sorting multiple fields

  1. Data Definition Language (DDL): CREATE, DROP, ALTER, TRUNCATE
  2. Data Query Language (DQL): SELECT, WHERE
  3. Data Manipulation Language (DML): INSERT, UPDATE, DELETE

  1. NOT NULL Constraints
  2. UNIQUE Constraints
  3. Primary Key Constraints
  4. Foreign Key Constraints
  5. Composite Key
  6. Alternate Key
  7. CHECK Constraints
  8. DEFAULT Constraints
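
A compact sketch of these constraints in SQLite syntax (SQL Server syntax is close but not identical, and the names are illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
    conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,           -- primary key constraint
        name      TEXT NOT NULL UNIQUE           -- NOT NULL + UNIQUE constraints
    );
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        price     REAL DEFAULT 9.99 CHECK (price >= 0),  -- DEFAULT + CHECK
        author_id INTEGER REFERENCES authors(author_id)  -- foreign key constraint
    );
    """)
    conn.execute("INSERT INTO authors (name) VALUES ('Ada')")
    # Inserting a book with author_id 99 would now raise sqlite3.IntegrityError,
    # because no such author exists.
    conn.close()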

SQL allows you to zoom in and out to better understand an entire dataset, its subsets, and its individual records. You'll learn to summarize data using aggregate functions and perform basic arithmetic calculations inside queries to gain insights into what makes a successful film.


  1. Aggregate functions: COUNT, SUM, AVG, MIN, MAX
  2. Summarizing data
  3. Aggregate functions and data types
  4. Practice with aggregate functions
  5. Summarizing subsets
  6. Grouping data
  7. GROUP BY single fields
  8. GROUP BY multiple fields
  9. Answering business questions
  10. Filtering grouped data
  11. Filter with HAVING
  12. HAVING and sorting
  13. Combining aggregate functions with WHERE
  14. Using ROUND()
  15. ROUND() with a negative parameter
  16. Aliasing and arithmetic
  17. Using arithmetic
  18. Aliasing with functions
  19. Rounding results
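
The grouping and aggregation ideas above in one small sqlite3 sketch (the data is illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE films (title TEXT, genre TEXT, gross REAL)")
    cur.executemany("INSERT INTO films VALUES (?, ?, ?)",
                    [("A", "drama", 120.5), ("B", "drama", 80.25), ("C", "comedy", 95.0)])

    # GROUP BY forms one row per genre; HAVING filters groups (WHERE filters rows);
    # ROUND and AS tidy up the output
    cur.execute("""
        SELECT genre, ROUND(AVG(gross), 1) AS avg_gross
        FROM films
        GROUP BY genre
        HAVING AVG(gross) > 90
    """)
    print(cur.fetchall())    # e.g. [('comedy', 95.0), ('drama', 100.4)]
    conn.close()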


Joining Data In SQL
Joining data is an essential skill in data analysis, enabling you to draw information from separate tables together into a single, meaningful set of results. In this comprehensive course on joining data, you'll delve into the intricacies of table joins and relational set theory, learning how to optimize your queries for efficient data retrieval.

Understand Data Joining Fundamentals
You will learn how to work with multiple tables in SQL by navigating and extracting data from various tables within a SQL database using several join types, including inner joins, outer joins, and cross joins. With practice, you'll learn how to select the appropriate join method.

Explore Advanced Data Manipulation Techniques
Next up, you'll explore set theory principles such as unions, intersects, and except clauses, as well as discover the power of nested queries in SQL. Every step is accompanied by exercises and opportunities to apply the theory and grow your confidence in SQL.

5 Modules | 6+ Hours | 5 Skills

Course Modules 


  1. Introduction to Alias
  2. Introduction to JOINS
  3. Right, Cross, and Self Joins
  4. Operators in SQL
  5. Operators in SQL Updated
  6. Intersect and Emulation
  7. Minus and Emulation
  8. Subquery in SQL
  9. Subqueries with Statements and Operators
  10. Subqueries with Commands
  11. Derived Tables in SQL
  12. EXISTS Operator
  13. NOT EXISTS Operator
  14. EXISTS vs IN Operators
  15. Recap

In this closing module, you’ll begin by investigating semi joins and anti joins. Next, you'll learn how to use nested queries. Last but not least, you’ll wrap up the course with some challenges!

  1. Subquerying with semi joins and anti joins
  2. Multiple WHERE clauses
  3. Semi join
  4. Diagnosing problems using anti join
  5. Subqueries inside WHERE and SELECT
  6. Subquery inside WHERE
  7. WHERE do people live?
  8. Subquery inside SELECT
  9. Subqueries inside FROM
  10. Subquery inside FROM
  11. Subquery challenge
  12. Final challenge
  13. The finish line

In this module, you’ll be introduced to the concept of joining tables and will explore all the ways you can enrich your queries using joins—beginning with inner joins.

  1. The ins and outs of INNER JOIN
  2. Your first join
  3. Joining with aliased tables
  4. USING in action
  5. Defining relationships
  6. Relationships in our database
  7. Inspecting a relationship
  8. Multiple joins
  9. Joining multiple tables
  10. Checking multi-table joins
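
A minimal inner-join sketch with sqlite3; the tables echo the countries/economies flavor of the course, but the rows and figures are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE economies (code TEXT, gdp REAL);
    INSERT INTO countries VALUES ('NG', 'Nigeria'), ('GB', 'United Kingdom');
    INSERT INTO economies VALUES ('NG', 440.8);   -- invented figure
    """)

    # INNER JOIN keeps only rows with a match in BOTH tables; c and e are
    # table aliases, and USING (code) is shorthand for ON c.code = e.code
    cur.execute("""
        SELECT c.name, e.gdp
        FROM countries AS c
        INNER JOIN economies AS e USING (code)
    """)
    print(cur.fetchall())    # [('Nigeria', 440.8)] -- 'GB' has no match
    conn.close()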

After familiarizing yourself with inner joins, you will come to grips with different kinds of outer joins. Next, you will learn about cross joins. Finally, you will learn about situations in which you might join a table with itself.

  1. LEFT and RIGHT JOINs
  2. Remembering what is LEFT
  3. This is a LEFT JOIN, right?
  4. Building on your LEFT JOIN
  5. Is this RIGHT?
  6. FULL JOINs
  7. Comparing joins
  8. Chaining FULL JOINs
  9. Crossing into CROSS JOIN
  10. Histories and languages
  11. Choosing your join
  12. Self joins
  13. Comparing a country to itself
  14. All joins on deck

In this module, you will learn about using set theory operations in SQL, with an introduction to UNION, UNION ALL, INTERSECT, and EXCEPT clauses. You’ll explore the predominant ways in which set theory operations differ from join operations.

  1. Set theory for SQL Joins
  2. UNION vs. UNION ALL
  3. Comparing global economies
  4. Comparing two set operations
  5. At the INTERSECT
  6. INTERSECT
  7. Review UNION and INTERSECT
  8. EXCEPT
  9. You've got it, EXCEPT...
  10. Calling all set operators
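
Unlike joins, set operations stack whole result sets on top of each other; a small sqlite3 sketch with invented rows:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
    CREATE TABLE y2020 (country TEXT);
    CREATE TABLE y2021 (country TEXT);
    INSERT INTO y2020 VALUES ('Kenya'), ('Ghana');
    INSERT INTO y2021 VALUES ('Ghana'), ('Togo');
    """)

    # UNION drops duplicates (UNION ALL would keep them); INTERSECT keeps
    # rows present in both; EXCEPT keeps rows only in the first query
    for op in ("UNION", "INTERSECT", "EXCEPT"):
        cur.execute(f"SELECT country FROM y2020 {op} SELECT country FROM y2021")
        print(op, cur.fetchall())
    # UNION     -> [('Ghana',), ('Kenya',), ('Togo',)]  (order may vary)
    # INTERSECT -> [('Ghana',)]
    # EXCEPT    -> [('Kenya',)]
    conn.close()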


Learn Git
This course introduces learners to version control using Git. You will discover the importance of version control when working on data science projects and explore how you can use Git to track files, compare differences, modify and save files, undo changes, and enable collaborative development through the use of branches. You will gain an introduction to the structure of a repository, learn how to create new repositories and clone existing ones, and see how Git stores data. By working through typical data science tasks, you will gain the skills to handle conflicting files!


4 Modules | 5+ Hours | 4 Skills

Course Modules

In the first module, you’ll learn what version control is and why it is essential for data projects. Then, you’ll discover what Git is and how to use it for a version control workflow.


  1. Introduction to version control with Git
  2. Using the shell
  3. Checking the version of Git
  4. Saving files
  5. Where does Git store information?
  6. The Git workflow
  7. Adding a file
  8. Adding multiple files
  9. Comparing files
  10. What has changed?
  11. What is going to be committed?
  12. What's in the staging area?
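
The add/status/commit cycle in this module is normally run in the shell; purely as an illustration, the same workflow can be scripted from Python with subprocess (the paths, file name, and identity flags are illustrative, and git must be installed):

    import subprocess
    from pathlib import Path

    repo = Path("demo_repo")
    repo.mkdir(exist_ok=True)
    subprocess.run(["git", "init"], cwd=repo, check=True)

    (repo / "report.md").write_text("# Findings\n")
    subprocess.run(["git", "add", "report.md"], cwd=repo, check=True)   # stage the file
    subprocess.run(["git", "status"], cwd=repo, check=True)             # inspect the staging area
    subprocess.run(["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
                    "commit", "-m", "Add report"], cwd=repo, check=True)  # save a snapshot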

Next, you’ll examine how Git stores data, learn essential commands to compare files and repositories at different times, and understand the process for restoring earlier versions of files in your data projects.


  1. Storing data with Git
  2. Interpreting the commit structure
  3. Viewing a repository's history
  4. Viewing a specific commit
  5. Viewing changes
  6. Comparing to the second most recent commit
  7. Comparing commits
  8. Who changed what?
  9. Undoing changes before committing
  10. How to unstage a file
  11. Undoing changes to unstaged files
  12. Undoing all changes
  13. Restoring and reverting
  14. Restoring an old version of a repo
  15. Deleting untracked files
  16. Restoring an old version of a file

In this module, you'll learn tips and tricks for configuring Git to make you more efficient! You'll also discover branches, identify how to create and switch to different branches, compare versions of files between branches, merge branches together, and deal with conflicting files across branches.


  1. Configuring Git
  2. Modifying your email address in Git
  3. Creating an alias
  4. Ignoring files
  5. Branches
  6. Branching and merging
  7. Creating new branches
  8. Checking the number of branches
  9. Comparing branches
  10. Working with branches
  11. Switching branches
  12. Merging two branches
  13. Handling conflict
  14. Recognizing conflict syntax
  15. Resolving a conflict

This final module is all about collaboration! You'll gain an introduction to remote repositories and learn how to work with them to synchronize content between the cloud and your local computer. You'll also see how to create new repositories and clone existing ones, along with discovering a workflow to minimize the risk of conflicts between local and remote repositories.


  1. Creating repos
  2. Setting up a new repo
  3. Converting an existing project
  4. Working with remotes
  5. Cloning a repo
  6. Defining and identifying remotes
  7. Gathering from a remote
  8. Fetching from a remote
  9. Pulling from a remote
  10. Pushing to a remote
  11. Pushing to a remote repo
  12. Handling push conflicts
  13. Wrap up!

THE COMPLETE DATA ANALYSIS & VISUALIZATION WITH PYTHON COST


United States

$899.99

United Kingdom

£799.99

Career and Certifications


GreaterHeight Academy's certificate holders are prepared to work at companies like:



Our Advisor is just a CALL away

+1 5169831065 | +447474275645
Available 24x7 for your queries


Talk to our advisors

Our advisors will get in touch with you in the next 24 hours.


Get Advice


FAQs

Complete Data Analysis & Visualization with Python Course

  • Python, created by Guido van Rossum in 1991, is a high-level, readable programming language known for its simplicity. It's versatile, with applications in web development, data analysis, AI, and more. Python's extensive standard library and rich ecosystem enhance its capabilities. It's cross-platform compatible and supported by a large community. Python's popularity has grown, making it widely used in diverse industries.

  • A Python developer is a software developer or programmer who specializes in using the Python programming language for creating applications, software, or solutions. They have expertise in writing Python code, understanding the language's syntax, libraries, and frameworks. Python developers are skilled in utilizing Python's features to develop web applications, data analysis tools, machine learning models, automation scripts, and other software solutions.
  • They work in various industries, collaborating with teams or independently to design, implement, test, and maintain Python-based projects. Python developers often possess knowledge of related technologies and tools to enhance their development process.

  • Python Developer Masters Program is a structured learning path recommended by leading industry experts that ensures you transform into a proficient Python Developer. Being a full-fledged Python Developer requires you to master multiple technologies, and this program aims to provide you with in-depth knowledge of the entire range of Python programming practices. Individual courses at GreaterHeight Academy focus on specialization in one or two specific skills; however, if you intend to become a master in Python programming, then this is your go-to path.

  • Yes. But you can also raise a ticket with the dedicated support team at any time. If your query does not get resolved through email, we can also arrange one-on-one sessions with our support team. However, our support is provided for a period of Twelve Weeks from the start date of your course.

There are several reasons why becoming a Python developer can be a rewarding career choice. Here are a few:

  • Versatility and Popularity: Python is a versatile programming language that can be used for various purposes, such as web development, data analysis, machine learning, artificial intelligence, scientific computing, and more. It has gained immense popularity in recent years due to its simplicity, readability, and extensive library ecosystem. Python is widely used in both small-scale and large-scale projects, making it a valuable skill in the job market.
  • Ease of Learning: Python has a clean and intuitive syntax that emphasizes readability, which makes it relatively easy to learn compared to other programming languages. Its simplicity allows beginners to grasp the fundamentals quickly and start building useful applications in a relatively short amount of time. This accessibility makes Python an attractive choice for both novice and experienced programmers.
  • Rich Ecosystem and Libraries: Python offers a vast collection of libraries and frameworks that can accelerate development and simplify complex tasks. For example, Django and Flask are popular web development frameworks that provide robust tools for building scalable and secure web applications. NumPy, Pandas, and Matplotlib are widely used libraries for data analysis and visualization. TensorFlow and PyTorch are prominent libraries for machine learning and deep learning. These libraries, among many others, contribute to Python's efficiency and effectiveness as a development language.
  • Job Opportunities: The demand for Python developers has been steadily growing in recent years. Many industries, including technology, finance, healthcare, and academia, rely on Python for various applications. By becoming a Python developer, you open up a wide range of career opportunities, whether you choose to work for a large corporation, a startup, or even as a freelancer. Additionally, Python's versatility allows you to explore different domains and switch roles if desired.
  • Community and Support: Python has a vibrant and supportive community of developers worldwide. This community actively contributes to the language's development, creates open-source libraries, and provides assistance through forums, online communities, and resources.

  • There are no prerequisites for enrollment in this Masters Program. Whether you are an experienced professional working in the IT industry or an aspirant planning to enter the world of Python programming, this Masters Program is designed and developed to accommodate various professional backgrounds.

  • Python Developer Masters Program has been curated after thorough research and recommendations from industry experts. It will help you differentiate yourself with multi-platform fluency and have real-world experience with the most important tools and platforms. GreaterHeight Academy will be by your side throughout the learning journey - We’re Ridiculously Committed.

  • The recommended duration to complete this Python Developer Masters Program is about 20 weeks; however, it is up to the individual to complete the program at their own pace.

The roles and responsibilities of a Python developer may vary depending on the specific job requirements and industry. However, here are some common tasks and responsibilities associated with the role:

  1. Developing Applications: Python developers are responsible for designing, coding, testing, and debugging applications using Python programming language. This includes writing clean, efficient, and maintainable code to create robust software solutions.
  2. Web Development: Python is widely used for web development. As a Python developer, you may be involved in building web applications, using frameworks like Django or Flask. This includes developing backend logic, integrating databases, handling data processing, and ensuring the smooth functioning of the web application.
  3. Data Analysis and Visualization: Python offers powerful libraries like NumPy, Pandas, and Matplotlib, which are extensively used for data analysis and visualization. Python developers may be responsible for manipulating and analyzing large datasets, extracting insights, and presenting them visually.
  4. Machine Learning and AI: Python is a popular choice for machine learning and artificial intelligence projects. Python developers may work on implementing machine learning algorithms, training models, and integrating them into applications. This involves using libraries like TensorFlow, PyTorch, or scikit-learn.
  5. Collaborating and Teamwork: Python developers often work as part of a development team. They collaborate with other team members, including designers, frontend developers, project managers, and stakeholders. Effective communication and teamwork skills are crucial to ensure smooth project execution.
  6. Documentation: Python developers are expected to document their code, providing clear explanations and instructions for others who may work on or maintain the codebase in the future. Documentation helps in understanding the code and facilitating collaboration.
  7. Continuous Learning: Technology is constantly evolving, and as a Python developer, you need to stay updated with the latest advancements, libraries, frameworks, and best practices. Continuous learning and self-improvement are essential to excel in this role.

The Python Developer training course is for those who want to fast-track their Python programming career. This Python Developer Masters Program will benefit people working in the following roles:

  1. Freshers
  2. Engineers
  3. IT professionals
  4. Data Scientist
  5. Machine Learning Engineer
  6. AI Engineer
  7. Business analysts
  8. Data analysts

  • Top companies such as Microsoft, Google, Meta, Citibank, Wells Fargo, and many more are actively hiring certified Python professionals for various positions.

  • On completing this Python Developer Masters Program, you’ll be eligible for roles such as Python Developer, Web Developer, Data Analyst, Data Scientist, Software Engineer, and many more.

  • There is undoubtedly great demand for data analytics, as 96% of organizations seek to hire data analysts. The most significant companies employing graduates who wish to pursue a data analyst career include Manthan, SAP, Oracle, Accenture Analytics, Alteryx, Qlik, Mu Sigma Analytics, Fractal Analytics, and Tiger Analytics. Professional data analyst training will make you an invaluable asset to any organization, able to spin insights out of big data.

A successful data analyst possesses a combination of technical skills and leadership skills.

  • Technical skills include knowledge of database languages such as SQL, R, or Python; spreadsheet tools such as Microsoft Excel or Google Sheets for statistical analysis; and data visualization software such as Tableau or Qlik. Mathematical and statistical skills are also valuable to help gather, measure, organize, and analyze data while using these common tools.
  • Leadership skills prepare a data analyst to complete decision-making and problem-solving tasks. These abilities allow analysts to think strategically about the information that will help stakeholders make data-driven business decisions and to communicate the value of this information effectively. For example, project managers rely on data analysts to track the most important metrics for their projects, to diagnose problems that may be occurring, and to predict how different courses of action could address a problem.

Career openings are available in practically all industries, from telecommunications to retail, banking, healthcare, and even fitness. Without extensive training and effort, it isn't easy to realize the benefits of a data analyst career. Earning our Data Analyst certification will keep you up to date on recent trends in the industry.

  • Yes, we do. We will discuss all possible technical interview questions and answers during the training program so that you can prepare yourself for interviews.

  • No. Any abuse of copyright is taken seriously. Thanks for your understanding on this one.

  • Yes, we will provide you with a certificate of completion for the program once you have successfully submitted all the assessments and they have been verified by our subject matter experts.

  • GreaterHeight is offering you the most updated, relevant, and high-value real-world projects as part of the training program. This way, you can implement the learning you have acquired in a real-world industry setup. All training comes with multiple projects that thoroughly test your skills, learning, and practical knowledge, making you completely industry-ready.
  • You will work on highly exciting projects in domains such as high technology, ecommerce, marketing, sales, networking, banking, and insurance. After completing the projects successfully, your skills will be equivalent to 6 months of rigorous industry experience.

All our mentors are highly qualified and experienced professionals, each with at least 15-20 years of development experience in various technologies, and they are trained by GreaterHeight Academy to deliver interactive training to participants.

Yes, we do. As technology evolves, we update our content and provide your training on the latest version of that technology.

  • All online training classes are recorded. You will get the recorded sessions so that you can watch the online classes whenever you want. You can also join another batch to make up any classes you missed.

OUR POPULAR COURSES

Data Analytics and Visualization With Python

Develop advanced expertise in cleaning, transforming, and modelling data to obtain insight into corporate decision-making as a Senior Data Analyst, using Python.

View Details
Data Science Training Masters Program

Learn Python, Statistics, Data Preparation, Data Analysis, Querying Data, Machine Learning, Clustering, Text Processing, Collaborative Filtering, Image Processing, and more.

View Details
Microsoft Azure DP-100 Data Science

You will optimize and manage models, perform administration by using T-SQL, run experiments and train models, deploy and consume models, and automate tasks.

View Details
Machine Learning using Python

Learn Data Science and Machine Learning from scratch, get hired, and have fun along the way with this modern, up-to-date Data Science course.

View Details
Microsoft Azure PL-300 Data Analysis

You will learn how to design a data model in Power BI, optimize model performance, manage datasets in Power BI, and create paginated reports.

View Details
Microsoft Azure DP-203 Data Engineer

You will learn batch and real-time analytics, Azure Synapse Analytics, Azure Databricks, implementing security, and ETL & ELT pipelines.

View Details

The GreaterHeight Advantage


Accredited Courseware

Most of our training courses are accredited by the respective governing bodies.


Assured Classes

All our training courses are assured & scheduled dates are confirmed to run by SME.


Expert Instructor Led Programs

We have well equipped and highly experienced instructors to train the professionals.

OUR CLIENTS

We Have Worked With Some Amazing Companies Around The World

Our awesome clients we've had the pleasure to work with!

