GreaterHeight Technologies LLC ~ GreaterHeight Academy



The Complete Data Scientist With Python


Learn Python for data science and gain the career-building skills you need to succeed as a data scientist, from data manipulation to machine learning. Master the skills you need to pass the Data Scientist in Python certification and prepare yourself for success in the field of data science.



Complete Data Scientist with Python 


Who this course is for:

  • Python developers curious about the data analysis libraries.
  • Python developers curious about the data visualization libraries.
  • Anyone interested in learning Python.
  • Data analysts.
  • Anyone working with data.


What you will learn:

  • Python (we will be using Python 3 in this course).
  • Data Analysis Libraries in Python such as NumPy and Pandas.
  • Data Visualization.
  • Data Visualization Libraries in Python such as Matplotlib and Seaborn.
  • How to use Python to manipulate & process data.
  • Data analysis & data visualization using Python.
  • How to analyze data.
  • Jupyter Notebooks IDE / Anaconda Distribution.

Course Benefits & Key Features

Key benefits and features of the Data Scientist with Python program:

  • Modules: 30+ modules
  • Lessons: 80+ lessons
  • Practical: 40+ hands-on labs
  • Live Projects: 5+ projects
  • Resume: CV preparation
  • Job: job references
  • Recording: session recordings
  • Interviews: mock interviews
  • Support: on-the-job support
  • Membership: membership access
  • Networks: networking
  • Certification: certificate of completion


INSTRUCTOR-LED LIVE ONLINE CLASSES

Our learn-by-building-projects method enables you to build practical coding experience that sticks. 95% of our learners say they have more confidence and remember more when they learn by building real-world projects, the kind of experience required in real work.


  • Get step-by-step guidance to practice your skills without getting stuck
  • Validate your technical problem-solving skills in a real environment
  • Troubleshoot complex scenarios to practice what you learned
  • Develop production experience that translates into real-world work

Complete Python Data Scientist Job Outlook



Ranked #1 Programming Language

TIOBE and PYPL rank Python as the most popular programming language in the world.

Python Salary Trend

The average salary for a Python Developer is $114,489 per year in the United States.

44.8% Compound Annual Growth Rate (CAGR)

The global Python market is expected to reach USD 100.6 million by 2030.

Why Data Scientist with Python?

Learn In-demand Skills

Those with careers in data analysis learn relevant in-demand skills that span industries and add value to every digital-enabled organization.


Earn a Higher Salary

Experienced data analysts can earn up to $112,000 per year and transition into higher-paying jobs as Senior Data Analysts, Data Scientists, or Analytics Managers.

Positive Job Outlook

The data analytics market is predicted to hit USD 132.90 billion by 2026. The COVID-19 pandemic accelerated the adoption of data analytics solutions and services.

Shape the Future

Data analysts transform organizations by capitalizing on data to improve their business decisions and solve critical real-world problems.

Become a Leader

Being a central part of an organization’s decision-making processes, analytics experts often pick up strong leadership skills as well.

Data Analysis Is Constantly Evolving

Data analysis moves quickly, and data analysts are constantly learning and advancing in their careers.




GreaterHeight Certificates holders are prepared to work at companies like these.

Some Alumni Testimonies

Investing in the course "Become a Data Analyst" with GreaterHeight Academy is great value for the money and I highly recommend it. The trainer is very knowledgeable, very engaging, provided us with quality training sessions on all courses, and was easily accessible for queries. We also had access to the course materials, and the timely availability of the recorded videos made it easy and aided the learning process.

QUEEN OBIWULU

Team Lead, Customer Success

The training was fantastic, the instructor is an awesome lecturer, relentless and not tired in his delivery. He obviously enjoys teaching, it comes natural to him. We got more than we expected. He extended my knowledge of Excel beyond what I knew, and the courses were brilliantly delivered. They reach out, follow up, ask questions, and in fact the support has been great. They are highly recommended and I would definitely subscribe to other training programs from them.

BISOLA OGUNRO

Fraud Analytics Risk Oversight Manager

It's one thing to look for just a Data Analysis training, and it's another to get the knowledge transferred through certified professional trainers. No matter your initial level of proficiency in any of the Data Analysis tools, GreaterHeight Academy would meet you there and take you up to a highly proficient and confident level in a short time at a reasonable pace. I learnt a lot of Data Analysis tools and skills at GreaterHeight from patient and resourceful teachers.

TUNDE MEREDITH

Operation Director - Abbfem Technology

The Data Analysis training program was one of the best I have attended. The way GreaterHeight took off with Excel and concluded the four courses with Excel was mind-blowing - it was WOW!! I concluded that I'm on the right path with the right mentor to take me from novice to professional. GreaterHeight is the best as far as imparting Data Analysis knowledge is concerned. I would shout it from the rooftop to recommend GreaterHeight to any trainee that really wants to learn.

JOHN OSI PETER

GreaterHeight

I wanted to take a moment to express my deepest gratitude for the opportunity to study data analytics at GreaterHeight Academy. I am truly impressed by the level of dedication and support that the sponsor and CEO have put into this program. GreaterHeight Academy is without a doubt the best tech institution out there, providing top-notch education and resources for its students. One of the advantages of studying at GreaterHeight Academy is the access to the best tools and technologies in the field. 

AYODELE PAYNE

Sales/Data Analyst

It is an unforgettable experience that will surely stand the test of time, learning to become a Data Analyst with GreaterHeight Academy. The lecture delivery was impactful, and the trainer is versatile and knowledgeable in using the applicable tools for the sessions. Always ready to go the extra mile with you. The support you get during and after the lectures is top-notch, with materials and resources available to build your confidence on and off the job.

ADEBAYO OLADEJO

Customer Service Advisor (Special Operations)

Complete Data Scientist with Python Courses


Learn Python for data science and gain the career-building skills you need to succeed as a data scientist, from data manipulation to machine learning! In this track, you’ll learn how this versatile language allows you to import, clean, manipulate, and visualize data—all integral skills for any aspiring data professional or researcher. Starting with the Python essentials for data science, you’ll work through interactive exercises that test your abilities. You’ll get hands-on with some of the most popular Python libraries for data science, including pandas, Seaborn, Matplotlib, scikit-learn, and many more. As you progress, you’ll work with real-world datasets to learn the statistical and machine learning techniques you need to perform hypothesis testing and build predictive models. You’ll also get an introduction to supervised learning with scikit-learn and apply your skills to various projects. Start this track, grow your data science skills, and begin your journey to confidently pass the Associate Data Scientist in Python certification and thrive as a data scientist.


Master the skills you need to pass the Data Scientist in Python certification and prepare yourself for success in the field of data science. Throughout this track, you will focus on using Python for data science, starting with the basics and progressing to more advanced topics such as machine learning. You’ll cover a broad range of areas, including data manipulation, visualization, and analysis, using popular Python libraries such as pandas, Seaborn, Matplotlib, and scikit-learn. As you progress, you’ll work through interactive exercises using real-world datasets to help you test your abilities and develop your skills. These examples will help you explore various statistical and machine learning techniques, including hypothesis testing and predictive modeling. You’ll also gain an understanding of package development, data preprocessing, SQL for relational databases, Git for data science projects, and more. Complete this track to gain the knowledge and experience necessary to confidently pass the Data Scientist in Python certification and thrive as a data scientist.


Introduction to Python
An Introduction to Python
Python has grown to become the market leader in programming languages and the language of choice for data analysts and data scientists. Demand for data skills is rising because companies want to gain actionable insights from their data.

Discover the Python Basics
This is a Python course for beginners, and we designed it for people with no prior Python experience. It is even suitable if you have no coding experience at all. You will cover the basics of Python, helping you understand common, everyday functions and applications, including how to use Python as a calculator, understanding variables and types, and building Python lists. The first half of this course prepares you to use Python interactively and teaches you how to store, access, and manipulate data using one of the most popular programming languages in the world.

Explore Python Functions and Packages
The second half of the course starts with a view of how you can use functions, methods, and packages to use code that other Python developers have written. As an open-source language, Python has plenty of existing packages and libraries that you can use to solve your problems.

Get Started with NumPy
NumPy is an essential Python package for data science. You’ll finish this course by learning to use some of NumPy's most popular tools and start exploring data in Python.

4 Modules | 6+ Hours | 4 Skills

Course Modules 


An introduction to the basic concepts of Python. Learn how to use Python interactively and by using a script. Create your first variables and acquaint yourself with Python's basic data types.


  1. Hello Python!
  2. Your first Python code
  3. Any comments?
  4. Python as a calculator
  5. Variables and Types
  6. Variable Assignment
  7. Calculations with variables
  8. Other variable types
  9. Operations with other types
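
For a taste of what these lessons cover, here is a minimal sketch (illustrative only, not taken from the course materials) of Python as a calculator, variable assignment, and basic types:

    # Python as a calculator: basic arithmetic operators
    print(7 + 3, 7 / 3, 7 // 3, 7 % 3, 2 ** 10)

    # Variable assignment and basic types
    savings = 100               # int
    growth_multiplier = 1.1     # float
    desc = "compound interest"  # str
    profitable = True           # bool

    # Calculations with variables; type() reveals a value's type
    result = savings * growth_multiplier ** 7
    print(result, type(result))

    # Operations with other types: + concatenates strings
    print(desc + "!")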

Learn to store, access, and manipulate data in lists: the first step toward efficiently working with huge amounts of data.


  1. Python Lists
  2. Create a list
  3. Create lists with different types
  4. List of lists
  5. Subsetting Lists
  6. Subset and conquer
  7. Slicing and dicing
  8. Subsetting lists of lists
  9. Manipulating Lists
  10. Replace list elements
  11. Extend a list
  12. Delete list elements
  13. Inner workings of lists
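
The following short sketch (hypothetical room data, for illustration only) shows the list operations these lessons practice:

    # Create a list, a list with mixed types, and a list of lists
    areas = ["hallway", 11.25, "kitchen", 18.0, "bedroom", 10.75]
    house = [["hallway", 11.25], ["kitchen", 18.0], ["bedroom", 10.75]]

    # Subsetting: zero-based indexing; negative indexes count from the end
    print(areas[1], areas[-1])

    # Slicing and dicing: [start:end], where end is exclusive
    downstairs = areas[:4]

    # Subsetting lists of lists
    print(house[1][1])  # 18.0

    # Manipulating lists: replace, extend, delete
    areas[4] = "master bedroom"
    areas = areas + ["bathroom", 9.50]
    del areas[0:2]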

You'll learn how to use functions, methods, and packages to efficiently leverage the code that brilliant Python developers have written. The goal is to reduce the amount of code you need to solve challenging problems!


  1. Functions
  2. Familiar functions
  3. Help!
  4. Multiple arguments
  5. Methods
  6. String Methods
  7. List Methods
  8. List Methods (2)
  9. Packages
  10. Import package
  11. Selective import
  12. Different ways of importing
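
A compact illustration of the functions, methods, and import styles covered here (names and values are hypothetical):

    # Familiar built-in functions
    fam = [1.73, 1.68, 1.71, 1.89]
    print(max(fam), round(1.68, 1), len(fam))

    # String and list methods
    sister = "liz"
    print(sister.capitalize(), sister.replace("z", "sa"))
    fam.append(1.79)
    print(fam.index(1.89), fam.count(1.73))

    # Different ways of importing a package
    import math
    from math import pi, radians
    print(math.sqrt(2), 2 * pi, radians(180))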

NumPy is a fundamental Python package to efficiently practice data science. Learn to work with powerful tools in the NumPy array, and get started with data exploration.


  1. NumPy
  2. Your First NumPy Array
  3. Baseball players' height
  4. NumPy Side Effects
  5. Subsetting NumPy Arrays
  6. 2D NumPy Arrays
  7. Your First 2D NumPy Array
  8. Baseball data in 2D form
  9. Subsetting 2D NumPy Arrays
  10. 2D Arithmetic
  11. NumPy: Basic Statistics
  12. Average versus median
  13. Explore the baseball data
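
As a quick preview (a sketch with made-up numbers, not the course's baseball dataset), this is the kind of NumPy work the module builds toward:

    import numpy as np

    # Element-wise arithmetic, unlike plain Python lists
    height_m = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
    weight_kg = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
    bmi = weight_kg / height_m ** 2

    # Subsetting with a boolean condition
    print(bmi[bmi > 23])

    # 2D arrays: shape and row/column subsetting
    np_2d = np.array([height_m, weight_kg])
    print(np_2d.shape)    # (2, 5)
    print(np_2d[0, 2])    # row 0, column 2
    print(np_2d[:, 1:3])  # all rows, columns 1-2

    # Basic statistics: average versus median
    print(np.mean(height_m), np.median(height_m), np.std(height_m))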


Intermediate Python
Improve Your Python Skills
Learning Python is crucial for any aspiring data science practitioner. Learn to visualize real data with Matplotlib’s functions and get acquainted with data structures such as the dictionary and pandas DataFrame. This four-hour intermediate course will help you to build on your existing Python skills and explore new Python applications and functions that expand your repertoire and help you work more efficiently.

Learn to Use Python Dictionaries and pandas
Dictionaries offer an alternative to Python lists, while the pandas DataFrame is the most popular way of working with tabular data. In the second module of this course, you’ll find out how you can create and manipulate datasets, and how to access them using these structures. Hands-on practice throughout the course will build your confidence in each area.

Explore Python Boolean Logic and Python Loops
In the second half of this course, you’ll look at logic, control flow, filtering and loops. These functions work to control decision-making in Python programs and help you to perform more operations with your data, including repeated statements. You’ll finish the course by applying all of your new skills by using hacker statistics to calculate your chances of winning a bet.

Once you’ve completed all of the modules, you’ll be ready to apply your new skills in your job, new career, or personal project, and be prepared to move on to more advanced Python learning!

5 Modules | 6+ Hours | 5 Skills

Course Modules 


Data visualization is a key skill for aspiring data scientists. Matplotlib makes it easy to create meaningful and insightful plots. In this module, you’ll learn how to build various types of plots, and customize them to be more visually appealing and interpretable.


  1. Basic plots with Matplotlib
  2. Line plot (1)
  3. Line Plot (2): Interpretation
  4. Line plot (3)
  5. Scatter Plot (1)
  6. Scatter plot (2)
  7. Histogram
  8. Build a histogram (1)
  9. Build a histogram (2): bins
  10. Build a histogram (3): compare
  11. Choose the right plot (1)
  12. Choose the right plot (2)
  13. Customization
  14. Labels
  15. Ticks
  16. Sizes
  17. Colors
  18. Additional Customizations
  19. Interpretation
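
A minimal sketch of the plot types and customizations listed above, using made-up population figures:

    import matplotlib.pyplot as plt

    year = [1950, 1970, 1990, 2010]
    pop = [2.52, 3.69, 5.26, 6.97]  # hypothetical values, in billions

    # Line plot with labels, title, and custom ticks
    plt.plot(year, pop)
    plt.xlabel("Year")
    plt.ylabel("Population (billions)")
    plt.title("Population Growth")
    plt.yticks([0, 2, 4, 6, 8], ["0", "2B", "4B", "6B", "8B"])
    plt.show()

    # Histogram: the distribution of a list of values, with chosen bins
    values = [1.2, 1.9, 2.1, 2.4, 2.4, 3.1, 3.3, 4.0]
    plt.hist(values, bins=4)
    plt.show()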

Learn about the dictionary, an alternative to the Python list, and the pandas DataFrame, the de facto standard to work with tabular data in Python. You will get hands-on practice with creating and manipulating datasets, and you’ll learn how to access the information you need from these data structures.


  1. Dictionaries, Part 1
  2. Motivation for dictionaries
  3. Create dictionary
  4. Access dictionary
  5. Dictionaries, Part 2
  6. Dictionary Manipulation (1)
  7. Dictionary Manipulation (2)
  8. Dictionariception
  9. Pandas, Part 1
  10. Dictionary to DataFrame (1)
  11. Dictionary to DataFrame (2)
  12. CSV to DataFrame (1)
  13. CSV to DataFrame (2)
  14. Pandas, Part 2
  15. Square Brackets (1)
  16. Square Brackets (2)
  17. loc and iloc (1)
  18. loc and iloc (2)
  19. loc and iloc (3)
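
The dictionary and DataFrame techniques above look roughly like this in practice (a self-contained sketch with invented country data):

    import pandas as pd

    # A dictionary maps keys to values
    europe = {"spain": "madrid", "france": "paris", "norway": "oslo"}
    europe["italy"] = "rome"    # add a key-value pair
    print(europe["france"])     # access by key

    # Dictionary of lists to DataFrame, with custom row labels
    data = {"country": ["Brazil", "India", "China"],
            "capital": ["Brasilia", "New Delhi", "Beijing"],
            "population": [211.0, 1380.0, 1439.0]}
    df = pd.DataFrame(data, index=["BR", "IN", "CH"])

    # Square brackets, label-based .loc, position-based .iloc
    print(df["country"])            # a single column (Series)
    print(df.loc["IN", "capital"])  # label-based access
    print(df.iloc[0:2, 0:2])        # position-based slice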

Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. You'll also learn to filter data in pandas DataFrames using logic.


  1. Comparison Operators
  2. Equality
  3. Greater and less than
  4. Compare arrays
  5. Boolean Operators
  6. and, or, not (1)
  7. and, or, not (2)
  8. Boolean operators with NumPy
  9. if, elif, else
  10. Warmup
  11. if
  12. Add else
  13. Customize further: elif
  14. Filtering pandas DataFrames
  15. Driving right (1)
  16. Driving right (2)
  17. Cars per capita (1)
  18. Cars per capita (2)
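
In code, the comparison, Boolean, and filtering ideas from this module look something like this (hypothetical data):

    import numpy as np
    import pandas as pd

    # Comparison and Boolean operators
    area = 8.5
    print(area > 5 and area < 10)

    # With NumPy arrays, use logical_and/logical_or instead of and/or
    areas = np.array([8.5, 11.0, 4.75, 9.5])
    print(np.logical_and(areas > 5, areas < 10))

    # Filtering a pandas DataFrame with a Boolean Series
    cars = pd.DataFrame({"country": ["US", "JPN", "IN"],
                         "cars_per_cap": [809, 588, 18]})
    print(cars[cars["cars_per_cap"] > 500])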

There are several techniques you can use to repeatedly execute Python code. While loops are like repeated if statements, the for loop iterates over all kinds of data structures. Learn all about them in this module.


  1. while loop
  2. while: warming up
  3. Basic while loop
  4. Add conditionals
  5. for loop
  6. Loop over a list
  7. Indexes and values (1)
  8. Indexes and values (2)
  9. Loop over list of lists
  10. Loop Data Structures Part 1
  11. Loop over dictionary
  12. Loop over NumPy array
  13. Loop Data Structures Part 2
  14. Loop over DataFrame (1)
  15. Loop over DataFrame (2)
  16. Add column (1)
  17. Add column (2)
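
A short sketch of the loop constructs above (illustrative values only):

    import numpy as np
    import pandas as pd

    # while: repeat as long as a condition holds
    offset = 8
    while offset > 0:
        offset -= 2

    # for with enumerate: indexes and values
    for index, height in enumerate([1.73, 1.68, 1.71]):
        print(index, height)

    # Loop over a dictionary with .items()
    for country, capital in {"spain": "madrid", "france": "paris"}.items():
        print(country, capital)

    # Loop over every element of a 2D NumPy array
    for val in np.nditer(np.array([[1, 2], [3, 4]])):
        print(val)

    # Loop over DataFrame rows with .iterrows(), adding a column
    df = pd.DataFrame({"country": ["Brazil", "India"]})
    for label, row in df.iterrows():
        df.loc[label, "name_length"] = len(row["country"])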

This module will allow you to apply all the concepts you've learned in this course. You will use hacker statistics to calculate your chances of winning a bet. Use random number generators, loops, and Matplotlib to gain a competitive edge!


  1. Random Numbers
  2. Random float
  3. Roll the dice
  4. Determine your next move
  5. Random Walk
  6. The next step
  7. How low can you go?
  8. Visualize the walk
  9. Distribution
  10. Simulate multiple walks
  11. Visualize all walks
  12. Implement clumsiness
  13. Plot the distribution
  14. Calculate the odds
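
The case study boils down to a simulation along these lines (a sketch under the module's dice-game premise; the exact rules used in the course may differ):

    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(123)  # reproducible "hacker statistics"

    all_walks = []
    for _ in range(500):              # simulate 500 random walks
        walk = [0]
        for _ in range(100):          # 100 dice rolls per walk
            step = walk[-1]
            dice = np.random.randint(1, 7)
            if dice <= 2:
                step = max(0, step - 1)   # can't go below step 0
            elif dice <= 5:
                step += 1
            else:
                step += np.random.randint(1, 7)
            walk.append(step)
        all_walks.append(walk)

    # Distribution of final steps, and the odds of reaching step 60
    ends = np.array(all_walks)[:, -1]
    plt.hist(ends)
    plt.show()
    print("P(end >= 60) is roughly", np.mean(ends >= 60))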


Data Manipulation With Pandas
Discover Data Manipulation with pandas
With this course, you’ll learn why pandas is the world's most popular Python library, used for everything from data manipulation to data analysis. You’ll explore how to manipulate DataFrames, as you extract, filter, and transform real-world datasets for analysis.

With pandas, you’ll explore all the core data science concepts. Using real-world data, including Walmart sales figures and global temperature time series, you’ll learn how to import, clean, calculate statistics, and create visualizations—using pandas to add to the power of Python.

Work with pandas Data to Explore Core Data Science Concepts
You’ll start by mastering the pandas basics, including how to inspect DataFrames and perform some fundamental manipulations. You’ll also learn about aggregating DataFrames, before moving on to slicing and indexing.

You’ll wrap up the course by learning how to visualize the contents of your DataFrames, working with a dataset that contains weekly US avocado sales.

Learn to Manipulate DataFrames
By completing this pandas course, you’ll understand how to use this Python library for data manipulation. You’ll have an understanding of DataFrames and how to use them, as well as be able to visualize your data in Python.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Let’s master the pandas basics. Learn how to inspect DataFrames and perform fundamental manipulations, including sorting rows, subsetting, and adding new columns.


  1. Introducing DataFrames
  2. Inspecting a DataFrame
  3. Parts of a DataFrame
  4. Sorting and subsetting
  5. Sorting rows
  6. Subsetting columns
  7. Subsetting rows
  8. Subsetting rows by categorical variables
  9. New columns
  10. Adding new columns
  11. Combo-attack!

In this module, you’ll calculate summary statistics on DataFrame columns, and master grouped summary statistics and pivot tables.


  1. Summary statistics
  2. Mean and median
  3. Summarizing dates
  4. Efficient summaries
  5. Cumulative statistics
  6. Counting
  7. Dropping duplicates
  8. Counting categorical variables
  9. Grouped summary statistics
  10. What percent of sales occurred at each store type?
  11. Calculations with .groupby()
  12. Multiple grouped summaries
  13. Pivot tables
  14. Pivoting on one variable
  15. Fill in missing values and sum values with pivot tables
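
Grouped summaries and pivot tables follow this pattern (a sketch with invented sales rows, in the spirit of the module's Walmart example):

    import pandas as pd

    sales = pd.DataFrame({
        "store_type": ["A", "A", "B", "B", "C"],
        "is_holiday": [False, True, False, True, False],
        "weekly_sales": [1000.0, 1500.0, 800.0, 950.0, 600.0],
    })

    # Summary statistics on one column
    print(sales["weekly_sales"].agg(["mean", "median"]))

    # Grouped summary statistics with .groupby()
    print(sales.groupby("store_type")["weekly_sales"].sum())

    # The same idea as a pivot table, filling missing combinations with 0
    print(sales.pivot_table(values="weekly_sales", index="store_type",
                            columns="is_holiday", aggfunc="mean",
                            fill_value=0))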

Indexes are supercharged row and column names. Learn how they can be combined with slicing for powerful DataFrame subsetting.


  1. Explicit indexes
  2. Setting and removing indexes
  3. Subsetting with .loc[]
  4. Setting multi-level indexes
  5. Sorting by index values
  6. Slicing and subsetting with .loc and .iloc
  7. Slicing index values
  8. Slicing in both directions
  9. Slicing time series
  10. Subsetting by row/column number
  11. Working with pivot tables
  12. Pivot temperature by city and year
  13. Subsetting pivot tables
  14. Calculating on a pivot table
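
Index-based subsetting works along these lines (hypothetical temperature data):

    import pandas as pd

    temps = pd.DataFrame({
        "city": ["Cairo", "Cairo", "Lagos", "Lagos"],
        "year": [2019, 2020, 2019, 2020],
        "temp_c": [22.0, 22.4, 27.5, 27.9],
    })

    # Set a multi-level index, then sort it so slicing works
    temps_idx = temps.set_index(["city", "year"]).sort_index()

    # Subsetting with .loc[] on index labels
    print(temps_idx.loc["Cairo"])          # outer level
    print(temps_idx.loc[("Lagos", 2020)])  # both levels

    # Slicing index values (.loc is inclusive on both ends)
    print(temps_idx.loc["Cairo":"Lagos"])

    # Subsetting by row/column number with .iloc
    print(temps_idx.iloc[0:2, 0])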

Learn to visualize the contents of your DataFrames, handle missing data values, and import data from and export data to CSV files.


  1. Visualizing your data
  2. Which avocado size is most popular?
  3. Changes in sales over time
  4. Avocado supply and demand
  5. Price of conventional vs. organic avocados
  6. Missing values
  7. Finding missing values
  8. Removing missing values
  9. Replacing missing values
  10. Creating DataFrames
  11. List of dictionaries
  12. Dictionary of lists
  13. Reading and writing CSVs
  14. CSV to DataFrame
  15. DataFrame to CSV
  16. Wrap-up


Joining Data with Pandas

Being able to combine and work with multiple datasets is an essential skill for any aspiring Data Scientist. pandas is a crucial cornerstone of the Python data science ecosystem, with Stack Overflow recording 5 million views for pandas questions. Learn to handle multiple DataFrames by combining, organizing, joining, and reshaping them using pandas. You'll work with datasets from the World Bank and the City of Chicago. You will finish the course with a solid skill set for data joining in pandas.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Learn how you can merge disparate data using inner joins. By combining information from multiple sources you’ll uncover compelling insights that may have previously been hidden. You’ll also learn how the relationship between those sources, such as one-to-one or one-to-many, can affect your result.


  1. Inner join
  2. What column to merge on?
  3. Your first inner join
  4. Inner joins and number of rows returned
  5. One-to-many relationships
  6. One-to-many classification
  7. One-to-many merge
  8. Merging multiple DataFrames
  9. Total riders in a month
  10. Three table merge
  11. One-to-many merge with multiple tables
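
An inner join in pandas looks like this (a sketch with invented transit tables, loosely echoing the module's ridership examples):

    import pandas as pd

    stations = pd.DataFrame({"station_id": [1, 2, 3],
                             "station_name": ["Loop", "Midway", "Austin"]})
    rides = pd.DataFrame({"station_id": [1, 1, 2, 4],
                          "riders": [500, 450, 300, 120]})

    # Inner join keeps only station_ids present in BOTH tables;
    # station 1 matches two ride rows (one-to-many), station 4 is dropped
    inner = rides.merge(stations, on="station_id")
    print(inner)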

Take your knowledge of joins to the next level. In this module, you’ll work with TMDb movie data as you learn about left, right, and outer joins. You’ll also discover how to merge a table to itself and merge on a DataFrame index.


  1. Left join
  2. Counting missing rows with left join
  3. Enriching a dataset
  4. How many rows with a left join?
  5. Other joins
  6. Right join to find unique movies
  7. Popular genres with right join
  8. Using outer join to select actors
  9. Merging a table to itself
  10. Self join
  11. How does pandas handle self joins?
  12. Merging on indexes
  13. Index merge for movie ratings
  14. Do sequels earn more?

In this module, you’ll leverage powerful filtering techniques, including semi joins and anti joins, as shown in the sketch after this module's lesson list. You’ll also learn how to glue DataFrames together by vertically combining them with the pandas concat function to create new datasets. Finally, because data is rarely clean, you’ll also learn how to validate your newly combined data structures.


  1. Filtering joins
  2. Steps of a semi join
  3. Performing an anti join
  4. Performing a semi join
  5. Concatenate DataFrames together vertically
  6. Concatenation basics
  7. Concatenating with keys
  8. Verifying integrity
  9. Validating a merge
  10. Concatenate and merge to find common songs
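
pandas has no dedicated semi-join or anti-join function, so filtering joins like those above are typically built from .isin(), as in this sketch (invented song data):

    import pandas as pd

    songs = pd.DataFrame({"song_id": [1, 2, 3], "title": ["a", "b", "c"]})
    plays = pd.DataFrame({"song_id": [1, 1, 3]})

    # Semi join: rows of `songs` that DO appear in `plays`
    semi = songs[songs["song_id"].isin(plays["song_id"])]

    # Anti join: rows of `songs` that do NOT appear in `plays`
    anti = songs[~songs["song_id"].isin(plays["song_id"])]
    print(semi, anti, sep="\n")

    # Vertical concatenation with keys to label each source
    both = pd.concat([songs, songs], keys=["jan", "feb"])
    print(both)

    # validate= raises an error if the relationship isn't as expected
    songs.merge(plays, on="song_id", validate="one_to_many")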

In this final module, you’ll step up a gear and learn to apply pandas' specialized methods for merging time-series and ordered data together with real-world financial and economic data from the city of Chicago. You’ll also learn how to query resulting tables using a SQL-style format, and unpivot data using the melt method.


  1. Using merge_ordered()
  2. Correlation between GDP and S&P500
  3. Phillips curve using merge_ordered()
  4. merge_ordered() caution, multiple columns
  5. Using merge_asof()
  6. Using merge_asof() to study stocks
  7. Using merge_asof() to create dataset
  8. merge_asof() and merge_ordered() differences
  9. Selecting data with .query()
  10. Explore financials with .query()
  11. Subsetting rows with .query()
  12. Reshaping data with .melt()
  13. Select the right .melt() arguments
  14. Using .melt() to reshape government data
  15. Using .melt() for stocks vs bond performance
  16. Course wrap-up
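
The ordered merges and reshaping tools above can be sketched as follows (made-up economic figures):

    import pandas as pd

    gdp = pd.DataFrame({"date": pd.to_datetime(["2020-01-01", "2020-04-01"]),
                        "gdp": [100.0, 95.0]})
    sp500 = pd.DataFrame({"date": pd.to_datetime(["2020-02-01", "2020-05-01"]),
                          "close": [3200.0, 3000.0]})

    # merge_ordered(): an ordered merge with optional forward-filling
    print(pd.merge_ordered(gdp, sp500, on="date", fill_method="ffill"))

    # merge_asof(): match each left row to the nearest earlier right row
    print(pd.merge_asof(sp500, gdp, on="date"))

    # SQL-style filtering with .query()
    print(gdp.query("gdp > 96"))

    # Unpivot wide data into long format with .melt()
    wide = pd.DataFrame({"year": [2020], "q1": [1.0], "q2": [0.9]})
    print(wide.melt(id_vars="year", var_name="quarter", value_name="value"))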


Introduction to Statistics in Python
Statistics is the study of how to collect, analyze, and draw conclusions from data. It’s a hugely valuable tool that you can use to bring the future into focus and infer the answer to tons of questions. For example, what is the likelihood of someone purchasing your product, how many calls will your support team receive, and how many jeans sizes should you manufacture to fit 95% of the population? In this course, you'll discover how to answer questions like these as you grow your statistical skills and learn how to calculate averages, use scatterplots to show the relationship between numeric values, and calculate correlation. You'll also tackle probability, the backbone of statistical reasoning, and learn how to use Python to conduct a well-designed study to draw your own conclusions from data.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Summary statistics give you the tools you need to boil down massive datasets to reveal the highlights. In this module, you'll explore summary statistics including mean, median, and standard deviation, and learn how to accurately interpret them. You'll also develop your critical thinking skills, allowing you to choose the best summary statistics for your data.


  1. What is statistics?
  2. Descriptive and inferential statistics
  3. Data type classification
  4. Measures of center
  5. Mean and median
  6. Mean vs. median
  7. Measures of spread
  8. Quartiles, quantiles, and quintiles
  9. Variance and standard deviation
  10. Finding outliers using IQR

In this module, you'll learn how to generate random samples and measure chance using probability. You'll work with real-world sales data to calculate the probability of a salesperson being successful. Finally, you’ll use the binomial distribution to model events with binary outcomes.


  1. What are the chances?
  2. With or without replacement?
  3. Calculating probabilities
  4. Sampling deals
  5. Discrete distributions
  6. Creating a probability distribution
  7. Identifying distributions
  8. Expected value vs. sample mean
  9. Continuous distributions
  10. Which distribution?
  11. Data back-ups
  12. Simulating wait times
  13. The binomial distribution
  14. Simulating sales deals
  15. Calculating binomial probabilities
  16. How many sales will be won?
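
Using SciPy (an assumption on tooling; the invented deal numbers are hypothetical), binomial probabilities like those above can be computed as follows:

    import numpy as np
    from scipy.stats import binom

    n, p = 10, 0.3  # 10 deals a week, 30% win rate (hypothetical)

    # Simulate 52 weeks of won-deal counts
    np.random.seed(42)
    weekly_wins = binom.rvs(n, p, size=52)
    print("sample mean:", weekly_wins.mean(), "| expected value:", n * p)

    # P(exactly 3 wins) and P(3 or fewer wins)
    print(binom.pmf(3, n, p))  # probability mass function
    print(binom.cdf(3, n, p))  # cumulative distribution function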

It’s time to explore one of the most important probability distributions in statistics: the normal distribution. You’ll create histograms to plot normal distributions and gain an understanding of the central limit theorem, before expanding your knowledge of statistical functions by adding the Poisson, exponential, and t-distributions to your repertoire.


  1. The normal distribution
  2. Distribution of Amir's sales
  3. Probabilities from the normal distribution
  4. Simulating sales under new market conditions
  5. Which market is better?
  6. The central limit theorem
  7. Visualizing sampling distributions
  8. The CLT in action
  9. The mean of means
  10. The Poisson distribution
  11. Identifying lambda
  12. Tracking lead responses
  13. More probability distributions
  14. Distribution dragging and dropping
  15. Modeling time between leads
  16. The t-distribution
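
A sketch of normal-distribution calculations and the central limit theorem in action (SciPy and the made-up sales figures are assumptions, not the course's own data):

    import numpy as np
    from scipy.stats import norm

    mean, sd = 5000, 2000  # hypothetical sales ~ Normal(5000, 2000)

    print(norm.cdf(7000, mean, sd))      # P(sale < 7000)
    print(1 - norm.cdf(1000, mean, sd))  # P(sale > 1000)
    print(norm.ppf(0.9, mean, sd))       # 90th percentile

    # Central limit theorem: means of many samples look normal
    np.random.seed(0)
    die_rolls = np.random.randint(1, 7, size=(1000, 30))
    sample_means = die_rolls.mean(axis=1)
    print(sample_means.mean())  # close to 3.5, the die's expected value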

In this module, you'll learn how to quantify the strength of a linear relationship between two variables, and explore how confounding variables can affect the relationship between two other variables. You'll also see how a study’s design can influence its results, change how the data should be analyzed, and potentially affect the reliability of your conclusions.


  1. Correlation
  2. Guess the correlation
  3. Relationships between variables
  4. Correlation caveats
  5. What can't correlation measure?
  6. Transforming variables
  7. Does sugar improve happiness?
  8. Confounders
  9. Design of experiments
  10. Study types
  11. Longitudinal vs. cross-sectional studies
  12. Course Wrap up!


Introduction to Data Visualization with Matplotlib 
Visualizing data in plots and figures exposes the underlying patterns in the data and provides insights. Good visualizations also help you communicate your data to others, and are useful to data analysts and other consumers of the data. In this course, you will learn how to use Matplotlib, a powerful Python data visualization library. Matplotlib provides the building blocks to create rich visualizations of many different kinds of datasets. You will learn how to create visualizations for different kinds of data and how to customize, automate, and share these visualizations.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


This module introduces the Matplotlib visualization library and demonstrates how to use it with data.


  1. Introduction to data visualization with Matplotlib
  2. Using the matplotlib.pyplot interface
  3. Adding data to an Axes object
  4. Customizing your plots
  5. Customizing data appearance
  6. Customizing axis labels and adding titles
  7. Small multiples
  8. Creating a grid of subplots
  9. Creating small multiples with plt.subplots
  10. Small multiples with shared y axis

Time series data is data that is recorded over time. Visualizing this type of data helps clarify trends and illuminates relationships within the data.


  1. Plotting time-series data
  2. Read data with a time index
  3. Plot time-series data
  4. Using a time index to zoom in
  5. Plotting time-series with different variables
  6. Plotting two variables
  7. Defining a function that plots time-series data
  8. Using a plotting function
  9. Annotating time-series data
  10. Annotating a plot of time-series data
  11. Plotting time-series: putting it all together

Visualizations can be used to compare data in a quantitative manner. This module explains several methods for quantitative visualizations.


  1. Quantitative comparisons: bar-charts
  2. Bar chart
  3. Stacked bar chart
  4. Quantitative comparisons: histograms
  5. Creating histograms
  6. "Step" histogram
  7. Statistical plotting
  8. Adding error-bars to a bar chart
  9. Adding error-bars to a plot
  10. Creating boxplots
  11. Quantitative comparisons: scatter plots
  12. Simple scatter plot
  13. Encoding time by color

This module shows you how to share your visualizations with others: how to save your figures as files, how to adjust their look and feel, and how to automate their creation based on input data.


  1. Preparing your figures to share with others
  2. Selecting a style for printing
  3. Switching between styles
  4. Saving your visualizations
  5. Saving a file several times
  6. Save a figure with different sizes
  7. Automating figures from data
  8. Unique values of a column
  9. Automate your visualization
  10. Where to go next


Introduction to Data Visualization with Seaborn
Create Your Own Seaborn Plots
Seaborn is a powerful Python library that makes it easy to create informative and attractive data visualizations. This 4-hour course provides an introduction to how you can use Seaborn to create a variety of plots, including scatter plots, count plots, bar plots, and box plots, and how you can customize your visualizations.

Turn Real Datasets into Custom Seaborn Visualizations
You’ll explore this library and create your Seaborn plots based on a variety of real-world data sets, including exploring how air pollution in a city changes through the day and looking at what young people like to do in their free time. This data will give you the opportunity to find out about Seaborn’s advantages first hand, including how you can easily create subplots in a single figure and how to automatically calculate confidence intervals.

Improve Your Data Communication Skills
By the end of this course, you’ll be able to use Seaborn in various situations to explore your data and effectively communicate the results of your data analysis to others. These skills are highly sought-after for data analysts, data scientists, and any other job that may involve creating data visualizations. If you’d like to continue your learning, this course is part of several tracks, including the Data Visualization track, where you can add more libraries and techniques to your skillset.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


What is Seaborn, and when should you use it? In this module, you will find out! Plus, you will learn how to create scatter plots and count plots with both lists of data and pandas DataFrames. You will also be introduced to one of the big advantages of using Seaborn - the ability to easily add a third variable to your plots by using color to represent different subgroups.


  1. Introduction to Seaborn
  2. Making a scatter plot with lists
  3. Making a count plot with a list
  4. Using pandas with Seaborn
  5. "Tidy" vs. "untidy" data
  6. Making a count plot with a DataFrame
  7. Adding a third variable with hue
  8. Hue and scatter plots
  9. Hue and count plots
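
In code, the Seaborn basics above look roughly like this (invented study-habits data):

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # "Tidy" data: one row per observation
    df = pd.DataFrame({
        "hours_studied": [1, 2, 3, 4, 5, 6],
        "score": [52, 58, 65, 70, 74, 81],
        "location": ["urban", "rural", "urban", "rural", "urban", "rural"],
    })

    # Scatter plot from a DataFrame; hue adds a third variable by color
    sns.scatterplot(data=df, x="hours_studied", y="score", hue="location")
    plt.show()

    # Count plot of a categorical column
    sns.countplot(data=df, x="location")
    plt.show()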

In this module, you will create and customize plots that visualize the relationship between two quantitative variables. To do this, you will use scatter plots and line plots to explore how the level of air pollution in a city changes over the course of a day and how horsepower relates to fuel efficiency in cars. You will also see another big advantage of using Seaborn - the ability to easily create subplots in a single figure!


  1. Introduction to relational plots and subplots
  2. Creating subplots with col and row
  3. Creating two-factor subplots
  4. Customizing scatter plots
  5. Changing the size of scatter plot points
  6. Changing the style of scatter plot points
  7. Introduction to line plots
  8. Interpreting line plots
  9. Visualizing standard deviation with line plots
  10. Plotting subgroups in line plots

Categorical variables are present in nearly every dataset, but they are especially prominent in survey data. In this module, you will learn how to create and customize categorical plots such as box plots, bar plots, count plots, and point plots. Along the way, you will explore survey data from young people about their interests, students about their study habits, and adult men about their feelings about masculinity.


  1. Count plots and bar plots
  2. Count plots
  3. Bar plots with percentages
  4. Customizing bar plots
  5. Box plots
  6. Create and interpret a box plot
  7. Omitting outliers
  8. Adjusting the whiskers
  9. Point plots
  10. Customizing point plots
  11. Point plots with subgroups

In this final module, you will learn how to add informative plot titles and axis labels, which are one of the most important parts of any data visualization! You will also learn how to customize the style of your visualizations in order to more quickly orient your audience to the key takeaways. Then, you will put everything you have learned together for the final exercises of the course!


  1. Changing plot style and color
  2. Changing style and palette
  3. Changing the scale
  4. Using a custom palette
  5. Adding titles and labels: Part 1
  6. FacetGrids vs. AxesSubplots
  7. Adding a title to a FacetGrid object
  8. Adding titles and labels: Part 2
  9. Adding a title and axis labels
  10. Rotating x-tick labels
  11. Putting it all together
  12. Box plot with subgroups
  13. Bar plot with subgroups and subplots
  14. Wrap up!


Introduction to Functions in Python
It's time to push forward and develop your Python chops even further. Python has tons of fantastic functions and a rich module ecosystem. However, as a data professional or developer, you'll constantly need to write your own functions to solve problems that are dictated by your data. You will learn the art of function writing in this course. You'll come out of this course being able to write your very own custom functions, complete with multiple parameters and multiple return values, along with default arguments and variable-length arguments. You'll gain insight into scoping in Python, be able to write lambda functions, and handle errors in your function writing practice. You'll wrap up each module by using your new skills to write functions that analyze Twitter data.

3 Modules | 4+ Hours | 3 Skills

Course Modules 


You'll learn how to write simple functions, as well as functions that accept multiple arguments and return multiple values. You'll also have the opportunity to apply these new skills to questions commonly encountered by data professionals and developers.


  1. User-defined functions
  2. Strings in Python
  3. Recapping built-in functions
  4. Write a simple function
  5. Single-parameter functions
  6. Functions that return single values
  7. Multiple parameters and return values
  8. Functions with multiple parameters
  9. A brief introduction to tuples
  10. Functions that return multiple values
  11. Bringing it all together
  12. Bringing it all together (1)
  13. Bringing it all together (2)

You'll learn to write functions with default arguments so that the user doesn't always need to specify them, and variable-length arguments so they can pass an arbitrary number of arguments on to your functions. You'll also learn about the essential concept of scope.


  1. Scope and user-defined functions
  2. Pop quiz on understanding scope
  3. The keyword global
  4. Python's built-in scope
  5. Nested functions
  6. Nested Functions I
  7. Nested Functions II
  8. The keyword nonlocal and nested functions
  9. Default and flexible arguments
  10. Functions with one default argument
  11. Functions with multiple default arguments
  12. Functions with variable-length arguments (*args)
  13. Functions with variable-length keyword arguments (**kwargs)
  14. Bringing it all together
  15. Bringing it all together (1)
  16. Bringing it all together (2)
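
Default arguments, flexible arguments, and scope can be sketched like this (the function names are invented for illustration):

    # A default argument plus *args and **kwargs
    def report(*args, sep=", ", **kwargs):
        """Join positional args; append key=value pairs from kwargs."""
        parts = [str(a) for a in args]
        parts += [f"{k}={v}" for k, v in kwargs.items()]
        return sep.join(parts)

    print(report(1, 2, 3))                    # 1, 2, 3
    print(report("a", "b", sep=" | ", x=10))  # a | b | x=10

    # Nested functions and scope: nonlocal rebinds the enclosing variable
    def make_counter():
        n = 0
        def tick():
            nonlocal n
            n += 1
            return n
        return tick

    counter = make_counter()
    print(counter(), counter())  # 1 2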

Learn about lambda functions, which allow you to write functions quickly and on the fly. You'll also practice handling errors in your functions, which is an essential skill. Then, apply your new skills to answer data science questions.


  1. Lambda functions
  2. Pop quiz on lambda functions
  3. Writing a lambda function you already know
  4. Map() and lambda functions
  5. Filter() and lambda functions
  6. Reduce() and lambda functions
  7. Introduction to error handling
  8. Pop quiz about errors
  9. Error handling with try-except
  10. Error handling by raising an error
  11. Bringing it all together
  12. Bringing it all together (1)
  13. Bringing it all together (2)
  14. Bringing it all together (3)
  15. Bringing it all together: testing your error handling skills
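
A minimal sketch of lambdas and error handling as covered above:

    # Lambda functions with map() and filter()
    nums = [1, 2, 3, 4, 5]
    squares = list(map(lambda x: x ** 2, nums))
    evens = list(filter(lambda x: x % 2 == 0, nums))
    print(squares, evens)

    # reduce() lives in functools in Python 3
    from functools import reduce
    print(reduce(lambda a, b: a * b, nums))  # 120

    # Error handling: try/except, and raising your own errors
    def sqrt(x):
        if x < 0:
            raise ValueError("x must be non-negative")
        return x ** 0.5

    try:
        sqrt(-1)
    except ValueError as err:
        print("caught:", err)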


Python Toolbox
In this Python Toolbox course, you'll continue to build more advanced Python skills. First, you'll learn about iterators, objects you have already encountered in the context of for loops. You'll then learn about list comprehensions, which are extremely handy tools for all data professionals and developers working in Python. You'll end the course by working through a case study in which you'll apply all the techniques you learned in both parts of this course.

3 Modules | 4+ Hours | 3 Skills

Course Modules 


You'll learn all about iterators and iterables, which you have already worked with when writing for loops. You'll learn some handy functions that will allow you to effectively work with iterators. And you’ll finish the module with a use case that is pertinent to the world of data science and dealing with large amounts of data—in this case, data from Twitter that you will load in chunks using iterators.


  1. Introduction to iterators
  2. Iterators vs. Iterables
  3. Iterating over iterables (1)
  4. Iterating over iterables (2)
  5. Iterators as function arguments
  6. Playing with iterators
  7. Using enumerate
  8. Using zip
  9. Using * and zip to 'unzip'
  10. Using iterators to load large files into memory
  11. Processing large amounts of Twitter data
  12. Extracting information for large amounts of Twitter data

In this module, you'll build on your knowledge of iterators and be introduced to list comprehensions, which allow you to create complicated lists—and lists of lists—in one line of code! List comprehensions can dramatically simplify your code and make it more efficient, and will become a vital part of your Python toolbox. You'll then learn about generators, which are extremely helpful when working with large sequences of data that you may not want to store in memory, but instead generate on the fly.


  1. List comprehensions
  2. Write a basic list comprehension
  3. List comprehension over iterables
  4. Writing list comprehensions
  5. Nested list comprehensions
  6. Advanced comprehensions
  7. Using conditionals in comprehensions (1)
  8. Using conditionals in comprehensions (2)
  9. Dict comprehensions
  10. Introduction to generator expressions
  11. List comprehensions vs. generators
  12. Write your own generator expressions
  13. Changing the output in generator expressions
  14. Build a generator
  15. Wrapping up comprehensions and generators.
  16. List comprehensions for time-stamped data
  17. Conditional list comprehensions for time-stamped data
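
The comprehension and generator patterns above, in brief (illustrative values):

    # List comprehension: squares of 0..9
    squares = [n ** 2 for n in range(10)]

    # Conditionals: filter the iterable, or branch on the output
    evens = [n for n in range(10) if n % 2 == 0]
    labels = ["even" if n % 2 == 0 else "odd" for n in range(5)]

    # Dict comprehension
    lengths = {word: len(word) for word in ["data", "science"]}

    # Generator expression: values are produced lazily, one at a time
    lazy_squares = (n ** 2 for n in range(10))
    print(next(lazy_squares), next(lazy_squares))  # 0 1

    # A generator function with yield
    def countdown(n):
        while n > 0:
            yield n
            n -= 1

    print(list(countdown(3)))  # [3, 2, 1]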

This module will allow you to apply your newly acquired skills toward wrangling and extracting meaningful information from a real-world dataset—the World Bank's World Development Indicators. You'll have the chance to write your own functions and list comprehensions as you work with iterators and generators to solidify your Python chops.


  1. Welcome to the case study!
  2. Zipping dictionaries
  3. Writing a function to help you
  4. Using a list comprehension
  5. Turning this all into a DataFrame
  6. Using Python generators for streaming data
  7. Processing data in chunks (1)
  8. Writing a generator to load data in chunks (2)
  9. Writing a generator to load data in chunks (3)
  10. Using pandas' read_csv iterator for streaming data
  11. Writing an iterator to load data in chunks (1)
  12. Writing an iterator to load data in chunks (2)
  13. Writing an iterator to load data in chunks (3)
  14. Writing an iterator to load data in chunks (4)
  15. Writing an iterator to load data in chunks (5)
  16. Final thoughts
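
Chunked loading with pandas follows this pattern (the sketch writes a tiny hypothetical CSV first so it is self-contained):

    import pandas as pd

    pd.DataFrame({"country": ["A", "B", "C", "D"],
                  "value": [1, 2, 3, 4]}).to_csv("sample.csv", index=False)

    # read_csv with chunksize returns an iterator of DataFrames, so a
    # large file never has to fit in memory all at once
    total = 0
    for chunk in pd.read_csv("sample.csv", chunksize=2):
        total += chunk["value"].sum()
    print(total)  # 10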


Exploratory Data Analysis in Python
So you’ve got some interesting data - where do you begin your analysis? This course will cover the process of exploring and analyzing data, from understanding what’s included in a dataset to incorporating exploration findings into a data science workflow.

Using data on unemployment figures and plane ticket prices, you’ll leverage Python to summarize and validate data, calculate, identify and replace missing values, and clean both numerical and categorical values. Throughout the course, you’ll create beautiful Seaborn visualizations to understand variables and their relationships.

For example, you’ll examine how alcohol use and student performance are related. Finally, the course will show how exploratory findings feed into data science workflows by creating new features, balancing categorical features, and generating hypotheses from findings.

By the end of this course, you’ll have the confidence to perform your own exploratory data analysis (EDA) in Python. You’ll be able to explain your findings visually to others and suggest the next steps for gathering insights from your data!

4 Modules | 5+ Hours | 4 Skills

Course Modules 


What's the best way to approach a new dataset? Learn to validate and summarize categorical and numerical data and create Seaborn visualizations to communicate your findings.


  1. Initial exploration
  2. Functions for initial exploration
  3. Counting categorical values
  4. Global unemployment in 2021
  5. Data validation
  6. Detecting data types
  7. Validating continents
  8. Validating range
  9. Data summarization
  10. Summaries with .groupby() and .agg()
  11. Named aggregations
  12. Visualizing categorical summaries

Exploring and analyzing data often means dealing with missing values, incorrect data types, and outliers. In this module, you’ll learn techniques to handle these issues and streamline your EDA processes!


  1. Addressing missing data
  2. Dealing with missing data
  3. Strategies for remaining missing data
  4. Imputing missing plane prices
  5. Converting and analyzing categorical data
  6. Finding the number of unique values
  7. Flight duration categories
  8. Adding duration categories
  9. Working with numeric data
  10. Flight duration
  11. Adding descriptive statistics
  12. Handling outliers
  13. What to do with outliers
  14. Identifying outliers
  15. Removing outliers

Variables in datasets don't exist in a vacuum; they have relationships with each other. In this module, you'll look at relationships across numerical, categorical, and even DateTime data, exploring the direction and strength of these relationships as well as ways to visualize them.


  1. Patterns over time
  2. Importing DateTime data
  3. Updating data type to DateTime
  4. Visualizing relationships over time
  5. Correlation
  6. Interpreting a heatmap
  7. Visualizing variable relationships
  8. Visualizing multiple variable relationships
  9. Factor relationships and distributions
  10. Categorical data in scatter plots
  11. Exploring with KDE plots

Exploratory data analysis is a crucial step in the data science workflow, but it isn't the end! Now it's time to learn techniques and considerations you can use to successfully move forward with your projects after you've finished exploring!


  1. Considerations for categorical data
  2. Checking for class imbalance
  3. Cross-tabulation
  4. Generating new features
  5. Extracting features for correlation
  6. Calculating salary percentiles
  7. Categorizing salaries
  8. Generating hypotheses
  9. Comparing salaries
  10. Choosing a hypothesis
  11. Recap!


Working with Categorical Data in Python
Being able to understand, use, and summarize non-numerical data—such as a person’s blood type or marital status—is a vital component of being a data scientist. In this course, you’ll learn how to manipulate and visualize categorical data using pandas and seaborn. Through hands-on exercises, you’ll get to grips with pandas' categorical data type, including how to create, delete, and update categorical columns. You’ll also work with a wide range of datasets including the characteristics of adoptable dogs, Las Vegas trip reviews, and census data to develop your skills at working with categorical data.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Almost every dataset contains categorical information—and often it’s an unexplored goldmine of information. In this module, you’ll learn how pandas handles categorical columns using the data type category. You’ll also discover how to group data by categories to unearth great summary statistics.


  1. Course introduction
  2. Categorical vs. numerical
  3. Exploring a target variable
  4. Ordinal categorical variables
  5. Categorical data in pandas
  6. Setting dtypes and saving memory
  7. Creating a categorical pandas Series
  8. Setting dtype when reading data
  9. Grouping data by category in pandas
  10. Create lots of groups
  11. Setting up a .groupby() statement
  12. Using pandas functions effectively
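
The category dtype and grouped summaries can be sketched as follows (invented dog breeds):

    import pandas as pd

    dogs = pd.DataFrame({"breed": ["poodle", "beagle", "poodle", "pug"] * 1000})

    # Converting to the category dtype can drastically cut memory usage
    print(dogs["breed"].nbytes)
    dogs["breed"] = dogs["breed"].astype("category")
    print(dogs["breed"].nbytes)

    # An ordered categorical encodes a ranking, e.g. sizes
    size_type = pd.CategoricalDtype(categories=["small", "medium", "large"],
                                    ordered=True)
    sizes = pd.Series(["small", "large", "medium"], dtype=size_type)
    print(sizes.sort_values())

    # Grouping by a categorical column
    print(dogs.groupby("breed", observed=True).size())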

Now it’s time to learn how to set, add, and remove categories from a Series. You’ll also explore how to update, rename, collapse, and reorder categories, before applying your new skills to clean and access other data within your DataFrame.


  1. Setting category variables
  2. Setting categories
  3. Adding categories
  4. Removing categories
  5. Updating categories
  6. Collapsing categories knowledge check
  7. Renaming categories
  8. Collapsing categories
  9. Reordering categories
  10. Reordering categories in a Series
  11. Using .groupby() after reordering
  12. Cleaning and accessing data
  13. Cleaning variables
  14. Accessing and filtering data

In this module, you’ll use the seaborn Python library to create informative visualizations using categorical data—including categorical plots (catplot), box plots, bar plots, point plots, and count plots. You’ll then learn how to visualize categorical columns and split data across categorical columns to visualize summary statistics of numerical columns.


  1. Introduction to categorical plots using Seaborn
  2. Boxplot understanding
  3. Creating a box plot
  4. Seaborn bar plots
  5. Creating a bar plot
  6. Ordering categories
  7. Bar plot using hue
  8. Point and count plots
  9. Creating a point plot
  10. Creating a count plot
  11. Review catplot() types
  12. Additional catplot() options
  13. One visualization per group
  14. Updating categorical plots

Lastly, you’ll learn how to overcome the common pitfalls of using categorical data. You’ll also grow your data encoding skills as you are introduced to label encoding and one-hot encoding—perfect for helping you prepare your data for use in machine learning algorithms.


  1. Categorical pitfalls
  2. Memory usage knowledge check
  3. Overcoming pitfalls: string issues
  4. Overcoming pitfalls: using NumPy arrays
  5. Label encoding
  6. Create a label encoding and map
  7. Using saved mappings
  8. Creating a Boolean encoding
  9. One-hot encoding
  10. One-hot knowledge check
  11. One-hot encoding specific columns
  12. Wrap-up!
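
Label and one-hot encoding boil down to a few lines (hypothetical coat data):

    import pandas as pd

    dogs = pd.DataFrame({"coat": ["short", "long", "medium", "long"]})
    dogs["coat"] = dogs["coat"].astype("category")

    # Label encoding: each category becomes an integer code
    dogs["coat_code"] = dogs["coat"].cat.codes
    code_map = dict(enumerate(dogs["coat"].cat.categories))
    print(code_map)  # a saved mapping from codes back to labels

    # One-hot encoding: one indicator column per category
    print(pd.get_dummies(dogs[["coat"]], prefix="coat"))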


Data Communication Concepts 
No one enjoys looking at spreadsheets! Bring your data to life. Improve your presentation and learn how to translate technical data into actionable insights.

Learn the Basics of Data Communication
You’ve analyzed your data, run your model, and made your predictions. Now, it's time to bring your data to life! Presenting findings to stakeholders so they can make data-driven decisions is an essential skill for all data scientists. In this course, you’ll learn how to use storytelling to connect with your audience and help them understand the content of your presentation—so they can make the right decisions.

Explore Formats of Data Communication
Through hands-on exercises, you’ll learn the advantages and disadvantages of oral and written formats. You’ll also improve how you translate technical results into compelling stories, using the correct data, visualizations, and in-person presentation techniques. Start learning and improve your data storytelling today.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Let's start with the importance of data storytelling and the elements you need to tell stories with data. You'll learn best practices to influence how decisions are made before learning how to translate technical results into stories for non-technical stakeholders.


  1. Fundamentals of storytelling
  2. The story begins
  3. Building a story
  4. Translating technical results
  5. A non-tech story
  6. Be aware
  7. Impacting the decision-making process
  8. Is it a true story?
  9. Structured to impact
  10. A story to compare

Deepen your storytelling knowledge. Learn how to avoid common mistakes when telling stories with data by tailoring your presentations to your audience. Then learn best practices for including visualizations and choosing between oral or written formats to make sure your presentations pack a punch!


  1. Selecting the right data
  2. The truth about salaries
  3. Earning interests
  4. Showing relevant statistics
  5. Salary variation
  6. On a payroll
  7. It's not significant
  8. Visualizations for different audiences
  9. Salary development
  10. Salary on demand
  11. Choosing the appropriate format
  12. A communication problem
  13. Should we meet?
  14. When in doubt

Now that you understand how to prepare for communicating findings, it’s time to learn how to structure your reports. You'll also learn the importance of reproducibility (work smarter, not harder) and how to get to the point when describing your findings. You’ll then get to apply all you’ve learned to a real-world use case as you create a compelling report on credit risk.


  1. Types of reports
  2. Something to report
  3. In summary
  4. Reproducibility and references
  5. Replicate me
  6. Same results
  7. Write precise and clear reports
  8. Half-empty glass
  9. Strong words
  10. Case study: report on credit risk
  11. Credit me
  12. Report my credit

You'll finish by learning simple techniques to structure a presentation, communicate insights, and inspire your audience to take action. Lastly, you'll learn how to improve your communication style and prepare to handle questions from your audience.


  1. Planning an oral presentation
  2. Is this the plan?
  3. An effective plan!
  4. Building presentation slides
  5. A color building
  6. Too much text
  7. The right building
  8. Delivering the presentation
  9. Put it into practice
  10. Best practice
  11. Avoiding common errors
  12. The true mistake
  13. Do's and don'ts
  14. Congratulations!


Introduction to Importing Data in Python 
As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.

3 Modules | 4+ Hours | 3 Skills

Course Modules 


In this module, you'll learn how to import data into Python from all types of flat files, which are a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas—now you'll use these packages to import flat files and customize your imports.
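
As a flavor of the two import styles, assuming a local file named data.csv with a header row:

    import numpy as np
    import pandas as pd

    # NumPy handles purely numeric flat files
    arr = np.loadtxt("data.csv", delimiter=",", skiprows=1)

    # pandas handles mixed dtypes and infers the header
    df = pd.read_csv("data.csv")
    print(df.head())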


  1. Welcome to the course!
  2. Importing entire text files
  3. Importing text files line by line
  4. The importance of flat files in data science
  5. Pop quiz: what exactly are flat files?
  6. Why we like flat files and the Zen of Python
  7. Importing flat files using NumPy
  8. Using NumPy to import flat files
  9. Customizing your NumPy import
  10. Importing different datatypes
  11. Importing flat files using pandas
  12. Using pandas to import flat files as DataFrames (1)
  13. Using pandas to import flat files as DataFrames (2)
  14. Customizing your pandas import
  15. Final thoughts on data import

You've learned how to import flat files, but there are many other file types you will potentially have to work with as a data scientist. In this module, you'll learn how to import data into Python from a wide array of important file types. These include pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files, a file type for storing large quantities of numerical data, and MATLAB files.
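
Here is a sketch of the import patterns this module practices; all filenames below are placeholders:

    import pickle
    import pandas as pd

    with open("data.pkl", "rb") as file:       # pickled Python objects
        obj = pickle.load(file)

    xls = pd.ExcelFile("spreadsheet.xlsx")     # Excel: list sheets, then parse one
    print(xls.sheet_names)
    df = xls.parse(xls.sheet_names[0])

    sas_df = pd.read_sas("records.sas7bdat")   # SAS
    stata_df = pd.read_stata("records.dta")    # Stata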


  1. Introduction to other file types
  2. Not so flat any more
  3. Loading a pickled file
  4. Listing sheets in Excel files
  5. Importing sheets from Excel files
  6. Customizing your spreadsheet import
  7. Importing SAS/Stata files using pandas
  8. How to import SAS7BDAT
  9. Importing SAS files
  10. Using read_stata to import Stata files
  11. Importing Stata files
  12. Importing HDF5 files
  13. Using File to import HDF5 files
  14. Using h5py to import HDF5 files
  15. Extracting data from your HDF5 file
  16. Importing MATLAB files
  17. Loading .mat files
  18. The structure of .mat in Python

In this module, you'll learn how to extract meaningful data from relational databases, an essential skill for any data scientist. You will learn about relational models, how to create SQL queries, how to filter and order your SQL records, and how to perform advanced queries by joining database tables.
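
A minimal sketch of the workflow, assuming a local SQLite file and an illustrative table name:

    import pandas as pd
    from sqlalchemy import create_engine, inspect

    # The database file and the "Album" table are assumptions for this sketch
    engine = create_engine("sqlite:///example.sqlite")
    print(inspect(engine).get_table_names())

    # pandas can run a query and return the result as a DataFrame
    df = pd.read_sql_query("SELECT * FROM Album ORDER BY Title", engine)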


  1. Introduction to relational databases
  2. Pop quiz: The relational model
  3. Creating a database engine in Python
  4. Creating a database engine
  5. What are the tables in the database?
  6. Querying relational databases in Python
  7. The Hello World of SQL Queries!
  8. Customizing the Hello World of SQL Queries
  9. Filtering your database records using SQL's WHERE
  10. Ordering your SQL records with ORDER BY
  11. Querying relational databases directly with pandas
  12. Pandas and The Hello World of SQL Queries!
  13. Pandas for more complex querying
  14. Advanced querying: exploiting table relationships
  15. The power of SQL lies in relationships between tables: INNER JOIN
  16. Filtering your INNER JOIN
  17. Final Thoughts


Cleaning Data in Python 
Discover How to Clean Data in Python
It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. Data cleaning is an essential step for every data scientist, as analyzing dirty data can lead to inaccurate conclusions.

In this course, you will learn how to identify, diagnose, and treat various data cleaning problems in Python, ranging from simple to advanced. You will deal with improper data types, check that your data is in the correct range, handle missing data, perform record linkage, and more!

Learn How to Clean Different Data Types
The first module of the course explores common data problems and how you can fix them. You will first understand basic data types and how to deal with them individually. After, you'll apply range constraints and remove duplicated data points.

The last module explores record linkage, a powerful tool to merge multiple datasets. You'll learn how to link records by calculating the similarity between strings. Finally, you'll use your new skills to join two restaurant review datasets into one clean master dataset.

Gain Confidence in Cleaning Data
By the end of the course, you will gain the confidence to clean data from various types and use record linkage to merge multiple datasets. Cleaning data is an essential skill for data scientists. If you want to learn more about cleaning data in Python and its applications, check out the following tracks: Data Scientist with Python and Importing & Cleaning Data with Python.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


In this module, you'll learn how to overcome some of the most common dirty data problems. You'll convert data types, apply range constraints to remove future data points, and remove duplicated data points to avoid double-counting.
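
Here's a compact sketch of all three fixes on invented data:

    import pandas as pd

    df = pd.DataFrame({
        "revenue": ["10$", "25$", "25$"],
        "signup": ["2030-01-01", "2020-05-02", "2020-05-02"],
    })

    # Fix the data type: strip the symbol, then convert
    df["revenue"] = df["revenue"].str.strip("$").astype("int")

    # Range constraint: drop rows with dates in the future
    df["signup"] = pd.to_datetime(df["signup"])
    df = df[df["signup"] <= pd.Timestamp.today()]

    # Remove exact duplicates to avoid double-counting
    df = df.drop_duplicates()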


  1. Data type constraints
  2. Common data types
  3. Numeric data or ... ?
  4. Summing strings and concatenating numbers
  5. Data range constraints
  6. Tire size constraints
  7. Back to the future
  8. Uniqueness constraints
  9. How big is your subset?
  10. Finding duplicates
  11. Treating duplicates

Categorical and text data can often be some of the messiest parts of a dataset due to their unstructured nature. In this module, you’ll learn how to fix whitespace and capitalization inconsistencies in category labels, collapse multiple categories into one, and reformat strings for consistency.


  1. Membership constraints
  2. Members only
  3. Finding consistency
  4. Categorical variables
  5. Categories of errors
  6. Inconsistent categories
  7. Remapping categories
  8. Cleaning text data
  9. Removing titles and taking names
  10. Keeping it descriptive

In this module, you’ll dive into more advanced data cleaning problems, such as ensuring that weights are all written in kilograms instead of pounds. You’ll also gain invaluable skills that will help you verify that values have been added correctly and that missing values don’t negatively impact your analyses.


  1. Uniformity
  2. Ambiguous dates
  3. Uniform currencies
  4. Uniform dates
  5. Cross field validation
  6. Cross field or no cross field?
  7. How's our data integrity?
  8. Completeness
  9. Is this missing at random?
  10. Missing investors
  11. Follow the money

Record linkage is a powerful technique used to merge multiple datasets together, used when values have typos or different spellings. In this module, you'll learn how to link records by calculating the similarity between strings—you’ll then use your new skills to join two restaurant review datasets into one clean master dataset.
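
As a taste of the idea, here is a string-similarity score built from Python's standard difflib; the course itself uses dedicated record-linkage tooling, so treat this as an illustration of the concept only:

    from difflib import SequenceMatcher

    # A similarity score between 0 and 1; linkage keeps pairs above a cutoff
    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    print(similarity("Burger King", "burger king restaurant"))  # high
    print(similarity("Burger King", "Taco Bell"))               # low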


  1. Comparing strings
  2. Minimum edit distance
  3. The cutoff point
  4. Remapping categories II
  5. Generating pairs
  6. To link or not to link?
  7. Pairs of restaurants
  8. Similar restaurants
  9. Linking DataFrames
  10. Getting the right index
  11. Linking them together!
  12. Course Wrap up!


Working with Dates and Times in Python

You'll probably never have a time machine, but how about a machine for analyzing time? As soon as time enters any analysis, things can get weird. It's easy to get tripped up on day and month boundaries, time zones, daylight saving time, and all sorts of other things that can confuse the unprepared. If you're going to do any kind of analysis involving time, you'll want to use Python to sort it out. Working with datasets on hurricanes and bike trips, we'll cover counting events, figuring out how much time has elapsed between events, and plotting data over time. You'll work in both standard Python and Pandas, and we'll touch on the dateutil library, the only timezone library endorsed by the official Python documentation. After this course, you'll confidently handle date and time data in any format like a champion.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Hurricanes (also known as cyclones or typhoons) hit the U.S. state of Florida several times per year. To start off this course, you'll learn how to work with date objects in Python, starting with the dates of every hurricane to hit Florida since 1950. You'll learn how Python handles dates, common date operations, and the right way to format dates to avoid confusion.
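
A quick sketch of these basics, using an illustrative landfall date:

    from datetime import date

    landfall = date(2016, 10, 7)
    print(landfall.weekday())                   # 0 = Monday ... 6 = Sunday
    print((date(2017, 6, 21) - landfall).days)  # subtracting dates gives a timedelta
    print(landfall.isoformat())                 # unambiguous YYYY-MM-DD formatting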


  1. Dates in Python
  2. Which day of the week?
  3. How many hurricanes come early?
  4. Math with dates
  5. Subtracting dates
  6. Counting events per calendar month
  7. Putting a list of dates in order
  8. Turning dates into strings
  9. Printing dates in a friendly format
  10. Representing dates in different ways

Bike sharing programs have swept through cities around the world -- and luckily for us, every trip gets recorded! Working with all of the comings and goings of one bike in Washington, D.C., you'll practice working with dates and times together. You'll parse dates and times from text, analyze peak trip times, calculate ride durations, and more.
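
The core parsing and formatting moves look like this (the timestamps are invented):

    from datetime import datetime

    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime("2017-10-01 15:23:25", fmt)  # parse from text
    end = datetime.strptime("2017-10-01 15:49:55", fmt)

    print((end - start).total_seconds())    # ride duration in seconds
    print(start.strftime("%a %d %b %Y"))    # format back to a friendly string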


  1. Dates and times
  2. Creating datetimes by hand
  3. Counting events before and after noon
  4. Printing and parsing datetimes
  5. Turning strings into datetimes
  6. Parsing pairs of strings as datetimes
  7. Recreating ISO format with strftime()
  8. Unix timestamps
  9. Working with durations
  10. Turning pairs of datetimes into durations
  11. Average trip time
  12. The long and the short of why time is hard

In this module, you'll learn to confidently tackle the time-related topic that causes people the most trouble: time zones and daylight saving. Continuing with our bike data, you'll learn how to compare clocks around the world, how to gracefully handle "spring forward" and "fall back," and how to get up-to-date timezone data from the dateutil library.
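
A small sketch of UTC offsets and named time zones via dateutil:

    from datetime import datetime, timedelta, timezone
    from dateutil import tz

    # A timezone-aware datetime with a fixed UTC offset
    ride = datetime(2017, 10, 1, 15, 23, 25,
                    tzinfo=timezone(timedelta(hours=-4)))
    print(ride.astimezone(timezone.utc))

    # dateutil looks up full rules (including daylight saving) by zone name
    et = tz.gettz("America/New_York")
    print(ride.astimezone(et))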


  1. UTC offsets
  2. Creating timezone aware datetimes
  3. Setting timezones
  4. What time did the bike leave in UTC?
  5. Time zone database
  6. Putting the bike trips into the right time zone
  7. What time did the bike leave? (Global edition)
  8. Starting daylight saving time
  9. How many hours elapsed around daylight saving?
  10. March 29, throughout a decade
  11. Ending daylight saving time
  12. Finding ambiguous datetimes
  13. Cleaning daylight saving data with fold

To conclude this course, you'll apply everything you've learned about working with dates and times in standard Python to working with dates and times in Pandas. With additional information about each bike ride, such as what station it started and stopped at and whether or not the rider had a yearly membership, you'll be able to dig much more deeply into the bike trip data. In this module, you'll cover powerful Pandas operations, such as grouping and plotting results by time.
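
For example, assuming a hypothetical CSV with "Start date" and "End date" columns:

    import pandas as pd

    # Filename and column names are placeholders for a bike-trip dataset
    rides = pd.read_csv("bike_trips.csv",
                        parse_dates=["Start date", "End date"])
    rides["Duration"] = rides["End date"] - rides["Start date"]

    # Average duration per month using resample()
    print(rides.resample("M", on="Start date")["Duration"].mean())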


  1. Reading date and time data in Pandas
  2. Loading a csv file in Pandas
  3. Making timedelta columns
  4. Summarizing datetime data in Pandas
  5. How many joyrides?
  6. It's getting cold outside, W20529
  7. Members vs casual riders over time
  8. Combining groupby() and resample()
  9. Additional datetime methods in Pandas
  10. Timezones in Pandas
  11. How long per weekday?
  12. How long between rides?
  13. Wrap-up


Writing Functions in Python

You've done your analysis, built your report, and trained a model. What's next? Well, if you want to deploy your model into production, your code will need to be more reliable than exploratory scripts in a Jupyter notebook. Writing Functions in Python will give you a strong foundation in writing complex and beautiful functions so that you can contribute research and engineering skills to your team. You'll learn useful tricks, like how to write context managers and decorators. You'll also learn best practices around how to write maintainable reusable functions with good documentation. They say that people who can do good research and write high-quality code are unicorns. Take this course and discover the magic.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


The goal of this course is to transform you into a Python expert, and so the first module starts off with best practices when writing functions. You'll cover docstrings, why they matter, and how to know when you need to turn a chunk of code into a function. You will also learn the details of how Python passes arguments to functions, as well as some common gotchas that can cause debugging headaches when calling functions.
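
A docstring in the Google style covered here might look like this sketch:

    def count_words(text):
        """Count the words in a string.

        Args:
            text (str): The string to split into words.

        Returns:
            int: The number of whitespace-separated words.
        """
        return len(text.split())

    # Docstrings travel with the function and are retrievable at runtime
    print(count_words.__doc__)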


  1. Docstrings
  2. Crafting a docstring
  3. Retrieving docstrings
  4. Docstrings to the rescue!
  5. DRY and "Do One Thing"
  6. Extract a function
  7. Split up a function
  8. Pass by assignment
  9. Mutable or immutable?
  10. Best practice for default arguments

If you've ever seen the "with" keyword in Python and wondered what its deal was, then this is the module for you! Context managers are a convenient way to provide connections in Python and guarantee that those connections get cleaned up when you are done using them. This module will show you how to use context managers, as well as how to write your own.
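
Here is a minimal example of writing your own context manager with contextlib (a timer, echoing the module's exercises):

    import time
    from contextlib import contextmanager

    @contextmanager
    def timer():
        start = time.time()
        yield                                   # the with-block body runs here
        print(f"Elapsed: {time.time() - start:.2f}s")

    with timer():
        sum(range(10_000_000))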


  1. Using context managers
  2. The number of cats
  3. The speed of cats
  4. Writing context managers
  5. The timer() context manager
  6. A read-only open() context manager
  7. Advanced topics
  8. Context manager use cases
  9. Scraping the NASDAQ
  10. Changing the working directory

Decorators are an extremely powerful concept in Python. They allow you to modify the behavior of a function without changing the code of the function itself. This module will lay the foundational concepts needed to thoroughly understand decorators (functions as objects, scope, and closures), and give you a good introduction into how decorators are used and defined. This deep dive into Python internals will set you up to be a superstar Pythonista.
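
The essential pattern, in a toy sketch:

    def double_args(func):
        # The wrapper calls the original function with modified arguments
        def wrapper(a, b):
            return func(a * 2, b * 2)
        return wrapper

    @double_args
    def multiply(a, b):
        return a * b

    print(multiply(1, 5))  # 20: both arguments were doubled first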


  1. Functions are objects
  2. Building a command line data app
  3. Reviewing your co-worker's code
  4. Returning functions for a math game
  5. Scope
  6. Understanding scope
  7. Modifying variables outside local scope
  8. Closures
  9. Checking for closure
  10. Closures keep your values safe
  11. Decorators
  12. Using decorator syntax
  13. Defining a decorator

Now that you understand how decorators work under the hood, this module gives you a bunch of real-world examples of when and how you would write decorators in your own code. You will also learn advanced decorator concepts like how to preserve the metadata of your decorated functions and how to write decorators that take arguments.
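
For instance, a run_n_times()-style decorator factory (named after the exercise above; the implementation here is a sketch) that also preserves metadata:

    from functools import wraps

    def run_n_times(n):
        def decorator(func):
            @wraps(func)                 # keep the decorated function's metadata
            def wrapper(*args, **kwargs):
                for _ in range(n):
                    result = func(*args, **kwargs)
                return result
            return wrapper
        return decorator

    @run_n_times(3)
    def greet():
        """Print a greeting."""
        print("hello")

    greet()                      # prints "hello" three times
    print(greet.__doc__)         # metadata survived thanks to @wraps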


  1. Real-world examples
  2. Print the return type
  3. Counter
  4. Decorators and metadata
  5. Preserving docstrings when decorating functions
  6. Measuring decorator overhead
  7. Decorators that take arguments
  8. Run_n_times()
  9. HTML Generator
  10. Timeout(): a real world example
  11. Tag your functions
  12. Check the return type
  13. Great job!


Introduction to Regression with statsmodels in Python

Use Python statsmodels For Linear and Logistic Regression
Linear regression and logistic regression are two of the most widely used statistical models. They act like master keys, unlocking the secrets hidden in your data. In this course, you’ll gain the skills to fit simple linear and logistic regressions.

Through hands-on exercises, you’ll explore the relationships between variables in real-world datasets, including motor insurance claims, Taiwan house prices, fish sizes, and more.

Discover How to Make Predictions and Assess Model Fit
You’ll start this 4-hour course by learning what regression is, how linear and logistic regression differ, and how to apply both. Next, you’ll learn how to use linear regression models to make predictions on data while also understanding model objects.

As you progress, you’ll learn how to assess the fit of your model, and how to know how well your linear regression model fits. Finally, you’ll dig deeper into logistic regression models to make predictions on real data.

Learn the Basics of Python Regression Analysis
By the end of this course, you’ll know how to make predictions from your data, quantify model performance, and diagnose problems with model fit. You’ll understand how to use Python statsmodels for regression analysis and be able to apply the skills to real-life data sets.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


You’ll learn the basics of this popular statistical model, what regression is, and how linear and logistic regressions differ. You’ll then learn how to fit simple linear regression models with numeric and categorical explanatory variables, and how to describe the relationship between the response and explanatory variables using model coefficients.
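
A minimal sketch of the formula interface, with invented fish measurements:

    import pandas as pd
    from statsmodels.formula.api import ols

    fish = pd.DataFrame({"mass_g": [110, 390, 725, 1000],
                         "length_cm": [20, 29, 35, 40]})

    # "response ~ explanatory" is the key idea of the formula string
    model = ols("mass_g ~ length_cm", data=fish).fit()
    print(model.params)   # intercept and slope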


  1. A tale of two variables
  2. Which one is the response variable?
  3. Visualizing two numeric variables
  4. Fitting a linear regression
  5. Estimate the intercept
  6. Estimate the slope
  7. Linear regression with ols()
  8. Categorical explanatory variables
  9. Visualizing numeric vs. categorical
  10. Calculating means by category
  11. Linear regression with a categorical explanatory variable

In this module, you’ll discover how to use linear regression models to make predictions on Taiwanese house prices and Facebook advert clicks. You’ll also grow your regression skills as you get hands-on with model objects, understand the concept of "regression to the mean", and learn how to transform variables in a dataset.


  1. Making predictions
  2. Predicting house prices
  3. Visualizing predictions
  4. The limits of prediction
  5. Working with model objects
  6. Extracting model elements
  7. Manually predicting house prices
  8. Regression to the mean
  9. Home run!
  10. Plotting consecutive portfolio returns
  11. Modeling consecutive returns
  12. Transforming variables
  13. Transforming the explanatory variable
  14. Transforming the response variable too
  15. Back transformation

In this module, you’ll learn how to ask questions of your model to assess fit. You’ll learn how to quantify how well a linear regression model fits, diagnose model problems using visualizations, and understand each observation's leverage and influence to create the model.


  1. Quantifying model fit
  2. Coefficient of determination
  3. Residual standard error
  4. Visualizing model fit
  5. Residuals vs. fitted values
  6. Q-Q plot of residuals
  7. Scale-location
  8. Drawing diagnostic plots
  9. Outliers, leverage, and influence
  10. Leverage
  11. Influence
  12. Extracting leverage and influence

Learn to fit logistic regression models. Using real-world data, you’ll predict the likelihood of a customer closing their bank account as probabilities of success and odds ratios, and quantify model performance using confusion matrices.
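
logit() mirrors the ols() formula interface; a sketch with invented churn data:

    import pandas as pd
    from statsmodels.formula.api import logit

    churn = pd.DataFrame({
        "has_churned": [0, 1, 0, 1, 0, 1, 0, 1],
        "time_since_last_purchase": [1, 3, 5, 8, 4, 6, 2, 9],
    })

    model = logit("has_churned ~ time_since_last_purchase", data=churn).fit()
    print(model.params)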


  1. Why you need logistic regression
  2. Exploring the explanatory variables
  3. Visualizing linear and logistic models
  4. Logistic regression with logit()
  5. Predictions and odds ratios
  6. Probabilities
  7. Most likely outcome
  8. Odds ratio
  9. Log odds ratio
  10. Quantifying logistic regression fit
  11. Calculating the confusion matrix
  12. Drawing a mosaic plot of the confusion matrix
  13. Accuracy, sensitivity, specificity
  14. Measuring logistic model performance
  15. Recap


Sampling in Python
Sampling in Python is the cornerstone of inference statistics and hypothesis testing. It's a powerful skill used in survey analysis and experimental design to draw conclusions without surveying an entire population. In this Sampling in Python course, you’ll discover when to use sampling and how to perform common types of sampling—from simple random sampling to more complex methods like stratified and cluster sampling. Using real-world datasets, including coffee ratings, Spotify songs, and employee attrition, you’ll learn to estimate population statistics and quantify uncertainty in your estimates by generating sampling distributions and bootstrap distributions.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Learn what sampling is and why it is so powerful. You’ll also learn about the problems caused by convenience sampling and the differences between true randomness and pseudo-randomness.
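
A sketch of simple random sampling with pandas, on an invented population:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    population = pd.DataFrame({"rating": rng.normal(75, 10, 1000)})

    # A simple random sample of 50 rows, seeded for reproducibility
    sample = population.sample(n=50, random_state=2022)
    print(sample["rating"].mean(), population["rating"].mean())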


  1. Sampling and point estimates
  2. Reasons for sampling
  3. Simple sampling with pandas
  4. Simple sampling and calculating with NumPy
  5. Convenience sampling
  6. Are findings from the sample generalizable?
  7. Are these findings generalizable?
  8. Pseudo-random number generation
  9. Generating random numbers
  10. Understanding random seeds

It’s time to get hands-on and perform the four random sampling methods in Python: simple, systematic, stratified, and cluster.


  1. Simple random and systematic sampling
  2. Simple random sampling
  3. Systematic sampling
  4. Is systematic sampling OK?
  5. Stratified and weighted random sampling
  6. Which sampling method?
  7. Proportional stratified sampling
  8. Equal counts stratified sampling
  9. Weighted sampling
  10. Cluster sampling
  11. Benefits of clustering
  12. Performing cluster sampling
  13. Comparing sampling methods
  14. 3 kinds of sampling
  15. Comparing point estimates

Let’s test your sampling. In this module, you’ll discover how to quantify the accuracy of sample statistics using relative errors, and measure variation in your estimates by generating sampling distributions.


  1. Relative error of point estimates
  2. Calculating relative errors
  3. Relative error vs. sample size
  4. Creating a sampling distribution
  5. Replicating samples
  6. Replication parameters
  7. Approximate sampling distributions
  8. Exact sampling distribution
  9. Generating an approximate sampling distribution
  10. Exact vs. approximate
  11. Standard errors and the Central Limit Theorem
  12. Population & sampling distribution means
  13. Population & sampling distribution variation

You’ll get to grips with resampling to perform bootstrapping and estimate variation in an unknown population. You’ll learn the difference between sampling distributions and bootstrap distributions using resampling.
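
The core bootstrap loop, sketched on invented data:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    sample = pd.Series(rng.normal(3.5, 0.5, 300))   # the one sample you have

    # Resample WITH replacement, same size as the original, many times
    boot_means = [sample.sample(frac=1, replace=True).mean()
                  for _ in range(1000)]

    # A 95% confidence interval from the bootstrap distribution
    print(np.quantile(boot_means, [0.025, 0.975]))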


  1. Introduction to bootstrapping
  2. Principles of bootstrapping
  3. With or without replacement?
  4. Generating a bootstrap distribution
  5. Comparing sampling and bootstrap distributions
  6. Bootstrap statistics and population statistics
  7. Sampling distribution vs. bootstrap distribution
  8. Compare sampling and bootstrap means
  9. Compare sampling and bootstrap standard deviations
  10. Confidence intervals
  11. Confidence interval interpretation
  12. Calculating confidence intervals
  13. Recap!


Hypothesis Testing in Python
Hypothesis testing lets you answer questions about your datasets in a statistically rigorous way. In this course, you'll grow your Python analytical skills as you learn how and when to use common tests like t-tests, proportion tests, and chi-square tests. Working with real-world data, including Stack Overflow user feedback and supply-chain data for medical supply shipments, you'll gain a deep understanding of how these tests work and the key assumptions that underpin them. You'll also discover how non-parametric tests can be used to go beyond the limitations of traditional hypothesis tests.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


How does hypothesis testing work and what problems can it solve? To find out, you’ll walk through the workflow for a one sample proportion test. In doing so, you'll encounter important concepts like z-scores, p-values, and false negative and false positive errors.
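
The arithmetic behind a one-sample proportion test, with made-up numbers:

    import numpy as np
    from scipy.stats import norm

    # H0: p = 0.50; we observed 224 successes out of 400
    p_hat, p_0, n = 224 / 400, 0.50, 400
    std_error = np.sqrt(p_0 * (1 - p_0) / n)
    z_score = (p_hat - p_0) / std_error

    # Two-tailed p-value from the standard normal distribution
    p_value = 2 * (1 - norm.cdf(abs(z_score)))
    print(z_score, p_value)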


  1. Hypothesis tests and z-scores
  2. Uses of A/B testing
  3. Calculating the sample mean
  4. Calculating a z-score
  5. p-values
  6. Criminal trials and hypothesis tests
  7. Left tail, right tail, two tails
  8. Calculating p-values
  9. Statistical significance
  10. Decisions from p-values
  11. Calculating a confidence interval
  12. Type I and type II errors

In this module, you’ll learn how to test for differences in means between two groups using t-tests and extend this to more than two groups using ANOVA and pairwise t-tests.
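
A sketch using scipy.stats (the course may use other packages; the groups below are invented):

    from scipy import stats

    group_a = [12.1, 14.3, 13.8, 12.9, 15.0]
    group_b = [11.2, 12.5, 11.9, 13.1, 12.0]
    print(stats.ttest_ind(group_a, group_b))   # two-sample t-test

    group_c = [10.8, 11.1, 12.2, 10.5, 11.7]
    print(stats.f_oneway(group_a, group_b, group_c))  # one-way ANOVA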


  1. Performing t-tests
  2. Hypothesis testing workflow
  3. Two sample mean test statistic
  4. Calculating p-values from t-statistics
  5. Why is t needed?
  6. The t-distribution
  7. From t to p
  8. Paired t-tests
  9. Is pairing needed?
  10. Visualizing the difference
  11. Using ttest()
  12. ANOVA tests
  13. Visualizing many categories
  14. Conducting an ANOVA test
  15. Pairwise t-tests

Now it’s time to test for differences in proportions between two groups using proportion tests. Through hands-on exercises, you’ll extend your proportion tests to more than two groups with chi-square independence tests, and return to the one sample case with chi-square goodness of fit tests.
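
A sketch of both tests with invented counts:

    import numpy as np
    from scipy.stats import chi2_contingency
    from statsmodels.stats.proportion import proportions_ztest

    # Two-sample proportion test: successes and sample sizes per group
    stat, p = proportions_ztest(count=np.array([45, 30]),
                                nobs=np.array([100, 100]))
    print(stat, p)

    # Chi-square test of independence on a 2x2 contingency table
    table = np.array([[45, 55], [30, 70]])
    chi2, p, dof, expected = chi2_contingency(table)
    print(chi2, p)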


  1. One-sample proportion tests
  2. t for proportions?
  3. Test for single proportions
  4. Two-sample proportion tests
  5. Test of two proportions
  6. proportions_ztest() for two samples
  7. Chi-square test of independence
  8. The chi-square distribution
  9. How many tails for chi-square tests?
  10. Performing a chi-square test
  11. Chi-square goodness of fit tests
  12. Visualizing goodness of fit
  13. Performing a goodness of fit test

Finally, it’s time to learn about the assumptions made by parametric hypothesis tests, and see how non-parametric tests can be used when those assumptions aren't met.
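
Two of the non-parametric tests covered here, sketched on invented measurements:

    from scipy import stats

    before = [4.2, 3.9, 5.1, 4.8, 4.4, 4.0]
    after = [4.0, 3.5, 4.9, 4.6, 4.1, 3.8]

    print(stats.wilcoxon(before, after))        # paired, non-parametric
    print(stats.mannwhitneyu(before, after))    # two independent samples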


  1. Assumptions in hypothesis testing
  2. Common assumptions of hypothesis tests
  3. Testing sample size
  4. Non-parametric tests
  5. Which parametric test?
  6. Wilcoxon signed-rank test
  7. Non-parametric ANOVA and unpaired t-tests
  8. Wilcoxon-Mann-Whitney
  9. Kruskal-Wallis
  10. Recap!


Experimental Design in Python
Implement Experimental Design Setups
Learn how to implement the most appropriate experimental design setup for your use case. Learn about how randomized block designs and factorial designs can be implemented to measure treatment effects and draw valid and precise conclusions.

Conduct Statistical Analyses on Experimental Data
Deep-dive into performing statistical analyses on experimental data, including selecting and conducting statistical tests such as t-tests, ANOVA tests, and chi-square tests of association. Conduct post-hoc analysis following ANOVA tests to discover precisely which pairwise comparisons are significantly different.

Conduct Power Analysis
Learn to measure the effect size to determine the amount by which groups differ, beyond being significantly different. Conduct a power analysis using an assumed effect size to determine the minimum sample size required to obtain a required statistical power. Use Cohen's d formulation to measure the effect size for some sample data, and test whether the effect size assumptions used in the power analysis were accurate.

Address Complexities in Experimental Data
Extract insights from complex experimental data and learn best practices for communicating findings to different stakeholders. Address complexities such as interactions, heteroscedasticity, and confounding in experimental data to improve the validity of your conclusions. When data doesn't meet the assumptions of parametric tests, you'll learn to choose and implement an appropriate nonparametric test.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Building knowledge in experimental design allows you to test hypotheses with best-practice analytical tools and quantify the risk of your work. You’ll begin your journey by setting the foundations of what experimental design is and different experimental design setups such as blocking and stratification. You’ll then learn and apply visual and analytical tests for normality in experimental data.


  1. Setting up experiments
  2. Non-random assignment of subjects
  3. Random assignment of subjects
  4. Experimental data setup
  5. Blocking experimental data
  6. Stratifying an experiment
  7. Which was stratified?
  8. Normal data
  9. Visual normality in an agricultural experiment
  10. Analytical normality in an agricultural experiment

You'll delve into sophisticated experimental design techniques, focusing on factorial designs, randomized block designs, and covariate adjustments. These methodologies are instrumental in enhancing the accuracy, efficiency, and interpretability of experimental results. Through a combination of theoretical insights and practical applications, you'll acquire the skills needed to design, implement, and analyze complex experiments in various fields of research.


  1. Factorial designs: principles and applications
  2. Understanding marketing campaign effectiveness
  3. Heatmap of campaign interactions
  4. Factorial designs and randomized block designs
  5. Randomized block design: controlling variance
  6. Implementing a randomized block design
  7. Visualizing productivity within blocks by incentive
  8. ANOVA within blocks of employees
  9. Covariate adjustment in experimental design
  10. Importance of covariates
  11. Covariate adjustment with chick growth

Master statistical tests like t-tests, ANOVA, and Chi-Square, and dive deep into post-hoc analyses and power analysis essentials. Learn to select the right test, interpret p-values and errors, and skillfully conduct power analysis to determine sample and effect sizes, all while leveraging Python's powerful libraries to bring your data insights to life.
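
For example, a post-hoc Tukey HSD following an ANOVA, on invented group scores:

    import pandas as pd
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    df = pd.DataFrame({
        "score": [5, 6, 7, 9, 10, 11, 4, 5, 5],
        "group": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    })

    # Compares every pair of groups while controlling the family-wise error rate
    tukey = pairwise_tukeyhsd(endog=df["score"], groups=df["group"], alpha=0.05)
    print(tukey.summary())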


  1. Choosing the right statistical test
  2. Choosing the right test: petrochemicals
  3. Choosing the right test: human resources
  4. Choosing the right test: finance
  5. Post-hoc analysis following ANOVA
  6. Anxiety treatments ANOVA
  7. Applying Tukey's HSD
  8. Applying Bonferroni correction
  9. P-values, alpha, and errors
  10. Analyzing toy durability
  11. Visualizing durability differences
  12. Role of significance levels
  13. Power analysis: sample and effect size
  14. Effect size purpose
  15. Estimating required sample size for energy study

Hop into the complexities of experimental data analysis. Learn to synthesize insights using pandas, address data issues like heteroscedasticity with scipy.stats, and apply nonparametric tests like Mann-Whitney U. Learn additional techniques for transforming, visualizing, and interpreting complex data, enhancing your ability to conduct robust analyses in various experimental settings.


  1. Synthesizing insights from complex experiments
  2. Visualizing loan approval yield
  3. Exploring customer satisfaction
  4. Effectively communicating experimental data
  5. Addressing complexities in experimental data
  6. Check for heteroscedasticity in shelf life
  7. Exploring and transforming shelf life data
  8. Applying nonparametric tests in experimental analysis
  9. Visualizing and testing preservation methods
  10. Further analyzing food preservation techniques
  11. Recap!


Supervised Learning with scikit-learn
Grow your machine learning skills with scikit-learn and discover how to use this popular Python library to train models using labeled data. In this course, you'll learn how to make powerful predictions, such as whether a customer will churn from your business, whether an individual has diabetes, and even the genre of a song. Using real-world datasets, you'll find out how to build predictive models, tune their parameters, and determine how well they will perform with unseen data.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


In this module, you'll be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll learn how to split data into training and test sets, fit a model, make predictions, and evaluate accuracy. You’ll discover the relationship between model complexity and performance, applying what you learn to a churn dataset, where you will classify the churn status of a telecom company's customers.
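
The fit/predict/score workflow in miniature, using a dataset bundled with scikit-learn as a stand-in for the course's churn data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)

    knn = KNeighborsClassifier(n_neighbors=6)
    knn.fit(X_train, y_train)
    print(knn.score(X_test, y_test))   # accuracy on unseen data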


  1. Machine learning with scikit-learn
  2. Binary classification
  3. The supervised learning workflow
  4. The classification challenge
  5. k-Nearest Neighbors: Fit
  6. k-Nearest Neighbors: Predict
  7. Measuring model performance
  8. Train/test split + computing accuracy
  9. Overfitting and underfitting
  10. Visualizing model complexity

In this module, you will be introduced to regression, and build models to predict sales values using a dataset on advertising expenditure. You will learn about the mechanics of linear regression and common performance metrics such as R-squared and root mean squared error. You will perform k-fold cross-validation, and apply regularization to regression models to reduce the risk of overfitting.
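
A sketch of regularized regression under cross-validation, on a bundled dataset:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)

    # Ridge adds regularization; alpha controls its strength
    ridge = Ridge(alpha=0.1)
    print(cross_val_score(ridge, X, y, cv=5))   # R-squared per fold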


  1. Introduction to regression
  2. Creating features
  3. Building a linear regression model
  4. Visualizing a linear regression model
  5. The basics of linear regression
  6. Fit and predict for regression
  7. Regression performance
  8. Cross-validation
  9. Cross-validation for R-squared
  10. Analyzing cross-validation metrics
  11. Regularized regression
  12. Regularized regression: Ridge
  13. Lasso regression for feature importance

Having trained models, now you will learn how to evaluate them. In this module, you will be introduced to several metrics along with a visualization technique for analyzing classification model performance using scikit-learn. You will also learn how to optimize classification and regression models through the use of hyperparameter tuning.
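
A minimal GridSearchCV sketch (the hyperparameter grid is illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = load_breast_cancer(return_X_y=True)

    # Try each candidate value of C with cross-validation
    param_grid = {"C": [0.01, 0.1, 1, 10]}
    grid = GridSearchCV(LogisticRegression(max_iter=5000), param_grid, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)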


  1. How good is your model?
  2. Deciding on a primary metric
  3. Assessing a diabetes prediction classifier
  4. Logistic regression and the ROC curve
  5. Building a logistic regression model
  6. The ROC curve
  7. ROC AUC
  8. Hyperparameter tuning
  9. Hyperparameter tuning with GridSearchCV
  10. Hyperparameter tuning with RandomizedSearchCV

Learn how to impute missing values, convert categorical data to numeric values, scale data, evaluate multiple supervised learning models simultaneously, and build pipelines to streamline your workflow!
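
A sketch of chaining those steps into a Pipeline, on tiny invented data:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Imputation, scaling, and a model become a single estimator
    pipe = Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression()),
    ])

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [2.5, 1.5], [3.0, np.nan]])
    y = np.array([0, 1, 0, 1])
    pipe.fit(X, y)                  # every step runs in order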


  1. Preprocessing data
  2. Creating dummy variables
  3. Regression with categorical features
  4. Handling missing data
  5. Dropping missing data
  6. Pipeline for song genre prediction: I
  7. Pipeline for song genre prediction: II
  8. Centering and scaling
  9. Centering and scaling for regression
  10. Centering and scaling for classification
  11. Evaluating multiple models
  12. Visualizing regression model performance
  13. Predicting on the test set
  14. Visualizing classification model performance
  15. Pipeline for predicting song popularity
  16. Wrap-up


Unsupervised Learning in Python

Say you have a collection of customers with a variety of characteristics such as age, location, and financial history, and you wish to discover patterns and sort them into clusters. Or perhaps you have a set of texts, such as Wikipedia pages, and you wish to segment them into categories based on their content. This is the world of unsupervised learning, so called because you are not guiding, or supervising, the pattern discovery by some prediction task, but instead uncovering hidden structure from unlabeled data. Unsupervised learning encompasses a variety of techniques in machine learning, from clustering to dimension reduction to matrix factorization. In this course, you'll learn the fundamentals of unsupervised learning and implement the essential algorithms using scikit-learn and SciPy. You will learn how to cluster, transform, visualize, and extract insights from unlabeled datasets, and end the course by building a recommender system to recommend popular musical artists.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Learn how to discover the underlying groups (or "clusters") in a dataset. By the end of this module, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.
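
k-means in a few lines, using scikit-learn's bundled iris measurements as a stand-in:

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    X, _ = load_iris(return_X_y=True)

    model = KMeans(n_clusters=3, n_init=10, random_state=0)
    labels = model.fit_predict(X)       # a cluster label per sample
    print(model.inertia_)               # lower inertia = tighter clusters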


  1. Unsupervised Learning
  2. How many clusters?
  3. Clustering 2D points
  4. Inspect your clustering
  5. Evaluating a clustering
  6. How many clusters of grain?
  7. Evaluating the grain clustering
  8. Transforming features for better clusterings
  9. Scaling fish data for clustering
  10. Clustering the fish data
  11. Clustering stocks using KMeans
  12. Which stocks move together?

In this module, you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2d space so that the proximity of the samples to one another can be visualized.
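
A sketch of building and drawing a hierarchy with SciPy, on random placeholder samples:

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage

    rng = np.random.default_rng(0)
    samples = rng.normal(size=(20, 4))      # placeholder measurements

    # Merge samples into ever-coarser clusters, then draw the tree
    mergings = linkage(samples, method="complete")
    dendrogram(mergings)
    plt.show()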


  1. Visualizing hierarchies
  2. How many merges?
  3. Hierarchical clustering of the grain data
  4. Hierarchies of stocks
  5. Cluster labels in hierarchical clustering
  6. Which clusters are closest?
  7. Different linkage, different hierarchical clustering!
  8. Intermediate clusterings
  9. Extracting the cluster labels
  10. t-SNE for 2-dimensional maps
  11. t-SNE visualization of grain dataset
  12. A t-SNE map of the stock market

Dimension reduction summarizes a dataset using its commonly occurring patterns. In this module, you'll learn about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization. It can also be useful for unsupervised learning. For example, you'll employ a variant of PCA that will allow you to cluster Wikipedia articles by their content!
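
PCA in miniature, on a bundled dataset:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)

    pca = PCA(n_components=2)           # keep the 2 highest-variance directions
    transformed = pca.fit_transform(X)
    print(pca.explained_variance_ratio_)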


  1. Visualizing the PCA transformation
  2. Correlated data in nature
  3. Decorrelating the grain measurements with PCA
  4. Principal components
  5. Intrinsic dimension
  6. The first principal component
  7. Variance of the PCA features
  8. Intrinsic dimension of the fish data
  9. Dimension reduction with PCA
  10. Dimension reduction of the fish measurements
  11. A tf-idf word-frequency array
  12. Clustering Wikipedia part I
  13. Clustering Wikipedia part II

In this module, you'll learn about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns. You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!
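
A small NMF sketch on a toy document collection (the documents are invented):

    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["cats purr and meow", "dogs bark loudly", "cats and dogs play"]
    tfidf = TfidfVectorizer().fit_transform(docs)   # non-negative word features

    nmf = NMF(n_components=2, init="nndsvda", max_iter=500)
    features = nmf.fit_transform(tfidf)   # one row of topic weights per document
    print(features.round(2))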


  1. Non-negative matrix factorization (NMF)
  2. Non-negative data
  3. NMF applied to Wikipedia articles
  4. NMF features of the Wikipedia articles
  5. NMF reconstructs samples
  6. NMF learns interpretable parts
  7. NMF learns topics of documents
  8. Explore the LED digits dataset
  9. NMF learns the parts of images
  10. PCA doesn't learn parts
  11. Building recommender systems using NMF
  12. Which articles are similar to 'Cristiano Ronaldo'?
  13. Recommend musical artists part I
  14. Recommend musical artists part II
  15. Final thoughts


Machine Learning with Tree-Based Models in Python

Decision trees are supervised learning models used for problems involving classification and regression. Tree models offer high flexibility, which comes at a price: on one hand, trees are able to capture complex non-linear relationships; on the other hand, they are prone to memorizing the noise present in a dataset. By aggregating the predictions of trees that are trained differently, ensemble methods take advantage of the flexibility of trees while reducing their tendency to memorize noise. Ensemble methods are used across a variety of fields and have a proven track record of winning many machine learning competitions. In this course, you'll learn how to use Python to train decision trees and tree-based models with the user-friendly scikit-learn machine learning library. You'll understand the advantages and shortcomings of trees and demonstrate how ensembling can alleviate these shortcomings, all while practicing on real-world datasets. Finally, you'll also understand how to tune the most influential hyperparameters in order to get the most out of your models.

5 Modules | 6+ Hours | 5 Skills

Course Modules 


Classification and Regression Trees (CART) are a set of supervised learning models used for problems involving classification and regression. In this module, you'll be introduced to the CART algorithm.
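
Training and evaluating a first classification tree, on a bundled dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # max_depth limits how far the tree can grow, curbing overfitting
    dt = DecisionTreeClassifier(max_depth=4, random_state=1)
    dt.fit(X_train, y_train)
    print(dt.score(X_test, y_test))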


  1. Decision tree for classification
  2. Train your first classification tree
  3. Evaluate the classification tree
  4. Logistic regression vs classification tree
  5. Classification tree learning
  6. Growing a classification tree
  7. Using entropy as a criterion
  8. Entropy vs Gini index
  9. Decision tree for regression
  10. Train your first regression tree
  11. Evaluate the regression tree
  12. Linear regression vs regression tree

The bias-variance tradeoff is one of the fundamental concepts in supervised machine learning. In this module, you'll understand how to diagnose the problems of overfitting and underfitting. You'll also be introduced to the concept of ensembling where the predictions of several models are aggregated to produce predictions that are more robust.


  1. Generalization Error
  2. Complexity, bias and variance
  3. Overfitting and underfitting
  4. Diagnose bias and variance problems
  5. Instantiate the model
  6. Evaluate the 10-fold CV error
  7. Evaluate the training error
  8. High bias or high variance?
  9. Ensemble Learning
  10. Define the ensemble
  11. Evaluate individual classifiers
  12. Better performance with a Voting Classifier

Bagging is an ensemble method involving training the same algorithm many times using different subsets sampled from the training data. In this module, you'll understand how bagging can be used to create a tree ensemble. You'll also learn how the random forests algorithm can lead to further ensemble diversity through randomization at the level of each split in the trees forming the ensemble.
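
A random forest sketch, including the out-of-bag score this module covers:

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor

    X, y = load_diabetes(return_X_y=True)

    # An ensemble of randomized trees; oob_score evaluates on out-of-bag samples
    rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=2)
    rf.fit(X, y)
    print(rf.oob_score_)              # generalization estimate without a test set
    print(rf.feature_importances_)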


  1. Bagging
  2. Define the bagging classifier
  3. Evaluate Bagging performance
  4. Out of Bag Evaluation
  5. Prepare the ground
  6. OOB Score vs Test Set Score
  7. Random Forests (RF)
  8. Train an RF regressor
  9. Evaluate the RF regressor
  10. Visualizing feature importances

Boosting refers to an ensemble method in which several models are trained sequentially with each model learning from the errors of its predecessors. In this module, you'll be introduced to the two boosting methods of AdaBoost and Gradient Boosting.
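
A gradient boosting sketch on a bundled dataset (the hyperparameter values are illustrative):

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

    # Each new tree fits the residual errors of the ensemble so far
    gb = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                   random_state=3)
    gb.fit(X_train, y_train)
    print(gb.score(X_test, y_test))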


  1. Adaboost
  2. Define the AdaBoost classifier
  3. Train the AdaBoost classifier
  4. Evaluate the AdaBoost classifier
  5. Gradient Boosting (GB)
  6. Define the GB regressor
  7. Train the GB regressor
  8. Evaluate the GB regressor
  9. Stochastic Gradient Boosting (SGB)
  10. Regression with SGB
  11. Train the SGB regressor
  12. Evaluate the SGB regressor

The hyperparameters of a machine learning model are parameters that are not learned from data. They should be set prior to fitting the model to the training set. In this module, you'll learn how to tune the hyperparameters of a tree-based model using grid search cross validation.


  1. Tuning a CART's Hyperparameters
  2. Tree hyperparameters
  3. Set the tree's hyperparameter grid
  4. Search for the optimal tree
  5. Evaluate the optimal tree
  6. Tuning a RF's Hyperparameters
  7. Random forests hyperparameters
  8. Set the hyperparameter grid of RF
  9. Search for the optimal forest
  10. Evaluate the optimal forest
  11. Congratulations!


Intermediate Importing Data in Python

As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In the prequel to this course, you learned many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL. In this course, you'll extend this knowledge base by learning to import data from the web and by pulling data from Application Programming Interfaces (APIs), such as the Twitter streaming API, which allows us to stream real-time tweets.

3 Modules | 4+ Hours | 3 Skills

Course Modules 


The web is a rich source of data from which you can extract various types of insights and findings. In this module, you will learn how to get data from the web, whether it is stored in files or in HTML. You'll also learn the basics of scraping and parsing web data.
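
The request-and-parse pattern in miniature; the URL below is a placeholder:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://example.com")
    soup = BeautifulSoup(r.text, "html.parser")

    print(soup.title.get_text())            # parsed page title
    for link in soup.find_all("a"):         # extract the hyperlinks
        print(link.get("href"))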


  1. Importing flat files from the web
  2. Importing flat files from the web: your turn!
  3. Opening and reading flat files from the web
  4. Importing non-flat files from the web
  5. HTTP requests to import files from the web
  6. Performing HTTP requests in Python using urllib
  7. Printing HTTP request results in Python using urllib
  8. Performing HTTP requests in Python using requests
  9. Scraping the web in Python
  10. Parsing HTML with BeautifulSoup
  11. Turning a webpage into data using BeautifulSoup: getting the text
  12. Turning a webpage into data using BeautifulSoup: getting the hyperlinks

In this module, you will gain a deeper understanding of how to import data from the web. You will learn the basics of extracting data from APIs, gain insight on the importance of APIs, and practice extracting data by diving into the OMDB and Library of Congress APIs.
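
The request-and-decode pattern for a JSON API, with a hypothetical endpoint:

    import requests

    # api.example.com and the "q" parameter are assumptions for this sketch
    r = requests.get("https://api.example.com/search", params={"q": "python"})
    json_data = r.json()            # decode the JSON payload into a dict

    for key, value in json_data.items():
        print(key + ":", value)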


  1. Introduction to APIs and JSONs
  2. Pop quiz: What exactly is a JSON?
  3. Loading and exploring a JSON
  4. Pop quiz: Exploring your JSON
  5. APIs and interacting with the world wide web
  6. Pop quiz: What's an API?
  7. API requests
  8. JSON–from the web to Python
  9. Checking out the Wikipedia API

In this module, you will consolidate your knowledge of interacting with APIs in a deep dive into the Twitter streaming API. You'll learn how to stream real-time Twitter data, and how to analyze and visualize it.


  1. The Twitter API and Authentication
  2. Streaming tweets
  3. Load and explore your Twitter data
  4. Twitter data to DataFrame
  5. A little bit of Twitter text analysis
  6. Plotting your Twitter data
  7. Final Thoughts


Preprocessing for Machine Learning in Python
This course covers the basics of how and when to perform data preprocessing. This essential step in any machine learning project comes between importing and cleaning your data and fitting your model: it's when you get your data ready for modeling. You'll learn how to standardize your data so that it's in the right form for your model, create new features to best leverage the information in your dataset, and select the best features to improve your model fit. Finally, you'll have some practice preprocessing by getting a dataset on UFO sightings ready for modeling.

5 Modules | 6+ Hours | 5 Skills

Course Modules 


In this module you'll learn exactly what it means to preprocess data. You'll take the first steps in any preprocessing journey, including exploring data types and dealing with missing data.


  1. Introduction to preprocessing
  2. Exploring missing data
  3. Dropping missing data
  4. Working with data types
  5. Exploring data types
  6. Converting a column type
  7. Training and test sets
  8. Class imbalance
  9. Stratified sampling

This module is all about standardizing data. Often a model will make some assumptions about the distribution or scale of your features. Standardization is a way to make your data fit these assumptions and improve the algorithm's performance.
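
A sketch of log normalization and standardization on an invented skewed column:

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    wine = pd.DataFrame({"proline": [1000.0, 1500.0, 500.0, 2500.0]})

    # Log normalization tames high-variance, right-skewed features
    wine["proline_log"] = np.log(wine["proline"])

    # Standardization rescales to mean 0, variance 1
    wine["proline_scaled"] = StandardScaler().fit_transform(wine[["proline"]])
    print(wine)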


  1. Standardization
  2. When to standardize
  3. Modeling without normalizing
  4. Log normalization
  5. Checking the variance
  6. Log normalization in Python
  7. Scaling data for feature comparison
  8. Scaling data - investigating columns
  9. Scaling data - standardizing columns
  10. Standardized data and modeling
  11. KNN on non-scaled data
  12. KNN on scaled data

In this section you'll learn about feature engineering. You'll explore different ways to create new, more useful, features from the ones already in your dataset. You'll see how to encode, aggregate, and extract information from both numerical and textual features.
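
Two of the module's moves, one-hot encoding and datetime extraction, sketched on invented data:

    import pandas as pd

    df = pd.DataFrame({"fav_color": ["blue", "green"],
                       "date": ["2021-04-09", "2021-07-14"]})

    # Encode a categorical column as indicator variables
    print(pd.get_dummies(df["fav_color"]))

    # Extract a datetime component as a new numeric feature
    df["date"] = pd.to_datetime(df["date"])
    df["month"] = df["date"].dt.month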


  1. Feature engineering
  2. Feature engineering knowledge test
  3. Identifying areas for feature engineering
  4. Encoding categorical variables
  5. Encoding categorical variables - binary
  6. Encoding categorical variables - one-hot
  7. Engineering numerical features
  8. Aggregating numerical features
  9. Extracting datetime components
  10. Engineering text features
  11. Extracting string patterns
  12. Vectorizing text
  13. Text classification using tf/idf vectors

This module goes over a few different techniques for selecting the most important features from your dataset. You'll learn how to drop redundant features, work with text vectors, and reduce the number of features in your dataset using principal component analysis (PCA).


  1. Feature selection
  2. When to use feature selection
  3. Identifying areas for feature selection
  4. Removing redundant features
  5. Selecting relevant features
  6. Checking for correlated features
  7. Selecting features using text vectors
  8. Exploring text vectors, part 1
  9. Exploring text vectors, part 2
  10. Training Naive Bayes with feature selection
  11. Dimensionality reduction
  12. Using PCA
  13. Training a model with PCA

Now that you've learned all about preprocessing you'll try these techniques out on a dataset that records information on UFO sightings.


  1. UFOs and preprocessing
  2. Checking column types
  3. Dropping missing data
  4. Categorical variables and standardization
  5. Extracting numbers from strings
  6. Identifying features for standardization
  7. Engineering new features
  8. Encoding categorical variables
  9. Features from dates
  10. Text vectorization
  11. Feature selection and modeling
  12. Selecting the ideal dataset
  13. Modeling the UFO dataset, part 1
  14. Modeling the UFO dataset, part 2
  15. Recap!


Developing Python Packages
Do you find yourself copying and pasting the same code between files, wishing it was easier to reuse and share your awesome snippets? Wrapping your code into Python packages can help! In this course, you’ll learn about package structure and the extra files needed to turn loose code into convenient packages. You'll also learn about import structure, documentation, and how to maintain code style using flake8. You’ll then speed up your package development by building templates, using cookiecutter to create package skeletons. Finally, you'll learn how to use setuptools and twine to build and publish your packages to PyPI—the world stage for Python packages!

4 Modules | 5+ Hours | 4+ Skills

Course Modules 


Get your package started by converting scripts you have already written. You'll create a simple package which you can use on your own computer.


  1. Starting a package
  2. Modules, packages and subpackages
  3. From script to package
  4. Putting your package to work
  5. Documentation
  6. Writing function documentation with pyment
  7. Writing function documentation with pyment II
  8. Package and module documentation
  9. Structuring imports
  10. Sibling imports
  11. Importing from parents
  12. Exposing functions to users

Make your package installable for yourself and others. In this module, you'll learn to deal with dependencies, write READMEs, and include licenses. You'll also complete all the steps to publish your package on PyPI—the main home of Python packages.
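
A minimal setup script might look like this sketch; the package name and dependency pin are placeholders:

    # setup.py
    from setuptools import setup, find_packages

    setup(
        name="mypackage",
        version="0.1.0",
        packages=find_packages(),
        install_requires=["pandas>=1.0"],
    )

With this file in place, running pip install -e . gives you an editable local install while you develop.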


  1. Installing your own package
  2. Adding the setup script
  3. Installing your package locally
  4. Utilizing editable installs
  5. Dealing with dependencies
  6. User dependencies
  7. Development dependencies
  8. Including licences and writing READMEs
  9. Writing a README
  10. MANIFEST - Including extra files with your package
  11. Publishing your package
  12. Building a distribution
  13. Uploading distributions

Bring your package up to a professional standard. Discover how to use pytest to guard against errors, tox to test if your package functions with multiple versions of Python, and flake8 to maintain great code style.


  1. Testing your package
  2. Creating the test directory
  3. Writing some basic tests
  4. Running your tests
  5. Testing your package with different environments
  6. Setting up tox
  7. Running tox
  8. Keeping your package stylish
  9. Appropriate style filtering
  10. Using flake8 to tidy up a file
  11. Ignoring specific errors
  12. Configuring flake8

Create your packages more quickly. In this final module, you’ll learn how to use cookiecutter to generate all the supporting files your package needs and Makefiles to simplify releasing new versions, and you’ll be introduced to the last few files your package needs to attract users and contributors.


  1. Faster package development with templates
  2. Using package templates
  3. Version numbers and history
  4. CONTRIBUTING.md
  5. History file
  6. Tracking version number with bumpversion
  7. Makefiles and classifiers
  8. PyPI classifiers
  9. Using makefiles
  10. Wrap-up


Machine Learning for Business
Learn the Basics of Machine Learning
This course introduces the key elements of machine learning to business leaders. We will focus on key insights and best practices for structuring business questions as modeling projects with machine learning teams.

Dive into the Model Specifics
You will understand the different types of models and what kinds of business questions they help answer or opportunities they can uncover. You will also learn to identify situations where machine learning should NOT be applied, which is equally important. Finally, you will understand the difference between inference and prediction and between predicting probabilities and amounts, and see how unsupervised learning can help build a meaningful customer segmentation strategy.

4 Modules | 5+ Hours | 4 Skills

Course Modules 


Machine learning is used in many different industries and fields, and it can fundamentally improve a business if applied correctly. This module outlines machine learning use cases, job roles, and how they fit into the data needs pyramid.


  1. Machine learning and data pyramid
  2. Terminology clarification
  3. Order data pyramid needs
  4. Match tasks in data pyramid
  5. Machine learning principles
  6. Modeling types
  7. Find supervised and unsupervised cases
  8. Job roles, tools and technologies
  9. Job role responsibilities
  10. Match data projects with job roles
  11. Team structure types

This module overviews different machine learning types. We will look into differences between causal and prediction models, explore supervised and unsupervised learning, and finally understand the sub-types of supervised learning: classification and regression.


  1. Prediction vs. inference dilemma
  2. Inference and prediction differences
  3. Identify inference vs. prediction use cases
  4. Inference (causal) models
  5. Experiments and causal models
  6. Identify non-actionable variables
  7. Prediction models (supervised learning)
  8. Supervised modeling principles
  9. Identify classification and regression models
  10. Prediction models (unsupervised learning)
  11. Unsupervised modeling use cases
  12. Classification, regression or unsupervised models

This module reviews key steps in scoping out business requirements, identifying and sizing machine learning opportunities, assessing the model performance, and identifying any performance risks in the process.


  1. Business requirements
  2. Identify situation, opportunity and action
  3. Identify successful experiments
  4. Model training
  5. Model training process
  6. Training, validation and test
  7. Model performance measurement
  8. Poor performance examples
  9. Identify performance metrics
  10. Machine learning risks
  11. Fixing non-performing models
  12. Non-actionable models
  13. Identify actionable recommendations

This module will look into the best and worst practices of managing machine learning projects. We will identify the most common machine learning mistakes, learn how to manage communication between the business and ML teams, and finally address the challenges of deploying machine learning models to production.


  1. Machine learning mistakes
  2. Identify machine learning mistakes
  3. Data needs pyramid
  4. Match ML mistakes by their types
  5. Communication management
  6. Business communication focus
  7. Market testing
  8. Machine learning in production
  9. Production systems
  10. Production systems ML use cases
  11. ML in production launch
  12. Wrap-up


Introduction to SQL
Learn how Relational Databases are Organized
SQL is an essential language for building and maintaining relational databases, which opens the door to a range of careers in the data industry and beyond. You’ll start this course by covering data organization, tables, and best practices for database construction.

Write Your First SQL Queries
The second half of this course looks at creating SQL queries for selecting data that you need from your database. You’ll have the chance to practice your querying skills before moving on to customizing and saving your results.

Understand the Difference Between PostgreSQL and SQL Server
PostgreSQL and SQL Server are two of the most popular SQL flavors. You’ll finish off this course by looking at the differences, benefits, and applications of each. By the end of the course, you’ll have some hands-on experience in learning SQL and the grounding to start applying it to projects or continue your learning in a more specialized direction.

2 Modules | 3+ Hours | 2 Skills

Course Modules 


Before writing any SQL queries, it’s important to understand the underlying data. In this module, we’ll discover the role of SQL in creating and querying relational databases. Using a database for a local library, we will explore database and table organization, data types and storage, and best practices for database construction.


  1. Introduction to Database Management System
  2. What are the advantages of databases?
  3. Data organization
  4. Introduction to SQL
  5. Tables in SQL
  6. Views in SQL
  7. Table vs Views
  8. Picking a unique ID
  9. Setting the table in style
  10. Finding data types

  1. Introduction
  2. Entity Relationship Model
  3. Relationships in SQL
  4. Recap


  1. Introduction
  2. Downloading SQL Developer Edition
  3. Installing SQL Developer Edition
  4. Connecting to SQL Server
  5. Downloading a Sample SQL Database in SQL Server Management Studio (SSMS)
  6. Configuring SQL Server, and SSMS
  7. Recap


  1. Database Manipulation in SQL
  2. SQL Storage Engines
  3. Creating and Managing Tables in SQL
  4. Creating and Managing Tables in SQL: CREATE, DESCRIBE, and SHOW Tables
  5. Creating and Managing Tables in SQL: ALTER, TRUNCATE, and DROP Tables
  6. Inserting and Querying Data in Tables
  7. Filtering Data From Tables in SQL
  8. Filtering Data From Tables in SQL: WHERE and DISTINCT Clauses
  9. Filtering Data From Tables in SQL: AND and OR Operators
  10. Filtering Data From Tables in SQL: IN and NOT IN Operators
  11. Filtering Data From Tables in SQL: BETWEEN and LIKE Operators
  12. Filtering Data From Tables in SQL: TOP, IS NULL, and IS NOT NULL Operators
  13. Sorting Table Data
  14. Recap
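
To give a feel for this module's workflow, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the course itself works in SQL Server, where some keywords (such as TOP) differ, and the table and rows below are illustrative:

    import sqlite3

    conn = sqlite3.connect(":memory:")   # throwaway in-memory database
    cur = conn.cursor()

    # CREATE, INSERT, then filter and sort -- the pattern this module drills
    cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    cur.executemany(
        "INSERT INTO books (title, year) VALUES (?, ?)",
        [("SQL Basics", 2020), ("Deep SQL", 2022), ("Data Tales", 2020)],
    )
    cur.execute("SELECT title FROM books WHERE year = 2020 AND title LIKE 'S%' ORDER BY title")
    print(cur.fetchall())                # [('SQL Basics',)]
    conn.close()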

Learn your first SQL keywords for selecting relevant data from database tables! After practicing querying skills in a database of books, you’ll customize query results using aliasing and save them as views so they can be shared. Finally, you’ll explore the differences between SQL flavors and databases such as SQL Server.


  1. Introducing queries
  2. SQL strengths
  3. Developing SQL style
  4. Querying the books table
  5. Writing queries
  6. Comments in SQL
  7. Making queries DISTINCT
  8. Aliasing
  9. Viewing your query
  10. SQL flavors
  11. Comparing flavors
  12. Limiting results
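
A short sketch of the querying ideas above (DISTINCT, aliasing, and views), again using sqlite3 with illustrative data:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE books (title TEXT, genre TEXT)")
    cur.executemany("INSERT INTO books VALUES (?, ?)",
                    [("A", "scifi"), ("B", "scifi"), ("C", "fantasy")])

    # DISTINCT de-duplicates result rows
    cur.execute("SELECT DISTINCT genre FROM books")
    print(cur.fetchall())                # e.g. [('scifi',), ('fantasy',)]

    # AS renames a field in the output; a view saves a query for reuse
    cur.execute("CREATE VIEW fantasy_books AS "
                "SELECT title AS book_title FROM books WHERE genre = 'fantasy'")
    cur.execute("SELECT book_title FROM fantasy_books")
    print(cur.fetchall())                # [('C',)]
    conn.close()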


Intermediate SQL
SQL is widely recognized as the most popular language for turning raw data stored in a database into actionable insights. This course uses a films database to teach how to navigate and extract insights from the data using SQL.

Discover Filtering with SQL
You'll discover techniques for filtering and comparing data, enabling you to extract specific information to gain insights and answer questions about the data.

Get Acquainted with Aggregation
Next, you'll get a taste of aggregate functions, essential for summarizing data effectively and gaining valuable insights from large datasets. You'll also combine this with sorting and grouping data, adding another layer of meaning to your insights and analysis.

Write Clean Queries
Finally, you'll be shown some tips and best practices for presenting your data and queries neatly. Throughout the course, you'll have hands-on practice queries to solidify your understanding of the concepts. By the end of the course, you'll have everything you need to know to analyze data using your own SQL code today!

7 Modules | 8+ Hours | 5+ Skills

Course Modules 


In this first module, you’ll learn how to query a films database and select the data needed to answer questions about the movies and actors. You'll also understand how SQL code is executed and formatted.


  1. SELECT Statement
  2. SELECT DISTINCT
  3. Query execution
  4. Order of execution
  5. SQL style
  6. SQL best practices
  7. Formatting
  8. Non-standard fields

  1. Arithmetic Operators: +, -, *, /, %
  2. Comparison Operators: =, >, <, >=, <=, <>, !=
  3. Logical Operators: AND, OR, NOT
  4. Special Operators: LIKE, IN, NOT IN, BETWEEN, IS NULL, ALL and ANY, EXISTS
  5. Set Operators: UNION, UNION ALL, INTERSECT, EXCEPT

Learn how you can filter numerical and textual data with SQL. Filtering is one of the most important uses of this language. You'll learn how to use new keywords and operators to narrow down your query so that the results meet your desired criteria, and you'll gain a better understanding of NULL values and how to handle them.


  1. Filtering numbers
  2. Filtering results
  3. Using WHERE with numbers
  4. Using WHERE with text
  5. Multiple criteria
  6. Using AND
  7. Using OR
  8. Using BETWEEN
  9. Filtering text
  10. LIKE and NOT LIKE
  11. WHERE IN
  12. Combining filtering and selecting
  13. Understanding NULL values
  14. Practice with NULLs
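
The filtering keywords in this module, sketched with sqlite3 and illustrative data:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE films (title TEXT, release_year INTEGER, budget REAL)")
    cur.executemany("INSERT INTO films VALUES (?, ?, ?)",
                    [("Alpha", 1999, 10.0), ("Beta", 2005, None), ("Gamma", 2012, 55.5)])

    # BETWEEN is inclusive at both ends
    cur.execute("SELECT title FROM films WHERE release_year BETWEEN 2000 AND 2012")
    print(cur.fetchall())                # [('Beta',), ('Gamma',)]

    # NULL never matches = or !=; it needs IS NULL / IS NOT NULL
    cur.execute("SELECT title FROM films WHERE budget IS NULL")
    print(cur.fetchall())                # [('Beta',)]
    conn.close()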

Here, we will teach you how to sort and group data. These skills will take your analyses to a new level by helping you uncover critical business insights and identify trends in performance. You'll get hands-on experience determining which films performed best and how movie durations and budgets changed over time.


  1. Sorting results
  2. Sorting text
  3. The SQL ORDER BY
  4. ORDER BY - ascending
  5. ORDER BY - descending
  6. Sorting single fields
  7. Sorting multiple fields

  1. Data Definition Language (DDL): CREATE, DROP, ALTER, TRUNCATE
  2. Data Query Language (DQL): SELECT, WHERE
  3. Data Manipulation Language (DML): INSERT, UPDATE, DELETE

  1. NOT NULL Constraints
  2. UNIQUE Constraints
  3. Primary Key Constraints
  4. Foreign Key Constraints
  5. Composite Key
  6. Alternate Key
  7. CHECK Constraints
  8. DEFAULT Constraints
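
A compact sketch of these constraints in SQLite syntax (SQL Server syntax is close but not identical, and the names are illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
    conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,           -- primary key constraint
        name      TEXT NOT NULL UNIQUE           -- NOT NULL + UNIQUE constraints
    );
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        price     REAL DEFAULT 9.99 CHECK (price >= 0),  -- DEFAULT + CHECK
        author_id INTEGER REFERENCES authors(author_id)  -- foreign key constraint
    );
    """)
    conn.execute("INSERT INTO authors (name) VALUES ('Ada')")
    # Inserting a book with author_id 99 would now raise sqlite3.IntegrityError,
    # because no such author exists.
    conn.close()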

SQL allows you to zoom in and out to better understand an entire dataset, its subsets, and its individual records. You'll learn to summarize data using aggregate functions and perform basic arithmetic calculations inside queries to gain insights into what makes a successful film.


  1. Aggregate functions: COUNT, SUM, AVG, MIN, MAX
  2. Summarizing data
  3. Aggregate functions and data types
  4. Practice with aggregate functions
  5. Summarizing subsets
  6. Grouping data
  7. GROUP BY single fields
  8. GROUP BY multiple fields
  9. Answering business questions
  10. Filtering grouped data
  11. Filter with HAVING
  12. HAVING and sorting
  13. Combining aggregate functions with WHERE
  14. Using ROUND()
  15. ROUND() with a negative parameter
  16. Aliasing and arithmetic
  17. Using arithmetic
  18. Aliasing with functions
  19. Rounding results
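
The grouping and aggregation ideas above in one small sqlite3 sketch (the data is illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE films (title TEXT, genre TEXT, gross REAL)")
    cur.executemany("INSERT INTO films VALUES (?, ?, ?)",
                    [("A", "drama", 120.5), ("B", "drama", 80.25), ("C", "comedy", 95.0)])

    # GROUP BY forms one row per genre; HAVING filters groups (WHERE filters rows);
    # ROUND and AS tidy up the output
    cur.execute("""
        SELECT genre, ROUND(AVG(gross), 1) AS avg_gross
        FROM films
        GROUP BY genre
        HAVING AVG(gross) > 90
    """)
    print(cur.fetchall())    # e.g. [('comedy', 95.0), ('drama', 100.4)]
    conn.close()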


Joining Data In SQL
Joining data is an essential skill in data analysis, enabling you to draw information from separate tables together into a single, meaningful set of results. In this comprehensive course on joining data, you'll delve into the intricacies of table joins and relational set theory, learning how to optimize your queries for efficient data retrieval.

Understand Data Joining Fundamentals
You will learn how to work with multiple tables in SQL by navigating and extracting data from various tables within a SQL database using several join types, including inner joins, outer joins, and cross joins. With practice, you'll learn how to select the appropriate join method.

Explore Advanced Data Manipulation Techniques
Next up, you'll explore set theory principles such as unions, intersects, and except clauses, as well as discover the power of nested queries in SQL. Every step is accompanied by exercises and opportunities to apply the theory and grow your confidence in SQL.

5 Modules | 6+ Hours | 5 Skills

Course Modules 


  1. Introduction to Alias
  2. Introduction to JOINS
  3. Right, Cross, and Self Joins
  4. Operators in SQL
  5. Operators in SQL Updated
  6. Intersect and Emulation
  7. Minus and Emulation
  8. Subquery in SQL
  9. Subqueries with Statements and Operators
  10. Subqueries with Commands
  11. Derived Tables in SQL
  12. EXISTS Operator
  13. NOT EXISTS Operator
  14. EXISTS vs IN Operators
  15. Recap

In this closing module, you’ll begin by investigating semi joins and anti joins. Next, you'll learn how to use nested queries. Last but not least, you’ll wrap up the course with some challenges!

  1. Subquerying with semi joins and anti joins
  2. Multiple WHERE clauses
  3. Semi join
  4. Diagnosing problems using anti join
  5. Subqueries inside WHERE and SELECT
  6. Subquery inside WHERE
  7. WHERE do people live?
  8. Subquery inside SELECT
  9. Subqueries inside FROM
  10. Subquery inside FROM
  11. Subquery challenge
  12. Final challenge
  13. The finish line

In this module, you’ll be introduced to the concept of joining tables and will explore all the ways you can enrich your queries using joins—beginning with inner joins.

  1. The ins and outs of INNER JOIN
  2. Your first join
  3. Joining with aliased tables
  4. USING in action
  5. Defining relationships
  6. Relationships in our database
  7. Inspecting a relationship
  8. Multiple joins
  9. Joining multiple tables
  10. Checking multi-table joins
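
A minimal inner-join sketch with sqlite3; the tables echo the countries/economies flavor of the course, but the rows and figures are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE economies (code TEXT, gdp REAL);
    INSERT INTO countries VALUES ('NG', 'Nigeria'), ('GB', 'United Kingdom');
    INSERT INTO economies VALUES ('NG', 440.8);   -- invented figure
    """)

    # INNER JOIN keeps only rows with a match in BOTH tables; c and e are
    # table aliases, and USING (code) is shorthand for ON c.code = e.code
    cur.execute("""
        SELECT c.name, e.gdp
        FROM countries AS c
        INNER JOIN economies AS e USING (code)
    """)
    print(cur.fetchall())    # [('Nigeria', 440.8)] -- 'GB' has no match
    conn.close()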

After familiarizing yourself with inner joins, you will come to grips with different kinds of outer joins. Next, you will learn about cross joins. Finally, you will learn about situations in which you might join a table with itself.

  1. LEFT and RIGHT JOINs
  2. Remembering what is LEFT
  3. This is a LEFT JOIN, right?
  4. Building on your LEFT JOIN
  5. Is this RIGHT?
  6. FULL JOINs
  7. Comparing joins
  8. Chaining FULL JOINs
  9. Crossing into CROSS JOIN
  10. Histories and languages
  11. Choosing your join
  12. Self joins
  13. Comparing a country to itself
  14. All joins on deck

In this module, you will learn about using set theory operations in SQL, with an introduction to UNION, UNION ALL, INTERSECT, and EXCEPT clauses. You’ll explore the predominant ways in which set theory operations differ from join operations.

  1. Set theory for SQL Joins
  2. UNION vs. UNION ALL
  3. Comparing global economies
  4. Comparing two set operations
  5. At the INTERSECT
  6. INTERSECT
  7. Review UNION and INTERSECT
  8. EXCEPT
  9. You've got it, EXCEPT...
  10. Calling all set operators
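
Unlike joins, set operations stack whole result sets on top of each other; a small sqlite3 sketch with invented rows:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
    CREATE TABLE y2020 (country TEXT);
    CREATE TABLE y2021 (country TEXT);
    INSERT INTO y2020 VALUES ('Kenya'), ('Ghana');
    INSERT INTO y2021 VALUES ('Ghana'), ('Togo');
    """)

    # UNION drops duplicates (UNION ALL would keep them); INTERSECT keeps
    # rows present in both; EXCEPT keeps rows only in the first query
    for op in ("UNION", "INTERSECT", "EXCEPT"):
        cur.execute(f"SELECT country FROM y2020 {op} SELECT country FROM y2021")
        print(op, cur.fetchall())
    # UNION     -> [('Ghana',), ('Kenya',), ('Togo',)]  (order may vary)
    # INTERSECT -> [('Ghana',)]
    # EXCEPT    -> [('Kenya',)]
    conn.close()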


Learn Git
This course introduces learners to version control using Git. You will discover the importance of version control when working on data science projects and explore how you can use Git to track files, compare differences, modify and save files, undo changes, and enable collaborative development through the use of branches. You will gain an introduction to the structure of a repository, learn how to create new repositories and clone existing ones, and see how Git stores data. By working through typical data science tasks, you will gain the skills to handle conflicting files!


4 Modules | 5+ Hours | 4 Skills

Course Modules

In the first module, you’ll learn what version control is and why it is essential for data projects. Then, you’ll discover what Git is and how to use it for a version control workflow.


  1. Introduction to version control with Git
  2. Using the shell
  3. Checking the version of Git
  4. Saving files
  5. Where does Git store information?
  6. The Git workflow
  7. Adding a file
  8. Adding multiple files
  9. Comparing files
  10. What has changed?
  11. What is going to be committed?
  12. What's in the staging area?
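
The add/status/commit cycle in this module is normally run in the shell; purely as an illustration, the same workflow can be scripted from Python with subprocess (the paths, file name, and identity flags are illustrative, and git must be installed):

    import subprocess
    from pathlib import Path

    repo = Path("demo_repo")
    repo.mkdir(exist_ok=True)
    subprocess.run(["git", "init"], cwd=repo, check=True)

    (repo / "report.md").write_text("# Findings\n")
    subprocess.run(["git", "add", "report.md"], cwd=repo, check=True)   # stage the file
    subprocess.run(["git", "status"], cwd=repo, check=True)             # inspect the staging area
    subprocess.run(["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
                    "commit", "-m", "Add report"], cwd=repo, check=True)  # save a snapshot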

Next, you’ll examine how Git stores data, learn essential commands to compare files and repositories at different times, and understand the process for restoring earlier versions of files in your data projects.


  1. Storing data with Git
  2. Interpreting the commit structure
  3. Viewing a repository's history
  4. Viewing a specific commit
  5. Viewing changes
  6. Comparing to the second most recent commit
  7. Comparing commits
  8. Who changed what?
  9. Undoing changes before committing
  10. How to unstage a file
  11. Undoing changes to unstaged files
  12. Undoing all changes
  13. Restoring and reverting
  14. Restoring an old version of a repo
  15. Deleting untracked files
  16. Restoring an old version of a file

In this module, you'll learn tips and tricks for configuring Git to make you more efficient! You'll also discover branches, identify how to create and switch to different branches, compare versions of files between branches, merge branches together, and deal with conflicting files across branches.


  1. Configuring Git
  2. Modifying your email address in Git
  3. Creating an alias
  4. Ignoring files
  5. Branches
  6. Branching and merging
  7. Creating new branches
  8. Checking the number of branches
  9. Comparing branches
  10. Working with branches
  11. Switching branches
  12. Merging two branches
  13. Handling conflict
  14. Recognizing conflict syntax
  15. Resolving a conflict

This final module is all about collaboration! You'll gain an introduction to remote repositories and learn how to work with them to synchronize content between the cloud and your local computer. You'll also see how to create new repositories and clone existing ones, along with discovering a workflow to minimize the risk of conflicts between local and remote repositories.


  1. Creating repos
  2. Setting up a new repo
  3. Converting an existing project
  4. Working with remotes
  5. Cloning a repo
  6. Defining and identifying remotes
  7. Gathering from a remote
  8. Fetching from a remote
  9. Pulling from a remote
  10. Pushing to a remote
  11. Pushing to a remote repo
  12. Handling push conflicts
  13. Wrap up!

THE COMPLETE DATA ANALYSIS & VISUALIZATION WITH PYTHON COST


United States

$899.99

United Kingdom

£799.99

Career and Certifications


GreaterHeight Academy's certificate holders are prepared to work at companies like:



Our Advisor is just a CALL away

+1 5169831065 | +447474275645
Available 24x7 for your queries


Talk to our advisors

Our advisors will get in touch with you in the next 24 hours.


Get Advice


FAQs

Complete Data Analysis & Visualization with Python Course

  • Python, created by Guido van Rossum in 1991, is a high-level, readable programming language known for its simplicity. It's versatile, with applications in web development, data analysis, AI, and more. Python's extensive standard library and rich ecosystem enhance its capabilities. It's cross-platform compatible and supported by a large community. Python's popularity has grown, making it widely used in diverse industries.

  • A Python developer is a software developer or programmer who specializes in using the Python programming language for creating applications, software, or solutions. They have expertise in writing Python code, understanding the language's syntax, libraries, and frameworks. Python developers are skilled in utilizing Python's features to develop web applications, data analysis tools, machine learning models, automation scripts, and other software solutions.
  • They work in various industries, collaborating with teams or independently to design, implement, test, and maintain Python-based projects. Python developers often possess knowledge of related technologies and tools to enhance their development process.

  • Python Developer Masters Program is a structured learning path recommended by leading industry experts that ensures you transform into a proficient Python Developer. Being a full-fledged Python Developer requires you to master multiple technologies, and this program aims to provide you with in-depth knowledge of the entire range of Python programming practices. Individual courses at GreaterHeight Academy focus on specialization in one or two specific skills; however, if you intend to become a master in Python programming, then this is your go-to path.

  • Yes. But you can also raise a ticket with the dedicated support team at any time. If your query does not get resolved through email, we can also arrange one-on-one sessions with our support team. However, our support is provided for a period of Twelve Weeks from the start date of your course.

There are several reasons why becoming a Python developer can be a rewarding career choice. Here are a few:

  • Versatility and Popularity: Python is a versatile programming language that can be used for various purposes, such as web development, data analysis, machine learning, artificial intelligence, scientific computing, and more. It has gained immense popularity in recent years due to its simplicity, readability, and extensive library ecosystem. Python is widely used in both small-scale and large-scale projects, making it a valuable skill in the job market.
  • Ease of Learning: Python has a clean and intuitive syntax that emphasizes readability, which makes it relatively easy to learn compared to other programming languages. Its simplicity allows beginners to grasp the fundamentals quickly and start building useful applications in a relatively short amount of time. This accessibility makes Python an attractive choice for both novice and experienced programmers.
  • Rich Ecosystem and Libraries: Python offers a vast collection of libraries and frameworks that can accelerate development and simplify complex tasks. For example, Django and Flask are popular web development frameworks that provide robust tools for building scalable and secure web applications. NumPy, Pandas, and Matplotlib are widely used libraries for data analysis and visualization. TensorFlow and PyTorch are prominent libraries for machine learning and deep learning. These libraries, among many others, contribute to Python's efficiency and effectiveness as a development language.
  • Job Opportunities: The demand for Python developers has been steadily growing in recent years. Many industries, including technology, finance, healthcare, and academia, rely on Python for various applications. By becoming a Python developer, you open up a wide range of career opportunities, whether you choose to work for a large corporation, a startup, or even as a freelancer. Additionally, Python's versatility allows you to explore different domains and switch roles if desired.
  • Community and Support: Python has a vibrant and supportive community of developers worldwide. This community actively contributes to the language's development, creates open-source libraries, and provides assistance through forums, online communities, and resources.

  • There are no prerequisites for enrollment in this Masters Program. Whether you are an experienced professional working in the IT industry or an aspirant planning to enter the world of Python programming, this Masters Program is designed and developed to accommodate various professional backgrounds.

  • Python Developer Masters Program has been curated after thorough research and recommendations from industry experts. It will help you differentiate yourself with multi-platform fluency and have real-world experience with the most important tools and platforms. GreaterHeight Academy will be by your side throughout the learning journey - We’re Ridiculously Committed.

  • The recommended duration to complete this Python Developer Masters Program is about 20 weeks; however, it is up to the individual to complete the program at their own pace.

The roles and responsibilities of a Python developer may vary depending on the specific job requirements and industry. However, here are some common tasks and responsibilities associated with the role:

  1. Developing Applications: Python developers are responsible for designing, coding, testing, and debugging applications using Python programming language. This includes writing clean, efficient, and maintainable code to create robust software solutions.
  2. Web Development: Python is widely used for web development. As a Python developer, you may be involved in building web applications, using frameworks like Django or Flask. This includes developing backend logic, integrating databases, handling data processing, and ensuring the smooth functioning of the web application.
  3. Data Analysis and Visualization: Python offers powerful libraries like NumPy, Pandas, and Matplotlib, which are extensively used for data analysis and visualization. Python developers may be responsible for manipulating and analyzing large datasets, extracting insights, and presenting them visually.
  4. Machine Learning and AI: Python is a popular choice for machine learning and artificial intelligence projects. Python developers may work on implementing machine learning algorithms, training models, and integrating them into applications. This involves using libraries like TensorFlow, PyTorch, or scikit-learn.
  5. Collaborating and Teamwork: Python developers often work as part of a development team. They collaborate with other team members, including designers, frontend developers, project managers, and stakeholders. Effective communication and teamwork skills are crucial to ensure smooth project execution.
  6. Documentation: Python developers are expected to document their code, providing clear explanations and instructions for others who may work on or maintain the codebase in the future. Documentation helps in understanding the code and facilitating collaboration.
  7. Continuous Learning: Technology is constantly evolving, and as a Python developer, you need to stay updated with the latest advancements, libraries, frameworks, and best practices. Continuous learning and self-improvement are essential to excel in this role.

The Python Developer training course is for those who want to fast-track their Python programming career. This Python Developer Masters Program will benefit people working in the following roles:

  1. Freshers
  2. Engineers
  3. IT professionals
  4. Data Scientist
  5. Machine Learning Engineer
  6. AI Engineer
  7. Business analysts
  8. Data analysts

  • Top companies such as Microsoft, Google, Meta, Citibank, Wells Fargo, and many more are actively hiring certified Python professionals for various positions.

  • On completing this Python Developer Masters Program, you’ll be eligible for roles such as Python Developer, Web Developer, Data Analyst, Data Scientist, Software Engineer, and many more.

  • There is undoubtedly great demand for data analytics, as 96% of organizations seek to hire data analysts. The most significant companies employing graduates who wish to pursue a data analyst career include Manthan, SAP, Oracle, Accenture Analytics, Alteryx, Qlik, Mu Sigma Analytics, Fractal Analytics, and Tiger Analytics. Professional data analyst training will make you an invaluable asset to any organization, able to spin insights out of big data.

A successful data analyst possesses a combination of technical skills and leadership skills.

  • Technical skills include knowledge of database languages such as SQL, R, or Python; spreadsheet tools such as Microsoft Excel or Google Sheets for statistical analysis; and data visualization software such as Tableau or Qlik. Mathematical and statistical skills are also valuable to help gather, measure, organize, and analyze data while using these common tools.
  • Leadership skills prepare a data analyst to complete decision-making and problem-solving tasks. These abilities allow analysts to think strategically about the information that will help stakeholders make data-driven business decisions and to communicate the value of this information effectively. For example, project managers rely on data analysts to track the most important metrics for their projects, to diagnose problems that may be occurring, and to predict how different courses of action could address a problem.

Career openings are available in practically all industries, from telecommunications to retail, banking, healthcare, and even fitness. Without extensive training and effort, it isn't easy to realize the benefits of a data analyst career. Earning our Data Analyst certification will keep you up to date on recent trends in the industry.

  • Yes, we do. We will discuss all possible technical interview questions and answers during the training program so that you can prepare yourself for interviews.

  • No. Any abuse of copyright is taken seriously. Thanks for your understanding on this one.

  • Yes, we will provide you with a certificate of completion for the program once you have successfully submitted all the assessments and they have been verified by our subject matter experts.

  • GreaterHeight is offering you the most updated, relevant, and high-value real-world projects as part of the training program. This way, you can implement the learning you have acquired in a real-world industry setup. All training comes with multiple projects that thoroughly test your skills, learning, and practical knowledge, making you completely industry-ready.
  • You will work on highly exciting projects in domains such as high technology, ecommerce, marketing, sales, networking, banking, and insurance. After completing the projects successfully, your skills will be equivalent to 6 months of rigorous industry experience.

All our mentors are highly qualified and experienced professionals, each with at least 15-20 years of development experience in various technologies, and they are trained by GreaterHeight Academy to deliver interactive training to participants.

Yes, we do. As technology evolves, we update our content and provide your training on the latest version of that technology.

  • All online training classes are recorded. You will get the recorded sessions so that you can watch the online classes whenever you want. You can also join another batch to make up any classes you missed.

OUR POPULAR COURSES

Data Analytics and Visualization With Python

Develop advanced expertise in cleaning, transforming, and modelling data to obtain insight into corporate decision-making as a Senior Data Analyst, using Python.

View Details
Data Science Training Masters Program

Learn Python, Statistics, Data Preparation, Data Analysis, Querying Data, Machine Learning, Clustering, Text Processing, Collaborative Filtering, Image Processing, and more.

View Details
Microsoft Azure DP-100 Data Science

You will optimize and manage models, perform administration by using T-SQL, run experiments and train models, deploy and consume models, and automate tasks.

View Details
Machine Learning using Python

Learn Data Science and Machine Learning from scratch, get hired, and have fun along the way with this modern, up-to-date Data Science course.

View Details
Microsoft Azure PL-300 Data Analysis

You will learn how to design a data model in Power BI, optimize model performance, manage datasets in Power BI, and create paginated reports.

View Details
Microsoft Azure DP-203 Data Engineer

You will learn batch and real-time analytics, Azure Synapse Analytics, Azure Databricks, implementing security, and ETL & ELT pipelines.

View Details

The GreaterHeight Advantage


Accredited Courseware

Most of our training courses are accredited by the respective governing bodies.


Assured Classes

All our training courses are assured & scheduled dates are confirmed to run by SME.


Expert Instructor Led Programs

We have well equipped and highly experienced instructors to train the professionals.

OUR CLIENTS

We Have Worked With Some Amazing Companies Around The World

Our awesome clients we've had the pleasure to work with!

