Data Analysis : Advance : DA301

About the Course

In May 1968, the U.S. Navy’s nuclear submarine USS Scorpion failed to arrive as expected at her home port of Norfolk, Virginia. The command officers of the U.S. Navy were nearly certain that the vessel had been lost off the Eastern Seaboard, but an extensive search there failed to discover the remains of Scorpion.

Then, a Navy deep-water expert, John P. Craven, suggested that Scorpion had sunk elsewhere. Craven organized a search southwest of the Azores based on a controversial approximate triangulation by hydrophones. He was allocated only a single ship, Mizar, and he took advice from a firm of consultant mathematicians in order to maximize his resources. A Bayesian search methodology was adopted. Experienced submarine commanders were interviewed to construct hypotheses about what could have caused the loss of Scorpion.

The sea area was divided into grid squares, and each square was assigned a probability under each hypothesis, giving one probability grid per hypothesis. These grids were then added together to produce an overall grid in which the value attached to each square was the probability that the wreck lay in that square. A second grid was constructed whose values represented the probability of successfully finding the wreck if that square were searched and the wreck were actually there. Multiplying the two grids together, square by square, gives the probability of finding the wreck in each grid square of the sea if that square were to be searched.
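The grid arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of the actual Scorpion search: the 2 x 2 grid, the two hypotheses, their equal weights and all the numbers are invented for the example, and the function names are hypothetical. The last function shows the standard Bayesian update applied after a square has been searched without success, a step the account above does not spell out.

```python
def combine_hypotheses(grids, weights):
    """Weighted sum of per-hypothesis location grids, normalised to sum to 1."""
    rows, cols = len(grids[0]), len(grids[0][0])
    combined = [
        [sum(w * g[r][c] for g, w in zip(grids, weights)) for c in range(cols)]
        for r in range(rows)
    ]
    total = sum(map(sum, combined))
    return [[value / total for value in row] for row in combined]


def find_probability(location, detection):
    """Square-by-square product: P(wreck is here) * P(found | wreck is here)."""
    return [
        [p * d for p, d in zip(p_row, d_row)]
        for p_row, d_row in zip(location, detection)
    ]


def best_square(grid):
    """Return the (row, column) of the square with the highest value."""
    squares = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    return max(squares, key=lambda rc: grid[rc[0]][rc[1]])


def update_after_failed_search(location, detection, square):
    """Bayes' rule after searching `square` and finding nothing."""
    r, c = square
    p, d = location[r][c], detection[r][c]
    miss = 1 - p * d  # probability that the search comes up empty
    updated = [[value / miss for value in row] for row in location]
    updated[r][c] = p * (1 - d) / miss
    return updated


# Invented example data: two hypotheses over a 2 x 2 sea area.
hypothesis_a = [[0.6, 0.2], [0.1, 0.1]]
hypothesis_b = [[0.1, 0.1], [0.2, 0.6]]
location = combine_hypotheses([hypothesis_a, hypothesis_b], [0.5, 0.5])

# Invented detection grid: chance of finding the wreck if it is in the square.
detection = [[0.9, 0.5], [0.5, 0.4]]

first_choice = best_square(find_probability(location, detection))
location_after_miss = update_after_failed_search(location, detection, first_choice)
second_choice = best_square(find_probability(location_after_miss, detection))
```

With these made-up numbers the first square to search is the top-left one, because it combines a high location probability with a high chance of detection. After an unsuccessful search there, its probability drops and the search moves to the bottom-right square.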

At the end of October 1968, the Navy’s oceanographic research ship, Mizar, located sections of the hull of Scorpion on the seabed, about 740 km southwest of the Azores, under more than 3,000 m of water.

Sounds fun? Then dive in to learn everything you need to get started with the first principles of Bayesian probabilistic programming in a fun, easy-to-digest format. Bayesian statistics is the statistics of small data. Sometimes we have to make educated guesses from very few data points because gathering more data would be impossible or expensive. This course will show you exactly how to do it.

The focus of this course is to apply probabilistic modelling to non-trivial problems. There are four functional roles in Data Science, namely, Business Analyst, Data Analyst, Machine Learning Engineer and Data Engineer. The DA track targets the Data Analyst role.

Prerequisites

Curiosity
Basic arithmetic skills - Brackets, division, multiplication, addition, subtraction
Ability to operate a computer, keyboard and mouse
Ability to use a web browser to access and use the internet
Ability to install software on your computer
Data Analysis : Intermediate : DA201

Hardware and Software Requirements

A working physical computer (not a virtual machine) running Fedora 34 or later, Pop!_OS/Ubuntu 20.04 or later, Windows 10 or later, or macOS 10 or later
16 GB RAM
Broadband internet connection faster than 5 Mbps
100 GB free hard disk space, SSD Drive recommended
A dedicated graphics card is recommended but not required; cloud resources will be used

Learning Objective

Advanced AI And ML (Shallow/Traditional Learning)

Timeseries
AutoML
H2O
Explaining Models

Bayesian Computational Statistics

What is a Statistical Distribution
Introduction to Inverse Probability
Introduction to Bayes Theory
Advanced Computational Statistics
Bayesian Estimation
Odds
Decision Analysis
Probabilistic Prediction
Observer Bias and Queuing Theory
Applying Bayes' Theorem in Two Dimensions
B.E.S.T (Bayesian estimation supersedes the t-test): Hypothesis Testing
Hierarchical Models
Species Problem

Learning Outcome

Learn various techniques to perform time-series analysis on non-random-walk data
Use AutoML tools to create models
Learn to explain models
Understand the difference between Frequentist and Bayesian approaches
Learn to apply the inverse probability theorem using advanced computational statistics methods
Perform estimations and decision analysis
Perform probabilistic predictions (for example, which soccer team will win - think Moneyball)
Understand and apply queuing theory
Apply computational Bayes' theorem in two dimensions
Perform hypothesis testing using Bayesian methods
Create hierarchical probabilistic models
Apply Bayes' theorem to the species detection problem

Fineprint

The topics presented are tentative, and we reserve the right to add or remove a topic to update or improve the bootcamp, or for technical or time reasons.
† 18% Indian taxes extra.