Data Analysis : Introduction : DA101

About the Course

The problem is not Big Data. The problem is Small Data. We live in a world of data exhaust, and unrefined data is plentiful, but most of the time we only have Small Data to analyze. This is where Statistical Modelling shines. This course covers everything you need to get started with the first principles of Statistical Modelling in a fun, easy-to-digest format. There are four functional roles in Data Science: Business Analyst, Data Analyst, Machine Learning Engineer and Data Engineer. The DA track targets the Data Analyst role. Although this course is on Data Analysis, you will also learn some Predictive Analytics, as it is sometimes difficult to separate the two.

Most people ignore Statistics, thinking it is irrelevant in the world of Deep Learning. Nothing could be further from the truth. Before you can learn to analyze Big Data, you need to learn to analyze Small Data. Before you can apply Deep Learning, learn to apply Shallow Learning using traditional, non-deep Statistical Modelling techniques. This is exactly the focus of this course: Shallow Learning and Small Data.

Why? Unlike Deep Learning models, which are data-hungry black boxes, these statistical models let you arrive at an educated guesstimate with very little data, and their results are explainable. For example, they are the foundation of many digital marketing techniques, such as A/B testing, which is built on Hypothesis Testing. They allow you to infer and predict properties of a population from a small sample, which is cheap and easy to collect. Sometimes it is very expensive, or outright impossible, to collect huge quantities of data. Lastly, there is an entire class of data problems that can only be solved using Computational Probability and Statistics, period.

Most importantly, this is where you learn to solve data problems from first principles and learn how to program with data. These techniques are as old as statistics itself and predate modern computers. The twist is that we make them easy to learn and apply by letting the computer do all the hard work! You learn to understand a problem, choose the right statistical tool to solve it, formulate a solution and then let the computer do the calculations. This discrete, computational approach does not depend on formulas or Calculus, which allows you to create tests for which no formulas exist. It is also more flexible, less prone to errors, and easier to understand and apply than Analytical Statistics (the kind we hated in school).
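To give a flavour of the computational approach described above, here is a minimal sketch of a permutation test for a simple A/B experiment. It is an illustration under stated assumptions, not material taken from the course: the function name `permutation_test` and the conversion data are hypothetical. Instead of looking up a formula, the computer repeatedly shuffles the group labels and counts how often chance alone produces a difference at least as large as the one observed.

```python
import random


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    Returns the observed difference (mean of B minus mean of A) and an
    estimated p-value: the fraction of random label shuffles whose absolute
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)

    pooled = list(group_a) + list(group_b)  # pretend the labels carry no information
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign visitors to "A" and "B" at random
        resampled_a = pooled[:n_a]
        resampled_b = pooled[n_a:]
        diff = sum(resampled_b) / len(resampled_b) - sum(resampled_a) / len(resampled_a)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples


# Hypothetical A/B data: 1 = visitor converted, 0 = visitor did not convert.
page_a = [1] * 12 + [0] * 88  # 12 conversions out of 100 visitors
page_b = [1] * 20 + [0] * 80  # 20 conversions out of 100 visitors

diff, p_value = permutation_test(page_a, page_b)
print(f"observed lift: {diff:.2f}, estimated p-value: {p_value:.3f}")
```

Because the p-value is estimated by random shuffling, it varies slightly with `seed` and `n_resamples`; fixing the seed makes a run reproducible, and raising `n_resamples` tightens the estimate.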

This is not your regular school-textbook Statistics course filled with theoretical mathematical symbols. It is a practical, hands-on, solution-based approach designed for real-life problems. The course is jam-packed with interactive classes, interesting articles, book references and exciting projects the likes of which you may never have seen!

Prerequisites

Curiosity
Basic arithmetic skills - Brackets, division, multiplication, addition, subtraction
Ability to operate a computer, keyboard and mouse
Ability to use a web browser to access and use the internet
Ability to install software on your computer
1DataScience.com Bootcamp

Hardware and Software Requirements

A physical computer (not a virtual machine) running Fedora 34 or later, PopOS/Ubuntu 20.04 or later, Windows 10 or later, or macOS 10 or later
16 GB RAM
Broadband internet connection faster than 5 Mbps
100 GB free hard disk space, SSD Drive recommended
A dedicated graphics card is recommended but not required, as cloud resources will be used

Learning Objective

Introduction To AI And ML (Shallow/Traditional Learning)

Introduction
Colab
Data Preprocessing
Regression
Classification
Anomaly Detection
Properties of Good Features
Bias In Models

Introduction to Applied Computational Statistics

Introduction to Frequentist Statistics
Visualizing Information : Categorical Variables - Visualization Techniques, Numerical Variables - Frequency Distribution Table
Measuring Central Tendencies
Measuring Variability And Spread
Introduction to Probabilities
Using Discrete Probability Distributions
Introduction to Permutations And Combinations
Exploratory Data Analysis (EDA)

Learning Outcome

Use Google’s Colab service to create models using small data
Perform preprocessing of data to account for invalid or missing data
Create simple and multiple linear regression models
Create models to classify data
Create models to detect anomalies in data
Understand the properties of good features
Learn to identify biases in models
Understand what frequency-based estimates are
How to effectively visualize numerical and categorical data
How to measure the central tendency of data using various measures, and when to use which one
Measuring variability and spread of data
Understand fundamentals of probabilities and odds
Applying discrete probability distributions to real-world datasets
Understand fundamentals of permutations and combinations
How to explore the data, which is the first step before analysis

Fineprint

The topics presented are tentative, and we reserve the right to add or remove topics to update or improve the bootcamp, or for technical or time reasons.
† taxes extra.