
Start your career in Supervised Traditional (Shallow) Machine Learning and Statistical Modelling with this Introduction to Data Analysis program.

Data Analysis : Introduction : DA101

  • DURATION

    4 Months

  • WEEKLY

    45 hours

  • FEE

    Contact us

About the Course

The problem is not Big Data. The problem is Small Data. We live in a world of data exhaust. Unrefined data is plentiful, but most of the time we have very little data to analyze. This is where Statistical Modelling shines. This course shows you everything you need to get started with the first principles of statistical modelling in a fun and easy-to-digest format. There are four functional roles in Data Science, namely, Business Analyst, Data Analyst, Machine Learning Engineer and Data Engineer. The DA track targets the Data Analyst role. Although this course is on Data Analysis, you will learn some Predictive Analytics too, as it is sometimes difficult to separate the two.

Most people ignore Statistics, thinking that it is irrelevant in the world of Deep Learning. Nothing could be further from the truth. Before you can learn to analyze Big Data, you need to learn to analyze small data. Before you can apply Deep Learning, learn to apply Shallow Learning using traditional non-deep and Statistical Modelling techniques. This is exactly the focus of this course: Shallow Learning and Small Data.

Reason? Unlike Deep Learning models, which are data-hungry black boxes, these statistical models allow you to arrive at an educated guesstimate with very little data, and the results are explainable. For example, they are the foundation of many digital marketing techniques such as A/B testing and Hypothesis Testing. They allow you to infer and predict properties of populations using small samples, which are cheap and easy to collect. Sometimes it is very expensive or outright impossible to collect huge quantities of data. Lastly, there is an entire class of Data problems that can only be solved using Computational Probability and Statistics, period.
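As a rough illustration of inferring a population property from a small sample, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. The sample values, the function name `bootstrap_ci` and its parameters are assumptions made up for this example; they are not taken from the course material.

```python
import random
from statistics import mean

def bootstrap_ci(sample, n_resamples=5_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of a small sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Resample with replacement many times and record the mean of each resample.
    means = sorted(
        mean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    # Cut off the lower and upper tails of the resampled means.
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Hypothetical small sample: ten order values from an online shop.
order_values = [23, 31, 18, 27, 35, 22, 29, 26, 40, 19]
low, high = bootstrap_ci(order_values)
print(f"sample mean: {mean(order_values)}")
print(f"approximate 95% interval for the population mean: {low:.1f} to {high:.1f}")
```

With only ten observations the interval is wide, which is the point: the method still gives an explainable estimate of the uncertainty.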

Most importantly, this is where you learn to solve Data problems from first principles and learn how to program with Data. These techniques are as old as statistics itself and predate modern computers. The twist is that we have made it easy for you to learn and apply these techniques by letting the computer do all the hard work! You learn to understand a problem, choose which Statistical tool to use to solve it, formulate a solution and then let the computer do the calculations. This discrete approach does not depend on formulas or Calculus, which allows you to create tests for which formulas do not exist. The discrete computational approach is more flexible, less prone to errors, and easier to understand and apply than Analytical Statistics (the one we hated in school).
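One common example of letting the computer do the calculations is a permutation test, sketched below for an A/B-style comparison. This is only an illustration under stated assumptions: the data are invented, and `permutation_test` is a hypothetical helper name, not a function from the course or from any library.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # If the groups do not differ, the labels are arbitrary, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Share of shuffles that look at least as extreme as the real data.
    return extreme / n_resamples

# Hypothetical measurements for two page variants.
variant_a = [12, 15, 11, 14, 13, 16]
variant_b = [18, 17, 20, 16, 19, 21]
p_value = permutation_test(variant_a, variant_b)
print(f"approximate p-value: {p_value:.4f}")
```

No formula is needed here. Swapping `mean` for another statistic, such as the median, gives a test for that statistic with the same few lines of code.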

This is not your regular school textbook Statistics course filled with theoretical mathematical symbols. This is a practical, hands-on, solution-based approach designed to be used on real-life problems. The course is jam-packed with interactive classes, interesting articles, book references and exciting projects the likes of which you may have never seen!

Prerequisites

  • Curiosity
  • Basic arithmetic skills - Brackets, division, multiplication, addition, subtraction
  • Ability to operate a computer, keyboard and mouse
  • Ability to use a web browser to access and use the internet
  • Ability to install software on your computer
  • 1DataScience.com Bootcamp

Hardware and Software Requirements

  • A working physical computer (not a virtual machine) running Fedora 34 or later, PopOS/Ubuntu 20.04 or later, Windows 10 or later, or macOS 10 or later
  • 16 GB RAM
  • Broadband internet connection faster than 5 Mbps
  • 100 GB free disk space, SSD recommended
  • A dedicated graphics card is recommended but not required. Cloud resources will be used

Learning Objective

Introduction To AI And ML (Shallow/Traditional Learning)
  • Introduction
  • Colab
  • Data Preprocessing
  • Regression
  • Classification
  • Anomaly Detection
  • Properties of Good Features
  • Bias In Models
Introduction to Applied Computational Statistics
  • Introduction to Frequentist Statistics
  • Visualizing Information : Categorical Variables - Visualization Techniques, Numerical Variables - Frequency Distribution Table
  • Measuring Central Tendencies
  • Measuring Variability And Spread
  • Introduction to Probabilities
  • Using Discrete Probability Distributions
  • Introduction to Permutations And Combinations
  • Exploratory Data Analysis (EDA)

Learning Outcome

  • Use Google’s Colab service to create models using small data
  • Perform preprocessing of data to account for invalid or missing data
  • Create simple and multiple linear regression models
  • Create models to classify data
  • Create models to detect anomalies in data
  • Understand the properties of good features
  • Learn to identify biases in models
  • Understand what frequency-based estimations are
  • Visualize numerical and categorical data effectively
  • Measure central tendencies of the data using various measures, and know when to use which one
  • Measure variability and spread of data
  • Understand fundamentals of probabilities and odds
  • Apply discrete probability distributions to real-world datasets
  • Understand fundamentals of permutations and combinations
  • Explore the data, which is the first step before analysis
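To give a flavour of a few of these outcomes, the sketch below compares two measures of central tendency on data with an outlier and evaluates a discrete (binomial) probability using combinations. The salary figures and the helper name `binomial_pmf` are assumptions made up for illustration; only Python's standard library is used.

```python
from math import comb
from statistics import mean, median, pstdev

# Hypothetical salaries in thousands; the last value is an outlier.
salaries = [30, 32, 35, 36, 38, 40, 250]

# The mean is pulled towards the outlier, while the median is not,
# so the median is often the better summary for skewed data.
print(f"mean:   {mean(salaries):.1f}")
print(f"median: {median(salaries)}")
print(f"population standard deviation: {pstdev(salaries):.1f}")

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    # comb(n, k) counts the combinations of k successes among n trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance of exactly 2 heads in 4 fair coin flips.
print(f"P(2 heads in 4 flips) = {binomial_pmf(2, 4, 0.5)}")
```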

Fine print

  • The topics presented are tentative, and we reserve the right to add or remove topics to update or improve the bootcamp, or for technical or time reasons.
  • † taxes extra.
Teacher
Manuj Chandra


Data Science

Related Courses

Business Intelligence and Information Design : Advance : BIID301
  • 3 Months
  • Business Intelligence

About the Course There are four functional roles in Data Science, namely, Business Analyst, Data …

Data Analysis : Intermediate : DA201
  • 4 Months
  • Data Analysis

About the Course The problem is not Big Data. The problem is Small Data.

Programming Effectively with Generative AI (Code Based)
  • 2 days
  • Data Analytics

Introduction The world of programming is rapidly evolving. Generative AI tools, like Github Copilot, …
