If you want to start your career in Big Data and Data Engineering, this is the best place to start.

Data Engineering : Introduction : DE101

  • DURATION

    4 Months

  • WEEKLY

    45 hours

  • FEE

    Contact us

About the Course

There are four functional roles in Data Science, namely, Business Analyst, Data Analyst, Machine Learning Engineer and Data Engineer. The DE track targets the Data Engineer role. The Data Engineer collects, transforms, moves, secures and stores Data to make Business Analysis, Data Analysis and Machine Learning possible.

This course will demystify Regular Expressions and enable you to apply them to real-world problems. Regular Expressions are everywhere, and once you learn to harness their power you will be able to use them to extract specific patterns from your data.
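As a small taste of what extracting patterns looks like, here is a minimal sketch using Python's built-in `re` module. The log lines, the pattern and the `parse_line` helper are all invented for this illustration and are not taken from the course material.

```python
import re

# Hypothetical log lines, made up for this example.
LOG_LINES = [
    "2023-04-01 ERROR disk full on node-7",
    "2023-04-02 INFO backup finished",
    "malformed line without a date",
]

# Named groups label each part of the line we want to pull out.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.+)$"
)


def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


parsed = [parse_line(line) for line in LOG_LINES]
for record in parsed:
    print(record)
```

Lines that do not fit the pattern come back as `None`, which makes it easy to separate clean records from ones that need attention.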

This course is also the beginning of your cloud journey. We have decided to use Google Cloud Platform (GCP) because it is scenario-focused and easy to use, yet powerful. Google pioneered many watershed Data Science technologies, such as MapReduce (which inspired Hadoop), TensorFlow and Kubernetes, and has been named a leader in AI by Gartner. Google has an unparalleled network, and many of its data centers run on renewable energy. This is good for the environment, since data centers consume a great deal of energy. Another great thing about Google is that many (but not all) of its technologies are open source, which reduces vendor lock-in. Google also gives USD 300 in free credit to learn and practice. Additional credit is added when you start learning Machine Learning on the cloud later. And no, we are not getting paid by Google to say that 🤑

You will start your journey into data engineering by learning BigQuery, Google’s powerful yet easy-to-use Data Warehouse. In many cases, BigQuery can stand in for Hadoop. The value proposition of BigQuery is that it is serverless (you don’t have to worry about the infrastructure) and allows you to use almost standard SQL on Big Data!

Prerequisites

  • Curiosity
  • Patience
  • Basic arithmetic skills - Brackets, division, multiplication, addition, subtraction
  • Ability to operate a computer, keyboard and mouse
  • Ability to use a web browser to access and use the internet
  • Ability to install software on your computer
  • SQL (You can look into BIID programs to learn SQL)

Hardware and Software Requirements

  • A physical, working computer (not a virtual machine) running Fedora 34 or greater, Pop!_OS/Ubuntu 20.04 or greater, Windows 10 or greater, or macOS 10 or greater
  • 16 GB RAM
  • Broadband internet connection faster than 5 Mbps
  • 100 GB free hard disk space. SSD Drive recommended
  • A dedicated graphics card is recommended but not required, since the cloud will be used
  • Access to a credit card for a Google Cloud Platform account with billing enabled and the free $300 credit

Learning Objective

Regular Expressions (Regex)
  • Character Classes
  • The Backslash Plague
  • Alternation
  • Quantifiers
  • Greedy and Non-Greedy Quantifiers
  • Boundary Matchers
  • Splitting
  • Substitution
  • Compilation Flags
  • Grouping
  • Backreferencing
  • Named Groups
  • Non-Capturing Groups
  • Look ahead
  • Look behind
Google Cloud Platform (GCP) Fundamentals
  • GCP Introduction
  • Setting Up GCP
  • Google Cloud Compute Engine
  • Google Cloud Storage
  • IAM
  • Billing
Data Warehousing with Google Cloud Platform (GCP) using BigQuery
  • Introduction To Data Engineering
  • Transactional And Analytical Processing
  • Introducing BigQuery
  • Choosing BigQuery
  • BigQuery Pricing
  • Enabling APIs
  • Exploring UI
  • Public Datasets
  • Executing Queries
  • Working With The bq Command On The Terminal
  • Introduction to Datasets, Tables And Views
  • Creating And Editing Access To A Dataset
  • Creating Tables and Querying Tables Metadata
  • Creating Tables From Other Source Tables
  • Creating Logical Views
  • Changing data residency
  • Uploading Data To Buckets
  • Importing Data from CSV Files On Cloud Storage
  • Importing JSON Files
  • Querying a non-native file in a bucket
  • Introduction to Partitioning In BigQuery
  • Creating And Querying Ingestion Time Partitioned Tables
  • Creating Column Based Partitioned Tables
  • Normalized Storage In A Traditional Database
  • Denormalized Storage, Nested, And Repeated Fields
  • Unnest, Array Agg, And The Struct Operators
  • Subquery and CTE design using WITH
  • Working With Nested Fields Structs
  • Populating Data Into A Table With Nested Fields Using Struct
  • Working With Repeated Fields
  • Populating Tables With Repeated Fields Using Array Agg
  • Using Nested And Repeated Fields Together
  • Using Unnest To Query Repeated Fields
  • Aggregations
  • Subqueries
  • Windowing Operations
  • Performing Window Operations
  • Integrating BigQuery With Data Studio
  • Connecting To Datalab
  • Running Queries Programmatically
  • JOINS
  • Optimization Pro Tips to reduce cost
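
Two of the regex topics listed above, greedy versus non-greedy quantifiers and lookahead, can be sketched with Python's built-in `re` module. The sample strings are invented for illustration, and the course itself may use different examples or tools.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so one match runs to the last '>'.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r"<.*?>", html)

# Lookahead: match a number only when " GB" follows, without consuming the unit.
sizes = re.findall(r"\d+(?= GB)", "16 GB RAM, 100 GB disk, 5 Mbps link")

print(greedy)
print(lazy)
print(sizes)
```

The greedy pattern returns the whole string as a single match, while the non-greedy one returns each tag separately. The lookahead keeps only the numbers that are followed by the GB unit.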

Learning Outcome

  • Learn Regular Expressions to extract data from various sources
  • Understand what the cloud is and what role it plays in Data Science
  • Understand what Big Data is and what strategies are used to deal with it
  • Learn to set up a Google Cloud Platform (GCP) account
  • Learn to work with virtual machines on the cloud
  • Understand the concept of block and blob storage and its role in Data Lakes
  • Learn to secure your cloud account using Identity and Access Management (IAM)
  • Learn to identify where your money is being spent on the cloud
  • Understand what a Data Lake, a Data Mart and a Data Warehouse are
  • Understand the difference between transactional processing and analytical processing
  • Learn to professionally work with BigQuery, which is Google’s data warehouse

Fineprint

  • The topics presented are tentative, and we reserve the right to add or remove a topic to update or improve the bootcamp, or for technical or time reasons.
  • † taxes extra.
teacher
Manuj Chandra

Data Science
