About the Course
There are four functional roles in Data Science: Business Analyst, Data Analyst, Machine Learning Engineer, and Data Engineer. The DE track targets the Data Engineer role. The Data Engineer collects, transforms, moves, secures, and stores data to make business analysis, data analysis, and machine learning possible.
This is part two of your Data Engineering journey. You will be introduced to the ETL and ELT paradigms. You will learn how to set up a data lake and a data warehouse, and how to use industry-standard tools to create hands-off, automated data-processing pipelines! Not only that, you will also learn code-free tools for processing Big Data to make your life as a Data Engineer easier.
This course also covers various SQL, NoSQL, and real-time databases to round out your understanding of all the major Big Data technologies. So if you are serious about a career in Data Engineering, this course could prove to be an invaluable asset.
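To give a flavour of the ETL and ELT paradigms mentioned above, here is a minimal, hypothetical sketch in Python. SQLite stands in for a real warehouse such as BigQuery, and the table and column names are purely illustrative, not taken from the course material. The only difference between the two functions is *where* the transformation happens: in application code before loading (ETL), or inside the warehouse with SQL after loading (ELT).

```python
import sqlite3

# Extract: raw rows from a source (an in-memory list standing in for an
# OLTP database or an API feed).
raw_orders = [
    {"order_id": 1, "amount": "19.99", "country": "in"},
    {"order_id": 2, "amount": "5.00", "country": "us"},
]

def etl(rows, conn):
    """ETL: transform rows in application code *before* loading them."""
    transformed = [(r["order_id"], float(r["amount"]), r["country"].upper())
                   for r in rows]
    conn.execute("CREATE TABLE orders (order_id INT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)

def elt(rows, conn):
    """ELT: load raw rows first, then transform inside the warehouse with SQL."""
    conn.execute("CREATE TABLE raw_orders (order_id INT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                     [(r["order_id"], r["amount"], r["country"]) for r in rows])
    conn.execute("""CREATE TABLE orders AS
                    SELECT order_id,
                           CAST(amount AS REAL) AS amount,
                           UPPER(country) AS country
                    FROM raw_orders""")

conn = sqlite3.connect(":memory:")
etl(raw_orders, conn)
print(conn.execute("SELECT order_id, amount, country FROM orders").fetchall())
```

ELT is generally preferred when the warehouse (e.g. BigQuery) is powerful enough to do the heavy transformation itself, since the raw data remains available for reprocessing.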
Prerequisites
- Curiosity
- Patience
- Basic arithmetic skills: brackets, division, multiplication, addition, subtraction
- Ability to operate a computer, keyboard and mouse
- Ability to use a web browser to access and use the internet
- Ability to install software on your computer
- Data Engineering: Introduction (DE101)
Hardware and Software Requirements
- A physical computer (not a virtual machine) running Fedora 34 or greater, Pop!_OS/Ubuntu 20.04 or greater, Windows 10 or greater, or macOS 10 or greater
- 16 GB RAM
- Broadband internet connection faster than 5 Mbps
- 100 GB free hard disk space (SSD recommended)
- A dedicated graphics card is recommended but not required, since heavy workloads will run in the cloud
- Access to a credit card to create a Google Cloud account with billing enabled and $300 in free credits
Learning Objectives
- Dataflow (Apache Beam)
- Cloud Composer (Apache Airflow)
- Preparing Data in BigQuery and Dataprep
- Cloud SQL
- Cloud Spanner
- Firestore for Real-Time Databases
- Big Data Solutions Using Bigtable
- Advanced BigQuery
- Fusion (CDAP)
- Data Warehouse (DWH) and Data Lake
- Case Studies
- DWH: from OLTP to OLAP
Dashboards for Big Data
- Visualization with Data Studio
- Google Charts
Learning Outcomes
- Understand what ETL is
- Learn to use Managed Apache Beam (Dataflow) for bounded datasets (Batch)
- Use Managed Apache Airflow (Composer) to create an end-to-end Data Warehouse pipeline that extracts data from an OLTP database and performs ELT to load it into a Data Warehouse
- Learn to use visual no-code tools such as Fusion (CDAP) and Dataprep to preprocess Data
- Understand which type of database to use under which circumstances
- Learn to use managed SQL databases (Cloud SQL)
- Learn to use Cloud Spanner, Google’s proprietary, horizontally scalable OLTP database
- Learn to use real-time databases
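The OLTP-to-OLAP outcome above can be previewed with a small sketch: normalized OLTP tables (built for single-row transactions) are aggregated into a denormalized OLAP summary table (built for analytical queries) — the kind of step a Composer-orchestrated pipeline would run on a schedule. SQLite again stands in for Cloud SQL and the warehouse, and every table and column name here is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP side: normalized tables optimized for fast single-row transactions.
conn.executescript("""
    CREATE TABLE customers (customer_id INT PRIMARY KEY, region TEXT);
    CREATE TABLE sales (sale_id INT PRIMARY KEY, customer_id INT, amount REAL);
    INSERT INTO customers VALUES (1, 'APAC'), (2, 'EMEA');
    INSERT INTO sales VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

# OLAP side: a denormalized summary table rebuilt periodically by the
# pipeline, so analytical dashboards never touch the OLTP tables.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT c.region, COUNT(*) AS num_sales, SUM(s.amount) AS revenue
    FROM sales s JOIN customers c USING (customer_id)
    GROUP BY c.region
""")

for row in conn.execute("SELECT * FROM sales_by_region ORDER BY region"):
    print(row)
```

In the course context, the JOIN-and-GROUP-BY step would typically run as SQL inside the warehouse (an ELT transform), with the orchestrator only scheduling and monitoring it.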
Fineprint
- The topics presented are tentative; we reserve the right to add or remove topics to update or improve the bootcamp, or for technical or time reasons.
- † 18% Indian taxes extra.