Complete Data Engineering With Azure - Basic To Advance (Self Paced)


  • Language: English
  • Mode: Recorded
  • Validity: 1 Year
  • Duration: 100+ Hours
  • Sessions: 35
  • Projects: 11


INR 6000
This course includes
  • 100+ hours of content
  • 11 industry-level, real-world projects
  • Learn while solving real-life examples in live lectures
  • BigData projects on Azure Cloud
  • Projects will be covered from scratch
  • Hands-on exercises & quizzes after each module
  • Career & interview preparation guidance session
  • Resume preparation session
  • LinkedIn profile building
  • Certificate of completion


Course Content

  • Class - 1
    • What is Database?
    • Difference between Transactional Databases and NoSQL databases
    • What is DBMS & RDBMS?
    • Transactions & ACID Properties
    • Setup MySQL Workbench
    • Setup MySQL Using Docker
    • DDL, DML, DQL, DCL
    • CREATE Command
    • INSERT Command
    • Integrity Constraints

  • Class - 2
    • Alter Command
    • Drop, Truncate and Delete
    • Primary Key vs Foreign Key
    • Referential Integrity
    • Select Query, In-Built Functions, Aliases
    • UPDATE Command
    • Auto Increment in create table
    • Limit
    • Order By Clause
    • Conditional Operators
    • Logical Operators
    • Like Operation
    • User Defined Functions (UDFs)

  • Class - 3
    • IS NULL, IS NOT NULL
    • Group By, Having Clause
    • Group Concat, Group RollUP
    • Sub Queries, IN and NOT IN
    • CASE-When
    • SQL Joins

  • Class - 4
    • Exists and Not Exists
    • Window Functions
    • Frame Clause
    • Coalesce Function
    • Common Table Expressions - Iterative and Recursive

  • Class - 1
    • BigData Fundamentals
    • 5 V's of BigData
    • Distributed Computation
    • Distributed Storage
    • Cluster, Commodity Hardware
    • File Formats
    • Types of Data
    • History of Hadoop
    • Hadoop Architecture & Components

  • Class - 2
    • Map-Reduce Architecture
    • YARN Architecture

  • Class - 1
    • Hive Complete Architecture
    • Hadoop Cluster Setup on GCP (Dataproc)

  • Class - 2
    • Data Types in Hive
    • Create Database
    • Create Table
    • Load Data From Local
    • Load Data From HDFS
    • Internal Table
    • External Table
    • Array & Map Data Types
    • SerDe in Hive
    • File Formats in Hive - ORC, Parquet, Avro

  • Class - 3
    • CSV SerDe
    • JSON SerDe
    • Parquet SerDe
    • ORC SerDe
    • Static Partitioning
    • Dynamic Partitioning
    • Bucketing
    • Map-Side Join, Bucket Map Join, Sorted Merge Join, Skew Join

  • Class - 1
    • Kafka Cluster Architecture
    • Brokers
    • Topics
    • Partitions
    • Producer-Consumer, Consumer Group
    • Offset Management
    • Replicas
    • Commits
    • Sync & Async Commits

  • Class - 2
    • Confluent Kafka Setup
    • Topic Creation
    • Schema Registry
    • Key, Value Message
    • Message in Kafka Topics based on Random and Constant Keys
    • Kafka Producer Code with Serialization
    • Kafka Consumer Code with Deserialization
    • Consumer Groups
    • Working with JSON, CSV Data
    • GCP Pub-Sub Setup
    • Producer & Consumer for GCP Pub-Sub Setup

  • Class - 1
    • CAP Theorem
    • What is MongoDB and MongoDB Atlas?
    • MongoDB vs Relational Database
    • MongoDB features
    • MongoDB use cases and applications
    • MongoDB architecture
    • Node
    • Data Centre
    • Cluster
    • Data replication
    • Write operation
    • Read operation
    • Indexing

  • Class - 2
    • MongoDB Atlas Setup
    • Understanding different ways to communicate with MongoDB
    • Query data from MongoDB collections
    • Queries on MongoDB using Python
    • Streaming data pipeline setup using Kafka Connect where Source is Kafka and MongoDB is destination

  • Class - 1
    • CAP Theorem
    • What is Apache Cassandra?
    • Cassandra Database vs Relational Database
    • Apache Cassandra features
    • Cassandra use cases and applications
    • Cassandra architecture
    • Node
    • Data Centre
    • Cluster
    • Commit log
    • Mem-table
    • SSTable
    • Data replication
    • Read operation

  • Class - 2
    • Data Partitioning and Token
    • VNodes in Cassandra
    • Read Operation in Cassandra
    • Compaction in Cassandra
    • Gossip Protocol in Cassandra
    • Write consistency in Cassandra
    • Read consistency in Cassandra
    • Partition Key, Clustering Key, Row Key Declaration
    • Cassandra Setup Using Docker
    • CQL in Cassandra
    • Cassandra Free Tier Setup On DataStax
    • Queries in Cassandra using Python

  • Class - 1
    • Problems with Hadoop Map-Reduce
    • What is Apache Spark?
    • Features of Spark
    • Spark ecosystem
    • RDD in Spark
    • Properties of RDD
    • How does Spark perform data partitioning?
    • Transformation in Spark
    • Narrow Transformation vs Wide Transformation
    • Action in Spark
    • Are Read & Write operations in Spark transformations or actions?
    • Lazy evaluation in Spark
    • Lineage graph or DAG in Spark
    • How does the DAG look on the Spark Web UI?
    • Job, Stage and Task in Spark
    • What if Spark cluster capacity is less than the size of data to be processed?
    • Spark in-depth architecture and its components
    • Spark with Standalone Cluster Manager Type
    • Spark with YARN Cluster Manager Type
    • Deployment modes of Spark Application
    • Internals of Spark Job over the cluster

  • Class - 2
    • Persist and Caching in Spark
    • Storage Levels in Persist
    • How does data skewness occur in Spark?
    • Techniques to deal with data skewness
    • Repartition vs Coalesce
    • Example of Key Salting technique
    • RDD vs Dataframe vs Dataset
    • How to use Spark-Submit utility?
    • Memory management in Spark
    • Memory components in Executor Container
    • Dynamic occupancy mechanism
    • How to process 1 TB of data in Spark?
    • Resource allocation case study - 1 : 6 Nodes, each node has 16 cores & 64 GB RAM
    • Resource allocation case study - 2 : 6 Nodes, each node has 32 cores & 64 GB RAM
    • Resource allocation case study - 3 : When more memory isn't required for the executors
    • Broadcast and Accumulators in Spark
    • Different type of failures in Spark and how to resolve them
    • Out Of Memory failures
    • Code and Resource level optimizations in Spark
    • Best practices to design Spark Applications

  • Class - 3
    • Spark Cluster Setup On GCP Dataproc
    • PySpark Core and Dataframe Operations
    • PySpark SQL
    • Read & Write from Hive tables using PySpark

  • Class - 4
    • Execution of Spark application using Spark-Submit Utility
    • Monitor, Debug & Understand the Spark DAG on Spark Web UI - Practical Example
    • What is Stream Processing?
    • Spark structured streaming
    • Spark streaming with word count example
    • Output modes in writeStream in Spark structured streaming
    • What if memory due to state management is full?
    • DStream vs Spark Structured Streaming
    • Spark structured streaming with File as source
    • Triggers in Spark structured streaming

  • Class - 5
    • Checkpointing
    • Exactly-once semantics in Spark structured streaming
    • Stateless and Stateful Processing
    • Global aggregation and Windowed aggregation
    • Windowing
    • Sliding window
    • Tumbling window vs Sliding window
    • When and why we should use windowing?
    • Windowed aggregations example
    • Arbitrary stateful transformations
    • Watermarking
    • Working example of handling delayed events using watermarking
    • Code implementation for Stateless Spark structured streaming with source as Confluent Kafka Topic
    • Code implementation for Stateful Spark structured streaming with source as Confluent Kafka Topic - Global aggregation and Windowed aggregation
    • Spark structured streaming pipeline implementation where Source is Confluent Kafka Topic and Destination is MongoDB

  • Class - 1
    • What is Databricks?
    • Databricks Architecture, Delta Lake & Delta Tables
    • Databricks Account Setup on GCP
    • Workspace setup
    • Compute in Databricks & Spark cluster setup
    • Unity catalog
    • Spark Job Execution on Databricks Cluster
    • Workflow Creation in Databricks
    • Project - Incremental Logistics Data Ingestion and perform merge operation in Delta tables

  • Class - 2
    • Project - 1: Real time healthcare data processing with DLT (Delta Live Tables) in Databricks
    • Tech Stack:
      • PySpark
      • Databricks
      • Delta Tables
      • Databricks DLT Workflow
    • Project - 2: Booking.com incremental SCD2 Merge ingestion
    • Tech Stack:
      • PySpark
      • Databricks
      • Delta Tables
      • Databricks DLT Workflow
      • PyDeequ

  • Class - 1
    • What is orchestration in BigData?
    • Need of dependency management in Data Pipeline design
    • What is Airflow?
    • Architecture & Different Components of Airflow
    • Operators in Airflow
    • How to write Airflow DAG Scripts?
    • Attribute description
    • How to execute parallel tasks?

  • Class - 2
    • Setup Airflow on GCP using Composer
    • Create and schedule Airflow DAG with sequential tasks using BashOperator and PythonOperator
    • Create and schedule Airflow DAG with parallel tasks using BashOperator and PythonOperator
    • Airflow Project - 1 : End-to-end Airflow DAG to create a GCP Dataproc cluster, run a PySpark job and destroy the cluster
    • Airflow Project - 2 : Airflow DAG to use user-defined variables and pass external config parameters

  • Class - 1
    • OLAP vs OLTP
    • What is a Data Warehouse?
    • Difference between Data Warehouse, Data Lake and Data Mart
    • Fact Tables
    • Dimension Tables
    • Slowly changing Dimensions
    • Types of SCDs
    • Star Schema Design
    • Snowflake Schema Design
    • Galaxy Schema Design

  • Class - 2
    • Uber Data Warehouse Design Case Study
    • AirBnB Data Warehouse Design Case Study

  • Class - 1
    • Snowflake free tier account setup
    • Snowflake UI walkthrough
    • Load data from UI and create Snowflake table
    • Event-driven data ingestion in Snowflake table using Snowpipe (Tech Stack Used : Google Storage Bucket, GCP Pub-Sub, Snowflake)
    • How to create and schedule a task in Snowflake

  • Class - 2
    • Project - 1: News Data Analysis with event driven incremental load in Snowflake table
    • Tech Stack:
      • Airflow
      • Google Cloud Storage
      • Python
      • Snowflake
    • Project - 2: Ecommerce CDC data real time aggregation in Snowflake Dynamic Table
    • Tech Stack:
      • Python
      • Snowflake Dynamic Table
    • Project - 3: Car rental data batch ingestion with SCD2 merge in Snowflake table
    • Tech Stack:
      • Python
      • PySpark
      • GCP Dataproc
      • Snowflake
      • Airflow

  • Class - 1
    • BigQuery Overview
    • BigQuery Architecture
    • Capacitor - Columnar format
    • Colossus - Storage
    • Dremel - Execution Engine
    • Borg - Compute
    • Jupiter - Network
    • Project - 1: IRCTC Streaming data ingestion into BigQuery
    • Tech Stack:
      • Python
      • GCP Storage
      • GCP Pub-Sub
      • BigQuery
      • Dataflow
    • Project - 2: Walmart data ingestion into BigQuery
    • Tech Stack:
      • Python
      • Airflow
      • GCP Storage
      • BigQuery

  • AZURE Services Covered
    • Azure Blob Storage, Azure Functions, Azure Role-Based Access Control (RBAC), Azure Virtual Machines (VMs), Azure Logic Apps, Azure Service Bus (Queue, Topics), Azure Key Vault
    • Azure Stream Analytics, Azure Event Hubs, Azure Cosmos DB, Azure Data Factory, Azure SQL Database, Azure Synapse, Azure Delta Lake, Azure DevOps

  • Class - 1
    • Setup Free Trial Azure Cloud Account
    • Setup VSCode with Azure Extensions
    • Azure Blob Storage (Storage Account, Container, Blob)
    • Azure CLI Setup to interact with Azure Services
    • Read data from Azure Blob using Python
    • Azure Function & Azure Function App
    • HTTP Trigger based Azure Function
    • Deployment of Azure Function in Azure Function app from VSCode

  • Class - 2
    • Implementation and Deployment of Azure Function with Blob Trigger
    • Azure Service Bus basics - Topic & Queue
    • Azure Service Bus Setup
    • Topic creation in Azure Service Bus
    • Write messages in Topic from Azure Console
    • Subscriber setup for Topic and Read (Peek and Receive Mode) messages from Azure Console
    • Implementation of an Azure Logic App workflow: an Azure Function listens to a Blob trigger and publishes blob details to a Topic; the Logic App reads the message content from the Topic and sends a customized email with an attachment
    • Implementation and deployment of Azure function app with Service Bus Topic trigger along with publishing mock data in Service Bus Topic using Python code
    • Azure Service Bus Queue basics
    • Create Queue from Azure Console
    • Read & Write messages in Queue from Azure Console
    • Implementation and deployment of Azure function app with Service Bus Queue trigger along with publishing mock data in Service Bus Queue using Python code

  • Class - 3
    • Azure Virtual Machine Setup
    • SSH based login from terminal in VM
    • Azure SQL Database fundamentals
    • Setup Azure SQL database from Azure Portal
    • SQL commands for Azure SQL database using Query Editor
    • Azure Key-Vault basics
    • How to create and store secrets in Azure Key-Vault
    • Access & Manipulate data in SQL database using Python script (Access secrets from Key-Vault)
    • Azure Event Hub Fundamentals
    • Difference between Azure Event Hub & Kafka
    • Setup Azure event hub namespace from Azure Portal
    • Create event hub in namespace
    • Publish data in event hub using Python script
    • Create consumer group in event hub
    • Consume data from event hub using Python script
    • Azure Stream Analytics job basics
    • Create Inputs, Outputs & Query in Stream Analytics Job
    • Setup Stream Analytics job to read data from event hub in real time, apply window based aggregations and ingest data in SQL database

  • Class - 4
    • Azure CosmosDB Fundamentals
    • Setup CosmosDB Account from Azure Portal
    • Create database and containers in CosmosDB
    • Connect & Manipulate data in CosmosDB using Python
    • Azure Data Factory Fundamentals
    • Create first Azure Data Factory Namespace
    • Connect ADF with Github
    • Setup Linked Services for ADLS & Snowflake in ADF
    • Create CSV file and Snowflake table datasets
    • Create DataFlow in ADF
    • Perform source, filter, aggregate & sink transformations in ADF
    • Debug Dataflow
    • Create Pipeline in ADF with DataFlow activity
    • Manual trigger for ADF pipeline
    • Azure Synapse Analytics fundamentals and Architecture
    • Setup first Azure Synapse Analytics workspace
    • Different features of Synapse Studio
    • Work with in-built and dedicated SQL pools
    • SQL Database under workspace
    • Read external files using in-built SQL pool queries
    • Create external tables using in-built sql pool
    • Work with SQL worksheet and create Schema, Internal & External tables in dedicated sql pool
    • Setup Spark SQL Pool
    • Create PySpark notebook in Synapse and write data in delta tables in Lake Database
    • Create linked services for ADLS and CosmosDB in Synapse
    • Data ingestion pipeline in Synapse Analytics

  • Project - 1: Real-time Healthcare Data Processing with DLT (Delta Live Tables) in Databricks (Covered In Module 8)
    • Tech Stack:
      • PySpark
      • Databricks
      • Delta Tables
      • Databricks DLT Workflow
  • Project - 2: Booking.com Incremental SCD2 Merge Ingestion (Covered In Module 8)
    • Tech Stack:
      • PySpark
      • Databricks
      • Delta Tables
      • Databricks DLT Workflow
      • PyDeequ
  • Project - 3: News Data Analysis with Event-Driven Incremental Load in Snowflake Table (Covered In Module 11)
    • Tech Stack:
      • Airflow
      • Google Cloud Storage
      • Python
      • Snowflake
  • Project - 4: E-commerce CDC Data Real-time Aggregation in Snowflake Dynamic Table (Covered In Module 11)
    • Tech Stack:
      • Python
      • Snowflake Dynamic Table
  • Project - 5: Car Rental Data Batch Ingestion with SCD2 Merge in Snowflake Table (Covered In Module 11)
    • Tech Stack:
      • Python
      • PySpark
      • GCP Dataproc
      • Snowflake
      • Airflow
  • Project - 6: IRCTC Streaming Data Ingestion into BigQuery (Covered In Module 12)
    • Tech Stack:
      • Python
      • GCP Storage
      • GCP Pub-Sub
      • BigQuery
      • Dataflow
  • Project - 7: Walmart Data Ingestion into BigQuery (Covered In Module 12)
    • Tech Stack:
      • Python
      • Airflow
      • GCP Storage
      • BigQuery
  • Project - 8: AirBnB CDC Ingestion Pipeline
    • Tech Stack:
      • AZURE
      • Python
      • ADLS
      • CosmosDB
      • Azure Data Factory
      • Azure Synapse
      • SQL
  • Project - 9: Fintech SQL Data migration into Azure Portal
    • Tech Stack:
      • Python
      • Azure SQL Database
      • SQL
      • Azure Synapse
      • ADLS
      • PySpark
      • Delta Tables
  • Project - 10: BookMyShow Online Ticket Booking Stream data processing
    • Tech Stack:
      • Python
      • EventHub
      • Azure Stream Analytics Job
      • SQL
      • Synapse DWH
  • Project - 11: Airlines Data Incremental data processing with CICD Process
    • Tech Stack:
      • ADLS
      • Azure Data Factory
      • Azure Synapse
      • Logic Apps
      • GitHub
      • Azure DevOps

Course Schedule

Mode Of The Course:
Recorded
Course Duration:
100+ Hours
Total Sessions:
35
Total Projects:
11
Class Recording Provided:
Yes
Course Validity:
1 year
Programming Language Used:
Python
Prerequisite:
⚠️ Important Notice:
The video may not work on Linux due to DRM restrictions. It is only accessible on Chrome when using Windows or macOS.

Workaround: To access the video on Linux, you can create a Windows virtual machine (VM) and watch the video through the VM. Alternatively, you can use our Android or iOS application to view the video on your mobile device.

Instructor

Shashank Mishra is a seasoned Data Engineer with over 6 years of industry experience who has worked at leading companies such as Amazon, PayTM, and McKinsey & Company.
