Grow Data Skills | Most Affordable Courses | Become Top Data Professional


Complete Data Engineering With AWS - Basic To Advance

By Shashank In Grow Data Skills

Self Paced
English
Only For Renewal

Validity: 1 Year (from the date of enrollment)
Duration: 160 Hours
Mode: Recorded
Projects: 15
Not For Sale

This course includes

Content Duration: 160 Hours
Total Video Sessions: 37
Total Industry Projects: 15
GCP & AWS Cloud
Quality Assignments & quizzes after each module
Interview preparation guide
Dedicated placement assistance
Resume & LinkedIn profile making
Doubt support on private Discord community
Certificate of completion

Tech stack you'll learn

Course Content

✅ Class - 1

What is Database?
Difference between Transactional Databases and NoSQL databases
What is DBMS & RDBMS?
Transactions & ACID Properties
Setup MySQL Workbench
Setup MySQL Using Docker
DDL, DML, DQL, DCL
CREATE Command
INSERT Command
Integrity Constraints

✅ Class - 2

Alter Command
Drop, Truncate and Delete
Primary Key vs Foreign Key
Referential Integrity
Select Query, In-Built Functions, Aliases
UPDATE Command
Auto Increment in create table
Limit
Order By Clause
Conditional Operators
Logical Operators
Like Operation
User Defined Functions (UDFs)

✅ Class - 3

IS NULL, IS NOT NULL
Group By, Having Clause
GROUP_CONCAT, GROUP BY WITH ROLLUP
Sub Queries, IN and NOT IN
CASE-When
SQL Joins

✅ Class - 4

Exists and Not Exists
Window Functions
Frame Clause
Coalesce Function
Common Table Expressions - Iterative and Recursive

✅ Class - 1

BigData Fundamentals
5 V’s of BigData
Distributed Computation
Distributed Storage
Cluster, Commodity Hardware
File Formats
Types of Data
History of Hadoop
Hadoop Architecture & Components

✅ Class - 2

Map-Reduce Architecture
YARN Architecture

✅ Class - 1

Hive Complete Architecture
Hadoop Cluster Setup on GCP (Dataproc)

✅ Class - 2

Data Types in Hive
Create Database
Create Table
Load Data From Local
Load Data From HDFS
Internal Table
External Table
Array & Map Data Types
SerDe in Hive
File Formats in Hive - ORC, Parquet, Avro

✅ Class - 3

CSV SerDe
JSON SerDe
Parquet SerDe
ORC SerDe
Static Partitioning
Dynamic Partitioning
Bucketing
Map-Side Join, Bucket Map Join, Sorted Merge Join, Skew Join

✅ Class - 1

Kafka Cluster Architecture
Brokers
Topics
Partitions
Producer-Consumer, Consumer Group
Offset Management
Replicas
Commits
Sync & Async Commits

✅ Class - 2

Confluent Kafka Setup
Topic Creation
Schema Registry
Key, Value Message
Message in Kafka Topics based on Random and Constant Keys
Kafka Producer Code with Serialization
Kafka Consumer Code with Deserialization
Consumer Groups
Working with JSON, CSV Data
GCP Pub-Sub Setup
Producer & Consumer for GCP Pub-Sub Setup

✅ Class - 1

CAP Theorem
What is MongoDB and MongoDB Atlas?
MongoDB vs Relational Database
MongoDB features
MongoDB use cases and applications
MongoDB architecture
Node
Data Centre
Cluster
Data replication
Write operation
Read operation
Indexing

✅ Class - 2

MongoDB Atlas Setup
MongoDB Cluster Creation
MongoDB Compass Setup
Database & Collection in MongoDB
Connect with MongoDB Cluster from MongoDB Compass
Import JSON data in MongoDB Collection
Queries on MongoDB Collection from Python Application
ksqlDB in Confluent Kafka
Streams in ksqlDB
Tables in ksqlDB
Persistent Queries in ksqlDB
JOIN queries on streams in ksqlDB
McDonald's Payments Stream data ingestion from Kafka to MongoDB
Setup Orders & Payments Streams using ksqlDB
Setup windowed JOIN streams using ksqlDB
Setup MongoDB Sink Connector

✅ Class - 1

CAP Theorem
What is Apache Cassandra?
Cassandra Database vs Relational Database
Apache Cassandra features
Cassandra use cases and applications
Cassandra architecture
Node
Data Centre
Cluster
Commit log
Mem-table
SSTable
Data replication
Read operation

✅ Class - 2

Data Partitioning and Token
VNodes in Cassandra
Read Operation in Cassandra
Compaction in Cassandra
Gossip Protocol in Cassandra
Write consistency in Cassandra
Read consistency in Cassandra
Partition Key, Clustering Key, Row Key Declaration
Cassandra Setup Using Docker
CQL in Cassandra
Cassandra Free Tier Setup On DataStax
Queries in Cassandra using Python

✅ Class - 1

Problems with Hadoop Map-Reduce
What is Apache Spark?
Features of Spark
Spark ecosystem
RDD in Spark
Properties of RDD
How does Spark perform data partitioning?
Transformation in Spark
Narrow Transformation vs Wide Transformation
Action in Spark
Are Read & Write operations in Spark transformations or actions?
Lazy evaluation in Spark
Lineage graph or DAG in Spark
How does the DAG look on the Spark Web UI?
Job, Stage and Task in Spark
What if Spark cluster capacity is less than the size of data to be processed?
Spark in-depth architecture and its components
Spark with Standalone Cluster Manager Type
Spark with YARN Cluster Manager Type
Deployment modes of Spark Application
Internals of Spark Job over the cluster

✅ Class - 2

Persist and Caching in Spark
Storage Levels in Persist
How does data skewness occur in Spark?
Techniques to deal with data skewness
Repartition vs Coalesce
Example of Key Salting technique
RDD vs Dataframe vs Dataset
How to use Spark-Submit utility?
Memory management in Spark
Memory components in Executor Container
Dynamic occupancy mechanism
How to process 1 TB of data in Spark?
Resource allocation case study - 1: 6 nodes, each with 16 cores & 64 GB RAM
Resource allocation case study - 2: 6 nodes, each with 32 cores & 64 GB RAM
Resource allocation case study - 3: When more memory isn't required for the executors
Broadcast and Accumulators in Spark
Different types of failures in Spark and how to resolve them
Out Of Memory failures
Code and Resource level optimizations in Spark
Best practices to design Spark Applications

✅ Class - 3

Spark Cluster Setup On GCP Dataproc
Spark Session creation
Create dataframe with custom schema
Read csv data from HDFS
Partitions and Partition size in Spark job
Select operation
withColumn operation
withColumnRenamed operation
Filter operation
Drop column operation
Drop Duplicates operation
Order By operation
Group By operation
Accumulator
Case-When operation
Window functions
Join & Broadcast join operation
Spark SQL, Register dataframe as table
Write CSV data in HDFS without partition key
Write Parquet data in HDFS
Write CSV data in HDFS with partition key
Write CSV data in HDFS with Coalesce
Read & Write from Hive tables using PySpark

✅ Class - 4

Execution of Spark application using Spark-Submit Utility
Monitor, Debug & Understand Spark DAG on Spark Web UI - Practical Example
What is Stream Processing?
Spark structured streaming
Spark streaming with word count example
Output modes in writeStream in Spark structured streaming
What happens when state management fills up memory?
DStream vs Spark Structured Streaming
Spark structured streaming with File as source
Triggers in Spark structured streaming

✅ Class - 5

Checkpointing
Exactly-once processing in Spark Structured Streaming
Stateless and Stateful Processing
Global aggregation and Windowed aggregation
Windowing
Sliding window
Tumbling window vs Sliding window
When and why should we use windowing?
Windowed aggregations example
Arbitrary stateful transformations
Watermarking
Working example of handling delayed events using watermarking
Code implementation for Stateless Spark Structured Streaming with source as Confluent Kafka Topic
Code implementation for Stateful Spark Structured Streaming with source as Confluent Kafka Topic - Global aggregation and Windowed aggregation
Spark Structured Streaming pipeline implementation where Source is Confluent Kafka Topic and Destination is MongoDB

✅ Class - 1

What is Databricks?
Unity Catalog
Delta Lake & Delta Tables
Databricks Account Setup on GCP
Workspace Setup
Metastore Setup
Managed & External Catalog Setup
Volumes In Databricks
Databricks Cluster Setup
PySpark Notebook Setup
Read/Write from Databricks Volume in PySpark Notebook
Create Delta table and Write data in PySpark using DeltaTable Python API
Write partitioned data in Delta table
Read from Delta table in PySpark
Time travel in Delta table (Read from specific version or timestamp)

✅ Class - 2

Project 1 - Order Tracking Event Driven Data Ingestion (Industrial Project)

Tech Stack: Google Storage, PySpark, Databricks, Delta Lake, Databricks Workflows, GitHub

Project 2 - UPI Transactions Real Time CDC Feed Processing (Industrial Project)

Tech Stack: Databricks, Spark Structured Streaming, Delta Lake

Project 3 - Travel Bookings Data Ingestion Pipeline With SCD2 Merge (Industrial Project)

Tech Stack: Databricks, PySpark, Google Storage, Delta Lake, Databricks Workflows, PyDeequ

What is DLT (Delta Live Tables) in Databricks?
How to create materialized views & streaming tables with DLT pipeline?
How to set up DLT pipeline job?
Validation & Execution of DLT pipeline with lineage
Checkpointing in DLT pipeline

Project 4 - Healthcare Delta Live Table Pipeline with Medallion Architecture (Industrial Project)

Tech Stack: Databricks, PySpark, Delta Lake, Delta Live Table Job

✅ Class - 1

What is orchestration in BigData?
Need for dependency management in Data Pipeline design
What is Airflow?
Architecture & Different Components of Airflow
Operators in Airflow
How to write Airflow DAG Scripts?
Attribute description
How to execute parallel tasks?

✅ Class - 2

Setup Airflow on GCP using Composer
Create and schedule Airflow DAG with sequential tasks using BashOperator and PythonOperator
Create and schedule Airflow DAG with parallel tasks using BashOperator and PythonOperator
Airflow Exercise - 1: End-To-End Airflow DAG to Create Dataproc Cluster, Run PySpark Job on cluster and Delete GCP Dataproc cluster
Airflow Exercise - 2: Airflow DAG to support data backfilling via parameterized date inputs and use of Variables in Airflow
Project 1 - Flight Booking Data Pipeline with Airflow & CICD (Industrial Project)

Tech Stack - GitHub, GitHub Actions, Google Storage, PySpark, Dataproc Serverless, Airflow, BigQuery

✅ Class - 1

OLAP vs OLTP
What is a Data Warehouse?
Difference between Data Warehouse, Data Lake and Data Mart
Fact Tables
Dimension Tables
Slowly changing Dimensions
Types of SCDs
Star Schema Design
Snowflake Schema Design
Galaxy Schema Design

✅ Class - 2

Case Study - 1: Uber Data Warehouse Design Case Study
Case Study - 2: AirBnB Data Warehouse Design Case Study

✅ Class - 1

Snowflake free tier account setup
Snowflake UI walkthrough
Load data from UI and create Snowflake table
Hands On - Event driven data ingestion in Snowflake table using Snowpipe

Tech Stack Used: Google Storage Bucket, GCP Pub-Sub, Snowflake

How to create and schedule tasks in Snowflake

✅ Class - 2

Project - 1: News Data Analysis with event driven incremental load in Snowflake table (Industrial Project)

Tech Stack: Airflow, Google Cloud Storage, Python, Snowflake

Project - 2: Movie Booking CDC data real time aggregation in Snowflake Dynamic Table (Industrial Project)

Tech Stack: Python, Snowflake Dynamic Table, Snowflake Stream, Snowflake Tasks, Streamlit

Project - 3: Car rental data batch ingestion with SCD2 merge in Snowflake table (Industrial Project)

Tech Stack: Python, PySpark, GCP Dataproc, Airflow, Snowflake

✅ Class - 3 (Recorded)

BigQuery Overview
BigQuery Architecture
Capacitor — Columnar format
Colossus — Storage
Dremel — Execution Engine
Borg — Compute
Jupiter — Network
Project - 1: IRCTC Streaming data ingestion into BigQuery (Industrial Project)

Tech Stack: Python, GCP Storage, GCP Pub-Sub, BigQuery, Dataflow

Project - 2: Walmart data ingestion into BigQuery (Industrial Project)

Tech Stack: Python, GCP Storage, Airflow, BigQuery

AWS Services Covered

EventBridge Scheduler, EventBridge Pipes, Kinesis, Kinesis Firehose, DynamoDB, SNS, SQS

S3, Lambda, IAM, CloudWatch, EC2

Step Functions, EMR, Glue, RDS, Athena, Redshift

✅ Class - 1

AWS Free Tier Account Setup
AWS Console Walkthrough
S3 Bucket Creation
AWS CLI Setup
IAM User Setup
Access S3 Buckets using AWS CLI
S3 Bucket ARN
AWS Lambda Basics
Create Hello World Lambda function with Python
Execution and Testing of Lambda Function
Trigger Lambda Function with S3 Create Object Notification
Deployment of Lambda Functions with other dependencies
How to create and use Layers in Lambda

✅ Class - 2

Read data from S3 file in Lambda Function with event driven notification & boto3 library
AWS SNS Basics
Create topics in SNS
Setup Email subscription of SNS topic
S3 Create object notification to SNS topic
Publish custom messages in SNS topic from Lambda function
AWS SQS Basics
SQS vs Kafka
Create SQS in AWS
Send and Receive messages in SQS
Read stream of messages in Lambda Function from SQS
AWS EventBridge & EventBridge Pipes
Scheduled trigger of Lambda function using EventBridge
EventBridge pipe to read stream of data from SQS and send to Lambda function with intermediate filters

✅ Class - 3

Create EC2 instance in AWS
SSH into EC2 instance from terminal
AWS RDS
Setup MySQL database with AWS RDS
Login & Access MySQL Database from terminal
Connect and manipulate data in MySQL database using Python
AWS Athena Basics
Athena vs Spark
Create & Query Athena Tables
Setup Datasources in AWS Glue Catalog
Table metadata preparation with AWS Glue Crawler
Run Athena queries from Lambda Function

✅ Class - 4

Crawl partitioned data in S3 with Glue Crawler
Read partitioned data from S3 in Athena
AWS Redshift fundamentals & architecture
Setup Redshift cluster
Table operations on sample data in Redshift
Load data from S3 into Redshift table
Unload query command in Redshift
Unload data from Redshift into S3 with Manifest file
Create external table in Redshift
Materialized views in Redshift
AWS Glue fundamentals & components
AWS Glue Catalog & Glue Crawler
Setup Redshift connector in Glue
Data pipeline using AWS Glue Visualizer with S3 as Source and Redshift as Destination
AWS Glue job execution and insights

Project - 1: Order Tracking Event Driven Data Ingestion (Covered In Module 8)

Tech Stack: Google Storage, PySpark, Databricks, Delta Lake, Databricks Workflows, GitHub

Project - 2: UPI Transactions Real Time CDC Feed Processing (Covered In Module 8)

Tech Stack: Databricks, Spark Structured Streaming, Delta Lake

Project - 3: Travel Bookings Data Ingestion Pipeline With SCD2 Merge (Covered In Module 8)

Tech Stack: Databricks, PySpark, Google Storage, Delta Lake, Databricks Workflows, PyDeequ

Project - 4: Healthcare Delta Live Table Pipeline with Medallion Architecture (Covered In Module 8)

Tech Stack: Databricks, PySpark, Delta Lake, Delta Live Table Job

Project - 5: Flight Booking Data Pipeline with Airflow & CICD (Covered In Module 9)

Tech Stack: GitHub, GitHub Actions, Google Storage, PySpark, Dataproc Serverless, Airflow, BigQuery

Project - 6: News Data Analysis with Event-Driven Incremental Load in Snowflake Table (Covered In Module 11)

Tech Stack: Airflow, Google Cloud Storage, Python, Snowflake

Project - 7: Movie Booking CDC data real time aggregation in Snowflake Dynamic Table (Covered In Module 11)

Tech Stack: Python, Snowflake Dynamic Table, Snowflake Stream, Snowflake Tasks, Streamlit

Project - 8: Car Rental Data Batch Ingestion with SCD2 Merge in Snowflake Table (Covered In Module 11)

Tech Stack: Python, PySpark, GCP Dataproc, Airflow, Snowflake

Project - 9: IRCTC Streaming Data Ingestion into BigQuery (Covered In Module 12)

Tech Stack: Python, GCP Storage, GCP Pub-Sub, BigQuery, Dataflow

Project - 10: Walmart Data Ingestion into BigQuery (Covered In Module 12)

Tech Stack: Python, Airflow, GCP Storage, BigQuery

Project - 11: Quality Movie Data Analysis

Tech Stack: S3, Glue Crawler, Glue Catalog, Glue Catalog Data Quality, Glue Low Code ETL (With PySpark), Redshift, EventBridge, SNS

Project - 12: Gadget Sales Data Projection

Tech Stack: Python, DynamoDB, DynamoDB Streams, Kinesis Streams, EventBridge Pipes, Kinesis Firehose, S3, Lambda, Athena

Project - 13: Airline Data Ingestion

Tech Stack: S3, S3 CloudTrail Notification, EventBridge Pattern Rule, Glue Crawler, Glue Visual ETL (With PySpark), SNS, Redshift, Step Functions

Project - 14: Logistics Data Warehouse Management

Tech Stack: GCP Storage, Airflow (GCP Composer), Hive Operators, PySpark With GCP Dataproc, Hive

Project - 15: Sales Order & Payment Data Real Time Ingestion

Tech Stack: GCP Pub-Sub, Python, Docker, Cassandra

Attention-Grabbing Resume Preparation and Interview Strategies
Strategies To Crack Tech Interviews
LinkedIn Profile Making
How To Expand Your Professional Network On LinkedIn
How To Use Various Job Portals
How To Approach For Referrals

Course Schedule

Mode of the Course:

Recorded

Course Duration:

160 Hours

Total Sessions:

37

Total Projects:

15

Validity:

1 Year (From the date of enrollment)

Programming Language Used:

Python

Prerequisite:

Python

⚠️ Important Notice :

Course videos may not play on Linux because of DRM restrictions. They are accessible only in Chrome on Windows or macOS.

Workaround: To watch on Linux, create a Windows virtual machine (VM) and play the videos inside it. Alternatively, use our Android or iOS application to watch on your mobile device.

Instructor

Shashank Mishra

Shashank Mishra is a seasoned Data Engineer with over 7 years of experience at top companies like Expedia, Amazon, PayTM, and McKinsey & Company. He specializes in Big Data, Cloud, and architecting scalable data pipelines across industries. A proud MCA graduate from NIT Allahabad, Shashank is passionate about sharing his expertise. Through his YouTube channel, E-Learning Bridge (177k+ subscribers), and LinkedIn (175k+ followers), he has mentored over 14,000 aspiring data professionals, helping them launch successful careers in Data Engineering.

Copyright © 2026 Regex Data Learning Pvt Ltd. All Rights Reserved.