Complete Data Engineering With AWS - Basic To Advance (Self Paced)


  • Language: English
  • Mode: Recorded
  • Validity: 1 Year
  • Duration: 100+ Hours
  • Sessions: 37
  • Projects: 12


INR 5800
This course includes
  • 37 Video Sessions
  • Content Duration - 100+ Hours
  • 12 Industry Projects
  • BigData Projects on AWS Cloud
  • Hands-on exercises & quizzes after each module
  • Career & Interview Preparation guidance session
  • Resume Preparation Session
  • LinkedIn Profile Making
  • Discord Community Access
  • Certificate of completion

Course Content

  • Class - 1
    • What is Database?
    • Difference between Transactional Databases and NoSQL databases
    • What is DBMS & RDBMS?
    • Transactions & ACID Properties
    • Setup MySQL Workbench
    • Setup MySQL Using Docker
    • DDL, DML, DQL, DCL
    • CREATE Command
    • INSERT Command
    • Integrity Constraints

  • Class - 2
    • Alter Command
    • Drop, Truncate and Delete
    • Primary Key vs Foreign Key
    • Referential Integrity
    • Select Query, Built-in Functions, Aliases
    • UPDATE Command
    • Auto Increment in create table
    • Limit
    • Order By Clause
    • Conditional Operators
    • Logical Operators
    • LIKE Operator
    • User Defined Functions (UDFs)

  • Class - 3
    • IS NULL, IS NOT NULL
    • Group By, Having Clause
    • GROUP_CONCAT, GROUP BY WITH ROLLUP
    • Sub Queries, IN and NOT IN
    • CASE WHEN
    • SQL Joins

  • Class - 4
    • Exists and Not Exists
    • Window Functions
    • Frame Clause
    • Coalesce Function
    • Common Table Expressions - Iterative and Recursive

  • Class - 1
    • BigData Fundamentals
    • 5 V's of BigData
    • Distributed Computation
    • Distributed Storage
    • Cluster, Commodity Hardware
    • File Formats
    • Types of Data
    • History of Hadoop
    • Hadoop Architecture & Components

  • Class - 2
    • Map-Reduce Architecture
    • YARN Architecture

  • Class - 1
    • Hive Complete Architecture
    • Hadoop Cluster Setup on GCP (Dataproc)

  • Class - 2
    • Data Types in Hive
    • Create Database
    • Create Table
    • Load Data From Local
    • Load Data From HDFS
    • Internal Table
    • External Table
    • Array & Map Data Types
    • SerDe in Hive
    • File Formats in Hive - ORC, Parquet, Avro

  • Class - 3
    • CSV SerDe
    • JSON SerDe
    • Parquet SerDe
    • ORC SerDe
    • Static Partitioning
    • Dynamic Partitioning
    • Bucketing
    • Map-Side Join, Bucket Map Join, Sorted Merge Join, Skew Join

  • Class - 1
    • Kafka Cluster Architecture
    • Brokers
    • Topics
    • Partitions
    • Producer-Consumer, Consumer Group
    • Offset Management
    • Replicas
    • Commits
    • Sync & Async Commits

  • Class - 2
    • Confluent Kafka Setup
    • Topic Creation
    • Schema Registry
    • Key, Value Message
    • Message in Kafka Topics based on Random and Constant Keys
    • Kafka Producer Code with Serialization
    • Kafka Consumer Code with Deserialization
    • Consumer Groups
    • Working with JSON, CSV Data
    • GCP Pub-Sub Setup
    • Producer & Consumer for GCP Pub-Sub Setup

  • Class - 1
    • CAP Theorem
    • What is MongoDB and MongoDB Atlas?
    • MongoDB vs Relational Database
    • MongoDB features
    • MongoDB use cases and applications
    • MongoDB architecture
    • Node
    • Data Centre
    • Cluster
    • Data replication
    • Write operation
    • Read operation
    • Indexing

  • Class - 2
    • MongoDB Atlas Setup
    • Understanding different ways to communicate with MongoDB
    • Query data from MongoDB collections
    • Queries on MongoDB using Python
    • Streaming data pipeline setup using Kafka Connect with Kafka as source and MongoDB as destination

  • Class - 1
    • CAP Theorem
    • What is Apache Cassandra?
    • Cassandra Database vs Relational Database
    • Apache Cassandra features
    • Cassandra use cases and applications
    • Cassandra architecture
    • Node
    • Data Centre
    • Cluster
    • Commit log
    • Mem-table
    • SSTable
    • Data replication
    • Read operation

  • Class - 2
    • Data Partitioning and Token
    • VNodes in Cassandra
    • Read Operation in Cassandra
    • Compaction in Cassandra
    • Gossip Protocol in Cassandra
    • Write consistency in Cassandra
    • Read consistency in Cassandra
    • Partition Key, Clustering Key, Row Key Declaration
    • Cassandra Setup Using Docker
    • CQL in Cassandra
    • Cassandra Free Tier Setup On DataStax
    • Queries in Cassandra using Python

  • Class - 1
    • Problems with Hadoop Map-Reduce
    • What is Apache Spark?
    • Features of Spark
    • Spark ecosystem
    • RDD in Spark
    • Properties of RDD
    • How does Spark perform data partitioning?
    • Transformation in Spark
    • Narrow Transformation vs Wide Transformation
    • Action in Spark
    • Are read & write operations in Spark transformations or actions?
    • Lazy evaluation in Spark
    • Lineage graph or DAG in Spark
    • How does the DAG look on the Spark Web UI?
    • Job, Stage and Task in Spark
    • What if Spark cluster capacity is less than the size of data to be processed?
    • Spark in-depth architecture and its components
    • Spark with Standalone Cluster Manager Type
    • Spark with YARN Cluster Manager Type
    • Deployment modes of Spark Application
    • Internals of Spark Job over the cluster

  • Class - 2
    • Persist and Caching in Spark
    • Storage Levels in Persist
    • How does data skewness occur in Spark?
    • Techniques to deal with data skewness
    • Repartition vs Coalesce
    • Example of Key Salting technique
    • RDD vs Dataframe vs Dataset
    • How to use Spark-Submit utility?
    • Memory management in Spark
    • Memory components in Executor Container
    • Dynamic occupancy mechanism
    • How to process 1 TB of data in Spark?
    • Resource allocation case study - 1 : 6 Nodes and each node has 16 cores & 64 GB RAM
    • Resource allocation case study - 2 : 6 Nodes and each node has 32 cores & 64 GB RAM
    • Resource allocation case study - 3 : When more memory isn't required for the executors
    • Broadcast and Accumulators in Spark
    • Different types of failures in Spark and how to resolve them
    • Out Of Memory failures
    • Code and Resource level optimizations in Spark
    • Best practices to design Spark Applications
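The resource allocation case studies above give cluster shapes (6 nodes with 16 or 32 cores and 64 GB RAM each). The sketch below works through one commonly quoted sizing heuristic for Spark on YARN; it is not necessarily the method taught in the course. Every rule in it is an assumption: 1 core and 1 GB reserved per node for the OS and daemons, 5 cores per executor, one executor slot given up for the YARN application master, and roughly 10% of each executor's memory set aside as overhead. The function name size_executors is hypothetical.

```python
def size_executors(nodes: int, cores_per_node: int, ram_gb_per_node: int,
                   cores_per_executor: int = 5,
                   overhead_fraction: float = 0.10) -> dict:
    """Estimate executor count and heap size with a rule-of-thumb heuristic."""
    usable_cores = cores_per_node - 1       # assumption: 1 core reserved per node
    usable_ram_gb = ram_gb_per_node - 1     # assumption: 1 GB reserved per node

    executors_per_node = usable_cores // cores_per_executor
    # Assumption: one executor slot is left free for the YARN application master.
    total_executors = nodes * executors_per_node - 1

    ram_per_executor_gb = usable_ram_gb / executors_per_node
    # Approximation: keep (1 - overhead_fraction) of the share as heap, rounded down.
    heap_gb = int(ram_per_executor_gb * (1 - overhead_fraction))

    return {
        "executors_per_node": executors_per_node,
        "total_executors": total_executors,
        "cores_per_executor": cores_per_executor,
        "heap_gb": heap_gb,
    }


if __name__ == "__main__":
    print(size_executors(6, 16, 64))  # case study 1 cluster shape
    print(size_executors(6, 32, 64))  # case study 2 cluster shape
```

Under these assumptions the first cluster yields 17 executors with about 18 GB of heap each, and the second yields 35 executors with about 9 GB each. Real workloads may call for different trade-offs.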

  • Class - 3
    • Spark Cluster Setup On GCP Dataproc
    • PySpark Core and Dataframe Operations
    • PySpark SQL
    • Read & Write from Hive tables using PySpark

  • Class - 4
    • Execution of Spark application using Spark-Submit Utility
    • Monitor, Debug & Understand Spark DAG on Spark Web UI - Practical Example
    • What is Stream Processing?
    • Spark structured streaming
    • Spark streaming with word count example
    • Output modes in writeStream in Spark structured streaming
    • What if memory due to state management is full?
    • DStream vs Spark Structured Streaming
    • Spark structured streaming with File as source
    • Triggers in Spark structured streaming

  • Class - 5
    • Checkpointing
    • Exactly-once semantics in Spark structured streaming
    • Stateless and Stateful Processing
    • Global aggregation and Windowed aggregation
    • Windowing
    • Sliding window
    • Tumbling window vs Sliding window
    • When and why should we use windowing?
    • Windowed aggregations example
    • Arbitrary stateful transformations
    • Watermarking
    • Working example of handling delayed events using watermarking
    • Code implementation for stateless Spark structured streaming with a Confluent Kafka topic as source
    • Code implementation for stateful Spark structured streaming with a Confluent Kafka topic as source - global aggregation and windowed aggregation
    • Spark structured streaming pipeline implementation with a Confluent Kafka topic as source and MongoDB as destination
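To make the tumbling versus sliding window distinction from the class above concrete, here is a small pure-Python sketch of how an event timestamp maps to windows. It does not use Spark. It assumes integer timestamps in seconds and windows aligned to time zero, and the function names tumbling_window and sliding_windows are hypothetical.

```python
def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """Return the single [start, end) tumbling window that contains ts."""
    start = (ts // size) * size
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Return every [start, end) sliding window that contains ts."""
    windows = []
    # Latest window start that is aligned to the slide and not after ts.
    start = (ts // slide) * slide
    # A window contains ts while start <= ts < start + size.
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


if __name__ == "__main__":
    print(tumbling_window(12, 10))      # prints (10, 20)
    print(sliding_windows(12, 10, 5))   # prints [(5, 15), (10, 20)]
```

An event belongs to exactly one tumbling window, but to several sliding windows whenever the slide is shorter than the window size.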

  • Class - 1
    • What is Databricks?
    • Databricks Architecture, Delta Lake & Delta Tables
    • Databricks Account Setup on GCP
    • Workspace setup
    • Compute in Databricks & Spark cluster setup
    • Unity catalog
    • Spark Job Execution on Databricks Cluster
    • Workflow Creation in Databricks
    • Project - Incremental Logistics Data Ingestion and perform merge operation in Delta tables

  • Class - 2
    • Project - 1: Real time healthcare data processing with DLT (Delta Live Tables) in Databricks
    • Tech Stack:
      • PySpark
      • Databricks
      • Delta Tables
      • Databricks DLT Workflow
    • Project - 2: Booking.com incremental SCD2 Merge ingestion
    • Tech Stack:
      • PySpark
      • Databricks
      • Delta Tables
      • Databricks DLT Workflow
      • PyDeequ

  • Class - 1
    • What is orchestration in BigData?
    • Need of dependency management in Data Pipeline design
    • What is Airflow?
    • Architecture & Different Components of Airflow
    • Operators in Airflow
    • How to write Airflow DAG Scripts?
    • Attribute description
    • How to execute parallel tasks?

  • Class - 2
    • Setup Airflow on GCP using Composer
    • Create and schedule Airflow DAG with sequential tasks using BashOperator and PythonOperator
    • Create and schedule Airflow DAG with parallel tasks using BashOperator and PythonOperator
    • Airflow Project - 1 : End-To-End Airflow DAG to Create, Run PySpark Job and Destroy GCP Dataproc cluster
    • Airflow Project - 2 : Airflow DAG to use user defined variables and pass external config parameters

  • Class - 1
    • OLAP vs OLTP
    • What is a Data Warehouse?
    • Difference between Data Warehouse, Data Lake and Data Mart
    • Fact Tables
    • Dimension Tables
    • Slowly changing Dimensions
    • Types of SCDs
    • Star Schema Design
    • Snowflake Schema Design
    • Galaxy Schema Design

  • Class - 2
    • Uber Data Warehouse Design Case Study
    • AirBnB Data Warehouse Design Case Study

  • Class - 1
    • Snowflake free tier account setup
    • Snowflake UI walkthrough
    • Load data from UI and create snowflake
    • Event-driven data ingestion into a Snowflake table using Snowpipe (Tech Stack Used : Google Storage Bucket, GCP Pub-Sub, Snowflake)
    • How to create and schedule a task in Snowflake

  • Class - 2
    • Project - 1: News Data Analysis with event driven incremental load in Snowflake table
    • Tech Stack:
      • Airflow
      • Google Cloud Storage
      • Python
      • Snowflake
    • Project - 2: Ecommerce CDC data real time aggregation in Snowflake Dynamic Table
    • Tech Stack:
      • Python
      • Snowflake Dynamic Table
    • Project - 3: Car rental data batch ingestion with SCD2 merge in snowflake table
    • Tech Stack:
      • Python
      • PySpark
      • GCP Dataproc
      • Snowflake
      • Airflow

  • Class - 1
    • BigQuery Overview
    • BigQuery Architecture
    • Capacitor - Columnar format
    • Colossus - Storage
    • Dremel - Execution Engine
    • Borg - Compute
    • Jupiter - Network
    • Project - 1: IRCTC Streaming data ingestion into BigQuery
    • Tech Stack:
      • Python
      • GCP Storage
      • GCP Pub-Sub
      • BigQuery
      • Dataflow
    • Project - 2: Walmart data ingestion into BigQuery
    • Tech Stack:
      • Python
      • Airflow
      • GCP Storage
      • BigQuery

  • AWS Services Covered
    • S3, Lambda, IAM, CloudWatch, EC2, SNS, SQS
    • Event Bridge Scheduler, Event Bridge Pipe, Kinesis, Kinesis Firehose, DynamoDB
    • Step Function, EMR, Glue, RDS, Athena, Redshift
  • Class - 1
    • AWS Free Tier Account Setup
    • AWS Console Walkthrough
    • S3 Bucket Creation
    • AWS CLI Setup
    • IAM User Setup
    • Access S3 Buckets using AWS CLI
    • S3 Bucket ARN
    • AWS Lambda Basics
    • Create Hello World Lambda function with Python
    • Execution and Testing of Lambda Function
    • Trigger Lambda Function with S3 Create Object Notification
    • Deployment of Lambda Functions with other dependencies
    • How to create and use Layers in Lambda
  • Class - 2
    • Read data from S3 file in Lambda Function with event driven notification & boto3 library
    • AWS SNS Basics
    • Create topics in SNS
    • Setup Email subscription of SNS topic
    • S3 Create object notification to SNS topic
    • Publish custom messages in SNS topic from Lambda function
    • AWS SQS Basics
    • SQS vs Kafka
    • Create SQS in AWS
    • Send and Receive messages in SQS
    • Read stream of messages in Lambda Function from SQS
    • AWS Event Bridge & Event Bridge Pipe
    • Scheduled trigger of Lambda function using Event Bridge
    • Event Bridge Pipe to read stream of data from SQS and send to Lambda function with intermediate filters
  • Class - 3
    • Create EC2 instance in AWS
    • SSH into EC2 machine from terminal
    • AWS RDS
    • Setup MySQL database with AWS RDS
    • Login & Access MySQL Database from terminal
    • Connect and manipulate data in MySQL database using Python
    • AWS Athena Basics
    • Athena vs Spark
    • Create & Query Athena Tables
    • Setup Datasources in AWS Glue Catalog
    • Table metadata preparation with AWS Glue Crawler
    • Run Athena queries from Lambda Function
  • Class - 4
    • Crawl partitioned data in S3 with Glue Crawler
    • Read partitioned data from S3 in Athena
    • AWS Redshift fundamentals & architecture
    • Setup Redshift cluster
    • Table operations on sample data in Redshift
    • Load data from S3 into Redshift table
    • Unload query command in Redshift
    • Unload data from Redshift into S3 with Manifest file
    • Create external table in Redshift
    • Materialized views in Redshift
    • AWS Glue fundamentals & components
    • AWS Glue Catalog & Glue Crawler
    • Setup Redshift connector in Glue
    • Data pipeline using AWS Glue Visualizer with S3 as Source and Redshift as Destination
    • AWS Glue job execution and insights

  • Project - 1: Real-time Healthcare Data Processing with DLT (Delta Live Tables) in Databricks (Covered In Module 8)
    • Tech Stack:
      • PySpark
      • Databricks
      • Delta Tables
      • Databricks DLT Workflow
  • Project - 2: Booking.com Incremental SCD2 Merge Ingestion (Covered In Module 8)
    • Tech Stack:
      • PySpark
      • Databricks
      • Delta Tables
      • Databricks DLT Workflow
      • PyDeequ
  • Project - 3: News Data Analysis with Event-Driven Incremental Load in Snowflake Table (Covered In Module 11)
    • Tech Stack:
      • Airflow
      • Google Cloud Storage
      • Python
      • Snowflake
  • Project - 4: E-commerce CDC Data Real-time Aggregation in Snowflake Dynamic Table (Covered In Module 11)
    • Tech Stack:
      • Python
      • Snowflake
      • Dynamic Table
  • Project - 5: Car Rental Data Batch Ingestion with SCD2 Merge in Snowflake Table (Covered In Module 11)
    • Tech Stack:
      • Python
      • PySpark
      • GCP Dataproc
      • Snowflake
      • Airflow
  • Project - 6: IRCTC Streaming Data Ingestion into BigQuery (Covered In Module 12)
    • Tech Stack:
      • Python
      • GCP Storage
      • GCP Pub-Sub
      • BigQuery
      • Dataflow
  • Project - 7: Walmart Data Ingestion into BigQuery (Covered In Module 12)
    • Tech Stack:
      • Python
      • Airflow
      • GCP Storage
      • BigQuery
  • Project - 8: Quality Movie Data Analysis
    • Tech Stack:
      • S3
      • Glue Crawler
      • Glue Catalog
      • Glue Catalog Data Quality
      • Glue Low Code ETL (With PySpark)
      • Redshift
      • Event Bridge
      • SNS
  • Project - 9: Gadget Sales Data Projection
    • Tech Stack:
      • Python
      • DynamoDB
      • DynamoDB Streams
      • Kinesis Streams
      • Event Bridge Pipe
      • Kinesis Firehose
      • S3
      • Lambda
      • Athena
  • Project - 10: Airline Data Ingestion
    • Tech Stack:
      • S3
      • S3 CloudTrail Notification
      • Event Bridge Pattern Rule
      • Glue Crawler
      • Glue Visual ETL (With PySpark)
      • SNS
      • Redshift
      • Step Function
  • Project - 11: Logistics Data Warehouse Management
    • Tech Stack:
      • GCP Storage
      • Airflow (GCP Composer)
      • Hive Operators
      • PySpark With GCP Dataproc
      • Hive
  • Project - 12: Sales Order & Payment Data Real Time Ingestion
    • Tech Stack:
      • GCP Pub-Sub
      • Python
      • Docker
      • Cassandra

Course Schedule

Mode of the Course: Recorded
Course Duration: 100+ Hours
Sessions: 37
Validity: 1 Year
Class Recording Provided: Yes
Programming Language Used: Python
Prerequisite:
Important Notice:
The video may not work on Linux due to DRM restrictions. It is only accessible on Chrome when using Windows or macOS.

Workaround: To access the video on Linux, you can create a Windows virtual machine (VM) and watch the video through the VM. Alternatively, you can use our Android or iOS application to view the video on your mobile device.

Instructor

Shashank Mishra is a seasoned Data Engineer with over 6 years of industry experience who has worked at leading companies such as Amazon, PayTM, and McKinsey & Company.
