irish-luong/BigData-java_spark_transformation

Big Data Project - Apache Spark and Java


Table of Contents

1. Apache Spark introduction

2. Getting Started with Spark and Java

3. Spark DataFrame basic operations

4. Spark DataFrame advanced operations

5. Spark SQL and other functionalities

6. Big data batching application

7. Deploy and cluster execution

8. Monitoring and performance fundamentals


Apache Spark introduction

Why choose Spark?

What is Spark?

A brief history of Spark

  • Enter MapReduce
  • Spark arrives

A comprehensive stack

Core components and architecture

Big data primer

Big data life cycle

Spark and the batch data processing model

Distributed processing model


Getting Started with Spark and Java

Spring Boot CLI application

Project structure


Spark DataFrame basic operations

DataFrame schema

DataFrame of POJOs

Transformation and action

Transformation (I): Map and Filter

Transformation (II): FlatMap and Distinct

Action (I): Count, Take and Collect

Action (II): Reduce and Aggregation (Max, Min, Mean)

Deep dive: Internals of Spark execution


Spark DataFrame advanced operations

Data partitioning and shuffling

Transformation (III): GroupBy and GroupByKey

Transformation (IV): Join

Transformation (V): Union, UnionByName, UnionAll and DropDuplicates

Sharing data in a cluster: Accumulators and Broadcast variables

UDFs: User-defined functions


Spark SQL and other functionalities

1. Ingest files

1. CSV
2. JSON Lines
3. JSON
4. Text
5. XML
6. Parquet

2. Ingest databases

1. Delta table (built on Parquet)

Big data batching application

1. The application architecture ecosystem

Management and scheduling tier

Workflow tier

Logging and monitoring tier

Processing tier

Storage tier

Database tier

2. Cloud architecture - AWS


Deploy and cluster execution


Monitoring and performance fundamentals
