big-data-processing

Here are 74 public repositories matching this topic...

drshahizan / BDM

Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.

big-data big-data-analytics big-data-processing big-data-architecture

Updated Apr 7, 2024
Jupyter Notebook

souvik-databricks / dlt-with-debug

A lightweight helper utility that lets developers build pipelines interactively by keeping a single source file for both DLT runs and non-DLT interactive notebook runs.

big-data spark etl python3 databricks dlt etl-pipeline big-data-processing delta-live-tables

Updated Dec 7, 2022
Python

This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.

python big-data apache-flink big-data-processing realtime-streaming

Updated Dec 4, 2023
Java

eskimo-sh / eskimo

Eskimo is a state-of-the-art Big Data Infrastructure and Management Web Console for building, managing, and operating Big Data 2.0 Analytics clusters on Kubernetes. This is the Git repository of Eskimo Community Edition.

Updated Sep 14, 2023
Java

felipefrizzo / terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

big-data analytics terraform kinesis-firehose cloudwatch-logs parquet terraform-provider etl-job terraform-aws big-data-processing

Updated Aug 4, 2021
HCL

StarPlatinumStudio / Flink-SQL-Practice

Flink SQL in Practice: a Chinese-language blog column

sql stream-processing apache-flink big-data-processing

Updated Jun 17, 2022
Java

giucris / yasp

Yet Another SPark Framework

framework scala big-data spark etl sparksql elt etl-framework etl-pipeline big-data-processing

Updated Feb 5, 2023
Scala

pyajs / veronica

Big data processing and machine learning platform, used just like SQL

sql python3 pyspark machine-learning-platform big-data-processing xql

Updated Oct 15, 2024
Python

hope-data-science / R4BD

R for Big Data (Chinese Version)

r big-data big-data-processing big-data-analytics-techniques

Updated Nov 13, 2024
R

anjijava16 / GCP_Data_Enginner_Utils

GCP_Data_Enginner

python bigquery scala notebook gcp pubsub pyspark dataflow shell-script dataproc-cluster dataproc gcp-storage big-data-processing

Updated Sep 4, 2021
Shell

impresso / impresso-text-acquisition

🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.

big-data-processing historical-newspapers impresso-project

Updated Oct 1, 2024
Jupyter Notebook

bdnf / BigData-Engineering-Projects

Data modeling with Cassandra, building a data warehouse with Redshift, and creating a data lake with Spark and Airflow

airflow spark cassandra data-warehouse data-lake redshift big-data-analytics big-data-processing

Updated Feb 28, 2020
Jupyter Notebook

tabletop-labs / tabletop

A curated selection of tools, libraries, and services that help tame your dataflow so you can productively build ambitious, data-driven, reactive applications on a streaming lakehouse

real-time microservices kafka big-data stream-processing big-data-analytics timetravel big-data-processing modern-data-stack elasticscaling semi-structured-cloud-warehouse

Updated May 30, 2023
Go

theGuyWithBlackTie / electricChargingStations

big-data electric-vehicles spark-ml charging-stations big-data-processing

Updated Dec 13, 2021
Jupyter Notebook

VladOnMyOwn / ctr-poisson-bootstrap

Here I demonstrate the performance difference between the Poisson bootstrap and the classic bootstrap by estimating the confidence interval for the difference in CTRs between two user groups

python bootstrap statistics big-data ab-testing statistical-tests ab-tests ab-test click-through-rate big-data-processing poisson-bootstrap

Updated Oct 22, 2022
Jupyter Notebook

vvittis / FlinkSampling

Reservoir sampling for group-by queries on the Flink platform. Effectively answers single-aggregate queries.

java topic stratum apache-flink sampling reservoir-sampling streaming-data big-data-analytics group-by big-data-processing streaming-tuples

Updated Aug 12, 2023
Java

software-competence-center-hagenberg / AVUBDI

GitHub repository for A Versatile Usable Big Data Infrastructure (AVUBDI)

docker kafka spark docker-compose docker-swarm template-project big-data-platform big-data-processing

Updated Feb 23, 2021
Shell

chandnii7 / Big-Data-Processing-Pipeline

A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics, using the Twitter API, Kafka, MongoDB, and Tableau.