Here you can find all materials for the lecture "BIG DATA - An Introduction To The Fields Of Data Engineering, Development And Architecture Of Data-Intensive Applications", held at the Cooperative State University Baden-Wuerttemberg in Stuttgart.
This lecture will give you a brief introduction to what is called "Big Data". We will quickly refresh the basics of databases, data models and data processing you have learned so far and compare them to the distributed world of Big Data. After that we will take a deep dive into the foundations of distributed data storage and data processing, as well as the related concepts and challenges of reliability, scalability, replication, partitioning, and batch and stream processing. Later on we will take a look at the most commonly used software and frameworks (mostly the Hadoop ecosystem). At the end, once you know the basic concepts and are able to set up and work with distributed environments and huge data sets, there will be a short introduction to data science.
At the end of each lesson there will be some hands-on exercises, which we will start together and which have to be finished by the following week. The lecture comprises only about 36 hours over 12 weeks (one slot per week), which is very little time to cover such an extensive topic. So pay close attention, and if you can’t keep up, feel free to ask questions at the end of each lesson.
- Script
- Slides presented within the lecture
- Exercises and Solutions
- Docker Images
You can just download everything directly or install git and get everything by using:
git clone https://github.com/marcelmittelstaedt/BigData.git
and, to fetch later updates:
git pull
If you find any mistakes or misspellings, feel free to send me a mail (contact@marcel-mittelstaedt.com) or, if you are able to, open a pull request.
- Use Spotify Audio Features To Automatically Categorize Music Tracks
- Use OpenCellID Data To Estimate GSM, UMTS and LTE Coverage For A Certain Destination
- Use OpenAddress Data To Validate Addresses
- Use TLC Trip Record Data To Calculate Performance KPIs For New York City Cabs
- Use Kaggle Hubway Data To Calculate Bike Sharing Usage KPIs
- Create Magic The Gathering Trading Card Database By API
- Create Magic The Gathering Trading Card Database By Crawler
- Use XKCD API To Build A Searchable Database
- Create A Searchable IP And GeoLocation Database
Marcel Mittelstädt
- Co-Founder, DataWhizz, www.datawhizz.io
- Head of Data Architecture and Development, ProSiebenSat.1 Media SE, www.prosiebensat1.com
- University Lecturer, Cooperative State University Baden-Wuerttemberg, www.dhbw.de
Contact: