Prof. Ratnik Gandhi
Rahul Patel
This is a PG course (open as a technical elective to senior UG students). The course will revolve around reading and implementing state-of-the-art literature on streaming algorithms for big data problems. The objective of this course is to expose students to state-of-the-art literature on algorithm design for big data, focusing on streaming algorithms and related optimization problems. Students taking this course will develop the ability to independently take up a problem related to big data, model it, and design a relevant solution.
This will be a laboratory-based course. Every week we will meet for a 3-hour session. During this session we will discuss one or two ideas from reference research papers, and you (students) will implement these ideas in relevant software systems.
Type | Weightage | Description |
---|---|---|
Midterm Project | 30% | A 3-week individual project: implementation of an existing research paper. |
Endterm Project | 30% | A 7-week group project: implementation and extension of an existing research paper. |
Endterm Take-home | 40% | A one-week individual assignment: propose solution(s) for the open-ended problem provided. |
Consider a scenario in which a company like LinkedIn wants to build a module that suggests career progression paths to its registered users. When a user logs on to the platform, the platform reads the user's profile and, based on various parameters of this profile, suggests the next set of skills the user should consider acquiring. Your aim in this exam is to design the following two modules:
- A module that reads the user's profile and suggests a career path, in terms of the skill set to be acquired. [10]
- A module in which the user enters a career goal and, based on this goal and other related information, the platform suggests a career path. [10]
Relevant user profile data is available here. You are expected to design modules for:
- Reading the data and, if required, cleaning it.
- Once the data is available in a suitable structured format, implementing modules 1 and 2 described above.
Significantly novel solutions will be appreciated. [10]
The submission will consist of a report (maximum 3 pages for diagrams, algorithms, results and other discussion) and code on GitHub. A neat and clean algorithm with relevant proofs of correctness and an analysis of its efficiency (stated in terms of computational complexity) will earn more marks. [10]
Submit your solutions to Rahul on April 28, 2017, between 10 am and 11 am.
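As a point of reference only, a naive baseline for the two modules above could treat each profile as an ordered list of skills and count how often one skill directly follows another. The sketch below assumes that representation (the actual data format is not specified here), and every function name in it is hypothetical; it is one possible starting point, not an expected solution.

```python
from collections import Counter, defaultdict, deque

# Assumption: each profile is an ordered list of skill names (oldest first).

def build_transitions(profiles):
    """Count how often skill b directly follows skill a across all profiles."""
    transitions = defaultdict(Counter)
    for skills in profiles:
        for a, b in zip(skills, skills[1:]):
            if a != b:
                transitions[a][b] += 1
    return transitions

def suggest_next_skills(transitions, user_skills, k=3):
    """Module 1 baseline: rank unacquired skills that often follow the user's skills."""
    held = set(user_skills)
    scores = Counter()
    for skill in dict.fromkeys(user_skills):  # de-duplicate, keep order
        for nxt, count in transitions.get(skill, {}).items():
            if nxt not in held:
                scores[nxt] += count
    return [skill for skill, _ in scores.most_common(k)]

def path_to_goal(transitions, user_skills, goal):
    """Module 2 baseline: shortest chain of new skills from any held skill to the goal (BFS).
    Returns [] if the goal is already held and None if the data contains no chain."""
    if goal in user_skills:
        return []
    parent = {skill: None for skill in user_skills}
    queue = deque(parent)
    while queue:
        current = queue.popleft()
        for nxt in transitions.get(current, {}):
            if nxt in parent:
                continue
            parent[nxt] = current
            if nxt == goal:
                path, node = [], nxt
                while parent[node] is not None:  # walk back to a held skill
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            queue.append(nxt)
    return None

# Usage with made-up toy profiles
profiles = [
    ["python", "sql", "spark", "ml"],
    ["python", "ml", "deep learning"],
    ["java", "sql", "spark"],
    ["python", "sql", "tableau"],
]
t = build_transitions(profiles)
print(suggest_next_skills(t, ["python"], k=2))     # → ['sql', 'ml']
print(path_to_goal(t, ["java"], "deep learning"))  # → ['sql', 'spark', 'ml', 'deep learning']
```

Frequency counts and BFS keep both modules simple to analyse (building the transition table is linear in the total number of skills listed), which fits the report's emphasis on correctness and complexity; a real submission would need richer profile features.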
List of assignments with solutions.
Sr. No | Submission | Solution |
---|---|---|
1 | Online Regression | Parth Satodiya (TensorFlow) |
2 | Online Singular Value Decomposition | Kishan Raval |
3 | Robust Principal Component Analysis (Midterm Project) | Riddhesh Sanghavi |
4 | Probabilistic Principal Component Analysis (PPCA) using Expectation Maximization | Parth Satodiya |
5 | Incremental Principal Component Analysis | Maunil Vyas Sol. |
6 | Generative Adversarial Network using PPCA (Endterm Project) | Maunil Vyas, Deep Patel and Shreyas Patel Sol. |
7 | Online K-medians Clustering | Shreyas Patel Sol. |
8 | Incremental Linear Discriminant Analysis | Shreyas Patel Sol. |
9 | Endterm Take-home | Maunil Vyas and Deep Patel Sol.; Ashutosh Kakadiya Sol. |
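Assignment 1 (Online Regression) asks for a model that is refined one sample at a time instead of being refit on the full dataset. The listed solution used TensorFlow; as an alternative illustration, here is a minimal NumPy sketch of recursive least squares, a standard exact online update for linear least squares. The class name and the `forgetting` and `delta` parameters are illustrative assumptions, not taken from the course material.

```python
import numpy as np

class RecursiveLeastSquares:
    """Online linear regression: weights are refined one (x, y) sample at a time."""

    def __init__(self, n_features, forgetting=1.0, delta=1e6):
        self.w = np.zeros(n_features)          # current weight estimate
        self.P = delta * np.eye(n_features)    # running inverse of the (lightly regularised) Gram matrix
        self.lam = forgetting                  # 1.0 = no forgetting; < 1 down-weights old samples

    def update(self, x, y):
        """Absorb one sample in O(d^2) time and return the error made before the update."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        error = y - self.w @ x
        self.w = self.w + gain * error
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return error

    def predict(self, x):
        return float(self.w @ np.asarray(x, dtype=float))

# Usage: stream noise-free samples of y = 2*x1 - 3*x2 + 1 (last feature is a bias term)
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 1.0])
model = RecursiveLeastSquares(n_features=3)
for _ in range(200):
    x = np.append(rng.normal(size=2), 1.0)
    model.update(x, true_w @ x)
print(np.round(model.w, 3))
```

Each update costs O(d^2) memory and time regardless of how many samples have been seen, which is what makes it suitable for a stream; with `forgetting` below 1 the model tracks drifting relationships.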
Link to the Excel file containing the list of student repositories.
- Artificial Neural Network using Numpy
ANN-NUMPY
- Expectation Maximization using Python
EM
- A Singularly Valuable Decomposition: The SVD of a Matrix
SVD
- Expectation Maximization Algorithm
EM
Video
- A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
EM
- What is the expectation maximization algorithm?
EM
- The EM algorithm
EM
- The Expectation Maximization Algorithm: A short tutorial
EM
- Generative Learning algorithms by Andrew Ng - CS229 Lecture Notes
GAN
- Image Completion with Deep Learning in TensorFlow
GAN
- Generative Adversarial Networks Tutorial by Adit Deshpande
GAN
- The PAM Clustering Algorithm
Clustering
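Several of the resources above cover the EM algorithm. To show how EM applies to assignment 4, here is a minimal NumPy sketch of the EM updates for probabilistic PCA as given by Tipping and Bishop (1999); the function name `ppca_em`, the random initialization and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np

def ppca_em(T, q, n_iter=200, seed=0):
    """Fit probabilistic PCA with q latent dimensions to data T (N x d) by EM."""
    rng = np.random.default_rng(seed)
    N, d = T.shape
    mu = T.mean(axis=0)
    X = T - mu                       # centred data
    W = rng.normal(size=(d, q))      # random initial loading matrix (assumption)
    sigma2 = 1.0                     # initial isotropic noise variance (assumption)
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables z_n
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = X @ W @ M_inv                         # row n is E[z_n]
        sum_Ezz = N * sigma2 * M_inv + Ez.T @ Ez   # sum over n of E[z_n z_n^T]
        # M-step: update W, then sigma^2 using the new W
        W_new = (X.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (np.sum(X ** 2)
                  - 2.0 * np.sum(Ez * (X @ W_new))
                  + np.trace(sum_Ezz @ W_new.T @ W_new)) / (N * d)
        W = W_new
    return W, sigma2, mu

# Usage: synthetic data close to a single direction v plus small isotropic noise
rng = np.random.default_rng(1)
v = np.array([3.0, 1.0, 0.5])
T = np.outer(rng.normal(size=500), v) + 0.1 * rng.normal(size=(500, 3))
W, sigma2, mu = ppca_em(T, q=1)
cosine = abs(W[:, 0] @ v) / (np.linalg.norm(W[:, 0]) * np.linalg.norm(v))
print(round(cosine, 3), round(sigma2, 4))
```

At convergence the columns of `W` span the principal subspace and `sigma2` estimates the average variance in the discarded directions; the EM form avoids an explicit eigendecomposition of the d x d covariance matrix.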
- Multi-Dimensional Regression Analysis of Time-Series Data Streams, Chen et al., Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.
- Linear Programming in the Semi-Streaming Model with Applications to the Maximum Matching Problem, Ahn and Guha, arXiv 2011.
- Fast Low-Rank Modifications of the Thin Singular Value Decomposition, M. Brand, Elsevier 2006.
- Parallel Collaborative Filtering for Streaming Data, Ali, Johnson, Tang, 2011.
- A Stream Algorithm for the SVD, Strumpen, Hoffmann, Agarwal, MIT LCS Technical Memo 2003.
- Matrix Factorization for Collaborative Prediction, Kleeman, Hendersen, Denuit.
- Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis, Gorrell and Webb.
- Analytic challenges in Social Sensing, Abdelzaher and Wang.
- Detecting anomaly in data streams by fractal model, Zhang et al., WWW 2014.
- Eigenspace Method for Spatiotemporal Hotspot Detection, Fanaee-T and Gama, arXiv 2014.
- Chandima Hewa Nadungodage, Yuni Xia, Fang Li, Jaehwan John Lee, and Jiaqi Ge. StreamFitter: a real time linear regression analysis system for continuous data streams. In Database Systems for Advanced Applications, pages 458-461. Springer, 2011.
- Haitao Zhao, Pong Chi Yuen, and James T. Kwok. A novel incremental principal component analysis and its application for face recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36(4):873-886, 2006.
- Pang, Shaoning, Seiichi Ozawa, and Nikola Kasabov. "Incremental linear discriminant analysis for classification of data streams." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 35, no. 5 (2005): 905-914.
- Aggarwal, Charu C. "Outlier analysis." In Data Mining, pp. 237-263. Springer International Publishing, 2015.
- Aggarwal, Charu C. "A Survey of Stream Clustering Algorithms." (2013): 231-258.
- Candès, Emmanuel J., et al. "Robust principal component analysis?." Journal of the ACM (JACM) 58.3 (2011): 11.
- Tipping, Michael E., and Christopher M. Bishop. "Probabilistic principal component analysis." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61.3 (1999): 611-622.