This project was carried out as a computer science project in the Master 1 of Data Science at the University of Lille (01/11/2020).
This repository contains:
- trajectory_class.py: Python script for the class Trajectory.
- grid_class.py: Python script for the class Grid.
- test_trajectory.py: Python script for testing the Trajectory class with the TDD (Test-Driven Development) method.
- clustering trajectories.ipynb: A Jupyter notebook version of the 3 scripts above that contains all the classes (Trajectory, Grid, Test).
The txt files used in this project are located in the folder cabspottingdata (1 file = 1 trajectory), which is available at this link.
- Add a new unit test
- Check it fails
- Make it pass (with the others)
- Refactor (if needed)
- Commit (& push)
- Go back to the first step.
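
One iteration of the cycle above might look like the following minimal sketch. The `Trajectory` stand-in, its constructor and its `__len__` behaviour are assumptions made for illustration; they are not taken from trajectory_class.py. In a real TDD cycle the tests are written first and fail until the class is implemented.

```python
import unittest


class Trajectory:
    """Hypothetical stand-in for the class in trajectory_class.py."""

    def __init__(self, points):
        # points: iterable of (latitude, longitude) tuples (assumed layout)
        self.points = list(points)

    def __len__(self):
        return len(self.points)


class TestTrajectory(unittest.TestCase):
    def test_length_matches_number_of_points(self):
        trajectory = Trajectory([(37.75, -122.39), (37.76, -122.40)])
        self.assertEqual(len(trajectory), 2)

    def test_empty_trajectory_has_length_zero(self):
        self.assertEqual(len(Trajectory([])), 0)


# Run the test case programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTrajectory)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Deleting the body of `__len__` and rerunning should make both tests fail, which is the "check it fails" step.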
Identify input/output structures
- Input: Set of trajectories
  - Classes: Trajectory, Point
- Output: Graph of paths
  - Classes: Graph, Node, Edge
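
The input and output structures listed above could be sketched with dataclasses as follows. Only the class names come from the list; every field name (`latitude`, `row`, `col`, the edge counter, and so on) and the `add_edge` helper are assumptions for illustration, not the layout used in this repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Point:
    latitude: float
    longitude: float
    timestamp: int  # assumed: seconds since the Unix epoch


@dataclass
class Trajectory:
    points: List[Point] = field(default_factory=list)


@dataclass(frozen=True)
class Node:
    # Assumed: a node is identified by a grid cell index.
    row: int
    col: int


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node


@dataclass
class Graph:
    nodes: Set[Node] = field(default_factory=set)
    # Maps each edge to the number of times it was traversed.
    edges: Dict[Edge, int] = field(default_factory=dict)

    def add_edge(self, source: Node, target: Node) -> None:
        """Register both endpoints and increment the edge's traversal count."""
        self.nodes.update((source, target))
        edge = Edge(source, target)
        self.edges[edge] = self.edges.get(edge, 0) + 1


# Usage: two trajectories crossing the same pair of cells.
graph = Graph()
a, b = Node(0, 0), Node(0, 1)
graph.add_edge(a, b)
graph.add_edge(a, b)
```

Frozen dataclasses are hashable, which is what lets `Node` and `Edge` be stored in a set and used as dictionary keys.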
Apply TDD to develop the clustering algorithm
- Download the mobility dataset from here (SF cabs)
- Use the Python CSV parser to load each file in the directory (from cabspottingdata.tar.gz; 1 file = 1 trajectory) as an input trajectory for your algorithm
- Run the clustering algorithm on the loaded dataset by creating a new Application class that combines the parser with the clustering algorithm
- Plot the resulting graph, using matplotlib for basic rendering
- Explore the effects of changing the clustering “depths” on the resulting graph
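
The steps above can be sketched end to end as follows. This is not the algorithm implemented in this repository: it assumes each data file holds space-separated lines of the form `latitude longitude occupancy timestamp`, and it swaps in a simple grid-based clustering in which the hypothetical `cell_size` parameter plays the role of a clustering depth. The function names and the `*.txt` file pattern are likewise assumptions.

```python
import csv
import math
from collections import Counter
from pathlib import Path


def load_trajectory(path):
    """Read one file as a list of (latitude, longitude, timestamp) tuples."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter=" ")
        # Assumed column order: latitude longitude occupancy timestamp
        points = [(float(r[0]), float(r[1]), int(r[3])) for r in rows if r]
    return sorted(points, key=lambda p: p[2])  # chronological order


def to_cell(lat, lon, cell_size):
    """Map a coordinate to the (row, col) index of its grid cell."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def cluster(trajectories, cell_size):
    """Count moves between distinct consecutive cells over all trajectories."""
    edges = Counter()
    for points in trajectories:
        cells = [to_cell(lat, lon, cell_size) for lat, lon, _ in points]
        for source, target in zip(cells, cells[1:]):
            if source != target:
                edges[(source, target)] += 1
    return edges


def plot(edges, cell_size):
    """Draw each edge as a line whose width grows with its traversal count."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for rendering

    fig, ax = plt.subplots()
    for ((r0, c0), (r1, c1)), count in edges.items():
        # Each cell is drawn at its lower-left corner.
        ax.plot([c0 * cell_size, c1 * cell_size],
                [r0 * cell_size, r1 * cell_size],
                color="tab:blue", linewidth=0.5 + math.log1p(count))
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")
    plt.show()


class Application:
    """Combines the parser with the clustering step."""

    def __init__(self, data_dir, cell_size=0.01):
        self.data_dir = Path(data_dir)
        self.cell_size = cell_size

    def run(self):
        files = sorted(self.data_dir.glob("*.txt"))
        return cluster((load_trajectory(f) for f in files), self.cell_size)
```

Calling `Application("cabspottingdata", cell_size=0.01).run()` and then again with a larger `cell_size` is one way to compare coarser and finer graphs; passing the result to `plot` renders it.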