Processing and clustering taxi data to mine passenger hotspots can provide information and decision support for taxi dispatching and management, and can improve taxi utilization. Traditionally, taxi data processing and passenger hotspot mining run on a single computer, whose configuration and performance limit both the number of taxis that can be handled and the computing speed. Hadoop removes these storage and computation bottlenecks for large datasets, which makes it possible to process and mine large volumes of taxi data.
The second project uses a web spider to collect taxi data near the airport.
Other parts of this project solution include:
- Python Spider
- [Hadoop Cluster]
- [MapReduce K-means]
- [Visualization (PHP, JS)]
- Because the dataset is large, the code and data used in the paper are hosted on the [website] and on Baidu's web disk.
- The result is produced by a MapReduce secondary sort. The code can be viewed on the [website].
- License plate number
- Latitude
- Longitude
- Passenger status (whether the taxi is carrying passengers)
- Time
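
As a rough illustration of how records with the fields above might feed a K-means step, here is a minimal single-machine sketch in Python. It is not the project's MapReduce code. The comma-separated layout, the field order, the use of `"1"` to mean "carrying passengers", the choice to cluster only occupied records, and the names `TaxiRecord`, `parse_record`, `nearest_centroid`, and `kmeans_iteration` are all assumptions made for this example.

```python
from collections import defaultdict
from math import hypot
from typing import NamedTuple, Optional


class TaxiRecord(NamedTuple):
    plate: str      # license plate number
    lat: float      # latitude
    lon: float      # longitude
    occupied: bool  # whether the taxi is carrying passengers
    time: str       # kept as a raw string; the timestamp format is not specified


def parse_record(line: str) -> Optional[TaxiRecord]:
    """Parse one line (assumed comma-separated); return None if malformed."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 5:
        return None
    plate, lat, lon, occupied, time = parts
    try:
        # Assumption: the passenger flag is "1" when the taxi is occupied.
        return TaxiRecord(plate, float(lat), float(lon), occupied == "1", time)
    except ValueError:
        return None


def nearest_centroid(point, centroids):
    """Index of the closest centroid, using plain Euclidean distance on degrees."""
    return min(
        range(len(centroids)),
        key=lambda i: hypot(point[0] - centroids[i][0], point[1] - centroids[i][1]),
    )


def kmeans_iteration(records, centroids):
    """One K-means pass: assign points (the 'map' role), then average (the 'reduce' role)."""
    groups = defaultdict(list)
    for r in records:
        if r.occupied:  # assumption: only occupied records count toward hotspots
            groups[nearest_centroid((r.lat, r.lon), centroids)].append((r.lat, r.lon))
    new_centroids = []
    for i, old in enumerate(centroids):
        pts = groups.get(i)
        if pts:
            new_centroids.append(
                (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
            )
        else:
            new_centroids.append(old)  # keep a centroid that attracted no points
    return new_centroids
```

In a MapReduce setting, the assignment loop would typically sit in the mapper (emitting the centroid index as the key) and the averaging in the reducer, with the driver repeating the job until the centroids stop moving.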
Version Alpha 0.1