In this project, we take the IPL data from 2008 to 2016, apply Hadoop concepts to it, and implement the analysis in Hive.
The Indian Premier League (IPL), also known as the TATA IPL for sponsorship reasons, is a men's T20 franchise cricket league in India. It is contested annually by ten teams based in seven Indian cities and three Indian states. The league was founded by the Board of Control for Cricket in India (BCCI) in 2007, and Brijesh Patel is the incumbent chairman of the IPL. It is usually held in summer, between March and May, and has an exclusive window in the ICC Future Tours Programme. The IPL is the most-attended cricket league in the world, and in 2014 it was ranked sixth by average attendance among all sports leagues. In 2010, the IPL became the first sporting event in the world to be broadcast live on YouTube. From its inaugural season in 2008 to the recently concluded 2022 season, there have been various winners, with the Chennai and Mumbai franchises winning the title multiple times. In this project, we take the IPL data from 2008 to 2016, apply Hadoop concepts to it, and implement the analysis in Hive and Pig.
Two cricket data files containing Indian Premier League data from 2008 to 2016 are used as the data source:
1. matches.csv: details of each match played
2. deliveries.csv: consolidated, delivery-by-delivery details for all the matches
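A minimal Python sketch of how the two files relate, assuming the column names of the public Kaggle IPL dataset (`id`, `season` in matches.csv; `match_id`, `batsman`, `batsman_runs` in deliveries.csv); your files may use different headers. Each delivery points to its match through `match_id`, so joining on `deliveries.match_id = matches.id` attaches a season to every ball, the same join a Hive query would express. The helper names and the toy rows (Team A, Player X, ...) are hypothetical and only illustrate the shape of the data.

```python
import csv
import io

# Assumed headers (Kaggle-style IPL files); adjust to match your CSVs.
#   matches.csv:    id, season, date, team1, team2, winner, ...
#   deliveries.csv: match_id, inning, batsman, batsman_runs, ...

def read_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def season_by_match(matches):
    """Map each match id (kept as a string) to its season year."""
    return {row["id"]: int(row["season"]) for row in matches}

def deliveries_with_season(matches, deliveries):
    """Join deliveries to matches on deliveries.match_id = matches.id."""
    seasons = season_by_match(matches)
    joined = []
    for d in deliveries:
        season = seasons.get(d["match_id"])
        if season is not None:  # drop deliveries whose match is not in matches.csv
            joined.append({**d, "season": season})
    return joined

# Hypothetical toy rows, not real IPL data.
matches_csv = """id,season,date,team1,team2,winner
1,2008,2008-04-18,Team A,Team B,Team A
2,2009,2009-04-18,Team C,Team D,Team D
"""
deliveries_csv = """match_id,inning,batsman,batsman_runs
1,1,Player X,4
1,1,Player X,6
2,1,Player Z,1
3,1,Player Q,2
"""

matches = read_csv(matches_csv)
deliveries = read_csv(deliveries_csv)
joined = deliveries_with_season(matches, deliveries)
print([(r["batsman"], r["season"]) for r in joined])
```

The last toy delivery references a match id that does not exist, so it is dropped, mirroring the behavior of an inner join.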
These files are extracted and loaded into Hive, where the data is processed, transformed, and analyzed to find the winner of each season and the top 5 batsmen by total runs, both for each season and across all seasons.
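The two analyses can be sketched in plain Python to show the aggregation logic a Hive query would perform. This is a hedged sketch, not the project's Hive implementation: it assumes Kaggle-style columns (`id`, `season`, `date` in YYYY-MM-DD form, `winner`; `match_id`, `batsman`, `batsman_runs`) and assumes that the chronologically last match of each season is the final, so its winner is taken as the champion. Function names and the toy rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import date

# Assumed headers:
#   matches.csv:    id, season, date (YYYY-MM-DD), winner
#   deliveries.csv: match_id, batsman, batsman_runs

def read_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def season_winners(matches):
    """Return {season: winner of that season's last match (assumed to be the final)}."""
    last = {}
    for m in matches:
        season = int(m["season"])
        key = (date.fromisoformat(m["date"]), int(m["id"]))  # id breaks same-day ties
        if season not in last or key > last[season][0]:
            last[season] = (key, m["winner"])
    return {s: winner for s, (_, winner) in sorted(last.items())}

def top_batsmen(matches, deliveries, n=5):
    """Return (per-season top-n list, overall top-n list) of (batsman, runs)."""
    season_of = {m["id"]: int(m["season"]) for m in matches}
    per_season = defaultdict(Counter)
    overall = Counter()
    for d in deliveries:
        season = season_of.get(d["match_id"])
        if season is None:  # skip deliveries without a matching match row
            continue
        runs = int(d["batsman_runs"])
        per_season[season][d["batsman"]] += runs
        overall[d["batsman"]] += runs
    by_season = {s: c.most_common(n) for s, c in sorted(per_season.items())}
    return by_season, overall.most_common(n)

# Hypothetical toy rows, not real IPL results.
matches_csv = """id,season,date,winner
1,2008,2008-04-18,Team A
2,2008,2008-06-01,Team B
3,2009,2009-05-24,Team C
"""
deliveries_csv = """match_id,batsman,batsman_runs
1,Player X,4
1,Player Y,6
2,Player X,4
3,Player Z,1
"""

m = read_csv(matches_csv)
d = read_csv(deliveries_csv)
print(season_winners(m))  # {2008: 'Team B', 2009: 'Team C'}
by_season, overall = top_batsmen(m, d)
print(by_season[2008])    # [('Player X', 8), ('Player Y', 6)]
print(overall)            # [('Player X', 8), ('Player Y', 6), ('Player Z', 1)]
```

`Counter.most_common` orders batsmen with equal run totals by first appearance, so ties in the real data may need an explicit secondary sort key.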
The objective is to find information about specific players and the winner of each IPL edition from 2008 to 2016 from the two datasets, and to analyze them using Hive on Hadoop.