To start working on this assignment, do the following:
- Register on se-gitlab.inf.tu-dresden.de.
- Set up your account credentials (SSH keys).
- FORK the SE2W 2020 Assignment2.
- Clone the repository on your local machine.
Having done the aforementioned, there are two options for working with the provided repository:
- using the Eclipse project file provided in the top-level directory, or
- using your favorite editor coupled with the Dockerfile in the top-level directory.
Since option 1 is trivial, the instructions provided below are to help you work with the Dockerfile.
- Install Docker on your machine.
- In the top-level directory containing the Dockerfile, run
docker image build -t se_assignment:2 .
This command builds the image, downloads all the data needed for the assignment, and runs the ant build command followed by runAllExamples.
- Running
docker run --user 1000:1000 --rm -it -v `realpath ./se2-w-2019-assignment2`:/Source se_assignment:2 bash
creates a Docker container that you can use to execute the code.
- Navigate to the /Source directory and execute
ant init
- To run the provided examples, execute
ant runAllExamples
or use the runMapRedWordFrequencyCount target for the WordFrequencyCount example. The available targets are listed in the build.xml file.
- Modify the source code in the solutions directory corresponding to task 1 or task 2.
- Execute
ant runMapSolution1
or
ant runMapSolution2
and/or
ant runAllSolutions
to test the solution you have developed.
The output of the tasks should inform you of success or failure of the given/your solution.
Noteworthy Points
- Push your code to se-inf.gitlab whenever you make and commit changes.
- We run/evaluate your solutions every day, and you will be sent a link by email showing your score based on the output of your program.
- You need to work inside the university VPN during initialization of your project. This will allow you to download some of the data from university servers.
- The tasks are described below
To complete the tasks below, fill in the gaps in the code in the src/solutions package. Note: You can use Eclipse or any other Java IDE you prefer for these tasks.
You are provided with an Apache log showing links that have been accessed by clients. The task is to create a MapReduce program that counts the total number of times a given URL has been accessed. If a URL does not start with a valid hostname, prepend http://localhost/ to it (see general notes).
Expected Output: URL → frequency
#!csv
http://localhost/tikiwiki-2.1/css/admin.css 7
http://localhost/tikiwiki-2.1/tiki-admin.php 308
…
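The counting logic for this task can be sketched in plain Java, without the Hadoop classes, to make the map and reduce steps concrete. This is only an illustration under stated assumptions: the class and method names (UrlCountSketch, normalize, extractUrl, count) are hypothetical, the log lines are invented samples in Apache common log format, and "starts with a valid hostname" is approximated as "starts with http:// or https://". The general notes may define this differently, so check them before relying on it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UrlCountSketch {
    // Assumption: a URL counts as having a valid hostname if it carries an
    // http:// or https:// scheme. Everything else gets the localhost prefix.
    public static String normalize(String url) {
        if (url.startsWith("http://") || url.startsWith("https://")) {
            return url;
        }
        // Drop a leading slash so the prefix does not produce "//".
        String path = url.startsWith("/") ? url.substring(1) : url;
        return "http://localhost/" + path;
    }

    // "Map" step: pull the requested URL out of one log line.
    // Assumption: the request is the first quoted field, e.g. "GET /path HTTP/1.1".
    // Returns null if the line has no such field.
    public static String extractUrl(String line) {
        int start = line.indexOf('"');
        int end = line.indexOf('"', start + 1);
        if (start < 0 || end < 0) {
            return null;
        }
        String[] parts = line.substring(start + 1, end).split(" ");
        return parts.length >= 2 ? normalize(parts[1]) : null;
    }

    // "Reduce" step: sum the occurrences per normalized URL.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String url = extractUrl(line);
            if (url != null) {
                counts.merge(url, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample lines, not taken from the assignment data.
        List<String> log = Arrays.asList(
            "10.0.0.1 - - [01/Jan/2009:10:00:00 +0100] \"GET /tikiwiki-2.1/css/admin.css HTTP/1.1\" 200 512",
            "10.0.0.2 - - [01/Jan/2009:10:00:01 +0100] \"GET http://localhost/tikiwiki-2.1/css/admin.css HTTP/1.1\" 200 512",
            "10.0.0.3 - - [01/Jan/2009:10:00:02 +0100] \"GET /tikiwiki-2.1/tiki-admin.php HTTP/1.1\" 200 2048");
        count(log).forEach((url, n) -> System.out.println(url + " " + n));
    }
}
```

In an actual MapReduce job, extractUrl would correspond to what the mapper emits as its key (with a value of 1), and the summation in count would correspond to the reducer.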
###Task Description###
You have been provided with CSV data that describes taxi services in New York City.
For each hourly period of the day, determine the number of taxis in operation during that period.
An hourly period runs from the start of an hour to the last second of that hour, e.g. from 12:00:00 to 12:59:59.
Some taxi services span more than one hourly period. In that case, count the operation in each period as a separate operation.
Expected Output: timeslot/window → frequency
#!csv
1am 301935
1pm 548485
10am 504387
…
…
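The rule that a service spanning several hourly periods counts once in each of them can be sketched as follows. This is a minimal plain-Java illustration, not the expected solution: the class and method names (HourSlotSketch, label, slots) are hypothetical, the timestamps are invented, and labelling midnight and noon as 12am and 12pm is an assumption, since the expected output above does not show those rows. The column layout of the CSV data is also not assumed here; the sketch starts from an already parsed pickup and drop-off time.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class HourSlotSketch {
    // Convert an hour of the day (0-23) into a label such as "1am" or "10am".
    // Assumption: hour 0 is written "12am" and hour 12 is written "12pm".
    public static String label(int hour) {
        int h12 = hour % 12 == 0 ? 12 : hour % 12;
        return h12 + (hour < 12 ? "am" : "pm");
    }

    // List every hourly period touched by a service, from the period that
    // contains the pickup to the period that contains the drop-off.
    // A service longer than 24 hours would yield repeated labels.
    public static List<String> slots(LocalDateTime pickup, LocalDateTime dropoff) {
        List<String> result = new ArrayList<>();
        LocalDateTime slot = pickup.truncatedTo(ChronoUnit.HOURS);
        while (!slot.isAfter(dropoff)) {
            result.add(label(slot.getHour()));
            slot = slot.plusHours(1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented example: one service from 12:50 to 14:10 touches three periods.
        LocalDateTime pickup = LocalDateTime.parse("2013-01-01T12:50:00");
        LocalDateTime dropoff = LocalDateTime.parse("2013-01-01T14:10:00");
        System.out.println(slots(pickup, dropoff)); // prints [12pm, 1pm, 2pm]
    }
}
```

In a MapReduce job, the mapper would emit each label from slots with a count of 1, and the reducer would sum the counts per label.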
- Solutions must be turned in no later than 11:59pm AoE, 5th of Jan '21! No late days or other excuses.
- Commit & PUSH!!! to your bitbucket repository before the deadline. Don't forget the push.
- No team work. We check for plagiarism and will fail you if there is any indication of it.
- If you have any questions, ask them at auditorium.
- You need to set the JAVA_HOME environment variable using the short 8.3 path notation:
C:\PROGRA~1\Java\JDK18~1.0_1
- Your project should reside in a folder structure/subfolder without spaces.
- Your project should not reside on a Windows share; otherwise you get exceptions such as:
org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:241)