This repository has been archived by the owner on Jan 15, 2022. It is now read-only.

glob list instead of simple hdfs list and pattern support for input #103

Open
wants to merge 1 commit into
base: master

angadsingh

Right now hraven accepts a single HDFS path as its input folder and fetches all job history and conf files underneath it. This pull request adds support for specifying a pattern with wildcards (*) and uses the HDFS API's globStatus method to list files instead of hraven's recursive listFiles method. This makes it easy to shard hraven's job by year, month, or day.
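
To illustrate the kind of pattern this enables, here is a minimal sketch that composes a glob for one shard. The helper name history_glob and the root/yyyy/mm/dd layout are assumptions for illustration, not part of this patch; the resulting string is what would be handed to a glob-aware lister such as globStatus.

```shell
#!/bin/sh
# Hypothetical helper (not part of hraven): build a glob pattern over a
# date-sharded history directory, assuming a <root>/yyyy/mm/dd/ layout.
# The body runs in a subshell so its variables do not leak to the caller.
history_glob() (
  root=${1%/}      # strip one trailing slash, if present
  year=${2:-*}     # any omitted field falls back to a wildcard
  month=${3:-*}
  day=${4:-*}
  printf '%s/%s/%s/%s/*\n' "$root" "$year" "$month" "$day"
)

# Pattern covering every day of July 2014:
history_glob /yarn/history/done/ 2014 07
```

Omitting trailing arguments widens the shard, so a single year or a single month can be processed per run.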

@jrottinghuis
Contributor

Interesting idea. Doesn't the RM already do this (sharding history files by date)?
For Hadoop 1 we had the original directory all in one place (where the history server can read from), and we separately ran JobFilePartitioner to shard the files into a yyyy/mm/dd directory structure.
Are you using a different setup?
Can you explain how your history files appear in one place and then get sharded, or how that works for you?

@@ -40,8 +40,12 @@ costfile=/var/lib/hraven/conf/costFile
hadoopconfdir=${HADOOP_CONF_DIR:-$HADOOP_HOME/conf}
hbaseconfdir=${HBASE_CONF_DIR:-$HBASE_HOME/conf}
# HDFS directories for processing and loading job history data
historyRawDir=/yarn/history/done/
historyProcessingDir=/hraven/processing/
year=2014

So each year you would have to manually adjust this to the new year?
I'm pretty sure we'd forget to make this change on January 1st in the New Year's daze, and collection would break.
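
One way to avoid the hard-coded value, sketched here under assumptions: default the year to the current one and allow an explicit override for backfills. The names resolve_year, HRAVEN_YEAR, and historyRawGlob are hypothetical and do not appear in this patch.

```shell
#!/bin/sh
# Hypothetical helper: print the given year, or the current year when the
# argument is empty or missing.
resolve_year() {
  printf '%s\n' "${1:-$(date +%Y)}"
}

historyRawDir=/yarn/history/done/
# HRAVEN_YEAR is an assumed override variable, e.g. for reprocessing old data.
year=$(resolve_year "${HRAVEN_YEAR:-}")
# Assumed yyyy/mm/dd layout below the raw history directory.
historyRawGlob="${historyRawDir}${year}/*/*"
echo "$historyRawGlob"
```

With this default, nothing needs to be edited on January 1st, while a run such as HRAVEN_YEAR=2014 still targets a specific year.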

@CLAassistant

CLAassistant commented Jul 18, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Angad Singh seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

3 participants