This repository has been archived by the owner on Jan 15, 2022. It is now read-only.

Refactoring hraven for multiple sink support #102

Open. Wants to merge 1 commit into base: master.


angadsingh

Introduces a generic object model and abstraction for the output records of JobFileProcessor's mapper, instead of emitting HBase Puts directly at the lowest level of the code hierarchy. Uses MultipleOutputs to write to different sinks (Graphite, HBase, etc.), with the sink-specific writing of records handled at each sink's OutputFormat level. Adds a Graphite sink and refactors HBase storage to work as a sink. Changes no hRaven behaviour.
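To make the idea in the description concrete, here is a minimal plain-Java sketch of a sink abstraction. It is not the code in this PR: every name (`SinkSketch`, `JobRecord`, `RecordSink`, `SinkRouter`, the in-memory sinks) is hypothetical, the router only plays the role that MultipleOutputs plays in the real job, and the sinks collect strings in memory instead of talking to HBase or Graphite. The only external fact assumed is Graphite's plaintext line format, `path value timestamp`.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are taken from the hRaven code base.
final class SinkSketch {

  /** The two sinks named in the PR description. */
  enum SinkType { HBASE, GRAPHITE }

  /** Sink-agnostic output record emitted by the mapper (assumed shape). */
  static final class JobRecord {
    final String jobKey;
    final String metric;
    final long value;
    final long timestampSeconds;

    JobRecord(String jobKey, String metric, long value, long timestampSeconds) {
      this.jobKey = jobKey;
      this.metric = metric;
      this.value = value;
      this.timestampSeconds = timestampSeconds;
    }
  }

  /** Each sink decides for itself how a record is physically written. */
  interface RecordSink {
    void write(JobRecord record);
  }

  /** Stand-in for the HBase sink: records row/column=value strings instead of real Puts. */
  static final class InMemoryHBaseSink implements RecordSink {
    final List<String> cells = new ArrayList<>();

    @Override
    public void write(JobRecord r) {
      cells.add(r.jobKey + "/" + r.metric + "=" + r.value);
    }
  }

  /** Stand-in for the Graphite sink: builds plaintext lines "path value timestamp". */
  static final class InMemoryGraphiteSink implements RecordSink {
    final List<String> lines = new ArrayList<>();

    @Override
    public void write(JobRecord r) {
      lines.add(r.jobKey + "." + r.metric + " " + r.value + " " + r.timestampSeconds);
    }
  }

  /** Routes every emitted record to all registered sinks. */
  static final class SinkRouter {
    private final Map<SinkType, RecordSink> sinks = new EnumMap<>(SinkType.class);

    void register(SinkType type, RecordSink sink) {
      sinks.put(type, sink);
    }

    void emit(JobRecord record) {
      for (RecordSink sink : sinks.values()) {
        sink.write(record);
      }
    }
  }

  public static void main(String[] args) {
    InMemoryHBaseSink hbase = new InMemoryHBaseSink();
    InMemoryGraphiteSink graphite = new InMemoryGraphiteSink();
    SinkRouter router = new SinkRouter();
    router.register(SinkType.HBASE, hbase);
    router.register(SinkType.GRAPHITE, graphite);

    // The mapper only knows about JobRecord, not about any storage system.
    router.emit(new JobRecord("c1.job_1", "mapSlotMillis", 1234L, 1400000000L));

    System.out.println(hbase.cells);
    System.out.println(graphite.lines);
  }
}
```

The point of the design is that the mapper emits one sink-agnostic record, and adding a new sink means adding one `RecordSink` implementation rather than touching the mapper.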

@angadsingh
Author

Two more changes build on this one. I will create PRs for them once this one is accepted:
angadsingh/hraven@twitter:master...optional-task-history-processing
angadsingh/hraven@twitter:master...graphite-key-mapping


/**
*
* @author angad.singh
Contributor


Drop the @author tags.

@jrottinghuis
Contributor

Definitely need a little more time to grok this change.
On the face of it, it sounds great to separate out HBase and add an additional sink.
How would that work for cases where we need to rely on HBase CAS or increment operations?

In theory one could store this data in a regular SQL DB as well. However, we rely heavily on efficient HBase scans to pull data out of tables that contain tens of billions of task records for about a dozen clusters.

If we do support an additional sink we need some serious unit test cases to make sure that any additional changes to hRaven don't break compatibility for Graphite.
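As one possible shape for such a compatibility check, the sketch below pins the Graphite plaintext line format (`path value timestamp`, newline-terminated) with a few assertions. The formatter `formatLine`, the class name, the metric paths, and the whitespace-to-underscore rule are all assumptions made for illustration; the PR's actual Graphite sink may build its lines differently, and in the project itself this would more naturally be a JUnit test.

```java
// Hypothetical regression check; names and sanitizing rule are assumptions, not hRaven code.
final class GraphiteFormatCheck {

  /** Builds one Graphite plaintext line: "<path> <value> <epochSeconds>\n". */
  static String formatLine(String metricPath, long value, long epochSeconds) {
    // Whitespace inside the path would break the space-delimited format,
    // so this sketch replaces each run of whitespace with an underscore.
    String safePath = metricPath.trim().replaceAll("\\s+", "_");
    return safePath + " " + value + " " + epochSeconds + "\n";
  }

  /** Tiny assertion helper so the sketch runs without a test framework. */
  static void expectEquals(String expected, String actual) {
    if (!expected.equals(actual)) {
      throw new AssertionError("expected [" + expected + "] but got [" + actual + "]");
    }
  }

  public static void main(String[] args) {
    // A plain metric must come out unchanged, with exactly three fields.
    expectEquals("hraven.c1.job_1.totalMaps 42 1400000000\n",
        formatLine("hraven.c1.job_1.totalMaps", 42L, 1400000000L));

    // A job name containing a space must not add a fourth field.
    expectEquals("hraven.c1.my_job.totalMaps 42 1400000000\n",
        formatLine("hraven.c1.my job.totalMaps", 42L, 1400000000L));

    System.out.println("Graphite line format checks passed");
  }
}
```

Checks of this kind fail as soon as a later change alters the emitted line, which is the compatibility guarantee asked for above.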

@CLAassistant

CLAassistant commented Jul 18, 2019

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Angad Singh seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
