
Adding HiveServer Credential Provider #210

Open · wants to merge 8 commits into master

Conversation

georgepachitariu

Hello Jim,
first I would like to thank you for implementing this library. We needed a way to launch Jupyter containers on our existing Hadoop cluster, and your libraries fit our needs perfectly.
In our Hadoop setup, users get their data by connecting to our Hive database. Since we have a Kerberized cluster, the way to connect from a YARN container is to use delegation tokens.

I implemented the part that connects to Hive and obtains a delegation token that is added to the rest of the tokens.
I used the Oozie implementation for inspiration (the CredentialProvider.java interface is modeled on it):
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/Hive2Credentials.java
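For context, the Oozie code linked above connects to HiveServer2 using the server's Kerberos principal, asks the open connection to issue a delegation token, and adds that token to the job's credentials. A minimal Python sketch of that flow follows; `FakeHiveConnection` and all other names here are illustrative stand-ins, not skein's or Hive's actual API:

```python
# Illustrative sketch only: FakeHiveConnection stands in for a real
# HiveServer2 client; none of these names come from skein or Hive.

class FakeHiveConnection:
    """Stand-in for a connection to HiveServer2 on a Kerberized cluster."""

    def __init__(self, uri, principal):
        self.uri = uri              # e.g. "jdbc:hive2://host:10000/default"
        self.principal = principal  # Kerberos principal of the Hive server

    def get_delegation_token(self, owner, renewer):
        # A real client would ask HiveServer2 to issue a token here.
        return "hive-token-for-" + owner


def add_hive_credentials(credentials, uri, principal, user):
    """Fetch a Hive delegation token and add it to the container's tokens."""
    conn = FakeHiveConnection(uri, principal)
    token = conn.get_delegation_token(owner=user, renewer=principal)
    credentials["hive"] = token
    return credentials


creds = add_hive_credentials(
    {}, "jdbc:hive2://hive:10000/default", "hive/_HOST@EXAMPLE.COM", "alice")
```

The token then travels with the rest of the application's credentials, so processes inside the YARN container can talk to Hive without a local Kerberos ticket.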

This is my first draft. Could you please have a look at it?

@georgepachitariu
Author

The Travis CI checks failed with the message:
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.

I think the stalling is caused by the build setup and not by the new code. I would appreciate any guidance here.

@georgepachitariu
Author

georgepachitariu commented Mar 13, 2020

I created a pull-request in yarnspawner as well:
jupyterhub/yarnspawner#17

Owner

@jcrist jcrist left a comment

Thanks for working on this! Apologies for the delayed review here. I've left a few comments on the implementation.

A few general questions:

  • Is credential_providers the best name for this field? What terminology do other systems use?
  • What other credential providers may a user want us to support?
  • Are uri and principal sufficient information for all other implementations?

@@ -390,6 +394,38 @@
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
Owner

If a system has hive available, is this jar likely to already be on the classpath (this is common for other hadoop libraries)? If so, we usually set <scope>provided</scope> for these types of dependencies to keep the size of our jar down and not worry about dependency mismatches. You should be able to drop all the exclusions as well.

Author

This is true. Thanks for explaining why I should use the provided scope.

skein/model.py Outdated
@@ -1152,6 +1152,60 @@ def from_protobuf(cls, obj):
security=security)


class CredentialProviderSpec(Specification):
Owner

I'd just call this CredentialProvider.

Author

Yep.

message CredentialProviderSpec {
  string name = 1;
  string uri = 2;
  string principal = 3;
Owner

Is this sufficient for any other credential provider we may want to support later?

Author

As explained in the longer comment below, these two fields are not enough for all providers, so I switched to a map of configuration values.

import org.apache.hadoop.yarn.exceptions.YarnException;
import java.io.IOException;

public interface CredentialProvider {
Owner

Rather than abstracting this out into an interface with two implementations, I'd make a single class with a method for adding each one, and a larger method for iterating through the spec and adding credentials for each provider. In python pseudocode:

class CredentialManager:
    def add_credentials_common(self, credentials):
        pass

    def add_credentials_hive(self, credentials, uri, principal):
        pass

    def add_credentials(self, credentials, spec):
        self.add_credentials_common(credentials)
        for cred in spec.credential_providers:
            if cred.name == "hive":
                self.add_credentials_hive(credentials, cred.uri, cred.principal)
            elif cred.name == ...:
                pass  # other implementations added here

This may not be as Java-idiomatic, but it better matches the existing code style. I don't see a reason to abstract this out when we'll only ever support a fixed list of credential providers.

Author

True, I'll refactor it this way.

for (int i = 0; i < spec.getCredentialProvidersCount(); i++) {
  Msg.CredentialProviderSpec d = spec.getCredentialProviders(i);
  if (d.getName().equals("hive")) {
    credentialProviders.add(new HiveCredentials(d, spec.getUser()));
Owner

What should we do if the user requests a provider we don't support? There are two cases here:

  • The name provided doesn't match any of our implementations.
  • The name does match one we implement, but adding a credential fails.

In both cases I think we should error. But I don't think that should be handled here, rather I think it should be handled in the routine that manages adding credentials (described below).
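Putting the two threads together, here is a runnable sketch of the single-class manager pattern suggested earlier, with the error behaviour discussed in this comment: an unknown provider name raises, and a provider that fails while fetching simply lets its exception propagate. All names and the placeholder token strings are illustrative, not skein's actual API:

```python
class CredentialManager:
    """One add_credentials_* method per supported provider, plus a dispatch loop.

    Illustrative only: token fetching is stubbed with placeholder strings.
    """

    def add_credentials_common(self, credentials):
        # Tokens every application needs (HDFS, ResourceManager, ...).
        credentials["hdfs"] = "hdfs-token"

    def add_credentials_hive(self, credentials, uri, principal):
        # A real implementation would contact HiveServer2 here; if that
        # fails, its exception propagates out of add_credentials (case 2).
        credentials["hive"] = "hive-token:%s:%s" % (uri, principal)

    def add_credentials(self, credentials, providers):
        self.add_credentials_common(credentials)
        for p in providers:
            if p["name"] == "hive":
                self.add_credentials_hive(credentials, p["uri"], p["principal"])
            else:
                # Case 1: the name matches none of our implementations,
                # so we error instead of silently ignoring it.
                raise ValueError("unsupported credential provider: %r" % p["name"])
        return credentials


mgr = CredentialManager()
creds = mgr.add_credentials(
    {}, [{"name": "hive",
          "uri": "jdbc:hive2://hive:10000",
          "principal": "hive/_HOST@EXAMPLE.COM"}])
```

Raising at the dispatch point keeps the per-provider methods simple: they only ever deal with a request that the manager has already recognized.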

Author

Yes, in the next draft exceptions will be propagated back in both cases.

@georgepachitariu
Author

Thank you for reviewing my code :).
I understood your comments. I will come back with the answers and a new draft later today.

@georgepachitariu
Author

> Thanks for working on this! Apologies for the delayed review here. I've left a few comments on the implementation.
>
> A few general questions:
>
>   • Is credential_providers the best name for this field? What terminology do other systems use?
>   • What other credential providers may a user want us to support?
>   • Are uri and principal sufficient information for all other implementations?

Hi, I answered the questions:

  1. Is credential_providers the best name for this field? What terminology do other systems use?

Since we are only dealing with delegation tokens, maybe we can rename credential_providers to something more specific: hadoop_delegation_token_provider? That name would be in line with the Hadoop & Kerberos book, which uses the terms "delegation token" and "Hadoop tokens".

  2. What other credential providers may a user want us to support?

From reading the Oozie and Spark code: HCat (Hive Metastore), HBase, JHS (Hadoop Job History Server), and Kafka.

  3. Are uri and principal sufficient information for all other implementations?

After some research, the answer is a sad no.

  • HBase uses the HBase client configuration together with the Hadoop job configuration received as input.
  • I think the Hadoop Job History Server uses the Hadoop configuration.
  • Kafka similarly has its own configuration: KafkaTokenClusterConf.

I was thinking: could we have a dictionary<str, str> in protobuf that gets filled with whatever configuration (keys and values) each provider needs, all bundled together? It might not be very nice to change the protobuf every time we add a provider.
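To make the idea concrete, here is a small Python sketch of what such free-form provider specs would deserialize to (mirroring a protobuf map<string, string> field). The key names are made up for illustration and are not a fixed schema:

```python
# Each provider is a name plus an arbitrary string-to-string config map,
# so adding a new provider needs no protobuf change. The key names below
# are illustrative only.
providers = [
    {"name": "hive",
     "config": {"hive2.jdbc.url": "jdbc:hive2://hive:10000/default",
                "hive2.server.principal": "hive/_HOST@EXAMPLE.COM"}},
    {"name": "hbase",
     "config": {"hbase.zookeeper.quorum": "zk1,zk2,zk3"}},
]


def get_config(provider, key):
    """Each implementation pulls only the keys it needs from the shared map."""
    try:
        return provider["config"][key]
    except KeyError:
        raise ValueError(
            "provider %r is missing required config key %r"
            % (provider["name"], key))


url = get_config(providers[0], "hive2.jdbc.url")
```

The trade-off is that the schema no longer documents which keys each provider needs, so each implementation has to validate its own config, as get_config does here.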

@georgepachitariu
Author

Hi Jim, my second draft is ready for review. I think I covered all the things you mentioned above.
If you could have a look when you have time, that would be great :).

@jcrist
Owner

jcrist commented Mar 24, 2020

Thanks @georgepachitariu. I'm currently on break between jobs (and without a computer), but will try to look at the changes as soon as I can (max 2 weeks from now). Apologies, thanks for being patient.

@georgepachitariu
Author

No worries @jcrist, take your time.

@santosh-d3vpl3x

This is an amazing effort at enabling good support for the full range of Hadoop services, thanks! Any ETA on when we can expect this to be available for use?

@jcrist
Owner

jcrist commented Apr 21, 2020

Hi all. I just started a new job, so a fair bit of my time is occupied ramping up on that. I expect to be able to give this a good review by end-of-week though. Thanks for your patience.

@wundervaflja

This is a very nice PR, exactly what we currently need for our platform. How can I help to get it merged? Really looking forward to helping and to having it in master.

@georgepachitariu
Author

Hi @santosh-d3vpl3x @wundervaflja. It's very cool to see that other people have the same ideas as me. As Jim said, please be a little patient. We will work towards a solution we like.
If you like to live on the edge, you can install this branch in your deployment. (That's what I did :D )

@gboutry

gboutry commented Nov 16, 2022

Hello, do you have an estimate of how complete this PR is? I'm really interested, and ready to help if needed.

@georgepachitariu
Author

georgepachitariu commented Nov 16, 2022

> Hello, do you have an estimate of how complete this PR is? I'm really interested, and ready to help if needed.

Hi @gboutry, nice to meet you!
Sadly this branch didn't get merged (and I don't work on data engineering systems anymore), BUT it has the complete functionality. So I would advise you to build from this branch and try it out.

@gboutry

gboutry commented Nov 25, 2022

Hi @georgepachitariu,

Sorry for the late reply. I was able to test your work, and indeed it works as expected; you did a really good job.
Many thanks!

(I rebased your branch on skein master, and it worked well)

5 participants