OAuth2-based authentication provider for Cosmos

What is cosmos-hive-auth-provider

cosmos-hive-auth-provider is a custom authentication provider for Hive. Hive natively provides many ways of implementing authentication, e.g. Kerberos, PAM or LDAP, but it also allows for configuring custom mechanisms, like this one.

By using cosmos-hive-auth-provider, users will be able to authenticate by means of their OAuth2 token, generated by an OAuth2 Tokens Generator (a third party) handled by any trusted Identity Manager (for instance, the FIWARE Lab one).

An advantage of the way this library has been implemented is that any user-and-password-based Hive client will continue to work: the password configuration parameter simply takes the token value.




Git and Maven must be installed in order to download and build the cosmos-hive-auth-provider library, respectively. Providing detailed installation instructions for these tools is beyond the scope of this document.

Of course, this library makes no sense unless a Hive service is deployed in a Hadoop cluster.



Start by cloning the fiware-cosmos repository at GitHub:

$ git clone https://github.com/telefonicaid/fiware-cosmos.git

That will create a fiware-cosmos folder. Then, the library is built by running the following commands:

$ cd fiware-cosmos/cosmos-hive-auth-provider
$ git checkout release/x.y.z
$ mvn clean compile assembly:single

Jar copying

The cosmos-hive-auth-provider jar containing the OAuth2AuthenticationProviderImpl class must be copied into a folder within the Hive classpath, for instance:

$ ls /usr/lib/hive/lib/ | grep cosmos
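The copy step itself can be sketched as a small shell helper. This is only an illustration: the function name install_auth_provider_jar is hypothetical, and both the jar file name pattern and the directories are assumptions that depend on your build output and Hive layout.

```shell
#!/bin/sh
# Copy the built jar(s) into a folder on the Hive classpath.
# Assumption: the jar name starts with "cosmos-hive-auth-provider-".
install_auth_provider_jar() {
    src_dir="$1"    # e.g. the Maven target folder of the library
    dest_dir="$2"   # e.g. /usr/lib/hive/lib
    found=0
    for jar in "$src_dir"/cosmos-hive-auth-provider-*.jar; do
        [ -e "$jar" ] || continue   # the glob stays literal when nothing matches
        cp "$jar" "$dest_dir"/ || return 1
        found=1
    done
    [ "$found" -eq 1 ]   # fail when no jar was found
}
```

For example, install_auth_provider_jar target /usr/lib/hive/lib (run with enough privileges to write there), followed by the ls check above.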


Unit tests

The unit tests are run by invoking this parameterized mvn test command:

$ mvn test -Duser=frb -Dtoken=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Nothing has to be configured regarding cosmos-hive-auth-provider itself, but the Hive service must be configured in order to use it.

The following properties must be added to hive-site.xml in order to enable a custom authentication provider, i.e. cosmos-hive-auth-provider and its OAuth2AuthenticationProviderImpl class:
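As a minimal sketch, HiveServer2 selects a custom provider through its standard hive.server2.authentication and hive.server2.custom.authentication.class properties. The package prefix of the class is not given in this document, so FULLY.QUALIFIED.PACKAGE below is a placeholder to be replaced with the real one.

```xml
<!-- Tell HiveServer2 to use a custom authentication provider -->
<property>
   <name>hive.server2.authentication</name>
   <value>CUSTOM</value>
</property>

<!-- Placeholder package: replace with the real fully qualified class name -->
<property>
   <name>hive.server2.custom.authentication.class</name>
   <value>FULLY.QUALIFIED.PACKAGE.OAuth2AuthenticationProviderImpl</value>
</property>
```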



This other property must be added to hive-site.xml in order to override the default value for the Identity Manager endpoint (the one validating the OAuth2 authentication tokens):


Finally, this property must be modified in order to enable impersonation (otherwise, all the queries are executed by the user hive instead of the real end user):
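In stock Hive, impersonation is controlled by the hive.server2.enable.doAs property. The fragment below is a sketch that assumes this is the property meant here.

```xml
<!-- Run queries as the connecting end user rather than as the hive user -->
<property>
   <name>hive.server2.enable.doAs</name>
   <value>true</value>
</property>
```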




This is a library directly used by Hive. For HiveServer2 to pick it up, restart the service from your cluster manager (e.g. Ambari) or from the command line:

$ (sudo) service hive-server2 restart



The cosmos-hive-auth-provider library is used when a HiveServer2 client connects and passes a user and an OAuth2 token. For instance, let's assume we are using any one of the Hive clients distributed within the fiware-cosmos repository:

$ pwd
$ mvn exec:java -Dexec.args="computing.cosmos.lab.fiware.org 10000 default frb xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building hiveserver2-client 0.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) > validate @ hiveserver2-client >>>
[INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) < validate @ hiveserver2-client <<<
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ hiveserver2-client ---
Connecting to jdbc:hive2://computing.cosmos.lab.fiware.org:10000/default?user=frb&password=XXXXXXXXXX
remotehive> show tables;
remotehive> describe frb_test;
remotehive> select * from frb_test;

When passing the user credentials, an OAuth2 token is passed instead of a Unix or LDAP password.
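The same idea applies to any other user-and-password client. As a hedged sketch, the helpers below use Beeline, the standard HiveServer2 command-line client, which is not covered elsewhere in this document. The function names are hypothetical; -u, -n and -p are Beeline's URL, user name and password options.

```shell
#!/bin/sh
# Build the HiveServer2 JDBC URL from host, port and database.
hive_jdbc_url() {
    printf 'jdbc:hive2://%s:%s/%s\n' "$1" "$2" "$3"
}

# Open a Beeline session, passing the OAuth2 token where a password
# would normally go. Requires beeline on the PATH; not invoked here.
connect_with_token() {
    host="$1"; port="$2"; db="$3"; user="$4"; token="$5"
    beeline -u "$(hive_jdbc_url "$host" "$port" "$db")" -n "$user" -p "$token"
}
```

For example: connect_with_token computing.cosmos.lab.fiware.org 10000 default frb followed by the token as the last argument.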




HiveServer2 traces are usually logged within /var/log/hive/hiveserver2.log. There, you can find the traces regarding cosmos-hive-auth-provider, for instance, if everything goes well:

2016-01-27 15:59:17,587 INFO  [pool-5-thread-4]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(67)) - Doing request: GET https://account.lab.fiware.org/user?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1
2016-01-27 15:59:17,594 INFO  [pool-5-thread-4]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(78)) - Response received: {"organizations": [], "displayName": "frb", "roles": [{"name": "provider", "id": "106"}], "app_id": "8556cc76154f41b3b43d7b31f0699982", "email": "frb@tid.es", "id": "frb"}
2016-01-27 15:59:17,667 INFO  [pool-5-thread-4]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(188)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6
2016-01-27 15:59:17,676 INFO  [pool-5-thread-4]: hive.metastore (HiveMetaStoreClient.java:open(297)) - Trying to connect to metastore with URI thrift://dev-fiwr-bignode-11.hi.inet:9083
2016-01-27 15:59:17,677 INFO  [pool-5-thread-4]: hive.metastore (HiveMetaStoreClient.java:open(385)) - Connected to metastore.

If the token does not exist, this is an example of relevant traces:

2016-01-27 16:10:10,196 INFO  [pool-5-thread-28]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(67)) - Doing request: GET https://account.lab.fiware.org/user?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1
2016-01-27 16:10:10,197 INFO  [pool-5-thread-28]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(78)) - Response received: {"error": {"message": "Access Token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx not found", "code": 404, "title": "Not Found"}}
2016-01-27 16:10:10,197 ERROR [pool-5-thread-28]: transport.TSaslTransport (TSaslTransport.java:open(296)) - SASL negotiation failure
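When debugging a rejected token, it can help to reproduce by hand the request shown in the traces. The sketch below assumes curl is available and that the Identity Manager endpoint is the one appearing in the logs; the function names are hypothetical.

```shell
#!/bin/sh
# Build the token-validation URL seen in the HiveServer2 traces.
idm_user_url() {
    endpoint="$1"   # e.g. https://account.lab.fiware.org
    token="$2"
    # Strip one trailing slash from the endpoint, if any
    printf '%s/user?access_token=%s\n' "${endpoint%/}" "$token"
}

# Query the Identity Manager and print its JSON response.
# Performs a network request; not invoked here.
query_idm() {
    curl -s "$(idm_user_url "$1" "$2")"
}
```

A response carrying an "error" object, as in the trace above, means the Identity Manager does not know the token.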

If the token exists but does not match the given user, then something like this is logged:

2016-01-27 16:12:11,520 INFO  [pool-5-thread-32]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(67)) - Doing request: GET https://account.lab.fiware.org/user?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1
2016-01-27 16:12:11,521 INFO  [pool-5-thread-32]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(78)) - Response received: {"organizations": [], "displayName": "frb", "roles": [{"name": "provider", "id": "106"}], "app_id": "8556cc76154f41b3b43d7b31f0699982", "email": "frb@tid.es", "id": "frb"}
2016-01-27 16:12:11,521 ERROR [pool-5-thread-32]: transport.TSaslTransport (TSaslTransport.java:open(296)) - SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login [Caused by javax.security.sasl.AuthenticationException: The given token does not match the given user]
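To pull only the authentication-related traces out of a busy log, a filter along these lines may be enough. It is a sketch: the function name is hypothetical, the patterns are taken from the examples above, and the default path is the usual location mentioned earlier.

```shell
#!/bin/sh
# Print the traces related to OAuth2 authentication and SASL failures.
auth_traces() {
    logfile="${1:-/var/log/hive/hiveserver2.log}"
    grep -E 'OAuth2AuthenticationProviderImpl|SASL negotiation failure|SaslException' "$logfile"
}
```

Piping the result through tail, e.g. auth_traces | tail -n 20, shows the most recent attempts.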



It is important to note this authentication provider caches OAuth2 tokens. Caching tokens is useful because it saves queries to the Identity Manager, and because it allows using expired tokens from clients not implementing the refreshing mechanism (e.g. Cygnus).

This is done transparently to the user. From the administration point of view, however, it is relevant for two reasons: certain logs may appear, and there is a backup file for the cache.

The logs report cache usage; the possible messages are:

17/03/02 09:14:20 INFO authprovider.OAuth2AuthenticationProviderImpl: User was not cached or token did not match, thus querying the IdM
17/03/02 09:14:20 INFO authprovider.OAuth2AuthenticationProviderImpl: User cached
17/03/02 09:14:29 INFO authprovider.OAuth2AuthenticationProviderImpl: User and token were cached, thus nothing to query to IdM
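These messages can be counted to get a rough idea of how often the cache avoids a query to the Identity Manager. The following is a sketch only: the function name is hypothetical and the log file to inspect must be passed explicitly, since its location depends on your logging setup.

```shell
#!/bin/sh
# Count Identity Manager queries versus cache hits in a log file.
cache_stats() {
    logfile="$1"
    # grep -c exits non-zero when the count is 0, hence "|| true"
    idm_queries=$(grep -c 'thus querying the IdM' "$logfile" || true)
    cache_hits=$(grep -c 'thus nothing to query to IdM' "$logfile" || true)
    echo "idm_queries=$idm_queries cache_hits=$cache_hits"
}
```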

As for the backup file, it persists the cache between runs of HiveServer2: it is loaded each time the server starts and saved each time the cache is modified. It is always saved as /home/hive/oauth2.cache (this path is not currently configurable).


Reporting issues and contact information

There are several channels suited for reporting issues and asking questions in general. The right one depends on the nature of the question:

