cosmos-tidoop-api exposes a RESTful API for running MapReduce jobs in a shared Hadoop environment.

Please note the emphasis on a shared Hadoop environment. Shared Hadoops require special management of the data and of the analysis processes being run (storage and computation). There are tools such as Oozie that also run MapReduce jobs through an API, but they do not enforce controlled access to the run jobs, their status, their results, etc. In other words, with Oozie any user may kill a job just by knowing its ID; with cosmos-tidoop-api only the owner of the job will be able to.

The key point is to relate all the MapReduce operations (run, kill, retrieve status, etc) to the user space in HDFS. This way, simple but effective authorization policies can be established per user space (in the most basic approach, allowing a user to access only their own user space). This can be easily combined with authentication mechanisms such as OAuth2.
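The per-user-space policy can be sketched as follows. This is an illustrative snippet, not actual cosmos-tidoop-api code; the helper names (ownerOf, authorizeJobAccess) and the job representation are assumptions:

```javascript
// Sketch of the per-user-space authorization idea (assumed names,
// not the actual cosmos-tidoop-api implementation).

// In the simplest policy, a job belongs to the user whose HDFS
// user space it was launched from, e.g. /user/frb/...
function ownerOf(jobInfo) {
    return jobInfo.userSpace.replace(/^\/user\//, '').split('/')[0];
}

// Only the owner of the job may operate on it (kill, poll status, etc)
function authorizeJobAccess(requestingUser, jobInfo) {
    return requestingUser === ownerOf(jobInfo);
}

// Example: user 'frb' launched a job from his user space
var job = { jobId: 'job_1234', userSpace: '/user/frb/jobs' };
console.log(authorizeJobAccess('frb', job)); // true: the owner may kill it
console.log(authorizeJobAccess('eve', job)); // false: any other user may not
```

Binding every operation to the HDFS user space means the authorization decision reduces to a simple ownership comparison.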

Finally, it is important to remark that cosmos-tidoop-api is designed to run in a computing cluster in charge of analyzing the data within a storage cluster. Of course, sometimes the storage and the computing cluster may be the same one; the software is ready for that case too.


Prerequisites

This REST API only makes sense together with a Hadoop cluster administrated by the reader.

cosmos-tidoop-api is a Node.js application; therefore, install Node.js from the official download. An advanced alternative is to install Node Version Manager (nvm) by creationix/Tim Caswell, which allows you to have several versions of Node.js installed and to switch among them.

Of course, common tools such as git and curl are needed.


API installation

This software is written in JavaScript, specifically suited for Node.js (JavaScript on the server side). Since JavaScript is an interpreted programming language, it is not necessary to compile it nor to build any package; having the source code downloaded somewhere in your machine is enough.

Start by creating, if it does not exist yet, a Unix user named cosmos-tidoop; it is needed for installing and running the application. This can only be done as root, or as another sudoer user:

$ sudo useradd cosmos-tidoop
$ sudo passwd cosmos-tidoop <choose_a_password>

While you are a sudoer user, create a folder for saving the cosmos-tidoop-api log traces under a path of your choice, typically /var/log/cosmos/cosmos-tidoop-api, and set cosmos-tidoop as the owner:

$ sudo mkdir -p /var/log/cosmos/cosmos-tidoop-api
$ sudo chown -R cosmos-tidoop:cosmos-tidoop /var/log/cosmos

Now it is time to enable the cosmos-tidoop user to run Hadoop commands as the requesting user. This can be done in two ways:

  • Adding the cosmos-tidoop user to the sudoers list. This is the easiest way, but the most dangerous one.
  • Adding the cosmos-tidoop user to all the user groups (by default, for any user there exists a group with the same name as the user). This works if, and only if, the group permissions are as wide open as the user ones (i.e. 77X).
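
For the first option, a sudoers entry can restrict cosmos-tidoop to the Hadoop binary only, which narrows the risk compared to full sudo access. This is just a sketch: the /usr/bin/hadoop path is an assumption and must be adapted to your installation:

```
# /etc/sudoers.d/cosmos-tidoop (always edit sudoers files with visudo)
# Allow cosmos-tidoop to run only the hadoop command, as any user, without password.
# The runas list (ALL) is what permits "sudo -u <requesting_user> hadoop ..."
cosmos-tidoop ALL=(ALL) NOPASSWD: /usr/bin/hadoop
```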

Once done, change to the fresh new cosmos-tidoop user:

$ su - cosmos-tidoop

Then, clone the fiware-cosmos repository somewhere of your ownership:

$ git clone https://github.com/telefonicaid/fiware-cosmos.git

cosmos-tidoop-api code is located at fiware-cosmos/cosmos-tidoop-api. Change to that directory and execute the installation command:

$ cd fiware-cosmos/cosmos-tidoop-api
$ git checkout release/x.y.z
$ npm install

That will download all the dependencies under a node_modules directory.


Unit tests

To be done.


Configuration

cosmos-tidoop-api is configured through a JSON file (conf/cosmos-tidoop-api.json). These are the available parameters:

  • host: FQDN or IP address of the host running the service. Do not use localhost unless you want only local clients to be able to access the service.
  • port: TCP listening port for incoming API methods invocation. 12000 by default.
  • storage_cluster:
    • namenode_host: FQDN or IP address of the Namenode of the storage cluster.
    • namenode_ipc_port: TCP listening port for Inter-Process Communication used by the Namenode of the storage cluster. 8020 by default.
  • log:
    • file_name: Path of the file where the log traces will be saved on a daily rotation basis. This file must be within the logging folder owned by the cosmos-tidoop user.
    • date_pattern: Date pattern to be appended to the log file name when the log file is rotated.
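
A minimal configuration could look like the following. All values are illustrative (hostnames, paths and the date pattern are assumptions) and must be adapted to your deployment:

```json
{
    "host": "cosmos.example.com",
    "port": 12000,
    "storage_cluster": {
        "namenode_host": "namenode.example.com",
        "namenode_ipc_port": 8020
    },
    "log": {
        "file_name": "/var/log/cosmos/cosmos-tidoop-api/cosmos-tidoop-api.log",
        "date_pattern": ".dd-MM-yyyy"
    }
}
```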


Running

The HTTP server implemented by cosmos-tidoop-api is run as (assuming your current directory is fiware-cosmos/cosmos-tidoop-api):

$ npm start

If everything goes well, you should be able to remotely ask (using a web browser or curl tool) for the version of the software:

$ curl -X GET "http://<host_running_the_api>:12000/tidoop/v1/version"
{"version": "0.1.1"}

cosmos-tidoop-api typically listens on the TCP/12000 port, but you can change it by editing conf/cosmos-tidoop-api.json as seen above.


Administration

There are two sources of data for administration purposes: the logs and the list of submitted jobs.


Logging traces

Logging traces, typically saved under /var/log/cosmos/cosmos-tidoop-api, are the main source of information regarding the API performance. These traces are written in JSON format, having the following fields: level, message and timestamp. For instance:

{"level":"info","message":"Connected to","timestamp":"2015-07-31T08:44:04.624Z"}

Logging levels follow this hierarchy:

debug < info < warn < error < fatal
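
Given that hierarchy, a trace is relevant at a given threshold if its level is equal to or more severe than the threshold. A small sketch (not part of cosmos-tidoop-api) that filters traces accordingly:

```javascript
// Severity order of the logging levels, as per the hierarchy above
var LEVELS = ['debug', 'info', 'warn', 'error', 'fatal'];

// Return the traces whose level is at least as severe as the threshold
function filterTraces(traces, threshold) {
    var min = LEVELS.indexOf(threshold);
    return traces.filter(function (t) {
        return LEVELS.indexOf(t.level) >= min;
    });
}

// Example with traces in the JSON format shown above
var traces = [
    { level: 'info', message: 'Connected to',
      timestamp: '2015-07-31T08:44:04.624Z' },
    { level: 'error', message: 'Some error occurred during the starting of the Hapi server',
      timestamp: '2015-07-31T08:44:05.101Z' }
];
console.log(filterTraces(traces, 'warn').length); // 1: only the error trace remains
```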

Within the log it is expected to find many info messages and just a few warn or error ones. Of special interest are the errors:

  • Some error occurred during the starting of the Hapi server: This message may appear when starting the API. Most probably, the configured host IP address/FQDN does not belong to the physical machine the service is running on, or the configured port is already in use.


Submitted jobs

As an administrator, you can retrieve information regarding submitted jobs via the hadoop job command (it must be said that such a command is the underlying mechanism the REST API uses in order to return information about MapReduce jobs). A complete reference for this command can be found in the official Hadoop documentation.