Configuration

The configuration of hanythingondemand starts with the hod.conf file. This is an INI-style configuration file with at least two sections: Meta and Config. Here is an example taken from the Hadoop 2.3 configs:

[Meta]
version=1

[Config]
modules=Hadoop/2.3.0-cdh5.0.0
master_env=HADOOP_HOME,EBROOTHADOOP,JAVA_HOME
services=resourcemanager.conf,nodemanager.conf,screen.conf
config_writer=hod.config.writer.hadoop_xml
workdir=/tmp
directories=$localworkdir/dfs/name,$localworkdir/dfs/data
autogen=hadoop

Here the Meta section sets version to 1. The version refers to the hanythingondemand configuration version; it is a placeholder in case the configuration format changes. That is all the Meta information that is needed (for now). The following parameters are set in the Config section:

  • autogen - configuration files to autogenerate. This can be hadoop, hadoop_on_lustre2, or left blank. If it is set, hanythingondemand will create a basic configuration for you. This is particularly useful because it calculates values for the memory settings. You can then override any settings you feel necessary.
  • config_writer - a reference to the python code that will output the configuration used by the services.
  • directories - directories to create. If the service would fail without some directories being created, they should be entered here.
  • master_env - environment variables to pass from the master node to the slave nodes. This is used because MPI slaves don’t have an environment.
  • modules - modules that must be loaded when the cluster begins.
  • services - a list of service files containing start and stop script information.
  • workdir - place where logs and temporary data are written. Configuration files will be copied here as well. localworkdir is a subdirectory of workdir and is useful when workdir is on a shared file system.
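
As a rough illustration of how these settings fit together, the example hod.conf above can be read with Python's standard configparser module. This is only a sketch, not hanythingondemand's own loader: the helper names split_list and load_hod_conf are hypothetical, and treating modules, master_env, services and directories as comma-separated lists is an assumption based on the example.

```python
import configparser

HOD_CONF = """\
[Meta]
version=1

[Config]
modules=Hadoop/2.3.0-cdh5.0.0
master_env=HADOOP_HOME,EBROOTHADOOP,JAVA_HOME
services=resourcemanager.conf,nodemanager.conf,screen.conf
config_writer=hod.config.writer.hadoop_xml
workdir=/tmp
directories=$localworkdir/dfs/name,$localworkdir/dfs/data
autogen=hadoop
"""


def split_list(value):
    """Split a comma-separated setting into a list, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_hod_conf(text):
    """Hypothetical helper: parse hod.conf text into a plain dictionary."""
    # interpolation=None keeps '$' and '%' characters literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    config = parser["Config"]
    return {
        "version": parser.getint("Meta", "version"),
        "modules": split_list(config.get("modules", fallback="")),
        "master_env": split_list(config.get("master_env", fallback="")),
        "services": split_list(config.get("services", fallback="")),
        "config_writer": config.get("config_writer"),
        "workdir": config.get("workdir"),
        "directories": split_list(config.get("directories", fallback="")),
        # An empty autogen value means "do not autogenerate anything".
        "autogen": config.get("autogen", fallback=""),
    }


settings = load_hod_conf(HOD_CONF)
print(settings["services"])
```

Note that the template variables in directories are still unexpanded at this point; they are only placeholders until a per-node value is substituted.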

Template parameters

There are some templating variables that can be entered into the configuration files. These use a dollar sign ($) prefix.

  • masterhostname - hostname for the master node.
  • masterdataname - hostname for the Infiniband interface of the master node.
  • hostname - hostname for the local node.
  • hostaddress - ip for the local node.
  • dataname - hostname for the Infiniband interface of the local node.
  • dataaddress - ip for the Infiniband interface of the local node.
  • user - user name of the person running the cluster.
  • pid - process ID.
  • workdir - workdir as defined.
  • localworkdir - subdirectory of workdir qualified using the node name and a pid. This is used for keeping distinct per-node directories on a shared file system.
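
The substitution can be pictured with Python's string.Template, which uses the same dollar-sign prefix. This is a sketch under assumptions: whether hanythingondemand expands variables exactly this way is not stated here, the expand helper is hypothetical, and the localworkdir value below is an invented example, since the exact naming scheme is not documented in this section.

```python
from string import Template


def expand(text, values):
    """Hypothetical helper: replace $name placeholders with their values."""
    return Template(text).substitute(values)


values = {
    "workdir": "/tmp",
    # Invented example value; the real format of localworkdir may differ.
    "localworkdir": "/tmp/node01.example.1234",
}

directories = "$localworkdir/dfs/name,$localworkdir/dfs/data"
expanded_dirs = [expand(item, values) for item in directories.split(",")]

# In string.Template, a doubled dollar sign yields one literal '$', so the
# shell still sees $EBROOTHADOOP and expands it when the command runs.
escaped = expand("$$EBROOTHADOOP/sbin/yarn-daemon.sh start nodemanager", values)

print(expanded_dirs)
print(escaped)
```

Under this reading, a doubled dollar sign is how a setting passes an environment variable reference through to the shell untouched instead of treating it as a template variable.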

Service configs

Service configs have three sections: Unit, Service and Environment. Here is an example:

[Unit]
Name=nodemanager
RunsOn=all

[Service]
ExecStart=$$EBROOTHADOOP/sbin/yarn-daemon.sh start nodemanager
ExecStop=$$EBROOTHADOOP/sbin/yarn-daemon.sh stop nodemanager

[Environment]
YARN_NICENESS=1 /usr/bin/ionice -c2 -n0 /usr/bin/hwloc-bind socket:0
HADOOP_CONF_DIR=$localworkdir/conf
YARN_LOG_DIR=$localworkdir/log
YARN_PID_DIR=$localworkdir/pid

  • Name - name of the service.
  • RunsOn - (all|master|slave). Determines on which node or group of nodes the service runs.
  • ExecStartPre - script to run before starting the service, e.g. used in HDFS to run the -format script.
  • ExecStart - script to start the service.
  • ExecStop - script to stop the service.
  • Environment - environment variable definitions used for the service.
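
To make the layout concrete, the nodemanager example can be parsed with Python's configparser. This is an illustrative sketch, not hanythingondemand's own parser: load_service and VALID_RUNS_ON are hypothetical names, and the sketch assumes the start and stop commands live in the section named Service, as in the example above.

```python
import configparser

SERVICE_CONF = """\
[Unit]
Name=nodemanager
RunsOn=all

[Service]
ExecStart=$$EBROOTHADOOP/sbin/yarn-daemon.sh start nodemanager
ExecStop=$$EBROOTHADOOP/sbin/yarn-daemon.sh stop nodemanager

[Environment]
YARN_NICENESS=1 /usr/bin/ionice -c2 -n0 /usr/bin/hwloc-bind socket:0
HADOOP_CONF_DIR=$localworkdir/conf
YARN_LOG_DIR=$localworkdir/log
YARN_PID_DIR=$localworkdir/pid
"""

VALID_RUNS_ON = {"all", "master", "slave"}


def load_service(text):
    """Hypothetical helper: parse a service file into a dictionary."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser lowercases keys by default; keep names such as
    # ExecStart and YARN_LOG_DIR exactly as written.
    parser.optionxform = str
    parser.read_string(text)

    runs_on = parser.get("Unit", "RunsOn")
    if runs_on not in VALID_RUNS_ON:
        raise ValueError("RunsOn must be one of all, master or slave")

    return {
        "name": parser.get("Unit", "Name"),
        "runs_on": runs_on,
        # ExecStartPre is optional, so fall back to None when absent.
        "start_pre": parser.get("Service", "ExecStartPre", fallback=None),
        "start": parser.get("Service", "ExecStart"),
        "stop": parser.get("Service", "ExecStop"),
        "environment": dict(parser["Environment"]),
    }


service = load_service(SERVICE_CONF)
print(service["name"], service["runs_on"])
```

The values are returned unexpanded, so $localworkdir and $$EBROOTHADOOP still need template substitution before the commands are run.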

Autogenerated configuration

Autogenerating configurations is a powerful feature that lets you run services inside hanythingondemand on new clusters without having to calculate all the memory settings by hand.

For example,

  • If your administrators installed a brand new cluster with a large amount of memory available, you don’t have to create a bunch of new configuration files to reflect the new system. It should all work seamlessly.
  • If you are holding a class and would like to allocate each student half a node, the students can use autogenerated settings along with --rm-ppn=<half-the-number-of-cores>.

To autogenerate some configurations, set the autogen setting to an appropriate value in the Config section.
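
To give a feel for the kind of arithmetic that autogeneration saves you from, here is a purely illustrative sketch. It is not the formula hanythingondemand uses: the function memory_share_mb, the proportional rule and the node sizes are all assumptions made up for this example.

```python
def memory_share_mb(total_memory_mb, total_cores, ppn):
    """Hypothetical rule: give a job memory in proportion to its cores."""
    if not 0 < ppn <= total_cores:
        raise ValueError("ppn must be between 1 and the number of cores")
    return total_memory_mb * ppn // total_cores


# Invented node: 16 cores and 65536 MB of memory, shared by two students
# who each request half the cores.
half_node_mb = memory_share_mb(65536, 16, 8)
print(half_node_mb)  # 32768
```

With hand-written configuration files, a value like this would have to be recomputed and edited for every new node type or allocation size.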

Preview configuration

To preview the output configuration files that your hod.conf file would produce, one can use the genconfig command:

hod genconfig --hodconf=/path/to/hod.conf --workdir=/path/to/workdir

Here, --workdir is the output directory (which will be created if it doesn’t yet exist) and --hodconf is the input configuration file.
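
If you want to drive the preview from a script, the command line can be assembled in Python. This is a minimal sketch: genconfig_command is a hypothetical helper, and it only uses the two options shown above.

```python
import shlex


def genconfig_command(hodconf, workdir):
    """Hypothetical helper: build the argument list for 'hod genconfig'."""
    return [
        "hod",
        "genconfig",
        f"--hodconf={hodconf}",
        f"--workdir={workdir}",
    ]


command = genconfig_command("/path/to/hod.conf", "/path/to/workdir")
print(shlex.join(command))
```

Passing the resulting list to subprocess.run(command, check=True) would run the preview, provided the hod command is available on your PATH.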