Configuration
All the configuration files of the HPCStats components are formatted like an
INI file, with sections between square brackets ([]) and parameter/value
pairs separated by an equal sign (=).
The various sections and parameters of these files are fully documented in the following sub-sections.
HPCStats server
The configuration file of the HPCStats server component is located at
/etc/hpcstats/hpcstats.conf.
The first section is clusters (required). It must contain the following
parameter:

- clusters (required): a list of cluster names separated by commas (,). This list is then considered as the official list of clusters supported by the HPCStats server component.
The hpcstatsdb section (required) contains all the parameters to access
the central PostgreSQL HPCStats database. It must contain the following
parameters:

- hostname (required): The network hostname or the IP address of the PostgreSQL server.
- dbname (required): The name of the database.
- port (required): The TCP port the PostgreSQL server listens on for incoming connections (note: the default TCP port of PostgreSQL is 5432).
- user (required): The user name to authenticate to the PostgreSQL server.
- password (required): The password to authenticate to the PostgreSQL server.
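As an illustration, these parameters map onto a standard PostgreSQL connection. Here is a minimal sketch, assuming Python's standard configparser module, that reads the [hpcstatsdb] section and builds a libpq-style connection string (the values below are placeholders, not real credentials):

```python
import configparser

# Read a [hpcstatsdb] section and build a libpq-style connection string.
# This is an illustrative sketch only; the values are placeholders.
conf = configparser.ConfigParser()
conf.read_string("""
[hpcstatsdb]
hostname = localhost
dbname = hpcstatsdb
port = 5432
user = hpcstats
password = secret
""")

db = conf["hpcstatsdb"]
dsn = "host={0} port={1} dbname={2} user={3} password={4}".format(
    db["hostname"], db["port"], db["dbname"], db["user"], db["password"])
```

In a real deployment, conf.read("/etc/hpcstats/hpcstats.conf") would be used instead of read_string.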
The constraints section (optional) has several parameters that control
how HPCStats should behave when importing production data from sources that do
not strictly respect all constraints required by the HPCStats database schema.
It can contain the following parameters:

- strict_user_membership: This parameter controls how the HPCStats UserImporterLdap connector should behave when a user is a member of the group in the LDAP directory but has no account in this LDAP directory. If set to True (default), HPCStats will fail (and stop running) when such an incoherency is encountered. If set to False, HPCStats will just print a warning message, discard this user and keep running.
- strict_job_project_binding: This parameter controls how the HPCStats Job importer category should behave when a job is linked to a project that has not been previously loaded by the Project importer category. If set to True (default), HPCStats will fail (and stop running). If set to False, HPCStats will just print a warning message and set the project reference to NULL in the Job table of the HPCStats database.
- strict_job_businesscode_binding: This parameter is basically the same as strict_job_project_binding but for business codes. The possible values are True (default) and False.
- strict_job_account_binding: This parameter controls how the HPCStats Job importer category should behave when importing a job submitted by an account unknown to the User importer category. When set to True (default), HPCStats will fail (and stop running) when such a job is encountered. If set to False, HPCStats will just print a warning message and skip the job.
- strict_job_wckey_format: This parameter controls how the JobImporterSlurm connector should behave when importing a job with a badly formatted wckey. If set to True (default), HPCStats will fail (and stop running) when such a job is encountered. If set to False, HPCStats will just print a warning message and ignore the wckey.
- ignored_errors (optional): A comma-separated list of error codes to ignore. If encountered during the importation process, these errors will be reported as debug messages instead of warnings. All possible error codes are available in table Error management.
If the constraints
section is missing, default values are assumed for all
parameters.
The globals section (optional) defines which connectors must be used for
the projects and business importer categories. It can contain the following
parameters:

- business (optional): Possible values are dummy (default), csv and slurm.
- projects (optional): Possible values are dummy (default), csv and slurm.
If the globals
section is missing, default values are assumed for all
parameters.
If the business parameter is set to csv, then the business section
(otherwise optional) must be present with this parameter:

- file (optional): The absolute path to the business codes CSV file.
If the projects parameter is set to csv or slurm, then the
projects section (otherwise optional) must be present with these parameters:

- file (optional): for the ProjectImporterCSV connector, the absolute path to the projects CSV file.
- default_domain_key (optional): for the ProjectImporterSlurm connector, the key of the default domain associated to created projects.
- default_domain_name (optional): for the ProjectImporterSlurm connector, the name of the default domain associated to created projects.
The configuration file must also contain one section per cluster declared in
the clusters parameter list. The section name must be the cluster name. These
sections must contain the following parameters to specify which connectors must
be used by HPCStats for each importer category on these clusters:

- architecture (required): The only possible value is archfile.
- users (required): The possible values are ldap and ldap+slurm.
- fsusage (required): Possible values are dummy and ssh.
- events (required): The only possible value is slurm.
- jobs (required): The only possible value is slurm.
Then, the other sections depend on the connectors used for the cluster.
The <cluster>/archfile section (optional) is required by the
ArchitectureImporterArchfile connector. It must contain the
following parameter:

- file (required): The absolute path to the architecture file which describes the components of the cluster.
The <cluster>/ldap section (optional) is required by the
UserImporterLdap and UserImporterLdapSlurm connectors.
It must contain the following parameters:

- url (required): The URL to connect to the LDAP server, with its protocol and optionally the TCP port. Ex: ldaps://ldap.company.tld/ or ldap://ldap.company.tld:636/.
- dn (required): The distinguished name of the user for binding to the LDAP server.
- phash (required): The hashed and salted password of dn for binding to the LDAP server.
- cert (optional): The absolute path to the CA certificate used to check the LDAP server certificate. Default is None, which means the server certificate is checked against all CA certificates available on the system.
- basedn (required): The base distinguished name to look for groups and users in the LDAP directory tree. Ex: dc=company,dc=tld.
- rdn_people (optional): The relative distinguished name of the subtree to search for users. Default is ou=people.
- rdn_groups (optional): The relative distinguished name of the subtree to search for groups. Default is ou=groups.
- group (DEPRECATED): The name of the group of users of the cluster. Should be replaced by groups.
- groups (required): A comma-separated list of groups of users of the cluster. For compatibility reasons, it can be omitted if group is set.
- group_dpt_search (required): The regular expression to restrict the search of users' secondary groups to find their department.
- group_dpt_regexp (required): The regular expression to extract the department name of the user out of a group name. Ex: cn=(.+)-dp-(.+),ou.*.
- default_subdir (optional): The default subdirection assigned to users whose real department cannot be determined from their group memberships. This default subdirection is concatenated to the name of the user primary group. Default is unknown.
- groups_alias_file (optional): The absolute path to a file which defines aliases for primary group names. With these aliases, it is possible to substitute the primary group name with a more appropriate direction name in the resulting user department name. The file must be formatted with one alias per line, each alias being the primary group name and the alias separated by a whitespace (ex: group_name alias). If this parameter is not defined, no aliasing is involved.
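For instance, a groups_alias_file could look like the following, with one primary group name and its alias per line (the group and direction names here are purely illustrative):

```text
physgrp Physics
biogrp Biology
```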
The <cluster>/slurm section (optional) is required by the
ProjectImporterSlurm, BusinessCodeImporterSlurm,
UserImporterLdapSlurm, EventImporterSlurm and
JobImporterSlurm connectors. It must contain the following
parameters:

- host (required): The network hostname or the IP address of the SlurmDBD MySQL (or MariaDB) server.
- name (required): The name of the MySQL database that contains the SlurmDBD accounting data (hint: the value is probably slurm_acct_db).
- user (required): The name of the user to authenticate on the MySQL server.
- password (optional): The password of the user to authenticate on the MySQL server. Default is None, ie. no password.
- window_size (optional): The size of the window of loaded jobs. When this parameter is set to a value N above 0, new jobs will be loaded by JobImporterSlurm in windowed mode, N jobs at a time, until there are no more jobs to load. If set to 0 (default), all jobs will be loaded at once, which can lead to high memory consumption when there are too many jobs. It is recommended to set this value to avoid memory over-consumption during jobs import.
- prefix (optional): The prefix of the SlurmDBD database table names. The default value is the cluster name. This parameter might be useful only in some corner cases, when someone wants the cluster name in HPCStats to be different from the Slurm cluster name.
- partitions (optional): A comma-separated list of Slurm partitions to which imported data (jobs, projects, business codes, etc.) are restricted. Data on other partitions are ignored by HPCStats for this cluster. By default, HPCStats imports data from all Slurm partitions of the cluster without any restriction.
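The windowed loading behaviour enabled by window_size can be sketched as follows. This is an illustration only, not the actual JobImporterSlurm code; fetch_jobs stands in for a hypothetical query against the SlurmDBD database returning at most window_size jobs starting at a given offset:

```python
def load_jobs_windowed(fetch_jobs, window_size):
    """Yield jobs window by window until the source is exhausted.

    Illustrative sketch of windowed mode: instead of loading all jobs
    at once, at most window_size jobs are held in memory at a time.
    """
    offset = 0
    while True:
        window = fetch_jobs(offset, window_size)
        if not window:
            break
        for job in window:
            yield job
        offset += len(window)

# Usage example with a fake in-memory job source:
jobs = list(range(10))

def fake_fetch(offset, limit):
    return jobs[offset:offset + limit]

loaded = list(load_jobs_windowed(fake_fetch, 3))
```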
The <cluster>/fsusage section (optional) is required by the
FSUsageImporterSSH connector. It must contain the following
parameters:

- host (required): The network hostname or the IP address of the cluster node on which the fsusage agent runs and to which HPCStats should connect.
- name (required): The user name to authenticate on the remote cluster node.
- privkey (required): The absolute path to the SSH private key file to authenticate on the remote cluster node.
- file (required): The absolute path of the remote CSV file to read and parse for new filesystem usage metrics.
- timestamp_fmt (optional): The format of the timestamps written in the CSV file. Default value is %Y-%m-%dT%H:%M:%S.%fZ.
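The default timestamp_fmt follows strptime-style directives. As an illustration, a timestamp from the fsusage CSV file can be parsed with Python's datetime module (the sample value below is made up):

```python
from datetime import datetime

# Parse a CSV timestamp with the default timestamp_fmt directives.
# The sample timestamp is illustrative.
fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
ts = datetime.strptime("2015-06-18T14:32:10.123456Z", fmt)
```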
All sections and parameters of the HPCStats server component configuration file have now been covered. Here is a complete annotated configuration file example with two clusters, cluster1 and cluster2:
# Example of hpcstats.conf
# Most of the default values can be changed and have to be adapted.
# This file is not meant to be used as-is: you might have to tune it.
###############################################################################
[clusters]
clusters = cluster1,cluster2
[hpcstatsdb]
hostname = localhost
dbname = hpcstatsdb
port = 5432
user = <myuser>
password = <password>
# Importer constraints
###############################################################################
[constraints]
# This parameter specifies how hpcstats should behave when a user is a member
# of the group in LDAP directory but has no account in this LDAP directory.
# If set to True (default value), hpcstats will fail (and stop running) when
# such error is encountered. If set to False, hpcstats will simply print a
# warning message, ignore this user and keep running.
strict_user_membership=True
# These parameters define whether hpcstats should either fail or just print an
# error if an imported job is linked to a project (or respectively a business
# code) that has not been loaded by ProjectImporter (and BusinessCodeImporter)
# previously. If set to True (default value), hpcstats will fail (and stop
# running). Else hpcstats will just print a warning and will set project and
# business references to NULL in HPCStatsDB.
strict_job_project_binding=True
strict_job_businesscode_binding=True
# This parameter controls how hpcstats should behave when a loaded job was
# submitted by an account that has not been loaded by UserImporter. When set
# to True, hpcstats will fail (and stop running) when such job is encountered.
# Else, hpcstats will just print a warning message and skip the job. Default
# value is True.
strict_job_account_binding=True
# This parameter controls if hpcstats should fail or just print an error when
# a job loaded by JobImporterSlurm has a wckey in a wrong format. When set to
# True (default) hpcstats will fail, when set to False hpcstats will simply
# print an error message.
strict_job_wckey_format=True
# Comma-separated list of errors to ignore during the importation process
#ignored_errors =
# Global parameters
###############################################################################
[globals]
business = csv
projects = slurm
[business]
file = <absolute path to CSV file>
[projects]
default_domain_key = dft
default_domain_name = Default domain
# Cluster 1
###############################################################################
[cluster1]
architecture = archfile
users = ldap
fsusage = ssh
events = slurm
jobs = slurm
[cluster1/archfile]
file = /path/to/archfile
[cluster1/ldap]
url = ldaps://<ldapuri>/
dn = <dn>
basedn = <basedn>
rdn_people = ou=people
rdn_groups = ou=groups
phash = <password>
cert = /path/to/cert
groups = <group1>,<group2>
group_dpt_search = *dp*
group_dpt_regexp = cn=(\w+)-dp-(\w+),ou=.*
# If the department cannot be defined based on user groups membership, it is
# defined based on the user primary group in LDAP directory and this default
# subdirection.
#default_subdir = unknown
# Optional alias file to associate group names to the organization directions
# for department names based on users primary group name.
#groups_alias_file = /etc/hpcstats/groups.alias
[cluster1/slurm]
host = <slurm_mysql_db_ip>
name = slurm_acct_db
user = slurm
password = <slurmpasswd>
# When this parameter is set to a value N above 0, the new jobs will be
# loaded by JobImporterSlurm in windowed mode, N jobs at a time, until there
# are no jobs to load anymore. If set to 0 (default value), all jobs will be
# loaded at once and this can lead to a lot of memory consumption when there
# too many jobs. It is recommended to set this value to limit memory
# consumption during jobs import.
window_size = 1000
# SlurmDBD clusters specific table names prefix. Default is cluster name.
# prefix = cluster
# Optionally restrict imported data from slurm (jobs, events, projects) to a
# specific list of slurm partitions of a cluster. The list is comma separated.
# partitions = compute,graphics
[cluster1/fsusage]
host = <host_ip>
name = <username>
file = <absolute path to CSV file>
# The format of the timestamp to parse in remote log file.
# Default value is: %Y-%m-%dT%H:%M:%S.%fZ
timestamp_fmt = %Y-%m-%dT%H:%M:%S.%fZ
# Cluster2
###############################################################################
[cluster2]
architecture = archfile
users = ldap
fsusage = ssh
events = slurm
jobs = slurm
[cluster2/archfile]
file = /path/to/archfile
[cluster2/ldap]
url = ldaps://<ldapuri>/
dn = <dn>
basedn = <basedn>
phash = <password>
groups = <group1>
group_dpt_search = *dp*
group_dpt_regexp = cn=(.+)-dp-(.+),ou.*
# If the department cannot be defined based on user groups membership, it is
# defined based on the user primary group in LDAP directory and this default
# subdirection.
# default_subdir = unknown
[cluster2/slurm]
host = <slurm_mysql_db_ip>
name = slurm_acct_db
user = slurm
password = <slurmpasswd>
[cluster2/fsusage]
host = <host_ip>
name = <username>
file = <absolute path to CSV file>
Agents and launcher
FSUsage agent
The configuration file of the fsusage agent is located at
/etc/hpcstats/fsusage.conf.
This file contains only one global section (required) with the following
parameters:

- fs (required): The list, separated by commas (,), of the mount points of the filesystems to monitor.
- csv (required): The absolute path of the CSV file where the filesystem usage rates are recorded.
- maxsize (required): The maximum size in MB of the CSV file. When this size is reached, the fsusage agent removes the first two thirds of the file to significantly reduce its size.
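The maxsize truncation behaviour can be sketched as follows. This is an illustrative sketch only, not the actual agent code; it drops lines, while the real agent may operate differently:

```python
import os
import tempfile

def truncate_csv(path, maxsize_mb):
    """Drop the first two thirds of the file once it exceeds maxsize MB.

    Illustrative, line-based sketch of the behaviour described above;
    the real fsusage agent may implement the truncation differently.
    """
    if os.path.getsize(path) < maxsize_mb * 1024 * 1024:
        return
    with open(path) as f:
        lines = f.readlines()
    with open(path, "w") as f:
        f.writelines(lines[2 * len(lines) // 3:])  # keep the last third

# Usage example: with maxsize set to 0 MB the file is always truncated.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("".join("line%d\n" % i for i in range(9)))
truncate_csv(tmp.name, 0)
with open(tmp.name) as f:
    remaining = f.readlines()
os.unlink(tmp.name)
```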
Here is a complete annotated HPCStats fsusage agent configuration file example:
[global]
fs=/home,/scratch
csv=/var/lib/hpcstats/fsusage.csv
maxsize=2
JobStats agent
The configuration file of the jobstats agent is located at
/etc/hpcstats/jobstats.conf.
This file contains a global section (required) with the following
parameters:

- tpl (required): The absolute path of the source template job submission script.
- script (required): The absolute path of the output generated job submission script.
- subcmd (required): The command to use for submitting the jobs. Ex: sbatch.
Then, the file must also contain a vars section (required). This section
contains all the variables used in the template job submission script:

- name (required): The name of the jobs.
- ntasks (required): The number of CPUs allocated for the jobs.
- error (required): The absolute path of the error output logging file.
- output (required): The absolute path of the standard output logging file.
- time (required): The maximum running time of the jobs (in minutes).
- partition (required): The name of the partition in which the jobs will be submitted.
- qos (required): The name of the QOS in which the jobs will be submitted.
- wckey (required): The wckey of the jobs.
- fs (required): The list, separated by whitespace, of filesystem mount points to check in the jobs.
- log (required): The absolute path of the file where all check results will be recorded.
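As an illustration of how the [vars] section can be injected into the template script to produce the generated submission script: the sketch below assumes shell-style ${var} placeholders and a made-up template fragment, neither of which is necessarily the actual jobstats template format:

```python
from string import Template

# Illustrative only: substitute [vars] values into a template job
# submission script. The ${...} placeholder syntax and the template
# content are assumptions, not the actual jobstats template.
tpl = Template(
    "#SBATCH --job-name=${name}\n"
    "#SBATCH --ntasks=${ntasks}\n"
)
script = tpl.substitute({"name": "STATS", "ntasks": "1"})
```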
Here is a complete annotated HPCStats jobstats agent configuration file example:
[global]
tpl=/usr/share/hpcstats/bin/jobstats.tpl.sh
script=/tmp/jobstats.sh
subcmd=sbatch
[vars]
name=STATS
ntasks=1
error=/tmp/jobstats.err.log
output=/tmp/jobstats.out.log
partition=cn
time=5
qos=dsp-ap-hpcstats
wckey=dsp:hpcstats
fs=/home /scratch
log=/tmp/jobstats.log
JobStats launcher
The configuration file of the jobstats launcher is located at
/etc/hpcstats/launcher.conf.
This file contains a global section (required) with the following
parameter:

- clusters (required): a list of cluster names separated by commas (,).
Then, for each cluster present in the list, a dedicated section must be present, named after the cluster name. These sections must contain the following parameters:
- frontend (required): The network hostname or the IP address of the cluster frontend to which the launcher should connect to launch the jobstats agent.
- user (required): The user name to authenticate on the remote cluster frontend node.
- privkey (required): The absolute path to the SSH private key file to authenticate on the remote cluster frontend node.
- script (required): The absolute path to the jobstats agent.
Here is a complete annotated HPCStats jobstats launcher configuration file example:
[global]
clusters=cluster1,cluster2
[cluster1]
user=hpcstats
frontend=frontend.cluster1.company.tld
privkey=/home/hpcstats/.ssh/id_rsa
script=/usr/share/hpcstats/jobstats
[cluster2]
user=hpcstats
frontend=frontend.cluster2.company.tld
privkey=/home/hpcstats/.ssh/id_rsa
script=/usr/share/hpcstats/jobstats