Jobs Package

This package contains all HPCStats JobImporters.

JobImporter Module

This module contains the base class for all Job importers.

class HPCStats.Importer.Jobs.JobImporter.JobImporter(app, db, config, cluster)

Bases: HPCStats.Importer.Importer.Importer

This is the base class common to all HPCStats Job importers. It simply defines a common set of attributes.

JobImporterFactory Module

This module contains the factory design pattern class that builds the appropriate JobImporter depending on what is specified in configuration.

class HPCStats.Importer.Jobs.JobImporterFactory.JobImporterFactory

Bases: object

This class simply provides the factory() static method; there is no point in instantiating it.

static factory(app, db, config, cluster)

This method returns the appropriate JobImporter object depending on what is specified in configuration. In case of a configuration error, an HPCStatsConfigurationException is raised.
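
A minimal sketch of this kind of configuration-driven dispatch follows. It assumes a hypothetical configuration layout where config[cluster]["jobs"] names the importer, uses a plain dict instead of the real configuration object and a string instead of a Cluster object, and replaces the real classes with stubs; it illustrates the factory pattern, not HPCStats' actual implementation.

```python
class HPCStatsConfigurationException(Exception):
    """Stand-in for the HPCStats exception of the same name."""


class JobImporterSlurm(object):
    """Stub standing in for the real Slurm JobImporter."""

    def __init__(self, app, db, config, cluster):
        self.app = app
        self.db = db
        self.config = config
        self.cluster = cluster


class JobImporterFactory(object):
    """Builds the JobImporter named in configuration (sketch)."""

    # Hypothetical mapping from configuration value to importer class.
    IMPORTERS = {"slurm": JobImporterSlurm}

    @staticmethod
    def factory(app, db, config, cluster):
        # Hypothetical layout: config[cluster]["jobs"] names the importer.
        try:
            name = config[cluster]["jobs"]
        except KeyError:
            raise HPCStatsConfigurationException(
                "no jobs importer configured for cluster %s" % cluster)
        importer_cls = JobImporterFactory.IMPORTERS.get(name)
        if importer_cls is None:
            raise HPCStatsConfigurationException(
                "unsupported jobs importer: %s" % name)
        return importer_cls(app, db, config, cluster)
```

Keeping the name-to-class mapping in one table means supporting another batch scheduler only requires registering one more entry.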

JobImporterSlurm Module

This module contains the JobImporterSlurm class.

class HPCStats.Importer.Jobs.JobImporterSlurm.JobImporterSlurm(app, db, config, cluster)

Bases: HPCStats.Importer.Jobs.JobImporter.JobImporter

This JobImporter imports job-related data from the Slurm accounting database.

check()

Check whether the cluster's Slurm database is available for connection.

connect_db()

Connect to the cluster's Slurm database and set the conn/cur attributes accordingly. Raises HPCStatsSourceError in case of a problem.

create_runs(nodelist, job)

Create all Run objects for the job given in parameter, on all the nodes in nodelist.
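
As an illustration of the job-to-node relationship, here is a sketch assuming hypothetical minimal Job and Run records (a Run simply pairs a job with one node name) and a nodelist already expanded into individual node names. The real HPCStats Run model and its attributes may differ.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical, minimal job record."""
    batch_id: int


@dataclass
class Run:
    """Hypothetical record linking one job to one node it ran on."""
    job: Job
    node: str


def create_runs(nodes, job):
    """Return one Run per node name in `nodes`, all bound to `job`."""
    return [Run(job=job, node=node) for node in nodes]
```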

disconnect_db()

Disconnect from the cluster's Slurm database.

static get_job_state_from_slurm_state(state)

Returns the human-readable textual representation of the job state corresponding to the numeric state given in parameter.

From slurm.h.inc:

enum job_states {
    JOB_PENDING, /* queued waiting for initiation */
    JOB_RUNNING, /* allocated resources and executing */
    JOB_SUSPENDED, /* allocated resources, execution suspended */
    JOB_COMPLETE, /* completed execution successfully */
    JOB_CANCELLED, /* cancelled by user */
    JOB_FAILED, /* completed execution unsuccessfully */
    JOB_TIMEOUT, /* terminated on reaching time limit */
    JOB_NODE_FAIL, /* terminated on node failure */
    JOB_PREEMPTED, /* terminated due to preemption */
    JOB_BOOT_FAIL, /* terminated due to node boot failure */
    JOB_END /* not a real state, last entry in table */
};
#define JOB_STATE_BASE  0x00ff  /* Used for job_states above */
#define JOB_STATE_FLAGS 0xff00  /* Used for state flags below */
#define JOB_COMPLETING  0x8000  /* Waiting for epilog completion */
#define JOB_CONFIGURING 0x4000  /* Allocated nodes booting */
#define JOB_RESIZING    0x2000  /* Size of job about to change, flag set
                                 * before calling accounting functions
                                 * immediately before job changes size */
#define JOB_SPECIAL_EXIT 0x1000 /* Requeue an exit job in hold */
#define JOB_REQUEUE_HOLD 0x0800 /* Requeue any job in hold */
#define JOB_REQUEUE      0x0400 /* Requeue job in completing state */
#define JOB_STOPPED      0x0200 /* Job is stopped state (holding resources,
                                 * but sent SIGSTOP) */
#define JOB_LAUNCH_FAILED 0x0100
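
Since the low byte (JOB_STATE_BASE) carries the job_states value and the high byte (JOB_STATE_FLAGS) carries independent flag bits, a raw state has to be masked before it can be looked up. The sketch below shows that decoding under stated assumptions: the function name decode_slurm_state is hypothetical, and the textual names follow sacct-style conventions rather than the exact strings HPCStats returns.

```python
# Mask and flag values copied from the slurm.h excerpt above.
JOB_STATE_BASE = 0x00FF
STATE_FLAGS = [
    (0x8000, "COMPLETING"),
    (0x4000, "CONFIGURING"),
    (0x2000, "RESIZING"),
    (0x1000, "SPECIAL_EXIT"),
    (0x0800, "REQUEUE_HOLD"),
    (0x0400, "REQUEUE"),
    (0x0200, "STOPPED"),
    (0x0100, "LAUNCH_FAILED"),
]

# Position i holds the name for enum job_states value i (JOB_PENDING == 0).
# These sacct-style names are an assumption, not HPCStats' exact strings.
BASE_STATES = [
    "PENDING", "RUNNING", "SUSPENDED", "COMPLETED", "CANCELLED",
    "FAILED", "TIMEOUT", "NODE_FAIL", "PREEMPTED", "BOOT_FAIL",
]


def decode_slurm_state(state):
    """Split a numeric Slurm job state into (base state name, flag names)."""
    base = state & JOB_STATE_BASE
    if base >= len(BASE_STATES):  # JOB_END or an unknown value
        raise ValueError("unknown Slurm job state: %d" % state)
    flags = [name for mask, name in STATE_FLAGS if state & mask]
    return BASE_STATES[base], flags
```

For example, a running job waiting for its epilog has state JOB_RUNNING | JOB_COMPLETING (0x8001), which decodes to ("RUNNING", ["COMPLETING"]).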

get_jobs_after_batchid(batchid, window_size=0)

Fill the jobs attribute with the list of Jobs found in the Slurm DB whose id_job is greater than or equal to the batchid given in parameter. Returns the last batch_id found.
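
To illustrate the windowed query, here is a sketch against an in-memory SQLite table standing in for the Slurm accounting database. Several details are assumptions made only for this example: the hypothetical table name job_table and its two columns, the convention that window_size=0 means "no limit", and returning the jobs alongside the last batch_id instead of filling an attribute. The real Slurm schema is much richer.

```python
import sqlite3


def get_jobs_after_batchid(conn, batchid, window_size=0):
    """Return (jobs, last_batch_id) for jobs whose id_job >= batchid.

    Assumption: window_size == 0 means no limit. job_table and its
    columns are hypothetical stand-ins for the Slurm job table.
    """
    query = ("SELECT id_job, state FROM job_table "
             "WHERE id_job >= ? ORDER BY id_job")
    params = [batchid]
    if window_size:
        query += " LIMIT ?"
        params.append(window_size)
    jobs = conn.execute(query, params).fetchall()
    # Last id_job of the window, or None when nothing was found.
    last_batch_id = jobs[-1][0] if jobs else None
    return jobs, last_batch_id
```

Ordering by id_job is what makes "the last batch_id found" meaningful: it is the highest id in the window, so the next window can start right after it.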

get_search_batch_id()

Determine and return the oldest batch_id from which to search for updates in the Slurm database.

job_partition(job_id, partitions_str, nodelist)

Return a single partition name based on the partition and nodelist fields of the job record in the Slurm DB.

The partitions_str parameter is the partition field of the Slurm DB job table. It is a string representing a comma-separated list of partitions. Ex: ‘partition1,partition2’

The nodelist parameter is the nodelist field of the Slurm DB job table. It is a string representing the nodeset allocated to the job. Ex: ‘node[001-100]’
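
The bracketed nodelist notation packs many node names into one string. Below is a minimal sketch that expands the simple form shown above. It assumes comma-separated items with at most one bracket group each, containing numeric ranges whose zero padding follows the range start. The function names are hypothetical. It does not cover every nodeset syntax (for example several bracket groups in one name), for which a dedicated library such as ClusterShell's NodeSet is better suited.

```python
import re

# One bracket group with optional text around it, e.g. "node[001-003,010]".
_BRACKET = re.compile(r"^([^\[\]]*)\[([^\[\]]+)\](.*)$")


def _split_top_level(nodelist):
    """Split a nodelist on the commas that are outside brackets."""
    items, depth, current = [], 0, ""
    for char in nodelist:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            items.append(current)
            current = ""
        else:
            current += char
    if current:
        items.append(current)
    return items


def expand_nodelist(nodelist):
    """Expand a simple Slurm-style nodelist into individual node names."""
    nodes = []
    for item in _split_top_level(nodelist):
        match = _BRACKET.match(item)
        if match is None:
            nodes.append(item)  # plain node name such as "login1"
            continue
        prefix, ranges, suffix = match.groups()
        for part in ranges.split(","):
            if "-" in part:
                start, end = part.split("-")
                width = len(start)  # keep zero padding, e.g. "001"
                for i in range(int(start), int(end) + 1):
                    nodes.append(prefix + str(i).zfill(width) + suffix)
            else:
                nodes.append(prefix + part + suffix)
    return nodes
```

With this helper, ‘node[001-100]’ expands to the 100 names node001 through node100.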

If the partition list has only one element, this element is the resulting partition to record in HPCStatsDB. If the list has multiple elements, the function looks through the nodelist to find the corresponding partition loaded by ArchitectureImporter.

If the nodelist is not specified, the function arbitrarily selects the first element of the partition list.

Otherwise, it searches through the job partitions loaded by ArchitectureImporter to find those whose nodelist fully covers the job's nodelist (every node allocated to the job belongs to it). Once this list is found, it looks for one of the job's partitions in it and returns it if found.
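
The selection rules above can be sketched as follows. The sketch assumes the partitions loaded by ArchitectureImporter are available as a hypothetical dict mapping each partition name to the set of its node names, and that the job's nodelist is already expanded into a set. Behaviour when no partition matches is not described above, so raising ValueError here is only a placeholder.

```python
def job_partition(job_id, partitions_str, job_nodes, cluster_partitions):
    """Pick one partition for a job (sketch of the documented rules).

    job_nodes: set of node names allocated to the job, or None.
    cluster_partitions: hypothetical {partition name: set of node names}.
    """
    partitions = partitions_str.split(",")
    if len(partitions) == 1:
        return partitions[0]
    if not job_nodes:
        # No nodelist: arbitrarily keep the first listed partition.
        return partitions[0]
    # Partitions that contain every node allocated to the job.
    covering = {name for name, nodes in cluster_partitions.items()
                if job_nodes <= nodes}
    for partition in partitions:
        if partition in covering:
            return partition
    # Placeholder: the documented behaviour for this case is unspecified.
    raise ValueError("job %s: no partition in %r covers its nodes"
                     % (job_id, partitions_str))
```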

load()

Load jobs from Slurm DB.

load_update_window()

Load and update jobs in windowed mode until there are no more jobs to load from the Slurm database.
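
The windowed mode can be pictured as a loop: load one window, save it, then resume just past the last batch_id seen, stopping when a window comes back empty. In this sketch, fetch_window and save are hypothetical callables standing in for load_window() and update(), and both the stopping rule and the "last batch_id + 1" restart point are assumptions.

```python
def load_update_window(start_batch_id, fetch_window, save):
    """Process jobs window by window until no job is left.

    fetch_window(batch_id) returns a list of (batch_id, job) pairs with
    batch_id >= the argument, sorted by batch_id (hypothetical contract).
    save(jobs) persists one window. Returns the number of jobs processed.
    """
    total = 0
    batch_id = start_batch_id
    while True:
        window = fetch_window(batch_id)
        if not window:
            break  # nothing left to load
        save([job for _, job in window])
        total += len(window)
        # Resume just after the last batch_id of this window.
        batch_id = window[-1][0] + 1
    return total
```

Working one bounded window at a time keeps memory usage flat regardless of how many jobs the Slurm database holds.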

load_window(batch_id)

Load a limited number of jobs (limited to the window size) starting from the batch_id given in parameter, and fill the jobs list attribute accordingly.

update()

Update and save loaded Jobs in HPCStats DB.