Jobs Package¶
This package contains all HPCStats JobImporters.
JobImporter
Module¶
This module contains the base class for all Job importers.
-
class
HPCStats.Importer.Jobs.JobImporter.
JobImporter
(app, db, config, cluster)¶ Bases:
HPCStats.Importer.Importer.Importer
This is the base class common to all HPCStats Job importers. It simply defines a common set of attributes.
JobImporterFactory
Module¶
This module contains the factory design pattern class that builds the appropriate JobImporter depending on what is specified in configuration.
-
class
HPCStats.Importer.Jobs.JobImporterFactory.
JobImporterFactory
¶ Bases:
object
This class only provides the factory() static method; there is no point in instantiating it.
-
static
factory
(app, db, config, cluster)¶ This method returns the appropriate JobImporter object depending on what is specified in configuration. In case of configuration error, HPCStatsConfigurationException is raised.
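As an illustration of the factory behaviour described above, here is a minimal sketch. It is not the HPCStats implementation: the `_IMPORTERS` mapping, the use of a plain dict with a `"jobs"` key for `config`, and the stand-in classes are all assumptions made so the example runs on its own.

```python
class HPCStatsConfigurationException(Exception):
    """Stand-in for the exception named in the documentation."""


class JobImporterSlurm:
    """Stand-in importer: it only stores its constructor arguments."""

    def __init__(self, app, db, config, cluster):
        self.app, self.db, self.config, self.cluster = app, db, config, cluster


class JobImporterFactory:
    # Hypothetical mapping from a configuration value to an importer class.
    _IMPORTERS = {"slurm": JobImporterSlurm}

    @staticmethod
    def factory(app, db, config, cluster):
        # Assumption: config is a plain dict and the importer is named
        # under the "jobs" key. The real configuration object may differ.
        name = config.get("jobs")
        try:
            importer_class = JobImporterFactory._IMPORTERS[name]
        except KeyError:
            raise HPCStatsConfigurationException(
                "unsupported job importer: %r" % (name,))
        return importer_class(app, db, config, cluster)
```

Calling `JobImporterFactory.factory(app, db, {"jobs": "slurm"}, cluster)` in this sketch returns a `JobImporterSlurm` instance, while an unknown name raises `HPCStatsConfigurationException`.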
JobImporterSlurm
Module¶
This module contains the JobImporterSlurm class.
-
class
HPCStats.Importer.Jobs.JobImporterSlurm.
JobImporterSlurm
(app, db, config, cluster)¶ Bases:
HPCStats.Importer.Jobs.JobImporter.JobImporter
This JobImporter imports job-related data from the Slurm accounting database.
-
check
()¶ Check if cluster Slurm database is available for connection.
-
connect_db
()¶ Connect to the cluster Slurm database and set the conn and cur attributes accordingly. Raises HPCStatsSourceError if a problem occurs.
-
create_runs
(nodelist, job)¶ Create all Run objects for the given job and all the nodes in nodelist.
-
disconnect_db
()¶ Disconnect from cluster Slurm database.
-
static
get_job_state_from_slurm_state
(state)¶ Return the human-readable name of the job state corresponding to the numeric state given in parameter.
From slurm.h.inc:

    enum job_states {
        JOB_PENDING,    /* queued waiting for initiation */
        JOB_RUNNING,    /* allocated resources and executing */
        JOB_SUSPENDED,  /* allocated resources, execution suspended */
        JOB_COMPLETE,   /* completed execution successfully */
        JOB_CANCELLED,  /* cancelled by user */
        JOB_FAILED,     /* completed execution unsuccessfully */
        JOB_TIMEOUT,    /* terminated on reaching time limit */
        JOB_NODE_FAIL,  /* terminated on node failure */
        JOB_PREEMPTED,  /* terminated due to preemption */
        JOB_BOOT_FAIL,  /* terminated due to node boot failure */
        JOB_END         /* not a real state, last entry in table */
    };
    #define JOB_STATE_BASE    0x00ff /* Used for job_states above */
    #define JOB_STATE_FLAGS   0xff00 /* Used for state flags below */
    #define JOB_COMPLETING    0x8000 /* Waiting for epilog completion */
    #define JOB_CONFIGURING   0x4000 /* Allocated nodes booting */
    #define JOB_RESIZING      0x2000 /* Size of job about to change, flag set
                                      * before calling accounting functions
                                      * immediately before job changes size */
    #define JOB_SPECIAL_EXIT  0x1000 /* Requeue an exit job in hold */
    #define JOB_REQUEUE_HOLD  0x0800 /* Requeue any job in hold */
    #define JOB_REQUEUE       0x0400 /* Requeue job in completing state */
    #define JOB_STOPPED       0x0200 /* Job is stopped state (holding resources,
                                      * but sent SIGSTOP */
    #define JOB_LAUNCH_FAILED 0x0100
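The header excerpt above implies that a numeric state combines a base state in the low byte with optional flag bits in the high byte. The following sketch shows one way to decode such a value. The function name `decode_slurm_state`, the shortened state names and the `+`-joined output format are assumptions for illustration, not necessarily what `get_job_state_from_slurm_state` returns.

```python
JOB_STATE_BASE = 0x00ff   # mask selecting the base state
JOB_STATE_FLAGS = 0xff00  # mask selecting the flag bits

# Base states in enum order: a C enum starts at 0 and increments by 1.
# JOB_END is omitted because it is not a real state.
BASE_STATES = [
    "PENDING", "RUNNING", "SUSPENDED", "COMPLETE", "CANCELLED",
    "FAILED", "TIMEOUT", "NODE_FAIL", "PREEMPTED", "BOOT_FAIL",
]

# Flag bits, taken from the #define lines of the header excerpt.
STATE_FLAGS = [
    (0x8000, "COMPLETING"),
    (0x4000, "CONFIGURING"),
    (0x2000, "RESIZING"),
    (0x1000, "SPECIAL_EXIT"),
    (0x0800, "REQUEUE_HOLD"),
    (0x0400, "REQUEUE"),
    (0x0200, "STOPPED"),
    (0x0100, "LAUNCH_FAILED"),
]


def decode_slurm_state(state):
    """Return a textual form of a numeric Slurm job state (illustrative)."""
    base = state & JOB_STATE_BASE
    # Assumption: values outside the enum are reported as UNKNOWN.
    names = [BASE_STATES[base] if base < len(BASE_STATES) else "UNKNOWN"]
    names += [name for bit, name in STATE_FLAGS if state & bit]
    return "+".join(names)
```

With this sketch, `decode_slurm_state(3)` gives `"COMPLETE"` and `decode_slurm_state(0x8001)` gives `"RUNNING+COMPLETING"`.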
-
get_jobs_after_batchid
(batchid, window_size=0)¶ Fill the jobs attribute with the list of Jobs found in Slurm DB whose id_job is greater than or equal to the batchid in parameter. Returns the last batch_id found.
-
get_search_batch_id
()¶ Determine and return the oldest batch_id from which to search for updates in the Slurm database.
-
job_partition
(job_id, partitions_str, nodelist)¶ Return one partition name based on the partition and nodelist fields of the job record in Slurm DB.
The partitions_str parameter is the partition field from Slurm DB job table. It is a string that represents a comma-separated list of partitions. Ex: ‘partition1,partition2’
The nodelist parameter is the nodelist field from Slurm DB job table. It is a string that represents the nodeset allocated to the job. Ex: ‘node[001-100]’
If the partition list has only one element, this element is the resulting partition to record in HPCStatsDB. If the list has multiple elements, the function looks through the nodelist to find the corresponding partition loaded by ArchitectureImporter.
If the nodelist is not specified, the function arbitrarily selects the first element of the partition list.
Otherwise, it searches the partitions loaded by ArchitectureImporter for those whose nodelist entirely covers the job’s nodelist, then looks for one of the job’s partitions among them and returns it if found.
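The selection rules above can be sketched as follows. This is a simplified illustration, not the HPCStats code. The function name `pick_partition` is hypothetical, nodelists are assumed to be already expanded into Python sets of node names (expanding a string such as ‘node[001-100]’ is left to a nodeset library), and returning `None` when nothing matches is an assumption, since the documentation does not say what happens in that case.

```python
def pick_partition(partitions_str, job_nodes, cluster_partitions):
    """Pick one partition name for a job (illustrative sketch).

    partitions_str     -- comma-separated partition names, e.g. 'p1,p2'
    job_nodes          -- set of node names allocated to the job, or None
    cluster_partitions -- dict mapping partition name to its set of nodes
                          (stands in for what ArchitectureImporter loaded)
    """
    candidates = partitions_str.split(",")

    # A single partition needs no disambiguation.
    if len(candidates) == 1:
        return candidates[0]

    # Without a nodelist, fall back to the first listed partition.
    if not job_nodes:
        return candidates[0]

    # Otherwise keep the first candidate whose nodes cover the job's nodes.
    for name in candidates:
        if job_nodes <= cluster_partitions.get(name, set()):
            return name

    # Assumption: no match is reported as None.
    return None
```

For example, with `cluster_partitions = {"a": {"n3"}, "b": {"n1", "n2", "n4"}}`, a job listed in `"a,b"` that ran on `{"n1", "n2"}` is attributed to partition `"b"`.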
-
load
()¶ Load jobs from Slurm DB.
-
load_update_window
()¶ Load and update jobs in windowed mode until there are no more jobs to load in the Slurm database.
-
load_window
(batch_id)¶ Load a limited number of jobs (limited to the window size) starting from the batch_id in parameter and fill the jobs list attribute accordingly.
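The windowed mode described above can be pictured as a loop that fetches a bounded batch of jobs, processes it, then resumes after the last id seen. The sketch below works on an in-memory list of job ids rather than on the Slurm database. The standalone functions, the `update` callback, the reading of `window_size=0` as "no limit" and the choice to resume at the last id plus one are all assumptions made for illustration.

```python
def get_jobs_after_batchid(source_ids, batchid, window_size=0):
    """Return the ids >= batchid in ascending order, at most window_size
    of them. Assumption: window_size=0 means no limit."""
    found = sorted(i for i in source_ids if i >= batchid)
    if window_size:
        found = found[:window_size]
    return found


def load_update_window(source_ids, start_id, window_size, update):
    """Load and process jobs window by window until none are left.

    update is a callback receiving each window of job ids. It stands in
    for saving the loaded jobs. Returns the number of windows processed.
    """
    batch_id = start_id
    windows = 0
    while True:
        jobs = get_jobs_after_batchid(source_ids, batch_id, window_size)
        if not jobs:
            break
        update(jobs)
        windows += 1
        # Assumption: the next window starts just after the last id seen.
        batch_id = jobs[-1] + 1
    return windows
```

Bounding each window keeps memory use proportional to the window size rather than to the total number of jobs in the accounting database.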
-
update
()¶ Update and save loaded Jobs in HPCStats DB.
-