A helper daemon (co-located with and explicitly started by a WMS), which is used to handle all job executions. More...
#include <JobManager.h>
Public Member Functions | |
std::shared_ptr< PilotJob > | createPilotJob () |
Create a pilot job. More... | |
std::shared_ptr< StandardJob > | createStandardJob (std::vector< WorkflowTask * > tasks, std::map< WorkflowFile *, std::shared_ptr< FileLocation > > file_locations) |
Create a standard job. More... | |
std::shared_ptr< StandardJob > | createStandardJob (std::vector< WorkflowTask * > tasks, std::map< WorkflowFile *, std::shared_ptr< FileLocation > > file_locations, std::vector< std::tuple< WorkflowFile *, std::shared_ptr< FileLocation >, std::shared_ptr< FileLocation > >> pre_file_copies, std::vector< std::tuple< WorkflowFile *, std::shared_ptr< FileLocation >, std::shared_ptr< FileLocation > >> post_file_copies, std::vector< std::tuple< WorkflowFile *, std::shared_ptr< FileLocation > >> cleanup_file_deletions) |
Create a standard job. More... | |
std::shared_ptr< StandardJob > | createStandardJob (WorkflowTask *task, std::map< WorkflowFile *, std::shared_ptr< FileLocation > > file_locations) |
Create a standard job. More... | |
std::set< std::shared_ptr< PilotJob > > | getPendingPilotJobs () |
Get the list of currently pending pilot jobs. More... | |
std::set< std::shared_ptr< PilotJob > > | getRunningPilotJobs () |
Get the list of currently running pilot jobs. More... | |
void | kill () |
Kill the job manager (brutally terminate the daemon, clears all jobs) | |
void | stop () override |
Stop the job manager. More... | |
void | submitJob (std::shared_ptr< WorkflowJob > job, std::shared_ptr< ComputeService > compute_service, std::map< std::string, std::string > service_specific_args={}) |
Submit a job to compute service. More... | |
void | terminateJob (std::shared_ptr< WorkflowJob > job) |
Terminate a job (standard or pilot) that hasn't completed/expired/failed yet. More... | |
Public Member Functions inherited from wrench::Service | |
void | assertServiceIsUp () |
Throws an exception if the service is not up. More... | |
std::string | getHostname () |
Get the name of the host on which the service is / will be running. More... | |
double | getNetworkTimeoutValue () |
Returns the service's network timeout value. More... | |
bool | getPropertyValueAsBoolean (std::string) |
Get a property of the Service as a boolean. More... | |
double | getPropertyValueAsDouble (std::string) |
Get a property of the Service as a double. More... | |
std::string | getPropertyValueAsString (std::string) |
Get a property of the Service as a string. More... | |
unsigned long | getPropertyValueAsUnsignedLong (std::string) |
Get a property of the Service as an unsigned long. More... | |
bool | isUp () |
Returns true if the service is UP, false otherwise. More... | |
void | resume () |
Resume the service. More... | |
void | setNetworkTimeoutValue (double value) |
Sets the service's network timeout value. More... | |
void | start (std::shared_ptr< Service > this_service, bool daemonize, bool auto_restart) |
Start the service. More... | |
void | suspend () |
Suspend the service. | |
Detailed Description
A helper daemon (co-located with and explicitly started by a WMS), which is used to handle all job executions.
Member Function Documentation
◆ createPilotJob()
std::shared_ptr< PilotJob > wrench::JobManager::createPilotJob | ( | ) |
Create a pilot job.
- Returns
- the pilot job
- Exceptions
-
std::invalid_argument
◆ createStandardJob() [1/3]
std::shared_ptr< StandardJob > wrench::JobManager::createStandardJob | ( | std::vector< WorkflowTask * > | tasks, |
std::map< WorkflowFile *, std::shared_ptr< FileLocation > > | file_locations | ||
) |
Create a standard job.
- Parameters
-
tasks a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the list) file_locations a map that specifies locations where files, if any, should be read/written. When empty, it is assumed that the ComputeService's scratch storage space will be used.
- Returns
- the standard job
- Exceptions
-
std::invalid_argument
◆ createStandardJob() [2/3]
std::shared_ptr< StandardJob > wrench::JobManager::createStandardJob | ( | std::vector< WorkflowTask * > | tasks, |
std::map< WorkflowFile *, std::shared_ptr< FileLocation > > | file_locations, | ||
std::vector< std::tuple< WorkflowFile *, std::shared_ptr< FileLocation >, std::shared_ptr< FileLocation > >> | pre_file_copies, | ||
std::vector< std::tuple< WorkflowFile *, std::shared_ptr< FileLocation >, std::shared_ptr< FileLocation > >> | post_file_copies, | ||
std::vector< std::tuple< WorkflowFile *, std::shared_ptr< FileLocation > >> | cleanup_file_deletions | ||
) |
Create a standard job.
- Parameters
-
tasks a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the standard job) file_locations a map that specifies locations where input/output files, if any, should be read/written. When empty, it is assumed that the ComputeService's scratch storage space will be used. pre_file_copies a vector of tuples that specify which file copy operations should be completed before task executions begin. The ComputeService::SCRATCH constant can be used to mean "the scratch storage space of the ComputeService". post_file_copies a vector of tuples that specify which file copy operations should be completed after task executions end. The ComputeService::SCRATCH constant can be used to mean "the scratch storage space of the ComputeService". cleanup_file_deletions a vector of file tuples that specify file deletion operations that should be completed at the end of the job. The ComputeService::SCRATCH constant can be used to mean "the scratch storage space of the ComputeService".
- Returns
- the standard job
- Exceptions
-
std::invalid_argument
◆ createStandardJob() [3/3]
std::shared_ptr< StandardJob > wrench::JobManager::createStandardJob | ( | WorkflowTask * | task, |
std::map< WorkflowFile *, std::shared_ptr< FileLocation > > | file_locations | ||
) |
Create a standard job.
- Parameters
-
task a task (which must be ready) file_locations a map that specifies locations where input/output files should be read/written. When unspecified, it is assumed that the ComputeService's scratch storage space will be used.
- Returns
- the standard job
- Exceptions
-
std::invalid_argument
◆ getPendingPilotJobs()
std::set< std::shared_ptr< PilotJob > > wrench::JobManager::getPendingPilotJobs | ( | ) |
Get the list of currently pending pilot jobs.
- Returns
- a set of pilot jobs
◆ getRunningPilotJobs()
std::set< std::shared_ptr< PilotJob > > wrench::JobManager::getRunningPilotJobs | ( | ) |
Get the list of currently running pilot jobs.
- Returns
- a set of pilot jobs
◆ stop()
|
overridevirtual |
Stop the job manager.
- Exceptions
-
WorkflowExecutionException std::runtime_error
Reimplemented from wrench::Service.
◆ submitJob()
void wrench::JobManager::submitJob | ( | std::shared_ptr< WorkflowJob > | job, |
std::shared_ptr< ComputeService > | compute_service, | ||
std::map< std::string, std::string > | service_specific_args = {} |
||
) |
Submit a job to compute service.
- Parameters
-
job a workflow job compute_service a compute service service_specific_args arguments specific for compute services: - to a BareMetalComputeService: {{"taskID", "[hostname:][num_cores]}, ...}
- If no value is provided for a task, then the service will choose a host and use as many cores as possible on that host.
- If a "" value is provided for a task, then the service will choose a host and use as many cores as possible on that host.
- If a "hostname" value is provided for a task, then the service will run the task on that host, using as many of its cores as possible
- If a "num_cores" value is provided for a task, then the service will run that task with this many cores, but will choose the host on which to run it.
- If a "hostname:num_cores" value is provided for a task, then the service will run that task with the specified number of cores on that host.
- to a BatchComputeService: {{"-t":"<int>" (requested number of minutes)},{"-N":"<int>" (number of requested hosts)},{"-c":"<int>" (number of requested cores per host)}[,{"-u":"<string>" (username)}]}
- to a VirtualizedClusterComputeService: {} (jobs should not be submitted directly to the service)}
- to a CloudComputeService: {} (jobs should not be submitted directly to the service)}
- to a HTCondorComputeService:
- For a "grid universe" job that will be submitted to a child BatchComputeService: {{"universe":"grid", {"-t":"<int>" (requested number of minutes)},{"-N":"<int>" (number of requested hosts)},{"-c":"<int>" (number of requested cores per host)}[,{"-service":"<string>" (batch service name)}, {"-u":"<string>" (username)}]}
- For a "non-grid universe" job that will be submitted to a child BareMetalComputeService: {}
- to a BareMetalComputeService: {{"taskID", "[hostname:][num_cores]}, ...}
- Exceptions
-
std::invalid_argument WorkflowExecutionException
◆ terminateJob()
void wrench::JobManager::terminateJob | ( | std::shared_ptr< WorkflowJob > | job | ) |
Terminate a job (standard or pilot) that hasn't completed/expired/failed yet.
- Parameters
-
job the job to be terminated
- Exceptions
-
WorkflowExecutionException std::invalid_argument std::runtime_error
The documentation for this class was generated from the following files:
- JobManager.h
- JobManager.cpp