wrench::JobManager
-
class JobManager : public wrench::Service
A helper daemon (co-located with, and explicitly started by, an execution controller) that handles all job executions.
Public Functions
-
~JobManager() override
Destructor, which kills the daemon (and clears all the jobs)
-
std::shared_ptr<CompoundJob> createCompoundJob(std::string name)
Create a compound job.
- Parameters:
name – the job’s name (if empty, a unique job name will be picked for you)
- Returns:
the job
-
std::shared_ptr<PilotJob> createPilotJob()
Create a pilot job.
- Throws:
std::invalid_argument –
- Returns:
the pilot job
Create a standard job.
- Parameters:
task – a task (which must be ready)
- Throws:
std::invalid_argument –
- Returns:
the standard job
Create a standard job.
- Parameters:
task – a task (which must be ready)
file_locations – a map that specifies locations where input/output files should be read/written. When unspecified, it is assumed that the ComputeService’s scratch storage space will be used.
- Throws:
std::invalid_argument –
- Returns:
the standard job
Create a standard job.
- Parameters:
task – a task (which must be ready)
file_locations – a map that specifies, for each file, a list of locations, in preference order, where input/output files should be read/written. When unspecified, it is assumed that the ComputeService’s scratch storage space will be used.
- Throws:
std::invalid_argument –
- Returns:
the standard job
Create a standard job.
- Parameters:
tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the list)
- Throws:
std::invalid_argument –
- Returns:
the standard job
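The eligibility rule for the task list can be sketched as a small check. This is only an illustration of one reading of the rule: the `Task` struct, the `TaskState` enum, and `tasksAreEligible` are hypothetical stand-ins, not WRENCH classes or functions, and the treatment of already-completed tasks is an assumption of this sketch.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the WRENCH task classes.
enum class TaskState { NOT_READY, READY, COMPLETED };

struct Task {
    std::string id;
    TaskState state;
    std::vector<const Task *> parents;
};

// Returns true if every task in the list is READY, or is a not-yet-ready task
// whose parents are all either COMPLETED or part of the same list.
bool tasksAreEligible(const std::vector<const Task *> &tasks) {
    const std::set<const Task *> in_list(tasks.begin(), tasks.end());
    for (const Task *task : tasks) {
        if (task->state == TaskState::READY) {
            continue;
        }
        if (task->state == TaskState::COMPLETED || task->parents.empty()) {
            // Assumption of this sketch: completed tasks are not re-submitted,
            // and a non-ready task must have parents to be "a child of" anything.
            return false;
        }
        const bool parents_ok = std::all_of(
            task->parents.begin(), task->parents.end(),
            [&in_list](const Task *parent) {
                return parent->state == TaskState::COMPLETED ||
                       in_list.count(parent) > 0;
            });
        if (!parents_ok) {
            return false;
        }
    }
    return true;
}
```

Under this reading, a chain of dependent tasks can be submitted as one job as long as the chain starts from a READY task or from children of COMPLETED tasks.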
Create a standard job.
- Parameters:
tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the list)
file_locations – a map that specifies locations where files, if any, should be read/written. When empty, it is assumed that the ComputeService’s scratch storage space will be used.
- Throws:
std::invalid_argument –
- Returns:
the standard job
Create a standard job.
- Parameters:
tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the standard job)
file_locations – a map that specifies locations where input/output files, if any, should be read/written. When empty, it is assumed that the ComputeService’s scratch storage space will be used.
pre_file_copies – a vector of tuples that specify which file copy operations should be completed before task executions begin.
post_file_copies – a vector of tuples that specify which file copy operations should be completed after task executions end.
cleanup_file_deletions – a vector of file tuples that specify file deletion operations that should be completed at the end of the job.
- Throws:
std::invalid_argument –
- Returns:
the standard job
Create a standard job.
- Parameters:
tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the list)
file_locations – a map that specifies, for each file, a list of locations, in preference order, where input/output files should be read/written. When unspecified, it is assumed that the ComputeService’s scratch storage space will be used.
- Throws:
std::invalid_argument –
- Returns:
the standard job
Create a standard job.
- Parameters:
tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the standard job)
file_locations – a map that specifies, for each file, a list of locations, in preference order, where input/output files should be read/written. When unspecified, it is assumed that the ComputeService’s scratch storage space will be used.
pre_file_copies – a vector of tuples that specify which file copy operations should be completed before task executions begin.
post_file_copies – a vector of tuples that specify which file copy operations should be completed after task executions end.
cleanup_file_deletions – a vector of file tuples that specify file deletion operations that should be completed at the end of the job.
- Throws:
std::invalid_argument –
- Returns:
the standard job
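The "list of locations, in preference order" idea can be illustrated with plain standard-library types. In this sketch, locations are plain strings, the `"scratch"` fallback label, the `is_usable` predicate, and `chooseLocation` are all hypothetical, and throwing when no listed location is usable is an assumption rather than documented WRENCH behavior.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation: a location is just a name here.
using LocationList = std::vector<std::string>;

// Walks the preference-ordered list for `file` and returns the first usable
// location. Files that do not appear in the map fall back to scratch space.
std::string chooseLocation(
    const std::string &file,
    const std::map<std::string, LocationList> &file_locations,
    const std::function<bool(const std::string &)> &is_usable) {
    const auto it = file_locations.find(file);
    if (it == file_locations.end()) {
        return "scratch";  // unspecified: assume the service's scratch space
    }
    for (const std::string &location : it->second) {
        if (is_usable(location)) {
            return location;  // earlier entries win over later ones
        }
    }
    // Assumption of this sketch: no usable listed location is an error.
    throw std::runtime_error("no usable location for file " + file);
}
```

The single-location overloads can be seen as the special case where each list holds exactly one entry.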
-
simgrid::s4u::Mailbox *getCreatorMailbox()
Return the mailbox of the job manager’s creator.
- Returns:
a mailbox
-
unsigned long getNumRunningPilotJobs() const
Get the number of currently running pilot jobs.
- Returns:
the number of currently running pilot jobs
-
void kill()
Kill the job manager (brutally terminates the daemon and clears all jobs).
-
virtual void stop() override
Stop the job manager.
- Throws:
std::runtime_error –
Submit a compound job to a compute service.
- Parameters:
job – a compound job
compute_service – a compute service
service_specific_args – arguments specific for compute services:
to a BareMetalComputeService: {{"actionID", "[hostname:][num_cores]"}, …}
If no value is provided for a task, then the service will choose a host and use as many cores as possible on that host.
If a "" value is provided for a task, then the service will choose a host and use as many cores as possible on that host.
If a "hostname" value is provided for a task, then the service will run the task on that host, using as many of its cores as possible.
If a "num_cores" value is provided for a task, then the service will run that task with this many cores, but will choose the host on which to run it.
If a "hostname:num_cores" value is provided for a task, then the service will run that task with the specified number of cores on that host.
to a BatchComputeService: {{"-t":"<int>" (requested number of seconds)},{"-N":"<int>" (number of requested hosts)},{"-c":"<int>" (number of requested cores per host)}[,{"actionID":"[node_index:]num_cores"}] [,{"-u":"<string>" (username)}]}
to a VirtualizedClusterComputeService: {} (jobs should not be submitted directly to the service)
to a CloudComputeService: {} (jobs should not be submitted directly to the service)
to a HTCondorComputeService:
For a "grid universe" job that will be submitted to a child BatchComputeService: {{"-universe":"grid"},{"-t":"<int>" (requested number of seconds)},{"-N":"<int>" (number of requested hosts)},{"-c":"<int>" (number of requested cores per host)}[,{"-service":"<string>" (BatchComputeService service name)}] [,{"actionID":"[node_index:]num_cores"}] [,{"-u":"<string>" (username)}]}
For a "non-grid universe" job that will be submitted to a child BareMetalComputeService: {}
- Throws:
std::invalid_argument –
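The five "[hostname:][num_cores]" cases listed for a BareMetalComputeService can be made concrete with a small parser. This is a sketch only: `Placement` and `parsePlacement` are hypothetical names, not part of WRENCH, and treating an all-digit value as a core count (rather than a numeric hostname) is an assumption.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result type for this sketch.
struct Placement {
    std::string hostname;     // empty: the service chooses the host
    unsigned long num_cores;  // 0: use as many cores as possible
};

// Parses a "[hostname:][num_cores]" value into its two optional parts.
Placement parsePlacement(const std::string &spec) {
    Placement placement{"", 0};
    if (spec.empty()) {
        return placement;  // "" behaves like no value at all
    }
    const std::string digits = "0123456789";
    std::string cores;
    const std::string::size_type colon = spec.find(':');
    if (colon != std::string::npos) {
        placement.hostname = spec.substr(0, colon);  // "hostname:num_cores"
        cores = spec.substr(colon + 1);
    } else if (spec.find_first_not_of(digits) == std::string::npos) {
        cores = spec;  // digits only: read as "num_cores" (assumption)
    } else {
        placement.hostname = spec;  // "hostname"
    }
    if (!cores.empty()) {
        if (cores.find_first_not_of(digits) != std::string::npos) {
            throw std::invalid_argument("invalid core count: " + cores);
        }
        placement.num_cores = std::stoul(cores);
    }
    return placement;
}
```

For example, `parsePlacement("host1:8")` yields hostname `host1` with 8 cores, while `parsePlacement("4")` leaves the host choice to the service.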
Submit a pilot job to a compute service.
- Parameters:
job – a pilot job
compute_service – a compute service
service_specific_args – arguments specific for compute services:
to a BatchComputeService: {{"-t":"<int>" (requested number of seconds)},{"-N":"<int>" (number of requested hosts)},{"-c":"<int>" (number of requested cores per host)}}
to a BareMetalComputeService: {} (pilot jobs should not be submitted directly to the service)
to a VirtualizedClusterComputeService: {} (pilot jobs should not be submitted directly to the service)
to a CloudComputeService: {} (pilot jobs should not be submitted directly to the service)
to a HTCondorComputeService: {} (pilot jobs should be submitted directly to the service)
- Throws:
std::invalid_argument –
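The three batch arguments for a pilot job could be assembled as follows. This sketch assumes the arguments travel as a string-to-string map, as the key/value notation above suggests; `ServiceArgs` and `makeBatchArgs` are hypothetical names, and rejecting zero values is an assumption of the sketch rather than documented behavior.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Assumed representation of service-specific arguments.
using ServiceArgs = std::map<std::string, std::string>;

// Builds the "-t", "-N", "-c" arguments from numeric values.
ServiceArgs makeBatchArgs(unsigned long seconds,
                          unsigned long num_hosts,
                          unsigned long cores_per_host) {
    if (seconds == 0 || num_hosts == 0 || cores_per_host == 0) {
        // Assumption: a request for zero time, hosts, or cores is meaningless.
        throw std::invalid_argument("batch arguments must be positive");
    }
    ServiceArgs args;
    args["-t"] = std::to_string(seconds);         // requested number of seconds
    args["-N"] = std::to_string(num_hosts);       // number of requested hosts
    args["-c"] = std::to_string(cores_per_host);  // requested cores per host
    return args;
}
```

Building the map from numbers avoids hand-typed strings such as `"3600 "` that would not read as an integer.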
Submit a standard job to a compute service.
- Parameters:
job – a standard job
compute_service – a compute service
service_specific_args – arguments specific for compute services:
to a BareMetalComputeService: {{"taskID", "[hostname:][num_cores]"}, …}
If no value is provided for a task, then the service will choose a host and use as many cores as possible on that host.
If a "" value is provided for a task, then the service will choose a host and use as many cores as possible on that host.
If a "hostname" value is provided for a task, then the service will run the task on that host, using as many of its cores as possible.
If a "num_cores" value is provided for a task, then the service will run that task with this many cores, but will choose the host on which to run it.
If a "hostname:num_cores" value is provided for a task, then the service will run that task with the specified number of cores on that host.
to a BatchComputeService: {{"-t":"<int>" (requested number of seconds)},{"-N":"<int>" (number of requested hosts)},{"-c":"<int>" (number of requested cores per host)}[,{"taskID":"[node_index:]num_cores"}] [,{"-u":"<string>" (username)}]}
to a VirtualizedClusterComputeService: {} (jobs should not be submitted directly to the service)
to a CloudComputeService: {} (jobs should not be submitted directly to the service)
to a HTCondorComputeService:
For a "grid universe" job that will be submitted to a child BatchComputeService: {{"-universe":"grid"},{"-t":"<int>" (requested number of seconds)},{"-N":"<int>" (number of requested hosts)},{"-c":"<int>" (number of requested cores per host)}[,{"-service":"<string>" (BatchComputeService service name)}] [,{"taskID":"[node_index:]num_cores"}] [,{"-u":"<string>" (username)}]}
For a "non-grid universe" job that will be submitted to a child BareMetalComputeService: {}
- Throws:
std::invalid_argument –
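The relationship between the batch arguments and the HTCondor "grid universe" arguments can be sketched with two small helpers. This again assumes a string-to-string map; `ServiceArgs`, `missingBatchKeys`, and `makeGridUniverseArgs` are hypothetical names for illustration and are not WRENCH functions.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed representation of service-specific arguments.
using ServiceArgs = std::map<std::string, std::string>;

// Lists which of the three required batch keys are absent from `args`.
std::vector<std::string> missingBatchKeys(const ServiceArgs &args) {
    std::vector<std::string> missing;
    for (const char *key : {"-t", "-N", "-c"}) {
        if (args.find(key) == args.end()) {
            missing.push_back(key);
        }
    }
    return missing;
}

// Extends batch arguments with the grid-universe keys described above.
// An empty service name means the optional "-service" key is left out.
ServiceArgs makeGridUniverseArgs(const ServiceArgs &batch_args,
                                 const std::string &batch_service_name = "") {
    ServiceArgs args = batch_args;
    args["-universe"] = "grid";
    if (!batch_service_name.empty()) {
        args["-service"] = batch_service_name;
    }
    return args;
}
```

Checking for missing keys before submission gives a clearer error than waiting for the `std::invalid_argument` that a malformed argument set may trigger.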
Terminate a compound job that hasn’t completed/expired/failed yet.
- Parameters:
job – the job to be terminated
- Throws:
std::invalid_argument –
std::runtime_error –
Terminate a pilot job that hasn’t completed/expired/failed yet.
- Parameters:
job – the job to be terminated
- Throws:
std::invalid_argument –
std::runtime_error –
Terminate a standard job that hasn’t completed/expired/failed yet.
- Parameters:
job – the job to be terminated
- Throws:
std::invalid_argument –
std::runtime_error –