wrench::JobManager

class JobManager : public wrench::Service

A helper daemon, co-located with and explicitly started by an execution controller, that handles all job executions.

Public Functions

std::shared_ptr<CompoundJob> createCompoundJob(std::string name)

Create a compound job.

Parameters:

name – the job’s name (if empty, a unique job name will be picked for you)

Returns:

the job

std::shared_ptr<PilotJob> createPilotJob()

Create a pilot job.

Returns:

the pilot job

std::shared_ptr<StandardJob> createStandardJob(const std::shared_ptr<WorkflowTask> &task)

Create a standard job.

Parameters:

task – a task (which must be ready)

Returns:

the standard job

std::shared_ptr<StandardJob> createStandardJob(const std::shared_ptr<WorkflowTask> &task, const std::map<std::shared_ptr<DataFile>, std::shared_ptr<FileLocation>> &file_locations)

Create a standard job.

Parameters:
  • task – a task (which must be ready)

  • file_locations – a map that specifies locations where input/output files should be read/written. When unspecified, it is assumed that the ComputeService’s scratch storage space will be used.

Returns:

the standard job

std::shared_ptr<StandardJob> createStandardJob(const std::shared_ptr<WorkflowTask> &task, std::map<std::shared_ptr<DataFile>, std::vector<std::shared_ptr<FileLocation>>> file_locations)

Create a standard job.

Parameters:
  • task – a task (which must be ready)

  • file_locations – a map that specifies, for each file, a list of locations, in preference order, where input/output files should be read/written. When unspecified, it is assumed that the ComputeService’s scratch storage space will be used.

Returns:

the standard job

std::shared_ptr<StandardJob> createStandardJob(const std::vector<std::shared_ptr<WorkflowTask>> &tasks)

Create a standard job.

Parameters:

tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the list)

Returns:

the standard job
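
The eligibility rule for the task list can be illustrated with a small, self-contained sketch. It does not use WRENCH types: State, Task, and canFormStandardJob are hypothetical stand-ins introduced here for illustration only. The sketch also assumes that "children of COMPLETED tasks or of tasks also included in the list" means that every parent of a non-READY task must satisfy one of those two conditions.

```cpp
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT WRENCH types.
enum class State { NOT_READY, READY, COMPLETED };

struct Task {
    State state = State::NOT_READY;
    std::vector<std::shared_ptr<Task>> parents;
};

// Returns true if every task in the list is READY, or has parents that are
// each either COMPLETED or themselves part of the list (assumed reading).
bool canFormStandardJob(const std::vector<std::shared_ptr<Task>> &tasks) {
    std::set<std::shared_ptr<Task>> in_job(tasks.begin(), tasks.end());
    for (const auto &task : tasks) {
        if (task->state == State::READY) continue;
        // A non-READY task without parents cannot satisfy the rule.
        if (task->parents.empty()) return false;
        for (const auto &parent : task->parents) {
            bool parent_ok = parent->state == State::COMPLETED ||
                             in_job.count(parent) > 0;
            if (!parent_ok) return false;
        }
    }
    return true;
}
```

With a READY task a and a child b of a, the list {a, b} passes the check, whereas {b} alone passes only once a is COMPLETED.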

std::shared_ptr<StandardJob> createStandardJob(const std::vector<std::shared_ptr<WorkflowTask>> &tasks, const std::map<std::shared_ptr<DataFile>, std::shared_ptr<FileLocation>> &file_locations)

Create a standard job.

Parameters:
  • tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the list)

  • file_locations – a map that specifies locations where files, if any, should be read/written. When empty, it is assumed that the ComputeService’s scratch storage space will be used.

Returns:

the standard job

std::shared_ptr<StandardJob> createStandardJob(const std::vector<std::shared_ptr<WorkflowTask>> &tasks, const std::map<std::shared_ptr<DataFile>, std::shared_ptr<FileLocation>> &file_locations, std::vector<std::tuple<std::shared_ptr<FileLocation>, std::shared_ptr<FileLocation>>> pre_file_copies, std::vector<std::tuple<std::shared_ptr<FileLocation>, std::shared_ptr<FileLocation>>> post_file_copies, std::vector<std::shared_ptr<FileLocation>> cleanup_file_deletions)

Create a standard job.

Parameters:
  • tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the standard job)

  • file_locations – a map that specifies locations where input/output files, if any, should be read/written. When empty, it is assumed that the ComputeService’s scratch storage space will be used.

  • pre_file_copies – a vector of tuples that specify which file copy operations should be completed before task executions begin.

  • post_file_copies – a vector of tuples that specify which file copy operations should be completed after task executions end.

  • cleanup_file_deletions – a vector of file locations that specify file deletion operations that should be completed at the end of the job.

Returns:

the standard job

std::shared_ptr<StandardJob> createStandardJob(const std::vector<std::shared_ptr<WorkflowTask>> &tasks, std::map<std::shared_ptr<DataFile>, std::vector<std::shared_ptr<FileLocation>>> file_locations)

Create a standard job.

Parameters:
  • tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the list)

  • file_locations – a map that specifies, for each file, a list of locations, in preference order, where input/output files should be read/written. When unspecified, it is assumed that the ComputeService’s scratch storage space will be used.

Returns:

the standard job

std::shared_ptr<StandardJob> createStandardJob(const std::vector<std::shared_ptr<WorkflowTask>> &tasks, std::map<std::shared_ptr<DataFile>, std::vector<std::shared_ptr<FileLocation>>> file_locations, std::vector<std::tuple<std::shared_ptr<FileLocation>, std::shared_ptr<FileLocation>>> pre_file_copies, std::vector<std::tuple<std::shared_ptr<FileLocation>, std::shared_ptr<FileLocation>>> post_file_copies, std::vector<std::shared_ptr<FileLocation>> cleanup_file_deletions)

Create a standard job.

Parameters:
  • tasks – a list of tasks (which must be either READY, or children of COMPLETED tasks or of tasks also included in the standard job)

  • file_locations – a map that specifies, for each file, a list of locations, in preference order, where input/output files should be read/written. When unspecified, it is assumed that the ComputeService’s scratch storage space will be used.

  • pre_file_copies – a vector of tuples that specify which file copy operations should be completed before task executions begin.

  • post_file_copies – a vector of tuples that specify which file copy operations should be completed after task executions end.

  • cleanup_file_deletions – a vector of file locations that specify file deletion operations that should be completed at the end of the job.

Returns:

the standard job

S4U_CommPort *getCreatorCommPort()

Return the CommPort of the job manager’s creator.

Returns:

a CommPort

unsigned long getNumRunningPilotJobs() const

Get the number of currently running pilot jobs.

Returns:

the number of running pilot jobs

void kill()

Kill the job manager (brutally terminates the daemon and clears all jobs).

virtual void stop() override

Stop the job manager.

void submitJob(const std::shared_ptr<CompoundJob> &job, const std::shared_ptr<ComputeService> &compute_service, std::map<std::string, std::string> service_specific_args = {})

Submit a compound job to a compute service.

Parameters:
  • job – a compound job

  • compute_service – a compute service

  • service_specific_args – arguments specific for compute services:

    • to a BareMetalComputeService: {{“actionID”, “[hostname:][num_cores]”}, …}

      • If no value is provided for an action, then the service will choose a host and use as many cores as possible on that host.

      • If a “” value is provided for an action, then the service will choose a host and use as many cores as possible on that host.

      • If a “hostname” value is provided for an action, then the service will run the action on that host, using as many of its cores as possible.

      • If a “num_cores” value is provided for an action, then the service will run that action with this many cores, but will choose the host on which to run it.

      • If a “hostname:num_cores” value is provided for an action, then the service will run that action with the specified number of cores on that host.

    • to a BatchComputeService: {{“-t”:“<int>” (requested number of seconds)},{“-N”:“<int>” (number of requested hosts)},{“-c”:“<int>” (number of requested cores per host)}[,{“actionID”:“[node_index:]num_cores”}] [,{“-u”:“<string>” (username)}]}

    • to a VirtualizedClusterComputeService: {} (jobs should not be submitted directly to the service)

    • to a CloudComputeService: {} (jobs should not be submitted directly to the service)

    • to a HTCondorComputeService:

      • For a “grid universe” job that will be submitted to a child BatchComputeService: {{“-universe”:“grid”},{“-t”:“<int>” (requested number of seconds)},{“-N”:“<int>” (number of requested hosts)},{“-c”:“<int>” (number of requested cores per host)}[,{“-service”:“<string>” (BatchComputeService service name)}] [,{“actionID”:“[node_index:]num_cores”}] [,{“-u”:“<string>” (username)}]}

      • For a “non-grid universe” job that will be submitted to a child BareMetalComputeService: {}
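
The service_specific_args map is a plain std::map<std::string, std::string>, so it can be assembled with small helpers before calling submitJob. The following is a minimal sketch using only the standard library; ServiceArgs, bareMetalValue, and batchArgs are hypothetical names introduced here and are not part of WRENCH. It assumes that an empty hostname or a core count of zero means "leave the choice to the service".

```cpp
#include <map>
#include <string>

// Hypothetical alias; matches the type of the service_specific_args parameter.
using ServiceArgs = std::map<std::string, std::string>;

// Formats a "[hostname:][num_cores]" value for a BareMetalComputeService.
// Assumption: empty hostname / zero cores mean "let the service decide".
std::string bareMetalValue(const std::string &hostname, unsigned long num_cores) {
    std::string value = hostname;
    if (num_cores > 0) {
        if (!hostname.empty()) value += ":";
        value += std::to_string(num_cores);
    }
    return value;
}

// Builds the "-t", "-N", and "-c" entries listed for a BatchComputeService.
ServiceArgs batchArgs(unsigned long seconds, unsigned long hosts,
                      unsigned long cores_per_host) {
    return {{"-t", std::to_string(seconds)},
            {"-N", std::to_string(hosts)},
            {"-c", std::to_string(cores_per_host)}};
}
```

For instance, an entry such as args["some_action"] = bareMetalValue("Host1", 4) yields the value "Host1:4", where "some_action" and "Host1" are placeholder names, and the resulting map would then be passed as the third argument of submitJob.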

void submitJob(const std::shared_ptr<PilotJob> &job, const std::shared_ptr<ComputeService> &compute_service, std::map<std::string, std::string> service_specific_args = {})

Submit a pilot job to a compute service.

Parameters:
  • job – a pilot job

  • compute_service – a compute service

  • service_specific_args – arguments specific to the compute service

void submitJob(const std::shared_ptr<StandardJob> &job, const std::shared_ptr<ComputeService> &compute_service, std::map<std::string, std::string> service_specific_args = {})

Submit a standard job to a compute service.

Parameters:
  • job – a standard job

  • compute_service – a compute service

  • service_specific_args – arguments specific for compute services:

    • to a BareMetalComputeService: {{“taskID”, “[hostname:][num_cores]”}, …}

      • If no value is provided for a task, then the service will choose a host and use as many cores as possible on that host.

      • If a “” value is provided for a task, then the service will choose a host and use as many cores as possible on that host.

      • If a “hostname” value is provided for a task, then the service will run the task on that host, using as many of its cores as possible

      • If a “num_cores” value is provided for a task, then the service will run that task with this many cores, but will choose the host on which to run it.

      • If a “hostname:num_cores” value is provided for a task, then the service will run that task with the specified number of cores on that host.

    • to a BatchComputeService: {{“-t”:“<int>” (requested number of seconds)},{“-N”:“<int>” (number of requested hosts)},{“-c”:“<int>” (number of requested cores per host)}[,{“taskID”:“[node_index:]num_cores”}] [,{“-u”:“<string>” (username)}]}

    • to a VirtualizedClusterComputeService: {} (jobs should not be submitted directly to the service)

    • to a CloudComputeService: {} (jobs should not be submitted directly to the service)

    • to a HTCondorComputeService:

      • For a “grid universe” job that will be submitted to a child BatchComputeService: {{“-universe”:“grid”},{“-t”:“<int>” (requested number of seconds)},{“-N”:“<int>” (number of requested hosts)},{“-c”:“<int>” (number of requested cores per host)}[,{“-service”:“<string>” (BatchComputeService service name)}] [,{“taskID”:“[node_index:]num_cores”}] [,{“-u”:“<string>” (username)}]}

      • For a “non-grid universe” job that will be submitted to a child BareMetalComputeService: {}
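
The five cases listed for the BareMetalComputeService “[hostname:][num_cores]” value can be read as a small parsing problem. The sketch below is one possible interpretation, not WRENCH's own parser: Placement, isAllDigits, and parsePlacement are hypothetical names, and the rule "a value made only of digits is a core count, anything else is a hostname" is an assumption made here to tell the two single-token forms apart.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical result type; NOT part of WRENCH.
struct Placement {
    std::string hostname;         // empty: the service chooses the host
    unsigned long num_cores = 0;  // 0: use as many cores as possible
};

// True if s is non-empty and consists only of decimal digits.
bool isAllDigits(const std::string &s) {
    return !s.empty() && std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return std::isdigit(c) != 0;
    });
}

// Interprets a "[hostname:][num_cores]" value (assumed disambiguation rule:
// an all-digit token is a core count, any other token is a hostname).
Placement parsePlacement(const std::string &value) {
    Placement p;
    if (value.empty()) return p;  // "" behaves like no value at all
    const auto colon = value.find(':');
    if (colon != std::string::npos) {
        p.hostname = value.substr(0, colon);
        const std::string cores = value.substr(colon + 1);
        if (isAllDigits(cores)) p.num_cores = std::stoul(cores);
        return p;
    }
    if (isAllDigits(value)) {
        p.num_cores = std::stoul(value);
    } else {
        p.hostname = value;
    }
    return p;
}
```

Under these assumptions, parsePlacement("Host1:4") yields hostname "Host1" with 4 cores, while parsePlacement("4") leaves the host choice to the service.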

void terminateJob(const std::shared_ptr<CompoundJob> &job)

Terminate a compound job that hasn’t completed/expired/failed yet.

Parameters:

job – the job to be terminated

void terminateJob(const std::shared_ptr<PilotJob> &job)

Terminate a pilot job that hasn’t completed/expired/failed yet.

Parameters:

job – the job to be terminated

void terminateJob(const std::shared_ptr<StandardJob> &job)

Terminate a standard job that hasn’t completed/expired/failed yet.

Parameters:

job – the job to be terminated