Interacting with a batch compute service
A job, either wrench::StandardJob
or wrench::PilotJob
, can be
submitted to a wrench::BatchComputeService
by a call to the
wrench::JobManager::submitJob()
method. However, it is required
to pass to it a service-specific argument. This argument is a
std::map<std::string, std::string>
of key-value pairs, and must have
the following three elements:
key:
-t
; value: a requested runtime in seconds that, if exceeded, causes forceful job termination (e.g., “60”);key:
-N
; value: a requested number of compute nodes (e.g., “2”); andkey:
-c
; value: a requested number of cores per compute nodes (e.g., “4”).
You may note that the above corresponds to the arguments that must be provided to the Slurm batch scheduler.
Here is an example job submission to the batch service:
std::shared_ptr<wrench::BatchComputeService> some_batch_service;
// Create a job manager
auto job_manager = this->createJobManager();
// Create a job
auto job = job_manager->createStandardJob(tasks);
// Create service-specific arguments
std::map<std::string, std::string> service_specific_args;
// The job will run no longer than 1 hour
service_specific_args["-t"] = "3600";
// The job will run on 2 compute nodes
service_specific_args["-N"] = "2";
// The job will use 4 cores on each compute nodes
service_specific_args["-c"] = "4";
// Submit the job
job_manager->submitJob(job, some_batch_service, service_specific_args);
// Wait for and process the next event
this->waitForAndProcessNextEvent();
If the service-specific arguments are invalid (e.g., number of nodes too
large), wrench::JobManager::submitJob()
method throws a
wrench::ExecutionException
.
See the execution controller implementation in
examples/workflow_api/basic-examples/batch-bag-of-tasks/TwoTasksAtATimeBatchWMS.cpp
for a more complete example.
A batch compute service also supports pilot jobs. Once started, a pilot job exposes a temporary (only running until its containing pilot job expires) bare-metal compute service. Here is a simple code excerpt:
// create a pilot job
auto pilot_job = job_manager->createPilotJob();
// submit it to the batch compute service, asking for 2 10-core nodes for 1200 seconds
std::map<std::string, std::string> service_specific_arguments =
{{"-N","2"},{"-c","10"},{"-t","1200"}};
job_manager->submitJob(pilot_job, some_batch_service, service_specific_arguments);
// Waiting for the next event (which will be a pilot job start event)
this->waitForAndProcessNextEvent();
// Get a reference to the bare-metal compute service running on the pilot job
auto cs = pilot_job->getComputeService();
// Start using the bare-metal compute service
[...]
While the pilot job is running, standard jobs can be submitted to its bare-metal service.
See the execution controller implementation in
examples/workflow_api/basic-examples/batch-pilot-job/PilotJobWMS.cpp
for a more complete example.