wrench::BatchComputeService

class BatchComputeService : public wrench::ComputeService

A batch-scheduled compute service that manages a set of compute hosts and controls access to their resources via a batch queue.

In the current implementation of this service, as with many of its real-world counterparts, RAM partitioning among jobs on the same host is not handled. When multiple jobs share hosts, which can happen when jobs require only a few cores per host and can thus be co-located on the same hosts in a non-exclusive fashion, each job simply runs as if it had access to the full RAM of each compute host it is scheduled on. The simulation of these RAM-contended scenarios is thus, for now, not realistic, as the effects of memory sharing (e.g., swapping) are not simulated.

Public Functions

BatchComputeService(const std::string &hostname, std::vector<std::string> compute_hosts, const std::string &scratch_space_mount_point, WRENCH_PROPERTY_COLLECTION_TYPE property_list = {}, WRENCH_MESSAGE_PAYLOAD_COLLECTION_TYPE messagepayload_list = {})

Constructor.

Parameters:
  • hostname – the hostname on which to start the service

  • compute_hosts – the list of names of the available compute hosts

    • the hosts must be homogeneous (speed, number of cores, and RAM size)

    • all cores are usable by the BatchComputeService service on each host

    • all RAM is usable by the BatchComputeService service on each host

  • scratch_space_mount_point – the mount point of the scratch storage space for the service ("" means “no scratch space”)

  • property_list – a property list that specifies BatchComputeServiceProperty values ({} means “use all defaults”)

  • messagepayload_list – a message payload list that specifies BatchComputeServiceMessagePayload values ({} means “use all defaults”)

std::vector<std::tuple<std::string, std::string, int, int, int, double, double>> getQueue()

Gets the state of the BatchComputeService queue.

Returns:

A vector of tuples:

  • std::string: username

  • std::string: job name

  • int: num hosts

  • int: num cores per host

  • int: time in seconds

  • double: submit time

  • double: start time (-1.0 if not started yet)

std::map<std::string, double> getStartTimeEstimates(std::set<std::tuple<std::string, unsigned long, unsigned long, sg_size_t>> set_of_jobs)

Retrieve start time estimates for a set of job configurations.

Parameters:

set_of_jobs – the set of job configurations, each of them with an id. Each configuration is a tuple as follows:

  • a configuration id (std::string)

  • a number of hosts (unsigned long)

  • a number of cores per host (unsigned long)

  • a duration in seconds (double)

Returns:

start date predictions in seconds (as a map of ids). A negative prediction means that the job configuration cannot run on the service (e.g., not enough hosts, not enough cores per host)

void reclaimHosts(const std::set<std::string> &hostnames)

Reclaim a set of hosts, which will: (i) terminate the jobs that are running on any of these hosts; and (ii) make these hosts unavailable until they are released.

Parameters:

hostnames – the set of hostnames of the hosts to reclaim

void releaseHosts(const std::set<std::string> &hostnames)

Release a set of hosts that were previously reclaimed.

Parameters:

hostnames – the set of hostnames of the hosts to release

virtual bool supportsCompoundJobs() override

Returns true if the service supports compound jobs.

Returns:

true or false

virtual bool supportsFunctions() override

Returns true if the service supports functions.

Returns:

true or false

virtual bool supportsPilotJobs() override

Returns true if the service supports pilot jobs.

Returns:

true or false

virtual bool supportsStandardJobs() override

Returns true if the service supports standard jobs.

Returns:

true or false