Creating an HTCondor compute service

Overview

HTCondor is a workload management framework that supervises task executions on local and remote resources. HTCondor is composed of six main service daemons (startd, starter, schedd, shadow, negotiator, and collector). In addition, each host on which one or more of these daemons is spawned must also run a master daemon, which controls the execution of all other daemons (including initialization and completion).

Creating an HTCondor Service

HTCondor manages a pool of resources to which jobs are submitted for execution. In WRENCH, an HTCondor service is a compute service (wrench::ComputeService) implemented by the wrench::HTCondorComputeService class. An instantiation of an HTCondor service requires the following parameters:

The set of compute services may contain any compute service natively provided by WRENCH (e.g., bare-metal servers, cloud platforms, batch-scheduled clusters) or additional services derived from the wrench::ComputeService base class. The example below creates an instance of an HTCondor service with a pool of resources containing a single bare-metal server:

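// Simulation
wrench::Simulation simulation;
simulation.init(&argc, argv);

// Create a bare-metal service that exposes all cores and memory of "execution_hostname"
std::set<wrench::ComputeService *> compute_services;
compute_services.insert(new wrench::BareMetalComputeService(
        "execution_hostname",
        {std::make_pair(
                "execution_hostname",
                std::make_tuple(wrench::Simulation::getHostNumCores("execution_hostname"),
                                wrench::Simulation::getHostMemoryCapacity("execution_hostname")))},
        "/scratch/"));

// Create the HTCondor service and add it to the simulation.
// Assumptions: "hostname" is a std::string, defined elsewhere, naming the host that
// runs the HTCondor service; the property value "false" is only an example setting.
auto compute_service = simulation.add(
        new wrench::HTCondorComputeService(
                hostname,
                "local",
                std::move(compute_services),
                {{wrench::HTCondorComputeServiceProperty::SUPPORTS_PILOT_JOBS, "false"}}));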
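
The nested initializer passed to the bare-metal service can be hard to read. The sketch below builds the same kind of structure step by step in plain C++, assuming the resource argument is a map from hostname to a (number of cores, memory capacity) tuple, as the return types of getHostNumCores and getHostMemoryCapacity suggest. The names ResourceMap, HostSpec, and makeResourceMap are hypothetical and not part of the WRENCH API.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical alias: hostname -> (number of cores, memory capacity).
using ResourceMap = std::map<std::string, std::tuple<unsigned long, double>>;

// Hypothetical stand-in for a platform description. A real simulation would
// query wrench::Simulation::getHostNumCores() and getHostMemoryCapacity().
struct HostSpec {
    std::string name;
    unsigned long num_cores;
    double memory_capacity;
};

// Build one map entry per host, exposing all of its cores and memory.
ResourceMap makeResourceMap(const std::vector<HostSpec> &hosts) {
    ResourceMap resources;
    for (const auto &host : hosts) {
        resources[host.name] = std::make_tuple(host.num_cores, host.memory_capacity);
    }
    return resources;
}
```

Building the map in a helper like this keeps the service constructor call short when the pool contains more than one host.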

Anatomy of the HTCondor Service

WRENCH implements the three fundamental HTCondor services, each of which is realized as a particular set of daemons. The Job Execution Service consists of a startd daemon, which adds the host on which it is running to the HTCondor pool, and of a starter daemon, which manages task executions on this host. The Central Manager Service consists of a collector daemon, which collects information about all other daemons, and of a negotiator daemon, which performs task/resource matchmaking. The Job Submission Service consists of a schedd daemon, which maintains a queue of tasks, and of several instances of a shadow daemon, each of which corresponds to a task submitted to the HTCondor pool for execution.
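
The grouping of daemons into services can be summarized as a small lookup table. The following is a minimal, illustrative sketch in plain C++, not WRENCH code: the names HTCondorServiceKind, daemonsByService, and daemonsForHost are hypothetical. It also encodes the rule from the overview that any host running at least one daemon runs a master daemon as well.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative model only; these names are not part of the WRENCH API.
enum class HTCondorServiceKind { JobExecution, CentralManager, JobSubmission };

// Daemons grouped by the service they belong to, as described in the text.
const std::map<HTCondorServiceKind, std::vector<std::string>> &daemonsByService() {
    static const std::map<HTCondorServiceKind, std::vector<std::string>> table = {
        {HTCondorServiceKind::JobExecution, {"startd", "starter"}},
        {HTCondorServiceKind::CentralManager, {"collector", "negotiator"}},
        {HTCondorServiceKind::JobSubmission, {"schedd", "shadow"}},
    };
    return table;
}

// List the daemons a host runs for the given services.
// A host that runs no service runs no daemons; otherwise "master" comes first.
std::vector<std::string> daemonsForHost(const std::vector<HTCondorServiceKind> &services) {
    std::vector<std::string> daemons;
    if (services.empty()) {
        return daemons;
    }
    daemons.push_back("master");
    for (HTCondorServiceKind kind : services) {
        const auto &group = daemonsByService().at(kind);
        daemons.insert(daemons.end(), group.begin(), group.end());
    }
    return daemons;
}
```

For example, a host that only acts as the central manager would run the master, collector, and negotiator daemons under this model.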
