User Documentation

WRENCH 101 is a set of pages that provides detailed information for each class of WRENCH users, at a higher level than the API Reference. For instructions on how to install WRENCH, run a first example, or create a basic WRENCH-based simulator, please refer to the respective sections of the documentation.

This User 101 guide describes all the WRENCH simulation components (building blocks) necessary to build a custom simulator and run simulation scenarios.

10,000-ft view of a WRENCH-based simulator

A WRENCH-based simulator can be as simple as a single main() function that first creates a platform to be simulated (the hardware) and a set of services that run on the platform (the software). These services correspond to software that knows how to store data, perform computation, and many other useful things that real-world cyberinfrastructure services can do.

The simulator then needs to create a workflow (or a set of workflows) to be executed. A workflow consists of a set of compute tasks, each with input and output files, and thus with data dependencies. A special service, called a Workflow Management System (WMS), is then created and put in charge of executing the workflow on the platform. (This service must have been implemented by a WRENCH "developer", i.e., a user who has used the Developer API.) The input files to the workflow, if any, are staged on the platform at particular storage locations.

The simulation is then launched via a single call. When this call returns, the WMS has terminated (typically after completing the execution of the workflow, or failing to execute it) and the simulation output can be analyzed.
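
To make the notion of data dependencies concrete, here is a minimal, self-contained C++ sketch. It does not use the WRENCH API: the Task struct and both functions are hypothetical stand-ins, and the sketch assumes that each file is produced by at most one task. It shows how task-to-task dependencies follow from the files that tasks read and write, and which files count as workflow inputs that must be staged before execution.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a workflow task (NOT a WRENCH class).
struct Task {
    std::string name;
    std::vector<std::string> input_files;
    std::vector<std::string> output_files;
};

// Returns all (parent, child) pairs such that the child reads a file
// that the parent writes. Assumes each file has at most one producer.
std::set<std::pair<std::string, std::string>>
dataDependencies(const std::vector<Task> &tasks) {
    // Map each file to the name of the task that produces it.
    std::map<std::string, std::string> producer;
    for (const auto &task : tasks)
        for (const auto &file : task.output_files)
            producer[file] = task.name;

    std::set<std::pair<std::string, std::string>> edges;
    for (const auto &task : tasks) {
        for (const auto &file : task.input_files) {
            auto it = producer.find(file);
            // A file without a producer creates no dependency.
            if (it != producer.end() && it->second != task.name)
                edges.emplace(it->second, task.name);
        }
    }
    return edges;
}

// Returns the files that are read by some task but produced by none,
// i.e., the workflow input files that must be staged on storage.
std::set<std::string> workflowInputFiles(const std::vector<Task> &tasks) {
    std::set<std::string> produced;
    for (const auto &task : tasks)
        for (const auto &file : task.output_files)
            produced.insert(file);

    std::set<std::string> inputs;
    for (const auto &task : tasks)
        for (const auto &file : task.input_files)
            if (produced.count(file) == 0)
                inputs.insert(file);
    return inputs;
}
```

For a three-task chain in which the first task reads a file that no task produces, dataDependencies() yields the edges between consecutive tasks and workflowInputFiles() yields that single file.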

Blueprint for a WRENCH-based simulator

Here are the steps that a WRENCH-based simulator typically follows:

  1. Create and initialize a simulation – In WRENCH, a user simulation is defined via the wrench::Simulation class. An instance of this class must be created, and the wrench::Simulation::init() method is called to initialize the simulation (and parse WRENCH-specific and SimGrid-specific command-line arguments).
  2. Instantiate a simulated platform – This is done with the wrench::Simulation::instantiatePlatform() method, which takes as argument a SimGrid virtual platform description file. Any SimGrid simulation must be provided with the description of the platform on which an application/system execution is to be simulated (compute hosts, clusters of hosts, storage resources, network links, routers, routes between hosts, etc.).
  3. Instantiate services on the platform – The wrench::Simulation::add() method is used to add services to the simulation. Each class of service is created with a particular constructor, which also specifies host(s) on which the service is to be started. Typical kinds of services include compute services, storage services, network proximity services, and file registry services.
  4. Create at least one workflow – This is done by creating an instance of the wrench::Workflow class, which has methods to manually add tasks and files to the workflow application, but also methods to import workflows from standard workflow description files (DAX and JSON). If there are input files to the workflow's entry tasks, these must be staged on instantiated storage services.
  5. Instantiate at least one WMS per workflow – At least one of the services instantiated must be a wrench::WMS instance, i.e., a service that is in charge of executing the workflow, as implemented by a WRENCH "developer" using the Developer API. Associating a workflow to a WMS is done via the wrench::WMS::addWorkflow() method.
  6. Launch the simulation – This is done via the wrench::Simulation::launch() call, which first sanity-checks the simulation setup and then launches all simulated services. The call returns once all WMS services have exited (after they have completed or failed to complete their workflows).
  7. Process simulation output – The wrench::Simulation::getOutput() method returns an object that is a collection of time-stamped traces of simulation events. These traces can be processed/analyzed at will.

Available services

To date, these are the (simulated) services that can be instantiated on the simulated platform:

  • Compute Services (classes that derive from wrench::ComputeService): These are services that know how to compute workflow tasks. These include bare-metal servers (wrench::MultihostMulticoreComputeService), cloud platforms (wrench::CloudService), virtualized cluster platforms (wrench::VirtualizedClusterService), and batch-scheduled clusters (wrench::BatchService). It is not technically required to instantiate a compute service, but without one no workflow task can be executed by the WMS.
  • Storage Services (classes that derive from wrench::StorageService): These are services that know how to store workflow files, which can then be read and written by the compute services when executing tasks that read/write files. It is not technically required to instantiate a storage service, but without one no workflow task can have an input or an output file.
  • File Registry Services (the wrench::FileRegistryService class): These services, often known as replica catalogs, are simply databases of <filename, list of locations> key-value pairs that record the storage services on which copies of files are available. They are used during workflow execution to decide where input files for tasks can be acquired. It is not required to instantiate a file registry service, unless the workflow's entry tasks have input files (because in this case these files have to be stored at some storage services before the execution can start, and all file registry services are then automatically made aware of where these files are stored). Note that some WMS implementations may complain if no file registry service is available.
  • Network Proximity Services (the class wrench::NetworkProximityService): These are services that monitor the network and maintain a database of host-to-host network distances. This database can be queried by WMSs to make informed decisions, e.g., to pick from which storage service a file should be retrieved so as to reduce communication time. Typically, network distances are estimated based on round-trip-times between hosts. It is not required to instantiate a network proximity service, but some WMS implementations may complain if none is available.
  • Workflow Management Systems (WMSs) (classes that derive from wrench::WMS): A workflow management system provides the mechanisms for executing workflow applications, including decision-making for optimizing various objectives (the most common one is to minimize workflow execution time). By default, WRENCH does not provide a WMS implementation as part of its core components; however, a simple implementation (wrench::SimpleWMS) is available in the examples/simple-example folder. Please refer to the Developer 101 Guide section for further information on how to develop a WMS. At least one WMS must be provided to run a simulation. Additional WMS implementations may also be found on the WRENCH project website.
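
As an illustration of how a WMS might combine a file registry with a network proximity database, here is a minimal, self-contained C++ sketch. It does not use the WRENCH API: the FileRegistry and ProximityDB types and the closestReplica() function are hypothetical stand-ins. Treating an unknown distance as infinitely far, and returning an empty string when no replica is known, are assumptions of this sketch.

```cpp
#include <limits>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a replica catalog (NOT a WRENCH class):
// filename -> set of storage locations that hold a copy of the file.
using FileRegistry = std::map<std::string, std::set<std::string>>;

// Hypothetical stand-in for a network proximity database:
// (host, host) -> estimated distance, e.g., a round-trip time in seconds.
using ProximityDB = std::map<std::pair<std::string, std::string>, double>;

// Returns the location holding `file` that is closest to `compute_host`,
// or an empty string if the registry knows of no copy of the file.
std::string closestReplica(const FileRegistry &registry,
                           const ProximityDB &proximity,
                           const std::string &file,
                           const std::string &compute_host) {
    auto entry = registry.find(file);
    if (entry == registry.end()) return "";

    std::string best;
    double best_distance = std::numeric_limits<double>::infinity();
    for (const auto &location : entry->second) {
        auto known = proximity.find(std::make_pair(compute_host, location));
        // Assumption: a pair with no recorded distance is infinitely far.
        double distance = (known == proximity.end())
                              ? std::numeric_limits<double>::infinity()
                              : known->second;
        // Keep the first location seen, then any strictly closer one.
        if (best.empty() || distance < best_distance) {
            best = location;
            best_distance = distance;
        }
    }
    return best;
}
```

With two replicas of a file at distances 0.020 and 0.005 from the compute host, the function returns the location at distance 0.005.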

Customizing Services

Each service is customizable by passing to its constructor a property list, i.e., a key-value map where each key is a property and each value is a string. Each service defines a property class. For instance, the wrench::Service class has an associated wrench::ServiceProperty class, the wrench::ComputeService class has an associated wrench::ComputeServiceProperty class, and so on at all levels of the service class hierarchy.

The API documentation for these property classes explains what each property means, what its possible values are, and what its default value is. Some properties determine what the service can or should do when in operation. For instance, the wrench::BatchServiceProperty class defines a wrench::BatchServiceProperty::BATCH_SCHEDULING_ALGORITHM property, which specifies the scheduling algorithm that a batch service should use for prioritizing jobs. All property classes inherit from the wrench::ServiceProperty class, and one can explore that hierarchy to discover the many available service customization opportunities.
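
The idea of a property list that overrides default values can be sketched as follows. This is a self-contained C++ illustration, not WRENCH code: the PropertyList type, the resolveProperties() function, the string keys, and the values used with them are hypothetical, and rejecting unknown keys is an assumption of this sketch rather than documented WRENCH behavior.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a property list: property name -> string value.
using PropertyList = std::map<std::string, std::string>;

// Overlays user-supplied properties on top of a service's defaults.
// Assumption: a key that the service does not define is an error.
PropertyList resolveProperties(const PropertyList &defaults,
                               const PropertyList &overrides) {
    PropertyList resolved = defaults;
    for (const auto &entry : overrides) {
        if (defaults.find(entry.first) == defaults.end())
            throw std::invalid_argument("unknown property: " + entry.first);
        resolved[entry.first] = entry.second;
    }
    return resolved;
}
```

A service constructor following this pattern would keep every default that the user did not mention and replace only the values that appear in the supplied list.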

Finally, each service exchanges messages on the network with other services (e.g., a WMS service sends a "do some work" message to a compute service). The size in bytes, or payload, of all messages can be customized similarly to the properties, i.e., by passing a key-value map to the service's constructor. For instance, the wrench::ServiceMessagePayload class defines a wrench::ServiceMessagePayload::STOP_DAEMON_MESSAGE_PAYLOAD property which can be used to customize the size, in bytes, of the control message sent to the service daemon (that is the entry point to the service) to tell it to terminate. Each service class has a corresponding message payload class, and the API documentation for these message payload classes details all messages whose payload can be customized.

Customizing logging

When running a WRENCH simulator you will notice that there is quite a bit of logging output. While logging output can be useful to inspect visually the way in which the simulation proceeds, it often becomes necessary to disable it. WRENCH's logging system is a thin layer on top of SimGrid's logging system, and as such is controlled via command-line arguments. The simple example in examples/simple-example is executed as follows, assuming the working directory is examples/simple-example:

``` ./wrench-simple-example-cloud platform_files/cloud_hosts.xml workflow_files/genome.dax ```

A first way to modify logging is to disable colors, which is useful when redirecting output to a file. This is done with the --wrench-no-color command-line option, which can appear anywhere in the argument list, for instance:

``` ./wrench-simple-example-cloud --wrench-no-color platform_files/cloud_hosts.xml workflow_files/genome.dax ```

Disabling all logging is done with the SimGrid option --log=root.threshold:critical:

``` ./wrench-simple-example-cloud --log=root.threshold:critical platform_files/cloud_hosts.xml workflow_files/genome.dax ```

Particular "log categories" can be toggled on and off. Log category names are attached to *.cpp files in the WRENCH and SimGrid code. Using the --help-log-categories option shows the entire log category hierarchy. For instance, there is a log category called wms for logging messages in the wrench::WMS class, and a log category called simple_wms for logging messages in the wrench::SimpleWMS class, which inherits from wrench::WMS. Together, these two categories cover the logging output produced by the WMS in the simple example. They can be enabled while other messages are disabled as follows:

``` ./wrench-simple-example-cloud platform_files/cloud_hosts.xml workflow_files/genome.dax --log=root.threshold:critical --log=simple_wms.threshold=debug --log=wms.threshold=debug ```

Using the --help-logs option displays information on the way SimGrid logging works. See the full SimGrid logging documentation for all details.

Analyzing Simulation Output

Once the wrench::Simulation::launch() method has returned, it is possible to process time-stamped traces to analyze simulation output. The wrench::Simulation::getOutput() method returns an instance of wrench::SimulationOutput. This object has a templated wrench::SimulationOutput::getTrace() method to retrieve traces for various information types. For instance, the call ``` simulation.getOutput().getTrace<wrench::SimulationTimestampTaskCompletion>() ``` returns a vector of time-stamped task completion events. The classes that implement time-stamped events are all classes named wrench::SimulationTimestampSomething, where Something is pretty self-explanatory (e.g., TaskCompletion).
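
Processing such a trace typically amounts to iterating over a vector of time-stamped events. The following self-contained C++ sketch illustrates this with a hypothetical TaskCompletionEvent struct that stands in for a time-stamped task completion event. It is not the WRENCH class, whose accessors are described in the API Reference, and both functions are illustrative only.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a time-stamped task completion event
// (NOT the WRENCH class of a similar name).
struct TaskCompletionEvent {
    double date;            // simulated time of the completion, in seconds
    std::string task_name;  // name of the task that completed
};

// Returns the date of the last task completion, or 0.0 for an empty trace.
double lastCompletionDate(const std::vector<TaskCompletionEvent> &trace) {
    double latest = 0.0;
    for (const auto &event : trace)
        latest = std::max(latest, event.date);
    return latest;
}

// Returns the number of tasks that had completed by the given date.
std::size_t completedBy(const std::vector<TaskCompletionEvent> &trace,
                        double date) {
    return static_cast<std::size_t>(
        std::count_if(trace.begin(), trace.end(),
                      [date](const TaskCompletionEvent &event) {
                          return event.date <= date;
                      }));
}
```

If the workflow execution starts at time zero, the value returned by lastCompletionDate() corresponds to the workflow execution time observed in the simulation.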