Relay MPI

The Conduit Relay MPI library enables MPI communication using conduit::Node instances as payloads. It provides two categories of functionality: Known Schema Methods and Generic Methods. These categories trade flexibility against performance: Known Schema Methods transfer only a Node's data and assume both sides already know the layout, while Generic Methods also transfer the schema, so receivers need no prior knowledge of the message. In all cases the implementation tries to avoid unnecessary reallocation, subject to the constraints of MPI's API input requirements.

Known Schema Methods

Methods that transfer a Node’s data, assuming the schema is known. They assume that Nodes used for output are implicitly compatible with their sources.

Supported MPI Primitives:
  • send/recv
  • isend/irecv
  • reduce/all_reduce
  • broadcast
  • gather/all_gather
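
As a concrete illustration, here is a minimal known-schema exchange in C++. This is a sketch, assuming Conduit was built with MPI support and using the conduit::relay::mpi::send/recv entry points; both ranks declare the same schema up front, so only the raw data moves over the wire.

    #include <mpi.h>
    #include "conduit.hpp"
    #include "conduit_relay_mpi.hpp"

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // both ranks declare the same (known) schema up front
        conduit::Node n;
        n["values"].set(conduit::DataType::float64(4));

        if(rank == 0)
        {
            conduit::float64 *vals = n["values"].value();
            for(int i = 0; i < 4; i++)
                vals[i] = i * 10.0;
            conduit::relay::mpi::send(n, 1, 0, MPI_COMM_WORLD);
        }
        else if(rank == 1)
        {
            conduit::relay::mpi::recv(n, 0, 0, MPI_COMM_WORLD);
            n.print();
        }

        MPI_Finalize();
        return 0;
    }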

For both point-to-point and collective methods, here is the basic logic for how Nodes passed to these methods are treated (a sketch follows the list):

  • For Nodes holding data to be sent:
    • If the Node is compact and contiguously allocated, the Node’s pointers are passed directly to MPI.
    • If the Node is not compact or not contiguously allocated, the data is compacted into a temporary contiguous buffer that is passed to MPI.
  • For Nodes used to hold output data:
    • If the output Node is compact and contiguously allocated, the Node’s pointers are passed directly to MPI.
    • If the output Node is not compact or not contiguously allocated, a Node with a temporary contiguous buffer is created and that buffer is passed to MPI. An update call is then used to copy the data from the temporary buffer into the output Node, which avoids reallocating or modifying the schema of the output Node.
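
The sketch below illustrates the compaction path for sends. A Node whose children are set individually is typically not contiguous in memory, so, per the logic above, relay compacts it into a temporary buffer before handing pointers to MPI. The entry points and setup boilerplate match the earlier sketch and are assumptions, not a definitive rendering of the implementation.

    #include <mpi.h>
    #include "conduit.hpp"
    #include "conduit_relay_mpi.hpp"

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // both ranks declare the same known schema; each set() call may
        // allocate separately, so the tree is not guaranteed contiguous
        conduit::Node n;
        n["a"].set(conduit::DataType::int32(2));
        n["b"].set(conduit::DataType::float64(3));

        if(rank == 0)
        {
            n["a"].as_int32_ptr()[0] = 1;
            n["a"].as_int32_ptr()[1] = 2;
            conduit::float64 *b = n["b"].value();
            b[0] = 3.0; b[1] = 4.0; b[2] = 5.0;
            // if n is not compact/contiguous, relay sends a compacted copy
            conduit::relay::mpi::send(n, 1, 0, MPI_COMM_WORLD);
        }
        else if(rank == 1)
        {
            // if n is not compact/contiguous, relay receives into a
            // temporary buffer and copies it into n via an update
            conduit::relay::mpi::recv(n, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }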

Generic Methods

Methods that transfer both a Node’s data and schema. These are useful for generic messaging, since the schema does not need to be known by receiving tasks. The semantics of MPI place constraints on what can be supported in this category: for example, a receiver must obtain the schema before it can size output buffers, which is at odds with non-blocking and reduction primitives. A sketch follows the lists below.

Supported MPI Primitives:
  • send/recv
  • gather/all_gather
  • broadcast

Unsupported MPI Primitives:
  • isend/irecv
  • reduce/all_reduce
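
A generic exchange might look like the following sketch, assuming the schema-carrying entry points conduit::relay::mpi::send_using_schema and recv_using_schema. The receiver starts with an empty Node and learns the layout from the message itself.

    #include <mpi.h>
    #include "conduit.hpp"
    #include "conduit_relay_mpi.hpp"

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        conduit::Node n;
        if(rank == 0)
        {
            // the receiver never sees this layout ahead of time
            n["message"] = "hello from rank 0";
            n["values"].set(conduit::DataType::float64(4));
            conduit::relay::mpi::send_using_schema(n, 1, 0, MPI_COMM_WORLD);
        }
        else if(rank == 1)
        {
            // n is empty, so it is reset using the received schema
            conduit::relay::mpi::recv_using_schema(n, 0, 0, MPI_COMM_WORLD);
            n.print();
        }

        MPI_Finalize();
        return 0;
    }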

For both point-to-point and collective methods, here is the basic logic for how Nodes passed to these methods are treated (a sketch follows the list):

  • For Nodes holding data to be sent:
    • If the Node is compact and contiguously allocated:
      • The Node’s schema is sent as JSON.
      • The Node’s pointers are passed directly to MPI.
    • If the Node is not compact or not contiguously allocated:
      • The Node is compacted into a temporary Node.
      • The temporary Node’s schema is sent as JSON.
      • The temporary Node’s pointers are passed to MPI.
  • For Nodes used to hold output data:
    • If the output Node is not compatible with the received schema, it is reset using the received schema.
    • If the output Node is compact and contiguously allocated, its pointers are passed directly to MPI.
    • If the output Node is not compact or not contiguously allocated, a Node with a temporary contiguous buffer is created and that buffer is passed to MPI. An update call is then used to copy the data from the temporary buffer into the output Node, which avoids reallocating or modifying the schema of the output Node.
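
The output-Node handling above also applies to collectives. The following sketch assumes a broadcast_using_schema entry point: non-root ranks pass an empty Node, which is not compatible with the incoming schema, so it is reset from the received schema before the data is filled in.

    #include <mpi.h>
    #include "conduit.hpp"
    #include "conduit_relay_mpi.hpp"

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        conduit::Node n;
        if(rank == 0)
        {
            n["values"].set(conduit::DataType::int64(8));
            conduit::int64 *vals = n["values"].value();
            for(int i = 0; i < 8; i++)
                vals[i] = i;
        }
        // non-root ranks pass an empty (incompatible) Node; it is reset
        // using the broadcast schema before the data lands in it
        conduit::relay::mpi::broadcast_using_schema(n, 0, MPI_COMM_WORLD);
        n.print();

        MPI_Finalize();
        return 0;
    }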