Extensible Connection-oriented Messaging (XCM)
This is the documentation for the Extensible Connection-oriented Messaging (XCM) programming APIs.
XCM consists of the core API in xcm.h, an address helper library API in xcm_addr.h, and the attribute APIs in xcm_attr.h and xcm_attr_map.h. Obsolete, but still present, functions are available in xcm_compat.h.
The low API/ABI version number is a result of all XCM releases having been backward compatible, which has left the major version at 0.
XCM is an inter-process communication API and an implementation of this C API in the form of a shared library. For certain transports, XCM also defines a small amount of wire protocol, providing framing over byte stream transport protocols.
The primary component of the XCM library is a set of inter-process communication transports, all hosted under the same API. An XCM transport provides a connection-oriented, reliable service with in-order delivery. There are two types of transports: one providing a messaging service and another providing a byte stream service.
The XCM API allows a straightforward mapping to TLS, TCP and SCTP for remote communication, as well as to more efficient inter-process communication (IPC) mechanisms for local communication.
This document focuses on the API, but also contains information specific to the implementation.
XCM reuses much of the terminology of the BSD Sockets API. Compared to the BSD Socket API, XCM has more uniform semantics across underlying transports.
XCM implements a connection-oriented, client-server model. The server process creates one or more server sockets (e.g., with xcm_server()) bound to a specific address, after which clients may establish connections to the server. When a connection is established, two connection sockets are created: one on the server side (e.g., returned from xcm_accept()), and one on the client side (e.g., returned from xcm_connect()). Thus, a server serving multiple clients will have multiple sockets: one server socket and N connection sockets, one for each client. A client will typically have one connection socket for each server it is connected to.
User application data (messages or bytes, depending on service type) are always sent and received on a particular connection socket - never on a server socket.
An XCM transport provides either a messaging or a byte stream service.
Messaging transports preserve message boundaries across the network. The buffer passed to xcm_send() constitutes one (and only one) message. What's received on the other end, in exactly one xcm_receive() call, is a buffer with the same length and contents.
The UX Transport, TCP Transport, TLS Transport, UTLS Transport, and SCTP Transport all provide a messaging type service.
For byte streams, there's no such thing as message boundaries: the data transported on the connection is just a sequence of bytes. The fact that xcm_send() accepts an array of bytes of a particular length, as opposed to individual bytes one-by-one, is a mere performance optimization.
For example, if two messages "abc" and "d" are passed to xcm_send() on a messaging transport, they will arrive as "abc" and "d" in exactly two xcm_receive() calls on the receiver. On a byte stream transport, however, all the data "abcd" may arrive in a single xcm_receive() call, or it may arrive in multiple calls, such as three calls producing "ab", "c", and "d", respectively, or any other combination.
The BTLS Transport provides a byte stream service.
Applications that allow the user to configure an arbitrary XCM address, but are designed to handle only a certain service type, may restrict the sockets being instantiated to the messaging service type only, or to byte stream only, by passing the "xcm.service" attribute with the appropriate value (see Generic Attributes for details) at the time of socket creation. Because of XCM's history as a messaging-only framework, "xcm.service" defaults to "messaging".
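For instance, a minimal sketch of restricting socket creation to byte stream transports (user_supplied_addr is a hypothetical, application-provided address):

```c
#include <xcm.h>
#include <xcm_attr_map.h>

static struct xcm_socket *connect_bytestream(const char *user_supplied_addr)
{
    struct xcm_attr_map *attrs = xcm_attr_map_create();

    /* only allow transports providing a byte stream service; socket
       creation fails for addresses of messaging-type transports */
    xcm_attr_map_add_str(attrs, "xcm.service", "bytestream");

    struct xcm_socket *conn = xcm_connect_a(user_supplied_addr, attrs);

    xcm_attr_map_destroy(attrs);

    return conn;
}
```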
Applications designed to handle both messaging and byte stream transports may retrieve the value of "xcm.service" and use it to differentiate the treatment where required (e.g., in xcm_send() return code handling).
Connections spawned off a server socket (e.g., with xcm_accept()) always have the same service type as their parent socket.
In-order delivery - that data arrives at the recipient in the same order it was sent by the sender - is guaranteed, but only for data sent on the same connection.
XCM transports support flow control. Thus, if the sender message rate or bandwidth is higher than the network or the receiver can handle on a particular connection, xcm_send() in the sender process will eventually block (or fail with errno set to EAGAIN, if in non-blocking mode). Unless XCM is used for bulk data transfer (as opposed to signaling traffic), xcm_send() blocking because of a slow network or a slow receiver should be rare in practice. The TCP, TLS, and UNIX domain socket transports all have large protocol windows and/or socket buffers to allow a large amount of outstanding data.
In XCM, the application is in control of which transport is used, by means of the address supplied to xcm_connect() and xcm_server(), which includes both the transport name and the transport address.
However, there is nothing preventing an XCM transport from using a more abstract addressing format, and internally employing multiple underlying IPC transport mechanisms. This model is implemented by the UTLS Transport.
Addresses are represented as strings with the following general syntax: <transport-name>:<transport-address>
For the UX UNIX Domain Socket transport, the addresses have this more specific form:
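ux:&lt;UNIX domain socket name&gt;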
The addresses of the UXF UNIX Domain Socket transport variant have the following format:
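uxf:&lt;file system path&gt;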
For the TCP, TLS, UTLS, SCTP and BTLS transports, the syntax is:
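&lt;transport-name&gt;:(&lt;DNS domain name&gt;|&lt;IPv4 address&gt;|[&lt;IPv6 address&gt;]|*|[*]):&lt;port&gt;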
'*' is a shorthand for '0.0.0.0' (i.e. bind to all IPv4 interfaces). '[*]' is the IPv6 equivalent, creating a server socket accepting connections on all IPv4 and IPv6 addresses.
IPv6 link local addresses are not supported.
Some example addresses:
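tcp:*:4711
tls:192.168.1.42:4711
tls:[::1]:99
ux:my-server-socket
utls:example.com:11234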
For TCP, TLS, UTLS, SCTP and BTLS server socket addresses, the port can be set to 0, in which case XCM (or rather, the Linux kernel) will allocate a free TCP port from the local port range.
For transports allowing a DNS domain name as a part of the address, the transport will attempt to resolve the name to an IP address. A DNS domain name may resolve to zero or more IPv4 addresses and/or zero or more IPv6 addresses. XCM relies on the operating system to prioritize between IPv4 and IPv6.
XCM accepts IPv4 addresses in the dotted-decimal format. XCM allows only complete addresses with three '.' separators, and not the archaic, classful forms, where some bytes were left out and the address thus contained fewer separators.
XCM transports attempt to detect a number of conditions which can lead to lost connectivity, and do so even on idle connections.
If the remote end closes the connection, the local xcm_receive() will return 0. If the process on the remote end crashed, xcm_receive() will return -1 and set errno to ECONNRESET. If network connectivity to the remote end is lost, xcm_receive() will return -1 and errno will be set to ETIMEDOUT.
In general, XCM follows the UNIX system API tradition when it comes to error handling. Where possible, errors are signaled to the application by using unused parts of the value range of the function return type. For functions returning signed integer types, this means the value of -1 (in case -1 is not a valid return value). For functions returning pointers, NULL is used to signal that an error has occurred. For functions where neither -1 nor NULL can be used, or where the function does not return anything (side-effect only functions), an 'int' is used as the return type, purely to signal success (value 0) or an error (-1) to the application.
The actual error code is stored in the thread-local errno variable. The error codes are those from the fixed set of errno values defined by POSIX, found in errno.h. Standard functions such as perror() and strerror() may be used to turn the code into a human-readable string.
In non-blocking operation, given that the actual transmission might be deferred (with the message buffered in the XCM layer), and that message receive processing might happen before the application has called receive, the error signaled at the point of a certain XCM call might not be a direct result of the requested operation, but rather an error discovered previously.
The documentation for xcm_finish() includes a list of generic error codes, applicable to xcm_connect(), xcm_accept(), xcm_send() and xcm_receive().
Also, for errors resulting in an unusable connection, repeated calls will produce the same errno.
In UNIX-style event-driven programming, a single application thread handles multiple clients (and thus multiple XCM connection sockets) and the task of accepting new clients on the XCM server socket concurrently (although not in parallel). To wait for events from multiple sources, an I/O multiplexing facility such as select(2), poll(2) or epoll(2) is used.
Each XCM socket is represented by a single fd, retrieved with xcm_fd(). The fd number and underlying file object are stable across the lifetime of the socket.
On BSD Sockets, the socket fd being readable means it's likely that the application can successfully read data from the socket. Similarly, an fd marked writable by, for example, poll() means that the application is likely to be able to write data to the BSD Sockets fd. An application using XCM that goes into select() must instead always wait for the fds of all its XCM sockets to become readable (e.g., by including them in the readfds set in the select() call), regardless of their target conditions. Thus, even if the application is waiting for an opportunity to try to send a message on an XCM socket, or doesn't want to do anything with the socket, it must wait for the socket fd to become readable. Not wanting to do anything here means that the application has the xcm_await() condition set to 0, and is not interested in waiting to call xcm_send(), xcm_receive(), or xcm_accept() on the socket. An application may never leave an XCM socket unattended, in the sense that its fd is excluded from the set of fds passed to select() and/or no calls are made to xcm_send(), xcm_receive(), xcm_accept() or xcm_finish().
XCM is oblivious to which I/O multiplexing mechanism the application employs. The application may call select(), poll() or epoll_wait() directly, or make use of any of the many available event loop libraries (such as libevent). For simplicity, select() is used in this documentation to denote the whole family of Linux I/O multiplexing facilities.
An event-driven application needs to set the XCM sockets it handles into non-blocking mode, by calling xcm_set_blocking(), setting the "xcm.blocking" socket attribute, or using the XCM_NONBLOCK flag in xcm_connect().
For XCM sockets in non-blocking mode, all potentially blocking API calls related to XCM connections - xcm_connect(), xcm_accept(), xcm_send(), and xcm_receive() - finish immediately.
For xcm_send(), xcm_connect() and xcm_accept(), a successful return means that the XCM layer has accepted the request. It may or may not have completed the operation.
In case the XCM_NONBLOCK flag is set in the xcm_connect() call, or in case an XCM server socket is in non-blocking mode at the time of an xcm_accept() call, the newly created XCM connection returned to the application may be in a semi-operational state, with some internal processing and/or signaling with the remote peer still required before actual message transmission and reception may occur.
The application may attempt to send or receive messages on such semi-operational connections.
There are ways for an application to determine when connection establishment or the task of accepting a new client have completed. See Finishing Outstanding Tasks for more information.
To receive a message on an XCM connection socket in non-blocking mode, the application may need to wait for the right conditions to arise (i.e., a message being available). The application needs to inform the socket that it wants to receive by calling xcm_await() with the XCM_SO_RECEIVABLE bit set in the condition bit mask. It then passes the fd it received from xcm_fd() into select(), asking to be notified when the fd becomes readable. When select() marks the socket fd as readable, the application should issue xcm_receive() to attempt to retrieve a message.
xcm_receive() may also be called on speculation, prior to any select() call, to poll the socket for incoming messages.
An XCM connection socket may have a number of messages buffered, and for optimal performance applications should generally repeat xcm_receive() until it returns an error and errno is set to EAGAIN.
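A minimal sketch of this pattern (handle_msg() is a hypothetical application callback; error handling is condensed into comments):

```c
#include <xcm.h>
#include <sys/select.h>
#include <errno.h>
#include <stddef.h>

extern void handle_msg(const char *msg, size_t len); /* hypothetical */

static void wait_and_receive(struct xcm_socket *conn)
{
    /* tell the socket what the application is waiting for */
    xcm_await(conn, XCM_SO_RECEIVABLE);

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(xcm_fd(conn), &rfds);

    /* XCM socket fds are always waited for as readable */
    select(xcm_fd(conn) + 1, &rfds, NULL, NULL, NULL);

    /* drain the socket of any buffered messages */
    char buf[65535];
    int rc;
    while ((rc = xcm_receive(conn, buf, sizeof(buf))) > 0)
        handle_msg(buf, rc);

    /* rc == 0 means the remote end closed the connection; rc < 0 with
       errno != EAGAIN means an error (repeated calls will yield the
       same errno) */
}
```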
Similarly to receiving a message, an application may set the XCM_SO_SENDABLE bit in the condition bit mask, if it wants to wait for a socket state where it's likely it can successfully send a message. When select() marks the socket fd as readable, the application should attempt to send a message.
Just like with xcm_receive(), it may also choose to issue an xcm_send() call on speculation (i.e., without going into select()), which is often a good idea for performance reasons.
For send operations on non-blocking connection sockets, XCM may buffer whole or part of the message (or data, for byte stream transports) before transmission to the lower layer. This may be due to the socket output buffer running full, or the need for some in-band signaling, like cryptographic key exchange, to happen before the transmission of the complete message may finish. The XCM layer will (re-)attempt to hand the message over to the lower layer at a future call to xcm_finish(), xcm_send(), or xcm_receive().
Applications wishing to determine when all buffered data has been successfully delivered to the lower layer may use xcm_finish() to do so. Normally, applications aren't expected to require this kind of control. Please also note that the fact that a message has left the XCM layer doesn't necessarily mean it has been successfully delivered to the recipient. In particular, even if the data could be dispatched immediately, it may be lingering in kernel buffers. Such buffers may be discarded in case the application closes the connection.
xcm_connect(), xcm_accept(), and xcm_send() may all leave the socket in a state where work is initiated, but not completed. In addition, the socket may have pending internal tasks, such as flushing the output buffer into the TCP/IP stack, processing XCM control interface messages, or finishing the TLS handshake procedure.
After waking up from a select() call, where a particular XCM non-blocking socket's fd is marked readable, the application must, if no xcm_send(), xcm_receive() or xcm_accept() calls are to be made, call xcm_finish(). This is to allow the socket to finish any outstanding tasks, even in the case the application has no immediate plans for the socket.
Prior to changing a socket from non-blocking to blocking mode, any outstanding tasks should be finished; otherwise the switch might cause xcm_set_blocking() to return -1 and set errno to EAGAIN.
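A sketch of a loop driving any outstanding tasks to completion, for instance prior to such a mode switch (the function name is hypothetical):

```c
#include <xcm.h>
#include <sys/select.h>
#include <errno.h>

/* returns 0 when all outstanding tasks have finished, -1 on error */
static int finish_outstanding(struct xcm_socket *s)
{
    while (xcm_finish(s) < 0) {
        if (errno != EAGAIN)
            return -1; /* repeated calls will yield the same errno */

        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(xcm_fd(s), &rfds);
        select(xcm_fd(s) + 1, &rfds, NULL, NULL, NULL);
    }
    return 0;
}
```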
For example, if a server socket's desired condition has been set (with xcm_await()) to XCM_SO_ACCEPTABLE, and the application wakes up from select() with the socket's fd marked readable, a call to xcm_accept() may still not produce a new connection socket. The same holds true for XCM_SO_RECEIVABLE and xcm_receive() calls, and for XCM_SO_SENDABLE and xcm_send() calls.
In the following example, the application connects and tries to send a message, before knowing if the connection is actually established. This may fail (for example, in case TCP- and/or TLS-level connection establishment has not yet been completed), in which case the application falls back and waits, using xcm_await(), xcm_fd() and select().
In case the application wants to know when the connection establishment has finished, it may use xcm_finish() to do so, driving the socket with the same kind of finish loop as sketched above.
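A minimal sketch of this approach (the address and payload are illustrative):

```c
#include <xcm.h>
#include <sys/select.h>
#include <errno.h>
#include <string.h>

static int connect_and_send(void)
{
    struct xcm_socket *conn =
        xcm_connect("tls:192.168.1.42:4711", XCM_NONBLOCK);
    if (conn == NULL)
        return -1;

    const char *msg = "hello";

    /* send on speculation; establishment may not have completed */
    while (xcm_send(conn, msg, strlen(msg)) < 0) {
        if (errno != EAGAIN) {
            xcm_close(conn);
            return -1;
        }

        /* fall back and wait for an opportunity to try again */
        xcm_await(conn, XCM_SO_SENDABLE);

        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(xcm_fd(conn), &rfds);
        select(xcm_fd(conn) + 1, &rfds, NULL, NULL, NULL);
    }

    return 0;
}
```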
While connecting to a server socket, the client's connection attempt may be refused immediately.
In many cases, the application is handed a connection socket before the connection establishment is completed. Any errors occurring during this process are handed over to the application at a future call to xcm_finish(), xcm_send() or xcm_receive().
An application may also use xcm_finish() in the same manner to flush any internal XCM buffers before shutting down a connection, ensuring that any buffered messages have been delivered to the lower layer.
In the following sketch, a server accepts a new connection, and then attempts to receive a message on this connection, while still, concurrently, remaining ready to accept more clients on the server socket.
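A condensed sketch of such a server (handle_msg() is a hypothetical application callback; a fixed-size connection table and minimal error handling are used for brevity):

```c
#include <xcm.h>
#include <sys/select.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONNS 16

extern void handle_msg(const char *msg, size_t len); /* hypothetical */

static void serve(const char *addr)
{
    struct xcm_socket *server = xcm_server(addr);
    if (server == NULL)
        return;

    struct xcm_socket *conns[MAX_CONNS] = { NULL };

    xcm_set_blocking(server, false);
    xcm_await(server, XCM_SO_ACCEPTABLE);

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(xcm_fd(server), &rfds);
        int max_fd = xcm_fd(server);

        for (int i = 0; i < MAX_CONNS; i++)
            if (conns[i] != NULL) {
                FD_SET(xcm_fd(conns[i]), &rfds);
                if (xcm_fd(conns[i]) > max_fd)
                    max_fd = xcm_fd(conns[i]);
            }

        select(max_fd + 1, &rfds, NULL, NULL, NULL);

        /* attempt to accept a new client */
        struct xcm_socket *conn = xcm_accept(server);
        if (conn != NULL) {
            xcm_await(conn, XCM_SO_RECEIVABLE);
            for (int i = 0; i < MAX_CONNS; i++)
                if (conns[i] == NULL) {
                    conns[i] = conn;
                    conn = NULL;
                    break;
                }
            if (conn != NULL)
                xcm_close(conn); /* connection table full */
        }

        /* attempt to receive on all connections */
        for (int i = 0; i < MAX_CONNS; i++) {
            if (conns[i] == NULL)
                continue;
            char buf[65535];
            int rc = xcm_receive(conns[i], buf, sizeof(buf));
            if (rc > 0)
                handle_msg(buf, rc);
            else if (rc == 0 || errno != EAGAIN) {
                xcm_close(conns[i]); /* closed by peer, or error */
                conns[i] = NULL;
            }
        }
    }
}
```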
Tied to an XCM server or connection socket is a set of key-value pairs known as attributes. Which attributes are available varies across different transports, and different socket types.
An attribute's name is a string, and follows a hierarchical naming schema. For example, all generic XCM attributes, available in all transports, have the prefix "xcm.". Transport-specific attributes are prefixed with the transport or protocol name (e.g. "tcp." for TCP-specific attributes applicable to the TLS, BTLS, and TCP transports).
An attribute may be read-only, write-only or available both for reading and writing. This is referred to as the attribute's mode. The mode may vary across the lifetime of the socket. For example, an attribute may be writable at the time of the xcm_connect() call, and read-only thereafter.
The attribute value is coded in the native C data type and byte order. Strings are NUL-terminated, and the NUL character is included in the length of the attribute. There are four value types; a boolean type, a 64-bit signed integer type, a string type and a type for arbitrary binary data. See xcm_attr_types.h for details.
The attribute access API is in xcm_attr.h.
Retrieving an integer attribute's value may look like this:
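A sketch, assuming conn is an established connection socket over TCP or TLS, and using the xcm_attr_get_int64() helper from xcm_attr.h:

```c
#include <xcm.h>
#include <xcm_attr.h>
#include <inttypes.h>
#include <stdio.h>

static void print_rtt(struct xcm_socket *conn)
{
    int64_t rtt;
    xcm_attr_get_int64(conn, "tcp.rtt", &rtt);
    printf("Current TCP round-trip time estimate: %" PRId64 " us.\n", rtt);
}
```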
Changing an integer attribute's value may be done in the following manner:
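For example (again with conn being a hypothetical TCP or TLS connection socket):

```c
#include <xcm.h>
#include <xcm_attr.h>

static void set_keepalive_count(struct xcm_socket *conn)
{
    /* drop the connection after three unacknowledged keepalive probes */
    xcm_attr_set_int64(conn, "tcp.keepalive_count", 3);
}
```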
Both of these examples are missing error handling.
XCM allows supplying a set of writable attributes at the time of socket creation, by using the xcm_connect_a(), xcm_server_a(), or xcm_accept_a() functions.
The attribute sets are represented by the xcm_attr_map type in xcm_attr_map.h.
A somewhat contrived example:
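A sketch (addresses and peer names are illustrative):

```c
#include <xcm.h>
#include <xcm_attr_map.h>
#include <stdbool.h>

static struct xcm_socket *connect_to_service(void)
{
    struct xcm_attr_map *attrs = xcm_attr_map_create();

    xcm_attr_map_add_bool(attrs, "xcm.blocking", false);
    xcm_attr_map_add_str(attrs, "xcm.local_addr", "tls:192.168.1.42:0");
    xcm_attr_map_add_bool(attrs, "tls.verify_peer_name", true);
    xcm_attr_map_add_str(attrs, "tls.peer_names", "myservice");
    xcm_attr_map_add_int64(attrs, "tcp.keepalive_interval", 10);

    struct xcm_socket *conn = xcm_connect_a("tls:192.168.1.99:4711", attrs);

    xcm_attr_map_destroy(attrs);

    return conn;
}
```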
These attributes are expected to be found on XCM sockets regardless of transport type.
For TCP transport-specific attributes, see TCP Socket Attributes, and for TLS, see TLS Socket Attributes.
Attribute Name | Socket Type | Value Type | Mode | Description |
---|---|---|---|---|
xcm.type | All | String | R | The socket type: "server" or "connection". |
xcm.transport | All | String | R | The transport type. |
xcm.service | All | String | RW | The service type: "messaging" or "bytestream". Writable only at the time of socket creation. If specified, it may be used by an application to limit the type of transports being used. The string "any" may be used to signify that any type of service is accepted. The default is "messaging". |
xcm.local_addr | All | String | RW | The local address of a socket. Writable only if supplied to xcm_connect_a() together with a TLS, UTLS or TCP type address. Usually only needs to be written on multihomed hosts, in cases where the application needs to specify the source IP address to be used. Also see xcm_local_addr(). |
xcm.blocking | All | Boolean | RW | See xcm_set_blocking() and xcm_is_blocking(). |
xcm.remote_addr | Connection | String | R | See xcm_remote_addr(). |
xcm.max_msg_size | Connection | Integer | R | The maximum size of any message transported by this connection. |
XCM connection sockets keep track of the amount of data entering and leaving the XCM layer, both on the application side and on the lower layer side. Additionally, messaging transports also track the number of messages.
Some of the message and byte counter attributes use the concept of a "lower layer". What this means depends on the transport. For the UX and TCP transports, it is the Linux kernel. For example, for TCP, if xcm.to_lower_msgs is incremented, it means that XCM has successfully handed the complete message over to the kernel's networking stack for further processing. It does not mean the message has reached the receiving process. It may have, but it may also be sitting in the local or remote socket buffer, in a NIC queue, or be in transit in the network. For TLS, the lower layer is OpenSSL.
The counters only reflect data successfully sent and/or received.
These counters are available on both byte stream and messaging type connection sockets.
The byte counters are incremented by the length of the XCM data (as in the length field in xcm_send()), and thus do not include any underlying headers or other lower layer overhead.
Attribute Name | Socket Type | Value Type | Mode | Description |
---|---|---|---|---|
xcm.from_app_bytes | Connection | Integer | R | Bytes sent from the application and accepted into XCM. |
xcm.to_app_bytes | Connection | Integer | R | Bytes delivered from XCM to the application. |
xcm.from_lower_bytes | Connection | Integer | R | Bytes received by XCM from the lower layer. |
xcm.to_lower_bytes | Connection | Integer | R | Bytes successfully sent by XCM into the lower layer. |
These counters are available only on messaging type connection sockets.
Attribute Name | Socket Type | Value Type | Mode | Description |
---|---|---|---|---|
xcm.from_app_msgs | Connection | Integer | R | Messages sent from the application and accepted into XCM. |
xcm.to_app_msgs | Connection | Integer | R | Messages delivered from XCM to the application. |
xcm.from_lower_msgs | Connection | Integer | R | Messages received by XCM from the lower layer. |
xcm.to_lower_msgs | Connection | Integer | R | Messages successfully sent by XCM into the lower layer. |
XCM includes a control interface, which allows iteration over the OS instance's XCM server and connection sockets (for processes with the appropriate permissions), and access to their attributes (see Socket Attributes).
The control interface is optional, and may be excluded by means of build-time configuration.
For each XCM server or connection socket, there is a corresponding UNIX domain socket which is used for control signaling (i.e. state retrieval).
By default, the control interface's UNIX domain sockets are stored in the /run/xcm/ctl directory.
This directory needs to be created prior to running any XCM applications in order for the control interface to work properly, and should be writable for all XCM users.
A particular process using XCM may be configured to use a non-default directory for storing the UNIX domain sockets used for the control interface by setting the XCM_CTL variable. Please note that using this setting will cause the XCM connections not to be visible globally on the OS instance (unless all other XCM-using processes also use this non-default directory).
Generally, since the application is left unaware (from an API perspective) of the existence of the control interface, errors are not reported up to the application. They are however logged.
Application threads owning XCM sockets, but which are busy with non-XCM processing for a long duration of time, or otherwise leave their XCM sockets unattended (in violation of the XCM API contract), will not respond on the control interface's UNIX domain sockets (corresponding to their XCM sockets). Only the presence of these sockets may be detected; their state cannot be retrieved.
Internally, the XCM implementation has a control interface client library, but this library's API is not public at this point.
XCM includes a command-line program xcmctl, which uses the Control API to iterate over the system's current XCM sockets, and allows access (primarily for debugging purposes) to the sockets' attributes.
Unlike BSD sockets, an XCM socket may not be shared among different threads without synchronization external to XCM. With proper external serialization, a socket may be shared by different threads in the same process, although this might prove difficult in practice, since a thread blocked in an XCM function will continue to hold the lock, preventing other threads from accessing the socket at all.
For non-blocking sockets, threads sharing a socket need to agree on what is the appropriate socket condition to wait for. When this condition is met, all threads are woken up, returning from select().
It is safe to "give away" an XCM socket from one thread to another, provided the appropriate memory fences are used.
These limitations (compared to BSD Sockets) are in place to allow socket state outside the kernel (which is required for TCP framing and TLS).
Sharing an XCM socket between threads in different processes is not possible.
After a fork() call, either of the two processes (the parent or the child) must be designated the owner of every XCM socket the parent owned.
The owner may continue to use the XCM socket normally.
The non-owner may not make any XCM API call on the socket other than xcm_cleanup(), which frees the local memory tied to the socket in the non-owner's process address space, without affecting the connection state in the owner process.
The core XCM API functions are oblivious to the transports used. However, support for building and parsing addresses is available only for a pre-defined set of transports. There is nothing preventing xcm_addr.h from being extended, and nothing prevents an alternative XCM implementation from including more transports without extending the address helper API.
The UX transport uses UNIX Domain Sockets (AF_UNIX, also known as AF_LOCAL) to provide a service of the messaging type.
UX sockets may only be used within the same OS instance (or, more specifically, between processes in the same Linux kernel network namespace).
UNIX Domain Sockets come in a number of flavors, and XCM uses the SOCK_SEQPACKET variety. SOCK_SEQPACKET sockets are connection-oriented, preserve message boundaries, and deliver messages in the same order they were sent; a perfect match for XCM semantics, providing for a near-trivial mapping.
UX is the most efficient of the XCM transports.
The standard UNIX Domain Sockets as defined by POSIX use the file system as their namespace, with the sockets also being files. However, for simplicity and to avoid situations where stale socket files (originating from crashed processes) cause problems, the UX transport uses a Linux-specific extension, allowing a private UNIX Domain Socket namespace. This is known as the abstract namespace (see the unix(7) man page for details). With the abstract namespace, server socket addresses have the same lifetime as TCP ports (i.e., if the process dies, the address is freed).
The UX transport enables the SO_PASSCRED BSD socket option, to give the remote peer a name (which UNIX domain connection sockets don't have by default). This is for debugging and observability purposes. Without a remote peer name, in server processes with multiple incoming connections to the same server socket, it's difficult to say which of the server-side connection sockets goes to which remote peer. The kernel-generated, unique name is an integer in the form "%05x" (printf format). Applications using hardcoded UX addresses should avoid colliding with such names by, for example, using a prefix.
The UTLS Transport also indirectly uses the UX namespace, so care should be taken to avoid any clashes between UX and UTLS sockets in the same network namespace.
The UXF transport is identical to the UX transport, except that it uses the standard POSIX naming mechanism. The name of a server socket is a file system path, and the socket is also a file.
UXF sockets reside in the file system namespace, as opposed to UX sockets, which live in a network namespace.
Upon xcm_close(), the socket will be closed and the file removed. If an application crashes or otherwise fails to call xcm_close(), it will leave a file in the file system pointing toward a non-existing socket. This file will prevent the creation of another server socket with the same name.
The TCP transport uses the Transmission Control Protocol (TCP), by means of the BSD Sockets API.
TCP is a byte-stream service, but the XCM TCP transport adds framing on top of the stream. A single-field 32-bit header containing the message length in network byte order is added to every message.
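A hedged illustration of this framing (the function is hypothetical; only the wire layout, a 32-bit length in network byte order followed by the payload, is given by the text above):

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* lay out a single message as it appears on the wire */
static size_t frame_msg(const void *msg, uint32_t len, uint8_t *wire)
{
    uint32_t nlen = htonl(len); /* length header, network byte order */
    memcpy(wire, &nlen, sizeof(nlen));
    memcpy(wire + sizeof(nlen), msg, len);
    return sizeof(nlen) + len;
}
```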
The TCP transport uses TCP Keepalive to detect lost network connectivity between the peers.
The TCP transport supports IPv4 and IPv6.
Since XCM is designed for signaling traffic, the TCP transport disables the Nagle algorithm of TCP to avoid its excessive latency.
The read-only TCP attributes are retrieved from the kernel (struct tcp_info in linux/tcp.h).
The read-write attributes are mapped directly to setsockopt() calls.
See the tcp(7) manual page for a more detailed description of these attributes. The struct retrieved with TCP_INFO is the basis for the read-only attributes. The read-write attributes are mapped to TCP_KEEP* and TCP_USER_TIMEOUT.
Attribute Name | Socket Type | Value Type | Mode | Description |
---|---|---|---|---|
tcp.rtt | Connection | Integer | R | The current TCP round-trip estimate (in us). |
tcp.total_retrans | Connection | Integer | R | The total number of retransmitted TCP segments. |
tcp.segs_in | Connection | Integer | R | The total number of segments received. |
tcp.segs_out | Connection | Integer | R | The total number of segments sent. |
tcp.keepalive | Connection | Boolean | RW | Controls if TCP keepalive is enabled. The default is true. |
tcp.keepalive_time | Connection | Integer | RW | The time (in s) before the first keepalive probe is sent on an idle connection. The default is 1 s. |
tcp.keepalive_interval | Connection | Integer | RW | The time (in s) between keepalive probes. The default is 1 s. |
tcp.keepalive_count | Connection | Integer | RW | The number of keepalive probes sent before the connection is dropped. The default is 3. |
tcp.user_timeout | Connection | Integer | RW | The time (in s) before a connection is dropped due to unacknowledged data. The default is 3 s. |
tcp.segs_in and tcp.segs_out are only present when running XCM on Linux kernel 4.2 or later.
The TLS transport uses the Transport Layer Security (TLS) protocol to provide a secure, private, two-way authenticated transport over TCP. A TLS connection is a byte stream, but the XCM TLS transport adds framing in the same manner as the XCM TCP transport does.
The TLS transport supports IPv4 and IPv6. It disables the Nagle algorithm of TCP.
The TLS transport honors any limitations set by the X.509 extended key usage extension, if present in the remote peer's certificate.
The TLS transport uses only TLS 1.2 and, if the XCM library is built with OpenSSL 1.1.1 or later, TLS 1.3 as well.
TLS 1.2 renegotiation is disabled, if the XCM library is built with OpenSSL 1.1.1c or later.
The TLS transport disables both client and server-side TLS session caching, and thus does not allow for TLS session reuse across TCP connections.
The TLS 1.2 cipher list is (in order of preference, using OpenSSL naming): ECDHE-ECDSA-AES128-GCM-SHA256, ECDHE-ECDSA-AES256-GCM-SHA384, ECDHE-ECDSA-CHACHA20-POLY1305, ECDHE-RSA-AES128-GCM-SHA256, ECDHE-RSA-AES256-GCM-SHA384, ECDHE-RSA-CHACHA20-POLY1305, DHE-RSA-AES128-GCM-SHA256, DHE-RSA-AES256-GCM-SHA384, and DHE-RSA-CHACHA20-POLY1305.
The TLS 1.3 cipher suites used are: TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256 and TLS_AES_128_GCM_SHA256.
The TLS cipher lists are neither build- nor run-time configurable.
The TLS transport reads the leaf certificate and its private key from the file system, as well as a file containing all trusted CA certificates. Default paths are configured at build-time.
TLS Socket Attributes may be used to override one or more of the default paths, on a per-socket basis. Paths set on server sockets are inherited by their connection sockets, but may in turn be overridden at the time of an xcm_accept_a() call, using the proper attributes.
The default paths may also be overridden on a per-process basis by means of a UNIX environment variable. The current value of XCM_TLS_CERT (at the time of xcm_connect() or xcm_accept()) determines the certificate directory used for that connection.
The TLS transport will, at the time of xcm_connect() or xcm_server(), look up the process' current network namespace, unless the file paths were given as TLS Socket Attributes. If the namespace is given a name per the iproute2 convention, XCM will retrieve this name and use it in the certificate and key lookup.
In case the certificate, key and trusted CA files are configured using TLS Socket Attributes, no network namespace lookup will be performed.
In the certificate directory (either the compile-time default, or the directory specified with XCM_TLS_CERT), the TLS transport expects the files to follow the following naming conventions (where <ns> is the namespace): the leaf certificate is stored in "cert_<ns>.pem", the private key in "key_<ns>.pem", and the trusted CA certificates in "tc_<ns>.pem".
For the default namespace (or any other network namespace not named according to iproute2 standards), the certificate needs to be stored in a file "cert.pem", the private key in "key.pem" and the trusted CA certificates in "tc.pem".
In case the certificate, key or trusted CA files are not in place (for a particular namespace), an xcm_server() call will return an error and set errno to EPROTO. The application may choose to retry at a later time.
In case a certificate, private key, or trusted CA file is modified, the new version of the file(s) will be used by new connections. Such a change does not affect already-established connections. The TLS transport detects changes by comparing the sets of files, and thus the new generation of files need not necessarily be newer (as in having a more recent file system mtime).
The certificate, key and trusted CA certificates should be updated in an atomic manner, or XCM may end up using the certificate file from one generation of files and the key file from another, for example.
One way of achieving an atomic update is to have the three files in a common directory. This certificate directory is then made a symbolic link to the directory where the actual files are located. Upon update, a new directory is created and populated, and the old symbolic link is replaced in an atomic manner (i.e., with rename(2)).
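A sketch of such an update (all paths are hypothetical):

```c
#include <stdio.h>
#include <unistd.h>

/* /etc/app/certs is a symlink used as the certificate directory;
   a new generation of cert/key/tc files has been populated in
   /etc/app/certs-new */
int update_cert_dir(void)
{
    if (symlink("/etc/app/certs-new", "/etc/app/certs.tmp") < 0)
        return -1;

    /* rename(2) replaces the old symbolic link atomically */
    if (rename("/etc/app/certs.tmp", "/etc/app/certs") < 0) {
        unlink("/etc/app/certs.tmp");
        return -1;
    }

    return 0;
}
```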
By default, on sockets that represent the client side of a XCM TLS connection (e.g., returned from xcm_connect_a()), the XCM TLS transport will act as a TLS client. Similarly, the default behavior for sockets representing the XCM (and TCP) server side of a connection is to act as a TLS server.
The default may be changed by setting the "tls.client" attribute, so that sockets that are XCM (and TCP) level clients act as TLS servers, and vice versa. If the value is true, the socket will act as a TLS client, and if false, the socket is a TLS server.
Connection sockets created by xcm_accept() or xcm_accept_a() inherit the "tls.client" attribute value from their parent server sockets.
The TLS role must be specified at the time of socket creation, and thus cannot be changed on already-established connections.
By default, both the client and server side authenticate the other peer, often referred to as mutual TLS (mTLS).
TLS remote peer authentication may be disabled by setting the "tls.auth" socket attribute to false.
Connection sockets created by xcm_accept() or xcm_accept_a() inherit the "tls.auth" attribute value from their parent server sockets.
The "tls.auth" socket attribute may only be set at the time of socket creation (except for server sockets).
The TLS transport supports verifying the remote peer's certificate subject name against an application-specified expected name, or a set of names. "Subject name" here is used as per the RFC 6125 definition, and is either a Distinguished Name (DN) of the X.509 certificate's subject field, or a DNS type subject alternative name extension. XCM does not make any distinction between the two.
Subject name verification may be enabled by setting the "tls.verify_peer_name" socket attribute to true. It is disabled by default.
If enabled, XCM will verify the hostname in the address supplied in the xcm_connect_a() call. In case the attribute "tls.peer_names" is also supplied, it overrides this behavior. The value of this attribute is a ':'-separated set of subject names.
If there is a non-zero overlap between these two sets, the verification is considered successful. The actual procedure is delegated to OpenSSL. Wildcard matching is disabled (X509_CHECK_FLAG_NO_WILDCARDS) and the check includes the subject field (X509_CHECK_FLAG_ALWAYS_CHECK_SUBJECT).
Subject name verification may be used both by a client (in its xcm_connect_a() call) and by a server (in xcm_server_a() or xcm_accept_a()). "tls.peer_names" must be specified in case "tls.verify_peer_name" is set to true on connection sockets created by accepting a TLS connection from a server socket (since there is no hostname to fall back to).
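For example, a client requiring the remote peer to present one of two subject names (names and address are illustrative):

```c
#include <xcm.h>
#include <xcm_attr_map.h>
#include <stdbool.h>

static struct xcm_socket *connect_verified(void)
{
    struct xcm_attr_map *attrs = xcm_attr_map_create();

    xcm_attr_map_add_bool(attrs, "tls.verify_peer_name", true);
    xcm_attr_map_add_str(attrs, "tls.peer_names",
                         "myservice:myservice.example.com");

    struct xcm_socket *conn =
        xcm_connect_a("tls:server.example.com:4711", attrs);

    xcm_attr_map_destroy(attrs);

    return conn;
}
```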
Connection sockets created by xcm_accept() or xcm_accept_a() inherit the "tls.verify_peer_name" and "tls.peer_names" attributes from their parent server sockets.
After a connection is established, the "tls.peer_names" will be updated to reflect the remote peer's actual subject names, as opposed to those which were originally allowed.
OpenSSL refers to this functionality as hostname validation, and that is also how it's usually used. However, the subject names passed in "tls.peer_names" need not be DNS domain names, but can be any kind of name or identifier. All names must follow DNS domain name syntax rules (including label and total length limitations). Also, while uppercase and lowercase letters are allowed in domain names, no significance is attached to the case.
Attribute Name | Socket Type | Value Type | Mode | Description |
---|---|---|---|---|
tls.cert_file | All | String | RW | The leaf certificate file. For connection sockets, writable only at socket creation. |
tls.key_file | All | String | RW | The leaf certificate private key file. For connection sockets, writable only at socket creation. |
tls.tc_file | All | String | RW | The trusted CA certificates bundle. For connection sockets, writable only at socket creation. |
tls.client | All | Boolean | RW | Controls whether to act as a TLS-level client or a server. For connection sockets, writable only at socket creation. |
tls.auth | All | Boolean | RW | Controls whether or not to authenticate the remote peer. For connection sockets, writable only at socket creation. |
tls.verify_peer_name | All | Boolean | RW | Controls if subject name verification should be performed. For connection sockets, writable only at socket creation. |
tls.peer_names | All | String | RW | At socket creation, a list of acceptable peer subject names. After connection establishment, a list of actual peer subject names. For connection sockets, writable only at socket creation. |
tls.peer_subject_key_id | Connection | String | R | The X509v3 Subject Key Identifier of the remote peer, or a zero-length string in case the TLS connection is not established. |
In addition to the TLS-specific attributes, a TLS socket also has all the TCP Socket Attributes.
The UTLS transport provides a hybrid transport, utilizing both the TLS and UX transports internally for actual connection establishment and message delivery.
On the client side, at the time of xcm_connect(), the UTLS transport determines if the server socket can be reached by using the UX transport (i.e. if the server socket is located on the same OS instance, in the same network namespace). If not, UTLS will attempt to reach the server by means of the TLS transport.
For a particular UTLS connection, either TLS or UX is used (never both). XCM connections to a particular UTLS server socket may be a mix of the two different types.
For a UTLS server socket with the address utls:<ip>:<port>, two underlying addresses will be allocated: tls:<ip>:<port> and ux:<ip>:<port>. In case DNS is used, the addresses are tls:<hostname>:<port> and ux:<hostname>:<port>.
UTLS sockets accept all the TLS Socket Attributes, as well as the Generic Attributes. In case a UTLS connection is being established as a UX connection socket, all TLS attributes are ignored.
A wildcard should never be used when creating a UTLS server socket.
If a DNS hostname is used in place of the IP address, both the client and server need to employ DNS, and must also agree upon which hostname to use (in case several point at the same IP address).
Failure to adhere to the above two rules will prevent a client from finding a local server. Such a client will instead establish a TLS connection to the server.
The SCTP transport uses the Stream Control Transmission Protocol (SCTP). SCTP provides a reliable, message-oriented service. In-order delivery is optional, but to adhere to XCM semantics (and for other reasons) XCM leaves SCTP in-order delivery enabled.
The SCTP transport utilizes the Linux kernel's native implementation of SCTP, via the BSD Socket API. The operating mode is such that there is a 1:1 mapping between an association and a socket (fd).
The SCTP transport supports IPv4 and IPv6.
To minimize latency, the SCTP transport disables the Nagle algorithm.
The BTLS transport uses the Transport Layer Security (TLS) protocol to provide a secure, private, two-way authenticated byte stream service over TCP.
Unlike the TLS Transport, BTLS doesn't have a framing header or anything else at the wire protocol level that is specific to XCM. It's a "raw" TLS connection.
Other than providing a byte stream, it's identical to the TLS Transport.
Namespaces are a Linux kernel facility for creating multiple, independent namespaces for kernel resources of a certain kind.
Linux Network Namespaces will affect all transports, except the UXF Transport.
XCM has no explicit namespace support. Rather, the application is expected to use the Linux kernel facilities for this functionality (i.e., switch to the right namespace before calling xcm_server() or xcm_connect()).
In case the system follows the iproute2 conventions in regards to network namespace naming, the TLS and UTLS transports support per-network namespace TLS certificates and private keys.
XCM, in its current form, does not support binding to a local socket before doing connect() - something that is possible with BSD Sockets, but very rarely makes sense.
XCM also doesn't have a sendmmsg() or recvmmsg() equivalent. Those could easily be added, and would provide some major performance improvements for applications sending or receiving multiple messages on the same connection at the same time. *mmsg() equivalents have been left out because there are strong doubts that such applications exist.