Shared resources and persistent storage

Persistent storage
Shared resources
Managing upgrades

Persistent storage

In all of the deployment models described in Deployment models, the Authoring terminology server requires resilient persistent storage because it hosts the content under development.

Similarly, if a syndication server is used, it requires resilient persistent storage to preserve release states.

Without a syndication server, the Production (and Staging) servers likely require resilient persistent storage. This allows recovery if a server fails or is lost, without reloading content from the latest approved state of the Authoring server, which may have changed.

With a syndication server, the Production (and Staging) servers can be ephemeral, with non-persistent storage, and the syndication server can be relied upon to persistently store the release state. Should a server fail or be lost, a new server can be booted and its content pulled from the persistent syndication server.

Persistent storage trade-offs for read-only endpoints

While read/write endpoints need persistent storage, read-only endpoints can be provided by an Ontoserver instance (or instances in the case of horizontal scaling) with either persistent or transient storage and database. The choice between the two largely comes down to the size of the preload feed used to populate the read-only Ontoserver instance(s), which is greatly affected by the number of SNOMED CT versions in the feed.

Transient storage and database has the advantages of:

Cost – using Docker host node storage is typically the cheapest option.
Performance – Docker host node storage (if SSD) will outperform externally mounted storage on latency alone; similarly, a co-located dedicated PostgreSQL database container will have low latency.
Simplicity – an Ontoserver node is booted, reads its content from a preload feed and is then ready for service, which is an independent, simple and repeatable process.

The major disadvantage of using transient storage is boot time in cases where the preload feed contains a very large amount of content. This can increase the time it takes to complete the preload process to an intolerable duration, particularly if the Ontoserver instance is unexpectedly lost (e.g. loss of the Docker host) and a new instance needs to be booted promptly to bring the endpoint back online. Note that Ontoserver can serve requests while it is completing the preload process and can be brought into service before preload completes; however, it will not have all content available until the preload process is complete. Whether this is acceptable depends on the use case.

If such a large amount of content is in use, one way to reduce Ontoserver boot time is to use persistent storage and database for the Ontoserver instance (or instances for a scaled instance, see Horizontally scaled read-only endpoint). Under this scenario, if an Ontoserver instance using an externally mounted persistent disk and database is lost (e.g. loss of the Docker host), another Ontoserver instance can be booted using the same disk and database. The new Ontoserver instance will start the preload process upon boot, yet skip all of the entries in the preload feed as it discovers they are already present in the persistent database and disk. This results in a very fast boot time for Ontoserver in the event that a container is lost or needs to be moved.

However, this fast boot does not apply when deploying a new content release. The preload process is additive: if the replacement feed removes (by omission) some resources that were in the previous feed, a preload run against the existing disk/database state would add the new resources but would not remove the entries omitted from the new preload feed. Hence a new persistent disk and database are required, and the full preload process must be performed.
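The additive behaviour described above can be illustrated with a minimal sketch that models the disk/database state and the preload feeds as plain sets. This is not Ontoserver code; the function name `additive_preload` and the resource identifiers are hypothetical and exist only for illustration.

```python
def additive_preload(store: set[str], feed: set[str]) -> set[str]:
    """Return the store contents after an additive preload of `feed`.

    Entries already present are effectively skipped, new entries are
    added, and nothing is ever removed.
    """
    return store | feed


# Hypothetical resource identifiers, for illustration only.
old_feed = {"snomed-v1", "snomed-v2", "valueset-a"}
new_feed = {"snomed-v2", "snomed-v3", "valueset-a"}  # snomed-v1 dropped by omission

# Reusing the existing persistent disk/database leaves the omitted entry behind.
reused = additive_preload(additive_preload(set(), old_feed), new_feed)
stale = reused - new_feed

# Starting from a blank disk/database yields exactly the new feed.
fresh = additive_preload(set(), new_feed)
```

Here `stale` contains the entry that the new feed no longer lists, which is why a new content release needs a blank disk and database rather than the existing ones.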

Therefore the persistent storage approach is:

More complex:
- The release process for a new content release (e.g. new or updated preload feed) requires that the existing persistent disk and database be cleaned up, and a new Ontoserver instance be booted using a blank disk and blank database which must be provisioned – this adds more infrastructure provisioning/decommissioning steps to the process.
More costly:
- Disk and database resources will need to be provisioned at extra cost, in addition to the containers.
Less performant:
- The persistent externally managed resources for the disk and database present higher latency (although premium infrastructure can limit this loss at greater cost).

The major factor in the load time of the preload feed is the number of SNOMED CT indexes. SNOMED CT indexes are very large, and the data transfer times alone can be significant when many tens of versions are in use. Ontoserver additionally performs SHA256 checks once resources are downloaded, and then needs to unzip each index for use; these are time, CPU and IO intensive operations. Minimising the number of SNOMED CT indexes in the preload feed will help minimise the preload duration for an Ontoserver instance.
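To give a feel for the per-index work involved, the following is a minimal sketch of a checksum-then-unzip step using only the Python standard library. It is not Ontoserver's implementation; the function names, the archive layout and the way the expected checksum is supplied are all assumptions made for illustration.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_unpack(archive: Path, expected_sha256: str, dest: Path) -> None:
    """Check the archive's SHA256 (hypothetical expected value), then unzip it."""
    actual = sha256_of(archive)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {archive.name}")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```

Both steps read every byte of the archive, so their cost grows with the size and number of indexes in the feed.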

On balance, if a sufficiently small preload feed (particularly in terms of the number of SNOMED CT indexes) can be achieved, the cheapest, simplest and most performant option is to use transient private storage and database for each Ontoserver instance. However, if the preload feed contains enough content to make the boot time in the event of an Ontoserver instance loss intolerable, persistent storage can be used, at some additional cost.
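This trade-off can be turned into a rough back-of-envelope check. The sketch below assumes indexes are downloaded and processed one after another, and every parameter (index size, bandwidth, per-index verify/unzip time, tolerable outage) is a hypothetical input to be measured in your own environment, not a figure from Ontoserver.

```python
def estimate_preload_seconds(
    index_count: int,
    index_size_mb: float,
    bandwidth_mb_per_s: float,
    verify_unzip_s_per_index: float,
) -> float:
    """Rough sequential estimate: transfer time plus verify/unzip time per index."""
    per_index = index_size_mb / bandwidth_mb_per_s + verify_unzip_s_per_index
    return index_count * per_index


def choose_storage(estimated_s: float, tolerable_s: float) -> str:
    """Prefer transient storage unless a cold boot would exceed the tolerable outage."""
    return "transient" if estimated_s <= tolerable_s else "persistent"


# Hypothetical inputs: 10 indexes of 500 MB at 50 MB/s, 20 s verify/unzip each.
estimate = estimate_preload_seconds(10, 500, 50, 20)
decision = choose_storage(estimate, tolerable_s=600)
```

With these made-up inputs, the estimate fits within the tolerable outage and the sketch favours transient storage. Because the estimate scales with the index count, trimming SNOMED CT versions from the feed shortens it directly.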

Shared resources

Atomio and Ontocloak instances do not share resources with any other containers, and use of multiple Atomio or Ontocloak containers is not necessary.

The exception is that Ontocloak may share a PostgreSQL database server with Ontoserver instances; in this case it should connect using its own user account to a schema that is used only by Ontocloak. The usual example of this is supporting Ontocloak and an Authoring Ontoserver instance using a single platform database such as AWS RDS or Azure Database for PostgreSQL, with separate database users and schemas for each application.

Ontoserver uses two main resources that may be shared:

Filesystem storage for the Ontoserver container; and
PostgreSQL database and schema.

Multiple Ontoserver instances can share these resources when being used to support a “scaled” instance – a terminology server endpoint supported by multiple Ontoserver instances, typically done for horizontal scaling and/or high availability. See Horizontally scaled read-only endpoint.

Managing upgrades

Atomio, Ontocloak, and Ontoserver all handle database schema and data migrations when new versions are deployed. When a new version of one of these products is deployed, it will attempt to migrate the data on start-up if an existing out-of-date schema is found. If a rollback to an earlier software version is required, the persistent state should also be rolled back or restored to its prior state.

If using multiple Ontoserver instances with shared filesystem and database, these Ontoserver instances must all be running the same software version. If an upgrade is required on the persistent storage being used by multiple Ontoserver instances, all Ontoserver instances must be shut down before starting Ontoserver instances of a new software version using these shared resources. Refer to Horizontally scaled read-only endpoint for more details.
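The rule above can be expressed as a simple pre-start check in a deployment script. The following is a minimal sketch, not part of any Ontoserver tooling; the function name `safe_to_start` and the version strings are hypothetical, and obtaining the list of running versions is left to your orchestration layer.

```python
def safe_to_start(new_version: str, running_versions: list[str]) -> bool:
    """True if starting `new_version` keeps every instance on one software version.

    `running_versions` lists the versions of the instances currently using
    the shared filesystem and database. An empty list means all instances
    have been shut down, so any version may start.
    """
    return all(version == new_version for version in running_versions)


# Hypothetical version strings, for illustration only.
blocked = safe_to_start("2.0.0", ["1.9.0", "1.9.0"])  # old instances still running
allowed = safe_to_start("2.0.0", [])                  # all instances shut down first
scale_out = safe_to_start("2.0.0", ["2.0.0"])         # adding a node on the same version
```

A check like this makes the "shut everything down before upgrading" step explicit, rather than relying on operators to remember it during a rolling deployment.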