Syndication and content ingestion
This page gives a short overview of the content types supported by Ontoserver and a high-level view of its syndication capabilities. It explains how to import content, how to create and import indexes for SNOMED CT and LOINC, and how to remove indexes.
- Types of content
- Importing content from syndication sources
- Indexing SNOMED CT and LOINC
- Importing an index
- Removing an index
Types of content
Ontoserver is a FHIR terminology server, and supports the main FHIR terminology resources CodeSystem, ConceptMap, ValueSet and NamingSystem. Ontoserver also supports StructureDefinition resources to provide features related to validation and terminology binding, and Bundle resources.
For larger and more complex code systems, specifically SNOMED CT editions and LOINC, which have detailed specifications in FHIR, Ontoserver uses a special process to create binary indexes. These indexes enable the advanced functionality that these rich code systems support.
Ontoserver can build these indexes from SNOMED CT and LOINC code system sources. However, because this is a resource-intensive process, Ontoserver also supports sharing the indexes via its syndication capabilities, which minimizes the number of Ontoserver users who need to build them.
Importing content from syndication sources
Content can be added to Ontoserver through its FHIR API and through its built-in syndication client. The FHIR API is principally used to author content, and can be used to push FHIR resources into Ontoserver.
The syndication API is used to pull content into the server. It can pull FHIR resources and binary indexes from preconfigured sources using the syndication client built into Ontoserver. The preconfigured “upstream” syndication sources are usually Ontoserver or Atomio instances; however, they can be as simple as a conformant Atom feed on a web server or file system.
These upstream syndication sources are considered authoritative and may only be configured by an administrator. This syndication mechanism is the only way to import binary indexes built by another Ontoserver instance.
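Because an upstream source can be any conformant Atom feed, it can be useful to inspect what a feed offers before pulling from it. The following is a minimal sketch that lists the title and link of each entry using only standard Atom elements. The sample feed, its URLs, and the list_entries helper are hypothetical and for illustration only; the feeds Ontoserver consumes carry additional metadata that is not shown here.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example terminology feed</title>
  <id>urn:uuid:00000000-0000-0000-0000-000000000001</id>
  <updated>2024-01-01T00:00:00Z</updated>
  <entry>
    <title>Example SNOMED CT index</title>
    <id>urn:uuid:00000000-0000-0000-0000-000000000002</id>
    <updated>2024-01-01T00:00:00Z</updated>
    <link rel="alternate" href="https://example.org/indexes/sct.zip"/>
  </entry>
  <entry>
    <title>Example value set</title>
    <id>urn:uuid:00000000-0000-0000-0000-000000000003</id>
    <updated>2024-01-02T00:00:00Z</updated>
    <link rel="alternate" href="https://example.org/fhir/ValueSet/example"/>
  </entry>
</feed>
"""


def list_entries(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, href) for every entry in an Atom feed document."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        # An entry without a link element yields an empty href.
        href = link.get("href", "") if link is not None else ""
        entries.append((title, href))
    return entries


if __name__ == "__main__":
    for title, href in list_entries(SAMPLE_FEED):
        print(f"{title}: {href}")
```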
Importing content from upstream syndication sources
Ontoserver must be configured with upstream syndication sources by an administrator. Ontoserver can then be instructed to pull selected items or the entire feed, and this can be done at start up, manually at any time, automatically when new content is available, or configured to occur on a schedule.
Configuring syndication feed
Ontoserver can be configured with multiple upstream syndication feeds, specified by an administrator with the atom.syndication.feedLocation property. If the feed requires authorization, Ontoserver can be configured with credentials to use for either basic authentication or the OAuth2 client credentials flow. Once configured, items can be imported using the syndication API, or via OntoCommand.
Preload syndication feeds
Ontoserver can also have preload syndication feeds, where feed content is imported when a server is started or restarted. This is typically used in read-only Ontoserver instances designed for using and querying released terminology content, as opposed to a read/write instance which is aimed primarily at authoring content.
Multiple preload URLs can be specified with the atom.preload.feedLocation property, and when the preload process is triggered Ontoserver will import the union of entries in the feeds. Items in the feeds that the instance already contains are skipped.
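The union-and-skip behaviour described above can be pictured with a small sketch. This is not Ontoserver's implementation. It assumes, purely for illustration, that each feed entry can be identified by an (identifier, version) pair, and the plan_preload helper is a hypothetical name.

```python
def plan_preload(feeds, installed):
    """Return the entries to import, in first-seen order.

    feeds     -- list of feeds, each a list of (identifier, version) pairs
    installed -- set of (identifier, version) pairs already on the instance

    The result is the union of all feed entries, minus anything already
    installed, with entries that appear in several feeds listed once.
    """
    seen = set(installed)
    planned = []
    for feed in feeds:
        for entry in feed:
            if entry not in seen:
                seen.add(entry)
                planned.append(entry)
    return planned


if __name__ == "__main__":
    # Hypothetical entries: "b" appears in both feeds, "a" is already installed.
    feed_one = [("a", "1"), ("b", "1")]
    feed_two = [("b", "1"), ("c", "2")]
    print(plan_preload([feed_one, feed_two], installed={("a", "1")}))
```

Running the preload again after these entries are installed would plan nothing, which is why retriggering it on a running instance only pulls in content that is new to that instance.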
This process may also be triggered on a running Ontoserver instance without a restart. This is particularly useful if the preload feed content has changed, so that new content can be pulled into the Ontoserver instance without interruptions.
Scheduled preload
Preloading can be scheduled by setting the atom.preload.schedule.cron property to a schedule written in cron syntax. This is useful if a read-only Ontoserver instance is connected to a read/write Ontoserver instance using a continuous delivery model. The read-only Ontoserver instance can be configured to regularly pull newly available content from the read/write server’s syndication feed, to which content can be promoted as it becomes available.
Indexing SNOMED CT and LOINC
Downloading versus building indexes
Indexes are only required for SNOMED CT editions and LOINC. These indexes are used to support the advanced features specified for these large code systems in the FHIR specification. Other code systems are supported by transformation to FHIR CodeSystem resources.
Building indexes with Ontoserver requires much more memory than operating as a read/write or read-only server, and the indexing process ties up CPU for some time. Therefore these indexes are best created and distributed by an authoritative source, such as the publisher of the code system or a central/reputable organization within the ecosystem of Ontoserver instances. As a result most Ontoserver operators won’t need to create binary indexes and will typically import indexes built by others.
Indexing options
Ontoserver instances, as running servers, are able to create indexes. This is achieved by interacting with the Ontoserver instance via its API to upload the source data of the SNOMED CT edition or LOINC version, check the status of the indexing process to determine when it has completed, and download the index from the server.
However, using a running Ontoserver instance for this process can be cumbersome, as the resources required for indexing are expensive and ideally would only be provisioned while indexing runs. Additionally, scripts making multiple REST calls are required.
As an alternative, Ontoserver offers an indexer mode in which it is given a series of input parameters, starts, creates the index, and then stops. This is easier to execute on temporarily provisioned infrastructure in an automated manner.
Indexer mode
This mode can be executed using a command line docker run command, or as a Kubernetes job, and is well suited to continuous integration servers and other forms of automated execution.
The input code system must be in a zip file conforming to either the SNOMED International or the Regenstrief distribution format, and can be supplied via the file system or over HTTP or HTTPS using authentication if required. For SNOMED CT editions spanning multiple zip files, these zip files can all be specified to the process which will merge them during execution.
The indexer mode can deliver the resultant binary index file as a file on the file system with a location and filename specified by input parameters to enable automation, or can be configured to push the index directly into Atomio. If an Atomio syndication server is not being used, the index will need to be in a conformant Atom syndication feed to be fed into Ontoserver.
Command line parameter | Usage |
---|---|
-s, --system-uri | The canonical URI of the code system being indexed. For SNOMED CT this value will be http://snomed.info/sct |
-v, --version | The appropriate FHIR code system version, in the form http://snomed.info/sct/{module_id}/version/{version}, where {module_id} is the edition module for the SNOMED CT edition being indexed and {version} is in the form YYYYMMDD, matching the version encoded in the RF2 files. This is particularly important for SNOMED CT releases, as it also dictates how the content of the release files passed in is interpreted. |
-k, --rf2-kind | The format of the files to be processed, either “Full” or “Snapshot”. Due to a slight optimisation, the “Snapshot” format is preferred if available. Multiple formats can be present in the supplied zip file; the specified format will be used and the others ignored. |
-f, --sourceUrls | The input file locations in the form of a URL. The process will support http://, https://, or file:// protocols. This parameter can be repeated for multiple file locations. |
-synd, --synd-url | The URL of the syndication server for index uploading* |
-feed, --feed-name | The name of the feed to upload the index to* |
-t, --entry-title | The title the new feed entry should have* |
-file, --entry-file-name | The filename of the resultant feed entry. When not specified the process generates a sensible filename from the other metadata. |
-perms, --entry-permissions | FHIR security labels to apply to the feed entry. Repeated parameters can be used if multiple security labels are required. If not specified no security labels will be added. |
-o, --output | The full file path within the container where the indexer should copy the result to upon successful completion. This location can be mounted to a location on the Docker host filesystem where the file can be retrieved from after execution.** |

**This parameter is intended for systems that are not using Atomio.
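Since the --version value also controls how the release files are interpreted, it is worth constructing it programmatically rather than by hand. The following is a minimal sketch that assembles and sanity-checks a SNOMED CT version URI in the form given in the table. The snomed_version_uri helper is a hypothetical name, and the release date in the example is illustrative rather than a statement about any particular release.

```python
import re
from datetime import datetime

SNOMED_SYSTEM_URI = "http://snomed.info/sct"


def snomed_version_uri(module_id: str, version: str) -> str:
    """Build http://snomed.info/sct/{module_id}/version/{version}.

    module_id -- the edition module identifier (digits only)
    version   -- the release date as YYYYMMDD, matching the RF2 files
    """
    if not re.fullmatch(r"[0-9]+", module_id):
        raise ValueError(f"module_id must be numeric: {module_id!r}")
    if not re.fullmatch(r"[0-9]{8}", version):
        raise ValueError(f"version must be YYYYMMDD: {version!r}")
    # strptime raises ValueError for impossible dates such as 20240230.
    datetime.strptime(version, "%Y%m%d")
    return f"{SNOMED_SYSTEM_URI}/{module_id}/version/{version}"


if __name__ == "__main__":
    # 900000000000207008 is the SNOMED CT International Edition module;
    # the date below is only an example.
    print(snomed_version_uri("900000000000207008", "20240101"))
    # prints http://snomed.info/sct/900000000000207008/version/20240101
```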
In order to get terminology release files to index and/or POST the resultant index zip file to the syndication server, the indexer may need to authenticate to one or more sources. The same method is used by this process as is used to specify credentials for various hosts used by Ontoserver. These parameters are documented at Configuration properties in alphabetical order.
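To show how the parameters in the table above fit together for a SNOMED CT edition written to the file system, the sketch below assembles an argument list from the long option names. It only builds the arguments. How they are handed to the indexer (for example after the image name in a docker run command, or as the args of a Kubernetes job) depends on the deployment and is not shown. The indexer_args helper, the file locations, and the output path are hypothetical, and the --rf2-kind value is passed through as spelled in the table.

```python
import shlex


def indexer_args(system_uri, version, rf2_kind, source_urls, output_path):
    """Assemble indexer arguments for a file-system (non-Atomio) run."""
    if not source_urls:
        raise ValueError("at least one source URL is required")
    args = [
        "--system-uri", system_uri,
        "--version", version,
        "--rf2-kind", rf2_kind,
    ]
    # --sourceUrls can be repeated for editions spanning multiple zip files.
    for url in source_urls:
        args += ["--sourceUrls", url]
    args += ["--output", output_path]
    return args


if __name__ == "__main__":
    args = indexer_args(
        system_uri="http://snomed.info/sct",
        version="http://snomed.info/sct/900000000000207008/version/20240101",
        rf2_kind="Snapshot",
        # Hypothetical locations inside the container.
        source_urls=["file:///input/international.zip"],
        output_path="/output/index.zip",
    )
    # shlex.join quotes each argument safely for display in a shell command.
    print(shlex.join(args))
```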
Importing an index
Ontoserver can only import indexes from configured syndication feeds. Atomio provides the simplest solution, as it offers both an API for index uploading, which Ontoserver integrates with in indexer mode, and a syndication feed that Ontoserver instances can use to import the index.
Alternatively, a simple webserver can host a hand-crafted Atom feed file referencing the index. More information can be found at https://ontoserver.csiro.au/docs/6/syndication.html.
Once in a feed, the index can be imported into Ontoserver using the process described in Importing content from syndication sources.
Removing an index
It may be necessary to remove an index from an Ontoserver instance for a number of reasons. For example, the instance may have a large number of historical versions of SNOMED CT editions on it, and to reduce its storage requirements, old unnecessary indexes may be removed. It is also possible that an installed index was built from a version of SNOMED CT or LOINC that has been recalled and needs to be removed to prevent its use.
This operation can currently only be performed using Ontoserver’s administration API. If an authorization server is in use, a bearer token with administration authority must first be retrieved from it. The sample role Authoring server administrator in the default realm configuration grants appropriate permissions, and can be granted to a user or a set of client credentials through the authorisation administration console. The delete code system API is documented on ontoserver.csiro.au and is a simple REST call specifying the URL and version of the code system index to remove.
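The general shape of such a call, a DELETE request carrying a bearer token, can be sketched as follows. The endpoint path and the query parameter names used in the example are placeholders, not Ontoserver's actual API, so take the real values from the delete code system API documentation. The build_delete_request helper is a hypothetical name, and the sketch only constructs the request without sending it.

```python
import urllib.parse
import urllib.request


def build_delete_request(endpoint, params, bearer_token):
    """Build (but do not send) an authenticated DELETE request.

    endpoint     -- full URL of the delete API (consult the official docs)
    params       -- query parameters identifying the code system index
    bearer_token -- token obtained from the authorization server
    """
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{endpoint}?{query}",
        method="DELETE",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


if __name__ == "__main__":
    request = build_delete_request(
        # PLACEHOLDER endpoint and parameter names, for illustration only.
        endpoint="https://ontoserver.example.org/PLACEHOLDER-delete-endpoint",
        params={
            "PLACEHOLDER-url": "http://snomed.info/sct",
            "PLACEHOLDER-version":
                "http://snomed.info/sct/900000000000207008/version/20240101",
        },
        bearer_token="PLACEHOLDER-TOKEN",
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```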