Atomio
Syndication server for terminology servers (especially Ontoserver).
Purpose
Atomio was created to host terminology content in one or more syndication feeds for use by terminology servers. It publishes those feeds using an extension to the Atom Syndication Feed Format described at https://www.healthterminologies.gov.au/specs/v2/conformant-server-apps/syndication-api/syndication-feed.
API documentation
Atomio hosts its own API documentation. Wherever Atomio is hosted, the Swagger UI is available at /swagger-ui.html and renders the OpenAPI 3 documentation that Atomio also serves at /v3/api-docs. Both describe the API of the running Atomio version.
For an example, see https://synd.ontoserver.csiro.au/swagger-ui.html and https://synd.ontoserver.csiro.au/v3/api-docs.
Health checks and information
By default, Atomio also exposes
- a healthcheck endpoint at /actuator/health which is useful for checking the instance's health and readiness, and
- an information endpoint at /actuator/info which is useful to determine the exact Atomio version which is deployed.
The healthcheck endpoint is particularly useful when configuring container orchestration tools such as Kubernetes, or dashboards.
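As an illustration, a monitoring script could poll the healthcheck endpoint roughly as follows. This is a minimal sketch, assuming the endpoint returns the usual Spring Boot Actuator JSON with a top-level "status" field; the function names are made up for this example.

```python
import json
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the healthcheck URL for a given Atomio base URL."""
    return base_url.rstrip("/") + "/actuator/health"


def is_up(payload: dict) -> bool:
    """Return True when the top-level status is "UP" (assumed Actuator shape)."""
    return payload.get("status") == "UP"


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Fetch the healthcheck endpoint; any HTTP or network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return is_up(json.load(response))
    except urllib.error.URLError:
        # HTTPError (e.g. a 503 from an unhealthy instance) is a subclass of URLError.
        return False
```

For example, `check_health("https://synd.ontoserver.csiro.au")` would query the public instance mentioned above.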
Version 2.0.0 upgrade
Atomio 2.0.0 upgrades to Spring Boot 3.2, which, among many dependency updates, includes an upgrade to H2. The version change from H2 1.x to H2 2.x requires a database migration.
For those using Postgres (configuration explained below) this migration is unnecessary. For those using the default H2 database, the migration requires that the database content is exported and reimported into a new H2 database, because the binary format has changed.
To simplify this process, a Docker image has been created that runs the migration. It performs the following steps:
- Copies the current database to a backup location
- Exports the database to a zip file in the backup location named .export. .zip
- Deletes the current database and creates a new blank H2 database at the same location
- Imports the .export. .zip file
The image name is quay.io/aehrc/atomio-h2-migration:1.0.0
There are a number of environment variables you can use to control this process.
| Variable | Description | Default for Atomio |
|---|---|---|
| DATABASE_URL | The database URL passed to Atomio | jdbc:h2:/workspace/atomio/db |
| USERNAME | Database username to connect to the database | sa |
| PASSWORD | Database password to connect to the database | password |
If the above environment variables are not set the migration will not run.
The process to perform the upgrade is:
- Shut down the existing version 1.x Atomio server
- Start the atomio-h2-migration image in place of the Atomio image, with the environment variables above set appropriately. It is essential that the container has access to the same disk mount as Atomio, so that it can reach the H2 database.
- Start the new version 2.x Atomio server
This process backs up the existing Atomio version 1 database to a directory called migration-backup in the location where the database is stored, for example /workspace/atomio/migration-backup if the database URL is jdbc:h2:/workspace/atomio/db. If a rollback is required, restore the content of this directory and start the Atomio version 1 image.
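With the Docker CLI, the migration step could be launched roughly as sketched below. The host directory /srv/atomio-data is hypothetical, and the sketch assumes the H2 database sits under the default /workspace/atomio path inside the container. The helper only assembles the command so it can be reviewed before being run; deployments on an orchestrator such as Kubernetes will need the equivalent settings in their own format.

```python
import shlex

MIGRATION_IMAGE = "quay.io/aehrc/atomio-h2-migration:1.0.0"


def migration_command(host_dir: str,
                      database_url: str = "jdbc:h2:/workspace/atomio/db",
                      username: str = "sa",
                      password: str = "password") -> list[str]:
    """Assemble a `docker run` command for the migration image.

    host_dir is the host directory that the Atomio container normally
    mounts at /workspace/atomio (an assumption of this sketch).
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/workspace/atomio",
        "-e", f"DATABASE_URL={database_url}",
        "-e", f"USERNAME={username}",
        "-e", f"PASSWORD={password}",
        MIGRATION_IMAGE,
    ]


# Print the command for review rather than executing it.
print(shlex.join(migration_command("/srv/atomio-data")))
```

The defaults mirror the "Default for Atomio" column in the table above; override them if the running Atomio instance was configured differently.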
Configuration
All configuration is done via Spring properties, which may be set as system properties or passed through to the Docker container as environment variables using Spring Boot's relaxed binding.
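To illustrate relaxed binding: Spring Boot's documented rule for turning a property name into an environment variable name is to replace dots with underscores, remove dashes, and convert to uppercase. A minimal sketch of that rule follows; the function name to_env_var is made up for this example, and indexed list properties are not handled.

```python
def to_env_var(property_name: str) -> str:
    """Convert a Spring property name to its environment variable form."""
    return property_name.replace(".", "_").replace("-", "").upper()


for name in ("spring.datasource.url",
             "spring.jpa.database-platform",
             "atomio.client.urlWhitelist"):
    print(f"{name} -> {to_env_var(name)}")
```

So, for instance, atomio.client.urlWhitelist would be supplied to the container as ATOMIO_CLIENT_URLWHITELIST.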
NOTE: if you wish to have Atomio clone entries or feeds from remote sources, you need to set atomio.client.urlWhitelist, described below.
Volume mounts
By default (this can be overridden as described below), Atomio writes its database and all downloaded artefacts into
/workspace/atomio
This can be volume mounted to an appropriate location for persistent storage.
Properties
The following are the default configuration items in the container which may be overridden
Spring settings
- spring.datasource.url=jdbc:h2:/workspace/atomio/db
- spring.datasource.driverClassName=org.h2.Driver
- spring.datasource.username=sa
- spring.datasource.password=password
- spring.jpa.database-platform=org.hibernate.dialect.H2Dialect
- spring.jpa.hibernate.ddl-auto=update
- server.error.include-message=always
- spring.servlet.multipart.maxFileSize=3221225472
- spring.servlet.multipart.maxRequestSize=3221225472
- spring.jpa.open-in-view=true
As seen above, by default an H2 database on disk will be used. There is no need to override these values unless you require a different configuration.
PostgreSQL support
To use PostgreSQL as Atomio's database, enable the postgres profile, which changes the H2-specific properties above (driver class, dialect, etc.). Then set spring.datasource.url, spring.datasource.username, and spring.datasource.password appropriately for the PostgreSQL database you are using.
For example
- spring.profiles.active=postgres
- spring.datasource.url=jdbc:postgresql://localhost/atomio
- spring.datasource.username=username
- spring.datasource.password=password
Storage self test
- atomio.scheduled.storage.test.enabled=true
- atomio.scheduled.storage.test.skip.sha.check=false
- atomio.scheduled.storage.test.cron=0 0 0 * * *
These parameters control the server's storage self test. With the defaults above, the server validates every file referenced by an entry each night at midnight, checking each file against the entry's length and SHA256.
Because SHA256 calculation is expensive but length checking is cheap, the SHA256 check can be turned off by setting atomio.scheduled.storage.test.skip.sha.check to true. The length check alone is still a useful sanity check that the file exists and is plausible.
The schedule can be changed by specifying a different cron expression. Bear in mind that SHA256 checking of a large number of large files takes a while, so it pays to keep the frequency relatively low.
This feature can be disabled by changing atomio.scheduled.storage.test.enabled to false.
Any errors detected by this feature are written to the server's log as error messages, so log monitoring is required to identify issues.
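The kind of check described here can be sketched as follows. This is not Atomio's implementation, only an illustration of a cheap length check followed by an optional SHA256 check; the function name and return values are made up for this example.

```python
import hashlib
import os


def verify_artefact(path: str, expected_length: int, expected_sha256: str,
                    skip_sha_check: bool = False) -> str:
    """Check a stored file against an entry's recorded length and SHA256.

    Returns "ok", "missing", "length mismatch" or "sha256 mismatch".
    """
    if not os.path.isfile(path):
        return "missing"
    # Cheap check first: the size comes from file metadata.
    if os.path.getsize(path) != expected_length:
        return "length mismatch"
    if skip_sha_check:
        return "ok"
    # Expensive check: hash the whole file in 1 MiB chunks.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        return "sha256 mismatch"
    return "ok"
```

Doing the length check first means a missing or truncated file is reported without reading any file content, which is why the length-only mode stays cheap even for large artefacts.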
Security
- atomio.security.audience=atomio
- atomio.security.hsts=true
- atomio.security.enabled=false
- atomio.security.anonymousFeedRead=false
- atomio.security.feedLevelSecurity=true
- atomio.security.entryLevelSecurity=true
By default, application security is turned off; that is, the server doesn't require authentication or authorisation for any of its operations.
This can be changed by setting atomio.security.enabled to true. This enables token based security and requires configuration for the server to validate token signatures.
The preferred way to do this is to set the issuer URI (the example follows Keycloak's URL patterns)
- atomio.security.issuer-uri=https://some.host/auth/realms/realm-name
On startup, the server will then discover the certificates required for signature validation and the issuer value to check in the tokens. This will work for OAuth 2.0 or OIDC well-known configuration using Spring's discovery methods.
If issuer well-known discovery doesn't work or can't be used, JWKS can be used instead. When the JWKS URL is specified as follows, the server gets the keys to use directly from the authorisation server. The following example is from Keycloak
- atomio.security.jwk-set-uri=https://some.host/auth/realms/realm-name/protocol/openid-connect/certs
The issuer URI configuration or JWK URI configuration is preferred because it gracefully manages changes to the authorisation server's signing certificates. However, it won't work unless the SSL certificate used by the authorisation server is valid.
In terms of the security itself, when turned on the server will require tokens to have
- an audience of "atomio" or whatever value is configured into atomio.security.audience
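When diagnosing a rejected token, it can help to inspect its audience on the client side. The sketch below decodes a JWT payload without verifying the signature, so it is for troubleshooting only; the server performs the real validation. Per the JWT specification the "aud" claim may be a single string or a list, and both are handled. The function names are made up for this example.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    # JWT segments are base64url encoded with the padding stripped.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def has_audience(claims: dict, audience: str = "atomio") -> bool:
    """Return True when the expected audience appears in the "aud" claim."""
    aud = claims.get("aud", [])
    if isinstance(aud, str):
        aud = [aud]
    return audience in aud
```

If atomio.security.audience has been changed from its default, pass that value as the audience argument.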
After this audience verification, Atomio will allow various operations depending on which authorities are present. The following authorities are recognised broadly:
- "SYND_READ" as an authority in the token to perform GET operations
- "SYND_WRITE" as an authority in the token to perform POST, PUT, or DELETE operations
- "API_READ" as an authority to read the /admin endpoints
- "API_WRITE" as an authority to write to the /admin endpoints (currently unused)
- "PERM_READ" as an authority to bypass any more fine detail feed/alias/entry read permissions
- "PERM_WRITE" as an authority to bypass any more fine detail feed/alias/entry write permissions
Subsequent to being authenticated with these authorities, specific feeds/aliases/entries may have additional, more specific read authorities against them, and these additional authorities may need to be provided in order to read them.
- if the specific feeds/aliases/entries have the 'atomio.security.anonymousReadTag' specified then no additional authentication is required and the read proceeds (assuming SYND_READ)
- if the "PERM_READ" authority is present then no further authorisation is required and the read proceeds
- if the feed/alias/entry has no permissions then fine detail read permissions are not required and the read proceeds.
- if there is an authority of the form "(AUDIENCE)(ROLE_)PERM_READ" and the (AUDIENCE) part is blank or matches the audience of Atomio, then "(Authority).read" is appended to the read authorities considered in the next step
- otherwise, the feed/alias/entry has additional permissions, and at least one of these must match the set of authorities provided in the token for the read to proceed.
PERM_WRITE and (AUDIENCE)(ROLE_)PERM_WRITE behave similarly to PERM_READ for the write capabilities.
atomio.security.hsts can probably be set to false where Atomio has a proxy server in front of it (which should be most deployments), since HSTS should then be the proxy's responsibility.
If security is enabled, Atomio will require an appropriately authorised bearer token when a feed, entry or artefact is requested. However, there are circumstances where it is convenient to have Atomio openly advertise the feeds that it has, and the entries in those feeds. Setting atomio.security.anonymousFeedRead to true enables this mode: GET requests to list all feeds or to get a specific feed's Atom XML are accepted without authorisation, while all other requests (such as downloading an artefact) still require authorisation as defined above.
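From a client's point of view, the difference is whether an Authorization header is sent. A minimal sketch follows, using only the standard library; the host name in the usage note is hypothetical and the function name is made up for this example.

```python
import urllib.request
from typing import Optional


def atomio_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, attaching a bearer token when one is supplied."""
    request = urllib.request.Request(url, method="GET")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

With atomio.security.anonymousFeedRead set to true, a feed could be read with `atomio_request("https://atomio.example.org/...")` and no token, whereas an artefact download would still need `atomio_request(url, token)`.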
By default, atomio.security.feedLevelSecurity and atomio.security.entryLevelSecurity will be true when atomio.security.enabled is set to true.
These can be used to control access to feeds and entries respectively, and can be used independently of each other.
Setting either of these to true will result in the application checking for a variation of PERM_READ/PERM_WRITE:
- The master (AUDIENCE)(ROLE_)PERM_READ/PERM_WRITE authority, or
- The feed- or entry-specific (AUDIENCE)(ROLE_)PERM_(permission)_READ for read access, or
- The feed- or entry-specific (AUDIENCE)(ROLE_)PERM_(permission)_WRITE for write access
where the specified permission matches permissions specified on the feed or entry to be accessed.
Authorisation auto discovery
If security is enabled (atomio.security.enabled=true), Atomio supports providing clients with authorisation discovery metadata at
- /.well-known/smart-configuration
- /.well-known/openid-configuration
If atomio.security.issuer-uri is set, Atomio will attempt to proxy the issuer's OpenID configuration at /.well-known/smart-configuration and /.well-known/openid-configuration. This is the preferred approach, requiring minimal configuration.
If atomio.security.issuer-uri cannot be used (e.g. it does not work with the authorisation server being used), or the proxying of the issuer's OpenID configuration does not work (e.g. the authorisation server does not support standard metadata locations), the following properties allow minimal manual configuration.
- atomio.security.smartConfiguration.authorisationEndpointUrl
- atomio.security.smartConfiguration.tokenEndpointUrl
- atomio.security.smartConfiguration.grantTypesSupported
for example
- atomio.security.smartConfiguration.authorisationEndpointUrl=https://my.auth.server/auth
- atomio.security.smartConfiguration.tokenEndpointUrl=https://my.auth.server/token
- atomio.security.smartConfiguration.grantTypesSupported=authorization_code,implicit,client_credentials,refresh_token
If any of these atomio.security.smartConfiguration properties are configured, they must all be configured. This set represents the minimal set for SMART on FHIR.
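A client can confirm that discovery is working by fetching the metadata and checking for the endpoints it needs. The sketch below assumes the three properties above surface under the SMART on FHIR field names authorization_endpoint, token_endpoint and grant_types_supported; that mapping is an assumption of this example rather than something stated here, and the function names are made up.

```python
import json
import urllib.request

# SMART on FHIR field names assumed to correspond to the three properties above.
REQUIRED_KEYS = ("authorization_endpoint", "token_endpoint", "grant_types_supported")


def smart_configuration_url(base_url: str) -> str:
    """Build the SMART discovery URL for a given Atomio base URL."""
    return base_url.rstrip("/") + "/.well-known/smart-configuration"


def missing_keys(document: dict) -> list[str]:
    """List the expected discovery fields that are absent from the document."""
    return [key for key in REQUIRED_KEYS if key not in document]


def fetch_smart_configuration(base_url: str, timeout: float = 10.0) -> dict:
    """Download and parse the discovery document."""
    with urllib.request.urlopen(smart_configuration_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

An empty list from `missing_keys(fetch_smart_configuration(base_url))` would indicate that the minimal set is being advertised.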
CORS
The following is the default configuration for CORS
- atomio.cors.allowedOriginPatterns=
- atomio.cors.allowedHeaders=X-Requested-With,Origin,Content-Type,Accept,Authorization,Access-Control-Allow-Headers
- atomio.cors.allowedMethods=PUT,POST,GET,DELETE,OPTIONS
- atomio.cors.exposeHeaders=Cache-Control,Content-Language,Content-Type,Expires,Last-Modified,Pragma
- atomio.cors.maxAge=600
Any of these properties can be overridden by redefining them with different values. atomio.cors.allowedOriginPatterns supports a list of patterns as defined here
Base URL
- atomio.base.url=
This setting controls the base URL of links generated in the Atom syndication format responses generated by the server. By default, this setting is blank, which signals to the server to generate the base URL from the request it receives, which is usually correct unless the server is behind a proxy.
Therefore, if a proxy is used in front of the server, this should be set to the base URL that clients will use for their requests.
Storage
- atomio.artefact.storage.path=/workspace/atomio/artefacts
As mentioned above, the server uses /workspace/atomio/artefacts inside the container to store its artefacts. This can be changed using this parameter, however it is more likely that this setting will be left as is and this location volume mounted into the container to some external storage location.
Download URL prefix whitelist
For security reasons, Atomio needs to be provided a whitelist of URL prefixes of acceptable locations to download content from. This is to prevent someone requesting Atomio clone a feed or entry with file or internal URLs (e.g. intranet) to gain access to private/internal content. Values passed in must be valid URLs and will be used to determine if a URL begins with one of these whitelisted prefixes before downloading content. Multiple URL prefixes can be provided in a comma separated list.
- atomio.client.urlWhitelist=http://foo.bar,https://another.url/some/limited/subpath
By default, Atomio does not whitelist any URLs for download, not even external URLs that point to itself.
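The prefix test described above can be pictured with the following sketch. It is a simplified illustration using plain string prefix matching, not Atomio's actual implementation, which may normalise or validate URLs differently; the function names are made up for this example.

```python
def parse_whitelist(value: str) -> list[str]:
    """Split a comma separated whitelist value into its URL prefixes."""
    return [prefix.strip() for prefix in value.split(",") if prefix.strip()]


def is_whitelisted(url: str, prefixes: list[str]) -> bool:
    """Return True when the URL begins with at least one whitelisted prefix."""
    return any(url.startswith(prefix) for prefix in prefixes)


whitelist = parse_whitelist("http://foo.bar,https://another.url/some/limited/subpath")
print(is_whitelisted("https://another.url/some/limited/subpath/feed.xml", whitelist))
print(is_whitelisted("file:///etc/passwd", whitelist))
```

Under plain prefix matching, a prefix such as http://foo.bar would also match http://foo.bar.example.org, so ending a host-only prefix with a trailing slash is the more cautious choice.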
Download timeouts
- atomio.artefact.download.connection.timeout=60000
- atomio.artefact.download.read.timeout=600000
These timeouts are used when the server is downloading remote artefacts when cloning another syndication feed. The settings are quite generous because usually the duration of this process is less important than success.
Sentry
Atomio can send error diagnostic messages to Sentry if you have an account. This is a useful way to monitor the deployed application for failures and to spot multiple occurrences of common failures.
Settings are
- sentry.dsn
- sentry.environment
- sentry.servername
Atomio will automatically set sentry.release to the version being run.
Querying entries
Users can query their Atomio instance using the following URL pattern: `/feed/query?_include=title=test,contentItemIdentifier=http://loinc.org&_exclude=category.name=myCategory`
This will return a single feed with entries matching the query provided.
Fields available to be queried are:
- category.name
- category.scheme
- contentItemIdentifier
- contentItemVersion
- fhirVersion
- published
- updated
- feeds
All fields except for published and updated match exactly.
Syntax for querying dates is of the form [gt|lt]yyyy-MM-dd:
# Get entries published before January 01 2025
/feed/query?_include=published=lt2025-01-01
/feed/query?_exclude=published=gt2025-01-01
# Exclude all entries published before January 01 2025
/feed/query?_include=published=gt2025-01-01
/feed/query?_exclude=published=lt2025-01-01
# Fetch entries between dates
/feed/query?_include=published=gt2025-01-01,published=lt2025-02-01
/feed/query?_exclude=published=lt2025-01-01,published=gt2025-02-01
A field can be specified more than once, and its values will be treated as an OR statement unless the field is a date, e.g.
/feed/query?_include=title=test,title=test2,title=test3
will return all entries with title matching test, test2, or test3.
The feeds parameter matches a feed's name, and filters the entries to those belonging to the specified feed.
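Query URLs of this shape can be assembled programmatically. The sketch below joins field=value pairs with commas and percent-encodes the result, on the assumption that the server decodes query parameters in the standard way; values that themselves contain commas are not handled. The host name and the function name are hypothetical.

```python
from urllib.parse import urlencode


def query_url(base_url: str, include=(), exclude=()) -> str:
    """Build a /feed/query URL from (field, value) pairs.

    include and exclude are iterables of (field, value) tuples, for
    example [("title", "test"), ("published", "gt2025-01-01")].
    """
    params = {}
    if include:
        params["_include"] = ",".join(f"{field}={value}" for field, value in include)
    if exclude:
        params["_exclude"] = ",".join(f"{field}={value}" for field, value in exclude)
    # urlencode percent-encodes "=", "," and ":" inside the parameter values.
    return base_url.rstrip("/") + "/feed/query?" + urlencode(params)


print(query_url(
    "https://atomio.example.org",
    include=[("title", "test"), ("contentItemIdentifier", "http://loinc.org")],
    exclude=[("category.name", "myCategory")],
))
```

Encoding the values avoids ambiguity when a value such as http://loinc.org contains characters that are significant in a URL.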