CASE provides an ontology to the community. The ontology is written using RDF as its base language, using the OWL2 vocabulary to define classes, properties and relationships. This RDF is serialized in the Turtle syntax.
CASE instance data is also written using RDF as its base language. Instance data is serialized in the JSON-LD syntax instead of Turtle, to support CASE producers and consumers that work day-to-day in JSON instead of graph engines. (Data can be converted between JSON-LD, Turtle, and other RDF formats with readily available tooling.)
This document provides CASE community guidance and practices for designing instance data.
RDF graphs have nodes, which are linkable data; literals, which are data that can be linked to but can only be annotated with a limited set of primitive types; and edges, which link nodes to either other nodes or to literals.
Nodes and edges are namespaced identifiers, typically seen in instance data in an abbreviated form, prefix:identifier. A context definition in the graph provides the expanded form of this abbreviation. For an instance data node identifier, the namespace typically represents a knowledge base; e.g., kb:identifier would be defined as expanding to http://example.org/kb/identifier. This can be seen in JSON-LD in a context dictionary:
To an RDF engine, that JSON-LD snippet is equivalent to:
RDF is a flexible language, providing many ways to represent the same data. The example instance data the CASE community provides follows a few conventions for the benefit of interoperability among CASE community members.
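The prefix expansion described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions: expand_compact_iri is a hypothetical helper, not part of any JSON-LD library, and it handles only prefix:suffix strings. Real consumers should rely on a conforming JSON-LD processor.

```python
import json

# Context mirroring the kb: convention used in CASE examples.
CONTEXT = {"@context": {"kb": "http://example.org/kb/"}}


def expand_compact_iri(compact: str, context: dict) -> str:
    """Expand a 'prefix:suffix' string using the prefixes in context['@context'].

    Hypothetical helper for illustration only: strings whose prefix is not
    defined in the context (such as absolute 'http:' IRIs) are returned as-is.
    """
    prefixes = context["@context"]
    prefix, sep, suffix = compact.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + suffix
    return compact


if __name__ == "__main__":
    print(json.dumps(CONTEXT))
    print(expand_compact_iri("kb:identifier", CONTEXT))
    # http://example.org/kb/identifier
```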
RDF identifiers must be IRIs. An IRI can follow one of several schemes, including URN, HTTP, or HTTPS. Which scheme to use is up to the consumer, but CASE examples use one particular prefix, http://example.org/kb/, for a few different reasons:
Early in example data drafting, CASE based its knowledge base URL on a special-case URN designated for use in examples, urn:example:. Unfortunately, a graph technology the community used was unable to handle the urn: scheme, specifically in JSON-LD. That technology has since released a fix, but we provide this historical note as a reminder to CASE producers to test their instance data against the use case scenarios of their expected consumers.
example.org is reserved in IETF RFC 6761, Section 6.5, for use in examples and documentation. Graph engines might include data retrieval capabilities for IRIs encountered in their graphs as users navigate data, but they are expected to be aware that processing http://example.org should not result in a network retrieval (whether in their own hard-coded logic or in lower-level DNS resolution). Note, though, that the relevant provisions of RFC 6761 are worded as SHOULD NOTs, with only DNS Registrars assigned a MUST NOT prohibition, on registration.
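A navigator implementing the expectation above could gate retrievals on the IRI's scheme and host. The sketch below is a hypothetical policy: should_dereference and is_example_host are illustrative names, not part of any CASE tooling. It treats the four example names covered by RFC 6761 Section 6.5 (and any names within them) as documentation-only, and never retrieves non-HTTP(S) IRIs such as urn: identifiers.

```python
from urllib.parse import urlsplit

# Example names covered by RFC 6761 Section 6.5 (plus any names within them).
EXAMPLE_DOMAINS = ("example", "example.com", "example.net", "example.org")


def is_example_host(iri: str) -> bool:
    """Return True if the IRI's host is, or falls under, an example name."""
    host = (urlsplit(iri).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in EXAMPLE_DOMAINS)


def should_dereference(iri: str) -> bool:
    """Hypothetical policy: may a navigator attempt a network retrieval?"""
    if urlsplit(iri).scheme.lower() not in ("http", "https"):
        return False  # e.g. urn: IRIs have no network location to retrieve
    return not is_example_host(iri)


if __name__ == "__main__":
    for iri in ("http://example.org/kb/node1", "urn:example:node1"):
        print(iri, should_dereference(iri))  # both print False
```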
Namespace prefixes in RDF typically end according to a "Hash or slash?" decision: should the identifier end with a # character, representing an HTML within-page anchor point, or with a / character, representing an independent page at the end of an IRI?
CASE examples end their knowledge base prefix with a slash character, based on the assumption that a knowledge base navigator might support two elementary types of clients: graph engines, which might make programmatic requests of the IRI, and web browsers, for users wanting to view HTTP renderings of the IRI. Because a browser does not send the fragment (the part after #) to the server, hash-style IRIs might create an expectation that a knowledge base serve a dump of all node identifiers to a web browser, relying on the browser to skip to the middle of the page.
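The difference can be seen by stripping fragments with Python's urllib.parse.urldefrag, which approximates what a browser actually requests from a server. The node IRIs below are hypothetical: with a slash-terminated prefix each node maps to its own retrievable document, while with a hash-terminated prefix every node maps to the same document.

```python
from urllib.parse import urldefrag

# Hypothetical node IRIs for two nodes, in each namespace style.
slash_iris = ["http://example.org/kb/node1", "http://example.org/kb/node2"]
hash_iris = ["http://example.org/kb#node1", "http://example.org/kb#node2"]


def requested_documents(iris):
    """Return the distinct documents a browser would request for these IRIs.

    The fragment (the part after '#') is resolved client-side and is not sent
    to the server, so it is stripped before comparing.
    """
    return sorted({urldefrag(iri).url for iri in iris})


if __name__ == "__main__":
    print(requested_documents(slash_iris))
    # ['http://example.org/kb/node1', 'http://example.org/kb/node2']
    print(requested_documents(hash_iris))
    # ['http://example.org/kb']
```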
Note that CASE and UCO ontology files follow the # pattern, because even the largest ontology files in CASE and UCO have a small memory footprint, on the order of kilobytes. In contrast, a knowledge base will likely reach millions of node identifiers early in its use for any case analysis.
CASE examples use the prefix kb: for instance data, e.g. kb:node1. Early in example data drafting, a blank prefix was used, e.g. :node1. This is allowed in most RDF syntaxes, but JSON-LD requires that prefixes not be empty (per JSON-LD 1.1, Section 9.1), because some JSON processing technologies cannot handle the empty string as a dictionary key.
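A producer converting Turtle prefix declarations into a JSON-LD context can catch the empty prefix early. This is a minimal sketch: jsonld_context is a hypothetical helper name, and it simply rejects an empty prefix so that it can be renamed (for instance to kb, as CASE examples do); it does not validate anything else about the context.

```python
import json


def jsonld_context(prefixes: dict) -> dict:
    """Build a JSON-LD @context from prefix -> namespace IRI mappings.

    JSON-LD 1.1 forbids empty terms, so an empty prefix (Turtle's ":node1"
    style) is rejected rather than copied into the context.
    """
    if "" in prefixes:
        raise ValueError("JSON-LD terms must not be empty strings; rename the prefix, e.g. to 'kb'")
    return {"@context": dict(prefixes)}


if __name__ == "__main__":
    # Mapping as it might be read from a Turtle "@prefix : <...> ." line.
    try:
        jsonld_context({"": "http://example.org/kb/"})
    except ValueError as err:
        print("rejected:", err)

    print(json.dumps(jsonld_context({"kb": "http://example.org/kb/"})))
    # {"@context": {"kb": "http://example.org/kb/"}}
```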
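RFC 4122 defines five versions of UUID:
Version 1: based on a timestamp and the generating host's node identifier (typically a MAC address).
Version 2: DCE Security, a variant of version 1 with embedded POSIX UIDs.
Version 3: name-based, using an MD5 hash of a namespace UUID and a name.
Version 4: randomly (or pseudo-randomly) generated.
Version 5: name-based, using a SHA-1 hash of a namespace UUID and a name.
RFC 9562, which obsoletes RFC 4122, adds versions 6, 7, and 8. Wikipedia provides a further description of UUID versions.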
An RDF triple consists of a subject, predicate, and object, where the subject is a node and the object may be a node or a literal value. For all non-blank nodes in CASE RDF graphs, a UUID should be generated as the trailing part of the identifier. (Blank nodes are nodes that do not have an explicit identifier.) CASE does not specify a UUID version for adopters, because the versions have different trade-offs. The recommendation for end users is to use v3 or v5 where repeatable identifiers are desired, i.e. where the same input should result in the same UUID. (This is helpful for, say, documentation examples serialized by software.) Version 4 is recommended for use cases where every node created should have a unique UUID.
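Python's standard uuid module can generate both kinds of identifier. In this sketch the http://example.org/kb/ prefix follows the CASE example convention, while the helper names and the choice of namespace UUID for version 5 are assumptions; any stable namespace UUID gives repeatable results.

```python
import uuid

KB_PREFIX = "http://example.org/kb/"

# Assumption: derive a stable namespace UUID from the kb prefix itself.
KB_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, KB_PREFIX)


def random_node_iri() -> str:
    """Version 4: a new random UUID on every call."""
    return KB_PREFIX + str(uuid.uuid4())


def repeatable_node_iri(name: str) -> str:
    """Version 5: the same name always yields the same UUID (SHA-1 based)."""
    return KB_PREFIX + str(uuid.uuid5(KB_NAMESPACE, name))


if __name__ == "__main__":
    print(random_node_iri())                      # differs on every run
    print(repeatable_node_iri("example-file-1"))  # identical on every run
```

uuid.uuid3 takes the same arguments as uuid.uuid5 but uses MD5; RFC 4122 notes that SHA-1 is preferred when backward compatibility is not an issue.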
The "name of CASE concept" portion of the identifier is some rendering of the @type of the node. This portion of the identifier comes from community members' experience working with graph data and UUID-based node identifiers. For instance, an analyst querying a CASE bundle with "What CASE objects are in this bundle?" could be presented with these results:
Or, the analyst could be presented with these results:
This practice is only meant to provide an informal hint to the type of the node, and carries no programmatically-derivable significance.
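A producer might render the type hint from the @type when minting a node. In this sketch the rendering rule (keep the local name after the last :, / or #), the helper names, and the use of uco-observable:File as an example type are all hypothetical. Consistent with the text, consumers still take the type from @type, never by parsing the @id.

```python
import re
import uuid


def concept_hint(type_iri: str) -> str:
    """Hypothetical rendering: keep the local name after the last ':', '/' or '#'."""
    return re.split(r"[:/#]", type_iri)[-1]


def new_node(type_iri: str) -> dict:
    """Create a JSON-LD node whose @id carries an informal hint of its @type."""
    return {
        "@id": f"kb:{concept_hint(type_iri)}-{uuid.uuid4()}",
        "@type": type_iri,
    }


if __name__ == "__main__":
    node = new_node("uco-observable:File")
    print(node["@id"])    # kb:File- followed by a random UUID
    print(node["@type"])  # uco-observable:File; always read the type from here
```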
Each @id will be accompanied by an @type dictionary key when a new node is created, while the @type key is typically absent when the @id is used as an object reference within a triple (a JSON-LD value). This is because the @type JSON dictionary key implements the RDF type designation, rdf:type, which describes the node itself, so stating it once where the node is defined suffices. E.g., this JSON-LD:
is semantically equivalent to this in Turtle:
Most Turtle serializations would present that with rdf:type shortened to the keyword a.
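The correspondence can be illustrated with a deliberately simplified converter. It is not a JSON-LD processor: to_turtle is a hypothetical helper that accepts only a single node dictionary whose values are either {"@id": ...} references or plain strings, it omits @prefix declarations, and the uco-core terms are used purely for illustration. It shows @type becoming Turtle's a, and object references rendered without any type of their own.

```python
import json


def to_turtle(node: dict) -> str:
    """Render one simplified JSON-LD node dictionary as a Turtle statement."""
    predicates = []
    if "@type" in node:
        types = node["@type"]
        if isinstance(types, str):
            types = [types]
        # "a" is Turtle's shorthand for rdf:type.
        predicates.append("a " + ", ".join(types))
    for key, value in node.items():
        if key.startswith("@"):
            continue  # @id and @type are handled separately
        if isinstance(value, dict) and "@id" in value:
            obj = value["@id"]  # reference to another node; no @type here
        else:
            # For plain ASCII text, json.dumps yields a double-quoted, escaped
            # string that is also a valid Turtle string literal.
            obj = json.dumps(value)
        predicates.append(f"{key} {obj}")
    return node["@id"] + "\n    " + " ;\n    ".join(predicates) + " ."


relationship = {
    "@id": "kb:relationship1",
    "@type": "uco-core:Relationship",
    "uco-core:source": {"@id": "kb:file1"},
    "uco-core:description": "Example relationship",
}

if __name__ == "__main__":
    print(to_turtle(relationship))
    # kb:relationship1
    #     a uco-core:Relationship ;
    #     uco-core:source kb:file1 ;
    #     uco-core:description "Example relationship" .
```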