Property Graph - RDF Experimental Converter

PREC is a library designed to enable interoperability between RDF graphs and property graphs.

This document mainly describes the PREC ontology: an ontology designed to describe the binding between the terms used in an RDF graph and the labels used in a property graph.

Introduction

PREC is composed of three modules:

To convert a Property Graph (PG) into an RDF graph, PREC performs a two-step conversion:

To define the transformations to apply, PREC uses a Context provided by the user in Turtle-Star format.

PREC-0: From PGs to a literal transcription in RDF graph

The PREC-0 module communicates with a PG engine through an existing API, for example Cypher, Gremlin or APOC, and extracts the data to build a PREC-0 RDF graph: an RDF graph that describes the Property Graph very literally.

Schema of the PREC-0 graphs

The current schema of PREC-0 graphs is as follows:

[Figure: The schema of PREC-0 graphs]

A partial SHACL Shapes Graph is available at https://github.com/BruJu/PREC/blob/master/docs/prec0shape.ttl. Every graph generated by PREC-0 (that is, every graph extracted by PREC without any context) should conform to this SHACL Shapes Graph.

Terms used in PREC-0 graphs

The PREC ontology introduces some new types for the graphs generated by PREC-0. Note that the pgo:Node and the pgo:Edge types are used in the graphs generated by PREC-0.

http://bruy.at/prec#CreatedVocabulary

Type of IRIs that have been created by PREC and that should be mapped to an actual ontology.

Types that are subtypes of [= prec:CreatedVocabulary =] are:

  • http://bruy.at/prec#CreatedPropertyKey
  • http://bruy.at/prec#CreatedNodeLabel
  • http://bruy.at/prec#CreatedEdgeLabel

http://bruy.at/prec#PropertyKey

Type of property keys.

http://bruy.at/prec#PropertyKeyValue

Type of property values. Can be seen as the counterpart of pgo:Property in the PREC-0 model.

http://bruy.at/prec#hasMetaProperties

States the (RDF) node to which the meta properties of a property are attached.
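
As an illustration of the terms above, a PREC-0 description of a node with a single property carrying one meta property might look like the following. This is a sketch under assumptions, not actual tool output: the `ex:` IRIs are made up, the `pgo:` namespace IRI is assumed, and the use of `rdfs:label` for key names and `rdf:value` for values is assumed rather than taken from this document.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix pgo:  <http://ii.uwb.edu.pl/pgo#> .            # assumed namespace IRI
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .                  # hypothetical namespace

# A PG node with one property: name = "Alice".
ex:node1 a pgo:Node ;
    ex:name _:nameValue .

# The property key, typed as prec:PropertyKey.
ex:name a prec:PropertyKey ;
    rdfs:label "name" .                                # assumed: key name kept as a label

# The property value node, typed as prec:PropertyKeyValue.
_:nameValue a prec:PropertyKeyValue ;
    rdf:value "Alice" ;                                # assumed: literal carried by rdf:value
    prec:hasMetaProperties _:nameMeta .

# The meta properties of the "name" property hang off a dedicated node.
_:nameMeta ex:source _:sourceValue .
ex:source a prec:PropertyKey ; rdfs:label "source" .
_:sourceValue a prec:PropertyKeyValue ; rdf:value "census" .
```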

Contexts

A context is a document, written in Turtle-star, that describes how to convert the different entities in the property graph into an idiomatic RDF graph.

Use of a context

The context can be applied automatically during the conversion of the Property Graph to RDF:

If you have already generated the PREC-0 graph, you can use the ApplyContext tool:

Rulesets

There are two different rulesets for contexts: PRSC and PREC-C. We recommend using the PRSC ruleset whenever possible, as it is easier to use.

Rules usually target some entities and place constraints on which entities they target. Targetable entities, i.e. first-class citizens, are:

The templating system

PREC uses a custom templating system. The triples that a rule produces are described by embedded triples, used as the objects of triples whose predicate is prec:produces.

When the triples are produced, some terms, usually in the pvar namespace, are replaced with terms that depend on the data from the property graph.

Each triple that is used as an object of prec:produces is called a template triple. The set of all template triples of a resource, i.e. for a given resource, the set of all objects of triples that have as the subject the resource and as the predicate prec:produces, is named a template graph.

This templating system allows the user to write triples that are very close to the ones that are produced. Moreover, because there are no variables, template graphs can be written in plain Turtle-star files.
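
To make the vocabulary concrete, here is a minimal sketch of a template graph made of two template triples. The resource name `ex:sampleTemplate`, the class `ex:Relationship` and the `ex:` namespace are made up for the illustration; the `pvar:` placeholders are the ones this document lists for edge templates.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix pvar: <http://bruy.at/prec-trans#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Each embedded triple used as an object of prec:produces is a template triple.
# Together, the two objects below form the template graph of ex:sampleTemplate.
ex:sampleTemplate
    prec:produces << pvar:source pvar:edgeIRI pvar:destination >> ,
                  << pvar:edge   rdf:type     ex:Relationship  >> .
```

When the template is applied, the `pvar:` terms are replaced with terms derived from the property graph, while `rdf:type` and `ex:Relationship` are copied as written.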

PRSC - Schema driven mapping

PRSC contexts transform all nodes and edges that share a common type by using a given RDF representation. The type of an element is defined by whether it is an edge or a node, its list of labels and its list of properties.

This kind of context assigns every node and edge to a schema, and the transformation from PG to RDF must be defined for every schema.

Writing PRSC rules

The valid IRIs for a PRSC rule are:

Placeholders for the template graph

Well-behaved PRSC contexts

PRSC contexts that follow the rules below are said to be well-behaved:

The characterization of a template triple can be considered as being equal to the triple where:

If an RDF graph is generated from a well-behaved PRSC context:

PREC-C - Low level mapping

Unlike PRSC contexts, which only have rules for nodes and edges, the PREC-C ruleset can target nodes, edges and properties.

Example of a context

Reserved IRIs

IRIs in the `prec` namespace are reserved for the ontology.

The default implementation reserves for itself the IRIs prefixed with `_` and `__`. Using them in a context therefore results in undefined behaviour.

General directives

http://bruy.at/prec#KeepProvenance and http://bruy.at/prec#flagState

By default, a typing triple is created for each node, edge, property label and property blank node, linking its IRI or blank node to the corresponding type.
If this flag is set to `false`, triples of the form `ex:node1 a pgo:Node` are deleted from the output graph.
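
Based on the two IRIs named in the heading above, turning the flag off would presumably be written as a single triple in the context. The exact form is an assumption inferred from the term names.

```turtle
@prefix prec: <http://bruy.at/prec#> .

# Assumed usage: remove typing triples such as `ex:node1 a pgo:Node`
# from the output graph.
prec:KeepProvenance prec:flagState false .
```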

Change the schema of the generated graph

[= PREC-0 =] generates an RDF Graph with a certain format. It is possible to change the way the properties and the edges are represented in the RDF graph.

By default:

[= PREC-Context =] is able to change the format used to represent the properties and the edges by using templates.
In PREC, we call a template the format wanted by the user.

http://bruy.at/prec#templatedBy

Predicate used to state that the components affected by the subject rule must be represented in the format described in the object.

Edge templates

[= PREC-Context =] uses a special IRI for the set of all edges and defines some base templates that are implicitly defined in every context.

Thanks to these IRIs and [= prec:templatedBy =], it is possible to transform the representation used for every edge from the standard RDF Reification to any template.

http://bruy.at/prec#Edges

[= prec:Edges =] is the domain of every edge. A template that is applied to this IRI will be applied to every edge.

http://bruy.at/prec#RDFReification

`prec:Edges prec:templatedBy prec:RDFReification .`
Edges must be represented by using the standard RDF Reification. This is the default behaviour.

http://bruy.at/prec#RdfStarUnique

`prec:Edges prec:templatedBy prec:RdfStarUnique .`
Edges are modeled as a triple that is added to the graph, and meta properties are added using RDF-star. This template will cause information loss if there are two edges with the same label between two nodes.

http://bruy.at/prec#RDFStarOccurrence

`prec:Edges prec:templatedBy prec:RDFStarOccurrence .`
Edges are modeled as an RDF-star occurrence: a blank node represents the occurrence and [= prec:occurrenceOf =] is used to link the node to the triple it is an occurrence of.

http://bruy.at/prec#occurrenceOf

This IRI is used in the generated graphs and should not be used in contexts as a "keyword".

Specifies the triple of which the edge is an occurrence. The semantics is identical to the one used in the latest RDF-star draft.

http://bruy.at/prec#SingletonProperty

`prec:Edges prec:templatedBy prec:SingletonProperty .`
Edges must be represented by using singleton properties.
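
To compare the templates described above, the sketch below shows the approximate shape each one gives to a single edge `knows` from `ex:alice` to `ex:bob`. The three shapes are alternatives, not one output graph. All `ex:` IRIs and the `pgo:` namespace IRI are assumptions, edge properties are simplified to a single `ex:since` triple, and the singleton property form is left out because this document does not spell out its vocabulary.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix pgo:  <http://ii.uwb.edu.pl/pgo#> .   # assumed namespace IRI
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:   <http://example.org/> .         # hypothetical namespace

# (1) prec:RDFReification (default): the edge is a node described with
#     rdf:subject / rdf:predicate / rdf:object.
ex:edge1 a pgo:Edge ;
    rdf:subject   ex:alice ;
    rdf:predicate ex:knows ;
    rdf:object    ex:bob ;
    ex:since      _:since1 .

# (2) prec:RdfStarUnique: the edge is an asserted triple, annotated with
#     RDF-star. Two "knows" edges between the same nodes would collapse
#     into this single triple, hence the information loss.
ex:alice ex:knows ex:bob .
<< ex:alice ex:knows ex:bob >> ex:since _:since2 .

# (3) prec:RDFStarOccurrence: a blank node stands for this occurrence of
#     the triple, so parallel edges remain distinguishable.
_:occurrence prec:occurrenceOf << ex:alice ex:knows ex:bob >> ;
    ex:since _:since3 .
```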

Property templates

The template used for properties can also be redefined.

http://bruy.at/prec#Properties

The domain of all properties. A triple that has [= prec:Properties =] as a subject is equivalent to three triples, one with [= prec:NodeProperties =] as the subject, another with [= prec:EdgeProperties =] as the subject and another with [= prec:MetaProperties =] as the subject.

http://bruy.at/prec#NodeProperties

The domain of all node properties. The properties on edges will not be affected by directives applied to this.

http://bruy.at/prec#EdgeProperties

The domain of all edge properties. The properties on nodes will not be affected by directives applied to this.

http://bruy.at/prec#MetaProperties

The domain of all meta properties (properties on properties).

http://bruy.at/prec#Prec0Property

Properties are modeled in the format presented in the [= prec:hasMetaProperties =] example. This is the default behaviour.

http://bruy.at/prec#CombinedBlankNodes

Properties are modeled in the same format as [= prec:Prec0Property =], but the `propertyValue` and the `metaProperty` nodes are merged.

http://bruy.at/prec#DirectTriples

Properties are modeled without any blank node (`:node :propertyKey :thePropertyValueLiteral`), and the meta properties are represented by using RDF-star.
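
By analogy with the edge directives, selecting one of these predefined property templates could be sketched as follows. Choosing a different template per domain is only an illustration.

```turtle
@prefix prec: <http://bruy.at/prec#> .

# Node properties become plain triples; meta properties use RDF-star.
prec:NodeProperties prec:templatedBy prec:DirectTriples .

# Edge properties keep the PREC-0 layout, with the value node and the
# meta property node merged.
prec:EdgeProperties prec:templatedBy prec:CombinedBlankNodes .
```

Writing `prec:Properties prec:templatedBy prec:DirectTriples .` instead would apply the same template to node, edge and meta properties alike.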

Writing your own templates

So far, this document described how to use predefined templates. It is also possible to write your own templates.

Templates use the `pvar` namespace (http://bruy.at/prec-trans#) as variables/placeholders. pvar can be seen as a way to write `?` in a Turtle file without actually using a real variable.

http://bruy.at/prec#produces

Used to state the list of RDF triples that compose the template. When the template is applied, every triple used in the default PREC-0 representation will be replaced with the triples that compose the template (after replacing the placeholders with their actual values).

http://bruy.at/prec#EdgeTemplate

The type of templates that can be used for edges.

`prec:EdgeTemplate`s use the following variables:

  • `pvar:edge`: The RDF node that was created to identify the edge.
  • `pvar:source`: The RDF node that represents the PG source node of the edge.
  • `pvar:destination`: The RDF node that represents the PG destination node of the edge.
  • `pvar:edgeIRI`: The RDF node that represents the label of the edge.
  • `pvar:label`: The label of the edge, as a string literal.
  • `prec:edgeIs`: specifies the term to which the properties of the edge are bound.

We are going to show how users can define their own template.
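
A minimal sketch of a user-defined edge template follows. The template name `ex:RelationshipNode` and the predicates `ex:from` and `ex:to` are invented for the example; only the `prec:` terms and `pvar:` placeholders come from this document.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix pvar: <http://bruy.at/prec-trans#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Every edge becomes a resource typed by its label IRI and linked to its
# two endpoints with dedicated predicates.
ex:RelationshipNode a prec:EdgeTemplate ;
    prec:produces << pvar:edge rdf:type pvar:edgeIRI     >> ,
                  << pvar:edge ex:from  pvar:source      >> ,
                  << pvar:edge ex:to    pvar:destination >> .

# Use this template for every edge of the property graph.
prec:Edges prec:templatedBy ex:RelationshipNode .
```

For each edge, `pvar:edge`, `pvar:edgeIRI`, `pvar:source` and `pvar:destination` are replaced with the terms computed for that edge.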

http://bruy.at/prec#PropertyTemplate

The type of templates that can be used for properties.

`prec:PropertyTemplate`s use the following variables:

  • `pvar:entity`: The RDF node that was created to identify the edge or the node.
  • `pvar:propertyKey`: The node that represents the property key.
  • `pvar:label`: The label of the property key, as a string literal.
  • `pvar:property`: The blank node that represents the property.
  • `pvar:propertyValue`: The literal that contains the property value.
  • `pvar:individualValue`: Equal to `propertyValue` most of the time. If `propertyValue` is an RDF list, `individualValue` will match each individual literal contained in the list instead of the list itself.
  • `pvar:metaPropertyNode`: The blank node that contains every meta property, as described in [= prec:hasMetaProperties =]
  • `prec:entityIs`: The term to which the meta properties must be bound.

The substitution mechanism is the same as described in the [= prec:EdgeTemplate =] section.

Some restrictions exist on property templates:

  • `pvar:entity` can only appear in the "subject-star" position. The `subject-star` position is defined recursively as either the subject of the quad if it is not a nested quad, or the "subject-star" of the quad in the subject position. In other words, when the Triple-star is written in the N-Triples-star format, it is the first non RDF-star term that appears for this triple.
  • Every embedded triple must be asserted by the template.

These restrictions exist to make it possible to combine an edge template with different (meta-)property templates. They will be relaxed in the future.
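
The two restrictions can be illustrated with a small property template. The names `ex:AnnotatedValue` and `ex:details` are invented, and whether the implementation accepts this exact template has not been checked here; it is a sketch.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix pvar: <http://bruy.at/prec-trans#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:AnnotatedValue a prec:PropertyTemplate ;
    # pvar:entity is the subject: a valid "subject-star" position.
    prec:produces << pvar:entity pvar:propertyKey pvar:propertyValue >> ,
    # pvar:entity is the subject of the embedded triple in subject position,
    # so it is still in "subject-star" position. The embedded triple is the
    # same as the template triple above, hence asserted by the template.
                  << << pvar:entity pvar:propertyKey pvar:propertyValue >>
                         ex:details pvar:metaPropertyNode >> .

prec:NodeProperties prec:templatedBy ex:AnnotatedValue .
```

A template triple such as `<< ex:holder ex:owns pvar:entity >>` would break the first restriction, because `pvar:entity` would appear in object position.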

Rules

Rules are the main way to modify a [= PREC-0 =] RDF graph. They let the user modify specific elements of the graphs by:

  • filtering some of the edges or properties
  • applying a specific template to these entities

As one can expect, there are two different kinds of rules: rules for edges ([= prec:EdgeRule =]) and rules for properties ([= prec:PropertyRule =]).

Edge rules

Edge rules aim to modify the way the property graph edges are materialized in the RDF Graph.

http://bruy.at/prec#EdgeRule

The type of edge rules. The [= PREC-Context =] engine currently requires the edge rules to be properly typed to be able to discover them.

http://bruy.at/prec#label

States that the edge must have the given label to match the rule. The value must be a literal.

http://bruy.at/prec#edgeIRI

States that the generated node for the edge must be replaced with the given IRI.

http://bruy.at/prec#sourceLabel

States that the edge source node must have the given label. The value must be a literal.

http://bruy.at/prec#destinationLabel

Same as [= prec:sourceLabel =], but for the destination node.

http://bruy.at/prec#IRIOfEdgeLabel

Shortcut to define an edge rule that matches edges with a specific label and maps the edge label to a specific IRI.

The two following contexts are equivalent:
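
One such pair of contexts can be sketched from the definitions of [= prec:EdgeRule =], [= prec:label =] and [= prec:edgeIRI =] above. The label `"KNOWS"` and the IRI `ex:knows` are made up, and the short form follows the usage pattern this document gives for [= prec:IRIOfNodeLabel =].

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Context A, short form: one triple.
ex:knows prec:IRIOfEdgeLabel "KNOWS" .

# Context B, long form: an explicitly typed edge rule.
[] a prec:EdgeRule ;
    prec:label   "KNOWS" ;
    prec:edgeIRI ex:knows .
```

Only one of the two forms is needed in an actual context; they are shown together for comparison.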


Specifying a template

Like [= prec:Edges =], it is possible to define a target template for edge rules by using the [= prec:templatedBy =] predicate. It is also possible to apply any [= prec:SubstitutionTerm =]s, and not only [= prec:edgeIRI =].
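
For example, an edge rule that combines the matching conditions described above with a target template could be sketched as follows. The labels and the `ex:` IRI are hypothetical.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Only "KNOWS" edges between two "Person" nodes are affected.
[] a prec:EdgeRule ;
    prec:label            "KNOWS" ;
    prec:sourceLabel      "Person" ;
    prec:destinationLabel "Person" ;
    prec:edgeIRI          ex:knows ;             # replaces pvar:edgeIRI
    prec:templatedBy      prec:RdfStarUnique .   # instead of the default reification
```

Edges that do not match this rule keep the representation chosen for [= prec:Edges =].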

Property rules

A property rule is a rule that can be applied to a property. In a similar fashion as edge rules, a property rule targets some properties, and can modify how they are materialized in the graph.

http://bruy.at/prec#PropertyRule

The type of property rules. The [= PREC-Context =] engine currently requires the property rules to be properly typed to be able to discover them.

http://bruy.at/prec#propertyKey

The key / the name of the matched properties. The value must be a literal.

http://bruy.at/prec#propertyIRI

States the IRI to use for this property name. Its value will be used instead of the generated node.

http://bruy.at/prec#label

States the label that the node or edge must have for the rule to apply. If there are several values for [= prec:label =], the node or edge must have all of these labels. The value must be a literal.

http://bruy.at/prec#onKind

The property rule can only be applied if the property is on a node (prec:Node as the object) or on an edge (prec:Edge as the object).
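
Putting these conditions together, a property rule restricted to one kind of holder might be sketched like this. The key, the label and the `ex:` IRI are hypothetical.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Map the key "name" to ex:fullName, but only for properties that are
# held by nodes labelled "Person".
[] a prec:PropertyRule ;
    prec:propertyKey "name" ;
    prec:label       "Person" ;
    prec:onKind      prec:Node ;
    prec:propertyIRI ex:fullName .
```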

http://bruy.at/prec#IRIOfProperty

Allows simple property rules to be declared with a single triple instead of three.

The two following rules are equivalent:
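
One such pair of rules can be sketched from the definitions of [= prec:PropertyRule =], [= prec:propertyKey =] and [= prec:propertyIRI =] above. The key `"name"` and the IRI `ex:name` are made up, and the subject/object order of the short form is assumed to mirror the usage shown for [= prec:IRIOfNodeLabel =].

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Short form: one triple.
ex:name prec:IRIOfProperty "name" .

# Long form: three triples (type, key and IRI).
[] a prec:PropertyRule ;
    prec:propertyKey "name" ;
    prec:propertyIRI ex:name .
```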


Specifying a template

Like [= prec:Properties =], [= prec:NodeProperties =] and [= prec:EdgeProperties =], it is possible to specify a template for a specific property rule thanks to [= prec:templatedBy =] and to use [= prec:SubstitutionTerm =]s other than [= prec:propertyIRI =].

http://bruy.at/prec#priority

By default, [= PREC-Context =] tries to apply rules from the most specific to the least specific. While this order is deterministic, the user may prefer another one. [= prec:priority =] makes it possible to change the order of the rules.

http://bruy.at/prec#IRIOfNodeLabel

Usage: ` :IRIToMapTo prec:IRIOfNodeLabel "Label" .`

The last kind of labels that can be remapped are node labels. Node label IRIs can be mapped with [= prec:IRIOfNodeLabel =], in a similar fashion as [= prec:IRIOfEdgeLabel =].

[= prec:IRIOfNodeLabel =] states that node labels equal to its object (`"Label"`) should be mapped to the IRI stated as the subject of the triple (`:IRIToMapTo`), instead of generating, or keeping, a generated blank node or IRI.

Complete node rules

Rules can also be written for node labels. Here is the vocabulary related to node labels:

  • http://bruy.at/prec#NodeLabelTemplate: Type of node templates. The variables that can be used are the following:
    • `pvar:node`: The blank node or the IRI that represents the PG node
    • `pvar:nodeLabelIRI`: The blank node or the IRI that represents the node label
    • `pvar:label`: The label as a string literal
  • http://bruy.at/prec#nodeLabelIRI: The [= prec:SubstitutionTerm =] for `pvar:nodeLabelIRI`
  • http://bruy.at/prec#NodeLabelsTypeOfLabelIRI: The PREC-0 representation of node labels. Composed of the triple `pvar:node rdf:type pvar:nodeLabelIRI`
  • http://bruy.at/prec#NodeLabelRule: The type of rules that apply to node / node label pairs.
  • http://bruy.at/prec#label (in node label rules): Condition on the label of the node. Mandatory.
  • http://bruy.at/prec#NodeLabels: The domain of all node labels, used to modify the template.
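
The vocabulary above could be combined as in the following sketch. The template name `ex:LabelAsCategory`, the predicate `ex:category` and the class `ex:Person` are hypothetical.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix pvar: <http://bruy.at/prec-trans#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# A custom node label template: labels are attached with ex:category
# instead of the default rdf:type.
ex:LabelAsCategory a prec:NodeLabelTemplate ;
    prec:produces << pvar:node ex:category pvar:nodeLabelIRI >> .

# Use it for every node label...
prec:NodeLabels prec:templatedBy ex:LabelAsCategory .

# ...and map the "Person" label to a chosen IRI.
[] a prec:NodeLabelRule ;
    prec:label        "Person" ;
    prec:nodeLabelIRI ex:Person .
```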

http://bruy.at/prec#mapBlankNodesToPrefix

The graph generated by [= PREC-0 =] uses blank nodes for everything that is not a label or a value. The user may prefer to have IRIs generated for the different parts of the vocabulary. For example, instead of generating a blank node for the label "Person", the user may prefer to get `http://example.org/Person`.

[= prec:mapBlankNodesToPrefix =] makes it possible to map

  • every node label if the subject is pgo:Node
  • every edge label if the subject is pgo:Edge
  • every property name if the subject is [= prec:PropertyKey =]
to IRIs that start with the prefix specified in the object position.

Substitutions

Most rules actually use alterations of templates. For example, in [= prec:EdgeRule =]s, most of the time the `pvar:edgeIRI` variable will be replaced by an IRI picked by the user thanks to [= prec:edgeIRI =].

[= prec:edgeIRI =] can modify the template because it is defined as a [= prec:SubstitutionTerm =] for `pvar:edgeIRI`: when used in a rule, [= prec:edgeIRI =] will look for every `pvar:edgeIRI` occurrence, and replace it with something else.

http://bruy.at/prec#SubstitutionTerm

The type of substitution terms. To be effective, a [= prec:SubstitutionTerm =] should have a value for [= prec:substitutionTarget =].

http://bruy.at/prec#substitutionTarget

States the term that is looked for in the template on which the substitution term applies.
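
In Turtle, the relationship between a substitution term and its target can be sketched as below. The first declaration restates how this document describes [= prec:edgeIRI =]; the second declares a hypothetical user-defined term `ex:roleIs` for a hypothetical placeholder `ex:rolePlaceholder`, assuming user-defined substitution terms are accepted.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix pvar: <http://bruy.at/prec-trans#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# How the predefined prec:edgeIRI is described: used in a rule, it
# replaces every occurrence of pvar:edgeIRI in the template.
prec:edgeIRI a prec:SubstitutionTerm ;
    prec:substitutionTarget pvar:edgeIRI .

# A user-defined term (assumption): in a rule, `ex:roleIs ex:actor`
# would replace ex:rolePlaceholder with ex:actor in the template.
ex:roleIs a prec:SubstitutionTerm ;
    prec:substitutionTarget ex:rolePlaceholder .
```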

Substitution terms defined in every context

The following substitution terms are defined for every context by [= PREC-Context =]:

  • `prec:subject`: Substitution term for `rdf:subject`.
  • `prec:predicate`: Substitution term for `rdf:predicate`.
  • `prec:object`: Substitution term for `rdf:object`.
  • `prec:edgeIRI`: Substitution term for `pvar:edgeIRI`. Expected to be used in most [= prec:EdgeRule =]s.
  • `prec:propertyIRI`: Substitution term for `pvar:propertyIRI`. Expected to be used in most [= prec:PropertyRule =]s.

`prec:subject`, `prec:predicate` and `prec:object` are defined to represent an edge as a fully fledged object. They let the user substitute more accurate terms for the "RDF grounded terms". This is inspired by http://www.bobdc.com/blog/reification-is-a-red-herring/.
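
As an illustration, an edge rule that turns a reified `WORKS_AT` edge into an employment record could be sketched as follows. All `ex:` terms and the label are hypothetical, and the output shape in the trailing comment is an expectation derived from the definitions above, not verified output.

```turtle
@prefix prec: <http://bruy.at/prec#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

[] a prec:EdgeRule ;
    prec:label       "WORKS_AT" ;
    prec:templatedBy prec:RDFReification ;
    prec:subject     ex:employee ;     # used in place of rdf:subject
    prec:predicate   rdf:type ;        # used in place of rdf:predicate
    prec:object      ex:employer ;     # used in place of rdf:object
    prec:edgeIRI     ex:Employment .   # used in place of pvar:edgeIRI

# Expected shape for an edge from ex:alice to ex:acme:
#   _:edge ex:employee ex:alice ;
#          rdf:type    ex:Employment ;
#          ex:employer ex:acme .
```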

Test infrastructure

Used in `./test/prec/*.ttl` files, which are used for unit tests. These files are inputs for `./z_prec.js`.

Each unit test contains:

Used IRIs