This document describes the syntax and guidelines for writing mapping files for Koral-Mapper. For general project information, installation, and API documentation, see README.md.
A mapping file defines a single mapping list with an ID, optional foundry/layer defaults, and a list of mapping rules:
id: mapping-list-id foundryA: source-foundry layerA: source-layer foundryB: target-foundry layerB: target-layer mappings: - "[pattern1] <> [replacement1]" - "[pattern2] <> [replacement2]"
Mapping files can also be embedded inside a main configuration file under the lists: key (see README.md for configuration file format).
Koral-Mapper supports two mapping types: annotation (the default) and corpus.
Annotation mapping rules rewrite koral:token / koral:term / koral:termGroup structures in query JSON and annotation spans in response snippets.
Each rule consists of two patterns separated by <>. The patterns can be:
[key], [layer=key], [foundry/*=key], [foundry/layer=key], or [foundry/layer=key:value][term1 & term2], [term1 | term2], or [term1 | (term2 & term3)]Example mapping file:
id: stts-upos desc: Mapping from STTS and Universal dependency Part-of-Speech foundryA: opennlp layerA: p foundryB: upos layerB: p mappings: - "[ADJA] <> [ADJ]" - "[ADJD] <> [ADJ & Variant=Short]" - "[ART] <> [DET & PronType=Art]" - "[PIAT] <> [DET & (PronType=Ind | PronType=Neg | PronType=Tot)]"
Koral-Mapper follows a strict precedence hierarchy when determining which foundry and layer values to use during mapping transformations:
Mapping rule foundry/layer (highest priority)
[opennlp/p=DT] has explicit foundry "opennlp" and layer "p"Passed overwrite foundry/layer (medium priority)
foundryA, foundryB, layerA, layerB)Mapping list foundry/layer (lowest priority)
Corpus mapping rules use key=value <> key=value syntax for rewriting koral:doc / koral:docGroup structures in the corpus/collection section of a KoralQuery request, and enriching fields arrays in responses.
mappings: - "textClass=novel <> genre=fiction"
The left side is "side A" and the right side is "side B". With dir=atob, the query matcher rewrites A-side matches to B-side replacements. With dir=btoa, the direction is reversed.
Rules can specify match operators and value types:
mappings: - "pubDate=2020:geq <> yearFrom=2020:geq" # match type (eq, ne, geq, leq, contains, excludes) - "pubDate=2020-01#date <> year=2020#string" # value type (string, regex, date) - "textClass=wissenschaft.*#regex <> genre=science" # regex matching
When a rule specifies a match type (e.g. :geq), it only matches nodes with that exact match type. When no match type is specified, the rule matches any match type and preserves the original.
Rules can use AND (&) and OR (|) groups on either side:
mappings: # Single field → AND group - "textClass=novel <> (genre=fiction & type=book)" # AND group → single field (matches AND docGroups via subset matching) - "genre=fiction <> (textClass=kultur & textClass=musik)" # OR group → single field (matches individual docs or OR docGroups) - "(genre=fiction | genre=novel) <> textClass=belletristik" # Complex: OR-of-AND on B-side - "Entertainment <> ((kultur & musik) | (kultur & film))"
Key points for groups:
fieldA / fieldBWhen fieldA / fieldB are set in the mapping list header, values without a key= prefix are shorthand. The field name is filled in from the header:
id: satek-wiki-dereko type: corpus fieldA: wikiCat fieldB: textClass mappings: # Equivalent to "wikiCat=Entertainment <> textClass=kultur" - "Entertainment <> kultur" # Groups work too - "Entertainment <> (kultur & musik)"
Corpus rules are applied iteratively: each rule is applied to the entire tree in order, and subsequent rules see the transformed result of all previous rules. This means multiple rules can transform successive AST states, just like the annotation matcher.
For each rule, the matcher tries matching at the current node first. If no match is found and the node is a koral:docGroup / koral:fieldGroup, the rule recurses into operands.
OR patterns like (a | b) match in two ways:
koral:doc / koral:field): An OR pattern matches if any operand matches the leaf. For example, the pattern (Entertainment | Culture) matches a single koral:doc with value Entertainment.koral:docGroup / koral:fieldGroup): Structural matching — the node must be an OR group with exactly the same operands (commutative, exact count).AND patterns like (a & b) use subset matching: the node must be an AND koral:docGroup / koral:fieldGroup containing at least all pattern operands. Extra operands beyond the pattern are preserved alongside the replacement.
For example, if the rule is genre=fiction <> (textClass=kultur & textClass=musik) and the input is AND(textClass=kultur, textClass=musik, pubDate=2020), the AND pattern matches (subset of 3 operands), and the result is AND(genre=fiction, pubDate=2020) — the replacement plus the preserved extra operand.
If all operands match (no extras), the group is replaced entirely by the replacement node.
For response field enrichment, the matching rules work as follows:
koral:field entries. OR group replacements are skipped because response fields are flat key/value entries and OR semantics (one-of) cannot be represented.Examples:
(a | b) <> (c & d) — when field a is in the response, both c and d are added.(a | b) <> (c | d) — when field a is in the response, nothing is added (OR replacement skipped).a <> (c & d) — when field a is in the response, both c and d are added.a <> c — when field a is in the response, c is added.(Supported @type aliases: koral:field for koral:doc, koral:fieldGroup for koral:docGroup).
Rules should be ordered from most specific to most general (by total leaf count across both sides, descending). Because rules are applied iteratively, more specific rules should appear first to transform the AST before more general rules get a chance to match. Generated mapping files typically contain complementary rule types such as:
koral:docGroup nodes (subset matching)Because rules are applied iteratively (each rule sees the result of previous rules), you can chain transformations:
mappings: - "textClass=novel <> genre=fiction" - "genre=fiction <> category=lit"
With this configuration and dir=atob, an input textClass=novel is first rewritten to genre=fiction by rule 1, then to category=lit by rule 2.
This also means that for bidirectional mappings, you often need complementary rules that handle decomposed groups:
mappings: # Forward: source category → OR-of-AND target categories (for AtoB) - "Entertainment <> ((kultur & musik) | (kultur & film))" # Reverse AND: multiple source categories ← AND group (for BtoA with AND input) - "(Entertainment | Culture) <> (kultur & film)"