Fix payload document: escape angle brackets in Markdown
Change-Id: Iaf59a5c074c707d909606bd08e76bfcd5f25dcde
diff --git a/misc/payloads.md b/misc/payloads.md
index 0327636..9842276 100644
--- a/misc/payloads.md
+++ b/misc/payloads.md
@@ -2,15 +2,15 @@
Apache Lucene supports payloads as arbitrary byte sequences to store information for terms specific to any token position. Krill uses payloads to store various information in a compact way. This document describes the payload information for index payloads (payloads stored in the index for different term concepts) and computed payloads (payloads created during the retrieval phase).
## Payload Type Identifier (PTI)
-Payloads (both indexed and computed) have a leading byte indicating the type of the payload sequence. This is necessary because the origin (i.e. the requested term) of a payload is lost during the retrieval phase. Payload type identifiers range between 0 and 255 and have the length of a byte (<b>). In case a term has no payload, no payload type identifier is stored.
+Payloads (both indexed and computed) have a leading byte indicating the type of the payload sequence. This is necessary because the origin (i.e. the requested term) of a payload is lost during the retrieval phase. Payload type identifiers range between 0 and 255 and have the length of a byte (\<b\>). In case a term has no payload, no payload type identifier is stored.
## Token-unique Identifier (TUI)
-Terms, elements and relations in the index may contain token-unique identifiers (TUI) to distinguish between lucene-terms starting at the same token position. TUIs are used for matching attributes to terms, elements and relations, and to refer to terms and elements from relations. TUIs have the length of a short (<s>).
+Terms, elements and relations in the index may contain token-unique identifiers (TUI) to distinguish between lucene-terms starting at the same token position. TUIs are used for matching attributes to terms, elements and relations, and to refer to terms and elements from relations. TUIs have the length of a short (\<s\>).
## Index Payloads
### Token position payloads
-A token always has a special character payload storing the start and end offset of the token. The special character is a reference symbol for this payload, which is an underscore followed by the corresponding token position. For example, the _1$<i>0<i>3 is the special character payload for the token in position 1 describing that the token ranges from 0 to 3. This offset information is stored in integer.
+A token always has a special character payload storing the start and end offset of the token. The special character is a reference symbol for this payload, which is an underscore followed by the corresponding token position. For example, the _1$\<i\>0\<i\>3 is the special character payload for the token in position 1 describing that the token ranges from 0 to 3. This offset information is stored in integer.
Token payloads are not retrieved via SpanQueries and therefore do not have a PTI.
### Term payloads
@@ -35,34 +35,34 @@
integer, whereas the TUI is stored in short, and the depth and certainty
information is stored as byte values. The stored data type for the end
element, the depth, the TUI and the certainty are written explicitly:
-<i> for integer (4 bytes), <s> for short (2 bytes), and <b> for byte
-(1 byte). For example:
+\<i\> for integer (4 bytes), \<s\> for short (2 bytes), and \<b\> for
+byte (1 byte). For example:
<>:s$<b>64<i>0<i>38<i>7<b>0
-means that element <s> starts from character offset position 0 and
+means that element \<s\> starts from character offset position 0 and
ends to character offset position 38. The element ends at token
position 7 which is stored in integer. It is a root element or no
further information on a tree level is given (depth=0).
<>:s$<b>64<i>0<i>38<i>7<b>0<s>1
-means <s> has an additional TUI.
+means \<s\> has an additional TUI.
<>:s$<b>64<i>0<i>38<i>7<b>0<b>166
-means <s> has an additional certainty value.
+means \<s\> has an additional certainty value.
<>:s$<b>64<i>0<i>38<i>7<b>0<s>1<b>166
-means <s> has an additional TUI and a certainty value.
+means \<s\> has an additional TUI and a certainty value.
Elements may also be empty - meaning they behave as milestones.
In that case, character offsets are only given once.
<>:s$<b>65<i>38<b>0
-means <s> is a milestone at position 38 in root.
+means \<s\> is a milestone at position 38 in root.
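
The element examples above can be read mechanically. The following Python sketch is not part of Krill; it only parses the textual notation used in this document. The helper names (`parse_notation`, `pack_fields`, `decode_element`) and the field names are hypothetical, and the big-endian, unsigned-byte packing is an assumption of this sketch rather than something the document states.

```python
import re
import struct

# Type markers from the notation, mapped to struct codes.
# Assumption: big-endian layout, bytes are unsigned (166 would not fit a signed byte).
STRUCT_CODES = {"b": "B", "s": "h", "i": "i"}


def parse_notation(payload):
    """Split '<b>64<i>0...' into a list of (type marker, value) pairs."""
    return [(kind, int(value))
            for kind, value in re.findall(r"<([bsi])>(-?\d+)", payload)]


def pack_fields(fields):
    """Pack parsed fields into bytes (layout is an assumption, see above)."""
    fmt = ">" + "".join(STRUCT_CODES[kind] for kind, _ in fields)
    return struct.pack(fmt, *(value for _, value in fields))


def decode_element(text):
    """Name the fields of an element payload such as '<>:s$<b>64<i>0<i>38<i>7<b>0'."""
    _, _, payload = text.partition("$")
    fields = parse_notation(payload)
    pti = fields[0][1]
    if pti == 64:    # element with start/end offsets
        names = ["start_offset", "end_offset", "end_position", "depth"]
    elif pti == 65:  # milestone: the character offset is given only once
        names = ["offset", "depth"]
    else:
        raise ValueError(f"not an element PTI covered by this sketch: {pti}")
    element = {"pti": pti}
    for name, (_, value) in zip(names, fields[1:]):
        element[name] = value
    # Optional trailing fields: a short is the TUI, a byte is the certainty.
    for kind, value in fields[1 + len(names):]:
        element["tui" if kind == "s" else "certainty"] = value
    return element


if __name__ == "__main__":
    print(decode_element("<>:s$<b>64<i>0<i>38<i>7<b>0<s>1<b>166"))
```

Under these assumptions the full example (PTI, two offsets, end position, depth, TUI, certainty) packs to 1 + 4 + 4 + 4 + 1 + 2 + 1 = 17 bytes.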
*PTIs* (It’s an element payload if the second bit is set):
64. Element (with optional TUI and certainty)
@@ -78,8 +78,8 @@
Each relation comprises two parts: a left part and a right part.
The positions of a relation instance always refer to the positions
of the left part, that are:
-* the source token/span positions for > relation
-* the target token/span positions for < relation.
+* the source token/span positions for \> relation
+* the target token/span positions for \< relation.
Relation payloads are varied based on the types of their left and
right parts, which again can be either a source or a target of the
@@ -103,7 +103,7 @@
relation TUI, 1 short for the left-part TUI, and 1 short for
right-part TUI. For example:
- >:dependency$<b>32<i>3<s>3<s>5<s>4
+ \>:dependency$<b>32<i>3<s>3<s>5<s>4
has a token as the right part at (end) position 3, the relation
TUI 3, the source TUI 5 and the target TUI 4.
@@ -112,7 +112,7 @@
has 1 integer for the start position of the right part, 1 integer
for the end position of the right part, and 3 TUIs as above.
- >:dependency$<b>33<i>1<i>3<s>3<s>5<s>4
+ \>:dependency$<b>33<i>1<i>3<s>3<s>5<s>4
means the right part starts at token position 1 and ends at token
position 3.
@@ -122,7 +122,7 @@
to differentiate payload length, 1 integer for end position of the
right part, and 3 TUIs as above.
- >:dependency$<b>34<i>2<b>0<i>3<s>3<s>5<s>4
+ \>:dependency$<b>34<i>2<b>0<i>3<s>3<s>5<s>4
means the left part ends at token position 2, and right part is a
term ending at position 3.
@@ -132,20 +132,20 @@
start position of the right part, 1 integer for end position of the
right part, and 3 TUIs as above.
- >:dependency$<b>35<i>2<i>3<i>4<s>3<s>5<s>4
+ \>:dependency$<b>35<i>2<i>3<i>4<s>3<s>5<s>4
means the left part ends at token position 2, the right part is an
element starting at position 3 and ending at position 4.
*PTIs* (it’s a relation payload if the third bit is set):
-32. >, term to term (with optional TUI and certainty)
-33. >, term to element (with optional TUI and certainty)
-34. >, element to term (with optional TUI and certainty)
-35. >, element to element (with optional TUI and certainty)
-40. <, term to term (with optional TUI and certainty)
-41. <, term to element (with optional TUI and certainty)
-42. <, element to term (with optional TUI and certainty)
-43. <, element to element (with optional TUI and certainty)
+32. \>, term to term (with optional TUI and certainty)
+33. \>, term to element (with optional TUI and certainty)
+34. \>, element to term (with optional TUI and certainty)
+35. \>, element to element (with optional TUI and certainty)
+40. \<, term to term (with optional TUI and certainty)
+41. \<, term to element (with optional TUI and certainty)
+42. \<, element to term (with optional TUI and certainty)
+43. \<, element to element (with optional TUI and certainty)
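
The four relation layouts described above can be summarized in a small lookup table. This Python sketch is illustrative only and not taken from Krill: `decode_relation`, `LAYOUTS` and the field names are hypothetical. It covers the full form shown in the examples (three TUIs, no certainty). Reading the layout from the two lowest PTI bits and the direction from the fourth-lowest bit matches the PTI list for 32 to 35 and 40 to 43, but treating 40 to 43 as having the same field layout as 32 to 35 is an assumption.

```python
import re

# Fields that follow the PTI byte, keyed by the two lowest PTI bits.
LAYOUTS = {
    0: ["right_end"],                              # term to term
    1: ["right_start", "right_end"],               # term to element
    2: ["left_end", "filler", "right_end"],        # element to term
    3: ["left_end", "right_start", "right_end"],   # element to element
}
TUI_FIELDS = ["relation_tui", "left_tui", "right_tui"]


def decode_relation(text):
    """Name the fields of a relation payload written in the notation above."""
    _, _, payload = text.strip().partition("$")
    values = [int(v) for v in re.findall(r"<[bsi]>(-?\d+)", payload)]
    pti = values[0]
    if not pti & 0b00100000:  # relation payloads have the value-32 bit set
        raise ValueError(f"not a relation PTI: {pti}")
    names = LAYOUTS[pti & 0b11] + TUI_FIELDS
    if len(values) - 1 != len(names):
        raise ValueError("this sketch only handles the full form with three TUIs")
    relation = dict(zip(names, values[1:]))
    relation.pop("filler", None)  # the zero byte only differentiates payload length
    relation["pti"] = pti
    # 32-35 are '>' relations, 40-43 are '<' relations.
    relation["direction"] = "<" if pti & 0b1000 else ">"
    return relation


if __name__ == "__main__":
    print(decode_relation(">:dependency$<b>35<i>2<i>3<i>4<s>3<s>5<s>4"))
```

For a `>` relation the left part is the source, so `left_tui` corresponds to the source TUI and `right_tui` to the target TUI in the examples.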
### Attribute payloads
Each attribute has two payloads: