commit | 7c857800d9d8f0c691761338d3e5cbf8291b8583 | |
---|---|---|
author | Akron <nils@diewald-online.de> | Tue Mar 29 10:13:12 2022 +0200 |
committer | Akron <nils@diewald-online.de> | Tue Mar 29 10:13:12 2022 +0200 |
tree | c650c58015b048c7cc1f34bba1479ed7d10ac340 | |
parent | f06f7fa3ce2d4b9ebc9f5aebb3d0554a8589d23c | |
Some more notes regarding data conversion

Change-Id: I52b50d2149885d8a5acfc8e8cd79b2165ed244dc
Install Docker and Docker Compose.
To download, initialize, and run KorAP with a given index directory (in this example myindex in the local directory), run
$ INDEX=./myindex docker-compose up
The frontend will then be available at localhost:64543.
Depending on its format, the corpus data may first have to be converted before it can be indexed. For a conversion from the TEI P5/I5 format, the required tools are already installed by the command above.
In the following, we assume that an I5 file mycorpus.i5.xml is located in the local directory.
The following command converts the I5 file into a KorAP-XML archive (mycorpus.zip) using tei2korapxml:
$ docker run --rm -v ${PWD}:/data korap/kalamar tei2korapxml --input /data/mycorpus.i5.xml > mycorpus.zip
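If the local directory contains several I5 files, the tei2korapxml command can simply be run once per file. The following is a minimal sketch, assuming every input follows the <name>.i5.xml naming pattern and should produce <name>.zip; the function name convert_all_i5 and the DRY_RUN switch are hypothetical, and only the docker command itself is taken from this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every *.i5.xml file in the current directory
# into a KorAP-XML zip archive, using the tei2korapxml command shown above.
# Assumption: output names are derived by stripping the ".i5.xml" suffix.
# Set DRY_RUN=1 to print the commands instead of running them.
convert_all_i5() {
  local f base
  for f in *.i5.xml; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    base="${f%.i5.xml}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker run --rm -v ${PWD}:/data korap/kalamar tei2korapxml --input /data/$f > $base.zip"
    else
      docker run --rm -v "${PWD}:/data" korap/kalamar \
        tei2korapxml --input "/data/$f" > "$base.zip"
    fi
  done
}
```

Running DRY_RUN=1 convert_all_i5 first lists the planned conversions without starting any containers; calling convert_all_i5 without DRY_RUN executes them one after another.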
In a second step, the KorAP-XML archive can be converted into individual Krill JSON files using korapxml2krill:
$ docker run --rm -u root \
    -v ${PWD}/:/kalamar/data/ korap/kalamar korapxml2krill archive \
    -z -i /kalamar/data/mycorpus.zip -o ./data/
Depending on the structure of the source data, different parameters have to be specified for the conversion.
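The two conversion steps can be chained for a single corpus. This is a minimal sketch, assuming the file naming used in this section (mycorpus.i5.xml becomes mycorpus.zip); the function i5_to_krill and its DRY_RUN switch are hypothetical, both docker commands are copied from above, and any corpus-specific korapxml2krill parameters would still have to be added to the second command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run both conversion steps (I5 -> KorAP-XML -> Krill JSON)
# for one corpus, e.g. `i5_to_krill mycorpus` for mycorpus.i5.xml.
# Set DRY_RUN=1 to only print the planned steps.
i5_to_krill() {
  local corpus="${1:-}"
  if [ -z "$corpus" ]; then
    echo "usage: i5_to_krill <corpus-name>" >&2
    return 2
  fi
  if [ ! -f "$corpus.i5.xml" ]; then
    echo "missing input: $corpus.i5.xml" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "step 1: tei2korapxml $corpus.i5.xml -> $corpus.zip"
    echo "step 2: korapxml2krill archive $corpus.zip -> ./data/"
    return 0
  fi
  # Step 1: I5 -> KorAP-XML archive; remove a partial archive on failure.
  docker run --rm -v "${PWD}:/data" korap/kalamar \
    tei2korapxml --input "/data/$corpus.i5.xml" > "$corpus.zip" \
    || { rm -f "$corpus.zip"; return 1; }
  # Step 2: KorAP-XML archive -> Krill JSON, same options as the command above.
  docker run --rm -u root \
    -v "${PWD}/:/kalamar/data/" korap/kalamar korapxml2krill archive \
    -z -i "/kalamar/data/$corpus.zip" -o ./data/
}
```

Stopping after a failed first step avoids passing an empty or truncated archive to korapxml2krill.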