KorAP consists of several components, but they can easily be installed together using Docker. This repository contains a recipe to install all components needed to run KorAP on a local machine with a single command.
In addition, all relevant tools are installed and made available that are necessary for data conversion and indexing of corpora in the widely used TEI-P5 (I5) format for KorAP. For different options of the tools we refer to the respective repositories.
Install docker and docker compose.
To download, intialize and run KorAP pointing to a certain directory index (in this example index
in the local directory), run
$ INDEX=./index docker-compose up
This will make the frontend be available at localhost:64543
.
Depending on the corpus data to be indexed, it must first be converted. In the case of a conversion from TEI P5 (I5) format, the tools required for this have already been installed with the command above.
In the following we take the Dortmunder Chatkorpus 2.2 as an example to build an index.
The file is located at example/dck-part1.i5.xml
.
The command ...
$ docker run --rm \ -v ${PWD}/example/:/data korap/kalamar:latest-conv \ tei2korapxml \ --inline-tokens '!cmc#morpho' \ --input /data/dck-part1.i5.xml > dck.zip
... will convert the i5 file into a KorAP-XML file using tei2korapxml.
To convert the KorAP-XML archive in a second step into individual Krill JSON files, the following command ...
$ mkdir json $ docker run --rm -u root \ -v ${PWD}/:/kalamar/data/ korap/kalamar:latest-conv\ korapxml2krill archive \ -z \ -i /kalamar/data/dck.zip \ --jobs -1 \ --token 'cmc#morpho' \ --base-paragraphs 'DeReKo#Structure' \ --base-sentences 'DeReKo#Structure' \ -o ./data/json/
... will use korapxml2krill.
Depending on how the source data is designed, different parameters must be specified for the conversion.
Krill's indexer tool can now be used to index the json files:
$ mkdir index $ docker run -u root --rm -v ${PWD}/:/data/ korap/kustvakt \ Krill-Indexer.jar -c kustvakt-lite.conf \ -i /data/json -o /data/index/
After that, the index can be loaded with the aforementioned call and is searchable via the browser.