| commit | c1c3083ad6b7673a8d4bb1324cdf476332611898 | |
|---|---|---|
| author | Nicolas Arnold <nicolas.arnold@swhk.ids-mannheim.de> | Wed Jun 19 12:22:39 2024 +0200 |
| committer | Nicolas Arnold <nicolas.arnold@swhk.ids-mannheim.de> | Wed Jun 19 12:31:07 2024 +0200 |
| tree | 3aefcc626b78da175351e588fa76c826ace6200e | |
| parent | 7bb2b6f16e33f6a2459a5016782e9f8c2e36d9fa | |
FIX: Handle EPUB with .htm files as well

Fixes #30
make -j $(nproc) target/dnb18.i5.xml SRC_DIR=test/resources/DNB YEARS=18
make -j $(nproc) i5 SRC_DIR=test/resources/DNB
Prerequisite: KorAP-XML-CoNLL-U
make -j $(nproc) target/dnb18.zip SRC_DIR=test/resources/DNB YEARS=18
make -j $(nproc) index
The index will be in target/dnb.index.
Adjust the following line in your korap4dnb-compose.yml to point to your index (it is in target/dnb.index by default, but it is better to copy it to a safe place):
- "${PWD}/target/dnb.index:/kustvakt/index:z"
and start the Docker containers:
docker compose -p korap4dnb --profile=lite -f korap4dnb-compose.yml up -d
docker compose -p korap4dnb down
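Because target/dnb.index can be overwritten by a later build, one way to follow the advice about a safe place is to copy the index out of the build tree before mounting it. The following is only a minimal sketch: the function name `backup_index` and the dated destination layout are assumptions for illustration, not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): copy a built index
# directory to a dated location outside the build tree.
# Usage: backup_index SRC_INDEX_DIR DEST_PARENT_DIR
backup_index() {
    src=$1
    dest_parent=$2
    if [ ! -d "$src" ]; then
        echo "no index found at $src" >&2
        return 1
    fi
    dest="$dest_parent/dnb.index.$(date +%Y-%m-%d)"
    if [ -e "$dest" ]; then
        # Refuse to overwrite or nest into an existing copy
        echo "$dest already exists" >&2
        return 1
    fi
    mkdir -p "$dest_parent" || return 1
    # -R copies recursively, -p preserves timestamps and permissions
    cp -Rp "$src" "$dest" || return 1
    # Print the new location so it can be used in the compose file
    echo "$dest"
}
```

Calling, for example, `backup_index target/dnb.index "$HOME/korap-indexes"` prints the path of the copy, which could then replace `${PWD}/target/dnb.index` in the volume line of korap4dnb-compose.yml.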
Install the prerequisite korap/conllu2treetagger and korap/conllu2spacy Docker images if they are not present:
docker image inspect korap/conllu2treetagger:latest || curl -Ls 'https://gitlab.ids-mannheim.de/KorAP/CoNLL-U-Treetagger/-/jobs/artifacts/master/raw/conllu2treetagger.xz?job=build-docker-image' | docker load
docker image inspect korap/conllu2spacy:latest || curl -Ls https://corpora.ids-mannheim.de/tools/conllu2spacy.tar.xz | docker load
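The check-then-load pattern used for both images can be wrapped in a small function so that the inspect output stays quiet and HTTP errors are not piped into `docker load`. This is a sketch under assumptions: the name `ensure_image` is hypothetical, and the added `-f` flag (curl's `--fail`) is a suggestion rather than what the repository uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): load a Docker image
# from a downloadable archive only if it is not already present locally.
# Usage: ensure_image IMAGE_NAME ARCHIVE_URL
ensure_image() {
    image=$1
    url=$2
    # "docker image inspect" exits non-zero when the image is unknown
    if docker image inspect "$image" >/dev/null 2>&1; then
        echo "$image already present"
        return 0
    fi
    echo "loading $image from $url"
    # -f makes curl fail on HTTP errors instead of emitting an error page
    curl -fLs "$url" | docker load
}
```

With this in place, the two commands above would correspond to `ensure_image korap/conllu2treetagger:latest URL` and `ensure_image korap/conllu2spacy:latest URL`, using the same URLs as before.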
Make annotations for dnb20:
make -j $(nproc) target/dnb20.marmot-malt.zip target/dnb20.spacy.zip target/dnb20.tree_tagger.zip
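After a parallel build it can be useful to confirm that all three annotation archives were actually produced. The sketch below lists any that are missing or empty; the function name `missing_annotations` is hypothetical, and the file naming is assumed to follow the make targets shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print every expected
# annotation archive that is missing or empty; return non-zero if any is.
# Usage: missing_annotations TARGET_DIR CORPUS
missing_annotations() {
    dir=$1
    corpus=$2
    status=0
    for layer in marmot-malt spacy tree_tagger; do
        f="$dir/$corpus.$layer.zip"
        # -s is true only for files that exist and are larger than 0 bytes
        if [ ! -s "$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return $status
}
```

For example, `missing_annotations target dnb20` prints nothing and returns 0 when target/dnb20.marmot-malt.zip, target/dnb20.spacy.zip and target/dnb20.tree_tagger.zip all exist and are non-empty.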
Build all KorAP targets, up to the deployable index:
make -j $(nproc) all
2024-05-26
2024-05-08: idno elements with all IDs given by the DNB SRU API
2024-04-19
2024-04-15
2024-04-10: make YY=22 to select 2022
2024-03-24
2024-03-18: make deploy to install new index and restart local KorAP@DNB instance (also available as ci target); show-server-logs and show-server-status make targets to monitor the local KorAP@DNB instance
2024-03-17: make all to build all targets, including the index
2024-03-16
2024-03-15: DNB test data added
2024-03-08: example EPub and I5 added from DeReKo KJL corpus: Christiane F. ; Kai Hermann ; Horst Rieck: Wir Kinder vom Bahnhof Zoo in the folder test/resources/ – do not distribute (copyrighted data)