commit | f4d6add8ef718b1665438a8017c73dc1cecfb7fd | |
---|---|---|
author | Akron <nils@diewald-online.de> | Thu Nov 14 15:36:14 2024 +0100 |
committer | Akron <nils@diewald-online.de> | Thu Nov 14 15:36:14 2024 +0100 |
tree | c296adbe5d81bb7b4095e47655eba51cdb84aeda | |
parent | e85a372d8706f1906baff9d2c5eec3974a348b8a | |
Don't stop indexing on parsing errors
Change-Id: I81e188c9aeb9d7ba89d818e2551b38306f2c5278
Kalamar-Plugin-ExternalResources is a web service that integrates into the plugin framework of Kalamar. It links texts by their text sigle to external data providers, mainly for full-text access.
Kalamar-Plugin-ExternalResources is meant to be a basic plugin that demonstrates and evaluates the plugin capabilities of Kalamar.
This software is in an early development stage. Its behaviour may change without warning!
Building requires Go 1.19 or later.
To build the latest version of Kalamar-Plugin-ExternalResources, run:
    git clone https://github.com/KorAP/Kalamar-Plugin-ExternalResources
    cd Kalamar-Plugin-ExternalResources
    go test .
    go build .
The binary can be started without prerequisites. The templates folder has to be kept in the root directory. A db folder contains the database.
Registration of the plugin in Kalamar is not yet officially supported, but it works by passing the JSON blob generated at /plugin.json to the plugin registration handler.
To index further data, store the mappings in a CSV file, for example:
    "WPD11/A00/00001","Wikipedia","http://de.wikipedia.org/wiki/Alan_Smithee"
    "WPD11/A00/00003","Wikipedia","http://de.wikipedia.org/wiki/Actinium"
    "WPD11/A00/00005","Wikipedia","http://de.wikipedia.org/wiki/Ang_Lee"
The first column is the textSigle (the document identifier), the second is the provider name, and the third is the URL. These files can also be gzipped. Then run the indexing with:
    ./Kalamar-Plugin-ExternalResources data.csv
The following settings can be provided either as environment variables or via a .env file in the calling directory.
- KORAP_SERVER: The server URL of the hosting service.
- KORAP_EXTERNAL_RESOURCES_PORT: The port the service should listen on.
- KORAP_EXTERNAL_RESOURCES: The exposed URL under which the service is hosted.

Currently no official Docker image is provided. To build an image based on the provided Dockerfile, run:
    docker build \
      -f Dockerfile \
      -t korap/kalamar-plugin-externalresources .
To create a container on Linux based on the image with a mounted database in db and a configuration file, run:
    docker run \
      --rm \
      --network host \
      -v ${PWD}/db/:/db/:z \
      -v ${PWD}/.env:/.env \
      korap/kalamar-plugin-externalresources
To index using Docker, run:
    docker run \
      --rm \
      --network host \
      -v ${PWD}/db/:/db/:z \
      -v ${PWD}/data/:/data \
      -v ${PWD}/.env:/.env \
      korap/kalamar-plugin-externalresources \
      data/data.csv.gz
Copyright (c) 2023-2024, IDS Mannheim, Germany
Author: Nils Diewald
Kalamar-Plugin-ExternalResources is developed as part of the KorAP Corpus Analysis Platform at the Leibniz Institute for the German Language (IDS).
Kalamar-Plugin-ExternalResources is published under the BSD 2-Clause License.