commit | 89f71388adb7576839b1a3fe9770e5757dc040ad | |
---|---|---|
author | Rainer Perkuhn <perkuhn@ids-mannheim.de> | Wed Dec 06 15:19:15 2023 +0100 |
committer | Rainer Perkuhn <perkuhn@ids-mannheim.de> | Wed Dec 06 15:19:15 2023 +0100 |
tree | 30256c9f577dc32bd3150c38a805995aa73eb0e7 | |
parent | 28bbc1c3cc9c944ffb0ee5570a5b8fd012583989 | |
Fixed RE for excluding Romanze (without relying on $ as EOL)

Proved as correct:

```r
corpusStats(kco, "textType=/.*[Rr]oman.*/")
corpusStats(kco, "textType=/.*[Rr]omanz.*/")
corpusStats(kco, "textType=/.*[Rr]oman([^z].*|$)/")
corpusStats(kco, "textType=/.*[Rr]oman([^z].*)?/")
```

```
Getting size of virtual corpus "textType=/.*[Rr]oman.*/": 35793450 tokens
Getting size of virtual corpus "textType=/.*[Rr]omanz.*/": 25404 tokens
Getting size of virtual corpus "textType=/.*[Rr]oman([^z].*|$)/": 9398408 tokens
Getting size of virtual corpus "textType=/.*[Rr]oman([^z].*)?/": 35768046 tokens
```

35793450 - 25404 = 35768046
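The check in the commit message relies on the values matched by `.*[Rr]oman.*` splitting cleanly into those containing `Romanz` and those matched by the fixed pattern. A minimal sketch of the same check on a few invented `textType` values, assuming KorAP matches metadata regexes against the whole value (as the leading `.*` in every pattern suggests). `grep -x` stands in for that whole-value matching; it does not reproduce the `$` problem itself, since `$` is an anchor in grep's extended regexes.

```shell
#!/bin/sh
# Hypothetical textType values, one per line (invented, not corpus data).
samples='Roman
Kriminalroman
Liebesroman
Romanze
Romanzen
Novelle'

# grep -x requires the pattern to match the entire line; it stands in for
# the assumed whole-value matching of KorAP metadata regexes.
broad=$(printf '%s\n' "$samples" | grep -xE '.*[Rr]oman.*')
romanz=$(printf '%s\n' "$samples" | grep -xE '.*[Rr]omanz.*')
fixed=$(printf '%s\n' "$samples" | grep -xE '.*[Rr]oman([^z].*)?')

printf 'broad:\n%s\n\nromanz:\n%s\n\nfixed:\n%s\n' "$broad" "$romanz" "$fixed"

# "fixed" keeps Roman, Kriminalroman and Liebesroman and drops both Romanz*
# values, so broad = romanz + fixed, mirroring the token arithmetic:
echo $((35793450 - 25404))   # prints 35768046
```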
Prototype of a corpus composition analyzer for KorAP
Build the Docker image:

```shell
docker build -f Dockerfile -t korap/corpuscomposition:snapshot .
```
Or get the Docker image from GitLab:

```shell
curl -L 'https://gitlab.ids-mannheim.de/KorAP/CorpusCompositionAnalyzer/-/jobs/artifacts/master/raw/corpuscomposition-snapshot.xz?job=build-docker-image' | unxz | docker load
```
Start the container:

```shell
docker run --rm -p 3838:3838 korap/corpuscomposition:snapshot
```
Then open `http://localhost:3838/` for the default corpora, or `http://localhost:3838/?cq=<vc-definition-1>;<vc-definition-2>;<vc-definition-n>` to compare specific virtual corpora.
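The `cq` parameter is a `;`-separated list of virtual corpus definitions. A minimal sketch of assembling such a URL in the shell; `cq_url` is a hypothetical helper, not part of the project, and the sample definitions are taken from the commit message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): joins its arguments with ";"
# and appends them as the cq parameter of the local app URL.
# POSIX sh has no "local", so base, cq and vc are global variables.
cq_url() {
  base='http://localhost:3838/'
  cq=''
  for vc in "$@"; do
    # ${cq:+$cq;} inserts the separator only once cq is non-empty
    cq="${cq:+$cq;}$vc"
  done
  printf '%s?cq=%s\n' "$base" "$cq"
}

cq_url 'textType=/.*[Rr]oman.*/' 'textType=/.*[Rr]omanz.*/'
# prints http://localhost:3838/?cq=textType=/.*[Rr]oman.*/;textType=/.*[Rr]omanz.*/
```

The definitions are inserted verbatim. Definitions containing characters with a special meaning in URLs, such as `&`, `#` or spaces, probably need percent-encoding; how the app parses and decodes the `cq` value has not been checked here.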
Alternatively, run `shiny/app.R` in RStudio or from the command line:

```shell
R -e "shiny::runApp('shiny/app.R')"
```