Fixed RE for excluding Romanze (without relying on $ as EOL)

Verified as correct by comparing virtual corpus sizes:

corpusStats(kco, "textType=/.*[Rr]oman.*/")
corpusStats(kco, "textType=/.*[Rr]omanz.*/")
corpusStats(kco, "textType=/.*[Rr]oman([^z].*|$)/")
corpusStats(kco, "textType=/.*[Rr]oman([^z].*)?/")

Getting size of virtual corpus "textType=/.*[Rr]oman.*/": 35793450 tokens
Getting size of virtual corpus "textType=/.*[Rr]omanz.*/": 25404 tokens
Getting size of virtual corpus "textType=/.*[Rr]oman([^z].*|$)/": 9398408 tokens
Getting size of virtual corpus "textType=/.*[Rr]oman([^z].*)?/": 35768046 tokens

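35793450 - 25404 = 35768046, i.e. all Roman tokens minus the Romanz tokens equals the size reported for the last pattern, ([^z].*)?. The variant that relies on $ reports only 9398408 tokens.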
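The behaviour of the fixed pattern can be sketched locally with grep. This is only an approximation: it assumes that the pattern has to match the whole textType value (emulated here with grep -x) and that grep's extended regex syntax is close enough to the one KorAP uses for this pattern. The helper name is_roman_not_romanze and the sample values are made up for illustration and are not taken from the corpus.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a textType value is matched by the fixed pattern.
# grep -x requires the whole line to match, so no $ anchor is needed.
is_roman_not_romanze() {
  printf '%s\n' "$1" | grep -qxE '.*[Rr]oman([^z].*)?'
}

# Illustrative sample values (assumptions, not real corpus metadata).
for t in "Roman" "Kriminalroman" "Roman: Abenteuer" "Romanze" "Zeitung"; do
  if is_roman_not_romanze "$t"; then
    echo "match:    $t"
  else
    echo "no match: $t"
  fi
done
```

The optional group ([^z].*)? accepts either nothing after "oman" or a continuation that does not start with z, which is why "Roman" matches while "Romanze" does not.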
1 file changed
tree: 30256c9f577dc32bd3150c38a805995aa73eb0e7
  1. ci/
  2. shiny/
  3. .gitignore
  4. .gitlab-ci-local-env
  5. .gitlab-ci.yml
  6. CorpusCompositionAnalyzer.Rproj
  7. Dockerfile
  8. Readme.md
Readme.md

Corpus Composition Analyzer (Prototype)

Prototype of a corpus composition analyzer for KorAP

Installation

docker build -f Dockerfile -t korap/corpuscomposition:snapshot .

Or load the prebuilt Docker image from GitLab:

curl -L 'https://gitlab.ids-mannheim.de/KorAP/CorpusCompositionAnalyzer/-/jobs/artifacts/master/raw/corpuscomposition-snapshot.xz?job=build-docker-image' | unxz | docker load

Run

docker run --rm -p 3838:3838 korap/corpuscomposition:snapshot

Then open http://localhost:3838/ for the default corpora, or http://localhost:3838/?cq=<vc-definition-1>;<vc-definition-2>;<vc-definition-n> to compare specific corpora, with the virtual corpus definitions separated by semicolons.
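A minimal sketch of assembling such a comparison URL in a shell, assuming that a virtual corpus definition in cq uses the same syntax as the corpusStats queries in the commit message above. Whether characters such as ( or ? additionally need percent-encoding depends on the browser or client and is not covered here.

```shell
#!/bin/sh
# Base URL from the Run section; the two definitions reuse queries from the commit message.
base='http://localhost:3838/'
vc1='textType=/.*[Rr]oman.*/'
vc2='textType=/.*[Rr]oman([^z].*)?/'

# Single quotes above keep the shell from interpreting *, [, ( and ?;
# the double quotes below protect the ; separator.
url="${base}?cq=${vc1};${vc2}"
printf '%s\n' "$url"
```

Pasting the printed URL into a browser avoids retyping the quoting on the command line, where an unquoted ; would end the command.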

Run and test without Docker

Run shiny/app.R in RStudio or from the command line:

R -e "shiny::runApp('shiny/app.R')"