DeReKoVecs (server and web app)

Visualizes paradigmatic and syntagmatic relations between words based on wang2vec / structured skip-n-gram (Ling et al. 2015) word embeddings (Mikolov et al. 2013) and word embedding networks.

DeReKoVecs (Fankhauser / Kupietz 2017, 2019, 2022; Kupietz et al. 2018) is part of the new open lab of the Corpus Linguistics group at IDS Mannheim. Similar to the Collocation Database CCDB (Keibel / Belica 2007, Belica 2011), DeReKoVecs is used to investigate and compare measures, dimension reduction procedures, visualizations etc., in order to track down detailed paradigmatic and syntagmatic relations between words based on their use in very large corpora such as the German Reference Corpus DeReKo (Kupietz et al. 2010).

Installation from source


Build and install

cpanm --installdeps .

perl Makefile.PL
make install

A detailed installation procedure that is known to work can also be found in the GitLab CI pipeline script.

Please note that IDS::DeReKoVecs::Read is not yet stable and not yet recommended for use.

Build your own models

You can build your own models with dereko2vec.


From prebuilt docker image

docker run -v ./example-models:/example-models:z -e MOJO_CONFIG=/example-models/example-docker.conf -p 3000:3000 idscorpuslinguistics/derekovecs

From prebuilt docker image with docker compose

docker compose up

From source in debug mode

MOJO_CONFIG=$(pwd)/example.conf morbo script/derekovecs-server

From source in production mode

MOJO_CONFIG=$(pwd)/example.conf hypnotoad script/derekovecs-server

The web user interface will then be available, for example, at http://localhost:3000
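
When the server is started from a script, it can be handy to wait until it actually responds before sending requests. The following Bash sketch polls the getVersion endpoint listed in the Web Service API section. The function name and the CURL and SLEEP_SECS variables are illustrative assumptions and not part of DeReKoVecs; the base URL assumes the default port 3000 used above.

```bash
#!/usr/bin/env bash

# Poll the getVersion endpoint until the server answers or the tries run out.
# Function name, CURL and SLEEP_SECS are illustrative, not part of DeReKoVecs.
wait_for_derekovecs() {
  local base=${1:-http://localhost:3000} tries=${2:-30} i
  for ((i = 0; i < tries; i++)); do
    # -f makes curl fail on HTTP errors, -sS hides progress but keeps errors
    if "${CURL:-curl}" -fsS "$base/getVersion" >/dev/null 2>&1; then
      return 0
    fi
    sleep "${SLEEP_SECS:-1}"
  done
  return 1
}

# Usage (not executed here):
#   wait_for_derekovecs http://localhost:3000 60 && echo "server is up"
```

The function returns 0 as soon as one request succeeds and 1 if all tries fail, so it can be chained with && in a start-up script.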

Web Service API

In addition to the web user interface, DeReKoVecs also provides a web API, which is, however, still unsystematic and not stable. To figure out the meaning of result components that are still undocumented, have a look at the table head mouse-overs in the GUI or at the source code.

Command                    Parameters                       Description
/                          word, n, dedupe, cutoff, json=1  get paradigmatic and syntagmatic neighbours, from word embeddings
getCollocationAssociation  w, c                             get association scores for specific node collocate pairs
getSimilarity              w1, w2                           get cosine similarity of w1 and w2
getVersion                                                  get version of derekovecs
getModelName                                                get name of model (inferred from the file name)
getVocabSize                                                get vocabulary size of model

Get classical (count-based) collocates

Command                Parameters  Description
getClassicCollocators  w           get count based collocates of word w

Example Result (node: Grund)

{
   "N" : 55650540526,           // number of tokens in corpus
   "collocates" : [             // array of collocates
      {
         "afwin" : 64,          // binary encoded auto-focus window
                                // (see Perkuhn et al. 2012: E8-15):
                                // 64 = 2^6 ≙ 00010 node 00000
                                // (Aus [gutem] Grund)
         "delta" : 0,           // rank delta compared to collocation in a background
                                // corpus (currently unused)
         "dice" : 0.00198886,   // dice score
         "f" : 113490,          // abs. frequency of collocation
         "f2" : 10965575,       // abs. frequency of collocate
         "ld" : 5.02616,        // log-dice score (Rychlý 2008) for whole window
         "ldaf" : 7.39257,      // log-dice score for auto focus window
         "lfmd" : 36.0655,      // log-frequency biased mutual dependency ≙ pmi³
                                // (Daille 1994; Thanopoulos et al. 2002)
         "llr" : 204906,        // log-likelihood (Dunning 1993; Evert 2004)
         "ln_count" : 36,       // frequency of collocate as left neighbour of node
         "ln_pmi" : -5.81926,   // pmi as left neighbour
         "md" : 19.2733,        // mutual dependency ≙ pmi²
                                // (Daille 1994; Thanopoulos et al. 2002)
         "npmi" : 0.111633,     // normalized pmi (Bouma 2009)
         "pmi" : 2.4811,        // pointwise mutual information
         "rn_count" : 386,      // frequency of collocate as right neighbour of node
         "rn_pmi" : -2.39672,   // pmi as right neighbour
         "win" : 1023,          // binary encoded positions at which the collocate
                                // appears at least once: 1023 = 2^10-1 ≙ 11111 node 11111
                                // (unmarked scores refer to this)
         "word" : "Aus"         // collocate
      },
      // ...
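
As a cross-check of how the fields in the example fit together, the following Bash sketch decodes the win / afwin bit masks and recomputes three of the scores from the example values. The bit layout (10 bits, highest bit = leftmost window position, five positions on each side of the node) is inferred from the two annotated values in the example. Log-dice follows Rychlý's definition 14 + log2(dice). The relations md = pmi + log2(f) and lfmd = md + log2(f) are assumptions that merely reproduce the example numbers; they are not taken from the DeReKoVecs source. The function names are illustrative.

```bash
#!/usr/bin/env bash

# Render a window bit mask as "LLLLL node RRRRR".
# Assumed layout: 10 bits, highest bit = leftmost position of the window.
decode_window() {
  local mask=$1 bits="" i
  for ((i = 9; i >= 0; i--)); do
    bits+=$(( (mask >> i) & 1 ))
  done
  printf '%s node %s\n' "${bits:0:5}" "${bits:5:5}"
}

# Recompute ld, md and lfmd from dice, f and pmi (log2 x = log(x) / log(2)).
recompute_scores() {
  awk -v dice="$1" -v f="$2" -v pmi="$3" 'BEGIN {
    log2f = log(f) / log(2)
    ld    = 14 + log(dice) / log(2)  # log-dice (Rychly 2008)
    md    = pmi + log2f              # assumption: reproduces the example
    lfmd  = md + log2f               # assumption: reproduces the example
    printf "ld=%.4f md=%.4f lfmd=%.4f\n", ld, md, lfmd
  }'
}

decode_window 64      # afwin from the example
decode_window 1023    # win from the example
recompute_scores 0.00198886 113490 2.4811
```

The two decode_window calls reproduce the patterns given in the comments of the example (00010 node 00000 and 11111 node 11111); the recomputed scores should agree with the ld, md and lfmd values of the example up to rounding.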

Get top predictive collocates position-wise

Command                    Parameters               Description
/getPosWiseW2VCollocators  w(,max=200,format=json)  get top max predictive collocates position-wise of word w


GET 'http://localhost:3000/?word=Grund&n=10&dedupe=0&sort=0&cutoff=1000000&json=1' | json_pp | less
curl -L 'http://localhost:3000/getClassicCollocators?w=Grund'
GET 'http://localhost:3000/getCollocationAssociation?w=Grund&c=diesem'
GET 'http://localhost:3000/getPosWiseW2VCollocators?w=Test'
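
Query words containing umlauts or other special characters need to be URL-encoded. The following Bash sketch wraps curl so that each key=value argument is passed via --data-urlencode and appended to the URL as a query string (--get). The function name dv_get and the DEREKOVECS_URL and CURL variables are illustrative assumptions, not part of DeReKoVecs; only the endpoint and parameter names come from the tables above.

```bash
#!/usr/bin/env bash

# Query a DeReKoVecs endpoint; further arguments are key=value query parameters.
# dv_get, DEREKOVECS_URL and CURL are illustrative names, not part of DeReKoVecs.
dv_get() {
  local endpoint=$1; shift
  local args=(-sSL --get) kv
  for kv in "$@"; do
    # curl URL-encodes the value part of each key=value pair
    args+=(--data-urlencode "$kv")
  done
  "${CURL:-curl}" "${args[@]}" "${DEREKOVECS_URL:-http://localhost:3000}/$endpoint"
}

# Usage (not executed here):
#   dv_get getSimilarity w1=Grund w2=Ursache
#   dv_get getClassicCollocators w=Größe | json_pp | less
```

Setting CURL=echo prints the resulting curl arguments instead of sending a request, which helps when checking how a query is assembled.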

(Build and) run using docker / podman

Optional: Build docker image from source

docker build -t idscorpuslinguistics/derekovecs .

Optional: Slim down image using Slim(toolkit)

slim build --include-path /usr/local/share/perl5 --mount ./example-models:/example-models:z --env MOJO_CONFIG=/example-models/example-docker.conf idscorpuslinguistics/derekovecs

This builds an image ids-kl/derekovecs.slim that is reduced to about 25% of the original size.

Run docker image

docker run -v ./example-models:/example-models:z -e MOJO_CONFIG=/example-models/example-docker.conf -p 3000:3000 idscorpuslinguistics/derekovecs

Client library for R

See rderekovecs.


See Changelog

Development and License

Author: Marc Kupietz

Contributors: Peter Fankhauser, Rainer Perkuhn, Tim Feldmüller

Copyright (c) 2016-2024, Leibniz Institute for the German Language, Mannheim, Germany

DeReKoVecs is published under the Apache 2.0 License.

How to cite

If you are using DeReKoVecs (results) for a scientific publication, please cite at least Fankhauser / Kupietz (2022).


