commit | 04784b96d4ac2e3a57e1bf4e503c808881d0a4e4 | |
---|---|---|
author | Marc Kupietz <kupietz@ids-mannheim.de> | Sun May 04 13:38:12 2025 +0200 |
committer | Marc Kupietz <kupietz@ids-mannheim.de> | Sun May 04 13:41:10 2025 +0200 |
tree | fd9f6908c479e09ad768d6d2995d58ee53026ce5 |
Initial import (translated from rderekovecs)

Change-Id: Ib4a4747f6474dfe67d79288be3f8bdaf66a513b8
A Python client package that makes the DeReKoVecs web service API accessible from Python.
pip install git+https://korap.ids-mannheim.de/gerrit/IDS-Mannheim/pyderekovecs.git
Or clone the repository and install locally:
git clone https://korap.ids-mannheim.de/gerrit/IDS-Mannheim/pyderekovecs.git
cd pyderekovecs
pip install -e .
import pyderekovecs as pd

# Get paradigmatic neighbors for a word
neighbors = pd.paradigmatic_neighbours("Haus")
print(neighbors.head())

# Get syntagmatic neighbors
collocates = pd.syntagmatic_neighbours("Haus")
print(collocates.head())

# Get word embedding
embedding = pd.word_embedding("Haus")
print(len(embedding))  # Should be 200

# Calculate cosine similarity between two words
similarity = pd.cosine_similarity("Haus", "Gebäude")
print(f"Similarity: {similarity}")
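To make the relationship between `word_embedding` and `cosine_similarity` concrete, here is a minimal offline sketch of the standard cosine formula. It is not the package's or the server's implementation: the helper `cosine` and the three-dimensional vectors are made up for illustration, whereas real embeddings would come from `word_embedding` and have 200 dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (illustrative helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up stand-ins for embedding vectors; not real model output.
haus = [0.6, 0.8, 0.0]
gebaeude = [0.8, 0.6, 0.0]

print(round(cosine(haus, gebaeude), 2))
```

For vectors that are already normalized to unit length, as the embeddings are described to be, the two norms are 1 and the cosine reduces to the plain dot product.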
import os
os.environ["DEREKOVECS_SERVER"] = "https://corpora.ids-mannheim.de/openlab/kokokomvecs"

import os
os.environ["DEREKOVECS_SERVER"] = "https://corpora.ids-mannheim.de/openlab/corolavecs"
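When a script needs to switch between servers, it can help to set `DEREKOVECS_SERVER` only for a limited scope and restore the previous value afterwards. The following is a sketch using only the standard library. The context manager `derekovecs_server` is a hypothetical helper, not part of pyderekovecs, and the sketch assumes the package reads the environment variable at the time of each request, which is not confirmed here.

```python
import os
from contextlib import contextmanager


@contextmanager
def derekovecs_server(url):
    """Temporarily set DEREKOVECS_SERVER, then restore the previous state.

    Hypothetical helper for illustration; not provided by pyderekovecs.
    """
    previous = os.environ.get("DEREKOVECS_SERVER")
    os.environ["DEREKOVECS_SERVER"] = url
    try:
        yield url
    finally:
        if previous is None:
            # The variable was unset before, so remove it again.
            os.environ.pop("DEREKOVECS_SERVER", None)
        else:
            os.environ["DEREKOVECS_SERVER"] = previous


with derekovecs_server("https://corpora.ids-mannheim.de/openlab/corolavecs"):
    # Calls to pyderekovecs functions would go here.
    print(os.environ["DEREKOVECS_SERVER"])
```

If the package turns out to read the variable only once at import time, this approach would have no effect and the variable would need to be set before the import instead.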
syntagmatic_neighbours(word, **params)
: Get the syntagmatic neighbour predictions of a word

countbased_collocates(w, **params)
: Get the collocates of a word in the count-based dereko model

word_frequency(w, **params)
: Get the absolute frequency of a word in the corpus

corpus_size(w, **params)
: Get the token size of the corpus used to train the model

paradigmatic_neighbours(word, **params)
: Get the paradigmatic neighbours of a word

word_embedding(word, **params)
: Get the normalized embedding vector of a word

frequency_rank(word, **params)
: Get the frequency rank of a word in the training data

server_version()
: Get the version of the derekovecs server

vocab_size()
: Get the vocabulary size of the model

model_name()
: Get the name of the model

collocation_scores(w, c, **params)
: Calculate the association scores between a node and a collocate

cosine_similarity(w1, w2, **params)
: Calculate the cosine similarity between two words

To run tests:
python -m unittest discover tests