| commit | bca9e043b35df6f3e92caccb8197e50947c74374 | |
|---|---|---|
| author | Marc Kupietz <kupietz@ids-mannheim.de> | Wed Apr 13 12:44:02 2022 +0200 |
| committer | Marc Kupietz <kupietz@ids-mannheim.de> | Wed Apr 13 12:44:02 2022 +0200 |
| tree | d7d0517511e7c1a17ad102568a9d01abab45ca5d | |
| parent | b738ec7bf64b212a3537df18491c2c05d11076cb | |

Add test for building collocation db

Change-Id: I841aa353483e71b0fc595c80655d2c85b82aa292

Fork of wang2vec with extensions for re-training and count-based models, and with a more accurate ETA prediction.

    cd dereko2vec
    mkdir build
    cd build
    cmake ..
    make && ctest3 --extra-verbose && sudo make install

The command to build word embeddings is exactly the same as in the original version, except that we added -type 5 for setting up a purely count-based collocation database.
The -type argument is an integer that selects the architecture to use. These are the possible values:
0 - cbow
1 - skipngram
2 - cwindow (see below)
3 - structured skipngram (see below)
4 - Collobert's SENNA context window model (still experimental)
5 - build a collocation count database instead of word embeddings
./dereko2vec -train input_file -output embedding_file -type 0 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -binary 1 -iter 5 -cap 0
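
Because -type takes a bare integer, it can help to wrap the invocation in a small script that maps readable names to those integers. The following is a minimal sketch, not part of dereko2vec: the function names `type_for` and `train_cmd`, the architecture labels, and the output file name `collocation_db` are hypothetical. It also assumes that -type 5 accepts the same remaining options as the example above, which the text suggests but does not spell out. The script only prints the command lines and does not run them.

```shell
#!/bin/sh
# Map a readable architecture label to the integer expected by -type.
# The labels are hypothetical names chosen for this sketch; only the
# integers 0-5 come from the list of architectures above.
type_for() {
  case "$1" in
    cbow)                 echo 0 ;;
    skipngram)            echo 1 ;;
    cwindow)              echo 2 ;;
    structured-skipngram) echo 3 ;;
    senna)                echo 4 ;;
    collocation-db)       echo 5 ;;
    *) echo "unknown architecture: $1" >&2; return 1 ;;
  esac
}

# Print (do not execute) a dereko2vec command line.
# $1 = architecture label, $2 = input file, $3 = output file.
# All other options are copied from the example invocation above.
train_cmd() {
  t=$(type_for "$1") || return 1
  echo "./dereko2vec -train $2 -output $3 -type $t -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -binary 1 -iter 5 -cap 0"
}

# Dry run: show the commands for a CBOW model and a collocation database.
train_cmd cbow input_file embedding_file
train_cmd collocation-db input_file collocation_db
```

To run a printed command rather than just inspect it, pass it to the shell, for example `train_cmd cbow input_file embedding_file | sh`, once the binary has been built in the current directory.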
@InProceedings{Ling:2015:naacl,
author = {Ling, Wang and Dyer, Chris and Black, Alan and Trancoso, Isabel},
title="Two/Too Simple Adaptations of word2vec for Syntax Problems",
booktitle="Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
year="2015",
publisher="Association for Computational Linguistics",
location="Denver, Colorado",
}
@InProceedings{FankhauserKupietz2019,
author = {Peter Fankhauser and Marc Kupietz},
title = {Analyzing domain specific word embeddings for a large corpus of contemporary German},
series = {Proceedings of the 10th International Corpus Linguistics Conference},
publisher = {University of Cardiff},
address = {Cardiff},
year = {2019},
note = {\url{https://doi.org/10.14618/ids-pub-9117}}
}