Update README.md

Change-Id: I25c4af0e6d1aad706ab4f9ce092bd5c020dc6e05

commit: 0fb4fd6728706822bdc841832ad6c627f7d4a308
author: Marc Kupietz <kupietz@ids-mannheim.de> Mon Oct 21 18:41:07 2024 +0200
committer: Marc Kupietz <kupietz@ids-mannheim.de> Mon Oct 21 18:41:07 2024 +0200
tree: 002eb585b7f76f8a8eb9a3f47147492946270114
parent: 7f1fc3399340b6c017bad9a8a107616ea8ed0705
diff --git a/README.md b/README.md
index 6316f29..b4052ce 100644
--- a/README.md
+++ b/README.md

@@ -1,19 +1,24 @@
 # dereko2vec
-Fork of [wang2vec](https://github.com/wlin12/wang2vec) with extensions for re-training and count based models and a 
-more accurate ETA prognosis.
+
+Fork of [wang2vec](https://github.com/wlin12/wang2vec) with extensions for re-training and count based models, support for tokens with frequencies > 2³² and a more accurate ETA prognosis.
 
 ## Installation
+
 ### Dependencies
+
 * cmake3
 * [libcollocaltordb](https://korap.ids-mannheim.de/gerrit/plugins/gitiles/ids-kl/collocatordb) >= v1.3.0
+
 ### Build and install
-```
+
+```bash
 cd dereko2vec
 mkdir build
 cd build
 cmake ..
 make && ctest3 --extra-verbose && sudo make install
 ```
+
 ## Run
 
 The command to build word embeddings is exactly the same as in the original version, except that we added type 5 for setting up a purely count based collocation database.
@@ -27,7 +32,8 @@
 5 - build a collocation count database instead of word embeddings
 
 ### Example
-```
+
+```bash
 ./dereko2vec -train input_file -output embedding_file -type 0 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -binary 1 -iter 5 -cap 0
 ```
 
@@ -35,12 +41,13 @@
 
 The [KorAP-XML-CoNLL-U](https://github.com/KorAP/KorAP-XML-CoNLL-U) tool can be used to generate input files for dereko2vec from KorAP-XML ZIPs using its tokenization and sentence boundary information, for example:
 
-```
+```bash
 korapxml2conllu --word2vec wpd19.zip > wpd19.w2vinput
 ```
 
 ## References
-```
+
+```bash
 @InProceedings{Ling:2015:naacl,  
 author = {Ling, Wang and Dyer, Chris and Black, Alan and Trancoso, Isabel},  
 title="Two/Too Simple Adaptations of word2vec for Syntax Problems",