commit | f48811229c96fa0dc4288d494dba917e9616437b | |
---|---|---|
author | Marc Kupietz <kupietz@ids-mannheim.de> | Tue Dec 17 14:55:39 2024 +0100 |
committer | Marc Kupietz <kupietz@ids-mannheim.de> | Wed Dec 18 15:55:37 2024 +0100 |
tree | 7b00feec712cda9f3c1ff8887fd80f1d3331615e | |
parent | e5374f2d084e63309105093994e9b8c83f9d3521 | |
Add matchStart and matchEnd columns to collectedMatches in corpusQuery result
Resolves #22
Change-Id: I6af9de503e5911cbe5c566b0fae529cfba7b764c
R client package to access the web service API of the KorAP Corpus Analysis Platform developed at IDS Mannheim
library(RKorAPClient)
new("KorAPConnection", verbose=TRUE) %>% corpusQuery("Hello world") %>% fetchAll()
library(RKorAPClient)
library(ggplot2)
kco <- new("KorAPConnection", verbose=TRUE)
expand_grid(condition = c("textDomain = /Wirtschaft.*/", "textDomain != /Wirtschaft.*/"),
            year = (2002:2018)) %>%
  cbind(frequencyQuery(kco, "[tt/l=Heuschrecke]",
                       paste0(.$condition," & pubDate in ", .$year))) %>%
  ipm() %>%
  ggplot(aes(x = year, y = ipm, fill = condition, colour = condition)) +
  geom_freq_by_year_ci()
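The ipm() step in the pipeline above expresses hit counts as instances per million tokens. A minimal base-R sketch of that arithmetic follows. The counts are invented for illustration, and the column names totalResults and total are assumptions about what frequencyQuery returns, so check them against your own result before relying on them.

```r
# Hypothetical counts, not real corpus data
freq <- data.frame(
  year         = c(2002, 2003),
  totalResults = c(150, 420),      # assumed column: hits for the query
  total        = c(3.0e8, 3.5e8)   # assumed column: tokens in the virtual corpus
)

# Instances per million tokens: relative frequency scaled by 10^6
freq$ipm <- freq$totalResults / freq$total * 1e6
print(freq)
```

Normalising to ipm makes years with differently sized virtual corpora comparable, which is why the plot uses ipm rather than raw counts on the y axis.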
See the Highcharts license notes below.
library(RKorAPClient)
query = c("macht []{0,3} Sinn", "ergibt []{0,3} Sinn")
years = c(1980:2010)
as.alternatives = TRUE
vc = "textType = /Zeit.*/ & pubDate in"
new("KorAPConnection", verbose=TRUE) %>%
  frequencyQuery(query, paste(vc, years), as.alternatives = as.alternatives) %>%
  hc_freq_by_year_ci(as.alternatives)
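The expression paste(vc, years) in the example above relies on R's vectorised paste to build one virtual corpus definition per year. The following base-R sketch only constructs and prints these strings, using a shortened year range for illustration. It sends nothing to the server.

```r
vc <- "textType = /Zeit.*/ & pubDate in"
years <- 1980:1982  # shortened range, for illustration only

# paste() recycles the single vc string across all years
vcs <- paste(vc, years)
print(vcs)
```

Printing the generated definitions before running a long query is a cheap way to check that each one is a well-formed virtual corpus expression.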
collocationAnalysis function
library(RKorAPClient)
library(knitr)
new("KorAPConnection", verbose = TRUE) %>%
  collocationAnalysis(
    "focus(in [tt/p=NN] {[tt/l=setzen]})",
    leftContextSize = 1,
    rightContextSize = 0,
    exactFrequencies = FALSE,
    searchHitsSampleLimit = 1000,
    topCollocatesLimit = 20
  ) %>%
  mutate(LVC = sprintf("[in %s setzen](%s)", collocate, webUIRequestUrl)) %>%
  select(LVC, logDice, pmi, ll) %>%
  head(10) %>%
  kable(format="pipe", digits=2)
LVC | logDice | pmi | ll |
---|---|---|---|
in Szene setzen | 9.66 | 10.86 | 465467.52 |
in Gang setzen | 9.21 | 10.57 | 256146.92 |
in Verbindung setzen | 8.46 | 9.62 | 189682.19 |
in Kenntnis setzen | 8.28 | 9.81 | 101112.02 |
in Bewegung setzen | 8.11 | 9.24 | 149397.91 |
in Brand setzen | 8.10 | 9.33 | 122427.05 |
in Anführungszeichen setzen | 7.50 | 11.96 | 33959.99 |
in Kraft setzen | 6.88 | 7.88 | 77796.85 |
in Marsch setzen | 6.87 | 9.27 | 22041.63 |
in Klammern setzen | 6.55 | 10.08 | 15643.27 |
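To give a feeling for two of the columns in the table above, here is a small base-R sketch of the textbook definitions of pointwise mutual information and logDice. The function names and all frequencies are hypothetical, and the package's own implementation may differ, for example in how it accounts for the context window size, so the values are not expected to match the table.

```r
# Textbook definitions; not taken from the RKorAPClient source.
# f_xy: co-occurrence frequency, f_x / f_y: frequencies of node and
# collocate, n: corpus size in tokens.
pmi_sketch <- function(f_xy, f_x, f_y, n) log2(f_xy * n / (f_x * f_y))
log_dice_sketch <- function(f_xy, f_x, f_y) 14 + log2(2 * f_xy / (f_x + f_y))

# Hypothetical frequencies, not taken from DeReKo
f_node  <- 1000
f_coll  <- 3000
f_joint <- 500
n       <- 12288000

print(pmi_sketch(f_joint, f_node, f_coll, n))
print(log_dice_sketch(f_joint, f_node, f_coll))
```

Unlike pmi, logDice does not depend on the corpus size n, which makes it easier to compare across corpora of different sizes.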
In order to perform collocation analysis and other textual queries on corpus parts for which KWIC access requires a login, you need to authorize your application with an access token.
In the case of DeReKo, this can be done in two different ways.
Use the token as the accessToken parameter of your KorAPConnection by replacing <access token> in the following example:
kco <- new("KorAPConnection", accessToken="<access token>")
The whole process is shown in this video:
[^1]: This new method has been made possible purely on the server side, so that it will also work with older versions of RKorAPClient.
Replace <application ID> with your application ID in the following example code:
library(httr)
korap_app <- oauth_app("korap-client", key = "<application ID>", secret = NULL)
korap_endpoint <- oauth_endpoint(NULL, "settings/oauth/authorize", "api/v1.0/oauth2/token", base_url = "https://korap.ids-mannheim.de")
token_bundle = oauth2.0_token(korap_endpoint, korap_app, scope = "search match_info", cache = FALSE)
kco <- new("KorAPConnection", accessToken = token_bundle[["credentials"]][["access_token"]])
See also the displayKwics demo.
How to request access, only if no access token has been provided or persisted, is illustrated in the gender variants demo (try demo("pluralGenderVariants")) and in the adjective collocates demo (try demo("adjectiveCollocates")).
You can also persist the access token for subsequent sessions with the persistAccessToken function:
persistAccessToken(kco)
Afterwards, a simple kco <- new("KorAPConnection") will retrieve the stored token.
To use the access token for simple corpus queries, i.e. to make corpusQuery return KWIC snippets, the metadataOnly parameter must be set to FALSE, for example:
corpusQuery(kco, "Ameisenplage", metadataOnly = FALSE) %>% fetchAll()
This should return KWIC snippets if you have authorized your application successfully.
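If you prefer not to write the token into a script, one option is to read it from an environment variable. The sketch below assumes a variable named KORAP_ACCESS_TOKEN and a helper called token_from_env. Both are hypothetical names chosen for this example and are not defined by RKorAPClient.

```r
# Hypothetical helper, not part of RKorAPClient.
# Reads the token from an environment variable and fails early if it is unset.
token_from_env <- function(var = "KORAP_ACCESS_TOKEN") {
  token <- Sys.getenv(var)  # returns "" when the variable is not set
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set")
  }
  token
}

# Usage (needs the package and network access, hence commented out):
# kco <- new("KorAPConnection", accessToken = token_from_env())
# corpusQuery(kco, "Ameisenplage", metadataOnly = FALSE) %>% fetchAll()
```

Failing early with a clear message avoids a situation where a query silently falls back to metadata-only results because no token was passed.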
More elaborate R scripts demonstrating the use of the package can be found in the demo folder.
# Debian, Ubuntu, ...
sudo apt -f install # install possibly missing RStudio dependencies
sudo apt install r-base-dev r-cran-rcpp r-cran-cpp11 libcurl4-gnutls-dev libxml2-dev libsodium-dev libsecret-1-dev libfontconfig1-dev libssl-dev libv8-dev
# Fedora, CentOS, RHEL (for older versions use `yum` instead of `dnf`)
sudo dnf install R-devel libcurl-devel openssl-devel libxml2-devel libsodium-devel libsecret-devel fontconfig-devel v8-devel
# Arch Linux
pacman -S base-devel gcc-fortran libsodium curl
Start RStudio and click on Install Packages… in the Tools menu. Enter RKorAPClient in the Packages input field and click on the Install button (keeping Install Dependencies checked).
If the installation fails for some reason, you might need to update your installed R packages first (Tools -> Check for Package Updates, Select All, Install Updates).
Start R, then install RKorAPClient from CRAN (or the development version from GitHub or KorAP's Gerrit server).
install.packages("RKorAPClient")
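# Development version from GitHub (use either devtools or remotes):
devtools::install_github("KorAP/RKorAPClient")
remotes::install_github("KorAP/RKorAPClient")
# Development version from KorAP's Gerrit server (use either devtools or remotes):
devtools::install_git("https://korap.ids-mannheim.de/gerrit/KorAP/RKorAPClient")
remotes::install_git("https://korap.ids-mannheim.de/gerrit/KorAP/RKorAPClient")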
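In scripts that are run repeatedly, it can be convenient to install the package only when it is missing. The following is a minimal sketch using base R only. The helper name install_if_missing is hypothetical, and the actual install call is left commented out because it needs network access and a configured CRAN mirror.

```r
# Hypothetical helper: install a CRAN package only if it is not yet available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report invisibly whether the package can be loaded now
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (commented out, needs network access):
# install_if_missing("RKorAPClient")
```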
Authors: Marc Kupietz, Nils Diewald
Copyright (c) 2024, Leibniz Institute for the German Language, Mannheim, Germany
This package is developed as part of the KorAP Corpus Analysis Platform at the Leibniz Institute for the German Language (IDS).
It is published under the BSD-2 License.
The KorAP logo was designed by Norbert Cußler-Volz and is released under the terms of the Creative Commons License BY-NC-ND 4.0.
RKorAPClient imports parts of the highcharter package which has a dependency on Highcharts, a commercial JavaScript charting library. Highcharts offers both a commercial license as well as a free non-commercial license. Please review the licensing options and terms before using the highcharter plot options, as the RKorAPClient license neither provides nor implies a license for Highcharts.
Highcharts is a Highsoft product which is not free for commercial and governmental use.
By using RKorAPClient you agree to the respective terms of use of the accessed KorAP API services which will be printed upon opening a connection (new("KorAPConnection", ...)).
Contributions are very welcome!
Your contributions should ideally be committed via our Gerrit server to facilitate reviewing (see Gerrit Code Review - A Quick Introduction if you are not familiar with Gerrit). However, we are also happy to accept comments and pull requests via GitHub.
Please note that unless you explicitly state otherwise any contribution intentionally submitted for inclusion into this software shall – as this software itself – be under the BSD-2 License.
Kupietz, Marc / Margaretha, Eliza / Diewald, Nils / Lüngen, Harald / Fankhauser, Peter (2019): What’s New in EuReCo? Interoperability, Comparable Corpora, Licensing. In: Bański, Piotr / Barbaresi, Adrien / Biber, Hanno / Breiteneder, Evelyn / Clematide, Simon / Kupietz, Marc / Lüngen, Harald / Iliadi, Caroline (eds.): Proceedings of the International Corpus Linguistics Conference 2019 Workshop "Challenges in the Management of Large Corpora (CMLC-7)", 22nd of July. Mannheim: Leibniz-Institut für Deutsche Sprache, 33-39.
Kupietz, Marc / Diewald, Nils / Margaretha, Eliza (2020): RKorAPClient: An R package for accessing the German Reference Corpus DeReKo via KorAP. In: Calzolari, Nicoletta, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis (eds.): Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020). Marseille: European Language Resources Association (ELRA), 7017-7023.