Add matchStart and matchEnd columns to collectedMatches in corpusQuery result

Resolves #22

Change-Id: I6af9de503e5911cbe5c566b0fae529cfba7b764c
5 files changed

KorAP web service client package for R


Description

R client package to access the web service API of the KorAP Corpus Analysis Platform developed at IDS Mannheim

Examples

Hello world

library(RKorAPClient)
new("KorAPConnection", verbose=TRUE) %>% corpusQuery("Hello world") %>% fetchAll()

Frequencies over time and domains using ggplot2

library(RKorAPClient)
library(ggplot2)
kco <- new("KorAPConnection", verbose=TRUE)
expand_grid(condition = c("textDomain = /Wirtschaft.*/", "textDomain != /Wirtschaft.*/"), 
            year = (2002:2018)) %>%
    cbind(frequencyQuery(kco, "[tt/l=Heuschrecke]", paste0(.$condition," & pubDate in ", .$year)))  %>%
    ipm() %>%
    ggplot(aes(x = year, y = ipm, fill = condition, colour = condition)) +
    geom_freq_by_year_ci()
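
To see which virtual corpus definitions the example above passes to frequencyQuery, the grid can be rebuilt offline. This is only a sketch: it uses base R's expand.grid as a stand-in for expand_grid (the row order differs), and it contacts no server.

```r
# Base-R stand-in for the expand_grid() call above (row order differs).
grid <- expand.grid(
  condition = c("textDomain = /Wirtschaft.*/", "textDomain != /Wirtschaft.*/"),
  year = 2002:2018,
  stringsAsFactors = FALSE
)

# One virtual corpus definition per condition/year combination.
vc <- paste0(grid$condition, " & pubDate in ", grid$year)

print(length(vc))   # 2 conditions x 17 years
print(head(vc, 2))
```

Each element of vc restricts the query to one domain condition and one publication year, which is what lets the plot show one frequency per year and condition.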

Percentages over time using highcharter

See the Highcharts license notes below.

library(RKorAPClient)
query = c("macht []{0,3} Sinn", "ergibt []{0,3} Sinn")
years = c(1980:2010)
as.alternatives = TRUE
vc = "textType = /Zeit.*/ & pubDate in"
new("KorAPConnection", verbose = TRUE) %>%
  frequencyQuery(query, paste(vc, years), as.alternatives = as.alternatives) %>%
  hc_freq_by_year_ci(as.alternatives)

Proportion of "ergibt … Sinn" versus "macht … Sinn" between 1980 and 2010 in newspapers and magazines
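
The idea of treating queries as alternatives can be illustrated offline. The sketch below assumes that, with as.alternatives = TRUE, each query's share is its hit count divided by the summed hits of all alternatives in the same year. That is a reading of the option, not the package's actual code. The counts and the column names (n, total, share) are invented placeholders, not corpus results.

```r
# Invented placeholder counts, NOT real corpus results.
hits <- data.frame(
  year  = c(1980, 1980, 2010, 2010),
  query = rep(c("macht []{0,3} Sinn", "ergibt []{0,3} Sinn"), times = 2),
  n     = c(30, 70, 120, 80)
)

# Summed hits of all alternatives within the same year.
hits$total <- ave(hits$n, hits$year, FUN = sum)

# Share of each alternative among all alternatives of that year.
hits$share <- hits$n / hits$total

print(hits[, c("year", "query", "share")])
```

Within each year the shares add up to 1, so the chart shows how the two variants compete rather than how frequent they are in the corpus overall.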

Identify in … setzen light verb constructions by using the new collocationAnalysis function

Lifecycle:experimental

library(RKorAPClient)
library(knitr)
new("KorAPConnection", verbose = TRUE) %>%
  collocationAnalysis(
    "focus(in [tt/p=NN] {[tt/l=setzen]})",
    leftContextSize = 1,
    rightContextSize = 0,
    exactFrequencies = FALSE,
    searchHitsSampleLimit = 1000,
    topCollocatesLimit = 20
  ) %>%
  mutate(LVC = sprintf("[in %s setzen](%s)", collocate, webUIRequestUrl)) %>%
  select(LVC, logDice, pmi, ll) %>%
  head(10) %>%
  kable(format="pipe", digits=2)
|LVC                         | logDice|   pmi|        ll|
|:---------------------------|-------:|-----:|---------:|
|in Szene setzen             |    9.66| 10.86| 465467.52|
|in Gang setzen              |    9.21| 10.57| 256146.92|
|in Verbindung setzen        |    8.46|  9.62| 189682.19|
|in Kenntnis setzen          |    8.28|  9.81| 101112.02|
|in Bewegung setzen          |    8.11|  9.24| 149397.91|
|in Brand setzen             |    8.10|  9.33| 122427.05|
|in Anführungszeichen setzen |    7.50| 11.96|  33959.99|
|in Kraft setzen             |    6.88|  7.88|  77796.85|
|in Marsch setzen            |    6.87|  9.27|  22041.63|
|in Klammern setzen          |    6.55| 10.08|  15643.27|
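
For readers unfamiliar with the logDice and pmi columns, here is a minimal sketch of the common textbook definitions of both measures. The function and argument names are made up for illustration. The package's own implementation may differ, for example in how it accounts for the context window size, so these functions are not expected to reproduce the values in the table.

```r
# o:  co-occurrence frequency of node and collocate
# f1: frequency of the node, f2: frequency of the collocate
# n:  corpus size in tokens

# logDice: 14 plus the binary log of the Dice coefficient.
log_dice <- function(o, f1, f2) 14 + log2(2 * o / (f1 + f2))

# Pointwise mutual information: observed vs. expected co-occurrences.
pmi <- function(o, f1, f2, n) log2(o * n / (f1 * f2))

print(log_dice(o = 10, f1 = 20, f2 = 20))
print(pmi(o = 8, f1 = 16, f2 = 16, n = 1024))
```

logDice does not depend on the corpus size, whereas pmi does, which is one reason the two columns can rank collocates differently.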

Authorizing RKorAPClient applications to access restricted KWICs from copyrighted texts

In order to perform collocation analysis and other textual queries on corpus parts for which KWIC access requires a login, you need to authorize your application with an access token.

In the case of DeReKo, this can be done in two different ways.

The old way: Authorize your RKorAPClient application manually

  1. Log in to the KorAP DeReKo instance
  2. Open the KorAP OAuth settings
  3. If you have not yet registered a client application, or not the desired one, register one (it is sufficient to fill in the marked fields).
  4. Click the intended client application name.
  5. If you do not have any access tokens yet, click on the "Issue new token" button.
  6. Copy one of your access tokens to your clipboard by clicking on the copy symbol ⎘ behind it.
  7. In R/RStudio, paste the token into your KorAPConnection initialization, overwriting <access token> in the following example:
    kco <- new("KorAPConnection", accessToken="<access token>")
    

The whole process is shown in this video:

https://user-images.githubusercontent.com/11092081/142769056-b389649b-eac4-435f-ac6d-1715474a5605.mp4
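
As an alternative to pasting the token into a script in step 7, the token can be kept out of the source code. The following is a minimal sketch, not part of the package: the helper read_korap_token and the environment variable name KORAP_ACCESS_TOKEN are hypothetical choices.

```r
# Hypothetical helper: read the access token from an environment variable
# so that it does not end up in scripts or version control.
read_korap_token <- function(var = "KORAP_ACCESS_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  token
}

# Usage (requires RKorAPClient and network access):
# kco <- new("KorAPConnection", accessToken = read_korap_token())
```

The variable could be set, for instance, in a user-level .Renviron file that is not shared with others.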

The new way (since March 2023)[^1]: Authorize your RKorAPClient application via the usual OAuth browser flow

[^1]: This new method has been made possible purely on the server side, so that it will also work with older versions of RKorAPClient.

  1. Follow steps 1-4 of the old way shown above.
  2. Click on the copy symbol ⎘ behind the ID of your client application.
  3. Paste your clipboard content overwriting <application ID> in the following example code:
    library(httr)
    
    korap_app <- oauth_app("korap-client", key = "<application ID>", secret = NULL)
    korap_endpoint <- oauth_endpoint(NULL,
                  "settings/oauth/authorize",
                  "api/v1.0/oauth2/token",
                  base_url = "https://korap.ids-mannheim.de")
    token_bundle = oauth2.0_token(korap_endpoint, korap_app, scope = "search match_info", cache = FALSE)
    
    kco <- new("KorAPConnection", accessToken = token_bundle[["credentials"]][["access_token"]])
    

See also the displayKwics demo.

How to request access only when no access token has been provided or persisted is illustrated in the gender variants demo (try demo("pluralGenderVariants")) and in the adjective collocates demo (try demo("adjectiveCollocates")).
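
The "request only if needed" logic can be sketched independently of the demos. The helper token_or_request below is hypothetical and not taken from the package or its demos. It returns an existing token if there is one and otherwise calls a function supplied by the caller, which could wrap the oauth2.0_token call shown above.

```r
# Hypothetical helper: reuse an existing token, otherwise request a new one.
# `request_token` is any zero-argument function that returns a token string.
token_or_request <- function(token, request_token) {
  has_token <- is.character(token) && length(token) == 1 &&
    !is.na(token) && nzchar(token)
  if (has_token) {
    return(token)
  }
  request_token()
}

# Stub standing in for the interactive OAuth browser flow.
fake_request <- function() "token-from-oauth-flow"

print(token_or_request("existing-token", fake_request))
print(token_or_request(NULL, fake_request))
```

Passing the request step in as a function keeps the browser flow from starting when a token is already available.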

Storing and testing your authorized access

You can also persist the access token for subsequent sessions with the persistAccessToken function:

persistAccessToken(kco)

Afterwards a simple kco <- new("KorAPConnection") will retrieve the stored token.

To use the access token for simple corpus queries, i.e. to make corpusQuery return KWIC snippets, the metadataOnly parameter must be set to FALSE, for example:

corpusQuery(kco, "Ameisenplage", metadataOnly = FALSE) %>% fetchAll()

should return KWIC snippets, if you have authorized your application successfully.

Demos

More elaborate R scripts demonstrating the use of the package can be found in the demo folder.

Installation

Install R and RStudio

  1. Install the latest R version for your OS, following the instructions from CRAN
  2. Download and install the latest RStudio Desktop from RStudio downloads

Install the RKorAPClient package

Linux only: Install system dependencies

# Debian, Ubuntu, ...
sudo apt -f install # install possibly missing RStudio dependencies
sudo apt install r-base-dev r-cran-rcpp r-cran-cpp11 libcurl4-gnutls-dev libxml2-dev libsodium-dev libsecret-1-dev libfontconfig1-dev libssl-dev libv8-dev

# Fedora, CentOS, RHEL (for older versions use `yum` instead of `dnf`)
sudo dnf install R-devel libcurl-devel openssl-devel libxml2-devel libsodium-devel libsecret-devel fontconfig-devel v8-devel

# Arch Linux
pacman -S base-devel gcc-fortran libsodium curl

In RStudio

Start RStudio and click on Install Packages… in the Tools menu. Enter RKorAPClient in the Packages input field and click on the Install button (keeping Install Dependencies checked).

Installation of RKorAPClient package in RStudio

If the installation fails for some reason, you might need to update your installed R packages first (Tools -> Check for Package Updates, Select All, Install Updates).

Or from the command line

Start R, then install RKorAPClient from CRAN (or the development version from GitHub or KorAP's Gerrit server).

CRAN version:
install.packages("RKorAPClient")
Development version (alternatives):
devtools::install_github("KorAP/RKorAPClient")
remotes::install_github("KorAP/RKorAPClient")
devtools::install_git("https://korap.ids-mannheim.de/gerrit/KorAP/RKorAPClient")
remotes::install_git("https://korap.ids-mannheim.de/gerrit/KorAP/RKorAPClient")
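
Scripts that may run on several machines can check whether the package is present before installing it. A minimal sketch, in which the helper name is_installed is made up for illustration:

```r
# TRUE if the package can be loaded, FALSE otherwise (nothing is attached).
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

if (!is_installed("RKorAPClient")) {
  message('RKorAPClient is not installed; run install.packages("RKorAPClient")')
}
```

The sketch only reports the missing package, so that an installation is never triggered without the user noticing.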

Full installation videos

Mac

https://user-images.githubusercontent.com/11092081/142773435-ea7ef92a-7ea4-4c6d-a252-950e486352f2.mp4

Ubuntu

https://user-images.githubusercontent.com/11092081/142772382-1354b8db-551f-48de-a416-4fd59267662d.mp4

Development and License

RKorAPClient

Authors: Marc Kupietz, Nils Diewald

Copyright (c) 2024, Leibniz Institute for the German Language, Mannheim, Germany

This package is developed as part of the KorAP Corpus Analysis Platform at the Leibniz Institute for the German Language (IDS).

It is published under the BSD-2 License.

Further Affected Licenses and Terms of Services

Bundled Assets

The KorAP logo was designed by Norbert Cußler-Volz and is released under the terms of the Creative Commons License BY-NC-ND 4.0.

Highcharts

RKorAPClient imports parts of the highcharter package, which depends on Highcharts, a commercial JavaScript charting library. Highcharts offers both a commercial license and a free non-commercial license. Please review the licensing options and terms before using the highcharter plot options, as the RKorAPClient license neither provides nor implies a license for Highcharts.

Highcharts is a Highsoft product which is not free for commercial and governmental use.

Accessed API Services

By using RKorAPClient you agree to the respective terms of use of the accessed KorAP API services, which will be printed upon opening a connection (new("KorAPConnection", ...)).

Contributions

Contributions are very welcome!

Your contributions should ideally be committed via our Gerrit server to facilitate reviewing (see Gerrit Code Review - A Quick Introduction if you are not familiar with Gerrit). However, we are also happy to accept comments and pull requests via GitHub.

Please note that, unless you explicitly state otherwise, any contribution intentionally submitted for inclusion into this software shall – as this software itself – be under the BSD-2 License.

References