KorAP web service client package for Python

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Description

Python client wrapper package to access the web service API of the KorAP Corpus Analysis Platform developed at IDS Mannheim. Currently, this is not a native Python package. Internally, it uses KorAP's client package for R via rpy2, which also automatically translates between R data frames (or tibbles) and pandas DataFrames.

Installation

1. Install R (version >= 3.5)

Install R from CRAN or, alternatively, via the package manager on some recent Linux distributions:

#### Debian / Ubuntu
sudo apt-get install -y r-base r-base-dev r-cran-tidyverse r-cran-r.utils r-cran-pixmap r-cran-webshot r-cran-ade4 r-cran-segmented r-cran-purrr r-cran-dygraphs r-cran-cvst r-cran-quantmod r-cran-graphlayouts r-cran-rappdirs r-cran-ggdendro r-cran-seqinr r-cran-heatmaply r-cran-igraph r-cran-plotly libcurl4-gnutls-dev libssl-dev libxml2-dev libsodium-dev python3-pip python3-rpy2 python3-pandas

#### Fedora / CentOS / RHEL
sudo yum install -y R R-devel libcurl-devel openssl-devel libxml2-devel libsodium-devel python3-pandas

2. Windows only: Point environment variables to your R installation, e.g.:

set R_HOME="C:\Program Files\R\R-4.0.2"
set R_USER=%R_HOME%
set PATH=%R_HOME%\bin;%R_HOME%\bin\x64;%PATH%

3. Install the R package

Rscript -e "install.packages('RKorAPClient', repos='https://cloud.r-project.org/')"

4. Install the Python package

python3 -m pip install git+https://github.com/KorAP/PythonKorAPClient
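The R-related environment variables from step 2 can be sanity-checked from Python before the package is first imported. The following is a minimal sketch, not part of KorAPClient: the helper name `check_r_environment` and its messages are made up for illustration, and it only looks at the variables, not at the file system.

```python
import os
from pathlib import PureWindowsPath


def check_r_environment(env):
    """Return a list of problems found in the R-related variables (hypothetical helper)."""
    problems = []
    # Values set with `set VAR="..."` on Windows keep their quotes, so strip them.
    r_home = env.get("R_HOME", "").strip('"')
    if not r_home:
        problems.append("R_HOME is not set")
        return problems
    if not env.get("R_USER", "").strip('"'):
        problems.append("R_USER is not set")
    # PATH entries are separated by ';' on Windows; compare them case-insensitively.
    path_entries = {
        PureWindowsPath(p.strip('"')) for p in env.get("PATH", "").split(";") if p
    }
    if PureWindowsPath(r_home) / "bin" not in path_entries:
        problems.append("PATH does not contain %R_HOME%\\bin")
    return problems


if __name__ == "__main__":
    for problem in check_r_environment(dict(os.environ)):
        print("warning:", problem)
```

An empty result only means the variables look consistent with each other; whether rpy2 can actually load R still depends on the installation itself.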

Documentation

Currently, there is no dedicated documentation for the Python variant of the library. Please refer to the Reference Manual of RKorAPClient for now. For translating the R syntax to Python and vice versa, please refer to the rpy2 Documentation.

Please note that some arguments of the original RKorAPClient functions use characters that are not allowed in Python keyword argument names, such as the dot in as.df. In these cases, you can pass the arguments with Python's **kwargs syntax instead. For example, to get the result of corpusStats as a pandas.DataFrame and print the size of the whole corpus in tokens, you can write:

print(kcon.corpusStats(**{"as.df": True})['tokens'])
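Why the dictionary unpacking works can be seen without a KorAP connection. The sketch below uses a hypothetical stand-in function, `corpus_stats_stub`, which is not part of KorAPClient and merely returns the keyword arguments it receives.

```python
def corpus_stats_stub(**kwargs):
    """Hypothetical stand-in for a wrapped R function; it only echoes its arguments."""
    return kwargs


# A dotted R argument name such as as.df is not a valid Python identifier,
# so writing corpus_stats_stub(as.df=True) would be a SyntaxError.
# Unpacking a dict passes the same name as an ordinary string key instead.
r_style_args = {"as.df": True}
received = corpus_stats_stub(**r_style_args)
print(received)  # {'as.df': True}

# Regular keyword arguments and unpacked ones can be mixed in one call.
mixed = corpus_stats_stub(verbose=False, **r_style_args)
print(sorted(mixed))  # ['as.df', 'verbose']
```

The same pattern applies to any other R argument whose name contains a dot.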

Examples

Frequencies over years and countries

from KorAPClient import KorAPClient, KorAPConnection
import plotly.express as px

QUERY = "Hello World"
YEARS = range(2010, 2019)
COUNTRIES = ["DE", "CH"]

kcon = KorAPConnection(verbose=True)

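# One virtual corpus definition per (country, year) pair, countries in the outer loop
vcs = ["textType=/Zeit.*/ & pubPlaceKey=" + c + " & pubDate in " + str(y) for c in COUNTRIES for y in YEARS]
df = KorAPClient.ipm(kcon.frequencyQuery(QUERY, vcs))
# Label the rows using the same loop order as vcs so that they stay aligned
df['Year'] = [y for c in COUNTRIES for y in YEARS]
df['Country'] = [c for c in COUNTRIES for y in YEARS]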

fig = px.line(df, title=QUERY, x="Year", y="ipm", color="Country",
              error_y="conf.high", error_y_minus="conf.low")
fig.show()

Frequency per million words of “Hello World” in DE vs. CH from 2010 to 2018 in newspapers and magazines
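The example above relies on three list comprehensions that iterate over countries and years in the same order. As an alternative sketch, the virtual corpus definitions and their labels can be generated in a single pass with `itertools.product`, so that they cannot drift apart. The helper name `build_virtual_corpora` is hypothetical and not part of KorAPClient; the snippet needs no KorAP connection.

```python
from itertools import product

YEARS = range(2010, 2019)
COUNTRIES = ["DE", "CH"]


def build_virtual_corpora(countries, years):
    """Return (country, year, virtual corpus definition) triples in one pass."""
    return [
        (c, y, f"textType=/Zeit.*/ & pubPlaceKey={c} & pubDate in {y}")
        for c, y in product(countries, years)
    ]


rows = build_virtual_corpora(COUNTRIES, YEARS)
vcs = [vc for _, _, vc in rows]          # pass this list to frequencyQuery
countries_col = [c for c, _, _ in rows]  # values for the 'Country' column
years_col = [y for _, y, _ in rows]      # values for the 'Year' column

print(len(vcs))  # 18
print(vcs[0])    # textType=/Zeit.*/ & pubPlaceKey=DE & pubDate in 2010
```

`product(countries, years)` iterates with countries in the outer position, which matches the order of the comprehensions in the example.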

Accessed API Services

By using the KorAPClient, you agree to the respective terms of use of the accessed KorAP API services, which are printed when a connection is opened.

Development and License

Author: Marc Kupietz

Copyright (c) 2020, Leibniz Institute for the German Language, Mannheim, Germany

This package is developed as part of the KorAP Corpus Analysis Platform at the Leibniz Institute for the German Language (IDS).

It is published under the BSD-2 License.

Contributions

Contributions are very welcome!

Your contributions should ideally be committed via our Gerrit server to facilitate reviewing (see Gerrit Code Review - A Quick Introduction if you are not familiar with Gerrit). However, we are also happy to accept comments and pull requests via GitHub.

Please note that, unless you explicitly state otherwise, any contribution intentionally submitted for inclusion into this software shall – as this software itself – be under the BSD-2 License.

References