Initial import

Change-Id: I91c66e3ceb8d17e547f2e96bf273beb7a4b76762
5 files changed
tree: e58fd825c18cb343e94c786ca427f9ef9f1ee919
  1. examples/
  2. figures/
  3. .gitignore
  4. LICENSE
  5. Readme.md
Readme.md

Python Client Support

There is currently no native KorAP client library for Python. With rpy2, however, you can already use the KorAP client library for R (RKorAPClient) from within Python.
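This bridge needs two things: the Python package rpy2 and the R package RKorAPClient. A quick check that both are in place could look like the sketch below. It is only a sketch. The function name `rkorapclient_available` is hypothetical. The sketch assumes rpy2's `rpy2.robjects.packages.isinstalled` helper is available, and it reports `False` whenever any part of the chain fails to load.

```python
import importlib.util


def rkorapclient_available() -> bool:
    """Return True if rpy2 can be imported and the R package RKorAPClient is installed.

    Hypothetical helper, not part of RKorAPClient or rpy2.
    """
    if importlib.util.find_spec("rpy2") is None:
        return False  # the Python side of the bridge (rpy2) is missing
    try:
        # Assumed rpy2 helper; importing it starts an embedded R session.
        from rpy2.robjects.packages import isinstalled
        return bool(isinstalled("RKorAPClient"))
    except Exception:
        # rpy2 is present but R could not be started or the helper is unavailable
        return False


if __name__ == "__main__":
    print("RKorAPClient reachable from Python:", rkorapclient_available())
```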

Using the RKorAPClient from within Python

Installing Dependencies

Linux

#### Debian / Ubuntu
sudo apt install r-base r-base-dev libcurl4-gnutls-dev libssl-dev libxml2-dev libsodium-dev python3-pip python3-rpy2 python3-pandas
echo 'install.packages("RKorAPClient", repos="http://cran.rstudio.com/")' | R --vanilla
pip3 install plotly-express

#### Fedora / CentOS / RHEL
sudo yum install R R-devel libcurl-devel openssl-devel libxml2-devel libsodium-devel python3-pandas
echo 'install.packages("RKorAPClient", repos="http://cran.rstudio.com/")' | R --vanilla
pip3 install rpy2 plotly-express

Other Operating Systems (currently untested)

Examples

Frequencies over years and countries

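import rpy2.robjects.packages as packages
import rpy2.robjects.pandas2ri as pandas2ri
from rpy2.robjects.vectors import StrVector
import plotly.express as px
pandas2ri.activate()  # convert R data frames returned by RKorAPClient into pandas DataFrames

QUERY = "Hello World"
YEARS = range(2010, 2019)
COUNTRIES = ["DE", "CH"]

RKorAPClient = packages.importr('RKorAPClient')
kcon = RKorAPClient.KorAPConnection(verbose=True)

# one virtual corpus per (country, year) combination, country-major order
vcs = ["textType=/Zeit.*/ & pubPlaceKey=" + c + " & pubDate in " + str(y) for c in COUNTRIES for y in YEARS]
# pass the definitions as an R character vector
df = RKorAPClient.ipm(RKorAPClient.frequencyQuery(kcon, QUERY, StrVector(vcs)))
# label the result rows in the same order as vcs
df['Year'] = [y for c in COUNTRIES for y in YEARS]
df['Country'] = [c for c in COUNTRIES for y in YEARS]
# plotly expects error bar lengths relative to y, not absolute interval bounds
df['err_plus'] = df['conf.high'] - df['ipm']
df['err_minus'] = df['ipm'] - df['conf.low']

fig = px.line(df, title=QUERY, x="Year", y="ipm", color="Country",
              error_y="err_plus", error_y_minus="err_minus")
fig.show()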

Frequency per million words of “Hello World” in DE vs. CH from 2010 to 2018 in newspapers and magazines
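The example depends on two pieces of bookkeeping that are easy to get wrong. First, the row labels must follow the same order as the virtual corpus definitions. Second, the confidence bounds must be turned into the distances that plotly's `error_y` and `error_y_minus` parameters expect. The pure-Python sketch below isolates both steps.

Some parts are assumptions or illustrations. The helper names `build_vcs` and `error_offsets` are hypothetical and not part of RKorAPClient. The virtual corpus template is copied from the example above. The sketch assumes that `conf.low` and `conf.high` are absolute interval bounds on the same per-million scale as `ipm`.

```python
from itertools import product

# Virtual-corpus template copied from the example above.
VC_TEMPLATE = "textType=/Zeit.*/ & pubPlaceKey={country} & pubDate in {year}"


def build_vcs(countries, years):
    """Return (vcs, labels) with labels[i] == (country, year) describing vcs[i].

    itertools.product varies the last argument fastest, which reproduces the
    country-major order of `for c in COUNTRIES for y in YEARS`.
    """
    labels = list(product(countries, years))
    vcs = [VC_TEMPLATE.format(country=c, year=y) for c, y in labels]
    return vcs, labels


def error_offsets(ipm, conf_low, conf_high):
    """Turn absolute interval bounds into the distances plotly uses for error bars.

    Returns (plus, minus): how far each bar extends above and below its ipm value.
    """
    plus = [high - value for value, high in zip(ipm, conf_high)]
    minus = [value - low for value, low in zip(ipm, conf_low)]
    return plus, minus


if __name__ == "__main__":
    vcs, labels = build_vcs(["DE", "CH"], range(2010, 2012))
    for (country, year), vc in zip(labels, vcs):
        print(country, year, vc)
    print(error_offsets([12.0], [10.5], [13.0]))  # ([1.0], [1.5])
```

The `Year` and `Country` columns can be derived from `labels` instead of from a second pair of comprehensions. That keeps them tied to the exact order in which the queries were built, so reordering `COUNTRIES` or `YEARS` cannot silently mislabel rows.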

Accessed API Services

By using the KorAPClient, you agree to the respective terms of use of the accessed KorAP API services, which are printed when a connection is opened.

Development and License

Author: Marc Kupietz

Copyright (c) 2020, Leibniz Institute for the German Language, Mannheim, Germany

This package is developed as part of the KorAP Corpus Analysis Platform at the Leibniz Institute for the German Language (IDS).

It is published under the BSD-2 License.

Contributions

Contributions are very welcome!

Your contributions should ideally be committed via our Gerrit server to facilitate review (see "Gerrit Code Review - A Quick Introduction" if you are not familiar with Gerrit). However, we are also happy to accept comments and pull requests via GitHub.

Please note that, unless you explicitly state otherwise, any contribution intentionally submitted for inclusion into this software shall, like this software itself, be licensed under the BSD-2 License.

References