Marc Kupietz | a6e4ee6 | 2021-03-05 09:00:15 +0100 | [diff] [blame] | 1 | % Generated by roxygen2: do not edit by hand |
% Please edit documentation in R/ci.R, R/misc.R
\name{ci}
\alias{ci}
\alias{misc-functions}
\alias{ipm}
\alias{percent}
\alias{queryStringToLabel}
\alias{geom_freq_by_year_ci}
\title{Add confidence interval and relative frequency variables}
\usage{
ci(df, x = totalResults, N = total, conf.level = 0.95)

ipm(df)

percent(df)

queryStringToLabel(data, pubDateOnly = FALSE, excludePubDate = FALSE)

geom_freq_by_year_ci(mapping = aes(ymin = conf.low, ymax = conf.high), ...)
}
\arguments{
\item{df}{table returned from \code{\link[=frequencyQuery]{frequencyQuery()}}}

\item{x}{column with the observed absolute frequency.}

\item{N}{column with the total frequencies.}

\item{conf.level}{confidence level of the returned confidence interval. Must
be a single number between 0 and 1.}

\item{data}{string or vector of query or virtual corpus (vc) definition strings}

\item{pubDateOnly}{discard all but the publication date}

\item{excludePubDate}{discard publication date constraints}

\item{mapping}{Set of aesthetic mappings created by \code{aes()} or \code{aes_()}. If specified and \code{inherit.aes = TRUE} (the default), it is combined with the default mapping at the top level of the plot. You must supply \code{mapping} if there is no plot mapping.}

\item{...}{Other arguments passed to geom_ribbon, geom_line, and geom_click_point.}
}
\value{
\code{ipm}: original table with additional column \code{ipm} and converted columns \code{conf.low} and \code{conf.high}

\code{percent}: original table with converted columns \code{f}, \code{conf.low} and \code{conf.high}

\code{queryStringToLabel}: string or vector of strings with common prefixes and suffixes clipped off
}
\description{
Using \code{\link[=prop.test]{prop.test()}}, \code{ci} adds three columns to a data frame:
\enumerate{
\item relative frequency (\code{f})
\item lower bound of the confidence interval (\code{conf.low})
\item upper bound of the confidence interval (\code{conf.high})
}

\code{ipm} is a convenience function for converting frequency tables to instances per
million.

\code{percent} is a convenience function for converting frequency tables of alternative variants
(generated with \code{as.alternatives=TRUE}) to percent.

\code{queryStringToLabel} converts a vector of query or vc strings to legend labels that are typically
appropriate, by clipping off prefixes and suffixes that are common to all query strings.

\code{geom_freq_by_year_ci} is an experimental convenience function for plotting typical frequency-by-year graphs with confidence intervals using ggplot2.
\strong{Warning:} This function may be moved to a new package.
}
\details{
Given a table with columns \code{f}, \code{conf.low}, and \code{conf.high}, \code{ipm} adds a column \code{ipm}
and multiplies \code{conf.low} and \code{conf.high} by 10^6.
}
\examples{
\dontrun{

library(ggplot2)
kco <- new("KorAPConnection", verbose=TRUE)
expand_grid(year=2015:2018, alternatives=c("Hate Speech", "Hatespeech")) \%>\%
  bind_cols(corpusQuery(kco, .$alternatives, sprintf("pubDate in \%d", .$year))) \%>\%
  mutate(total=corpusStats(kco, vc=vc)$tokens) \%>\%
  ci() \%>\%
  ggplot(aes(x=year, y=f, fill=query, color=query, ymin=conf.low, ymax=conf.high)) +
    geom_point() + geom_line() + geom_ribbon(alpha=.3)
}
\dontrun{

new("KorAPConnection") \%>\% frequencyQuery("Test", paste0("pubDate in ", 2000:2002)) \%>\% ipm()
}
\dontrun{

new("KorAPConnection") \%>\%
  frequencyQuery(c("Tollpatsch", "Tolpatsch"),
                 vc=paste0("pubDate in ", 2000:2002),
                 as.alternatives = TRUE) \%>\%
  percent()
}
queryStringToLabel(paste("textType = /Zeit.*/ & pubDate in", c(2010:2019)))
queryStringToLabel(c("[marmot/m=mood:subj]", "[marmot/m=mood:ind]"))
queryStringToLabel(c("wegen dem [tt/p=NN]", "wegen des [tt/p=NN]"))

\dontrun{
library(ggplot2)
kco <- new("KorAPConnection", verbose=TRUE)

expand_grid(condition = c("textDomain = /Wirtschaft.*/", "textDomain != /Wirtschaft.*/"),
            year = (2005:2011)) \%>\%
  cbind(frequencyQuery(kco, "[tt/l=Heuschrecke]",
                       paste0(.$condition," & pubDate in ", .$year))) \%>\%
  ipm() \%>\%
  ggplot(aes(year, ipm, fill = condition, color = condition)) +
  geom_freq_by_year_ci()
}
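
# A minimal base-R sketch (not the package implementation) of the arithmetic
# that ci(), ipm() and percent() are documented to perform, for a single row.
# Assumptions: the counts are made up for illustration, ipm is taken to be
# f * 10^6 ("instances per million"), and percent is taken to mean * 100.
x <- 120      # hypothetical observed absolute frequency (totalResults)
N <- 4000000  # hypothetical total number of tokens (total)
test <- prop.test(x, N, conf.level = 0.95)
freq <- data.frame(
  f = unname(test$estimate),    # relative frequency, x / N
  conf.low = test$conf.int[1],  # lower bound of the confidence interval
  conf.high = test$conf.int[2]  # upper bound of the confidence interval
)
# What ipm() is described to do: add ipm and rescale the interval bounds
transform(freq, ipm = f * 10^6,
          conf.low = conf.low * 10^6, conf.high = conf.high * 10^6)
# What percent() is described to do: rescale f and the interval bounds
transform(freq, f = f * 100,
          conf.low = conf.low * 100, conf.high = conf.high * 100)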
}
\seealso{
\code{ci} is already applied within \code{\link[=frequencyQuery]{frequencyQuery()}}.
}