Fix and document the bunch request function
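
The rewritten `gpt3_bunch_request()` loops over the prompts, tags both tables returned by `gpt3_simple_request()` with the prompt id, and row-binds the per-prompt results. The sketch below illustrates that accumulation pattern only. It makes no API call: `fake_simple_request()` and the `request_id` column are hypothetical stand-ins introduced for illustration, and the real function may return more columns than shown here.

```r
# Hypothetical stand-in for gpt3_simple_request(): it returns the same
# two-slot list shape (core output, meta output) without calling the API.
fake_simple_request = function(prompt_input, n = 1){
  core = data.table::data.table(n = seq_len(n)
                                , prompt = prompt_input
                                , gpt3 = paste('completion', seq_len(n)))
  # 'request_id' is an assumed column name used for illustration only
  meta = data.table::data.table(request_id = 'stub'
                                , tok_usage_total = NA_integer_)
  list(core, meta)
}

prompts = c('Write a tweet about London:', 'Write a haiku about R:')
ids = c('A', 'B')

core_list = list()
meta_list = list()
for(i in seq_along(prompts)){
  row_outcome = fake_simple_request(prompts[i], n = 2)
  # Tag both tables with the prompt id so they can be matched up later
  row_outcome[[1]]$id = ids[i]
  row_outcome[[2]]$id = ids[i]
  core_list[[i]] = row_outcome[[1]]
  meta_list[[i]] = row_outcome[[2]]
}

# One row per completion in the core table, one row per request in the meta table
bunch_core_output = data.table::rbindlist(core_list)
bunch_meta_output = data.table::rbindlist(meta_list)
```

Because both tables carry the same `id` column, the completions can be matched to their request metadata afterwards, for example with `merge(bunch_core_output, bunch_meta_output, by = 'id')`.
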
diff --git a/NAMESPACE b/NAMESPACE
index bfa2a7a..36c8809 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -2,5 +2,6 @@
export(gpt3.test_request)
export(gpt3_authenticate)
+export(gpt3_bunch_request)
export(gpt3_simple_request)
export(url.completions)
diff --git a/R/bunch_request.R b/R/bunch_request.R
index 44fd78f..398a7aa 100644
--- a/R/bunch_request.R
+++ b/R/bunch_request.R
@@ -1,58 +1,131 @@
-gpt3.bunch_request = function(data
- , prompt_var
- , completion_var_name = 'gpt3_completion'
+#' Makes bunch completion requests to the GPT-3 API
+#'
+#' @description
+#' `gpt3_bunch_request()` is the package's main function for requests: it takes a vector of prompts as input and processes each prompt as per the defined parameters. It extends the `gpt3_simple_request()` function to allow for bunch processing of requests to the Open AI GPT-3 API.
+#' @details
+#' The easiest (and intended) use case for this function is to create a data.frame or data.table with variables that contain the prompts to be requested from GPT-3 and a prompt id (see examples below).
+#' For a general guide on the completion requests, see [https://beta.openai.com/docs/guides/completion](https://beta.openai.com/docs/guides/completion). This function provides you with an R wrapper to send requests with the full range of request parameters as detailed on [https://beta.openai.com/docs/api-reference/completions](https://beta.openai.com/docs/api-reference/completions) and reproduced below.
+#'
+#' For the `best_of` parameter: The `gpt3_simple_request()` (which is used here in a vectorised manner) handles the issue that best_of must be greater than n by setting if(best_of <= n){ best_of = n}.
+#'
+#' If `id_var` is not provided, the function will use `prompt_1` ... `prompt_n` as id variable.
+#'
+#' Parameters not included/supported:
+#' - `logit_bias`: [https://beta.openai.com/docs/api-reference/completions/create#completions/create-logit_bias](https://beta.openai.com/docs/api-reference/completions/create#completions/create-logit_bias)
+#' - `echo`: [https://beta.openai.com/docs/api-reference/completions/create#completions/create-echo](https://beta.openai.com/docs/api-reference/completions/create#completions/create-echo)
+#' - `stream`: [https://beta.openai.com/docs/api-reference/completions/create#completions/create-stream](https://beta.openai.com/docs/api-reference/completions/create#completions/create-stream)
+#'
+#' @param prompt_var character vector that contains the prompts to the GPT-3 request
+#' @param id_var (optional) character vector that contains the user-defined ids of the prompts. See details.
+#' @param param_model a character vector that indicates the [model](https://beta.openai.com/docs/models/gpt-3) to use; one of "text-davinci-002" (default), "text-curie-001", "text-babbage-001" or "text-ada-001"
+#' @param param_output_type character determining the output provided: "complete" (default), "text" or "meta"
+#' @param param_suffix character (default: NULL) (from the official API documentation: _The suffix that comes after a completion of inserted text_)
+#' @param param_max_tokens numeric (default: 100) indicating the maximum number of tokens that the completion request should return (from the official API documentation: _The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096)_)
+#' @param param_temperature numeric (default: 0.9) specifying the sampling strategy of the possible completions (from the official API documentation: _What sampling temperature to use. Higher values means the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend altering this or top_p but not both._)
+#' @param param_top_p numeric (default: 1) specifying sampling strategy as an alternative to the temperature sampling (from the official API documentation: _An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both._)
+#' @param param_n numeric (default: 1) specifying the number of completions per request (from the official API documentation: _How many completions to generate for each prompt. **Note: Because this parameter generates many completions, it can quickly consume your token quota.** Use carefully and ensure that you have reasonable settings for max_tokens and stop._)
+#' @param param_logprobs numeric (default: NULL) (from the official API documentation: _Include the log probabilities on the logprobs most likely tokens, as well the chosen tokens. For example, if logprobs is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. The maximum value for logprobs is 5. If you need more than this, please contact support@openai.com and describe your use case._)
+#' @param param_stop character or character vector (default: NULL) that specifies the sequence(s) at which the completion should end (from the official API documentation: _Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence._)
+#' @param param_presence_penalty numeric (default: 0) between -2.00 and +2.00 to determine the penalisation of repetitiveness if a token already exists (from the official API documentation: _Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics._). See also: [https://beta.openai.com/docs/api-reference/parameter-details](https://beta.openai.com/docs/api-reference/parameter-details)
+#' @param param_frequency_penalty numeric (default: 0) between -2.00 and +2.00 to determine the penalisation of repetitiveness based on the frequency of a token in the text already (from the official API documentation: _Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim._). See also: [https://beta.openai.com/docs/api-reference/parameter-details](https://beta.openai.com/docs/api-reference/parameter-details)
+#' @param param_best_of numeric (default: 1) that determines the space of possibilities from which to select the completion with the highest probability (from the official API documentation: _Generates `best_of` completions server-side and returns the "best" (the one with the highest log probability per token)_). See details.
+#'
+#' @return A list with two data tables (if `param_output_type` is the default "complete"): `[[1]]` contains the data table with the columns `n` (= the no. of `n` responses requested), `prompt` (= the prompt that was sent), `gpt3` (= the completion as returned from the GPT-3 model) and `id` (= the provided `id_var` or its default alternative). `[[2]]` contains the meta information of the request, including the request id, the parameters of the request and the token usage of the prompt (`tok_usage_prompt`), the completion (`tok_usage_completion`), the total usage (`tok_usage_total`), and the `id` (= the provided `id_var` or its default alternative).
+#'
+#' If `param_output_type` is "text", only the data table in slot `[[1]]` is returned.
+#'
+#' If `param_output_type` is "meta", only the data table in slot `[[2]]` is returned.
+#' @examples
+#' # First authenticate with your API key via `gpt3_authenticate('pathtokey')`
+#'
+#' # Once authenticated:
+#' # Assuming you have a data.table with 3 different prompts:
+#' dt_prompts = data.table::data.table('prompts' = c('What is the meaning of life?', 'Write a tweet about London:', 'Write a research proposal for using AI to fight fake news:'), 'prompt_id' = c(LETTERS[1:3]))
+#'gpt3_bunch_request(prompt_var = dt_prompts$prompts
+#' , id_var = dt_prompts$prompt_id)
+#'
+#' ## With more controls
+#'gpt3_bunch_request(prompt_var = dt_prompts$prompts
+#' , id_var = dt_prompts$prompt_id
+#' , param_max_tokens = 50
+#' , param_temperature = 0.5
+#' , param_n = 5)
+#'
+#' ## Reproducible example (deterministic approach)
+#'gpt3_bunch_request(prompt_var = dt_prompts$prompts
+#' , id_var = dt_prompts$prompt_id
+#' , param_max_tokens = 50
+#' , param_temperature = 0.0)
+#'
+#' ## Changing the GPT-3 model
+#'gpt3_bunch_request(prompt_var = dt_prompts$prompts
+#' , id_var = dt_prompts$prompt_id
+#' , param_model = 'text-babbage-001'
+#' , param_max_tokens = 50
+#' , param_temperature = 0.4)
+#' @export
+gpt3_bunch_request = function(prompt_var
+ , id_var
+ , param_output_type = 'complete'
, param_model = 'text-davinci-002'
, param_suffix = NULL
- , param_max_tokens = 256
+ , param_max_tokens = 100
, param_temperature = 0.9
, param_top_p = 1
, param_n = 1
- , param_stream = F
, param_logprobs = NULL
- , param_echo = F
, param_stop = NULL
, param_presence_penalty = 0
, param_frequency_penalty = 0
- , param_best_of = 1
- , param_logit_bias = NULL){
+ , param_best_of = 1){
+ data_length = length(prompt_var)
+ if(missing(id_var)){
+ data_id = paste0('prompt_', 1:data_length)
+ } else {
+ data_id = id_var
+ }
- data_ = data
-
- data_length = data_[, .N]
-
- data_[, completion_name := '']
-
+ empty_list = list()
+ meta_list = list()
for(i in 1:data_length){
print(paste0('Request: ', i, '/', data_length))
- row_outcome = gpt3.make_request(prompt = as.character(unname(data_[i, ..prompt_var]))
- , model = param_model
- , output_type = 'detail'
- , suffix = param_suffix
- , max_tokens = param_max_tokens
- , temperature = param_temperature
- , top_p = param_top_p
- , n = param_n
- , stream = param_stream
- , logprobs = param_logprobs
- , echo = param_echo
- , stop = param_stop
- , presence_penalty = param_presence_penalty
- , frequency_penalty = param_frequency_penalty
- , best_of = param_best_of
- , logit_bias = param_logit_bias)
+ row_outcome = gpt3_simple_request(prompt_input = prompt_var[i]
+ , model = param_model
+ , output_type = 'complete'
+ , suffix = param_suffix
+ , max_tokens = param_max_tokens
+ , temperature = param_temperature
+ , top_p = param_top_p
+ , n = param_n
+ , logprobs = param_logprobs
+ , stop = param_stop
+ , presence_penalty = param_presence_penalty
+ , frequency_penalty = param_frequency_penalty
+ , best_of = param_best_of)
+ row_outcome[[1]]$id = data_id[i]
+ row_outcome[[2]]$id = data_id[i]
- data_$completion_name[i] = row_outcome$choices[[1]]$text
-
+ empty_list[[i]] = row_outcome[[1]]
+ meta_list[[i]] = row_outcome[[2]]
}
- data_cols = ncol(data_)
- names(data_)[data_cols] = completion_var_name
+ bunch_core_output = data.table::rbindlist(empty_list)
+ bunch_meta_output = data.table::rbindlist(meta_list)
- return(data_)
+ if(param_output_type == 'complete'){
+ output = list(bunch_core_output
+ , bunch_meta_output)
+ } else if(param_output_type == 'meta'){
+ output = bunch_meta_output
+ } else if(param_output_type == 'text'){
+ output = bunch_core_output
+ }
+
+ return(output)
}
diff --git a/R/make_request.R b/R/make_request.R
index a8fe10a..adbe24c 100644
--- a/R/make_request.R
+++ b/R/make_request.R
@@ -1,7 +1,7 @@
#' Makes a single completion request to the GPT-3 API
#'
#' @description
-#' `gpt3_single_request()` sends a single [completion request](https://beta.openai.com/docs/api-reference/completions) to the Open AI GPT-3 API.
+#' `gpt3_simple_request()` sends a single [completion request](https://beta.openai.com/docs/api-reference/completions) to the Open AI GPT-3 API.
#' @details For a general guide on the completion requests, see [https://beta.openai.com/docs/guides/completion](https://beta.openai.com/docs/guides/completion). This function provides you with an R wrapper to send requests with the full range of request parameters as detailed on [https://beta.openai.com/docs/api-reference/completions](https://beta.openai.com/docs/api-reference/completions) and reproduced below.
#'
#' For the `best_of` parameter: When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n. Note that this is handled by the wrapper automatically if(best_of <= n){ best_of = n}.
diff --git a/man/gpt3_bunch_request.Rd b/man/gpt3_bunch_request.Rd
new file mode 100644
index 0000000..45c4949
--- /dev/null
+++ b/man/gpt3_bunch_request.Rd
@@ -0,0 +1,106 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/bunch_request.R
+\name{gpt3_bunch_request}
+\alias{gpt3_bunch_request}
+\title{Makes bunch completion requests to the GPT-3 API}
+\usage{
+gpt3_bunch_request(
+ prompt_var,
+ id_var,
+ param_output_type = "complete",
+ param_model = "text-davinci-002",
+ param_suffix = NULL,
+ param_max_tokens = 100,
+ param_temperature = 0.9,
+ param_top_p = 1,
+ param_n = 1,
+ param_logprobs = NULL,
+ param_stop = NULL,
+ param_presence_penalty = 0,
+ param_frequency_penalty = 0,
+ param_best_of = 1
+)
+}
+\arguments{
+\item{prompt_var}{character vector that contains the prompts to the GPT-3 request}
+
+\item{id_var}{(optional) character vector that contains the user-defined ids of the prompts. See details.}
+
+\item{param_output_type}{character determining the output provided: "complete" (default), "text" or "meta"}
+
+\item{param_model}{a character vector that indicates the \href{https://beta.openai.com/docs/models/gpt-3}{model} to use; one of "text-davinci-002" (default), "text-curie-001", "text-babbage-001" or "text-ada-001"}
+
+\item{param_suffix}{character (default: NULL) (from the official API documentation: \emph{The suffix that comes after a completion of inserted text})}
+
+\item{param_max_tokens}{numeric (default: 100) indicating the maximum number of tokens that the completion request should return (from the official API documentation: \emph{The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096)})}
+
+\item{param_temperature}{numeric (default: 0.9) specifying the sampling strategy of the possible completions (from the official API documentation: \emph{What sampling temperature to use. Higher values means the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend altering this or top_p but not both.})}
+
+\item{param_top_p}{numeric (default: 1) specifying sampling strategy as an alternative to the temperature sampling (from the official API documentation: \emph{An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10\% probability mass are considered. We generally recommend altering this or temperature but not both.})}
+
+\item{param_n}{numeric (default: 1) specifying the number of completions per request (from the official API documentation: \emph{How many completions to generate for each prompt. \strong{Note: Because this parameter generates many completions, it can quickly consume your token quota.} Use carefully and ensure that you have reasonable settings for max_tokens and stop.})}
+
+\item{param_logprobs}{numeric (default: NULL) (from the official API documentation: \emph{Include the log probabilities on the logprobs most likely tokens, as well the chosen tokens. For example, if logprobs is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. The maximum value for logprobs is 5. If you need more than this, please contact support@openai.com and describe your use case.})}
+
+\item{param_stop}{character or character vector (default: NULL) that specifies the sequence(s) at which the completion should end (from the official API documentation: \emph{Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.})}
+
+\item{param_presence_penalty}{numeric (default: 0) between -2.00 and +2.00 to determine the penalisation of repetitiveness if a token already exists (from the official API documentation: \emph{Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.}). See also: \url{https://beta.openai.com/docs/api-reference/parameter-details}}
+
+\item{param_frequency_penalty}{numeric (default: 0) between -2.00 and +2.00 to determine the penalisation of repetitiveness based on the frequency of a token in the text already (from the official API documentation: \emph{Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.}). See also: \url{https://beta.openai.com/docs/api-reference/parameter-details}}
+
+\item{param_best_of}{numeric (default: 1) that determines the space of possibilities from which to select the completion with the highest probability (from the official API documentation: \emph{Generates \code{best_of} completions server-side and returns the "best" (the one with the highest log probability per token)}). See details.}
+}
+\value{
+A list with two data tables (if \code{param_output_type} is the default "complete"): \code{[[1]]} contains the data table with the columns \code{n} (= the no. of \code{n} responses requested), \code{prompt} (= the prompt that was sent), \code{gpt3} (= the completion as returned from the GPT-3 model) and \code{id} (= the provided \code{id_var} or its default alternative). \code{[[2]]} contains the meta information of the request, including the request id, the parameters of the request and the token usage of the prompt (\code{tok_usage_prompt}), the completion (\code{tok_usage_completion}), the total usage (\code{tok_usage_total}), and the \code{id} (= the provided \code{id_var} or its default alternative).
+
+If \code{param_output_type} is "text", only the data table in slot \code{[[1]]} is returned.
+
+If \code{param_output_type} is "meta", only the data table in slot \code{[[2]]} is returned.
+}
+\description{
+\code{gpt3_bunch_request()} is the package's main function for requests: it takes a vector of prompts as input and processes each prompt as per the defined parameters. It extends the \code{gpt3_simple_request()} function to allow for bunch processing of requests to the Open AI GPT-3 API.
+}
+\details{
+The easiest (and intended) use case for this function is to create a data.frame or data.table with variables that contain the prompts to be requested from GPT-3 and a prompt id (see examples below).
+For a general guide on the completion requests, see \url{https://beta.openai.com/docs/guides/completion}. This function provides you with an R wrapper to send requests with the full range of request parameters as detailed on \url{https://beta.openai.com/docs/api-reference/completions} and reproduced below.
+
+For the \code{best_of} parameter: The \code{gpt3_simple_request()} (which is used here in a vectorised manner) handles the issue that best_of must be greater than n by setting if(best_of <= n){ best_of = n}.
+
+If \code{id_var} is not provided, the function will use \code{prompt_1} ... \code{prompt_n} as id variable.
+
+Parameters not included/supported:
+\itemize{
+\item \code{logit_bias}: \url{https://beta.openai.com/docs/api-reference/completions/create#completions/create-logit_bias}
+\item \code{echo}: \url{https://beta.openai.com/docs/api-reference/completions/create#completions/create-echo}
+\item \code{stream}: \url{https://beta.openai.com/docs/api-reference/completions/create#completions/create-stream}
+}
+}
+\examples{
+# First authenticate with your API key via `gpt3_authenticate('pathtokey')`
+
+# Once authenticated:
+# Assuming you have a data.table with 3 different prompts:
+dt_prompts = data.table::data.table('prompts' = c('What is the meaning of life?', 'Write a tweet about London:', 'Write a research proposal for using AI to fight fake news:'), 'prompt_id' = c(LETTERS[1:3]))
+gpt3_bunch_request(prompt_var = dt_prompts$prompts
+ , id_var = dt_prompts$prompt_id)
+
+## With more controls
+gpt3_bunch_request(prompt_var = dt_prompts$prompts
+ , id_var = dt_prompts$prompt_id
+ , param_max_tokens = 50
+ , param_temperature = 0.5
+ , param_n = 5)
+
+## Reproducible example (deterministic approach)
+gpt3_bunch_request(prompt_var = dt_prompts$prompts
+ , id_var = dt_prompts$prompt_id
+ , param_max_tokens = 50
+ , param_temperature = 0.0)
+
+## Changing the GPT-3 model
+gpt3_bunch_request(prompt_var = dt_prompts$prompts
+ , id_var = dt_prompts$prompt_id
+ , param_model = 'text-babbage-001'
+ , param_max_tokens = 50
+ , param_temperature = 0.4)
+}
diff --git a/man/gpt3_simple_request.Rd b/man/gpt3_simple_request.Rd
index 71ed790..f302021 100644
--- a/man/gpt3_simple_request.Rd
+++ b/man/gpt3_simple_request.Rd
@@ -55,7 +55,7 @@
If \code{output_type} is "meta", only the data table in slot [\link{2}] is returned.
}
\description{
-\code{gpt3_single_request()} sends a single \href{https://beta.openai.com/docs/api-reference/completions}{completion request} to the Open AI GPT-3 API.
+\code{gpt3_simple_request()} sends a single \href{https://beta.openai.com/docs/api-reference/completions}{completion request} to the Open AI GPT-3 API.
}
\details{
For a general guide on the completion requests, see \url{https://beta.openai.com/docs/guides/completion}. This function provides you with an R wrapper to send requests with the full range of request parameters as detailed on \url{https://beta.openai.com/docs/api-reference/completions} and reproduced below.