added new second gen embeddings as default
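
As a rough illustration of what the new default means for callers, the sketch below derives the column names described in the `@details` text (`id` plus `dim_1` ... `dim_{max}`) from the embedding sizes listed in the updated documentation. The lookup vector `embedding_dims` and the helper `expected_columns()` are hypothetical names used only here; they are not part of rgpt3, the mapping of the short names ("Ada", "Babbage", "Curie") to model identifiers is an assumption, and no API request is made.

```r
# Hypothetical lookup of embedding sizes, copied from the sizes listed in the
# roxygen docs touched by this commit (models not listed there are omitted).
embedding_dims = c('text-embedding-ada-002' = 1536
                   , 'text-similarity-ada-001' = 1024
                   , 'text-similarity-babbage-001' = 2048
                   , 'text-similarity-curie-001' = 4096)

# Hypothetical helper: column names of the data.table described in @details.
expected_columns = function(model = 'text-embedding-ada-002'){
  if(!model %in% names(embedding_dims)){
    stop('unknown model: ', model)
  }
  n_dims = embedding_dims[[model]]
  c('id', paste0('dim_', seq_len(n_dims)))
}

cols_new = expected_columns()                           # new default
cols_old = expected_columns('text-similarity-ada-001')  # previous default
length(cols_new) - 1   # number of embedding columns under the new default
```

Downstream code that hard-codes `dim_1` ... `dim_1024` would likely need adjusting, since the default output width changes with this commit.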
diff --git a/CITATION.cff b/CITATION.cff
index 1964c47..9a7a921 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -5,7 +5,7 @@
given-names: "Bennett"
orcid: "https://orcid.org/0000-0003-1658-9086"
title: "rgpt3: Making requests from R to the GPT-3 API"
-version: 0.3
-date-released: 2022-11-16
+version: 0.3.1
+date-released: 2022-12-23
url: "https://github.com/ben-aaron188/rgpt3"
doi: "10.5281/zenodo.7327667"
diff --git a/DESCRIPTION b/DESCRIPTION
index 1ce09c2..d613c67 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
Package: rgpt3
Title: Making requests from R to the GPT-3 API
-Version: 0.3
+Version: 0.3.1
Authors@R:
person("Bennett", "Kleinberg", email = "bennett.kleinberg@tilburguniversity.edu", role = c("aut", "cre"))
Description: With this package you can interact with the powerful GPT-3 models in two ways: making requests for completions (e.g., ask GPT-3 to write a novel, classify text, answer questions, etc.) and retrieving text embeddings representations (i.e., obtain a low-dimensional vector representation that allows for downstream analyses). You need to authenticate with your own Open AI API key and all requests you make count towards your token quota. For completion requests and embeddings requests, two functions each allow you to send either single requests (`gpt3_single_request()` and `gpt3_single_embedding()`) or send bunch requests where the vectorised structure is used (`gpt3_requests()` and `gpt3_embeddings()`).
diff --git a/NAMESPACE b/NAMESPACE
index 6731899..cc8834a 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -6,5 +6,6 @@
export(gpt3_single_completion)
export(gpt3_single_embedding)
export(gpt3_test_completion)
+export(price_base_davinci)
export(to_numeric)
export(url.completions)
diff --git a/R/gpt3_embeddings.R b/R/gpt3_embeddings.R
index 2e7c167..2a7b7c7 100644
--- a/R/gpt3_embeddings.R
+++ b/R/gpt3_embeddings.R
@@ -2,9 +2,10 @@
#'
#' @description
#' `gpt3_embeddings()` extends the single embeddings function `gpt3_single_embedding()` to allow for the processing of a whole vector
-#' @details The returned data.table contains the column `id` which indicates the text id (or its generic alternative if not specified) and the columns `dim_1` ... `dim_{max}`, where `max` is the length of the text embeddings vector that the four different models return. For the default "Ada" model, these are 1024 dimensions (i.e., `dim_1`... `dim_1024`).
+#' @details The returned data.table contains the column `id` which indicates the text id (or its generic alternative if not specified) and the columns `dim_1` ... `dim_{max}`, where `max` is the length of the text embeddings vector that the different models (see below) return. For the default "Ada 2nd gen." model, these are 1536 dimensions (i.e., `dim_1`... `dim_1536`).
#'
-#' The function supports the text similarity embeddings for the four GPT-3 models as specified in the parameter list. The main difference between the four models is the sophistication of the embedding representation as indicated by the vector embedding size.
+#' The function supports the text similarity embeddings for the [five GPT-3 embeddings models](https://beta.openai.com/docs/guides/embeddings/embedding-models) as specified in the parameter list. It is strongly advised to use the second generation model "text-embedding-ada-002". The main difference between the five models is the size of the embedding representation as indicated by the vector embedding size and the pricing. The newest model (default) is the fastest, cheapest and highest quality one.
+#' - Ada 2nd generation `text-embedding-ada-002` (1536 dimensions)
#' - Ada (1024 dimensions)
#' - Babbage (2048 dimensions)
#' - Curie (4096 dimensions)
@@ -15,7 +16,7 @@
#' These vectors can be used for downstream tasks such as (vector) similarity calculations.
#' @param input_var character vector that contains the texts for which you want to obtain text embeddings from the GPT-3 model
#' #' @param id_var (optional) character vector that contains the user-defined ids of the prompts. See details.
-#' @param param_model a character vector that indicates the [similarity embedding model](https://beta.openai.com/docs/guides/embeddings/similarity-embeddings); one of "text-similarity-ada-001" (default), "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"
+#' @param param_model a character vector that indicates the [embedding model](https://beta.openai.com/docs/guides/embeddings/embedding-models); one of "text-embedding-ada-002" (default), "text-similarity-ada-001", "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"
#' @return A data.table with the embeddings as separate columns; one row represents one input text. See details.
#' @examples
#' # First authenticate with your API key via `gpt3_authenticate('pathtokey')`
@@ -35,7 +36,7 @@
#' @export
gpt3_embeddings = function(input_var
, id_var
- , param_model = 'text-similarity-ada-001'){
+ , param_model = 'text-embedding-ada-002'){
data_length = length(input_var)
if(missing(id_var)){
diff --git a/R/gpt3_single_embedding.R b/R/gpt3_single_embedding.R
index dc6c2ea..ac0cee5 100644
--- a/R/gpt3_single_embedding.R
+++ b/R/gpt3_single_embedding.R
@@ -3,6 +3,7 @@
#' @description
#' `gpt3_single_embedding()` sends a single [embedding request](https://beta.openai.com/docs/guides/embeddings) to the Open AI GPT-3 API.
#' @details The function supports the text similarity embeddings for the four GPT-3 models as specified in the parameter list. The main difference between the four models is the sophistication of the embedding representation as indicated by the vector embedding size.
+#' - Second-generation embeddings model `text-embedding-ada-002` (1536 dimensions)
#' - Ada (1024 dimensions)
#' - Babbage (2048 dimensions)
#' - Curie (4096 dimensions)
@@ -12,7 +13,7 @@
#'
#' These vectors can be used for downstream tasks such as (vector) similarity calculations.
#' @param input character that contains the text for which you want to obtain text embeddings from the GPT-3 model
-#' @param model a character vector that indicates the [similarity embedding model](https://beta.openai.com/docs/guides/embeddings/similarity-embeddings); one of "text-similarity-ada-001" (default), "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"
+#' @param model a character vector that indicates the [similarity embedding model](https://beta.openai.com/docs/guides/embeddings/similarity-embeddings); one of "text-embedding-ada-002" (default), "text-similarity-ada-001", "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001". Note: it is strongly recommended to use the faster, cheaper and higher quality second generation embeddings model "text-embedding-ada-002".
#' @return A numeric vector (= the embedding vector)
#' @examples
#' # First authenticate with your API key via `gpt3_authenticate('pathtokey')`
@@ -28,7 +29,7 @@
#' , model = 'text-similarity-curie-001')
#' @export
gpt3_single_embedding = function(input
- , model = 'text-similarity-ada-001'
+ , model = 'text-embedding-ada-002'
){
parameter_list = list(model = model
diff --git a/R/request_prices.R b/R/request_prices.R
new file mode 100644
index 0000000..9e1ec54
--- /dev/null
+++ b/R/request_prices.R
@@ -0,0 +1,9 @@
+#' Contains the pricing for completion requests (see: [https://openai.com/api/pricing/#faq-completions-pricing](https://openai.com/api/pricing/#faq-completions-pricing))
+#'
+#' @description
+#' These are the prices listed for 1k tokens of requests for the various models. These are needed for the `rgpt3_cost_estimate(...)` function.
+#' @export
+price_base_davinci = 0.02
+price_base_curie = 0.002
+price_base_babbage = 0.0005
+price_base_ada = 0.0004
diff --git a/README.md b/README.md
index 9b35226..bb5f3af 100644
--- a/README.md
+++ b/README.md
@@ -176,6 +176,7 @@
- [update] 29 Nov 2022: the just released [davinci-003 model](https://beta.openai.com/docs/models/gpt-3) for text completions is now the default model for the text completion functions.
- [minor fix] 3 Dec 2022: included handling for encoding issues so that `rbindlist` uses `fill=T` (in `gpt3_completions(...)`)
+- [update] 23 Dec 2022: the embeddings functions now default to the second generation embeddings model "text-embedding-ada-002".
## Citation
@@ -184,10 +185,10 @@
@software{Kleinberg_rgpt3_Making_requests_2022,
author = {Kleinberg, Bennett},
doi = {10.5281/zenodo.7327667},
- month = {11},
+ month = {12},
title = {{rgpt3: Making requests from R to the GPT-3 API}},
url = {https://github.com/ben-aaron188/rgpt3},
- version = {0.3},
+ version = {0.3.1},
year = {2022}
}
```
diff --git a/man/gpt3_embeddings.Rd b/man/gpt3_embeddings.Rd
index 6c2ac9a..db44b30 100644
--- a/man/gpt3_embeddings.Rd
+++ b/man/gpt3_embeddings.Rd
@@ -4,13 +4,13 @@
\alias{gpt3_embeddings}
\title{Retrieves text embeddings for character input from a vector from the GPT-3 API}
\usage{
-gpt3_embeddings(input_var, id_var, param_model = "text-similarity-ada-001")
+gpt3_embeddings(input_var, id_var, param_model = "text-embedding-ada-002")
}
\arguments{
\item{input_var}{character vector that contains the texts for which you want to obtain text embeddings from the GPT-3 model
#' @param id_var (optional) character vector that contains the user-defined ids of the prompts. See details.}
-\item{param_model}{a character vector that indicates the \href{https://beta.openai.com/docs/guides/embeddings/similarity-embeddings}{similarity embedding model}; one of "text-similarity-ada-001" (default), "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"}
+\item{param_model}{a character vector that indicates the \href{https://beta.openai.com/docs/guides/embeddings/embedding-models}{embedding model}; one of "text-embedding-ada-002" (default), "text-similarity-ada-001", "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"}
}
\value{
A data.table with the embeddings as separate columns; one row represents one input text. See details.
@@ -19,10 +19,11 @@
\code{gpt3_embeddings()} extends the single embeddings function \code{gpt3_single_embedding()} to allow for the processing of a whole vector
}
\details{
-The returned data.table contains the column \code{id} which indicates the text id (or its generic alternative if not specified) and the columns \code{dim_1} ... \verb{dim_\{max\}}, where \code{max} is the length of the text embeddings vector that the four different models return. For the default "Ada" model, these are 1024 dimensions (i.e., \code{dim_1}... \code{dim_1024}).
+The returned data.table contains the column \code{id} which indicates the text id (or its generic alternative if not specified) and the columns \code{dim_1} ... \verb{dim_\{max\}}, where \code{max} is the length of the text embeddings vector that the different models (see below) return. For the default "Ada 2nd gen." model, these are 1536 dimensions (i.e., \code{dim_1}... \code{dim_1536}).
-The function supports the text similarity embeddings for the four GPT-3 models as specified in the parameter list. The main difference between the four models is the sophistication of the embedding representation as indicated by the vector embedding size.
+The function supports the text similarity embeddings for the \href{https://beta.openai.com/docs/guides/embeddings/embedding-models}{five GPT-3 embeddings models} as specified in the parameter list. It is strongly advised to use the second generation model "text-embedding-ada-002". The main difference between the five models is the size of the embedding representation as indicated by the vector embedding size and the pricing. The newest model (default) is the fastest, cheapest and highest quality one.
\itemize{
+\item Ada 2nd generation \code{text-embedding-ada-002} (1536 dimensions)
\item Ada (1024 dimensions)
\item Babbage (2048 dimensions)
\item Curie (4096 dimensions)
diff --git a/man/gpt3_single_embedding.Rd b/man/gpt3_single_embedding.Rd
index 4b38629..106f3c9 100644
--- a/man/gpt3_single_embedding.Rd
+++ b/man/gpt3_single_embedding.Rd
@@ -4,12 +4,12 @@
\alias{gpt3_single_embedding}
\title{Obtains text embeddings for a single character (string) from the GPT-3 API}
\usage{
-gpt3_single_embedding(input, model = "text-similarity-ada-001")
+gpt3_single_embedding(input, model = "text-embedding-ada-002")
}
\arguments{
\item{input}{character that contains the text for which you want to obtain text embeddings from the GPT-3 model}
-\item{model}{a character vector that indicates the \href{https://beta.openai.com/docs/guides/embeddings/similarity-embeddings}{similarity embedding model}; one of "text-similarity-ada-001" (default), "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"}
+\item{model}{a character vector that indicates the \href{https://beta.openai.com/docs/guides/embeddings/similarity-embeddings}{similarity embedding model}; one of "text-embedding-ada-002" (default), "text-similarity-ada-001", "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001". Note: it is strongly recommended to use the faster, cheaper and higher quality second generation embeddings model "text-embedding-ada-002".}
}
\value{
A numeric vector (= the embedding vector)
@@ -20,6 +20,7 @@
\details{
The function supports the text similarity embeddings for the four GPT-3 models as specified in the parameter list. The main difference between the four models is the sophistication of the embedding representation as indicated by the vector embedding size.
\itemize{
+\item Second-generation embeddings model \code{text-embedding-ada-002} (1536 dimensions)
\item Ada (1024 dimensions)
\item Babbage (2048 dimensions)
\item Curie (4096 dimensions)
diff --git a/man/price_base_davinci.Rd b/man/price_base_davinci.Rd
new file mode 100644
index 0000000..b78e21e
--- /dev/null
+++ b/man/price_base_davinci.Rd
@@ -0,0 +1,16 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/request_prices.R
+\docType{data}
+\name{price_base_davinci}
+\alias{price_base_davinci}
+\title{Contains the pricing for completion requests (see: \url{https://openai.com/api/pricing/#faq-completions-pricing})}
+\format{
+An object of class \code{numeric} of length 1.
+}
+\usage{
+price_base_davinci
+}
+\description{
+These are the prices listed for 1k tokens of requests for the various models. These are needed for the \code{rgpt3_cost_estimate(...)} function.
+}
+\keyword{datasets}
diff --git a/rgpt3_0.3.1.pdf b/rgpt3_0.3.1.pdf
new file mode 100644
index 0000000..0976ede
--- /dev/null
+++ b/rgpt3_0.3.1.pdf
Binary files differ
diff --git a/rgpt3_0.3.pdf b/rgpt3_0.3.pdf
deleted file mode 100644
index 8006a72..0000000
--- a/rgpt3_0.3.pdf
+++ /dev/null
Binary files differ
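
The new `R/request_prices.R` file lists prices per 1k tokens and says they are meant for a `rgpt3_cost_estimate(...)` function, which is not part of this diff. A minimal sketch of the arithmetic such an estimate might use is shown below. The helper `estimate_cost_sketch()` and the token count of 1500 are assumptions made for illustration only; the actual function may work differently.

```r
# Prices per 1k tokens, copied from R/request_prices.R in this commit.
price_base_davinci = 0.02
price_base_curie = 0.002
price_base_babbage = 0.0005
price_base_ada = 0.0004

# Hypothetical helper (not the package's rgpt3_cost_estimate()):
# cost = (number of tokens / 1000) * price per 1k tokens.
estimate_cost_sketch = function(n_tokens, price_per_1k){
  stopifnot(is.numeric(n_tokens), all(n_tokens >= 0))
  (n_tokens / 1000) * price_per_1k
}

# Example: the same (assumed) 1500 tokens priced under two models.
cost_davinci = estimate_cost_sketch(1500, price_base_davinci)
cost_ada = estimate_cost_sketch(1500, price_base_ada)
```

Because the helper is vectorised over `n_tokens`, a vector of per-request token counts could be passed and summed to approximate the cost of a batch.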