added new second-gen embeddings model as default

commit: 68434e442f6b7389e4374f9b3962344e421d1213
author: ben-aaron188 <ben-aaron188@users.noreply.github.com> Sat Dec 24 20:04:21 2022 +0100
committer: ben-aaron188 <ben-aaron188@users.noreply.github.com> Sat Dec 24 20:04:21 2022 +0100
tree: 4fa08f8aa9216a10cbbc5f37e37fb589e527153c
parent: d0b8e53b503e6081e094e3548adbe8c15a91a882
diff --git a/R/gpt3_embeddings.R b/R/gpt3_embeddings.R
index 2e7c167..2a7b7c7 100644
--- a/R/gpt3_embeddings.R
+++ b/R/gpt3_embeddings.R

@@ -2,9 +2,10 @@
 #'
 #' @description
 #' `gpt3_embeddings()` extends the single embeddings function `gpt3_single_embedding()` to allow for the processing of a whole vector
-#' @details The returned data.table contains the column `id` which indicates the text id (or its generic alternative if not specified) and the columns `dim_1` ... `dim_{max}`, where `max` is the length of the text embeddings vector that the four different models return. For the default "Ada" model, these are 1024 dimensions (i.e., `dim_1`... `dim_1024`).
+#' @details The returned data.table contains the column `id` which indicates the text id (or its generic alternative if not specified) and the columns `dim_1` ... `dim_{max}`, where `max` is the length of the text embeddings vector that the different models (see below) return. For the default "Ada 2nd gen." model, these are 1536 dimensions (i.e., `dim_1`... `dim_1536`).
 #'
-#' The function supports the text similarity embeddings for the four GPT-3 models as specified in the parameter list. The main difference between the four models is the sophistication of the embedding representation as indicated by the vector embedding size.
+#' The function supports the text similarity embeddings for the [five GPT-3 embeddings models](https://beta.openai.com/docs/guides/embeddings/embedding-models) as specified in the parameter list. It is strongly advised to use the second-generation model "text-embedding-ada-002". The five models differ mainly in the size of the embedding vector (its number of dimensions) and in pricing. The newest model (the default) is the fastest, cheapest, and highest-quality one.
+#'   - Ada 2nd generation `text-embedding-ada-002` (1536 dimensions)
 #'   - Ada (1024 dimensions)
 #'   - Babbage (2048 dimensions)
 #'   - Curie (4096 dimensions)
@@ -15,7 +16,7 @@
 #' These vectors can be used for downstream tasks such as (vector) similarity calculations.
 #' @param input_var character vector that contains the texts for which you want to obtain text embeddings from the GPT-3 model
 #' #' @param id_var (optional) character vector that contains the user-defined ids of the prompts. See details.
-#' @param param_model a character vector that indicates the [similarity embedding model](https://beta.openai.com/docs/guides/embeddings/similarity-embeddings); one of "text-similarity-ada-001" (default), "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"
+#' @param param_model a character vector that indicates the [embedding model](https://beta.openai.com/docs/guides/embeddings/embedding-models); one of "text-embedding-ada-002" (default), "text-similarity-ada-001", "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"
 #' @return A data.table with the embeddings as separate columns; one row represents one input text. See details.
 #' @examples
 #' # First authenticate with your API key via `gpt3_authenticate('pathtokey')`
@@ -35,7 +36,7 @@
 #' @export
 gpt3_embeddings = function(input_var
                                 , id_var
-                                , param_model = 'text-similarity-ada-001'){
+                                , param_model = 'text-embedding-ada-002'){
 
   data_length = length(input_var)
   if(missing(id_var)){
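The documentation above says the returned data.table holds an `id` column plus `dim_1` ... `dim_{max}` columns, and that the vectors can feed similarity calculations. A minimal base-R sketch of that downstream step, assuming the wide layout described in the details: `cosine_similarity_matrix()` is a hypothetical helper (not part of the package), and random numbers stand in for real API output.

```r
# Hypothetical stand-in for gpt3_embeddings() output: one row per text,
# an `id` column, then dim_1 ... dim_k (k = 4 here instead of 1536).
set.seed(1)
m = matrix(rnorm(3 * 4), nrow = 3)
colnames(m) = paste0("dim_", seq_len(ncol(m)))
emb = data.frame(id = paste0("text_", 1:3), m)

# Hypothetical helper: pairwise cosine similarity between all rows.
cosine_similarity_matrix = function(emb_table) {
  # Select embedding columns by name, so 1024- and 1536-wide tables both work;
  # as.data.frame() makes the column subset behave the same for data.table input
  dim_cols = grep("^dim_[0-9]+$", names(emb_table), value = TRUE)
  x = as.matrix(as.data.frame(emb_table)[dim_cols])
  # Scale each row to unit length; dot products of unit vectors are cosines
  x = x / sqrt(rowSums(x^2))
  sim = tcrossprod(x)
  dimnames(sim) = list(emb_table$id, emb_table$id)
  sim
}

sim = cosine_similarity_matrix(emb)
print(round(sim, 3))
```

Selecting columns by the `dim_` prefix rather than by a fixed count matters here, because the new default model returns 1536 columns where the old one returned 1024.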

diff --git a/R/gpt3_single_embedding.R b/R/gpt3_single_embedding.R
index dc6c2ea..ac0cee5 100644
--- a/R/gpt3_single_embedding.R
+++ b/R/gpt3_single_embedding.R

@@ -3,6 +3,7 @@
 #' @description
 #' `gpt3_single_embedding()` sends a single [embedding request](https://beta.openai.com/docs/guides/embeddings) to the Open AI GPT-3 API.
 #' @details The function supports the text similarity embeddings for the four GPT-3 models as specified in the parameter list. The main difference between the four models is the sophistication of the embedding representation as indicated by the vector embedding size.
+#'   - Second-generation embeddings model `text-embedding-ada-002` (1536 dimensions)
 #'   - Ada (1024 dimensions)
 #'   - Babbage (2048 dimensions)
 #'   - Curie (4096 dimensions)
@@ -12,7 +13,7 @@
 #'
 #' These vectors can be used for downstream tasks such as (vector) similarity calculations.
 #' @param input character that contains the text for which you want to obtain text embeddings from the GPT-3 model
-#' @param model a character vector that indicates the [similarity embedding model](https://beta.openai.com/docs/guides/embeddings/similarity-embeddings); one of "text-similarity-ada-001" (default), "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001"
+#' @param model a character vector that indicates the [embedding model](https://beta.openai.com/docs/guides/embeddings/embedding-models); one of "text-embedding-ada-002" (default), "text-similarity-ada-001", "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001". Note: it is strongly recommended to use the faster, cheaper, and higher-quality second-generation embeddings model "text-embedding-ada-002".
 #' @return A numeric vector (= the embedding vector)
 #' @examples
 #' # First authenticate with your API key via `gpt3_authenticate('pathtokey')`
@@ -28,7 +29,7 @@
 #'     , model = 'text-similarity-curie-001')
 #' @export
 gpt3_single_embedding = function(input
-                               , model = 'text-similarity-ada-001'
+                               , model = 'text-embedding-ada-002'
                                ){
 
   parameter_list = list(model = model
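Switching the default from `text-similarity-ada-001` to `text-embedding-ada-002` changes the length of the vector returned by `gpt3_single_embedding()` from 1024 to 1536, so downstream code that stores or compares vectors may end up mixing sizes. A minimal sketch of a guard, assuming the model sizes listed in the documentation (the Davinci size of 12288 is taken from OpenAI's embeddings guide, not from this diff); `expected_embedding_dims` and `check_embedding_dim()` are hypothetical names, not package functions.

```r
# Expected embedding sizes per model (hypothetical lookup table built from the
# documented dimensions; davinci's 12288 comes from OpenAI's embeddings guide)
expected_embedding_dims = c(
  "text-embedding-ada-002"      = 1536,
  "text-similarity-ada-001"     = 1024,
  "text-similarity-babbage-001" = 2048,
  "text-similarity-curie-001"   = 4096,
  "text-similarity-davinci-001" = 12288
)

# Hypothetical helper: stop early if a vector does not have the size
# expected for the model that supposedly produced it.
check_embedding_dim = function(embedding, model = "text-embedding-ada-002") {
  if (!model %in% names(expected_embedding_dims)) {
    stop("Unknown embedding model: ", model)
  }
  expected = expected_embedding_dims[[model]]
  if (length(embedding) != expected) {
    stop(sprintf("Model '%s' should return %d dimensions, got %d",
                 model, expected, length(embedding)))
  }
  invisible(TRUE)
}

# A 1024-long vector (old default size) fails the check for the new default
check_embedding_dim(rnorm(1536))
try(check_embedding_dim(rnorm(1024)))
```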

diff --git a/R/request_prices.R b/R/request_prices.R
new file mode 100644
index 0000000..9e1ec54
--- /dev/null
+++ b/R/request_prices.R

@@ -0,0 +1,9 @@
+#' Contains the pricing for completion requests (see: [https://openai.com/api/pricing/#faq-completions-pricing](https://openai.com/api/pricing/#faq-completions-pricing))
+#'
+#' @description
+#' These are the listed prices (in USD) per 1k tokens for the various models. They are needed for the `rgpt3_cost_estimate(...)` function.
+#' @export
+price_base_davinci = 0.02
+price_base_curie = 0.002
+price_base_babbage = 0.0005
+price_base_ada = 0.0004
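The constants in `R/request_prices.R` are prices per 1,000 tokens, so a cost estimate is a linear scaling of the token count. A minimal sketch of that arithmetic, assuming USD prices and an already known token count; `estimate_cost_usd()` is a hypothetical helper and not the package's `rgpt3_cost_estimate(...)`, whose signature is not shown in this diff.

```r
# Price per 1k tokens (USD), copied from R/request_prices.R in this commit
price_base_ada = 0.0004

# Hypothetical helper: requests are billed per 1,000 tokens, so the cost
# is the token count divided by 1000, times the per-1k price.
estimate_cost_usd = function(n_tokens, price_per_1k) {
  n_tokens / 1000 * price_per_1k
}

# e.g. 250 texts of roughly 200 tokens each at the Ada price
estimate_cost_usd(250 * 200, price_base_ada)  # 0.02
```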