#' Obtains text embeddings for a single character (string) from the GPT-3 API
#'
#' @description
#' `gpt3_single_embedding()` sends a single [embedding request](https://beta.openai.com/docs/guides/embeddings) to the OpenAI GPT-3 API.
#' @details The function supports the second-generation embeddings model and the four first-generation GPT-3 text similarity models specified in the parameter list. The main difference between the models is the sophistication of the embedding representation, as indicated by the size of the embedding vector:
#' - Second-generation embeddings model `text-embedding-ada-002` (1536 dimensions)
#' - Ada (1024 dimensions)
#' - Babbage (2048 dimensions)
#' - Curie (4096 dimensions)
#' - Davinci (12288 dimensions)
#'
#' Note that the dimension size (= vector length), speed and [associated costs](https://openai.com/api/pricing/) differ considerably.
#'
#' These vectors can be used for downstream tasks such as (vector) similarity calculations.
#' @param input character that contains the text for which you want to obtain text embeddings from the GPT-3 model
#' @param model a character vector that indicates the [similarity embedding model](https://beta.openai.com/docs/guides/embeddings/similarity-embeddings); one of "text-embedding-ada-002" (default), "text-similarity-ada-001", "text-similarity-curie-001", "text-similarity-babbage-001", "text-similarity-davinci-001". Note: it is strongly recommended to use the faster, cheaper and higher-quality second-generation embeddings model "text-embedding-ada-002".
#' @return A numeric vector (= the embedding vector)
#' @examples
#' # First authenticate with your API key via `gpt3_authenticate('pathtokey')`
#'
#' # Once authenticated:
#'
#' ## Simple request with defaults:
#' sample_string = "London is one of the most liveable cities in the world. The city is always full of energy and people. It's always a great place to explore and have fun."
#' gpt3_single_embedding(input = sample_string)
#'
#' ## Change the model:
#' gpt3_single_embedding(input = sample_string
#'     , model = 'text-similarity-curie-001')
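#'
#' ## Sketch (not part of the original examples): cosine similarity between two
#' ## embeddings, computed with base R only. `other_string` is a made-up input
#' ## used purely for illustration.
#' other_string = "Paris is a lively city with plenty to see and do."
#' emb_a = gpt3_single_embedding(input = sample_string)
#' emb_b = gpt3_single_embedding(input = other_string)
#' sum(emb_a * emb_b) / (sqrt(sum(emb_a^2)) * sqrt(sum(emb_b^2)))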
#' @export
gpt3_single_embedding = function(input
                                 , model = 'text-embedding-ada-002'
                                 ){

  # Request body: the model name and the text to embed
  parameter_list = list(model = model
                        , input = input)

  # `url.embeddings` and `api_key` are not defined in this file; they are
  # expected to exist already (see `gpt3_authenticate()` in the examples)
  request_base = httr::POST(url = url.embeddings
                            , body = parameter_list
                            , httr::add_headers(Authorization = paste("Bearer", api_key))
                            , encode = "json")

  # Parse the JSON response into an R list
  output_base = httr::content(request_base)

  # Flatten the embedding of the first (and only) input into a numeric vector;
  # `to_numeric()` is a helper that is not defined in this file
  embedding_raw = to_numeric(unlist(output_base$data[[1]]$embedding))

  return(embedding_raw)

}