% Blame: ben-aaron188 | 287b30b | 2022-09-11 16:46:37 +0200
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bunch_embedding.R
\name{gpt3_bunch_embedding}
\alias{gpt3_bunch_embedding}
\title{Retrieves text embeddings for a character vector of input texts from the GPT-3 API}
\usage{
gpt3_bunch_embedding(
input_var,
id_var,
param_model = "text-similarity-ada-001"
)
}
\arguments{
\item{input_var}{character vector that contains the texts for which you want to obtain text embeddings from the GPT-3 model}

\item{id_var}{(optional) character vector that contains the user-defined ids of the prompts. See details.}

\item{param_model}{a character vector that indicates the \href{https://beta.openai.com/docs/guides/embeddings/similarity-embeddings}{similarity embedding model}; one of "text-similarity-ada-001" (default), "text-similarity-babbage-001", "text-similarity-curie-001", "text-similarity-davinci-001"}
}
\value{
A data.table with the embeddings as separate columns; one row represents one input text. See details.
}
\description{
\code{gpt3_bunch_embedding()} extends the single embeddings function \code{gpt3_make_embedding()} to process a whole vector of texts.
}
\details{
The returned data.table contains the column \code{id}, which indicates the text id (or its generic alternative if not specified), and the columns \code{dim_1} ... \verb{dim_\{max\}}, where \code{max} is the length of the embedding vector returned by the selected model. For the default "Ada" model, these are 1024 dimensions (i.e., \code{dim_1} ... \code{dim_1024}).

The function supports the text similarity embeddings for the four GPT-3 models as specified in the parameter list. The main difference between the four models is the sophistication of the embedding representation as indicated by the vector embedding size.
\itemize{
\item Ada (1024 dimensions)
\item Babbage (2048 dimensions)
\item Curie (4096 dimensions)
\item Davinci (12288 dimensions)
}

Note that the dimension size (= vector length), speed and \href{https://openai.com/api/pricing/}{associated costs} differ considerably between the models.

These vectors can be used for downstream tasks such as (vector) similarity calculations.
}
\examples{
# First authenticate with your API key via `gpt3_authenticate('pathtokey')`

# Use example data:
## The data below were generated with the `gpt3_make_request()` function as follows:
##### DO NOT RUN #####
# travel_blog_data = gpt3_make_request(prompt_input = "Write a travel blog about a dog's journey through the UK:", temperature = 0.8, n = 10, max_tokens = 200)[[1]]
##### END DO NOT RUN #####

# You can load these data with:
data("travel_blog_data") # the dataset contains 10 completions for the above request


## Obtain text embeddings for the completion texts:
## (assumes the completion texts are stored in the `gpt3` column of the dataset)
gpt3_bunch_embedding(input_var = travel_blog_data$gpt3
                     , param_model = 'text-similarity-curie-001')
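
## Downstream use (a minimal sketch, not part of the package):
## cosine similarity between embedding vectors, as mentioned in the details.
## `toy_embeddings` is a made-up stand-in that only mimics the documented
## layout of the returned table (an `id` column plus `dim_1`, `dim_2`, ...);
## its values are NOT real embeddings.
toy_embeddings = data.frame(id = c("a", "b", "c")
                            , dim_1 = c(1, 2, 0)
                            , dim_2 = c(0, 0, 3))

# keep only the embedding columns and label the rows with the text ids
emb_matrix = as.matrix(toy_embeddings[, grep("^dim_", names(toy_embeddings))])
rownames(emb_matrix) = toy_embeddings$id

# scale every row to unit length; the cross-product of the scaled rows
# is then the pairwise cosine similarity (one row/column per text id)
emb_unit = emb_matrix / sqrt(rowSums(emb_matrix^2))
cosine_sim = tcrossprod(emb_unit)
cosine_sim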
}