Blame - R/gpt3_single_completion.R - ids-kl/rgpt3

#' @param output_type character determining the output provided: "complete" (default), "text" or "meta"
#' @param suffix character (default: NULL) (from the official API documentation: _The suffix that comes after a completion of inserted text_)
#' @param max_tokens numeric (default: 100) indicating the maximum number of tokens that the completion request should return (from the official API documentation: _The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096)_)
#' @param temperature numeric (default: 0.9) specifying the sampling strategy of the possible completions (from the official API documentation: _What sampling temperature to use. Higher values means the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend altering this or top_p but not both._)
#' @param top_p numeric (default: 1) specifying sampling strategy as an alternative to the temperature sampling (from the official API documentation: _An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both._)
#' @param n numeric (default: 1) specifying the number of completions per request (from the official API documentation: _How many completions to generate for each prompt. **Note: Because this parameter generates many completions, it can quickly consume your token quota.** Use carefully and ensure that you have reasonable settings for max_tokens and stop._)
#' @param logprobs numeric (default: NULL) (from the official API documentation: _Include the log probabilities on the logprobs most likely tokens, as well the chosen tokens. For example, if logprobs is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. The maximum value for logprobs is 5. If you need more than this, please go to https://help.openai.com/en/ and describe your use case._)
#' @param stop character or character vector (default: NULL) that specifies the character value(s) after which the completion should end (from the official API documentation: _Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence._)
#' @param presence_penalty numeric (default: 0) between -2.00 and +2.00 to determine the penalisation of repetitiveness if a token already exists (from the official API documentation: _Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics._). See also: https://beta.openai.com/docs/api-reference/parameter-details
#' @param frequency_penalty numeric (default: 0) between -2.00 and +2.00 to determine the penalisation of repetitiveness based on the frequency of a token in the text already (from the official API documentation: _Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim._). See also: https://beta.openai.com/docs/api-reference/parameter-details
#' @param best_of numeric (default: 1) that determines the space of possibilities from which to select the completion with the highest probability (from the official API documentation: _Generates `best_of` completions server-side and returns the "best" (the one with the highest log probability per token)_). See details.
#'
#' @return A list with two data tables (if `output_type` is the default "complete"): [[1]] contains the data table with the columns `n` (= the running number of the response, from 1 up to the `n` responses requested), `prompt` (= the prompt that was sent), and `gpt3` (= the completion as returned from the GPT-3 model). [[2]] contains the meta information of the request, including the request id, the parameters of the request and the token usage of the prompt (`tok_usage_prompt`), the completion (`tok_usage_completion`) and the total usage (`tok_usage_total`).
#'
#' If `output_type` is "text", only the data table in slot [[1]] is returned.
#'
#' If `output_type` is "meta", only the data table in slot [[2]] is returned.
#' @examples
#' # First authenticate with your API key via `gpt3_authenticate('pathtokey')`
#'
#' # Once authenticated:
#'
#' ## Simple request with defaults:
#' gpt3_single_completion(prompt_input = 'How old are you?')
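#'
#' ## Added sketch (not part of the original examples): the same request, but
#' ## returning only the text table (slot [[1]]) by setting `output_type`:
#' gpt3_single_completion(prompt_input = 'How old are you?'
#'     , output_type = 'text')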
#'
#' ## Instruct GPT-3 to write ten research ideas of max. 150 tokens with some controls:
#' gpt3_single_completion(prompt_input = 'Write a research idea about using text data to understand human behaviour:'
#'     , temperature = 0.8
#'     , n = 10
#'     , max_tokens = 150)
#'
#' ## For fully reproducible results, we need `temperature = 0`, e.g.:
#' gpt3_single_completion(prompt_input = 'Finish this sentence:\n There is no easier way to learn R than'
#'     , temperature = 0.0
#'     , max_tokens = 50)
#'
#' ## The same example with a different GPT-3 model:
#' gpt3_single_completion(prompt_input = 'Finish this sentence:\n There is no easier way to learn R than'
#'     , model = 'text-babbage-001'
#'     , temperature = 0.0
#'     , max_tokens = 50)
#' @export
gpt3_single_completion = function(prompt_input
                              , model = 'text-davinci-003'
                              , output_type = 'complete'
                              , suffix = NULL
                              , max_tokens = 100
                              , temperature = 0.9
                              , top_p = 1
                              , n = 1
                              , logprobs = NULL
                              , stop = NULL
                              , presence_penalty = 0
                              , frequency_penalty = 0
                              , best_of = 1){

  # Check for request issues with `n`, `temperature` and `best_of`.
  # The deterministic check runs first, so that `best_of` is not raised to a
  # value of `n` that is about to be reset to 1.
  if(temperature == 0 && n > 1){
    n = 1
    message('You are running the deterministic model, so `n` was set to 1 to avoid unnecessary token quota usage.')
  }

  if(best_of < n){
    best_of = n
    message('To avoid an `invalid_request_error`, `best_of` was set to equal `n`')
  }

  parameter_list = list(prompt = prompt_input
                        , model = model
                        , suffix = suffix
                        , max_tokens = max_tokens
                        , temperature = temperature
                        , top_p = top_p
                        , n = n
                        , logprobs = logprobs
                        , stop = stop
                        , presence_penalty = presence_penalty
                        , frequency_penalty = frequency_penalty
                        , best_of = best_of)

  request_base = httr::POST(url = url.completions
                            , body = parameter_list
                            , httr::add_headers(Authorization = paste("Bearer", api_key))
                            , encode = "json")

  request_content = httr::content(request_base)
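
  # Added sketch (not part of the original function): stop early when the API
  # answers with an HTTP error status, so the failure surfaces here rather than
  # later when `request_content$choices` is indexed.
  # Assumption: error responses carry a message in `request_content$error$message`;
  # if that field is absent, only the status code is reported.
  # `base::stop` is spelled out because this function has an argument named `stop`.
  if(httr::http_error(request_base)){
    base::stop('Request failed with HTTP status ', httr::status_code(request_base), ': '
               , request_content$error$message)
  }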

  if(n == 1){
    core_output = data.table::data.table('n' = 1
                                         , 'prompt' = prompt_input
                                         , 'gpt3' = request_content$choices[[1]]$text)
  } else if(n > 1){

    core_output = data.table::data.table('n' = 1:n
                                         , 'prompt' = rep(prompt_input, n)
                                         , 'gpt3' = rep("", n))

    for(i in 1:n){
      core_output$gpt3[i] = request_content$choices[[i]]$text
    }

  }

  meta_output = data.table::data.table('request_id' = request_content$id
                                       , 'object' = request_content$object
                                       , 'model' = request_content$model
                                       , 'param_prompt' = prompt_input
                                       , 'param_model' = model
                                       , 'param_suffix' = suffix
                                       , 'param_max_tokens' = max_tokens
                                       , 'param_temperature' = temperature
                                       , 'param_top_p' = top_p
                                       , 'param_n' = n
                                       , 'param_logprobs' = logprobs
                                       , 'param_stop' = stop
                                       , 'param_presence_penalty' = presence_penalty
                                       , 'param_frequency_penalty' = frequency_penalty
                                       , 'param_best_of' = best_of
                                       , 'tok_usage_prompt' = request_content$usage$prompt_tokens
                                       , 'tok_usage_completion' = request_content$usage$completion_tokens
                                       , 'tok_usage_total' = request_content$usage$total_tokens)

  if(output_type == 'complete'){
    output = list(core_output
                  , meta_output)
  } else if(output_type == 'meta'){
    output = meta_output
  } else if(output_type == 'text'){
    output = core_output