Chat Completion 생성

인증

Authorization

string

header

필수

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

본문

application/json

messages

필수

사용자가 보낸 메시지와 관계없이, 모델이 반드시 따라야 하는 개발자 제공 지침입니다. o1 모델 및 그 이후 버전에서는 developer 메시지가 기존의 system 메시지를 대체합니다.

Show child attributes

model

string | null

frequency_penalty

number | null

기본값:0

logit_bias

Logit Bias · object

Show child attributes

logprobs

boolean | null

기본값:false

top_logprobs

integer | null

기본값:0

max_tokens

integer | null

지원 중단

max_completion_tokens

integer | null

기본값:1

presence_penalty

number | null

기본값:0

response_format

ResponseFormat · object

ResponseFormat
StructuralTagResponseFormat
LegacyStructuralTagResponseFormat

Show child attributes

seed

integer | null

필수 범위: -9223372036854776000 <= x <= 9223372036854776000

stop

기본값:[]

stream

boolean | null

기본값:false

stream_options

StreamOptions · object

Show child attributes

temperature

number | null

top_p

number | null

tools

ChatCompletionToolsParam · object[] | null

Show child attributes

tool_choice

기본값:none

Allowed value: "none"

reasoning_effort

enum<string> | null

사용 가능한 옵션:

low,

medium,

high

include_reasoning

boolean

기본값:true

parallel_tool_calls

boolean | null

기본값:true

user

string | null

use_beam_search

boolean

기본값:false

top_k

integer | null

min_p

number | null

repetition_penalty

number | null

length_penalty

number

기본값:1

stop_token_ids

integer[] | null

include_stop_str_in_output

boolean

기본값:false

ignore_eos

boolean

기본값:false

min_tokens

integer

기본값:0

skip_special_tokens

boolean

기본값:true

spaces_between_special_tokens

boolean

기본값:true

truncate_prompt_tokens

integer | null

필수 범위: x >= -1

prompt_logprobs

integer | null

allowed_token_ids

integer[] | null

bad_words

string[]

echo

boolean

기본값:false

true인 경우, 새 메시지는 역할(role)이 동일하다면 마지막 메시지 바로 앞에 이어서(prepend) 추가됩니다.

add_generation_prompt

boolean

기본값:true

true로 설정하면 생성 프롬프트가 채팅 템플릿에 추가됩니다. 이 값은 모델의 tokenizer 구성에서 채팅 템플릿이 사용하는 매개변수입니다.

continue_final_message

boolean

기본값:false

이 값을 설정하면, 채팅의 마지막 메시지가 EOS 토큰 없이 열린 형태가 되도록 대화가 포맷됩니다. 모델은 새로운 메시지를 시작하는 대신 이 마지막 메시지를 이어서 계속 생성합니다. 이를 통해 모델 응답의 일부를 미리 "프리필(prefill)"할 수 있습니다. add_generation_prompt와 동시에 사용할 수 없습니다.

add_special_tokens

boolean

기본값:false

true로 설정하면, 채팅 템플릿이 추가하는 내용에 더해 특수 토큰(예: BOS)이 프롬프트에 추가됩니다. 대부분의 모델에서는 채팅 템플릿이 이러한 특수 토큰 추가를 처리하므로, 이 옵션은 기본값인 false로 두는 것이 좋습니다.

documents

Documents · object[] | null

모델이 RAG(retrieval-augmented generation)를 사용하는 경우, 모델이 참조할 수 있는 문서들을 나타내는 딕셔너리(dict) 목록입니다. 템플릿이 RAG를 지원하지 않으면 이 인자는 아무 효과도 없습니다. 각 문서는 "title"과 "text" 키를 포함하는 딕셔너리 형태로 제공할 것을 권장합니다.

Show child attributes

chat_template

string | null

이 변환에 사용될 Jinja 템플릿입니다. transformers v4.44부터는 기본 채팅 템플릿이 더 이상 허용되지 않으므로, 토크나이저에 채팅 템플릿이 정의되어 있지 않은 경우 반드시 채팅 템플릿을 직접 제공해야 합니다.

chat_template_kwargs

Chat Template Kwargs · object

템플릿 렌더러에 전달할 추가 키워드 인수입니다. 채팅 템플릿에서 이 인수에 접근할 수 있습니다.

mm_processor_kwargs

Mm Processor Kwargs · object

Hugging Face 프로세서에 전달할 추가 kwargs입니다.

structured_outputs

StructuredOutputsParams · object

구조화된 출력에 사용할 추가 kwargs입니다.

Show child attributes

priority

integer

기본값:0

요청의 우선순위입니다(값이 낮을수록 먼저 처리되며, 기본값은 0입니다). 서빙 중인 모델이 우선순위 스케줄링을 지원하지 않는 경우, 0이 아닌 우선순위를 설정하면 오류가 발생합니다.

request_id

string

이 요청과 연관된 request_id입니다. 호출 측에서 값을 설정하지 않으면 random_uuid가 생성됩니다. 이 ID는 전체 추론 과정에서 사용되며, 응답에도 함께 포함되어 반환됩니다.

logits_processors

(string | LogitsProcessorConstructor · object)[] | null

샘플링 시 적용할 logits processor의 정규화된 이름(qualified name) 목록 또는 생성자(constructor) 객체의 목록입니다. 생성자는 JSON 객체이며, processor 클래스/팩토리의 정규화된 이름을 지정하는 필수 필드 'qualname'과 위치 인자 및 키워드 인자를 포함하는 선택 필드 'args'와 'kwargs'를 가집니다. 예시: {'qualname': 'my_module.MyLogitsProcessor', 'args': [1, 2], 'kwargs': {'param': 'value'}}.

return_tokens_as_token_ids

boolean | null

'logprobs'와 함께 설정하면 토큰이 'token_id:{token_id}' 형식의 문자열로 표현됩니다. 이를 통해 JSON으로 인코딩할 수 없는 토큰을 식별할 수 있습니다.

return_token_ids

boolean | null

설정하면 결과에 생성된 텍스트와 함께 토큰 ID도 포함됩니다. 스트리밍 모드에서는 prompt_token_ids가 첫 번째 청크에만 포함되며, token_ids에는 각 청크에 대한 델타 토큰이 포함됩니다. 이는 디버깅을 하거나 생성된 텍스트를 다시 입력 토큰에 매핑해야 할 때 유용합니다.

cache_salt

string | null

지정된 경우, 프리픽스 캐시는 다중 사용자 환경에서 공격자가 프롬프트를 추론하거나 맞히는 것을 방지하기 위해 제공된 문자열로 솔트(salt)를 추가합니다. 솔트는 무작위로 생성되어야 하며, 제3자가 접근할 수 없도록 보호되어야 하고, 예측이 불가능할 만큼 충분히 길어야 합니다(예: 256비트에 해당하는 43자 길이의 base64 인코딩 문자열).

kv_transfer_params

Kv Transfer Params · object

분리형 서빙(disaggregated serving)에 사용되는 KVTransfer 매개변수입니다.

vllm_xargs

Vllm Xargs · object

커스텀 확장에서 사용하는 추가 요청 매개변수로, 문자열 또는 숫자 값(또는 해당 값들의 목록)을 가질 수 있습니다.

Show child attributes

응답

성공 응답

model

string

필수

choices

ChatCompletionResponseChoice · object[]

필수

Show child attributes

usage

UsageInfo · object

필수

Show child attributes

string

object

string

기본값:chat.completion

Allowed value: "chat.completion"

created

integer

service_tier

enum<string> | null

사용 가능한 옵션:

auto,

default,

flex,

scale,

priority

system_fingerprint

string | null

prompt_logprobs

(object | null)[] | null

Show child attributes

prompt_token_ids

integer[] | null

kv_transfer_params

Kv Transfer Params · object

KVTransfer 매개변수입니다.

서버리스 RL

서버리스 SFT

API 레퍼런스

인증

본문

응답

서버리스 RL

서버리스 SFT

API 레퍼런스

Documentation Index

인증

본문

응답