Overview

The Large Language Model (LLM) NIM API endpoints provide simple access to use natural language based generative AI. This single API endpoint provides access to top models for use in a wide range of tasks including: chat, instruction following, question answering, summarization, creative text generation, and code generation.

NOTE: Select models are available as downloadable container images and supported with an NVIDIA AI Enterprise entitlement. These select models have additional OpenAI API spec details for running self-hosted localized NIMs. Please refer to the Downloadable NIM documentation for additional information.

URL: https://integrate.api.nvidia.com

Endpoint: POST /v1/chat/completions

Models

abacusai

Model	Endpoint
abacusai / dracarys-llama-3.1-70b-instruct	Creates a model response for the given chat conversation. (dracarys-llama-3.1-70b-instruct)

aisingapore

Model	Endpoint
aisingapore / sea-lion-7b-instruct	Create a chat completion (sea-lion-7b-instruct)

bigcode

Model	Endpoint
bigcode / starcoder2-7b	Create Completion (starcoder2-7b)

bytedance

Model	Endpoint
bytedance / seed-oss-36b-instruct	Creates a model response for the given chat conversation. (seed-oss-36b-instruct)

deepseek-ai

Model	Endpoint
deepseek-ai / deepseek-v3.1-terminus	Creates a model response for the given chat conversation. (deepseek-v3.1-terminus)
deepseek-ai / deepseek-v3.2	Creates a model response for the given chat conversation. (deepseek-v3.2)
deepseek-ai / deepseek-v4-flash	Creates a model response for the given chat conversation. (deepseek-v4-flash)
deepseek-ai / deepseek-v4-pro	Creates a model response for the given chat conversation. (deepseek-v4-pro)

google

Model	Endpoint
google / codegemma-7b	Create a chat completion (codegemma-7b)
google / gemma-2-2b-it	Creates a model response for the given chat conversation. (gemma-2-2b-it)
google / gemma-7b	Create a chat completion (gemma-7b)
google / shieldgemma-9b	Creates a model response for the given chat conversation. (shieldgemma-9b)

marin

Model	Endpoint
marin / marin-8b-instruct	Creates a model response for the given chat conversation. (marin-8b-instruct)

Model	Endpoint
meta / llama2-70b	Create a chat completion (llama2-70b)
meta / llama3-8b	Creates a chat completion (llama3-8b)
meta / llama-3.1-8b-instruct	Creates a model response for the given chat conversation. (llama-3.1-8b-instruct)
meta / llama-3.1-70b-instruct	Creates a model response for the given chat conversation. (llama-3.1-70b-instruct)
meta / llama-3.1-405b-instruct	Creates a model response for the given chat conversation. (llama-3.1-405b-instruct)
meta / llama-3.2-1b-instruct	Creates a model response for the given chat conversation. (llama-3.2-1b-instruct)
meta / llama-3.2-3b-instruct	Creates a model response for the given chat conversation. (llama-3.2-3b-instruct)
meta / llama-3.3-70b-instruct	Creates a model response for the given chat conversation. (llama-3.3-70b-instruct)

microsoft

Model	Endpoint
microsoft / phi-3-medium-128k-instruct	Creates a model response for the given chat conversation. (phi-3-medium-128k-instruct)
microsoft / phi-3-medium-4k-instruct	Creates a chat completion (phi-3-medium-4k-instruct)
microsoft / phi-3-mini-128k-instruct	Creates a model response for the given chat conversation. (phi-3-mini-128k-instruct)
microsoft / phi-3-mini-4k-instruct	Creates a model response for the given chat conversation. (phi-3-mini-4k-instruct)
microsoft / phi-3-small-128k-instruct	Creates a chat completion (phi-3-small-128k-instruct)
microsoft / phi-3-small-8k-instruct	Create a chat completion (phi-3-small-8k-instruct)
microsoft / phi-3.5-mini	Creates a model response for the given chat conversation. (phi-3.5-mini)
microsoft / phi-4-mini-instruct	Creates a model response for the given chat conversation. (phi-4-mini-instruct)
microsoft / phi-4-mini-flash-reasoning	Creates a model response for the given chat conversation. (phi-4-mini-flash-reasoning)

minimaxai

Model	Endpoint
minimaxai / minimax-m2.5	Creates a model response for the given chat conversation. (minimax-m2.5)
minimaxai / minimax-m2.7	Creates a model response for the given chat conversation. (minimax-m2.7)

mistralai

Model	Endpoint
mistralai / codestral-22b-instruct-v0.1	Creates a model response for the given chat conversation. (codestral-22b-instruct-v0.1)
mistralai / devstral-2-123b-instruct-2512	Creates a model response for the given chat conversation. (devstral-2-123b-instruct-2512)
mistralai / magistral-small-2506	Creates a model response for the given chat conversation. (magistral-small-2506)
mistralai / mamba-codestral-7b-v0.1	Creates a model response for the given chat conversation. (mamba-codestral-7b-v0.1)
mistralai / mistral-7b-instruct	Create a chat completion (mistral-7b-instruct)
mistralai / mistral-7b-instruct-v0.3	Creates a model response for the given chat conversation. (mistral-7b-instruct-v0.3)
mistralai / mistral-large	Create a chat completion (mistral-large)
mistralai / mistral-nemotron	Creates a model response for the given chat conversation. (mistral-nemotron)
mistralai / mistral-small-24b-instruct	Creates a model response for the given chat conversation. (mistral-small-24b-instruct)
mistralai / mixtral-8x7b-instruct	Create a chat completion (mixtral-8x7b-instruct)
mistralai / mixtral-8x22b-instruct	Create a chat completion (mixtral-8x22b-instruct)

moonshotai

Model	Endpoint
moonshotai / kimi-k2-instruct	Creates a model response for the given chat conversation. (kimi-k2-instruct)
moonshotai / kimi-k2-instruct-0905	Creates a model response for the given chat conversation. (kimi-k2-instruct-0905)
moonshotai / kimi-k2-thinking	Creates a model response for the given chat conversation. (kimi-k2-thinking)

nvidia

Model	Endpoint
nvidia / gliner-pii	Extract named entities from text using GLiNER PII model (gliner-pii)
nvidia / llama-3.1-nemoguard-8b-content-safety	Creates a model response for the given chat conversation. (llama-3.1-nemoguard-8b-content-safety)
nvidia / llama-3.1-nemoguard-8b-topic-control	Creates a model response for the given chat conversation. (llama-3.1-nemoguard-8b-topic-control)
nvidia / llama-3.1-nemotron-nano-4b-v1_1	Creates a model response for the given chat conversation. (llama-3.1-nemotron-nano-4b-v1_1)
nvidia / llama-3.1-nemotron-nano-8b-v1	Creates a model response for the given chat conversation. (llama-3.1-nemotron-nano-8b-v1)
nvidia / llama-3_1-nemotron-safety-guard-8b-v3	Creates a model response for the given chat conversation. (llama-3_1-nemotron-safety-guard-8b-v3)
nvidia / llama-3.1-nemotron-ultra-253b-v1	Creates a model response for the given chat conversation. (llama-3.1-nemotron-ultra-253b-v1)
nvidia / llama-3.2-nemoretriever-1b-vlm-embed-v1	Creates an embedding vector from the input text. (llama-3.2-nemoretriever-1b-vlm-embed-v1)
nvidia / llama-3.3-nemotron-super-49b-v1	Creates a model response for the given chat conversation. (llama-3.3-nemotron-super-49b-v1)
nvidia / llama-3.3-nemotron-super-49b-v1.5	Creates a model response for the given chat conversation. (llama-3.3-nemotron-super-49b-v1.5)
nvidia / mistral-nemo-minitron-8b-base	Create Completion (mistral-nemo-minitron-8b-base)
nvidia / nemoguard-jailbreak-detect	Classify text for jailbreak attempt. (nemoguard-jailbreak-detect)
nvidia / nemotron-3-nano-30b-a3b	Creates a model response for the given chat conversation. (nemotron-3-nano-30b-a3b)
nvidia / nemotron-3-super-120b-a12b	Creates a model response for the given chat conversation. (nemotron-3-super-120b-a12b)
nvidia / nemotron-4-mini-hindi-4b-instruct	Creates a model response for the given chat conversation. (nemotron-4-mini-hindi-4b-instruct)
nvidia / nemotron-content-safety-reasoning-4b	Creates a model response for the given chat conversation. (nemotron-content-safety-reasoning-4b)
nvidia / nemotron-mini-4b-instruct	Creates a model response for the given chat conversation. (nemotron-mini-4b-instruct)
nvidia / nvidia-nemotron-nano-9b-v2	Creates a model response for the given chat conversation. (nvidia-nemotron-nano-9b-v2)
nvidia / riva-translate-4b-instruct-v1_1	Creates a model response for the given chat conversation. (riva-translate-4b-instruct-v1_1)
nvidia / usdcode	Creates a model response for the given chat conversation. (usdcode)
nvidia / usdsearch	Search Post (usdsearch)

openai

Model	Endpoint
openai / gpt-oss-20b	Creates a model response for the given chat conversation. (gpt-oss-20b)
openai / gpt-oss-120b	Creates a model response for the given chat conversation. (gpt-oss-120b)

opengpt-x

Model	Endpoint
opengpt-x / teuken-7b-instruct-commercial-v0.4	Creates a model response for the given chat conversation. (teuken-7b-instruct-commercial-v0.4)

qwen

Model	Endpoint
qwen / qwen2-7b-instruct	Creates a model response for the given chat conversation. (qwen2-7b-instruct)
qwen / qwen2.5-7b-instruct	Creates a model response for the given chat conversation. (qwen2.5-7b-instruct)
qwen / qwen2.5-coder-7b-instruct	Creates a model response for the given chat conversation. (qwen2.5-coder-7b-instruct)
qwen / qwen2.5-coder-32b-instruct	Creates a model response for the given chat conversation. (qwen2.5-coder-32b-instruct)
qwen / qwen3-5-122b-a10b	Request response from the model (qwen3-5-122b-a10b)
qwen / qwen3-coder-480b-a35b-instruct	Creates a model response for the given chat conversation. (qwen3-coder-480b-a35b-instruct)
qwen / qwen3-next-80b-a3b-instruct	Creates a model response for the given chat conversation. (qwen3-next-80b-a3b-instruct)
qwen / qwen3-next-80b-a3b-thinking	Creates a model response for the given chat conversation. (qwen3-next-80b-a3b-thinking)
qwen / qwq-32b	Creates a model response for the given chat conversation. (qwq-32b)

rakuten

Model	Endpoint
rakuten / rakutenai-7b-chat	Creates a model response for the given chat conversation. (rakutenai-7b-chat)
rakuten / rakutenai-7b-instruct	Creates a model response for the given chat conversation. (rakutenai-7b-instruct)

sarvamai

Model	Endpoint
sarvamai / sarvam-m	Creates a model response for the given chat conversation. (sarvam-m)

stepfun-ai

Model	Endpoint
stepfun-ai / step-3-5-flash	Creates a model response for the given chat conversation. (step-3-5-flash)

stockmark

Model	Endpoint
stockmark / stockmark-2-100b-instruct	Creates a model response for the given chat conversation. (stockmark-2-100b-instruct)

upstage

Model	Endpoint
upstage / solar-10.7b-instruct	Creates a model response for the given chat conversation. (solar-10.7b-instruct)

z-ai

Model	Endpoint
z-ai / glm4.7	Creates a model response for the given chat conversation. (glm4.7)
z-ai / glm5.1	Creates a model response for the given chat conversation. (glm5.1)