LLM APIs

Overview

The Large Language Model (LLM) NIM API endpoints provide simple access to use natural language based generative AI. This single API endpoint provides access to top models for use in a wide range of tasks including: chat, instruction following, question answering, summarization, creative text generation, and code generation.

NOTE: Select models are available as downloadable container images and supported with an NVIDIA AI Enterprise entitlement. These select models have additional OpenAI API spec details for running self-hosted localized NIMs. Please refer to the Downloadable NIM documentation for additional information.

URL: https://integrate.api.nvidia.com

Endpoint: POST /v1/chat/completions

Models

abacusai

aisingapore

bigcode

bytedance

deepseek-ai

google

marin

meta

microsoft

minimaxai

mistralai

moonshotai

nvidia

ModelEndpoint
nvidia / gliner-piiExtract named entities from text using GLiNER PII model (gliner-pii)
nvidia / llama-3.1-nemoguard-8b-content-safetyCreates a model response for the given chat conversation. (llama-3.1-nemoguard-8b-content-safety)
nvidia / llama-3.1-nemoguard-8b-topic-controlCreates a model response for the given chat conversation. (llama-3.1-nemoguard-8b-topic-control)
nvidia / llama-3.1-nemotron-nano-4b-v1_1Creates a model response for the given chat conversation. (llama-3.1-nemotron-nano-4b-v1_1)
nvidia / llama-3.1-nemotron-nano-8b-v1Creates a model response for the given chat conversation. (llama-3.1-nemotron-nano-8b-v1)
nvidia / llama-3_1-nemotron-safety-guard-8b-v3Creates a model response for the given chat conversation. (llama-3_1-nemotron-safety-guard-8b-v3)
nvidia / llama-3.1-nemotron-ultra-253b-v1Creates a model response for the given chat conversation. (llama-3.1-nemotron-ultra-253b-v1)
nvidia / llama-3.2-nemoretriever-1b-vlm-embed-v1Creates an embedding vector from the input text. (llama-3.2-nemoretriever-1b-vlm-embed-v1)
nvidia / llama-3.3-nemotron-super-49b-v1Creates a model response for the given chat conversation. (llama-3.3-nemotron-super-49b-v1)
nvidia / llama-3.3-nemotron-super-49b-v1.5Creates a model response for the given chat conversation. (llama-3.3-nemotron-super-49b-v1.5)
nvidia / mistral-nemo-minitron-8b-baseCreate Completion (mistral-nemo-minitron-8b-base)
nvidia / nemoguard-jailbreak-detectClassify text for jailbreak attempt. (nemoguard-jailbreak-detect)
nvidia / nemotron-3-nano-30b-a3bCreates a model response for the given chat conversation. (nemotron-3-nano-30b-a3b)
nvidia / nemotron-3-super-120b-a12bCreates a model response for the given chat conversation. (nemotron-3-super-120b-a12b)
nvidia / nemotron-4-mini-hindi-4b-instructCreates a model response for the given chat conversation. (nemotron-4-mini-hindi-4b-instruct)
nvidia / nemotron-content-safety-reasoning-4bCreates a model response for the given chat conversation. (nemotron-content-safety-reasoning-4b)
nvidia / nemotron-mini-4b-instructCreates a model response for the given chat conversation. (nemotron-mini-4b-instruct)
nvidia / nvidia-nemotron-nano-9b-v2Creates a model response for the given chat conversation. (nvidia-nemotron-nano-9b-v2)
nvidia / riva-translate-4b-instruct-v1_1Creates a model response for the given chat conversation. (riva-translate-4b-instruct-v1_1)
nvidia / usdcodeCreates a model response for the given chat conversation. (usdcode)
nvidia / usdsearchSearch Post (usdsearch)

openai

opengpt-x

qwen

rakuten

sarvamai

stepfun-ai

stockmark

upstage

z-ai

country_code