NVIDIA Nemotron LLMs on AWS: Super & Nano Models Now Available
NVIDIA's new Nemotron Super 49B V1 and Nano 8B V1 large language models (LLMs) are now available through Amazon Bedrock Marketplace and SageMaker JumpStart. Both are reasoning models derived from Meta's Llama family and optimized for efficiency and high performance.

Nemotron Super, based on Llama 3.3, has a 49B parameter count that was significantly reduced from its Llama 3.3 70B base using Neural Architecture Search (NAS), allowing it to run on a single H200 GPU. It supports a 128K-token context window and is post-trained for reasoning, retrieval-augmented generation (RAG), and tool calling. Nemotron Nano, based on Llama 3.1, is a smaller 8B-parameter model that runs on a single H100 or A100 GPU, making it suitable for resource-constrained environments while still improving on the accuracy of Meta's baseline model. Both models offer a "soft switch" that toggles reasoning on or off at inference time.

The target audience is businesses looking to integrate advanced reasoning capabilities into their applications. Deployment is straightforward through either the Amazon Bedrock console or the SageMaker SDK, with options to match different scalability needs. Bedrock Marketplace provides a unified experience alongside tools such as Bedrock Agents and Guardrails, while SageMaker JumpStart offers easy deployment plus access to MLOps features. A sketch of each path follows below.

While the models offer significant advantages, one potential drawback is the cost of running them, especially on high-end GPU instances. How they stack up against other LLMs will depend on specific benchmark results and workload requirements, but the Nemotron models clearly emphasize performance and efficiency together.
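For the SageMaker path, a minimal deployment sketch with the SageMaker Python SDK might look like the following. The model ID and instance type here are assumptions for illustration: check the JumpStart model hub for the exact identifier, and note that `ml.p5e.48xlarge` (H200-backed) is one plausible host for Nemotron Super, while Nano could target a smaller H100/A100 instance.

```python
# Sketch: deploying a Nemotron model via SageMaker JumpStart.
# The model_id below is hypothetical -- look up the real ID in the
# JumpStart model hub before running.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="nvidia-llama-3-3-nemotron-super-49b-v1")

# Nemotron Super is sized to fit a single H200 GPU; ml.p5e.48xlarge
# is an H200-backed instance family (assumed choice here).
predictor = model.deploy(
    instance_type="ml.p5e.48xlarge",
    initial_instance_count=1,
)

# NVIDIA documents a system-prompt "soft switch" for reasoning:
# "detailed thinking on" / "detailed thinking off".
response = predictor.predict({
    "messages": [
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": "What is 17 * 24? Show your steps."},
    ],
    "max_tokens": 512,
})
print(response)
```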
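For the Bedrock path, once the model is deployed from the Marketplace you can call it with the standard `bedrock-runtime` Converse API, passing the endpoint ARN as the model ID. The ARN and region below are placeholders, and the system prompt again uses the reasoning soft switch, here turned off for a short direct answer.

```python
# Sketch: invoking a Bedrock Marketplace endpoint with boto3.
# The endpoint ARN is a placeholder created when you deploy the
# model from the Bedrock Marketplace console.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="arn:aws:sagemaker:us-east-1:123456789012:endpoint/nemotron-super-endpoint",
    system=[{"text": "detailed thinking off"}],  # reasoning soft switch
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize retrieval-augmented generation in two sentences."}],
        },
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.6},
)
print(response["output"]["message"]["content"][0]["text"])
```

Because Marketplace endpoints sit behind the same runtime API as first-party Bedrock models, this is also where features like Bedrock Agents and Guardrails can be layered on without changing the calling code.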