NV Embedding Cache is a domain-specific SDK for high-performance embedding lookup in recommender systems. It accelerates embedding lookups with a combination of software caches in GPU and host memory and customized CUDA kernels. The main focus is recommender inference with embeddings that exceed the local GPU's memory capacity.
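To make the idea of layered software caches concrete, here is a minimal conceptual sketch in plain Python. It is not the SDK's implementation or API: the `LRUCache` class, the `lookup` function, the LRU eviction policy, and the dictionary standing in for the full embedding table are all assumptions chosen for illustration. The SDK itself performs this work with CUDA kernels and its own cache policies.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-capacity cache; stands in for a GPU or host cache tier."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()

    def get(self, key):
        if key not in self.rows:
            return None
        self.rows.move_to_end(key)  # mark as most recently used
        return self.rows[key]

    def put(self, key, row):
        self.rows[key] = row
        self.rows.move_to_end(key)
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row


def lookup(key, gpu_cache, host_cache, backing_store):
    """Return (row, tier), trying the fastest tier first and filling caches on a miss."""
    row = gpu_cache.get(key)
    if row is not None:
        return row, "gpu"
    row = host_cache.get(key)
    tier = "host"
    if row is None:
        row = backing_store[key]  # slowest path: the full embedding table
        tier = "backing"
        host_cache.put(key, row)
    gpu_cache.put(key, row)
    return row, tier


# Illustrative data: 10 embeddings of dimension 2, with small caches in front.
backing_store = {i: [float(i)] * 2 for i in range(10)}
gpu_cache = LRUCache(capacity=2)
host_cache = LRUCache(capacity=4)
```

The first lookup of a key is served from the backing store and populates both caches; repeated lookups of hot keys are then served from the smallest, fastest tier.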
The SDK offers several configurations to support different memory allocations:
- All embeddings are allocated in linear GPU memory: use NVEmbedding with cache_type NoCache(Py) / GPUEmbeddingLayer (C++)
- Some embeddings are cached in GPU memory and all embeddings are in linear memory (Host or other GPUs): use NVEmbedding with cache_type LinearUVM(Py) / LinearUVMEmbeddingLayer (C++)
- Some embeddings are cached in GPU memory, some embeddings are cached in host memory, and all embeddings are kept in a remote parameter server: use NVEmbedding with cache_type Hierarchical(Py) / HierarchicalEmbeddingLayer (C++)
** Linear memory, in this context, means that all embeddings are contiguous in virtual address space. More specifically, the address of embedding i can be computed as start_address + i * embedding_size.
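The address formula above can be illustrated with a short, self-contained Python sketch. The helper names (`embedding_offset`, `read_embedding`), the float32 element type, and the tiny three-row table are assumptions made for illustration; they are not part of the SDK.

```python
import struct

EMBEDDING_DIM = 4    # floats per embedding row (illustrative value)
BYTES_PER_FLOAT = 4  # assuming float32 elements


def embedding_offset(start_address, index, embedding_size):
    """Address of embedding `index` in a linear table: start + i * size."""
    return start_address + index * embedding_size


def read_embedding(table, index, dim):
    """Read row `index` from a packed little-endian float32 table held in bytes."""
    size = dim * BYTES_PER_FLOAT
    offset = embedding_offset(0, index, size)  # the buffer starts at offset 0
    return struct.unpack_from(f"<{dim}f", table, offset)


# Build a 3-row table in which row i holds the value float(i) in every slot.
table = b"".join(
    struct.pack(f"<{EMBEDDING_DIM}f", *[float(i)] * EMBEDDING_DIM)
    for i in range(3)
)

row = read_embedding(table, 2, EMBEDDING_DIM)
```

Because every row has the same size and rows are stored back to back, no index structure is needed: a single multiply and add locates any embedding.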
- C++17 capable compiler (we test with both GCC 13.3 and Clang 20.1)
- CUDA 12.8+ (earlier versions will work with minor code changes)
- CMake 3.18+
- Python 3.10+
- Torch
- (Optional) Redis 7.0.15+ - used in some tests
The provided Dockerfile satisfies these prerequisites. If you're using your own environment, you can skip step (2), starting the Docker container, in the installation instructions below.
- Clone the repo:

      git clone git@github.com:NVIDIA/nv-embedding-cache.git
      cd nv-embedding-cache
      git submodule update --init --recursive

- Start the docker container:

      docker build -t nve --build-arg START_DIR=$(pwd) --build-arg UID=$(id -u) --build-arg UNAME=$(id -u -n) --build-arg GID=$(id -g) --build-arg GNAME=$(id -g -n) .
      docker run --cap-add=ALL --net=host --ipc=host --gpus all -it --rm -v $(pwd):$(pwd) nve

- Build and install the Python bindings (by default in ./build):

      pip install .

- Alternatively, build C++ sources with samples and tests:

      mkdir build_dir
      cd build_dir
      cmake ..
      make all -j
      cd -
The docs dir contains our documentation. It's structured as follows:
docs
├── advanced.md # Advanced topics
├── benchmarks.md # Benchmarking instructions
├── cpp_api.md # C++ API documentation
├── overview.md # SDK Overview <-- Start Here!
├── python_api.md # Python bindings documentation
└── samples.md # Samples listing and description

A good place to start is: docs/overview.md.
Samples are listed in docs/samples.md. The basics are covered in simple_cpp and pytorch/simple_sample.
Benchmarking scripts are available in benchmarks/. See the instructions in docs/benchmarks.md.
The NV Embedding Cache SDK is licensed under the terms of the Apache 2.0 license. See LICENSE for more information.
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
Third-party dependencies are available as git submodules and can be found in third_party. Their respective licenses are listed below.