Building from Source Code on Windows
Note
This section is for advanced users. Skip this section if you plan to use the pre-built TensorRT-LLM release wheel.
Prerequisites
- Install prerequisites listed in our Installing on Windows document. 
- Install CMake (version 3.27.7 is recommended) and select the option to add it to the system path. 
- Download and install Visual Studio 2022. 
- Download and unzip TensorRT 10.0.1.6. 
Building a TensorRT-LLM Docker Image
Docker Desktop
- Install Docker Desktop on Windows. 
- Set the following configurations: 
- Right-click the Docker icon in the Windows system tray (bottom right of your taskbar) and select Switch to Windows containers…. 
- In the Docker Desktop settings on the General tab, uncheck Use the WSL 2 based image. 
- On the Docker Engine tab, set your configuration file to: 
{
  "experimental": true
}
Note
After building, copy the files out of your container. docker cp is not supported on Windows for Hyper-V based images, so unless you are using WSL 2 based images, mount a folder (for example, trt-llm-build) into the container when you run it and use that folder to move files between the container and the host system.
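For example, you can create the build folder on the host before running the container (a minimal sketch; the folder name matches the docker run example later in this section, but any host folder works):
mkdir .\trt-llm-build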
Acquire an Image
The Docker container will be hosted for public download in a future release. At this time, it must be built manually. From the TensorRT-LLM\windows\ folder, run the build command:
docker build -f .\docker\Dockerfile -t tensorrt-llm-windows-build:latest .
Your image is now ready for use.
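If you want to confirm that the image was created, you can list it by name (the image name below is the one used in the build command above):
docker image ls tensorrt-llm-windows-build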
Run the Container
Run the container in interactive mode with your build folder mounted. Specify a memory limit with the -m flag. By default, the limit is 2 GB, which is not sufficient to build TensorRT-LLM.
docker run -it -m 12g -v .\trt-llm-build:C:\workspace\trt-llm-build tensorrt-llm-windows-build:latest
Build and Extract Files
- Clone and set up the TensorRT-LLM repository within the container. 
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
- Build TensorRT-LLM. This command generates build\tensorrt_llm-*.whl.
python .\scripts\build_wheel.py -a "89-real" --trt_root C:\workspace\TensorRT-10.0.1.6\
- Copy or move build\tensorrt_llm-*.whl into your mounted folder so it can be accessed on your host machine. If you intend to use the C++ runtime, you’ll also need to gather various DLLs from the build into your mounted folder. For more information, refer to C++ Runtime Usage.
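For example, from the TensorRT-LLM folder inside the container (a minimal sketch that assumes the C:\workspace\trt-llm-build mount path used in the docker run command above):
copy .\build\tensorrt_llm-*.whl C:\workspace\trt-llm-build\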
Building TensorRT-LLM on Bare Metal
Prerequisites
- Install all prerequisites (git, python, CUDA) listed in our Installing on Windows document.
- Install Nsight NVTX. TensorRT-LLM on Windows currently depends on NVTX assets that do not come packaged with the CUDA 12.4 installer. To install these assets, download the CUDA 11.8 Toolkit.
  - During installation, select Advanced installation. 
  - Nsight NVTX is located in the CUDA drop-down. 
  - Deselect all packages, and then select Nsight NVTX. 
 
- Install the dependencies one of two ways:
  - Run the setup_build_env.ps1 script, which installs CMake, Microsoft Visual Studio Build Tools, and TensorRT automatically with default settings. Run PowerShell as Administrator to use the script.
    ./setup_build_env.ps1 -TRTPath <TRT-containing-folder> [-skipCMake] [-skipVSBuildTools] [-skipTRT]
    - Close and reopen PowerShell after running the script so that Path changes take effect.
    - Supply a directory that already exists to contain TensorRT to -TRTPath. For example, -TRTPath ~/inference may be valid, but -TRTPath ~/inference/TensorRT will not be valid if TensorRT does not exist. -TRTPath isn’t required if -skipTRT is supplied.
 
  - Install the dependencies one at a time.
    - Install CMake (version 3.27.7 is recommended) and select the option to add it to the system path.
    - Download and install Visual Studio 2022. When prompted to select more Workloads, check Desktop development with C++.
    - Download and unzip TensorRT 10.0.1.6. Move the folder to a location you can reference later, such as %USERPROFILE%\inference\TensorRT.
      - Add the TensorRT libraries to your system’s Path environment variable. Your Path should include a line like this:
        %USERPROFILE%\inference\TensorRT\lib
      - Close and re-open any existing PowerShell or Git Bash windows so they pick up the new Path.
      - Remove any existing tensorrt wheels first by executing:
        pip uninstall -y tensorrt tensorrt_libs tensorrt_bindings
        pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
      - To install the TensorRT core libraries, run PowerShell and use pip to install the Python wheel.
        pip install %USERPROFILE%\inference\TensorRT\python\tensorrt-*.whl
      - Verify that your TensorRT installation is working properly.
        python -c "import tensorrt as trt; print(trt.__version__)"
 
 
Steps
- Launch a 64-bit Developer PowerShell. From your usual PowerShell terminal, run one of the following two commands.
  - If you installed Visual Studio Build Tools (that is, you used the setup_build_env.ps1 script):
    & 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
  - If you installed Visual Studio Community (e.g. via manual GUI setup):
    & 'C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
- In PowerShell, from the TensorRT-LLM root folder, run:
python .\scripts\build_wheel.py -a "89-real" --trt_root <path_to_trt_root>
The -a flag specifies the device architecture. "89-real" supports GeForce 40-series cards.
The flag -D "ENABLE_MULTI_DEVICE=0", while not specified here, is implied on Windows. Multi-device inference is supported on Linux, but not on Windows.
This command generates build\tensorrt_llm-*.whl.
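To install the wheel you just built into your current Python environment, you can run pip on the generated file from the TensorRT-LLM root folder. This is a minimal sketch: the exact wheel filename depends on the version you built, so PowerShell is used here to expand the wildcard, since pip does not expand * on its own on Windows.
pip install (Get-ChildItem .\build\tensorrt_llm-*.whl).FullName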
Linking with the TensorRT-LLM C++ Runtime
Note
This section is for advanced users. Skip this section if you do not intend to use the TensorRT-LLM C++ runtime directly. You must build from source to use the C++ runtime.
Building from source creates libraries that can be used if you wish to directly link against the C++ runtime for TensorRT-LLM. These libraries are also required if you wish to run C++ unit tests and some benchmarks.
Building from source produces the following library files.
- tensorrt_llm libraries located in cpp\build\tensorrt_llm
  - tensorrt_llm.dll - Shared library
  - tensorrt_llm.exp - Export file
  - tensorrt_llm.lib - Stub for linking to tensorrt_llm.dll
- Dependency libraries (these get copied to tensorrt_llm\libs\)
  - nvinfer_plugin_tensorrt_llm libraries located in cpp\build\tensorrt_llm\plugins\
    - nvinfer_plugin_tensorrt_llm.dll
    - nvinfer_plugin_tensorrt_llm.exp
    - nvinfer_plugin_tensorrt_llm.lib
  - th_common libraries located in cpp\build\tensorrt_llm\thop\
    - th_common.dll
    - th_common.exp
    - th_common.lib
To use the TensorRT-LLM C++ runtime, the locations of these DLLs, along with some torch DLLs, must be appended to your Windows Path. When complete, your Path should include lines similar to these:
%USERPROFILE%\inference\TensorRT-LLM\cpp\build\tensorrt_llm
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\tensorrt_llm\libs
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\lib
Your Path additions may differ, particularly if you used the Docker method and copied all the relevant DLLs into a single folder.
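If you prefer to script the Path update, the following PowerShell sketch appends the example locations above to your user Path. It is illustrative only: adjust the directories to match your actual build and Python install locations, and open a new terminal afterwards so the change takes effect.
# Append the C++ runtime DLL locations to the user Path (adjust to your setup).
$dirs = @(
    "$env:USERPROFILE\inference\TensorRT-LLM\cpp\build\tensorrt_llm",
    "$env:USERPROFILE\AppData\Local\Programs\Python\Python310\Lib\site-packages\tensorrt_llm\libs",
    "$env:USERPROFILE\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\lib"
)
$current = [Environment]::GetEnvironmentVariable("Path", "User")
[Environment]::SetEnvironmentVariable("Path", ($current + ";" + ($dirs -join ";")), "User")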