Installing on Windows
Note
The Windows release of TensorRT-LLM is currently in beta. We recommend checking out the v0.10.0 tag for the most stable experience.
Prerequisites
- Clone this repository using Git for Windows. 
- Install the dependencies one of two ways:

  - Install all dependencies together:

    - Run the provided PowerShell script `setup_env.ps1`, located under the `/windows/` folder, which installs Python and CUDA 12.4 automatically with default settings. Run PowerShell as Administrator to use the script.

      ./setup_env.ps1 [-skipCUDA] [-skipPython]

    - Close and re-open any existing PowerShell or Git Bash windows so they pick up the new `Path` modified by the `setup_env.ps1` script above.

  - Install the dependencies one at a time:

    - Install Python 3.10. Select Add python.exe to PATH at the start of the installation. The installer may add only the `python` command, not the `python3` command.
    - Navigate to the installation path `%USERPROFILE%\AppData\Local\Programs\Python\Python310` (`AppData` is a hidden folder) and copy `python.exe` to `python3.exe`.
    - Install the CUDA 12.4 Toolkit. Use the Express Installation option. Installation may require a restart.
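After installing the prerequisites, a quick way to confirm they are reachable from a fresh shell is a short check like the sketch below. This is not part of the official setup; the tool names (`python3`, `pip`, `nvcc`) are assumptions based on the defaults described above, and the script is intended to be run from Git Bash.

```shell
# Sanity-check sketch: confirm the prerequisite tools are on Path after
# setup_env.ps1 (or the manual install) has run and the shell was re-opened.
# The checked tool names are assumptions based on the defaults above.
missing=0
for tool in python3 pip nvcc; do
  if command -v "$tool" >/dev/null 2>&1; then
    printf '%s: found (%s)\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
  else
    printf '%s: NOT FOUND\n' "$tool"
    missing=1
  fi
done
if [ "$missing" -ne 0 ]; then
  echo "Re-run setup_env.ps1 (or install the missing tools), then re-open the shell."
fi
```

If any tool is reported as missing, re-check the corresponding prerequisite step before continuing.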
Steps
- Install TensorRT-LLM. 
If you have an existing TensorRT installation (for example, from an older version of `tensorrt_llm`), execute
pip uninstall -y tensorrt tensorrt_libs tensorrt_bindings
pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
before installing TensorRT-LLM with the following command.
pip install tensorrt_llm==0.10.0 --extra-index-url https://pypi.nvidia.com
Run the following command to verify that your TensorRT-LLM installation is working properly.
python -c "import tensorrt_llm; print(tensorrt_llm._utils.trt_version())"
- Build the model. 
- Deploy the model.
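The build and deploy steps depend on the model you are converting. As one illustration, a Llama-family flow using the example scripts shipped in the TensorRT-LLM repository looks roughly like the sketch below. The model directory, output paths, and flags are illustrative placeholders, and the exact `convert_checkpoint.py` options vary by model family, so treat this as an outline rather than a copy-paste recipe.

```shell
# Rough outline of build + deploy, assuming a Llama-family checkpoint and the
# example scripts from the TensorRT-LLM repository. Paths and flags are
# illustrative placeholders, not a tested recipe.
if command -v trtllm-build >/dev/null 2>&1; then
  # 1. Convert a Hugging Face checkpoint into the TensorRT-LLM format.
  python3 examples/llama/convert_checkpoint.py \
      --model_dir ./Llama-2-7b-hf --output_dir ./tllm_ckpt --dtype float16
  # 2. Build: compile the converted checkpoint into a TensorRT engine.
  trtllm-build --checkpoint_dir ./tllm_ckpt --output_dir ./engine
  # 3. Deploy: run a quick generation against the built engine.
  python3 examples/run.py --engine_dir ./engine \
      --tokenizer_dir ./Llama-2-7b-hf --max_output_len 32 \
      --input_text "What is TensorRT-LLM?"
else
  echo "trtllm-build not found; complete the installation step above first."
fi
```

See the model-specific READMEs under the repository's `examples/` directory for the supported options for your model.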