Installing on Windows
Note
The Windows release of TensorRT-LLM is currently in beta. We recommend checking out the v0.10.0 tag for the most stable experience.
Prerequisites
Clone this repository using Git for Windows.
Install the dependencies in one of two ways:
Install all dependencies together.
Run the provided PowerShell script setup_env.ps1, located under the /windows/ folder, which installs Python and CUDA 12.4 automatically with default settings. Run PowerShell as Administrator to use the script.
./setup_env.ps1 [-skipCUDA] [-skipPython]
Close and re-open any existing PowerShell or Git Bash windows so they pick up the new Path modified by the setup_env.ps1 script above.
Install the dependencies one at a time.
Install Python 3.10.
Select Add python.exe to PATH at the start of the installation. The installation may only add the python command, but not the python3 command. Navigate to the installation path %USERPROFILE%\AppData\Local\Programs\Python\Python310 (AppData is a hidden folder) and copy python.exe to python3.exe.
Install CUDA 12.4 Toolkit. Use the Express Installation option. Installation may require a restart.
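The manual prerequisite steps above can be double-checked with a short script. The following is a minimal sketch, not part of the TensorRT-LLM tooling: the helper names missing_tools and ensure_python3_alias are hypothetical, and treating nvcc on PATH as evidence of a finished CUDA Toolkit install is an assumption of this example.

```python
import shutil
from pathlib import Path


def missing_tools(names):
    """Return the command names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def ensure_python3_alias(install_dir):
    """Copy python.exe to python3.exe inside install_dir if the alias is absent.

    Returns the path of python3.exe. Raises FileNotFoundError when python.exe
    itself is missing, since there is nothing to copy in that case.
    """
    install_dir = Path(install_dir)
    source = install_dir / "python.exe"
    alias = install_dir / "python3.exe"
    if not source.is_file():
        raise FileNotFoundError(f"python.exe not found in {install_dir}")
    if not alias.exists():
        shutil.copy2(source, alias)
    return alias


if __name__ == "__main__":
    # nvcc ships with the CUDA Toolkit; its presence on PATH is used here as a
    # rough proxy for a completed CUDA install (an assumption of this sketch).
    print("Missing from PATH:", missing_tools(["python", "python3", "nvcc"]))
```

To create the python3 alias with this sketch, pass the installation path mentioned above to ensure_python3_alias. Run the script from a newly opened terminal so that it sees the updated Path.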
Steps
Install TensorRT-LLM.
If you have an existing TensorRT installation (from older versions of tensorrt_llm), please execute
pip uninstall -y tensorrt tensorrt_libs tensorrt_bindings
pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
before installing TensorRT-LLM with the following command.
pip install tensorrt_llm==0.10.0 --extra-index-url https://pypi.nvidia.com
Run the following command to verify that your TensorRT-LLM installation is working properly.
python -c "import tensorrt_llm; print(tensorrt_llm._utils.trt_version())"
Build the model.
Deploy the model.
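For the install step, it can help to see which of the relevant packages pip currently has registered, both before uninstalling an older TensorRT and after installing tensorrt_llm. The sketch below uses only the standard-library importlib.metadata module. The helper names installed_version and report are hypothetical, and the package names are taken from the pip commands above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(distributions):
    """Map each distribution name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == "__main__":
    # Names copied from the pip commands in the install step.
    names = ["tensorrt", "tensorrt_libs", "tensorrt_bindings", "tensorrt_llm"]
    for name, version in report(names).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

This only inspects package metadata and does not import tensorrt_llm, so it complements rather than replaces the verification command given in the install step.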