[Bug]: Not Able To Detect GPU, Even Though Ollama Detects It.
What's Going Wrong?
The OpenLIT SDK is unable to find any supported GPUs on the system, even though ollama, running in the same container, detects the GPU and uses it.
Steps to Reproduce
- Use the Docker image: start from `nvidia/cuda:12.8.1-base-ubuntu24.04`.
- Run the OpenLIT SDK: run the SDK inside a container based on that image.
- Check the logs: look for errors related to GPU detection.
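The log check in the last step can be automated with a short stdlib-only sketch; the marker string is copied from the OpenLIT error line in the log excerpt later in this report:

```python
# Sketch of the log-check step: scan captured container logs for the
# OpenLIT GPU detection error. The marker is taken from the actual log
# output; everything else here is illustrative, not OpenLIT's own code.
GPU_ERROR_MARKER = "OpenLIT GPU Instrumentation Error: No supported GPUs found"

def find_gpu_errors(log_text: str) -> list[str]:
    """Return the log lines indicating OpenLIT failed to find a GPU."""
    return [line for line in log_text.splitlines() if GPU_ERROR_MARKER in line]
```

Feeding it the container logs returns the offending lines, or an empty list when GPU detection succeeded.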
What Did You Expect?
The OpenLIT SDK should detect the GPU on the system and collect GPU stats from it. Instead, the SDK reports that no supported GPUs were found, even though ollama detects the GPU and uses it for inference.
Any Screenshots?
The following logs show ollama detecting the GPU, followed by the OpenLIT SDK failing to find one:
2025/04/19 07:21:24 routes.go:1231: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-04-19T07:21:24.063Z level=INFO source=images.go:458 msg="total blobs: 9"
time=2025-04-19T07:21:24.063Z level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-04-19T07:21:24.063Z level=INFO source=routes.go:1298 msg="Listening on 127.0.0.1:11434 (version 0.6.5)"
time=2025-04-19T07:21:24.063Z level=DEBUG source=sched.go:107 msg="starting llm scheduler"
time=2025-04-19T07:21:24.063Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-19T07:21:24.064Z level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-04-19T07:21:24.064Z level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-04-19T07:21:24.064Z level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /usr/local/cuda/lib64/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-04-19T07:21:24.064Z level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.570.86.15]
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.570.86.15
dlsym: cuInit - 0x7fbabfd0de00
dlsym: cuDriverGetVersion - 0x7fbabfd0de20
dlsym: cuDeviceGetCount - 0x7fbabfd0de60
dlsym: cuDeviceGet - 0x7fbabfd0de40
dlsym: cuDeviceGetAttribute - 0x7fbabfd0df40
dlsym: cuDeviceGetUuid - 0x7fbabfd0dea0
dlsym: cuDeviceGetName - 0x7fbabfd0de80
dlsym: cuCtxCreate_v3 - 0x7fbabfd0e120
dlsym: cuMemGetInfo_v2 - 0x7fbabfd0e8a0
dlsym: cuCtxDestroy - 0x7fbabfd6c9f0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-04-19T07:21:24.124Z level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.570.86.15
[GPU-bd4533c9-c880-0ab8-27e7-6c5248646113] CUDA totalMem 22565 mb
[GPU-bd4533c9-c880-0ab8-27e7-6c5248646113] CUDA freeMem 22073 mb
[GPU-bd4533c9-c880-0ab8-27e7-6c5248646113] Compute Capability 8.9
time=2025-04-19T07:21:24.231Z level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
releasing cuda driver library
time=2025-04-19T07:21:24.231Z level=INFO source=types.go:130 msg="inference compute" id=GPU-bd4533c9-c880-0ab8-27e7-6c5248646113 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4090" total="22.0 GiB" available="21.6 GiB"
OpenLIT GPU Instrumentation Error: No supported GPUs found.If this is a non-GPU host, set `collect_gpu_stats=False` to disable GPU stats.
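The "raw version 0x2f30" / "CUDA driver version: 12.8" pair in the log follows the CUDA driver-version encoding (1000 * major + 10 * minor), which can be decoded with a couple of lines:

```python
# Decode the integer returned by cuDriverGetVersion, as logged above.
# CUDA encodes the version as 1000 * major + 10 * minor, so
# 0x2F30 == 12080 decodes to driver version 12.8.
def decode_cuda_version(raw: int) -> str:
    major, minor = raw // 1000, (raw % 1000) // 10
    return f"{major}.{minor}"

print(decode_cuda_version(0x2F30))  # prints "12.8"
```

This confirms the container sees a CUDA 12.8 driver, matching the base image, so the driver itself is visible inside the container.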
Your Setup
- OpenLIT SDK Version: `pip list` shows openlit 1.33.19
- Deployment Method: Nomad, Docker driver
Troubleshooting
To troubleshoot this issue, try the following:
- Check the GPU driver: make sure the driver is up to date and supported by the OpenLIT SDK.
- Check the CUDA installation: make sure CUDA is installed correctly and that `libcuda.so` is present in one of the standard search paths.
- Check the OpenLIT SDK configuration: make sure the SDK is configured to collect GPU stats.
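The `libcuda.so` check can be approximated with a stdlib-only sketch; the glob patterns below are a subset of the ones ollama logs above, and a successful dlopen is a reasonable first sign the library is usable inside the container (this is an illustration, not OpenLIT's own detection code):

```python
# Search a few of the same paths ollama probes and try to load the
# first libcuda found. Patterns are copied from the log excerpt above;
# adjust them for your distribution if needed.
import ctypes
import glob

SEARCH_GLOBS = [
    "/usr/lib/*-linux-gnu/libcuda.so*",
    "/usr/local/cuda/lib64/libcuda.so*",
    "/usr/lib/wsl/lib/libcuda.so*",
]

def find_libcuda(patterns=SEARCH_GLOBS):
    """Return (path, handle) for the first loadable libcuda, else (None, None)."""
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                return path, ctypes.CDLL(path)
            except OSError:
                continue  # present on disk but not loadable
    return None, None
```

If this finds and loads the library (as ollama's log shows it does here), the problem is more likely in how the SDK enumerates GPUs than in the CUDA installation itself.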
FAQ
Q: What is the issue with the OpenLIT SDK not detecting the GPU?
A: The OpenLIT SDK reports that no supported GPUs were found on the system, even though ollama, running in the same container, detects the GPU and uses it for inference.
Q: What are the possible causes of this issue?
A: There are several possible causes, including:
- Incorrect GPU driver: the driver may be out of date or unsupported by the OpenLIT SDK.
- Incorrect CUDA installation: CUDA may be installed incorrectly, or `libcuda.so` may not be present in a searched location.
- Incorrect OpenLIT SDK configuration: the SDK may not be configured to use the GPU.
Q: How can I troubleshoot this issue?
A: Follow the steps in the Troubleshooting section above: check the GPU driver, the CUDA installation, and the OpenLIT SDK configuration.
Q: What are the symptoms of this issue?
A: The symptoms include:
- The OpenLIT SDK cannot detect the GPU on the system.
- The logs show "OpenLIT GPU Instrumentation Error: No supported GPUs found."
Q: How can I resolve this issue?
A: Try the following:
- Update the GPU driver: update to the latest driver supported by your GPU and the SDK.
- Reinstall CUDA: reinstall or repair CUDA so that `libcuda.so` resolves correctly.
- Reconfigure the OpenLIT SDK: make sure GPU stats collection is enabled.
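Since the deployment runs under Nomad with the Docker driver, it is also worth confirming the container was actually granted GPU access. A small sketch (an illustration, not OpenLIT's own logic; the variable names are the standard NVIDIA container toolkit ones, not taken from this report) checks the environment from inside the container:

```python
# When a container is started without GPU access, the NVIDIA/CUDA
# environment variables below are typically unset even though the host
# driver works. Inspect them from inside the container as a quick check.
import os

def gpu_env_report(env=os.environ) -> dict:
    keys = (
        "NVIDIA_VISIBLE_DEVICES",
        "NVIDIA_DRIVER_CAPABILITIES",
        "CUDA_VISIBLE_DEVICES",
    )
    return {k: env.get(k) for k in keys}
```

If all three come back unset while ollama still sees the GPU through `libcuda.so`, the mismatch points at how each component enumerates devices rather than at the driver.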
Q: What are the benefits of resolving this issue?
A: Resolving this issue will let the OpenLIT SDK detect the GPU and collect GPU stats, giving full observability of GPU-backed inference workloads such as larger models and more complex tasks.
Q: How can I prevent this issue from occurring in the future?
A: To prevent this issue from recurring:
- Keep the GPU driver up to date and compatible with the OpenLIT SDK.
- Verify the CUDA installation and `libcuda.so` paths after driver or base-image upgrades.
- Review the OpenLIT SDK GPU configuration whenever the deployment changes.
Conclusion
In conclusion, the OpenLIT SDK is unable to detect the GPU even though ollama, running in the same container, detects it and reports it as usable for inference. To troubleshoot, check the GPU driver, the CUDA installation, and the OpenLIT SDK configuration. Resolving the issue will let the SDK detect the GPU and collect GPU stats, improving observability when serving larger models and more complex tasks.