A couple of years ago, I attempted to get involved with TensorFlow/Keras but got waylaid by a few other projects. Now I have a new use case that lets me dip my toe in that water (more to come). The configuration isn't difficult, but it can be time-consuming, so I thought it worth recording for the next time I need to take care of it.
You can see if TensorFlow has access to your GPU by running the following (it assumes TensorFlow has been installed):
import tensorflow as tf
tf.test.is_gpu_available()  # deprecated
tf.config.list_physical_devices('GPU')
The deprecated is_gpu_available prints a bunch of debugging output before returning True/False, while list_physical_devices returns a list of the available devices. The output should help you determine which DLL is missing so you can track it down.
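Before wading through that debug output, it can help to check which CUDA/cuDNN DLLs are actually reachable through PATH. The helper below is a minimal sketch, not part of TensorFlow: `find_missing_dlls` is a hypothetical name, and the DLL names are simply the ones that appear in the TensorFlow 2.1 / CUDA 10.1 log later in this post, so other versions will expect different file names.

```python
import os

# DLL names copied from the TensorFlow 2.1 / CUDA 10.1 log in this post.
# Other TensorFlow/CUDA versions load differently named files.
EXPECTED_DLLS = (
    "cudart64_101.dll",
    "cublas64_10.dll",
    "cufft64_10.dll",
    "curand64_10.dll",
    "cusolver64_10.dll",
    "cusparse64_10.dll",
    "cudnn64_7.dll",
)


def find_missing_dlls(path_value=None, dll_names=EXPECTED_DLLS, sep=os.pathsep):
    """Return the DLL names not found in any directory listed in path_value.

    path_value defaults to the current PATH environment variable.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    dirs = [d for d in path_value.split(sep) if d]
    missing = []
    for name in dll_names:
        # A DLL counts as found if any PATH directory contains a file with that name.
        if not any(os.path.isfile(os.path.join(d, name)) for d in dirs):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing_dlls():
        print("not found on PATH:", name)
```

Running it from the same prompt you use for TensorFlow (after any PATH changes) lists the files Windows would fail to find.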
My desktop has an NVIDIA GeForce GTX 1080 (you can look up your own card under Device Manager > Display adapters). The official instructions are on the TensorFlow website, https://www.tensorflow.org/install/gpu, with the Windows setup toward the bottom of the page.
First, both Python and TensorFlow need to be installed. I have Python 3.8 and created a virtual environment with TensorFlow 2.1 installed (pip install tensorflow). The version is very important, as each TensorFlow release is built against a particular version of CUDA; see https://www.tensorflow.org/install/gpu#software_requirements. With TensorFlow 2.1, I need CUDA 10.1. (Even though 10.2 is available, it won't work, and TensorFlow will fall back to using the CPU.)
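Since the TensorFlow/CUDA pairing is the step most likely to go wrong, it can be worth encoding it so a setup script complains early instead of silently falling back to the CPU. A minimal sketch: the table is my own transcription of a few rows from TensorFlow's tested build configurations, so treat it as an assumption and verify it against the official page; `required_cuda` is a hypothetical helper.

```python
# A few GPU rows transcribed from TensorFlow's "tested build configurations"
# table. Verify against the official page before relying on them.
TESTED_GPU_CONFIGS = {
    "2.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
}


def required_cuda(tf_version):
    """Map a TensorFlow version string such as '2.1.0' to its tested CUDA/cuDNN versions."""
    major_minor = ".".join(tf_version.split(".")[:2])
    try:
        return TESTED_GPU_CONFIGS[major_minor]
    except KeyError:
        raise ValueError(
            f"no tested configuration recorded for TensorFlow {tf_version}"
        ) from None


if __name__ == "__main__":
    print(required_cuda("2.1.0"))
```

Passing `tf.__version__` from a TensorFlow 2.1 install would give CUDA 10.1 and cuDNN 7.6, which lines up with the `cudart64_101.dll` and `cudnn64_7.dll` names in the log later in this post.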
The tested build configurations are also listed at https://www.tensorflow.org/install/source#tested_build_configurations, which suggest I should be using Python 3.7 with tensorflow-2.1, but 3.8 seems happy enough.
Second, install the GPU driver from https://www.nvidia.com/download/index.aspx (I had already completed this step when I built my current machine).
Third, download and install CUDA 10.1 (or the version that matches your TensorFlow release). I grabbed the most recent update, “update2”. Go directly to the archive at https://developer.nvidia.com/cuda-toolkit-archive; otherwise the download page only offers the latest release. You will probably need about 5GB of free space for the full install.
By default, CUDA will install to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA. For TensorFlow to access the requisite DLLs, you’ll need to add a few paths to the PATH environment variable:

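Because the exact directories depend on the CUDA version and install location, a quick check that each one actually made it onto PATH can save a restart-and-retry cycle. A minimal sketch, assuming Windows-style `;`-separated PATH values; `dirs_missing_from_path` is a hypothetical helper, and the `v10.1\bin` entry in the usage block is an assumed example under the default install location, so substitute the directories your own install needs.

```python
import ntpath
import os


def _norm(path):
    # Windows paths are case-insensitive and may carry quotes or a trailing backslash.
    return ntpath.normcase(ntpath.normpath(path.strip().strip('"')))


def dirs_missing_from_path(required_dirs, path_value=None):
    """Return the entries of required_dirs that do not appear in a Windows PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    on_path = {_norm(p) for p in path_value.split(";") if p.strip()}
    return [d for d in required_dirs if _norm(d) not in on_path]


if __name__ == "__main__":
    cuda_root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA"  # default install location
    # Assumed example entry; list the directories your CUDA version actually requires.
    wanted = [ntpath.join(cuda_root, "v10.1", "bin")]
    for d in dirs_missing_from_path(wanted):
        print("not on PATH:", d)
```

Normalising case and trailing backslashes avoids false alarms from entries that differ only cosmetically from what you typed.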
Fourth, install the cuDNN SDK. This is probably the most annoying step, as you must create an NVIDIA account before you can download it (and complete a short survey when accessing each new tool). The direct link is https://developer.nvidia.com/cudnn. Once again, choose the cuDNN build for your version of CUDA:

Extract the downloaded zip archive (I extracted mine to D:\prog\CUDA); that location must then be added to your PATH as well.
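If you're not sure which subfolder of the extracted archive holds the DLL that Windows needs to find, a short walk of the extraction directory will tell you. A minimal sketch; `find_dll_dirs` is a hypothetical helper, and `cudnn64_7.dll` is the file name that appears in the TensorFlow 2.1 log in the next step (other cuDNN versions use other names).

```python
import os


def find_dll_dirs(root, dll_name="cudnn64_7.dll"):
    """Return every directory under root that contains dll_name (case-insensitive match)."""
    target = dll_name.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(f.lower() == target for f in filenames):
            hits.append(dirpath)
    return hits


if __name__ == "__main__":
    # D:\prog\CUDA is where I extracted the archive; adjust to your own location.
    for d in find_dll_dirs(r"D:\prog\CUDA"):
        print("add to PATH:", d)
```

Whatever directory it prints is the one to add to PATH.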

Finally, test that everything is working by running the list_physical_devices function. (Be sure to restart any open command prompts so that the updated PATH variable takes effect.)
>>> tf.config.list_physical_devices('GPU')
2020-06-02 17:20:13.459081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.835GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-06-02 17:20:13.467609: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-06-02 17:20:13.473219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-02 17:20:13.477389: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-06-02 17:20:13.483137: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-06-02 17:20:13.487272: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-06-02 17:20:13.493035: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-06-02 17:20:13.497192: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-02 17:20:13.503284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
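When a GPU isn't detected, comparing the "Successfully opened dynamic library" lines against the DLLs you expect narrows down the culprit quickly. A minimal sketch that pulls those names out of captured log text (for example, stderr redirected to a file with `2>`); the regular expression is based only on the log lines shown above, and `opened_libraries` / `missing_libraries` are hypothetical helpers.

```python
import re

# Matches TensorFlow's dso_loader messages as they appear in the log above.
_OPENED = re.compile(r"Successfully opened dynamic library (\S+)")


def opened_libraries(log_text):
    """Return the DLL names TensorFlow reports having loaded, in order of appearance."""
    return _OPENED.findall(log_text)


def missing_libraries(log_text, expected):
    """Return the expected DLL names that never show up as loaded in the log."""
    loaded = {name.lower() for name in opened_libraries(log_text)}
    return [name for name in expected if name.lower() not in loaded]


if __name__ == "__main__":
    sample = (
        "2020-06-02 17:20:13.467609: I tensorflow/stream_executor/platform/default/"
        "dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll\n"
        "2020-06-02 17:20:13.497192: I tensorflow/stream_executor/platform/default/"
        "dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll\n"
    )
    print(opened_libraries(sample))  # ['cudart64_101.dll', 'cudnn64_7.dll']
    print(missing_libraries(sample, ["cudart64_101.dll", "cublas64_10.dll"]))  # ['cublas64_10.dll']
```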
Checking Task Manager also shows GPU usage for the relevant process:

The percentage seems rather low, though I'm not familiar enough with my application to know whether this is in the expected range; either way, the program is running (rather than crashing, as it did when running CPU-only).
The only potential issue I've encountered is a message about ‘relying on driver to perform ptx compilation’ when the GPU is first used:
2020-06-02 16:14:24.829554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-02 16:14:25.402273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-02 16:14:26.306640: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
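The `W` after the timestamp marks this as a warning rather than an error, which is consistent with the program carrying on. When scanning a long log for messages like this, a small filter on that severity letter helps. A minimal sketch; the line format (timestamp, a one-letter severity such as I, W, E or F, then `file:line]`) is inferred from the log lines in this post, and `log_entries` / `warnings_only` are hypothetical helpers.

```python
import re

# Header of a TensorFlow C++ log line, e.g.
# "2020-06-02 16:14:26.306640: W tensorflow/.../redzone_allocator.cc:314] Internal: ..."
_HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): ([IWEF]) (\S+:\d+)\] ?(.*)$"
)


def log_entries(log_text):
    """Split log text into (timestamp, severity, source, message) tuples.

    Lines without a header are treated as continuations of the previous
    entry's message, as with the multi-line ptx warning.
    """
    entries = []
    for line in log_text.splitlines():
        m = _HEADER.match(line)
        if m:
            entries.append(list(m.groups()))
        elif entries and line.strip():
            entries[-1][3] += "\n" + line.strip()
    return [tuple(e) for e in entries]


def warnings_only(log_text):
    """Return only the entries logged at warning (W), error (E) or fatal (F) severity."""
    return [e for e in log_entries(log_text) if e[1] in {"W", "E", "F"}]
```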
A related issue may be https://github.com/tensorflow/models/issues/7640, though that report describes the application hanging at this point, whereas mine carries on.