Installing TensorFlow with GPU support on Ubuntu 16.04

Install the NVIDIA driver

You can look up a suitable driver here: https://www.nvidia.com/Download/index.aspx?lang=en-us

sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt -y update
sudo apt install -y nvidia-384
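
After the driver install, a quick sanity check is to ask nvidia-smi to list the GPUs it can see. The following is a minimal sketch, assuming Python 3.5+ and that the driver package puts nvidia-smi on the PATH; the helper name driver_summary is made up for illustration.

```python
import shutil
import subprocess


def driver_summary():
    """Return the output of `nvidia-smi -L`, or None if the tool is missing or fails."""
    exe = shutil.which("nvidia-smi")  # assumption: the driver package installs this tool
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "-L"],  # -L lists the GPUs the driver can see
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


summary = driver_summary()
print(summary if summary else "nvidia-smi not found, or the driver is not loaded")
```

If this prints the fallback message right after installing, a reboot may be needed before the kernel module is loaded.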

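However, you may run into problems, such as the installer refusing to run while the X server is active

Press [Ctrl] + [Alt] + [F1] to switch to a text console

Then stop the X server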

sudo service lightdm stop
sudo init 3

Only at this point should you start installing from NVIDIA's .run file

If you still run into problems, you may need to disable nouveau

sudo vi /etc/modprobe.d/blacklist-nouveau.conf

# Add
blacklist nouveau
options nouveau modeset=0

# Then run
sudo update-initramfs -u

Once that is done, reboot the machine.
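
After the reboot, it can be worth confirming that the nouveau blacklist actually took effect. Here is a small sketch, assuming Python 3 on Linux, that looks for the module name in /proc/modules; the function names are illustrative only.

```python
def is_module_loaded(name, modules_text):
    """Return True if `name` is the first field of any line in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def nouveau_is_loaded(path="/proc/modules"):
    """Check the running kernel; returns False when the file cannot be read."""
    try:
        with open(path) as f:
            return is_module_loaded("nouveau", f.read())
    except OSError:  # not Linux, or /proc is unavailable
        return False


print("nouveau loaded:", nouveau_is_loaded())
```

If this still reports True after rebooting, the blacklist file or the initramfs update probably did not apply.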

Install CUDA

Download CUDA from here

https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=deblocal

Then run the following commands

sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
sudo apt-get -y update
sudo apt-get -y install cuda libcupti-dev
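
To confirm which toolkit ended up installed, one option is to read the version file that CUDA 9.x ships. This is a sketch under two assumptions: the file lives at /usr/local/cuda/version.txt, and it contains a line such as "CUDA Version 9.0.176". Newer toolkits may not ship this file at all.

```python
import re


def parse_cuda_version(text):
    """Extract (major, minor[, patch]) from text like 'CUDA Version 9.0.176'."""
    m = re.search(r"CUDA Version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups() if g is not None)


def installed_cuda_version(path="/usr/local/cuda/version.txt"):  # assumed location
    """Return the parsed version, or None if the file is missing or unrecognized."""
    try:
        with open(path) as f:
            return parse_cuda_version(f.read())
    except OSError:
        return None


print("CUDA version:", installed_cuda_version())
```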

Add the environment variable

export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
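
The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` part of that export appends ":" plus the old value only when the variable is already set and non-empty. That avoids a trailing colon, which the dynamic loader would treat as an empty entry meaning the current directory. The same rule, sketched in Python with an illustrative helper name:

```python
import os


def prepend_dir(new_dir, current):
    """Mimic new_dir${VAR:+:${VAR}}: append ':' + current only if current is non-empty."""
    return new_dir + (":" + current if current else "")


cuda_lib = "/usr/local/cuda/lib64"
print(prepend_dir(cuda_lib, ""))          # variable unset or empty
print(prepend_dir(cuda_lib, "/opt/lib"))  # variable already set
print(prepend_dir(cuda_lib, os.environ.get("LD_LIBRARY_PATH", "")))
```

The first two calls give `/usr/local/cuda/lib64` and `/usr/local/cuda/lib64:/opt/lib` respectively.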

Install cuDNN v7.0

Download it from here: https://developer.nvidia.com/cudnn

Then unpack and install it

sudo dpkg -i libcudnn7_7.4.1.5-1+cuda9.2_amd64.deb
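
The cuDNN package file name encodes both the cuDNN version and the CUDA version it was built against, so it is easy to check that the download matches the toolkit installed earlier. The following sketch parses the name; the regular expression is my own guess at the naming pattern based on the file name above, not an official format.

```python
import re


def parse_cudnn_deb(filename):
    """Return (cudnn_version, cuda_version) from a libcudnn .deb file name, or None."""
    m = re.match(
        r"libcudnn\d+(?:-dev|-doc)?_([\d.]+)-\d+\+cuda([\d.]+)_", filename
    )
    if m is None:
        return None
    return m.group(1), m.group(2)


cudnn, cuda = parse_cudnn_deb("libcudnn7_7.4.1.5-1+cuda9.2_amd64.deb")
print("cuDNN", cudnn, "built for CUDA", cuda)
if cuda != "9.0":
    print("note: this differs from the CUDA 9.0 toolkit installed earlier")
```

For the file name used here this reports cuDNN 7.4.1.5 built for CUDA 9.2, so it may be worth downloading the build for CUDA 9.0 instead.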

Install Python and virtualenv

sudo apt-get install -y python-pip python-dev python-virtualenv

Create a Python virtual environment

virtualenv --system-site-packages ~/tensorflow

Activate the virtual environment

source ~/tensorflow/bin/activate
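
Before running the pip commands that follow, it may help to confirm that the interpreter really is the one inside the virtual environment. This is a minimal check that should work on both Python 2 and 3; the function name is illustrative.

```python
import sys


def in_virtualenv():
    """True when running inside a virtualenv or venv environment."""
    # Classic virtualenv sets sys.real_prefix; venv and newer virtualenv
    # releases make sys.base_prefix differ from sys.prefix instead.
    return hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )


print("interpreter prefix:", sys.prefix)
print("inside a virtual environment:", in_virtualenv())
```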

Install NVIDIA TensorRT 3.0

You can download it here: https://developer.nvidia.com/nvidia-tensorrt-download

Install tensorrt

sudo dpkg -i nv-tensorrt-repo-ubuntu1604-cuda9.0-trt5.0.0.10-rc-20180906_1-1_amd64.deb
sudo apt-get update
sudo apt-get install tensorrt libcudnn7
sudo apt-get install uff-converter-tf graphsurgeon-tf
sudo apt-get install libcudnn7=7.3.0.29-1+cuda9.0 libcudnn7-dev=7.3.0.29-1+cuda9.0
sudo apt-mark hold libcudnn7 libcudnn7-dev

Install PyCUDA

pip install 'pycuda>=2017.1.1'

Alternatively, install TensorRT from the tar archive

tar -zxvf TensorRT-5.0.0.10.Ubuntu-16.04.4.x86_64-gnu.cuda-10.0.cudnn7.3.tar.gz
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/TensorRT-5.0.0.10/lib
cd TensorRT-5.0.0.10/python
sudo pip2 install tensorrt-5.0.0.10-py2.py3-none-any.whl
cd ../uff
sudo pip2 install uff-0.5.1-py2.py3-none-any.whl
cd ../graphsurgeon
sudo pip2 install graphsurgeon-0.2.2-py2.py3-none-any.whl

Install the GPU build of TensorFlow

First upgrade pip

easy_install -U pip

Install tensorflow

pip install --upgrade tensorflow-gpu

If the installed version gives you trouble, try version 1.5 instead

pip install tensorflow-gpu==1.5
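
One commonly reported reason 1.5 works where newer releases do not is that the prebuilt TensorFlow binaries from 1.6 onward use AVX instructions, so they crash with "illegal instruction" on CPUs that lack them. Whether that is the cause on a given machine is an assumption to verify. The sketch below (Python 3, Linux only, illustrative function names) checks the CPU flags in /proc/cpuinfo.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text (first 'flags' line)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(path="/proc/cpuinfo"):
    """True if the CPU advertises AVX; False if it does not or the file is unreadable."""
    try:
        with open(path) as f:
            return "avx" in cpu_flags(f.read())
    except OSError:
        return False


print("CPU advertises AVX:", has_avx())
```

If this prints False, pinning tensorflow-gpu to 1.5 as shown above is a reasonable thing to try.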

Hello World

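If everything is in place, run a small program to check that it works!

# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

If your output mentions the GPU, you should be fine.

Here is my output (showing off)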

2018-10-30 23:27:53.582391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device peer to peer matrix
2018-10-30 23:27:53.582698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] DMA: 0 1 2 3 4 5 6 7 
2018-10-30 23:27:53.582708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 0:   Y N N N N N N N 
2018-10-30 23:27:53.582713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 1:   N Y N N N N N N 
2018-10-30 23:27:53.582718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 2:   N N Y N N N N N 
2018-10-30 23:27:53.582722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 3:   N N N Y N N N N 
2018-10-30 23:27:53.582727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 4:   N N N N Y N N N 
2018-10-30 23:27:53.582740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 5:   N N N N N Y N N 
2018-10-30 23:27:53.582751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 6:   N N N N N N Y N 
2018-10-30 23:27:53.582758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 7:   N N N N N N N Y 
2018-10-30 23:27:53.582770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: P106-100, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: P106-100, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: P106-100, pci bus id: 0000:03:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: P106-100, pci bus id: 0000:04:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:4) -> (device: 4, name: P106-100, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:5) -> (device: 5, name: P106-100, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:6) -> (device: 6, name: P106-100, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:7) -> (device: 7, name: P106-100, pci bus id: 0000:0b:00.0, compute capability: 6.1)
Hello, TensorFlow!
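
With many cards, it is easier to count the detected GPUs than to read the log by eye. The following sketch pulls the device names out of "Creating TensorFlow device" lines like the ones above. The regular expression is fitted to this particular log format, which other TensorFlow versions may not share, and the two sample lines are copied from the output above.

```python
import re

# Matches e.g. "Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: P106-100, ..."
DEVICE_RE = re.compile(
    r"Creating TensorFlow device \((/device:GPU:\d+)\) -> \(device: \d+, name: ([^,]+),"
)


def gpus_from_log(log_text):
    """Return a list of (device, name) pairs found in TensorFlow log text."""
    return DEVICE_RE.findall(log_text)


sample = (
    "2018-10-30 23:27:53.582770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] "
    "Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: P106-100, "
    "pci bus id: 0000:01:00.0, compute capability: 6.1)\n"
    "2018-10-30 23:27:53.582778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] "
    "Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: P106-100, "
    "pci bus id: 0000:02:00.0, compute capability: 6.1)\n"
)
for device, name in gpus_from_log(sample):
    print(device, name)
```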

When it fails to run

If everything is installed but the program crashes as soon as you run it with

tensorflow-gpu illegal instruction (core dumped)

consider trying Anaconda. First download the installer from here and run it

https://www.anaconda.com/download/#linux

Once it is installed, run the following commands to create and activate a tensorflow environment

conda create -n tensorflow
conda activate tensorflow

Then install tensorflow-gpu

conda install tensorflow-gpu -n tensorflow

Install the related packages

conda install -c anaconda matplotlib cudatoolkit _tflow_190_select PIL keras msgpack

Create a GPU environment

conda create -n tensorflow_gpuenv tensorflow-gpu
conda activate tensorflow_gpuenv
conda install -c anaconda matplotlib cudatoolkit _tflow_190_select PIL keras msgpack

Install the related packages

pip install --upgrade pip
pip install glob2 opencv-python

Check the tensorflow version

python -c "import tensorflow as tf; print(tf.__version__)"

Image source: https://dwijaybane.wordpress.com/2017/08/01/installing-tensor-flow-gpu-version-on-ubuntu-16-04/

老洪的 IT 學習系統