Installing TensorFlow with GPU support on Ubuntu 16.04
  • 15,032 views,
  • 2018-10-30,
  • Uploaded by: Kuann Hung,
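Steps
1.
Install the NVIDIA driver
You can look up a suitable driver for your card here: https://www.nvidia.com/Download/index.aspx?lang=en-us
The commands below install driver version 384 from the graphics-drivers PPA.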
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt -y update
sudo apt install -y nvidia-384
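Once the driver package is installed (and usually after a reboot), it helps to confirm that the driver actually responds. The following is a minimal sketch, assuming Python 3 is available and that the driver ships the `nvidia-smi` tool on the PATH; the function name `driver_version` is my own and purely illustrative.

```python
import shutil
import subprocess


def driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools are not installed or not on PATH
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            universal_newlines=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return None  # nvidia-smi exists but could not talk to the driver
    lines = out.strip().splitlines()
    # One line is printed per GPU; they all share the same driver version.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    print("NVIDIA driver version:", driver_version())
```

If this prints `None`, the driver is probably not loaded yet, and the troubleshooting notes in this step are the next thing to try.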
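You may run into problems, though, such as the installer refusing to run while the X server is active.
In that case, press [Ctrl] + [Alt] + [F1] to switch to a text console,
then stop the X server:
sudo service lightdm stop
sudo init 3
Only then run NVIDIA's .run installer.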
 
If that still fails, you probably need to disable the nouveau driver:
sudo vi /etc/modprobe.d/blacklist-nouveau.conf

# Add these lines
blacklist nouveau
options nouveau modeset=0

# Then run
sudo update-initramfs -u

Once that is set up, reboot.
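After the reboot you can check whether the nouveau blacklist took effect. Here is a small sketch, assuming a Linux system where loaded kernel modules are listed in `/proc/modules` (one module per line, name first); the helper names are hypothetical.

```python
import os


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def nouveau_is_loaded(proc_modules_text):
    """True if the nouveau kernel module appears in the given listing."""
    return "nouveau" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = "/proc/modules"
    if os.path.exists(path):
        with open(path) as f:
            print("nouveau loaded:", nouveau_is_loaded(f.read()))
    else:
        print(path, "not found; this check only works on Linux")
```

If it still reports nouveau as loaded, the blacklist file or the initramfs update most likely did not apply.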
 
2.
Install CUDA
Download the CUDA 9.0 local .deb installer for Ubuntu 16.04 from NVIDIA,
then run the following commands
sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
sudo apt-get -y update
sudo apt-get -y install cuda libcupti-dev
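To double-check which CUDA toolkit ended up installed, you can read the version file the toolkit leaves behind. This is only a sketch: it assumes the toolkit lives under `/usr/local/cuda` and writes a `version.txt` containing a line like `CUDA Version 9.0.176`, which is what CUDA 9.x did as far as I know. The function name is made up for illustration.

```python
import os
import re


def parse_cuda_version(text):
    """Extract (major, minor[, patch]) from 'CUDA Version X.Y.Z' text, or None."""
    match = re.search(r"CUDA Version\s+(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


if __name__ == "__main__":
    path = "/usr/local/cuda/version.txt"  # assumed location, see note above
    if os.path.exists(path):
        with open(path) as f:
            print("CUDA version:", parse_cuda_version(f.read()))
    else:
        print(path, "not found")
```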
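3.
Add the environment variable
This puts the CUDA libraries on the library search path (append the line to ~/.bashrc to make it permanent)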
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
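The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` part of that line looks cryptic, so here is a small Python sketch of what it does: the old value is appended after a colon only when it is set and non-empty, which avoids leaving a stray trailing `:`. The function `prepend_path` is a hypothetical helper written just to illustrate the shell expansion.

```python
import os


def prepend_path(new_dir, existing):
    """Mimic NEW${VAR:+:${VAR}}: append ':' + existing only if existing is non-empty."""
    if existing:  # covers both unset (None) and empty string
        return new_dir + ":" + existing
    return new_dir


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH")
    print(prepend_path("/usr/local/cuda/lib64", current))
```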
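4.
Install cuDNN v7
Download the cuDNN .deb package from NVIDIA and install it. Pick the build that matches your CUDA version: the file name below is a CUDA 9.2 build, so with the CUDA 9.0 from step 2 use the matching +cuda9.0 package instead.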
sudo dpkg -i libcudnn7_7.4.1.5-1+cuda9.2_amd64.deb
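Since mixing up cuDNN and CUDA builds is easy, here is a rough sketch that reads both versions out of a cuDNN .deb file name. It assumes the naming pattern seen in this post (`libcudnn7_<cudnn>-<rev>+cuda<cuda>_<arch>.deb`); the function names are my own, and other packages may be named differently.

```python
import re

# Pattern inferred from file names like libcudnn7_7.4.1.5-1+cuda9.2_amd64.deb
CUDNN_DEB_RE = re.compile(
    r"libcudnn(\d+)(?:-dev|-doc)?_([\d.]+)-\d+\+cuda([\d.]+)_\w+\.deb$"
)


def parse_cudnn_deb(filename):
    """Return {'cudnn': ..., 'cuda': ...} parsed from the file name, or None."""
    match = CUDNN_DEB_RE.match(filename)
    if match is None:
        return None
    return {"cudnn": match.group(2), "cuda": match.group(3)}


def built_for_cuda(filename, cuda_version):
    """True if the package file name says it was built for cuda_version."""
    info = parse_cudnn_deb(filename)
    return info is not None and info["cuda"] == cuda_version


if __name__ == "__main__":
    name = "libcudnn7_7.4.1.5-1+cuda9.2_amd64.deb"
    print(parse_cudnn_deb(name))
    print("built for CUDA 9.0:", built_for_cuda(name, "9.0"))
```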
5.
Install Python and virtualenv
sudo apt-get install -y python-pip python-dev python-virtualenv
Create a virtual Python environment
virtualenv --system-site-packages ~/tensorflow
Activate the virtual environment
source ~/tensorflow/bin/activate
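If you are not sure whether the environment is really active, a quick check from inside Python can help. A minimal sketch: classic virtualenv sets `sys.real_prefix`, while venv and newer virtualenv releases make `sys.base_prefix` differ from `sys.prefix`. The function name is just for illustration.

```python
import sys


def in_virtualenv():
    """Best-effort check for whether this interpreter runs inside a virtual env."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases change sys.prefix but keep base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```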
6.
Install NVIDIA TensorRT
Install TensorRT from the .deb repository package (the file below is 5.0.0.10 RC for CUDA 9.0)
sudo dpkg -i nv-tensorrt-repo-ubuntu1604-cuda9.0-trt5.0.0.10-rc-20180906_1-1_amd64.deb
sudo apt-get update
sudo apt-get install tensorrt libcudnn7
sudo apt-get install uff-converter-tf graphsurgeon-tf
sudo apt-get install libcudnn7=7.3.0.29-1+cuda9.0 libcudnn7-dev=7.3.0.29-1+cuda9.0
sudo apt-mark hold libcudnn7 libcudnn7-dev
 
Install PyCUDA
pip install 'pycuda>=2017.1.1'
 
Or install from the tar archive (the archive below is a CUDA 10.0 build; use the one that matches your CUDA version)
tar -zxvf TensorRT-5.0.0.10.Ubuntu-16.04.4.x86_64-gnu.cuda-10.0.cudnn7.3.tar.gz
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/TensorRT-5.0.0.10/lib
cd TensorRT-5.0.0.10/python
sudo pip2 install tensorrt-5.0.0.10-py2.py3-none-any.whl
cd ../uff
sudo pip2 install uff-0.5.1-py2.py3-none-any.whl
cd ../graphsurgeon
sudo pip2 install graphsurgeon-0.2.2-py2.py3-none-any.whl
7.
Install the GPU build of TensorFlow
Upgrade pip first
easy_install -U pip
Install TensorFlow
pip install --upgrade tensorflow-gpu
If that install gives you trouble, try version 1.5 instead
pip install tensorflow-gpu==1.5
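Why would 1.5 work when the latest build does not? I cannot say for sure what happens on your machine, but one common cause is that the prebuilt TensorFlow binaries from 1.6 onward are compiled with AVX instructions, and a CPU without AVX then crashes with an `illegal instruction` error. The sketch below checks the CPU flags, assuming a Linux system with `/proc/cpuinfo`; the helper name `cpu_flags` is hypothetical.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            # The part after the colon is a space-separated list of flags.
            return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            flags = cpu_flags(f.read())
        print("CPU supports AVX:", "avx" in flags)
    else:
        print(path, "not found; this check only works on Linux")
```

If it reports no AVX support, staying on 1.5 or building TensorFlow from source are the usual ways out.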
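8.
Hello World
If everything went well, run a small program to check that it works!
# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()  # creating the session logs the GPUs TensorFlow found
print(sess.run(hello))
If the output mentions your GPUs, you should be all set!
Here is my output (showing off a bit)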
2018-10-30 23:27:53.582391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device peer to peer matrix
2018-10-30 23:27:53.582698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] DMA: 0 1 2 3 4 5 6 7 
2018-10-30 23:27:53.582708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 0:   Y N N N N N N N 
2018-10-30 23:27:53.582713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 1:   N Y N N N N N N 
2018-10-30 23:27:53.582718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 2:   N N Y N N N N N 
2018-10-30 23:27:53.582722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 3:   N N N Y N N N N 
2018-10-30 23:27:53.582727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 4:   N N N N Y N N N 
2018-10-30 23:27:53.582740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 5:   N N N N N Y N N 
2018-10-30 23:27:53.582751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 6:   N N N N N N Y N 
2018-10-30 23:27:53.582758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 7:   N N N N N N N Y 
2018-10-30 23:27:53.582770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: P106-100, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: P106-100, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: P106-100, pci bus id: 0000:03:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: P106-100, pci bus id: 0000:04:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:4) -> (device: 4, name: P106-100, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:5) -> (device: 5, name: P106-100, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:6) -> (device: 6, name: P106-100, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:7) -> (device: 7, name: P106-100, pci bus id: 0000:0b:00.0, compute capability: 6.1)
Hello, TensorFlow!
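With that many cards the log gets long, so here is a small sketch that pulls the GPU list out of log text like mine. It assumes the `Creating TensorFlow device` line format shown above; other TensorFlow versions word these lines differently, so treat the regular expression as an example only.

```python
import re

# Matches lines such as:
# ... Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: P106-100, ...)
DEVICE_RE = re.compile(
    r"Creating TensorFlow device \(/device:GPU:(\d+)\) -> "
    r"\(device: \d+, name: ([^,]+),"
)


def gpus_from_log(log_text):
    """Return a list of (gpu_index, gpu_name) tuples found in the log text."""
    return [(int(index), name.strip()) for index, name in DEVICE_RE.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "2018-10-30 23:27:53.582770: I tensorflow/core/common_runtime/gpu/"
        "gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> "
        "(device: 0, name: P106-100, pci bus id: 0000:01:00.0, "
        "compute capability: 6.1)"
    )
    for index, name in gpus_from_log(sample):
        print("GPU", index, "is a", name)
```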
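9.
When it will not run
If everything is installed but running it immediately fails with
tensorflow-gpu illegal instruction (core dumped)
consider trying Anaconda instead. First download and run the installer from here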
 
https://www.anaconda.com/download/#linux
 
Once it is installed, run the following commands to create and activate a tensorflow environment
conda create -n tensorflow
conda activate tensorflow
Then install tensorflow-gpu
conda install tensorflow-gpu -n tensorflow
Install related packages
conda install -c anaconda matplotlib cudatoolkit _tflow_190_select PIL keras msgpack
10.
Create a GPU environment
conda create -n tensorflow_gpuenv tensorflow-gpu
conda activate tensorflow_gpuenv
conda install -c anaconda matplotlib cudatoolkit _tflow_190_select PIL keras msgpack
Install related packages
pip install --upgrade pip
pip install glob2 opencv-python
11.
Check the TensorFlow version
python -c "import tensorflow as tf; print(tf.__version__)"