
Installing the Nvidia Driver
You can find the right driver for your card here: https://www.nvidia.com/Download/index.aspx?lang=en-us, or install it from the graphics-drivers PPA:
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt -y update
sudo apt install -y nvidia-384
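You may run into problems, though. For example, the installer may refuse to run while the X server is up.
Press [Ctrl] + [Alt] + [F1] to switch to a text console,
then stop the X server: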
sudo service lightdm stop
sudo init 3
Only now should you run NVIDIA's .run installer.
If you still hit problems, you may need to disable the nouveau driver:
sudo vi /etc/modprobe.d/blacklist-nouveau.conf
# Add these two lines:
blacklist nouveau
options nouveau modeset=0
# Then rebuild the initramfs:
sudo update-initramfs -u
Once that's set up, you can reboot!
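After rebooting, you can confirm that nouveau is no longer loaded. `lsmod | grep nouveau` is the usual check; the sketch below does the same thing in Python by reading `/proc/modules`, where the first field of each line is the name of a loaded kernel module. `module_loaded` is a hypothetical helper name used only for illustration.

```python
from pathlib import Path


def module_loaded(name, modules_text):
    """Return True if a kernel module called `name` appears in /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field on each line is the module name
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc = Path("/proc/modules")
    try:
        loaded = module_loaded("nouveau", proc.read_text())
        print("nouveau is still loaded" if loaded else "nouveau is not loaded")
    except OSError:
        # Not on Linux, or /proc is unavailable (e.g. some containers)
        print("could not read /proc/modules")
```

Matching the whole first field (rather than searching for the substring) avoids false hits on other modules whose names merely contain "nouveau".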
Installing CUDA
Download the CUDA package from NVIDIA,
then run the following commands:
sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
sudo apt-get -y update
sudo apt-get -y install cuda libcupti-dev
Add the environment variable (append this line to ~/.bashrc so it persists across sessions):
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
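The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` part is shell parameter expansion: it produces a colon followed by the old value only when `LD_LIBRARY_PATH` is set and non-empty. That way you never end up with a trailing empty entry, which the dynamic loader can interpret as the current directory. A small Python sketch of the same logic; `prepend_path` is a hypothetical helper name.

```python
import os


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: append ':' + old value only if the old value is non-empty."""
    # ${VAR:+...} expands to nothing when VAR is unset (None here) or empty ("")
    if current:
        return new_dir + ":" + current
    return new_dir


if __name__ == "__main__":
    print(prepend_path("/usr/local/cuda/lib64", os.environ.get("LD_LIBRARY_PATH")))
```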
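Installing cuDNN 7
Download the cuDNN package built for your CUDA version, then install it (the file below is the build for CUDA 9.2):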
sudo dpkg -i libcudnn7_7.4.1.5-1+cuda9.2_amd64.deb
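Both the cuDNN version and the CUDA version it targets are encoded in the package file name, so you can check that they match before installing. A sketch, assuming the `libcudnn7_<cudnn version>-<build>+cuda<major.minor>_amd64.deb` naming seen in this post; `parse_cudnn_deb` and `matches_cuda` are hypothetical helpers, and the pattern may not cover every file NVIDIA ships.

```python
import re

# Assumed naming pattern, based on e.g. libcudnn7_7.4.1.5-1+cuda9.2_amd64.deb
CUDNN_DEB = re.compile(r"^libcudnn\d+(?:-dev)?_([\d.]+)-\d+\+cuda([\d.]+)_amd64\.deb$")


def parse_cudnn_deb(filename):
    """Return (cudnn_version, cuda_version) parsed from a cuDNN .deb file name."""
    match = CUDNN_DEB.match(filename)
    if match is None:
        raise ValueError("unrecognised cuDNN package name: " + filename)
    return match.group(1), match.group(2)


def matches_cuda(filename, installed_cuda):
    """True if the package was built for the installed CUDA major.minor version."""
    _, built_for = parse_cudnn_deb(filename)
    return built_for == installed_cuda


if __name__ == "__main__":
    name = "libcudnn7_7.4.1.5-1+cuda9.2_amd64.deb"
    print(parse_cudnn_deb(name), matches_cuda(name, "9.0"))
```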
Installing Python and virtualenv
sudo apt-get install -y python-pip python-dev python-virtualenv
Create a Python virtual environment:
virtualenv --system-site-packages ~/tensorflow
Activate the virtual environment:
source ~/tensorflow/bin/activate
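To confirm the environment is really active, you can check from inside Python. The classic `virtualenv` tool (as used here) sets `sys.real_prefix`, while the standard-library `venv` and newer virtualenv releases make `sys.prefix` differ from `sys.base_prefix`; the sketch checks both. `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when running inside a virtualenv or venv environment."""
    # Legacy virtualenv (including on Python 2) records the original prefix here
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv: base_prefix points at the base interpreter
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a virtual environment" if in_virtualenv() else "using the system Python")
```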
Installing NVIDIA TensorRT 5.0
Install TensorRT:
sudo dpkg -i nv-tensorrt-repo-ubuntu1604-cuda9.0-trt5.0.0.10-rc-20180906_1-1_amd64.deb
sudo apt-get update
sudo apt-get install tensorrt libcudnn7
sudo apt-get install uff-converter-tf graphsurgeon-tf
sudo apt-get install libcudnn7=7.3.0.29-1+cuda9.0 libcudnn7-dev=7.3.0.29-1+cuda9.0
sudo apt-mark hold libcudnn7 libcudnn7-dev
Install PyCUDA:
pip install 'pycuda>=2017.1.1'
Alternatively, install TensorRT from the tar package:
tar -zxvf TensorRT-5.0.0.10.Ubuntu-16.04.4.x86_64-gnu.cuda-10.0.cudnn7.3.tar.gz
# Use an absolute path so the library path still works after changing directories
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/TensorRT-5.0.0.10/lib
cd TensorRT-5.0.0.10/python
sudo pip2 install tensorrt-5.0.0.10-py2.py3-none-any.whl
cd ../uff
sudo pip2 install uff-0.5.1-py2.py3-none-any.whl
cd ../graphsurgeon
sudo pip2 install graphsurgeon-0.2.2-py2.py3-none-any.whl
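Whichever route you take, you can check that the Python packages import correctly. A minimal sketch using `importlib`: it reports a package's `__version__` when one is defined (NVIDIA's guide uses `tensorrt.__version__` for this), "unknown" when it isn't, and None when the import fails. `module_version` is a hypothetical helper name.

```python
import importlib


def module_version(name):
    """Return __version__, "unknown" if the module has none, or None if it won't import."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("tensorrt", "uff", "graphsurgeon", "pycuda"):
        version = module_version(name)
        print(name, "NOT importable" if version is None else version)
```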
Installing the GPU version of TensorFlow
First, upgrade pip:
easy_install -U pip
Install TensorFlow:
pip install --upgrade tensorflow-gpu
If that installation gives you trouble, try version 1.5 instead:
pip install tensorflow-gpu==1.5
Hello World
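If everything is in place, run a quick program to check that it works!
# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()  # TensorFlow 1.x API; creating the session logs the GPUs it found
print(sess.run(hello))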
If you see your GPUs listed in the output, you should be all set!
Here is my output (showing off a little):
2018-10-30 23:27:53.582391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device peer to peer matrix
2018-10-30 23:27:53.582698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] DMA: 0 1 2 3 4 5 6 7
2018-10-30 23:27:53.582708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 0: Y N N N N N N N
2018-10-30 23:27:53.582713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 1: N Y N N N N N N
2018-10-30 23:27:53.582718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 2: N N Y N N N N N
2018-10-30 23:27:53.582722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 3: N N N Y N N N N
2018-10-30 23:27:53.582727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 4: N N N N Y N N N
2018-10-30 23:27:53.582740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 5: N N N N N Y N N
2018-10-30 23:27:53.582751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 6: N N N N N N Y N
2018-10-30 23:27:53.582758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 7: N N N N N N N Y
2018-10-30 23:27:53.582770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: P106-100, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: P106-100, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: P106-100, pci bus id: 0000:03:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: P106-100, pci bus id: 0000:04:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:4) -> (device: 4, name: P106-100, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:5) -> (device: 5, name: P106-100, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:6) -> (device: 6, name: P106-100, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-10-30 23:27:53.582815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:7) -> (device: 7, name: P106-100, pci bus id: 0000:0b:00.0, compute capability: 6.1)
Hello, TensorFlow!
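With eight cards the log gets long. If you'd rather count the detected GPUs than eyeball them, here is a sketch that pulls the device index and name out of the `Creating TensorFlow device` lines. The regular expression is written against the TensorFlow 1.x log format shown above and may need adjusting for other versions; `gpus_from_log` is a hypothetical helper name.

```python
import re

# Matches e.g. "Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: P106-100, ..."
GPU_LINE = re.compile(
    r"Creating TensorFlow device \(/device:GPU:(\d+)\) -> \(device: \d+, name: ([^,]+),"
)


def gpus_from_log(log_text):
    """Return a list of (gpu_index, gpu_name) tuples found in a TensorFlow 1.x log."""
    return [(int(index), name) for index, name in GPU_LINE.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "2018-10-30 23:27:53.582770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] "
        "Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: P106-100, "
        "pci bus id: 0000:01:00.0, compute capability: 6.1)"
    )
    print(gpus_from_log(sample))
```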
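When it fails to run
If everything is installed but running it immediately fails with
tensorflow-gpu illegal instruction (core dumped)
consider trying Anaconda instead. Download and run the installer from here: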
https://www.anaconda.com/download/#linux
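One common cause of this error (an assumption about your setup, not something diagnosed above) is a CPU without AVX support: the prebuilt TensorFlow binaries from 1.6 onward are compiled to use AVX instructions, which would also explain why falling back to 1.5 can help. On Linux you can check the `flags` line of `/proc/cpuinfo`; a sketch, with `cpu_flags` and `has_avx` as hypothetical helper names:

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed on the 'flags' lines of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 lines look like "flags\t\t: fpu vme de pse ... avx avx2 ..."
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True if the CPU reports the AVX instruction set."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AVX supported" if has_avx(cpuinfo.read_text()) else "no AVX: try tensorflow-gpu==1.5")
```

Comparing whole flag tokens matters here: a substring search for "avx" would also match "avx512f" and similar entries.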
Once Anaconda is installed, run the following commands to create and activate a tensorflow environment:
conda create -n tensorflow
conda activate tensorflow
Then install tensorflow-gpu:
conda install tensorflow-gpu -n tensorflow
Install the related packages:
conda install -c anaconda matplotlib cudatoolkit _tflow_190_select PIL keras msgpack
Alternatively, create a dedicated GPU environment:
conda create -n tensorflow_gpuenv tensorflow-gpu
conda activate tensorflow_gpuenv
conda install -c anaconda matplotlib cudatoolkit _tflow_190_select PIL keras msgpack
Install additional packages with pip:
pip install --upgrade pip
pip install glob2 opencv-python
Check the TensorFlow version:
python -c "import tensorflow as tf; print(tf.__version__)"