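Prerequisites
- Install bazel
  Guide: https://bazel.build/versions/master/docs/install-compile-source.html
- Install gcc 4.7 or later (TensorFlow is written in C++11; gcc 4.8.3 is the recommended minimum)
  On CentOS, installing devtoolset-3 (gcc-c++ 4.9) from the SCL repository is recommended.
  Guide: https://www.softwarecollections.org/en/docs/
- Install python 2.7 or later (python 3.4 can be installed from the EPEL repository)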
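Notes:
Neither the TensorFlow build directory nor the current account's HOME directory may be on an NFS file system. Installing the built package is not subject to this restriction.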
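A quick way to check the NFS restriction before starting is sketched below. It assumes GNU coreutils `stat` (which reports the file system type with `-f -c %T`); the helper names `is_nfs_type` and `warn_if_nfs` are made up for this example.

```shell
# Sketch: warn if the build directory or $HOME is on NFS.
# Assumes GNU coreutils `stat`; helper names are illustrative only.

# Succeed if the given file system type name looks like NFS (nfs, nfs4, ...).
is_nfs_type() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Print a warning when the given path lives on an NFS mount.
warn_if_nfs() {
    fstype=$(stat -f -c %T "$1" 2>/dev/null) || fstype=""
    if is_nfs_type "$fstype"; then
        echo "WARNING: $1 is on NFS ($fstype); build somewhere else" >&2
    else
        echo "OK: $1 (${fstype:-unknown})"
    fi
}

warn_if_nfs "$PWD"
warn_if_nfs "${HOME:-/}"
```

Run it from the directory where TensorFlow will be unpacked; both paths should report a local file system.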
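Set these environment variables when running with MKL:
MKL_NUM_THREADS=<number of cores>
OMP_NUM_THREADS=<number of cores>
KMP_AFFINITY=granularity=fine,compact
Limit the thread count or disable hyper-threading; otherwise performance actually gets worse.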
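One possible way to fill in the core count is sketched here. It assumes that "number of cores" means physical cores, and that `lscpu -p` from util-linux prints its usual `CPU,Core,Socket,...` columns; if `lscpu` is missing it falls back to the logical CPU count, which includes hyper-threads. `physical_cores` is a hypothetical helper name.

```shell
# Sketch: derive the MKL/OpenMP thread counts from the physical core count.
# Assumes `lscpu -p` prints CPU,Core,Socket,... (util-linux); helper name is illustrative.

physical_cores() {
    # Count unique (core, socket) pairs; comment lines start with '#'.
    n=$(lscpu -p 2>/dev/null | grep -v '^#' | cut -d, -f2,3 | sort -u | wc -l)
    if [ "${n:-0}" -ge 1 ] 2>/dev/null; then
        echo "$n"
    else
        # Fallback: logical CPUs (counts hyper-threads as well).
        nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
    fi
}

cores=$(physical_cores)
export MKL_NUM_THREADS="$cores"
export OMP_NUM_THREADS="$cores"
export KMP_AFFINITY=granularity=fine,compact
echo "MKL_NUM_THREADS=$MKL_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Source this snippet in the shell that will launch the TensorFlow program so the exported variables are inherited.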
This build procedure applies only to RedHat 6 / CentOS 6.
Preparation
Download the files
Download tensorflow-1.1.0
wget https://github.com/tensorflow/tensorflow/archive/v1.1.0.zip
unzip v1.1.0.zip    # extracts to tensorflow-1.1.0/
mv tensorflow-1.1.0 tensorflow
# or
git clone --recurse-submodules https://github.com/tensorflow/tensorflow.git -b v1.1.0
# Rename to tensorflow-1.1.0-mkl. Optional, personal preference, but the paths below assume this name.
mv tensorflow tensorflow-1.1.0-mkl
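Download the MKLML library and place it in TensorFlow's third_party/mkl directory. The TensorFlow configure step looks for the MKL library files in this directory. In fact it downloads them automatically, but the network in mainland China is unreliable...
wget https://github.com/01org/mkl-dnn/releases/download/v0.5/mklml_lnx_2017.0.2.20170209.tgz
cp mklml_lnx_2017.0.2.20170209.tgz tensorflow-1.1.0-mkl/third_party/mkl/mklml_lnx_2017.0.2.20170209.tgz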
Modify the TensorFlow configuration files
tensorflow-1.1.0 does not enable MKL by default and has compatibility problems on RedHat 6 / CentOS 6, so a few settings need to be changed.
Edit tensorflow-1.1.0-mkl/configure
Find the following (line 91):
## Set up MKL related environment settings
if false; then # Disable building with MKL for now
Change it to
## Set up MKL related environment settings
if true; then # Disable building with MKL for now
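The same edit can be scripted instead of made by hand. This is only a sketch: it assumes GNU `sed` (for `-i`) and that the guard line in `configure` reads exactly as quoted above; `enable_mkl_guard` is a made-up helper name.

```shell
# Sketch: flip the MKL guard in configure without opening an editor.
# Assumes GNU sed and the exact guard line quoted above; helper name is illustrative.

enable_mkl_guard() {
    # Replace "if false" with "if true" only on the line carrying the MKL comment.
    sed -i 's/^if false; then\([[:space:]]*# Disable building with MKL for now\)/if true; then\1/' "$1"
}

# Usage (from the directory that contains tensorflow-1.1.0-mkl):
#   enable_mkl_guard tensorflow-1.1.0-mkl/configure
#   grep -n 'Disable building with MKL' tensorflow-1.1.0-mkl/configure
```

Matching on the trailing comment rather than on the line number keeps the substitution from touching any other `if false` in the file.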
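RedHat 6 / CentOS 6 are too old: their glibc (2.12) still keeps clock_gettime in librt rather than in libc (it moved into libc in glibc 2.17). For TensorFlow to run, add librt.so to the link options. Otherwise the build succeeds, but after installation you get link-symbol errors at run time such as _pywrap_tensorflow_internal.so: undefined symbol: clock_gettime.
Edit tensorflow-1.1.0-mkl/tensorflow/tensorflow.bzl
Find the following (line 787):
def tf_extension_linkopts():
  return [] # No extension link opts
Change it to
def tf_extension_linkopts():
  return ["-lrt"] # No extension link opts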
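As with the configure change, this edit can be applied with a one-line substitution. A minimal sketch, assuming GNU `sed` and that the `return []` line carries the comment shown above; `add_librt_linkopt` is a hypothetical helper name.

```shell
# Sketch: add -lrt to tf_extension_linkopts() in tensorflow.bzl.
# Assumes GNU sed and the exact "return []  # No extension link opts" line; helper name is illustrative.

add_librt_linkopt() {
    # Only the return statement followed by the original comment is rewritten.
    sed -i 's/return \[\]\([[:space:]]*# No extension link opts\)/return ["-lrt"]\1/' "$1"
}

# Usage (from the directory that contains tensorflow-1.1.0-mkl):
#   add_librt_linkopt tensorflow-1.1.0-mkl/tensorflow/tensorflow.bzl
#   grep -n -A1 'def tf_extension_linkopts' tensorflow-1.1.0-mkl/tensorflow/tensorflow.bzl
```

The `grep` in the usage lines is there to confirm the result by eye, since the substitution silently does nothing if the line differs in another TensorFlow version.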
Build TensorFlow
cd tensorflow-1.1.0-mkl
# Switch the compiler to gcc 4.9.
scl enable devtoolset-3 bash
# Configure TensorFlow
./configure
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3.4
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
# Do not use jemalloc, or the later build will fail (CentOS 7 / RedHat 7 do not have this problem)
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/lib/python3.4/site-packages
  /usr/lib64/python3.4/site-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3.4/site-packages]
Using python library path: /usr/lib/python3.4/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
# If you have an NVIDIA GPU and the CUDA Toolkit is installed
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 5.2
Configuration finished
Build
# CPU only, without MKL
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
# CPU only, with MKL (Intel processors only)
bazel build --config=opt --config=mkl //tensorflow/tools/pip_package:build_pip_package
# CPU only, with MKL, and the CPU is an Intel Xeon or Xeon Phi processor
bazel build --config=opt --config=mkl --copt="-DEIGEN_USE_VML" //tensorflow/tools/pip_package:build_pip_package
# With CUDA
bazel build --config=opt --config=mkl --copt="-DEIGEN_USE_VML" --config=cuda //tensorflow/tools/pip_package:build_pip_package
# Intel CPU + CUDA
bazel build --config=opt --config=mkl --config=cuda //tensorflow/tools/pip_package:build_pip_package
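The variants above differ only in which flags are added, so they can be assembled from three yes/no switches. The sketch below only prints the command for review; it does not run bazel. `bazel_build_cmd` and its argument order (MKL, VML, CUDA) are inventions of this example, while the flags themselves are the ones listed above.

```shell
# Sketch: compose the bazel command line from three yes/no switches.
# Helper name and argument order are illustrative; flags come from the list above.

# Arguments: $1 = use MKL, $2 = define EIGEN_USE_VML, $3 = use CUDA (each "yes" or "no").
bazel_build_cmd() {
    cmd="bazel build --config=opt"
    if [ "$1" = yes ]; then cmd="$cmd --config=mkl"; fi
    if [ "$2" = yes ]; then cmd="$cmd --copt=-DEIGEN_USE_VML"; fi
    if [ "$3" = yes ]; then cmd="$cmd --config=cuda"; fi
    echo "$cmd //tensorflow/tools/pip_package:build_pip_package"
}

# Example: CPU only, with MKL.
bazel_build_cmd yes no no
```

Printing rather than executing makes it easy to check the flags against the hardware before committing to a long build.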
Generate the Python whl package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install
sudo pip3 install /tmp/tensorflow_pkg/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl
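The `cp34` in the file name is the wheel's Python tag, so the `pip3` used here has to belong to the same python3.4 that was given to `./configure`. The sketch below extracts that tag from a wheel file name, relying on the standard wheel naming scheme (name, version, Python tag, ABI tag, platform); `wheel_python_tag` is a hypothetical helper.

```shell
# Sketch: read the Python tag out of a wheel file name.
# Relies on the standard wheel naming scheme; helper name is illustrative.

wheel_python_tag() {
    base=$(basename "$1" .whl)
    # Counting from the right: platform, ABI tag, Python tag.
    echo "$base" | awk -F- '{ print $(NF-2) }'
}

wheel_python_tag /tmp/tensorflow_pkg/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl

# To compare with the interpreter behind pip3, one option is:
#   python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
```

If the two tags differ, pip refuses the wheel as unsupported on this platform, which usually means `pip3` points at a different Python than the one used for the build.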
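Thank you for this detailed article! I have also been trying to build TensorFlow from source on CentOS 6 for the past few days, but bazel build always fails with an error about glibc-2.14 being missing.
Are you using the default glibc environment? (2.12?)
Also, which bazel version did you install? (Or could you describe how you installed it on CentOS 6?)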
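bazel has to be compiled and installed from source. For the procedure, just follow the official guide: https://bazel.build/versions/master/docs/install-compile-source.html
The version I used is bazel release 0.4.5.
The server only has glibc-2.12 (the one shipped with CentOS 6 / RedHat 6); the only thing I installed explicitly was yum install devtoolset-3-gcc-c++.
Everything else I installed with yum whenever the build reported a missing tool. Apart from bazel, nothing was installed from outside the redhat/centos/epel/rh repositories.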
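For anyone who wants to confirm which glibc a machine actually has, here is a small sketch. It assumes a glibc system where `getconf GNU_LIBC_VERSION` prints something like `glibc 2.12`, and a `sort` that supports `-V`; `version_lt` is a made-up helper and 2.14 is simply the version named in the error above.

```shell
# Sketch: report the local glibc version and compare it with 2.14.
# Assumes glibc's getconf and `sort -V`; helper name is illustrative.

# Succeed if version $1 is strictly lower than version $2.
version_lt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

have=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{ print $2 }') || have=""
if [ -z "$have" ]; then
    echo "could not determine the glibc version"
elif version_lt "$have" 2.14; then
    echo "glibc $have is older than 2.14: prebuilt binaries that need 2.14 will not run"
else
    echo "glibc $have"
fi
```

On a stock CentOS 6 / RedHat 6 machine this should take the "older than 2.14" branch, which is why bazel is built from source there.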
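Hello, thanks for sharing. I have a question: following the article at https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture, I found that the MKL-based TensorFlow (I mainly use the conv2d and conv2d_transpose functions) is actually slower. Have you run into this? Thanks.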
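Disabling hyper-threading improves performance. I have also found that MKL is not as fast as expected; probably TensorFlow's own interaction with the library is not well optimized for certain workloads.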
If my machine has several Phi processors, can the installed TensorFlow automatically schedule work across them concurrently at run time?