Building and installing TensorFlow + MKL from source on CentOS 6

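Prerequisites

Install bazel
Guide: https://bazel.build/versions/master/docs/install-compile-source.html
Install gcc 4.7 or later (TensorFlow is written in C++11; gcc 4.8.3 is the recommended minimum)
Installing devtoolset-3 (gcc-c++ 4.9) from the CentOS SCL repository is recommended here
Guide: https://www.softwarecollections.org/en/docs/
Install Python 2.7 or later (Python 3.4 can be installed from the EPEL repository)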
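
As a quick sanity check of the compiler prerequisite above, the following sketch compares the gcc on PATH against the recommended minimum of 4.8.3. The helper names version_ge and check_gcc are made up for this illustration, and the comparison assumes GNU sort with the -V (version sort) option.

```shell
# version_ge A B: succeed when version A >= version B.
# Hypothetical helper; relies on GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_gcc: report whether the gcc on PATH meets the recommended minimum.
check_gcc() {
    local have
    have=$(gcc -dumpversion 2>/dev/null) || { echo "gcc not found"; return 1; }
    if version_ge "$have" 4.8.3; then
        echo "gcc $have is new enough"
    else
        echo "gcc $have is too old; try again inside: scl enable devtoolset-3 bash"
        return 1
    fi
}
```

Calling check_gcc in the stock CentOS 6 shell and again inside the devtoolset-3 shell should show the difference between the system compiler and the SCL one.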

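Notes:
Neither the TensorFlow build directory nor the current user's HOME directory may be on an NFS file system. Installing the built package is not subject to this restriction.
Set these environment variables when running with MKL:
MKL_NUM_THREADS=<number of cores>
OMP_NUM_THREADS=<number of cores>
KMP_AFFINITY=granularity=fine,compact
Limit the thread count as above or disable hyper-threading; otherwise performance actually gets worse.
This build procedure applies only to RedHat 6 and CentOS 6.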
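
The notes above can be turned into a small environment setup, sketched below. It assumes that "number of cores" means physical cores (unique core/socket pairs reported by lscpu), which is an interpretation rather than something the notes spell out, and that GNU stat is available for the NFS check. The function names physical_cores and on_nfs are hypothetical.

```shell
# physical_cores: count unique (core, socket) pairs, ignoring hyper-threads.
# Falls back to the logical CPU count when lscpu gives no usable output.
physical_cores() {
    local n
    n=$(lscpu -p 2>/dev/null | grep -v '^#' | cut -d, -f2,3 | sort -u | wc -l)
    if [ "${n:-0}" -ge 1 ]; then
        echo "$n"
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# on_nfs PATH: succeed when PATH lives on an NFS mount (GNU stat).
on_nfs() {
    case "$(stat -f -c %T "$1" 2>/dev/null)" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Runtime settings for MKL, using the values from the notes.
CORES=$(physical_cores)
export MKL_NUM_THREADS="$CORES"
export OMP_NUM_THREADS="$CORES"
export KMP_AFFINITY=granularity=fine,compact

# Warn before building if HOME or the current directory is on NFS.
for dir in "$HOME" "$PWD"; do
    if on_nfs "$dir"; then
        echo "warning: $dir is on NFS; build on a local disk instead" >&2
    fi
done
```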

Preparation

Download the files

Download tensorflow-1.1.0

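wget https://github.com/tensorflow/tensorflow/archive/v1.1.0.zip
# The archive unpacks to tensorflow-1.1.0/; rename it so both download methods end up in tensorflow/
unzip v1.1.0.zip && mv tensorflow-1.1.0 tensorflow
# or
git clone --recurse-submodules https://github.com/tensorflow/tensorflow.git -b v1.1.0
# Rename to tensorflow-1.1.0-mkl (optional, personal preference)
mv tensorflow tensorflow-1.1.0-mkl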

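Download the mklml library and place it in TensorFlow's third_party/mkl directory. When the build is configured, TensorFlow checks this directory for the MKL library files. The archive would in fact be downloaded automatically, but the network connection from mainland China is unreliable, so fetching it by hand is safer.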

wget https://github.com/01org/mkl-dnn/releases/download/v0.5/mklml_lnx_2017.0.2.20170209.tgz
cp mklml_lnx_2017.0.2.20170209.tgz tensorflow-1.1.0-mkl/third_party/mkl/mklml_lnx_2017.0.2.20170209.tgz
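
Because the point of the manual download is to avoid a flaky connection, it may be worth confirming that the copied archive is complete before configuring. A minimal sketch, with a hypothetical helper name archive_in_place:

```shell
# archive_in_place DIR FILE: succeed when DIR/FILE exists and tar can list it
# as a gzip-compressed archive (a truncated download fails this check).
archive_in_place() {
    local dir=$1 file=$2
    [ -f "$dir/$file" ] && tar -tzf "$dir/$file" >/dev/null 2>&1
}
```

For the file names used above, the call would be: archive_in_place tensorflow-1.1.0-mkl/third_party/mkl mklml_lnx_2017.0.2.20170209.tgz && echo "archive looks complete"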

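Modify the TensorFlow configuration files

tensorflow-1.1.0 does not enable MKL by default and has compatibility problems on RedHat 6/CentOS 6, so a few settings need to be changed.
Edit tensorflow-1.1.0-mkl/configure
Find the following (line 91):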

## Set up MKL related environment settings
if false; then # Disable building with MKL for now

Change it to:

## Set up MKL related environment settings
if true; then # MKL build enabled
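
If you prefer not to edit the file by hand, the same change can be sketched with sed. The function name enable_mkl is hypothetical. It matches the guard by its text rather than by line number, keeps a backup as FILE.orig, and returns non-zero if the guard was not found.

```shell
# enable_mkl FILE: flip the hard-coded "if false" MKL guard to "if true".
enable_mkl() {
    sed -i.orig \
        's/^if false; then # Disable building with MKL for now$/if true; then # MKL build enabled/' \
        "$1"
    # Confirm the edited guard is now present.
    grep -q '^if true; then # MKL build enabled$' "$1"
}
```

For this walkthrough the call would be: enable_mkl tensorflow-1.1.0-mkl/configure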

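RedHat 6/CentOS 6 is too old, so for TensorFlow to run correctly, add librt.so to the link options. (Otherwise the build succeeds, but after installation you get undefined-symbol link errors at runtime such as _pywrap_tensorflow_internal.so: undefined symbol: clock_gettime.)
Edit tensorflow-1.1.0-mkl/tensorflow/tensorflow.bzl
Find the following (line 787):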

def tf_extension_linkopts():
    return []  # No extension link opts

Change it to:

def tf_extension_linkopts():
    return ["-lrt"]  # Link librt so clock_gettime resolves on old glibc
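
The background for this edit is that clock_gettime lived in librt until glibc 2.17, when it moved into libc itself, and RedHat 6/CentOS 6 ships an older glibc. The sketch below checks whether a given glibc version is old enough to need the extra link option. The helper names are hypothetical, and the comparison assumes GNU sort -V.

```shell
# needs_librt VERSION: succeed when the given glibc version is older than 2.17.
needs_librt() {
    [ "$1" != "2.17" ] &&
    [ "$(printf '%s\n%s\n' "$1" 2.17 | sort -V | head -n 1)" = "$1" ]
}

# glibc_version: print the running glibc version.
# On glibc systems `getconf GNU_LIBC_VERSION` prints e.g. "glibc 2.12".
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}
```

A typical call is: needs_librt "$(glibc_version)" && echo "keep the -lrt link option"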

Build TensorFlow

cd tensorflow-1.1.0-mkl
# Switch the compiler to gcc 4.9.
scl enable devtoolset-3 bash
# Configure TensorFlow
./configure
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3.4
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 
# Do not use jemalloc, or the later build will fail (CentOS 7/RedHat 7 are not affected)
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] 
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] 
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] 
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/lib/python3.4/site-packages
  /usr/lib64/python3.4/site-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3.4/site-packages]
Using python library path: /usr/lib/python3.4/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] 
No OpenCL support will be enabled for TensorFlow
# If you have an NVIDIA GPU and the CUDA Toolkit is already installed
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 5.2
Configuration finished
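
The prompts above are answered by hand. If you want to script this step, a cautious starting point is to look at which environment variables the configure script itself mentions instead of guessing their names. The sketch below only lists candidates. It assumes the script uses TF_-prefixed variable names, and the function name list_configure_vars is hypothetical, so read the script around each match before relying on a variable.

```shell
# list_configure_vars FILE: print the distinct TF_* names mentioned in FILE.
list_configure_vars() {
    grep -o 'TF_[A-Z0-9_]*' "$1" | sort -u
}
```

Run it from the source directory as: list_configure_vars ./configure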

Build

# CPU only, without MKL
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
# CPU only, with MKL (Intel processors only)
bazel build --config=opt --config=mkl //tensorflow/tools/pip_package:build_pip_package
# CPU only, with MKL, when the CPU is an Intel Xeon or Phi processor
bazel build --config=opt --config=mkl --copt="-DEIGEN_USE_VML" //tensorflow/tools/pip_package:build_pip_package
# CUDA enabled, Intel Xeon or Phi processor
bazel build --config=opt --config=mkl --copt="-DEIGEN_USE_VML" --config=cuda //tensorflow/tools/pip_package:build_pip_package
# CUDA enabled, other Intel CPU
bazel build --config=opt --config=mkl --config=cuda //tensorflow/tools/pip_package:build_pip_package
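
The variants above differ only in which options are appended, so the choice can be expressed as a small helper that prints the command for a given combination. This is a sketch with a hypothetical function name, tf_build_cmd, and hypothetical option keywords (mkl, vml, cuda). It uses only the bazel options already shown.

```shell
# tf_build_cmd [mkl] [vml] [cuda]: print the bazel command for the chosen options.
tf_build_cmd() {
    local cmd="bazel build --config=opt" opt
    for opt in "$@"; do
        case "$opt" in
            mkl)  cmd="$cmd --config=mkl" ;;
            vml)  cmd="$cmd --copt=\"-DEIGEN_USE_VML\"" ;;
            cuda) cmd="$cmd --config=cuda" ;;
            *)    echo "unknown option: $opt" >&2; return 1 ;;
        esac
    done
    echo "$cmd //tensorflow/tools/pip_package:build_pip_package"
}
```

For example, tf_build_cmd mkl vml prints the Xeon/Phi CPU-only command, which can then be reviewed and pasted into the devtoolset-3 shell.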

Build the Python wheel package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install

sudo pip3 install /tmp/tensorflow_pkg/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl
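
The wheel file name depends on the Python version used at configure time, so it can be convenient to look the file up instead of typing it, and to confirm afterwards that the module imports. A sketch with hypothetical helper names:

```shell
# newest_wheel DIR: print the most recently modified tensorflow wheel in DIR.
newest_wheel() {
    ls -t "$1"/tensorflow-*.whl 2>/dev/null | head -n 1
}

# check_import PYTHON MODULE: succeed when MODULE imports under PYTHON.
# A missing -lrt link option would show up here as the undefined symbol
# error for clock_gettime.
check_import() {
    "$1" -c "import $2"
}
```

Usage would look like: sudo pip3 install "$(newest_wheel /tmp/tensorflow_pkg)" followed by check_import python3.4 tensorflow. Run the import check from a directory outside the source tree, since importing from inside the TensorFlow source directory fails for unrelated reasons.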

5 comments.

Yao

June 26th, 2019 at 01:37 am

If my machine has several Phi processors, can the installed TensorFlow automatically schedule work across them concurrently at run time?

elfin

August 12th, 2017 at 07:06 pm

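Hello, and thank you for sharing. I would like to ask you something: following the article https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture, I found that the MKL-based TensorFlow (mainly using the conv2d and conv2d_transpose functions) was actually slower. Have you run into this problem? Thanks.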

1. zhuolin
  
  September 19th, 2017 at 10:08 am
  
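  Disabling hyper-threading improves performance. I also found that MKL is not as fast as expected; probably TF itself has not optimized its interaction with the library for certain tasks.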
  
JONAS

May 31st, 2017 at 05:36 pm

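Thank you for this detailed article! I have in fact been trying to build TensorFlow from source on CentOS 6 these past few days, but bazel build always fails with an error message about glibc-2.14 being missing.
Are you using the default glibc environment? (2.12?)
Also, which version of bazel did you install? (Or could you describe how you installed it on CentOS 6?)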

1. zhuolin
  
  June 10th, 2017 at 04:29 pm
  
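  bazel has to be built and installed from source; for the procedure, follow the official guide at https://bazel.build/versions/master/docs/install-compile-source.html
  The version used is bazel release 0.4.5
  The server only has glibc-2.12 (the one shipped with CentOS 6/RedHat 6); the only compiler step was yum install devtoolset-3-gcc-c++
  Beyond that, whenever the build reported a missing tool I installed it with yum. Apart from bazel, no components from outside the redhat/centos/epel/rh repositories were installed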
