Deep Learning with TensorFlow and Keras ① [Part 2: Image Analysis with GPU Containers, Preparation]

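[Series] Making Use of GPU Containers [6 parts]
What Is a GPU Container, and Why Is It Useful? [Part 1: Rapid Environment Setup with GPU Containers]
Deep Learning with TensorFlow and Keras ① [Part 2: Image Analysis with GPU Containers, Preparation]
Deep Learning with TensorFlow and Keras ② [Part 3: Image Analysis with GPU Containers, Hands-On]
Deep Learning with Chainer [Part 4: Machine Learning with GPU Containers]
Machine Learning with PyTorch [Part 5: Understanding Tensor Basics with GPU Containers]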

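Setting up a GPU environment is one of the hurdles for newcomers to AI. We started this series to show how containers can make that setup as efficient as possible. This installment covers TensorFlow, one of the major deep learning frameworks.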

About this series
How this series proceeds
Recap of Part 1
- Checking the environment above
Running TensorFlow from a container
- Downloading the GPU-enabled TensorFlow Docker image
Next time: Deep Learning with TensorFlow and Keras ② (Part 3)

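About this series

Across its six parts, this series uses containers to set up a GPU environment as efficiently as possible. It is aimed at readers who would like to try machine learning or deep learning on a GPU but find getting started daunting. With containers you can substantially cut the effort spent on environment setup and get to the actual problem right away. We explain the knowledge and operations this requires using our own GPU server.
By the end of the series, you should be able to work through exercises with major frameworks such as TensorFlow and PyTorch.

How this series proceeds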

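・What Is a GPU Container, and Why Is It Useful? (Part 1)

When newcomers to AI want to use a GPU for machine learning or deep learning, environment setup can take far more effort than expected. To keep setup time to a minimum, we apply container technology and walk through the preparation and steps needed to use a GPU from inside a container.

●What Is a GPU Container, and Why Is It Useful? (Part 1)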
https://www.kagoya.jp/howto/cloud/gpu-container1/

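・Deep Learning with TensorFlow and Keras ① (Part 2): this article

From among the open-source software (OSS) machine learning libraries we pick TensorFlow and Keras, create a container in which they run, and show how to specify the GPU from inside that container. TensorFlow is released with Keras built in, and many users choose it because it is convenient for deep learning. In this part we go as far as confirming that TensorFlow inside the container can use the GPU.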

・Deep Learning with TensorFlow and Keras ② (Part 3)

We take up Keras, which can run on top of several backends such as TensorFlow, Theano, and CNTK (Cognitive Toolkit), and carry out deep learning with TensorFlow and Keras.

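・Deep Learning with Chainer (Part 4)

We introduce Chainer, a well-known deep learning framework. We cover the steps for using a GPU from a container and check, through the results of running Python programs, how much speedup the GPU provides. Python is often said to be slower than C, so we also look at NumPy, an extension module for efficient numerical computation.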

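・Neural Networks with PyTorch (Part 5)

We take up PyTorch, a machine learning framework for Python. PyTorch represents matrices with a type called Tensor. A Tensor is a data structure for multidimensional arrays and supports GPUs, so we use a container that runs PyTorch and show the steps for fast processing on the GPU.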

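・Joint Keypoint Extraction and Pose Estimation with OpenPose (Part 6)

When people hear "AI image recognition on camera footage", many think of face recognition. Recently, however, more projects go a step further and extract joint keypoints and estimate pose from still images or video of people. This is because deep learning has brought the detection of keypoints on the body, face, hands, and feet up to a practical level. In this part we use a container that runs the OpenPose library on a GPU and show the steps for extracting joint keypoints from a video file.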

Recap of Part 1

In Part 1 we used Docker to make the GPU available to containers and built the following environment.
[Environment]
OS: Ubuntu 16.04 LTS
GPU: NVIDIA Tesla P40
CUDA version: 10.2
Docker version: 19.03.2

Checking the environment above

We check each item of this environment in turn.

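・Checking the PCI device

Use the lspci command to check the PCI device information. Here the output is piped through grep, searching case-insensitively for the string nvidia, to narrow down what is displayed.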

$ lspci -v | grep -i nvidia
03:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
Subsystem: NVIDIA Corporation Device 11d9
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_440, nvidia_440_drm

Information about the graphics board carrying the Tesla P40 is displayed.
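If you want to run the same check from a script, the filtering that grep performs can be sketched in Python. This is only a minimal illustration: the function name find_nvidia_devices and the regular expression are assumptions made for this example, and the sample text is copied from the output above rather than read from a live lspci call.

```python
import re
from typing import Dict, List

# Device header lines look like "03:00.0 3D controller: NVIDIA Corporation ...".
# The pattern is an assumption based on the output shown in this article.
HEADER = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+"
    r"(?P<cls>[^:]+):\s+(?P<name>.*nvidia.*)$",
    re.IGNORECASE,
)


def find_nvidia_devices(lspci_output: str) -> List[Dict[str, str]]:
    """Return one dict per NVIDIA device header line found in lspci output."""
    devices = []
    for line in lspci_output.splitlines():
        match = HEADER.match(line.strip())
        if match:
            devices.append(
                {"slot": match["slot"], "class": match["cls"], "name": match["name"]}
            )
    return devices


# Sample copied from the article; detail lines without a slot are skipped.
SAMPLE = """03:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
Subsystem: NVIDIA Corporation Device 11d9
Kernel driver in use: nvidia
"""

devices = find_nvidia_devices(SAMPLE)
for device in devices:
    print(device["slot"], device["class"], device["name"])
```

On a real host you would pass the text captured from lspci instead of SAMPLE.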

・Checking the NVIDIA Container Toolkit

Run the nvidia-container-cli info command to confirm that the NVIDIA Container Toolkit is installed.

$ nvidia-container-cli info
NVRM version: 440.64.00
CUDA version: 10.2

Device Index: 0
Device Minor: 0
Model: Tesla P40
Brand: Tesla
GPU UUID: GPU-ccffd358-4ab2-0dab-10cf-78df33604be1
Bus Location: 00000000:03:00.0
Architecture: 6.1

The CUDA version and the GPU model information are displayed.
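The output above is a list of "key: value" lines, so it is easy to read from a script. The following is a rough sketch, assuming a single GPU as in this article. The helper name parse_key_values is hypothetical, and the sample is an excerpt of the output shown above.

```python
from typing import Dict


def parse_key_values(text: str) -> Dict[str, str]:
    """Parse "key: value" lines into a dict.

    Only the first colon separates key from value, so values such as
    "00000000:03:00.0" stay intact. With several GPUs the per-device keys
    would overwrite each other; this sketch assumes one GPU.
    """
    info = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator and value.strip():
            info[key.strip()] = value.strip()
    return info


# Excerpt copied from the nvidia-container-cli info output in this article.
SAMPLE = """NVRM version: 440.64.00
CUDA version: 10.2

Device Index: 0
Model: Tesla P40
Bus Location: 00000000:03:00.0
Architecture: 6.1
"""

info = parse_key_values(SAMPLE)
print(info["Model"], "with CUDA", info["CUDA version"])
```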

・Checking the Docker version

Run the docker version command to check the version.

$ docker version
Client: Docker Engine - Community
Version: 19.03.2
API version: 1.40
Go version: go1.12.8
Git commit: 6a30dfc
Built: Thu Aug 29 05:28:19 2019
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 19.03.7
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: 7141c199a2
Built: Wed Mar 4 01:21:22 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683

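For both the Docker client and the server, we can see the Docker version, the build date and time, and the OS and architecture. Note that the client is 19.03.2 while the server engine is 19.03.7.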
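The version matters because the --gpus option used in this article was introduced in Docker 19.03. As a small sketch, a script could compare version strings like this. The function names version_tuple and supports_gpus_flag are made up for this example, and the version strings are the ones printed above.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn "19.03.7" into (19, 3, 7); a missing patch number becomes 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def supports_gpus_flag(version: str) -> bool:
    """True when the version is at least 19.03, where --gpus became available."""
    return version_tuple(version) >= (19, 3)


# Client and server versions taken from the docker version output above.
for name, version in (("client", "19.03.2"), ("server", "19.03.7")):
    print(name, version, "supports --gpus:", supports_gpus_flag(version))
```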

・Checking the GPU from a Docker container

Run the nvidia-smi command inside a Docker container to check that the GPU is recognized.

$ docker run --gpus all nvidia/cuda nvidia-smi
Fri Jul 17 01:10:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P40 Off | 00000000:03:00.0 Off | 0 |
| N/A 27C P8 9W / 250W | 10MiB / 22919MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

We have confirmed that the GPU (Tesla P40) is recognized from inside the Docker container.
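To pull a number such as the GPU memory usage out of this table from a script, a regular expression is enough for a quick check. The sketch below is an assumption-laden example: parse_memory_usage is a hypothetical helper, and the sample row is copied from the table above. For anything more serious, the machine-readable form nvidia-smi --query-gpu=memory.used,memory.total --format=csv is likely to be more robust than scraping the table.

```python
import re
from typing import List, Tuple

# Matches the "used / total" memory column, e.g. "10MiB / 22919MiB".
MEMORY_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def parse_memory_usage(nvidia_smi_output: str) -> List[Tuple[int, int]]:
    """Return (used_mib, total_mib) for every GPU row in the table text."""
    return [
        (int(used), int(total))
        for used, total in MEMORY_PATTERN.findall(nvidia_smi_output)
    ]


# Row copied from the nvidia-smi table in this article.
SAMPLE = "| N/A 27C P8 9W / 250W | 10MiB / 22919MiB | 0% Default |"

usage = parse_memory_usage(SAMPLE)
for used, total in usage:
    print(f"{used} MiB used of {total} MiB")
```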

Running TensorFlow from a container

Downloading the GPU-enabled TensorFlow Docker image

Download and run the official GPU-enabled TensorFlow Docker image from the Docker Hub repository.

$ docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2020-07-17 02:13:12.860229: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-17 02:13:12.864310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: Tesla P40 computeCapability: 6.1
coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
2020-07-17 02:13:12.864490: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-17 02:13:12.866178: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-17 02:13:12.867793: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-17 02:13:12.868044: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-17 02:13:12.869794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-17 02:13:12.870677: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-17 02:13:12.874366: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-17 02:13:12.876594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-17 02:13:12.876903: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-17 02:13:12.890118: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3499645000 Hz
2020-07-17 02:13:12.891104: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fed98000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-17 02:13:12.891135: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-17 02:13:13.012255: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x477cfc0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-17 02:13:13.012312: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P40, Compute Capability 6.1
2020-07-17 02:13:13.015060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: Tesla P40 computeCapability: 6.1
coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
2020-07-17 02:13:13.015133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-17 02:13:13.015159: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-17 02:13:13.015182: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-17 02:13:13.015204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-17 02:13:13.015241: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-17 02:13:13.015281: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-17 02:13:13.015315: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-17 02:13:13.019195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-17 02:13:13.019243: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-17 02:13:13.022311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-17 02:13:13.022340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-17 02:13:13.022354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-17 02:13:13.027246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21397 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:03:00.0, compute capability: 6.1)
tf.Tensor(1241.9949, shape=(), dtype=float32)
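The last line is the result of the one-liner: the sum of 1000 x 1000 values drawn from a standard normal distribution, which is what tf.random.normal produces by default. Such a sum has mean 0 and standard deviation sqrt(1,000,000) = 1000, so a value like 1241.9949 is about 1.24 standard deviations from zero and entirely plausible. The following standard-library sketch reproduces the same kind of computation on the CPU without TensorFlow. The function name and the seed are arbitrary choices for this example, and the value it prints will differ from the one above because the random numbers differ.

```python
import math
import random


def sum_of_standard_normals(count: int, seed: int = 0) -> float:
    """Sum `count` samples from a normal distribution with mean 0, stddev 1."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) for _ in range(count))


count = 1000 * 1000              # same number of elements as the [1000, 1000] tensor
expected_std = math.sqrt(count)  # standard deviation of the sum
total = sum_of_standard_normals(count)

print(f"sum = {total:.4f}")
print(f"standard deviation of the sum = {expected_std}")
print(f"distance from zero in standard deviations = {abs(total) / expected_std:.2f}")
```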

To confirm, list the container images whose REPOSITORY is tensorflow/tensorflow.

$ docker images tensorflow/tensorflow
REPOSITORY              TAG                      IMAGE ID            CREATED             SIZE
tensorflow/tensorflow   latest-gpu               f5ba7a196d56        2 months ago        3.84GB

This confirms that the container image has been downloaded and is available locally.

Using the latest TensorFlow GPU image, start a bash shell session inside a container.

$ docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash

________                               _______________
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/

WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

root@0a1280997aba:/#
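The warning in the banner suggests passing -u with your own user ID and group ID so that files created in mounted volumes are not owned by root. As one way to picture the resulting command line, the sketch below assembles it as an argument list in Python. The helper build_run_command is hypothetical, and the IDs 1000:1000 are placeholder values; in practice you would use your own IDs, for example from os.getuid() and os.getgid().

```python
import shlex
from typing import List, Sequence


def build_run_command(
    image: str, command: Sequence[str], uid: int, gid: int, gpus: str = "all"
) -> List[str]:
    """Build a docker run argument list with GPU access and a non-root user."""
    return [
        "docker", "run",
        "--gpus", gpus,        # expose the GPUs to the container
        "-it",                 # interactive terminal
        "-u", f"{uid}:{gid}",  # run as this user instead of root
        image,
        *command,
    ]


# 1000:1000 is a placeholder for the IDs of the user on the host.
args = build_run_command("tensorflow/tensorflow:latest-gpu", ["bash"], uid=1000, gid=1000)
print(" ".join(shlex.quote(arg) for arg in args))
```

The list can be handed to subprocess.run as is, which avoids quoting problems that come with building a single shell string.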

Check the GPU from Python.

root@2ec6a2a6b71b:/# python
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()

Output like the following appears, ending with device:GPU:0.

2020-07-17 04:40:26.355144: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-17 04:40:26.369373: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3499645000 Hz
2020-07-17 04:40:26.370812: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f54e8000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-17 04:40:26.370847: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-17 04:40:26.375080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-17 04:40:26.497987: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5032be0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-17 04:40:26.498049: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P40, Compute Capability 6.1
2020-07-17 04:40:26.500908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: Tesla P40 computeCapability: 6.1
coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
2020-07-17 04:40:26.501319: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-17 04:40:26.504781: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-17 04:40:26.507834: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-17 04:40:26.508346: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-17 04:40:26.511885: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-17 04:40:26.513896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-17 04:40:26.521098: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-17 04:40:26.525612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-17 04:40:26.525668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-17 04:40:26.530543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-17 04:40:26.530574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-17 04:40:26.530585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-17 04:40:26.534811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 21397 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:03:00.0, compute capability: 6.1)
'/device:GPU:0'

The final device:GPU:0 shows that the GPU has been assigned correctly and that it can be used from the Python shell inside the container.
(TensorFlow is now able to use the GPU.)
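The same check can be put into a small script. tf.test.gpu_device_name() returns a string such as '/device:GPU:0' when a GPU is available and an empty string when none is found. The sketch below interprets that string with a hypothetical helper, describe_gpu_device, and only calls TensorFlow if it can be imported. It assumes TensorFlow 2.1 or later for tf.config.list_physical_devices.

```python
def describe_gpu_device(device_name: str) -> str:
    """Explain the string returned by tf.test.gpu_device_name()."""
    if not device_name:
        return "No GPU is visible to TensorFlow"
    # "/device:GPU:0" -> "GPU:0" -> ("GPU", "0")
    _, _, type_and_index = device_name.strip("/").partition(":")
    device_type, _, index = type_and_index.partition(":")
    return f"TensorFlow can use {device_type} number {index}"


try:
    # Available inside the tensorflow/tensorflow:latest-gpu container.
    import tensorflow as tf
except ImportError:
    tf = None

if tf is not None:
    print(describe_gpu_device(tf.test.gpu_device_name()))
    # Assumes TensorFlow 2.1 or later; lists every GPU TensorFlow can see.
    print(tf.config.list_physical_devices("GPU"))
else:
    # Without TensorFlow, show the result for the value seen in this article.
    print(describe_gpu_device("/device:GPU:0"))
```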

Next time: Deep Learning with TensorFlow and Keras ② (Part 3)

In the next part we will work on deep learning with Keras, a wrapper that fits closely with TensorFlow.
