<h1>Installing Docker on Ubuntu and Getting Containers to Recognize the GPU</h1>
<p>Published 2022/09/04. Topics: Docker, NVIDIA, CUDA (tech).</p>
<p>Notes on installing Docker on Ubuntu 20.04 and getting an NVIDIA GPU recognized inside a container. For reference, the machine used here has an NVIDIA RTX 3070.</p>
<h3>Installing the NVIDIA driver</h3>
<p>Installing the latest version is fine for now.</p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> sudo add-apt-repository ppa:graphics-drivers/ppa
> sudo apt update
> sudo apt search nvidia-driver-
nvidia-driver-515/focal-security 515.86.01-0ubuntu0.20.04.1 amd64 [upgradable from: 515.65.01-0ubuntu0.20.04.1]
  NVIDIA driver metapackage
> sudo apt install nvidia-driver-515
> sudo reboot
</code></pre></div>
<h3>Installing CUDA</h3>
<p>This is not needed if the GPU is only used through Docker, but I also want to use it on the host, so I installed it anyway. The CUDA version has to match the framework you use, so I installed version 11.2. (Which is exactly why doing this in Docker is the better option.)</p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
> sudo sh cuda_11.2.2_460.32.03_linux.run
</code></pre></div>
<p>Add the following to .bashrc:</p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> vi ~/.bashrc
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64
> source ~/.bashrc
> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
</code></pre></div>
<p>Check from the command prompt:</p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Built on Mon_May__3_19:41:42_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
</code></pre></div>
<h3>Installing cuDNN</h3>
<p>cuDNN does not work unless you install the build that matches your CUDA version.</p>
<p><a href="https://qiita.com/konzo_/items/a6f2e8818e5e8fcdb896" style="" target="_blank" rel="nofollow noopener noreferrer">https://qiita.com/konzo_/items/a6f2e8818e5e8fcdb896</a></p>
<p>You need to log in with an NVIDIA account to download it.</p>
<p><a
href="https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.1.1.33/11.2_20210301/cudnn-11.2-linux-x64-v8.1.1.33.tgz" style="" target="_blank" rel="nofollow noopener noreferrer">https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.1.1.33/11.2_20210301/cudnn-11.2-linux-x64-v8.1.1.33.tgz</a></p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> tar xzvf cudnn-11.2-linux-x64-v8.1.1.33.tgz
> sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
> sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
> sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
</code></pre></div>
<h3>Installing Docker</h3>
<p>Once the NVIDIA driver and related components are installed, install Docker.</p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> sudo apt update && sudo apt install ca-certificates curl gnupg lsb-release
> sudo mkdir -m 0755 -p /etc/apt/keyrings <span class="token
operator">&&</span> curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
> echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
> sudo apt update && sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
</code></pre></div><div class="code-block-container"><pre class="language-bash"><code class="language-bash">> docker -v
Docker version 23.0.1, build a5ee5b1
> docker compose version
Docker Compose version v2.16.0
</code></pre></div>
<h3><a class="header-anchor-link"
href="#nvidia-container-toolkit%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB" aria-hidden="true"/> Installing the NVIDIA Container Toolkit</h3>
<p>Docker alone is not enough; the NVIDIA Container Toolkit has to be installed as well.</p>
<p><a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit" style="" target="_blank" rel="nofollow noopener noreferrer">https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit</a></p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s <span
class="token parameter variable">-L</span> https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
> sudo apt update && sudo apt install -y nvidia-container-toolkit
</code></pre></div>
<p>Running the <strong>nvidia-ctk</strong> command below is the important step. As I recall, when I skipped it and specified <code>runtime: nvidia</code> in docker compose, I got an error along the lines of "no such service".</p>
<p>Once this is done, restart Docker and the installation is complete.</p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> sudo nvidia-ctk runtime configure --runtime=docker
INFO[0000] Loading docker config from /etc/docker/daemon.json
INFO[0000] Config file does not exist, creating new one
INFO[0000] Flushing docker config to /etc/docker/daemon.json
INFO[0000] Successfully flushed config
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that the docker daemon be restarted.
> cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
> sudo systemctl restart docker
</code></pre></div>
<h3>Running a GPU benchmark from inside a container</h3>
<p>If the GPU is not recognized, the benchmark cannot run.</p>
<p><a href="https://zenn.dev/usagi1975/articles/2023-02-18-1400_docker-gpu-bench" style=""
target="_blank">https://zenn.dev/usagi1975/articles/2023-02-18-1400_docker-gpu-bench</a></p>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -numbodies=640000
</code></pre></div><div class="code-block-container"><pre class="language-bash"><code class="language-bash">GPU Device 0: "Ampere" with compute capability 8.6

> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3070]
number of bodies = 640000
640000 bodies, total time for 10 iterations: 7012.648 ms
= 584.087 billion interactions per second
= 11681.750 single-precision GFLOP/s at 20 flops per interaction
</code></pre></div>
<h2>(Extra) Checking with a TensorFlow container</h2>
<div
class="code-block-container"><pre class="language-bash"><code class="language-bash">> docker run --gpus all -it --rm tensorflow/tensorflow:2.8.1-gpu bash

________                               _______________
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/

> python
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
incarnation: 6470249754415725170
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3070, pci bus id: 0000:07:00.0, compute capability: 8.6"
xla_global_id: 416903419
</code></pre></div>
<h2><a class="header-anchor-link"
href="#(%E3%81%9D%E3%81%AE%E4%BB%96%EF%BC%89pytorch%E3%81%AE%E3%82%B3%E3%83%B3%E3%83%86%E3%83%8A%E3%81%A7%E7%A2%BA%E8%AA%8D" aria-hidden="true"/> (Extra) Checking with a PyTorch container</h2>
<div class="code-block-container"><pre class="language-bash"><code class="language-bash">> docker pull pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel
> docker run -it --rm --gpus all pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel bash
> python
>>> import torch
>>> torch.cuda.is_available()
>>> torch.cuda.device_count()
>>> torch.cuda.current_device()
</code></pre></div>
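<p>The interactive PyTorch check above can also be run as a single script. The following is a minimal sketch, not part of the original session: the function name <code>cuda_summary</code> and the keys of the returned dict are made up for illustration, and <code>torch.cuda.get_device_name</code> is an extra call that the session above does not use. The import is guarded on the assumption that the script may also be run where PyTorch is not installed.</p>
<div class="code-block-container"><pre class="language-python"><code class="language-python">import importlib.util


def cuda_summary():
    """Return a small dict describing what PyTorch can see (illustrative helper)."""
    # torch is only importable where PyTorch is installed, e.g. inside the container.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    import torch

    # Same checks as the interactive session: availability first, then the device count.
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    # Extra step (not in the original session): look up each device's name.
    devices = [torch.cuda.get_device_name(i) for i in range(count)]
    return {"torch_installed": True, "cuda_available": available, "devices": devices}


if __name__ == "__main__":
    print(cuda_summary())
</code></pre></div>
<p>Saved under a hypothetical name such as <code>check_gpu.py</code> and run with <code>python check_gpu.py</code> inside a container started with <code>--gpus all</code>, it should list the GPU by name when the container can see it, and an empty device list otherwise.</p>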