Installing NCCL on Ubuntu

1. When you need to install NCCL
Running insightface can fail with the following error, which means NCCL needs to be installed:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import mxnet as mx
File "/root/miniconda3/envs/insightface/lib/python3.8/site-packages/mxnet/__init__.py", line 23, in <module>
from .context import Context, current_context, cpu, gpu, cpu_pinned
File "/root/miniconda3/envs/insightface/lib/python3.8/site-packages/mxnet/context.py", line 23, in <module>
from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
File "/root/miniconda3/envs/insightface/lib/python3.8/site-packages/mxnet/base.py", line 356, in <module>
_LIB = _load_lib()
File "/root/miniconda3/envs/insightface/lib/python3.8/site-packages/mxnet/base.py", line 347, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
File "/root/miniconda3/envs/insightface/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libnccl.so.2: cannot open shared object file: No such file or directory
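The traceback boils down to the dynamic loader not finding libnccl.so.2. A minimal sketch for checking this directly, without importing mxnet, is shown here; the helper name `can_load` is made up for illustration, and it only relies on `ctypes.CDLL`, the same call that fails in the traceback.

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Same failure mode as in the traceback: the file cannot be found or opened.
        return False
    return True


if __name__ == "__main__":
    name = "libnccl.so.2"
    print(name, "is loadable" if can_load(name) else "is NOT loadable")
```

Running it before and after the installation gives a quick signal of whether the steps below took effect.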
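2. Download the package
Go to NVIDIA's website, https://developer.nvidia.com/nccl/nccl-legacy-downloads, and download the NCCL build that matches your setup.
After ticking the checkbox on that page, the legacy versions are listed. Pick the matching version and choose the local installer.
This guide uses CUDA 10.1 as the example. Which package should you download? On Ubuntu, take the file with the .deb suffix.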
3. Install the repository package
On Ubuntu, run this in a terminal:
sudo dpkg -i nccl-repo-ubuntu1604-2.4.7-ga-cuda10.1_1-1_amd64.deb
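The package filename carries the Ubuntu release, the NCCL version, and the CUDA version, so it is worth checking it against your system before installing. The sketch below pulls those fields out of the name; the regular expression is an assumption inferred from this one filename, not an official naming scheme, and `parse_nccl_repo_name` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional


class NcclRepoPackage(NamedTuple):
    distro: str
    nccl_version: str
    cuda_version: str
    arch: str


# Assumed layout: nccl-repo-<distro>-<nccl>-ga-cuda<cuda>_<revision>_<arch>.deb
_PATTERN = re.compile(
    r"^nccl-repo-(?P<distro>[a-z]+\d+)-(?P<nccl>\d+(?:\.\d+)*)-ga-"
    r"cuda(?P<cuda>\d+\.\d+)_[\d-]+_(?P<arch>\w+)\.deb$"
)


def parse_nccl_repo_name(filename: str) -> Optional[NcclRepoPackage]:
    """Split a local-repo package name into its parts, or return None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return NcclRepoPackage(
        distro=match.group("distro"),
        nccl_version=match.group("nccl"),
        cuda_version=match.group("cuda"),
        arch=match.group("arch"),
    )


if __name__ == "__main__":
    print(parse_nccl_repo_name("nccl-repo-ubuntu1604-2.4.7-ga-cuda10.1_1-1_amd64.deb"))
```

If the CUDA version in the name does not match the toolkit installed on the machine, pick a different download.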
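4. Add the public key
# This command may print OK even though the key was not actually added
sudo apt-key add /var/nccl-repo-2.4.7-ga-cuda10.1/7fa2af80.pub
# List the keys
apt-key list | grep NVIDIA
# Alternatively, print the key file itself
cat /var/nccl-repo-2.4.7-ga-cuda10.1/7fa2af80.pub
If the key is missing, add it again with the following commands:
# Add the public key
sudo gpg --import /var/nccl-repo-2.4.7-ga-cuda10.1/7fa2af80.pub
# List the keys
sudo gpg --list-keys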
5. Update the package index
On Ubuntu, run this in a terminal:
sudo apt update
6. Install NCCL
On Ubuntu, run this in a terminal:
sudo apt-get install libnccl2=2.4.7-1+cuda10.1 libnccl-dev=2.4.7-1+cuda10.1
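To confirm which NCCL version actually ended up on the system, one option is to ask the library itself. The sketch below calls `ncclGetVersion` through `ctypes` and decodes the integer it returns. The decoding follows the version-code scheme as I understand it from `nccl.h` (major*1000 + minor*100 + patch up to 2.8, major*10000 + minor*100 + patch afterwards), so treat it as an assumption and compare against your own header if in doubt. The function names are illustrative.

```python
import ctypes
from typing import Optional, Tuple


def decode_nccl_version(code: int) -> Tuple[int, int, int]:
    """Turn an NCCL version code into (major, minor, patch)."""
    if code < 10000:
        # Older scheme, e.g. 2407 for 2.4.7
        return code // 1000, (code % 1000) // 100, code % 100
    # Newer scheme, e.g. 21805 for 2.18.5
    return code // 10000, (code % 10000) // 100, code % 100


def installed_nccl_version(lib_name: str = "libnccl.so.2") -> Optional[Tuple[int, int, int]]:
    """Return the version reported by the installed library, or None if it cannot be queried."""
    try:
        lib = ctypes.CDLL(lib_name)
        version = ctypes.c_int(0)
        status = lib.ncclGetVersion(ctypes.byref(version))
    except (OSError, AttributeError):
        # Library not found, or the symbol is not exported by this build.
        return None
    if status != 0:  # 0 is ncclSuccess
        return None
    return decode_nccl_version(version.value)


if __name__ == "__main__":
    print("Installed NCCL version:", installed_nccl_version())
```

With the packages pinned above, the expected result would be (2, 4, 7); `None` means the library still cannot be loaded.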
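7. Add NCCL to the environment variables
First, find where NCCL is installed. How? In a terminal, run:
whereis nccl
On my machine this reports /usr/include/nccl.h.
Then run:
vim ~/.bashrc
and append the following to the end of the file: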
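# CUDA library directory (adjust to your installed CUDA version; this guide uses 10.1)
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64
# Add the directory that holds libnccl.so.2 to LD_LIBRARY_PATH.
# It must be a directory, not the nccl.h header; the .deb packages normally
# install the library to /usr/lib/x86_64-linux-gnu (check with: dpkg -L libnccl2)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu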
Finally, save the file and reload it so the settings take effect:
# Apply the environment
source ~/.bashrc
# Check that the variable was set
echo $LD_LIBRARY_PATH
echo $LD_LIBRARY_PATH
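Printing the variable shows what was set, but not whether each entry is usable. The loader treats every LD_LIBRARY_PATH entry as a directory to search, so a quick sanity check is to walk the entries and see which ones are real directories and which contain libnccl.so.2. The following is a small sketch with a hypothetical helper name, `inspect_library_path`.

```python
import os
from typing import List, Tuple


def inspect_library_path(path_value: str, lib_name: str = "libnccl.so.2") -> List[Tuple[str, bool, bool]]:
    """For each entry, report (entry, is_directory, contains_lib_name)."""
    report = []
    for entry in path_value.split(":"):
        if not entry:
            continue  # skip empty entries produced by stray colons
        is_dir = os.path.isdir(entry)
        has_lib = is_dir and os.path.exists(os.path.join(entry, lib_name))
        report.append((entry, is_dir, has_lib))
    return report


if __name__ == "__main__":
    for entry, is_dir, has_lib in inspect_library_path(os.environ.get("LD_LIBRARY_PATH", "")):
        print(f"{entry}: directory={is_dir}, has libnccl.so.2={has_lib}")
```

An entry that points at a file, such as a header, shows up with directory=False and contributes nothing to the library search.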
