Installing NCCL on Ubuntu

MealHunter

1. When NCCL needs to be installed

Running insightface can fail with the error below. mxnet cannot load libnccl.so.2, so NCCL has to be installed.

Traceback (most recent call last):
  File "test.py", line 1, in <module>
    import mxnet as mx
  File "/root/miniconda3/envs/insightface/lib/python3.8/site-packages/mxnet/__init__.py", line 23, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/root/miniconda3/envs/insightface/lib/python3.8/site-packages/mxnet/context.py", line 23, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/root/miniconda3/envs/insightface/lib/python3.8/site-packages/mxnet/base.py", line 356, in <module>
    _LIB = _load_lib()
  File "/root/miniconda3/envs/insightface/lib/python3.8/site-packages/mxnet/base.py", line 347, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/root/miniconda3/envs/insightface/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnccl.so.2: cannot open shared object file: No such file or directory
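
Before installing anything, it can be worth confirming that the library really is missing from the dynamic linker's cache. The following is a minimal sketch, assuming `ldconfig` is available on the machine; `lib_in_cache` is a hypothetical helper name, not part of NCCL or mxnet.

```shell
# lib_in_cache NAME: succeed if NAME appears in the dynamic linker cache.
# (Hypothetical helper for illustration.)
lib_in_cache() {
    # ldconfig -p prints the cached libraries; grep -qF matches NAME literally.
    ldconfig -p 2>/dev/null | grep -qF -- "$1"
}

if lib_in_cache "libnccl.so.2"; then
    echo "libnccl.so.2 found"
else
    echo "libnccl.so.2 missing"
fi
```

Once the installation in step 6 has finished, the same check should report that the library was found.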

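2. Download the package

Go to NVIDIA's download page, https://developer.nvidia.com/nccl/nccl-legacy-downloads, and pick the NCCL build that matches your setup.

After ticking the checkbox, the list of legacy versions appears. Choose the version you need and use the local installer download.

This guide uses CUDA 10.1 as the example. Which package should you download? On Ubuntu, download the package with the .deb suffix.

3. Install the local repository package

On Ubuntu, run this in a terminal: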
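
To pick the matching download it helps to know the Ubuntu release and CUDA version on the machine. The sketch below is one way to print them. `ubuntu_tag` is a hypothetical helper, and the `ubuntuXXYY` tag format is only inferred from the file name used in step 3, so check it against the names shown on the download page.

```shell
# ubuntu_tag [FILE]: print a tag such as "ubuntu1604" from an os-release style file.
# (Hypothetical helper; the tag format is an assumption based on one file name.)
ubuntu_tag() {
    # Use a subshell so that sourcing the file does not leak variables.
    (
        . "${1:-/etc/os-release}"
        printf 'ubuntu%s\n' "$(printf '%s' "$VERSION_ID" | tr -d '.')"
    )
}

# Print the tag for this machine if the release file exists.
if [ -r /etc/os-release ]; then
    ubuntu_tag
fi

# Print the CUDA toolkit release line if nvcc is on the PATH.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep release
fi
```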

sudo dpkg -i nccl-repo-ubuntu1604-2.4.7-ga-cuda10.1_1-1_amd64.deb

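4. Add the public key

# This command may print OK even though the key was not actually added
sudo apt-key add /var/nccl-repo-2.4.7-ga-cuda10.1/7fa2af80.pub

# List the keys
apt-key list | grep NVIDIA

# You can also print the key file itself
cat /var/nccl-repo-2.4.7-ga-cuda10.1/7fa2af80.pub

If the key is missing, add it again with the following commands. Note that gpg --import stores the key in root's own GnuPG keyring, while apt verifies repositories against /etc/apt/trusted.gpg and /etc/apt/trusted.gpg.d/, so watch the output of sudo apt update in the next step for key errors.

# Add the public key
sudo gpg --import /var/nccl-repo-2.4.7-ga-cuda10.1/7fa2af80.pub

# List the keys
sudo gpg --list-keys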

5. Update the package index

On Ubuntu, run this in a terminal:

sudo apt update

6. Install NCCL

On Ubuntu, run this in a terminal:

sudo apt-get install libnccl2=2.4.7-1+cuda10.1 libnccl-dev=2.4.7-1+cuda10.1

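7. Add NCCL to the environment variables

First, find where NCCL was installed. How? With a terminal command, of course:

whereis nccl

On my machine this prints /usr/include/nccl.h. That is the header file. What the loader needs is the shared library libnccl.so.2, and dpkg -L libnccl2 lists where the package put it (typically /usr/lib/x86_64-linux-gnu on 64-bit Ubuntu).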

Next, run this in a terminal:

vim ~/.bashrc

Open the file and append the following at the very bottom:

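# CUDA library directory (adjust the version to the CUDA you have installed)
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
# Add the directory that contains libnccl.so.2 (a directory, not the nccl.h header file)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu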

Finally, after saving, reload the configuration file so the change takes effect:

# Apply the changes
source ~/.bashrc

# Check that the variable was set correctly
echo $LD_LIBRARY_PATH
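
Because ~/.bashrc is read by every new shell, plain export lines can end up adding the same directory more than once. The sketch below is one way to avoid that. `append_ld_path` is a hypothetical helper, and the two directories are assumptions that should be replaced with the locations on your machine.

```shell
# append_ld_path DIR: add DIR to LD_LIBRARY_PATH only if it is not already listed.
# (Hypothetical helper, not part of NCCL or CUDA.)
append_ld_path() {
    case ":${LD_LIBRARY_PATH}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
    esac
    export LD_LIBRARY_PATH
}

# Assumed locations; adjust them to your CUDA and NCCL installation.
append_ld_path /usr/local/cuda-10.1/lib64
append_ld_path /usr/lib/x86_64-linux-gnu
echo "$LD_LIBRARY_PATH"
```

Calling the helper again with a directory that is already listed leaves the variable unchanged.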