How to Create a Custom Parcel in a Cloudera Cluster
Many of you have probably run into a situation where you need a package installed at the same version on every node of a Cloudera cluster. For example, what would you do if you wanted a newer Python version on all nodes?
Managing such versions by hand is hard: if one node has Python 2.7 and another has Python 3, Spark/YARN jobs can fail or behave inconsistently. This is where parcels come to the rescue. This article walks through creating a parcel that installs and manages Python 3 on every node in the cluster.
Installing and configuring Miniconda
1. Download Miniconda: https://docs.conda.io/en/latest/miniconda.html
2. Install Miniconda in /usr/local/miniconda3 (the -p option sets the install location):
bash Miniconda3-latest-Linux-x86_64.sh -p /usr/local/miniconda3
3. Go to the install directory:
cd /usr/local/miniconda3
4. Install the required Python version:
bin/conda install python=3.6.10
5. Check the Python version:
bin/python --version
6. At this point you can also install any other Python libraries you need.
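The version check in step 5 can be scripted so that a build stops early when the interpreter is not the one the parcel is meant to ship. The following is only a minimal sketch: the function name `version_matches` is hypothetical, and it assumes that some conda builds append a vendor suffix after the version number, so it accepts both forms.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "<python> --version" reports the expected version.
# $1 = path to the python binary, $2 = expected version, e.g. 3.6.10
version_matches() {
    # Capture stdout and stderr, since older interpreters print the version on stderr.
    reported=$("$1" --version 2>&1) || return 1
    case "$reported" in
        # Exact match, or the version followed by a space and a vendor suffix.
        "Python $2"|"Python $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   version_matches /usr/local/miniconda3/bin/python 3.6.10 && echo "Python 3.6.10 is in place"
```

Requiring a space after the version avoids treating 3.6.1 as a match for 3.6.10.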
Prepare the environment to build the parcel
1. Install the build tools:
a. Install git:
yum install -y git
b. Install the Java JDK (the -devel package provides the compiler):
yum install -y java-1.8.0-openjdk-devel
c. Install Maven 3:
yum install -y maven
2. Build the cm_ext validator (the later steps expect it under /usr/local/cm_ext):
a. cd /usr/local
b. git clone https://github.com/cloudera/cm_ext.git
c. cd cm_ext/validator
d. mvn package
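Before moving on, it can be worth confirming that the build prerequisites really are in place. The sketch below is an illustration, not part of cm_ext: `missing_tools` is a hypothetical helper, and the jar path is simply the one used by the validation commands later in this article.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Path assumed from the validator commands used later in this article.
VALIDATOR_JAR=/usr/local/cm_ext/validator/target/validator.jar

# Example:
#   missing=$(missing_tools git java mvn)
#   [ -z "$missing" ] || echo "still missing: $missing" >&2
#   [ -f "$VALIDATOR_JAR" ] || echo "validator.jar not built yet; run mvn package" >&2
```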
Creating the Conda Custom Parcel
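1. Create the parcel directory and its meta subdirectory, then copy the Miniconda installation into the parcel directory:
mkdir -p /usr/local/parcels/CONDA_PYTHON-3.6.10-0
cd /usr/local/parcels/CONDA_PYTHON-3.6.10-0
mkdir meta
(cd /usr/local/miniconda3 && tar cpf - .) | tar xpf -
2. Create a meta/parcel.json file. The name and version must match the parcel directory name (CONDA_PYTHON-3.6.10-0), and all quotes must be plain double quotes: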
{
  "schema_version": 1,
  "name": "CONDA_PYTHON",
  "version": "3.6.10-0",
  "setActiveSymlink": true,
  "depends": "",
  "replaces": "",
  "conflicts": "",
  "provides": [],
  "scripts": {
    "defines": "python_conda_env.sh"
  },
  "packages": [],
  "components": [
    {
      "name": "miniconda3",
      "version": "4.10.3",
      "pkg_version": "4.10.3",
      "pkg_release": "4.10.3"
    },
    {
      "name": "python",
      "version": "3.6.10",
      "pkg_version": "3.6.10",
      "pkg_release": "3.6.10"
    }
  ],
  "users": {
    "spark": {
      "longname": "Spark",
      "home": "/var/lib/spark",
      "shell": "/usr/sbin/nologin",
      "extra_groups": []
    }
  },
  "groups": []
}
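Two mistakes are easy to make in parcel.json: pasting curly quotes from a web page, and using a name or version that does not match the parcel directory, which the parcel format expects to be named `<name>-<version>`. The sketch below checks for both before the cm_ext validator is run. `precheck_parcel_json` is a hypothetical helper, and its `sed` extraction is deliberately crude: it assumes each key sits on its own line and that the parcel's own "name" and "version" appear before any component entries.

```shell
#!/bin/sh
# Hypothetical pre-check for meta/parcel.json, meant to run before the cm_ext validator.
# $1 = parcel directory, e.g. /usr/local/parcels/CONDA_PYTHON-3.6.10-0
precheck_parcel_json() {
    json="$1/meta/parcel.json"
    [ -f "$json" ] || { echo "missing $json" >&2; return 1; }

    # Curly quotes are not valid JSON string delimiters; show the offending lines.
    if grep -n -e '“' -e '”' "$json" >&2; then
        echo "replace curly quotes with plain double quotes in $json" >&2
        return 1
    fi

    # Take the first "name" and "version" keys, assumed to be the parcel's own.
    name=$(sed -n 's/^[[:space:]]*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$json" | head -n 1)
    version=$(sed -n 's/^[[:space:]]*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$json" | head -n 1)

    dir=$(basename "$1")
    if [ "$dir" != "$name-$version" ]; then
        echo "directory $dir does not match $name-$version" >&2
        return 1
    fi
}

# Example:
#   precheck_parcel_json /usr/local/parcels/CONDA_PYTHON-3.6.10-0 && echo "parcel.json looks consistent"
```

This does not replace the validator; it only catches two problems early with a clearer message.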
3. Create a meta/python_conda_env.sh file containing the following two lines:
#!/bin/sh
exit 0
4. Validate the parcel.json file and the parcel directory:
a. java -jar /usr/local/cm_ext/validator/target/validator.jar -p /usr/local/parcels/CONDA_PYTHON-3.6.10-0/meta/parcel.json
b. java -jar /usr/local/cm_ext/validator/target/validator.jar -d /usr/local/parcels/CONDA_PYTHON-3.6.10-0/
5. Package the parcel and validate the resulting file:
a. cd /usr/local/parcels
b. tar zcf /usr/local/parcels/CONDA_PYTHON-3.6.10-0-el7.parcel CONDA_PYTHON-3.6.10-0 --owner=root --group=root
c. java -jar /usr/local/cm_ext/validator/target/validator.jar -f /usr/local/parcels/CONDA_PYTHON-3.6.10-0-el7.parcel
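A typo in the archive name or in the directory being packed is easy to miss at this point. As a rough extra check, the sketch below lists the top-level entries of the tarball and compares them with the file name, assuming the `<name>-<version>-<distro>.parcel` naming used in this article. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct top-level entries inside a .parcel tarball.
parcel_top_level() {
    tar tzf "$1" | cut -d / -f 1 | sort -u
}

# Hypothetical helper: succeed only if the archive holds exactly one top-level
# directory and it equals the file name minus the "-<distro>.parcel" suffix.
parcel_name_consistent() {
    base=$(basename "$1" .parcel)   # e.g. CONDA_PYTHON-3.6.10-0-el7
    expected=${base%-*}             # e.g. CONDA_PYTHON-3.6.10-0
    [ "$(parcel_top_level "$1")" = "$expected" ]
}

# Example:
#   parcel_name_consistent /usr/local/parcels/CONDA_PYTHON-3.6.10-0-el7.parcel || echo "name mismatch" >&2
```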
6. Generate the SHA-1 checksum file for the parcel:
sha1sum < CONDA_PYTHON-3.6.10-0-el7.parcel | cut -d ' ' -f 1 > CONDA_PYTHON-3.6.10-0-el7.parcel.sha
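If the parcel is rebuilt after the checksum file was written, the two no longer agree. One way to guard against that is to wrap the checksum step in a pair of small functions, sketched below. The names `write_parcel_sha` and `verify_parcel_sha` are hypothetical; only `sha1sum` and `cut` come from the step above.

```shell
#!/bin/sh
# Hypothetical helpers around the .sha file that sits next to the parcel.

# Write <parcel>.sha containing only the 40-character SHA-1 hex digest.
write_parcel_sha() {
    sha1sum < "$1" | cut -d ' ' -f 1 > "$1.sha"
}

# Succeed only if the stored digest still matches the parcel's current contents.
verify_parcel_sha() {
    [ -f "$1.sha" ] || return 1
    current=$(sha1sum < "$1" | cut -d ' ' -f 1)
    [ "$current" = "$(cat "$1.sha")" ]
}

# Example:
#   write_parcel_sha CONDA_PYTHON-3.6.10-0-el7.parcel
#   verify_parcel_sha CONDA_PYTHON-3.6.10-0-el7.parcel || echo "checksum is stale" >&2
```

Running the verify function again after copying the files to the repository directory is a cheap way to catch a partial copy.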
7. Copy the .parcel and .parcel.sha files to the /opt/cloudera/parcel-repo directory on the Cloudera Manager (CM) node.
8. Change the ownership of the copied files:
sudo chown cloudera-scm: /opt/cloudera/parcel-repo/CONDA_PYTHON-3.6.10-0-el7.parcel*
9. Go to CM (Cloudera Manager) > Parcels and click Check for New Parcels.
10. After the parcel is detected, click Distribute and then Activate.
11. Check the Python version on all nodes:
/opt/cloudera/parcels/CONDA_PYTHON/bin/python --version
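On a larger cluster, running that command by hand on every node gets tedious. The loop below is one possible way to automate it, assuming passwordless ssh from the machine where it runs and a plain text file with one hostname per line. The function name `check_cluster_python` and the `REMOTE_SHELL` override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the version check on every host listed in a file.
# $1 = file with one hostname per line, $2 = expected version, e.g. 3.6.10
# REMOTE_SHELL is an assumed override hook; it defaults to ssh.
check_cluster_python() {
    failed=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        # </dev/null keeps the remote command from swallowing the rest of the host list.
        reported=$(${REMOTE_SHELL:-ssh} "$host" \
            /opt/cloudera/parcels/CONDA_PYTHON/bin/python --version 2>&1 </dev/null)
        case "$reported" in
            "Python $2"|"Python $2 "*) echo "$host: OK" ;;
            *) echo "$host: unexpected: $reported"; failed=1 ;;
        esac
    done < "$1"
    return "$failed"
}

# Example:
#   check_cluster_python hosts.txt 3.6.10 || echo "at least one node needs attention" >&2
```

The function returns a non-zero status if any host reports a different version or cannot be reached, so it can be used directly in a deployment script.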
Written by
Sandip Nikale
DataOps Engineer. Talks about Cloud, DevOps, BigData & AI/ML.