Control node setup
This page explains how to configure the control node used to deploy TDP. The configuration applies both to manual deployments and to deployments with TDP Manager.
Prerequisites
The only prerequisite for configuring the control node is access to Ansible. The following operating systems have been tested:
- Rocky 8
- AlmaLinux 8
The installation can be done on other operating systems, but some features may not be available.
Note: For a TDP Manager installation, it is recommended to use the version of Ansible included in the TDP Manager virtual environment.
Ansible configuration
The Ansible configuration file, ansible.cfg, requires some TDP-specific settings:
[defaults]
inventory=/path/to/tdp/inventory ; REQUIRED path to the directory containing the Ansible inventory.
collections_paths=/path/to/tdp/collections/ ; RECOMMENDED paths to the directories containing the Ansible collections. The first directory in this list is used by Ansible Galaxy to install collections.
display_skipped_hosts=False ; RECOMMENDED to avoid displaying skipped Ansible tasks and cluttering the logs.
any_errors_fatal=True ; REQUIRED to stop Ansible execution as soon as an error occurs and prevent Ansible from continuing on the remaining hosts.
; RECOMMENDED to significantly speed up playbook launches, as installing TDP involves many calls to the `ansible-playbook` command.
[inventory]
cache = true
cache_plugin = jsonfile
cache_timeout = 7200
cache_connection = .cache
; REQUIRED to activate the `tdp_vars` plugin, which builds the variables used by TDP's Ansible collections.
[tdp]
vars = tdp_vars
; REQUIRED to activate the switch to the `root` user for all Ansible tasks.
[privilege_escalation]
become=True
become_user=root
Refer to the official Ansible documentation for more information on configuration options and on the location of the ansible.cfg file.
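The settings marked REQUIRED above can be checked before any playbook is run. The following Python sketch is a hypothetical helper, not part of TDP or Ansible: it parses an ansible.cfg with the standard configparser module, treats ";" as an inline comment marker as Ansible does, and reports which REQUIRED settings are missing or differ from the values shown above.

```python
import configparser
from typing import List, Optional, Tuple

# Settings marked REQUIRED in the ansible.cfg above. None means "any value is fine".
# This list and the checker are illustrative only, not part of TDP.
REQUIRED: List[Tuple[str, str, Optional[str]]] = [
    ("defaults", "inventory", None),
    ("defaults", "any_errors_fatal", "True"),
    ("tdp", "vars", "tdp_vars"),
    ("privilege_escalation", "become", "True"),
    ("privilege_escalation", "become_user", "root"),
]


def missing_required(cfg_text: str) -> List[str]:
    """Return the REQUIRED settings that are absent or have an unexpected value."""
    # Ansible accepts ';' as an inline comment marker, so strip it the same way.
    # Interpolation is disabled so that a '%' in a path cannot break parsing.
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",), interpolation=None)
    parser.read_string(cfg_text)
    problems = []
    for section, option, expected in REQUIRED:
        value = parser.get(section, option, fallback=None)
        if value is None:
            problems.append(f"[{section}] {option} is missing")
        elif expected is not None and value.lower() != expected.lower():
            # Case-insensitive so that "true" and "True" are both accepted.
            problems.append(f"[{section}] {option}={value}, expected {expected}")
    return problems


sample = """
[defaults]
inventory=/path/to/tdp/inventory ; REQUIRED
any_errors_fatal=True
[tdp]
vars = tdp_vars
[privilege_escalation]
become=True
become_user=root
"""
print(missing_required(sample))  # []
```

The expected values mirror the example configuration on this page; adapt the REQUIRED list if your setup differs.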
Collections installation
TDP components are distributed as several Ansible collections, which are installed with Ansible Galaxy. The collections are stored in the first directory listed in the collections_paths property of the ansible.cfg file.
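The resulting directory layout can be sketched in Python. The helper below is hypothetical and simplified: it assumes a Linux control node (so collections_paths entries are separated by ":") and a first directory that does not already end with ansible_collections. The collection name tosit.tdp_observability is inferred from the paths used elsewhere on this page.

```python
import posixpath


def collection_install_dir(collections_paths: str, fqcn: str) -> str:
    """Sketch of where ansible-galaxy places a collection on a Linux control node."""
    # collections_paths may list several directories separated by ':';
    # only the first one is used as the installation target.
    first_dir = collections_paths.split(":")[0]
    namespace, name = fqcn.split(".", 1)
    # Collections are laid out as <dir>/ansible_collections/<namespace>/<name>.
    return posixpath.join(first_dir, "ansible_collections", namespace, name)


print(collection_install_dir("/path/to/tdp/collections/", "tosit.tdp_observability"))
# /path/to/tdp/collections/ansible_collections/tosit/tdp_observability
```

This is why later commands on this page refer to paths such as /path/to/tdp/collections/ansible_collections/tosit/tdp.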
The desired collections are listed in a requirements.yml file. The core collection is mandatory.
collections:
  # Core
  # Contains the main services (HDFS, YARN, Hive, etc.) as well as the plugins needed to manage variables.
  - name: community.general
    version: 7.1.0
  - name: https://github.com/TOSIT-IO/tdp-collection
    type: git
    version: master
  # Extras (Optional)
  # Contains additional components that are not part of the core collection.
  - name: https://github.com/TOSIT-IO/tdp-collection-extras
    type: git
    version: master
  # Observability (Optional)
  # Contains components dedicated to observability (Prometheus, Grafana, etc.)
  - name: https://github.com/TOSIT-IO/tdp-observability
    type: git
    version: main
  - name: community.grafana
    version: 1.5.4
Collections can then be installed with the ansible-galaxy command:
ansible-galaxy install -r requirements.yml
Additional dependencies for the observability collection
If the observability collection is installed, it requires additional Python dependencies, which can be installed with the following command:
pip install -r /path/to/tdp/collections/ansible_collections/tosit/tdp_observability/requirements.txt
Definition of the Ansible inventory
The Ansible inventory defines the hosts on which to deploy TDP, the distribution of components on these hosts, and the use of any plugins. For TDP, the inventory is organized as follows:
/path/to/tdp/inventory/
├── group_vars/
│ └── all.yml
├── hosts
├── topologies/
│ ├── 01_tdp
│ ├── tdp_extra
│ ├── tdp_observability
│ └── tdp_prerequisites
└── tdp_vars.yml
Installation of the tdp_vars plugin
The tdp_vars plugin builds the variables used by TDP’s Ansible collections. It is installed by creating the file /path/to/tdp/inventory/tdp_vars.yml with the following content:
plugin: tosit.tdp.tdp_vars
Global variables
Some variables must be common to all hosts. They are specified in the file group_vars/all.yml.
Create the file /path/to/tdp/inventory/group_vars/all.yml and fill in the following values:
---
domain: # DNS domain of the cluster
realm: # Kerberos realm name
kerberos_admin_principal: # Kerberos administrator ID
kerberos_admin_password: # Kerberos administrator password
kadmin_principal: # Kerberos administrator ID
kadmin_password: # Kerberos administrator password
For example:
---
domain: tdp
realm: TDP.LOCAL
kerberos_admin_principal: 'admin/admin@{{ realm }}'
kerberos_admin_password: admin123
kadmin_principal: '{{ kerberos_admin_principal }}'
kadmin_password: '{{ kerberos_admin_password }}'
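Values such as kadmin_principal refer to other variables through Jinja2 expressions, which Ansible renders when the variables are used. The Python sketch below is a simplified, hypothetical stand-in for that templating: it only handles bare {{ name }} references (no filters or expressions) and shows how the example values above resolve.

```python
import re
from typing import Dict, FrozenSet

# Matches a bare "{{ name }}" reference; Jinja2 filters and expressions are not supported.
REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve(variables: Dict[str, str]) -> Dict[str, str]:
    """Expand "{{ name }}" references between string variables (simplified sketch)."""

    def render(value: str, seen: FrozenSet[str]) -> str:
        def substitute(match):
            name = match.group(1)
            if name in seen:
                raise ValueError(f"circular reference through {name!r}")
            # Render the referenced variable too, so chains of references are followed.
            return render(variables[name], seen | {name})

        return REF.sub(substitute, value)

    return {key: render(value, frozenset({key})) for key, value in variables.items()}


all_yml = {
    "domain": "tdp",
    "realm": "TDP.LOCAL",
    "kerberos_admin_principal": "admin/admin@{{ realm }}",
    "kerberos_admin_password": "admin123",
    "kadmin_principal": "{{ kerberos_admin_principal }}",
    "kadmin_password": "{{ kerberos_admin_password }}",
}
resolved = resolve(all_yml)
print(resolved["kadmin_principal"])  # admin/admin@TDP.LOCAL
```

Defining kadmin_principal and kadmin_password as references keeps a single source of truth for the Kerberos administrator credentials.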
Hosts
The hosts file defines the hosts on which to deploy TDP. It is an Ansible inventory file.
The inventory file must define the following groups:
- edge
- master
- master1
- master2
- master3
- worker
Create a file /path/to/tdp/inventory/hosts adapted to your infrastructure, based on the following template:
[edge]
; List of hosts in the edge group
[master1]
; List of hosts in the master1 group
[master2]
; List of hosts in the master2 group
[master3]
; List of hosts in the master3 group
[master:children]
master1
master2
master3
[worker]
; List of hosts in the worker group
For example, for a cluster of 6 machines (1 edge, 3 masters, and 2 workers), the hosts file is as follows:
[edge]
edge-01 ansible_host=edge01.tdp.local
[master1]
master-01 ansible_host=master01.tdp.local
[master2]
master-02 ansible_host=master02.tdp.local
[master3]
master-03 ansible_host=master03.tdp.local
[master:children]
master1
master2
master3
[worker]
worker-01 ansible_host=worker01.tdp.local
worker-02 ansible_host=worker02.tdp.local
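Before going further, it can be useful to check that the hosts file defines every required group. The Python sketch below is a hypothetical helper, not an Ansible or TDP tool: it understands only the subset of the INI inventory format used on this page (group sections, :children sections, one host per line with optional key=value pairs) and follows :children links to list the hosts of a group.

```python
from typing import Dict, List, Tuple

REQUIRED_GROUPS = {"edge", "master", "master1", "master2", "master3", "worker"}

Groups = Dict[str, List[str]]


def parse_groups(*sources: str) -> Tuple[Groups, Groups]:
    """Merge simplified INI inventories into (hosts per group, child groups per group)."""
    hosts: Groups = {}
    children: Groups = {}
    for text in sources:
        section, is_children = None, False
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue  # blank line or comment
            if line.startswith("[") and line.endswith("]"):
                name = line[1:-1]
                if name.endswith(":vars"):
                    section = None  # group variables are ignored in this sketch
                    continue
                is_children = name.endswith(":children")
                section = name.split(":")[0]
                hosts.setdefault(section, [])
                children.setdefault(section, [])
                continue
            if section is None:
                continue  # ungrouped hosts and :vars entries are skipped
            # Keep the first token: a host name, or a child group in a :children section.
            # Trailing key=value pairs such as ansible_host are ignored.
            (children if is_children else hosts)[section].append(line.split()[0])
    return hosts, children


def resolve_hosts(group: str, hosts: Groups, children: Groups) -> List[str]:
    """List the hosts of a group, following :children links recursively."""
    result = list(hosts.get(group, []))
    for child in children.get(group, []):
        for host in resolve_hosts(child, hosts, children):
            if host not in result:
                result.append(host)
    return result


HOSTS = """
[edge]
edge-01 ansible_host=edge01.tdp.local
[master1]
master-01 ansible_host=master01.tdp.local
[master2]
master-02 ansible_host=master02.tdp.local
[master3]
master-03 ansible_host=master03.tdp.local
[master:children]
master1
master2
master3
[worker]
worker-01 ansible_host=worker01.tdp.local
worker-02 ansible_host=worker02.tdp.local
"""

hosts, children = parse_groups(HOSTS)
print(sorted(REQUIRED_GROUPS - set(hosts)))  # []
print(resolve_hosts("master", hosts, children))  # ['master-01', 'master-02', 'master-03']
```

For an authoritative view of the parsed inventory, Ansible's own ansible-inventory command remains the reference; this sketch only illustrates how the groups nest.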
Topologies
The topology files define how the components (HDFS NameNode, Hive Metastore, etc.) are distributed across the hosts. They are plain Ansible inventory files.
Each collection comes with an example topology file (topology.ini).
tdp-collection is the main collection, and its topology file must be read first. Since Ansible reads the files of an inventory directory in ASCII order of their names, it is recommended to prefix its name with 01_.
Copy the topology file from tdp-collection to the /path/to/tdp/inventory/topologies directory:
cp /path/to/tdp/collections/ansible_collections/tosit/tdp/topology.ini /path/to/tdp/inventory/topologies/01_tdp
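The effect of a numeric prefix such as the one in 01_tdp can be illustrated in Python. The hypothetical load_order helper simply sorts file names, which for ASCII names matches the order in which Ansible reads the files of an inventory directory. The unprefixed names in the first call are made up for comparison.

```python
from typing import List


def load_order(filenames: List[str]) -> List[str]:
    """Order in which the files of an inventory directory are read (ASCII order)."""
    return sorted(filenames)


# Without a numeric prefix, the core topology can be read after the others:
print(load_order(["tdp", "extra", "observability"]))
# ['extra', 'observability', 'tdp']

# Digits sort before letters, so the 01_ prefix puts the core topology first:
print(load_order(["tdp_prerequisites", "tdp_observability", "01_tdp", "tdp_extra"]))
# ['01_tdp', 'tdp_extra', 'tdp_observability', 'tdp_prerequisites']
```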
If other collections are installed, their topologies must also be added so that they are taken into account. The topologies of the other collections can rely on the group names defined in the tdp-collection topology.
cp /path/to/tdp/collections/ansible_collections/tosit/tdp_extra/topology.ini /path/to/tdp/inventory/topologies/tdp_extra
cp /path/to/tdp/collections/ansible_collections/tosit/tdp_observability/topology.ini /path/to/tdp/inventory/topologies/tdp_observability
Edit these files to match the desired distribution of components.
For example, the following topology deploys a Hive Metastore on the hosts of the master2 and master3 groups:
[hive_ms:children]
master2
master3
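How a topology group such as hive_ms ends up pointing at concrete hosts can be sketched in Python. The dictionaries below are written by hand to mirror the hosts example and the topology above (a simplification: Ansible builds these groups itself when it loads the inventory directory), and the hypothetical resolve_hosts helper follows child groups the way :children nesting works in an INI inventory.

```python
from typing import Dict, List

# Hand-written view of the groups after Ansible has loaded both the hosts file
# and the topology: the masterN groups come from hosts, hive_ms from the topology.
hosts_by_group: Dict[str, List[str]] = {
    "master1": ["master-01"],
    "master2": ["master-02"],
    "master3": ["master-03"],
}
children_by_group: Dict[str, List[str]] = {
    "master": ["master1", "master2", "master3"],
    "hive_ms": ["master2", "master3"],
}


def resolve_hosts(group: str) -> List[str]:
    """List the hosts of a group, following child groups recursively."""
    result = list(hosts_by_group.get(group, []))
    for child in children_by_group.get(group, []):
        for host in resolve_hosts(child):
            if host not in result:
                result.append(host)
    return result


print(resolve_hosts("hive_ms"))  # ['master-02', 'master-03']
```

Because the topology only refers to group names, moving the Hive Metastore to other machines means editing the hosts file or the children list, not the collection itself.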
Next steps
The control node is now configured to deploy TDP.
The next step is to prepare the managed nodes to host the deployment. The list of prerequisites is available on the managed nodes page.