Control node setup

This page describes how to configure the control node used to deploy TDP. The same setup applies to both manual deployments and deployments driven by TDP Manager.

Prerequisites

The only prerequisite for the control node is access to Ansible. The following operating systems have been tested:

  • Rocky 8
  • AlmaLinux 8

The installation can be done on other operating systems, but some features may not be available.

Note: When deploying with TDP Manager, it is recommended to use the version of Ansible contained in the TDP Manager virtual environment.
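
As a minimal sketch of this prerequisite (the paths below are placeholders to adapt to your setup), Ansible can be installed in a dedicated Python virtual environment for a manual deployment, or the TDP Manager virtual environment can be activated instead:

# Manual deployment: create a virtual environment and install Ansible in it
python3 -m venv ~/tdp-venv
source ~/tdp-venv/bin/activate
pip install ansible

# TDP Manager deployment: reuse the Ansible shipped in its virtual environment
source /path/to/tdp-manager/venv/bin/activate

# In both cases, check which Ansible is now active
ansible --version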

Ansible configuration

The Ansible configuration file, ansible.cfg, requires some settings specific to TDP:

[defaults]
inventory=/path/to/tdp/inventory ; REQUIRED path to the directory containing the Ansible inventory.
collections_paths=/path/to/tdp/collections/ ; RECOMMENDED colon-separated list of directories containing the Ansible collections. The first directory in the list is used by ansible-galaxy to install collections.
display_skipped_hosts=False ; RECOMMENDED to avoid displaying skipped Ansible tasks and cluttering the logs.
any_errors_fatal=True ; REQUIRED to stop Ansible execution as soon as an error occurs and prevent Ansible from continuing on the remaining hosts.

; RECOMMENDED to significantly speed up playbook launches, as installing TDP involves many calls to the `ansible-playbook` command.
[inventory]
cache = true
cache_plugin = jsonfile
cache_timeout = 7200
cache_connection = .cache

; REQUIRED to activate the `tdp_vars` plugin, which builds the variables used by TDP's Ansible collections.
[tdp]
vars = tdp_vars

; REQUIRED to activate the switch to the `root` user for all Ansible tasks.
[privilege_escalation]
become=True
become_user=root

Refer to the official Ansible documentation for more information on the configuration options and on the possible locations of the ansible.cfg file.
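
To check which ansible.cfg file is picked up and which settings are active, the following standard Ansible commands can be used (a quick sanity check, not part of the setup itself):

ansible --version                     # shows the config file currently in use
ansible-config dump --only-changed    # lists the settings that differ from the defaults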

Collections installation

TDP components are installed from several Ansible collections using Ansible Galaxy. The collections are stored in the first directory specified by the collections_paths setting of the ansible.cfg file.

The desired collections are listed in a requirements.yml file. The core collection is mandatory.

collections:
  # Core
  # Contains the main services (HDFS, YARN, Hive, etc.) as well as the plugins needed to manage variables.
  - name: community.general
    version: 7.1.0
  - name: https://github.com/TOSIT-IO/tdp-collection
    type: git
    version: master
  # Extras (Optional)
  # Contains additional services and components for TDP.
  - name: https://github.com/TOSIT-IO/tdp-collection-extras
    type: git
    version: master
  # Observability (Optional)
  # Contains components dedicated to observability (Prometheus, Grafana, etc.)
  - name: https://github.com/TOSIT-IO/tdp-observability
    type: git
    version: main
  - name: community.grafana
    version: 1.5.4

Collections can then be installed using the ansible-galaxy command:

ansible-galaxy install -r requirements.yml
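
The installation can then be verified by listing the collections visible to Ansible, which should include tosit.tdp under the configured collections path:

ansible-galaxy collection list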

Additional dependencies for the observability collection

If installed, the TDP observability collection requires additional Python dependencies. They can be installed with the following command:

pip install -r /path/to/tdp/collections/ansible_collections/tosit/tdp_observability/requirements.txt

Definition of the Ansible inventory

The Ansible inventory defines the hosts on which TDP is deployed, the distribution of components across these hosts, and the configuration of any inventory plugins. For TDP, the inventory is organized as follows:

/path/to/tdp/inventory/
├── group_vars/
│   └── all.yml
├── hosts
├── topologies/
│   ├── 01_tdp
│   ├── tdp_extra
│   ├── tdp_observability
│   └── tdp_prerequisites
└── tdp_vars.yml
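
This skeleton can be created up front, for example (adapt the base path to your environment):

mkdir -p /path/to/tdp/inventory/group_vars /path/to/tdp/inventory/topologies
touch /path/to/tdp/inventory/hosts /path/to/tdp/inventory/group_vars/all.yml /path/to/tdp/inventory/tdp_vars.yml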

Installation of the tdp_vars plugin

The tdp_vars plugin builds the variables used by TDP's Ansible collections. It is enabled by creating the file /path/to/tdp/inventory/tdp_vars.yml with the following content:

plugin: tosit.tdp.tdp_vars

Global variables

Some variables must be common to all hosts. They are specified in the file group_vars/all.yml.

Create the file /path/to/tdp/inventory/group_vars/all.yml and fill in the following values:

---
domain: # DNS domain of the cluster
realm: # Kerberos realm name
kerberos_admin_principal: # Kerberos administrator principal
kerberos_admin_password: # Kerberos administrator password
kadmin_principal: # Principal used for kadmin operations (typically the same as kerberos_admin_principal)
kadmin_password: # Password of the kadmin principal (typically the same as kerberos_admin_password)

For example:

---
domain: tdp
realm: TDP.LOCAL
kerberos_admin_principal: 'admin/admin@{{ realm }}'
kerberos_admin_password: admin123
kadmin_principal: '{{ kerberos_admin_principal }}'
kadmin_password: '{{ kerberos_admin_password }}'
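
Once the hosts file described in the next section is in place, these variables can be checked with an ad-hoc call to the debug module, which prints the resolved value for each host (an optional sanity check):

ansible all -m ansible.builtin.debug -a "var=kadmin_principal"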

Hosts

The hosts file defines the hosts on which to deploy TDP. It is an Ansible inventory file.

The inventory file must define the following groups:

  • edge
  • master
  • master1
  • master2
  • master3
  • worker

Create a file /path/to/tdp/inventory/hosts adapted to your infrastructure by completing the following content:

[edge]
; List of hosts in the edge group

[master1]
; List of hosts in the master1 group

[master2]
; List of hosts in the master2 group

[master3]
; List of hosts in the master3 group

[master:children]
master1
master2
master3

[worker]
; List of hosts in the worker group

For example, for a cluster of 6 machines (1 edge, 3 masters, and 2 workers), the hosts file will be as follows:

[edge]
edge-01 ansible_host=edge01.tdp.local

[master1]
master-01 ansible_host=master01.tdp.local

[master2]
master-02 ansible_host=master02.tdp.local

[master3]
master-03 ansible_host=master03.tdp.local

[master:children]
master1
master2
master3

[worker]
worker-01 ansible_host=worker01.tdp.local
worker-02 ansible_host=worker02.tdp.local
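
Once the hosts file is written, the group hierarchy and the connectivity to every host can be checked with standard Ansible commands (this assumes SSH access and privilege escalation are already set up on the hosts; see the managed nodes page for the host prerequisites):

ansible-inventory --graph
ansible all -m ansible.builtin.ping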

Topologies

The topology files define how the components (HDFS NameNode, Hive Metastore, etc.) are distributed across the hosts. They are plain Ansible inventory files.

Each collection comes with an example topology file (topology.ini).

tdp-collection is the main collection, and its topology file must be read first. Since Ansible processes the files of an inventory directory in alphabetical order, it is recommended to prefix its name with 01_.

Copy the topology file from tdp-collection to the /path/to/tdp/inventory/topologies directory:

cp /path/to/tdp/collections/ansible_collections/tosit/tdp/topology.ini /path/to/tdp/inventory/topologies/01_tdp

If other collections are installed, their topology files must also be added so that they are taken into account. These topologies can rely on the group names defined in the tdp-collection topology.

cp /path/to/tdp/collections/ansible_collections/tosit/tdp_extra/topology.ini /path/to/tdp/inventory/topologies/tdp_extra
cp /path/to/tdp/collections/ansible_collections/tosit/tdp_observability/topology.ini /path/to/tdp/inventory/topologies/tdp_observability

Edit these files according to the desired component distribution.

For example, the following topology deploys a Hive Metastore on the hosts of the master2 and master3 groups:

[hive_ms:children]
master2
master3
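
The resulting group membership can be checked without running a deployment, for example for the hive_ms group used above:

ansible-inventory --graph hive_ms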

Next steps

The control node is now configured to deploy TDP.

It is now necessary to prepare the managed nodes to host the deployment. The list of prerequisites is available on the managed nodes page.