Download the PDF version of this document or visit https://openshift-kni.github.io/baremetal-deploy/
1. Introduction
This write-up will guide you through the process of using the Ansible playbooks to deploy a baremetal installer-provisioned infrastructure (IPI) cluster of Red Hat OpenShift 4.
For the manual steps, visit our Deployment Guide.
2. Prerequisites
- Best Practice Minimum Setup: 6 physical servers (1 provision node, 3 master and 2 worker nodes)
- Best Practice Minimum Setup for disconnected environments: 7 physical servers (1 provision node, 1 registry node[1], 3 master and 2 worker nodes)
- Minimum Setup: 4 physical servers (1 provision node, 3 master nodes)
- Minimum Setup for disconnected environments: 5 physical servers (1 provision node, 1 registry node[1], 3 master nodes)
- Each server needs 2 NICs pre-configured: NIC1 for the private network and NIC2 for the baremetal network. NIC interface names must be identical across all nodes[2]
- It is recommended that each server have a RAID-1 configured and initialized (though not enforced)
- Each server must have IPMI configured
- Each server must have DHCP setup for the baremetal NICs
- Each server must have DNS setup for the API and wildcard applications
- A DNS VIP, an IP on the baremetal network, is required for reservation. The reservation is done via our DHCP server (though not required).
- Optional - Include DNS entries for the hostnames for each of the servers
- Download a copy of your Pull Secret
Due to the complexities of properly configuring an environment, it is recommended to review the following steps prior to running the Ansible playbook; without proper setup, the playbook will not work.
The sections to review and ensure proper configuration are as follows:
- One of the Create DNS records sections
- One of the Create DHCP reservation sections
- An existing registry node (details on creating a registry, if required, below)
- An existing webserver to cache required files and the RHCOS images (details on creating a webserver, if required, below)
Once the above is complete, install Red Hat Enterprise Linux (RHEL) 8.x on your provision node, create a user (i.e. kni) to deploy as non-root, and provide that user sudo privileges.
For simplicity, the steps to create the user named kni are as follows:
- Login to the provision node via ssh
- Create a user (i.e. kni) to deploy as non-root and provide that user sudo privileges

  useradd kni
  passwd kni
  echo "kni ALL=(root) NOPASSWD:ALL" | tee -a /etc/sudoers.d/kni
  chmod 0440 /etc/sudoers.d/kni

- Enable a dnf local repository on the provision host (a sketch of one approach is shown below)
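The following is a minimal sketch of one way to define such a local repository, assuming the required RPMs (with repodata generated via createrepo) have already been mirrored to a hypothetical /opt/rhel8repo directory on the provision host:

  # Assumes RPMs and repodata already exist under /opt/rhel8repo (hypothetical path)
  [kni@provisioner ~]$ sudo tee /etc/yum.repos.d/local.repo << 'EOF'
  [local-rhel8]
  name=Local RHEL 8 repository
  baseurl=file:///opt/rhel8repo
  enabled=1
  gpgcheck=0
  EOF
  # Confirm the repository is visible to dnf
  [kni@provisioner ~]$ sudo dnf repolist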
3. Tour of the Ansible Playbook
- inventory - contains the file hosts.sample that:
  - contains all the modifiable variables, their default values, and their definitions. Some variables are explicitly left empty, ensuring users give an explicit value.
  - handles the setting up of your provision node, master nodes, and worker nodes. Each section requires additional details (i.e. management credentials).
- requirements - contains the list of collections required by the playbook.
  - The collections include two roles: redhatci.ocp.node_prep and redhatci.ocp.installer. redhatci.ocp.node_prep handles all the prerequisites that the provisioner node requires prior to running the installer. The redhatci.ocp.installer role handles extracting the installer, setting up the manifests, and running the Red Hat OpenShift installation.
The tree structure is shown below:
├── ansible.cfg
├── inventory
│ └── hosts.sample
├── playbook.yml
└── requirements.yml
4. Running the Ansible Playbook
The following are the steps to successfully run the Ansible playbook.
4.1. git clone the Ansible playbook
The first step to using the Ansible playbook is to clone the baremetal-deploy repository.
This should be done on a system that can access the provision host.
- Clone the git repository

  [user@laptop ~]$ git clone https://github.com/openshift-kni/baremetal-deploy.git

- Change to the ansible-ipi-install directory

  [user@laptop ~]$ cd /path/to/git/repo/baremetal-deploy/ansible-ipi-install
4.2. Install the required Ansible collections
The Ansible playbook makes use of different collections defined in the requirements.yml file. Two of the main roles come from the redhatci.ocp collection.
- Install the required collections

  [user@laptop ~]$ ansible-galaxy collection install -r requirements.yml
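If you are running Ansible 2.10 or later, you can confirm the collection installed correctly by listing it:

  [user@laptop ~]$ ansible-galaxy collection list | grep redhatci.ocp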
4.3. The ansible.cfg file
While the ansible.cfg file may vary depending upon your environment, a sample is provided in the repository.
[defaults]
inventory=./inventory
remote_user=kni
callback_whitelist = profile_tasks
[privilege_escalation]
become_method=sudo
Ensure to change remote_user if your privileged user differs from kni.
4.4. Ansible version
Ensure that your environment is using Ansible 2.9 or greater. The following command can be used to verify.
ansible --version
ansible 2.9.1
config file = /path/to/baremetal-deploy/ansible-ipi-install/ansible.cfg
configured module search path = ['/path/to/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 3.7.2 (default, Jan 16 2019, 19:49:22) [GCC 8.2.1 20181215 (Red Hat 8.2.1-6)]
The config file section should point to the path of your ansible.cfg.
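If your installed version is older than 2.9, one way to obtain a newer release (a sketch using a per-user pip installation) is:

  $ pip3 install --user 'ansible>=2.9'
  $ ansible --version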
4.5. Copy local SSH key to provision node
With the ansible.cfg file in place, the next step is to copy your public ssh key to your provision node using ssh-copy-id.
From the system that is to run the playbook,
$ ssh-copy-id <user>@provisioner.example.com
<user> should be the user previously created on the provision node (i.e. kni).
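If the system running the playbook does not yet have an SSH key pair, you can generate one first; ssh-copy-id will then offer the new public key:

  # Generate a key pair (the ed25519 type shown here is one common choice)
  $ ssh-keygen -t ed25519
  $ ssh-copy-id kni@provisioner.example.com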
4.6. Modifying the inventory/hosts
The hosts file provides all the definable variables and a description of each variable. Some of the variables are explicitly left empty and require user input for the playbook to run.
The hosts file also ensures that all the nodes that will be used to deploy IPI on baremetal are set up. There are 4 groups: masters, workers, provisioner, and registry_host (optional). The masters and workers groups collect information about each host, such as its name, role, user management (i.e. iDRAC) user and password, the ipmi_address and ipmi_port used to access the server, and the provision MAC address (NIC1) that resides on the provisioning network.
Below is a sample of the inventory/hosts file
[all:vars]
###############################################################################
# Required configuration variables for IPI on Baremetal Installations #
###############################################################################
# The provisioning NIC (NIC1) used on all baremetal nodes
prov_nic=eno1
# The public NIC (NIC2) used on all baremetal nodes
pub_nic=eno2
# (Optional) Set the provisioning bridge name. Default value is 'provisioning'.
#provisioning_bridge=provisioning
# (Optional) Set the baremetal bridge name. Default value is 'baremetal'.
#baremetal_bridge=baremetal
# (Optional) Activation-key for proper setup of subscription-manager, empty value skips registration
#activation_key=""
# (Optional) Activation-key org_id for proper setup of subscription-manager, empty value skips registration
#org_id=""
# The directory used to store the cluster configuration files (install-config.yaml, pull-secret.txt, metal3-config.yaml)
dir="{{ ansible_user_dir }}/clusterconfigs"
# The version of the openshift-installer, undefined or empty results in the playbook failing with error message.
# Values accepted: 'latest-4.3', 'latest-4.4', explicit version i.e. 4.3.0-0.nightly-2019-12-09-035405
version=""
# Enter whether the build should use 'dev' (nightly builds) or 'ga' for Generally Available version of OpenShift
# Empty value results in playbook failing with error message.
build=""
# (Optional) Provisioning IP Network and dhcp range (default value)
# If defined, make sure to update 'prov_ip' with a valid IP outside of your 'prov_dhcp_range' and update all other places like 'no_proxy_list'
# prov_network=172.22.0.0/21
# prov_dhcp_range="172.22.0.10,172.22.0.100"
# Provisioning IP address (default value)
prov_ip=172.22.0.3
# (Optional) Enable playbook to pre-download RHCOS images prior to cluster deployment and use them as a local
# cache. Default is false.
#cache_enabled=True
# (Optional) Enable IPv6 addressing instead of IPv4 addressing
#ipv6_enabled=True
# (Optional) When ipv6_enabled is set to True, but want IPv4 addressing on provisioning network
# Default is false.
#ipv4_provisioning=True
# (Optional) When ipv6_enabled is set to True, but want IPv4 addressing on baremetal network
#ipv4_baremetal=True
# (Optional) A list of clock servers to be used in chrony by the masters and workers
#clock_servers=["pool.ntp.org","clock.redhat.com"]
# (Optional) Provide HTTP proxy settings
#http_proxy=http://USERNAME:PASSWORD@proxy.example.com:8080
# (Optional) Provide HTTPS proxy settings
#https_proxy=https://USERNAME:PASSWORD@proxy.example.com:8080
# (Optional) comma-separated list of hosts, IP Addresses, or IP ranges in CIDR format
# excluded from proxying
# NOTE: OpenShift does not accept '*' as a wildcard attached to a domain suffix
# i.e. *.example.com
# Use '.' as the wildcard for a domain suffix as shown in the example below.
# i.e. .example.com
#no_proxy_list="172.22.0.0/24,.example.com"
# The default installer timeouts for the bootstrap and install processes may be too short for some baremetal
# deployments. The variables below can be used to extend those timeouts.
# (Optional) Increase bootstrap process timeout by N iterations.
#increase_bootstrap_timeout=2
# (Optional) Increase install process timeout by N iterations.
#increase_install_timeout=2
# (Optional) Disable RedFish inspection to intelligently choose between IPMI or RedFish protocol.
# By default this feature is enabled and set to true. Uncomment below to disable and use IPMI.
#redfish_inspection=false
# (Optional) Modify files on the node filesystems, you can augment the "fake" roots for the
# control plane and worker nodes.
# If defined, playbook will look for files in control plane and worker subdirectories.
# Otherwise, it will look in {{ role_path }}/files/customize_filesystem (default)
# For more information on modifying node filesystems visit: https://bit.ly/36tD30f
#customize_node_filesystems="/path/to/customized/filesystems"
# (Optional) Modify the path to add external manifests to the deployed nodes.
# There are two folders manifests/ and openshift/
# If defined, the playbook will copy manifests from the user provided directory.
# Otherwise, files will be copied from the default location 'roles/installer/files/manifests/*'
#customize_extramanifests_path="/path/to/extra/manifests"
#customize_extramanifestsopenshift_path="/path/to/extra/openshift"
######################################
# Vars regarding install-config.yaml #
######################################
# Base domain, i.e. example.com
domain=""
# Name of the cluster, i.e. openshift
cluster=""
# The public CIDR address, i.e. 10.1.1.0/21
extcidrnet=""
# An IP reserved on the baremetal network.
dnsvip=""
# An IP reserved on the baremetal network for the API endpoint.
# (Optional) If not set, a DNS lookup verifies that api.<clustername>.<domain> provides an IP
#apivip=""
# An IP reserved on the baremetal network for the Ingress endpoint.
# (Optional) If not set, a DNS lookup verifies that *.apps.<clustername>.<domain> provides an IP
#ingressvip=""
# The master hosts provisioning nic
# (Optional) If not set, the prov_nic will be used
#masters_prov_nic=""
# Network Type (OpenShiftSDN or OVNKubernetes). Playbook defaults to OVNKubernetes.
# Uncomment below for OpenShiftSDN
#network_type="OpenShiftSDN"
# (Optional) A URL to override the default operating system image for the bootstrap node.
# The URL must contain a sha256 hash of the image.
# See https://github.com/openshift/installer/blob/master/docs/user/metal/customization_ipi.md
# Example https://mirror.example.com/images/qemu.qcow2.gz?sha256=a07bd...
#bootstraposimage=""
# (Optional) A URL to override the default operating system image for the cluster nodes.
# The URL must contain a sha256 hash of the image.
# See https://github.com/openshift/installer/blob/master/docs/user/metal/customization_ipi.md
# Example https://mirror.example.com/images/metal.qcow2.gz?sha256=3b5a8...
#clusterosimage=""
# A copy of your pullsecret from https://cloud.redhat.com/openshift/install/metal/user-provisioned
pullsecret=""
# (Optional) Disable BMC Certification Validation. When using self-signed certificates for your BMC, ensure to set to True.
# Default value is False.
#disable_bmc_certificate_verification=True
# (Optional) Enable RedFish VirtualMedia/iDRAC VirtualMedia
#enable_virtualmedia=True
# (Required when enable_virtualmedia is set to True) Set an available IP address from the baremetal net for these two variables
#provisioningHostIP=<baremetal_net_IP1>
#bootstrapProvisioningIP=<baremetal_net_IP2>
# (Optional) A MAC address to use for the external NIC on the bootstrap VM. This is optional and if blank is generated by libvirt.
#externalMACAddress="52:54:00:XX:XX:XX"
# Master nodes
# The hardware_profile is used by the baremetal operator to match the hardware discovered on the host
# See https://github.com/metal3-io/baremetal-operator/blob/master/docs/api.md#baremetalhost-status
# ipmi_port is optional for each host. 623 is the common default used if omitted
# poweroff is optional. True or omitted (by default) indicates the playbook will power off the node before deploying OCP;
# otherwise set it to false
# (Optional) OpenShift 4.6+, Set Root Device Hints to choose the proper device to install operating system on OpenShift nodes.
# root device hint options include: ['deviceName','hctl','model','vendor','serialNumber','minSizeGigabytes','wwn','rotational']
# Root Device Hint values are case sensitive.
# root_device_hint="deviceName"
# root_device_hint_value="/dev/sda"
[masters]
master-0 name=master-0 role=master ipmi_user=admin ipmi_password=password ipmi_address=192.168.1.1 ipmi_port=623 provision_mac=ec:f4:bb:da:0c:58 hardware_profile=default poweroff=true
master-1 name=master-1 role=master ipmi_user=admin ipmi_password=password ipmi_address=192.168.1.2 ipmi_port=623 provision_mac=ec:f4:bb:da:32:88 hardware_profile=default poweroff=true
master-2 name=master-2 role=master ipmi_user=admin ipmi_password=password ipmi_address=192.168.1.3 ipmi_port=623 provision_mac=ec:f4:bb:da:0d:98 hardware_profile=default poweroff=true
# Worker nodes
[workers]
worker-0 name=worker-0 role=worker ipmi_user=admin ipmi_password=password ipmi_address=192.168.1.4 ipmi_port=623 provision_mac=ec:f4:bb:da:0c:18 hardware_profile=unknown poweroff=true
worker-1 name=worker-1 role=worker ipmi_user=admin ipmi_password=password ipmi_address=192.168.1.5 ipmi_port=623 provision_mac=ec:f4:bb:da:32:28 hardware_profile=unknown poweroff=true
# Provision Host
[provisioner]
provisioner.example.com
# Registry Host
# Define a host here to create or use a local copy of the installation registry
# Used for disconnected installation
# [registry_host]
# registry.example.com
# [registry_host:vars]
# The following cert_* variables are needed to create the certificates
# when creating a disconnected registry. They are not needed to use
# an existing disconnected registry.
# cert_country=US # must be a two-letter country code
# cert_state=MyState
# cert_locality=MyCity
# cert_organization=MyCompany
# cert_organizational_unit=MyDepartment
# The port exposed on the disconnected registry host can be changed from
# the default 5000 to something else by changing the following variable.
# registry_port=5000
# The directory the mirrored registry files are written to can be modified from the default /opt/registry by changing the following variable.
# registry_dir="/opt/registry"
# The following two variables must be set to use an existing disconnected registry.
#
# Specify a file that contains extra auth tokens to include in the
# pull-secret if they are not already there.
# disconnected_registry_auths_file=/path/to/registry-auths.json
# Specify a file that contains the addition trust bundle and image
# content sources for the local registry. The contents of this file
# will be appended to the install-config.yml file.
# disconnected_registry_mirrors_file=/path/to/install-config-appends.json
A detailed description of each variable is provided within the inventory/hosts.sample file.
4.6.1. Enabling dual-stack based deployments (optional).
Users can now deploy dual-stack based deployments with the Ansible playbook by including the variables below in the inventory/hosts file.
ipv6_enabled=True
dualstack_baremetal=True
extcidrnet="<ipv4-subnet-for-your-cluster>" #Ex: 10.0.0.1/24
extcidrnet6="<ipv6-subnet-for-your-cluster>" #Ex: fe80:12:0:4567::/64
Only applicable to OCP versions greater than 4.6.
4.7. The Ansible playbook.yml
The Ansible playbook connects to your provision host and runs through the redhatci.ocp.node_prep role and the redhatci.ocp.installer role. No modification is necessary. All modifications of variables may be done within the inventory/hosts file. A sample file is located in this repository under inventory/hosts.sample.
From the system that is to run the playbook,
playbook.yml
---
- name: IPI on Baremetal Installation Playbook
hosts: provisioner
collections:
- redhatci.ocp
roles:
- node_prep
- installer
4.8. Customizing the Node Filesystems
If you need to modify files on the node filesystems, you can augment
the "fake" roots for the masters and workers under the
roles/installer/files/customize_filesystem/{master,worker}
directories. Any files added here will be included in the ignition
config files for each of the machine types, leading to permanent
changes to the node filesystem.
Do not place any files directly in the "fake" root; place them only in subdirectories. Files in the root will cause the ignition process to fail.
This customization utilizes the Ignition filetranspiler tool; refer to its documentation for more information on how to use the "fake" root directories.
An example of using this customization is to disable a network interface that you do not want to receive a DHCP assignment outside of the cluster configuration. To do this for the eno1 interface on the master nodes, create the appropriate etc/sysconfig/network-scripts/ifcfg-eno1 file in the "fake" root:
IFCFG_DIR="roles/installer/files/customize_filesystem/master/etc/sysconfig/network-scripts"
IFNAME="eno1"
mkdir -p $IFCFG_DIR
cat << EOF > $IFCFG_DIR/ifcfg-${IFNAME}
DEVICE=${IFNAME}
BOOTPROTO=none
ONBOOT=no
EOF
By default these directories are empty.
4.9. Adding Extra Configurations to the OpenShift Installer
Prior to the installation of Red Hat OpenShift, you may want additional configuration files to be included during the installation. The installer role handles this.
In order to include the extraconfigs, ensure to place your yaml files within the roles/installer/files/manifests directory. All the files provided here will be included when the OpenShift manifests are created.
By default this directory is empty.
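As a hypothetical example, the following MachineConfig manifest, saved as roles/installer/files/manifests/99-worker-kargs-example.yaml, would add a kernel argument to all worker nodes (the file name and kernel argument are illustrative only):

  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    labels:
      machineconfiguration.openshift.io/role: worker
    name: 99-worker-kargs-example
  spec:
    # Example kernel argument; replace with your own customization
    kernelArguments:
      - loglevel=7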
4.10. Pre-caching RHCOS Images
If you wish to set up a local cache of RHCOS images on your
provisioning host, set the cache_enabled
variable to True
in your
hosts file. When requested, the playbook will pre-download RHCOS
images prior to actual cluster deployment.
It places these images in an Apache web server container on the
provisioning host and modifies install-config.yaml
to
instruct the bootstrap VM to download the images from that web server
during deployment.
4.11. Disconnected Registry
A disconnected registry can be used to deploy the cluster. This registry can already exist or can be created.
To use a disconnected registry, set the registry's host name in the [registry_host] group in the inventory file.
4.11.1. Creating a new disconnected registry
To create a new disconnected registry, the
disconnected_registry_auths_file
and
disconnected_registry_mirrors_file
variables must not be set.
The certificate information used to generate the host certificate must be defined. These variables must be defined under the registry_host group in the inventory file.
[registry_host:vars]
cert_country=US
cert_state=MyState
cert_locality=MyCity
cert_organization=MyCompany
cert_organizational_unit=MyDepartment
cert_country must be only two letters, i.e. US
4.11.2. Using an Existing Registry
If no registry already exists for your fully disconnected environment, visit the Creating a New Disconnected Registry section.
When using an existing registry, two variables, disconnected_registry_auths_file and disconnected_registry_mirrors_file, must be set. These variables are located within your inventory/hosts file; the inventory/hosts.sample file can be used as a reference.
The disconnected_registry_auths_file
variable should point to a file
containing json data regarding your registry information. This will be appended
to the auths
section of the pull secret by the Ansible playbook itself.
An example of the contents of the disconnected_registry_auths_file
is shown
below.
cat /path/to/registry-auths.json
{"registry.example.com:5000": {"auth": "ZHVtbXk6ZHsFVtbXk=", "email": "user@example.com" } }
The auth value is the base64 encoding of the HTTP credentials used to create the htpasswd file. Example:
[user@registry ~]$ b64auth=$( echo -n '<username>:<passwd>' | openssl base64 )
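Continuing that example, the file itself could then be written as follows (the registry hostname, port, and email address are placeholders for your own values):

  [user@registry ~]$ cat << EOF > /path/to/registry-auths.json
  {"registry.example.com:5000": {"auth": "$b64auth", "email": "user@example.com" } }
  EOF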
The disconnected_registry_mirrors_file
variable should point to a file
containing the additionalTrustBundle
and imageContentSources
(OpenShift
4.13 and below) or imageDigestSources
(OpenShift 4.14 and above) for
the disconnected registry. The certificate that goes within the additional
trust bundle is the disconnected registry node’s certificate. The
imageContentSources
adds the mirrored information of the registry. The below
content from the install-config-appends.yml
file gets automatically appended
by the Ansible playbook.
cat /path/to/install-config-appends.yml
additionalTrustBundle: |
-----BEGIN CERTIFICATE-----
MIIGPDCCBCSgAwIBAgIUWr1DxDq53hrsk6XVLRXUjfF9m+swDQYJKoZIhvcNAQEL
BQAwgZAxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdNeVN0YXRlMQ8wDQYDVQQHDAZN
eUNpdHkxEjAQBgNVBAoMCU15Q29tcGFueTEVMBMGA1UECwwMTXlEZXBhcnRtZW50
.
. [ABBREVIATED CERTIFICATE FOR BREVITY]
.
MTMwMQYDVQQDDCpyZWdpc3RyeS5rbmk3LmNsb3VkLmxhYi5lbmcuYm9zLnJlZGhh
dC5jb20wHhcNMjAwNDA3MjM1MzI2WhcNMzAwNDA1MjM1MzI2WjCBkDELMAkGA1UE
-----END CERTIFICATE-----
<image-config>: (1)
- mirrors:
- registry.example.com:5000/ocp4/openshift4
source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
- registry.example.com:5000/ocp4/openshift4
source: registry.svc.ci.openshift.org/ocp/release
- mirrors:
- registry.example.com:5000/ocp4/openshift4
source: quay.io/openshift-release-dev/ocp-release
Where (1) <image-config> is either imageContentSources for OpenShift 4.13 and below, or imageDigestSources for OpenShift 4.14 and above.
Indentation is important in the yml file. Ensure your copy of the install-config-appends.yml is properly indented as in the example above.
4.12. Running the playbook.yml
With the playbook.yml set and in place, run it as follows:
$ export ANSIBLE_CONFIG=./ansible.cfg
$ ansible-playbook -i inventory/hosts playbook.yml
5. Verifying Installation
Once the playbook has successfully completed, verify that your environment is up and running.
- Log into the provision node

  ssh kni@provisioner.example.com

  The kni user is the privileged user.

- Export the kubeconfig file located in the ~/clusterconfigs/auth directory

  export KUBECONFIG=~/clusterconfigs/auth/kubeconfig

- Verify the nodes in the OpenShift cluster

  [kni@worker-0 ~]$ oc get nodes
  NAME                             STATUS   ROLES    AGE   VERSION
  master-0.openshift.example.com   Ready    master   19h   v1.16.2
  master-1.openshift.example.com   Ready    master   19h   v1.16.2
  master-2.openshift.example.com   Ready    master   19h   v1.16.2
  worker-0.openshift.example.com   Ready    worker   19h   v1.16.2
  worker-1.openshift.example.com   Ready    worker   19h   v1.16.2
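As an additional check, you can confirm that all cluster operators have finished rolling out; each should eventually report AVAILABLE as True:

  [kni@provisioner ~]$ oc get clusteroperators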
6. Troubleshooting
The following section troubleshoots common errors that may arise when running the Ansible playbook.
6.1. Unreachable Host
One of the most common errors is not being able to reach the provisioner host, with an error similar to:
$ ansible-playbook -i inventory/hosts playbook.yml
PLAY [IPI on Baremetal Installation Playbook] **********************************
TASK [Gathering Facts] *********************************************************
fatal: [provisioner.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname provisioner.example.com: Name or service not known", "unreachable": true}
PLAY RECAP *********************************************************************
provisioner.example.com : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
In order to solve this issue, ensure your provisioner hostname is pingable.
- Verify that the system you are currently on can ping provisioner.example.com

  ping provisioner.example.com

- Once pingable, ensure that you have copied your public SSH key from your local system to the privileged user via the ssh-copy-id command.

  ssh-copy-id kni@provisioner.example.com

  When prompted, enter the password of your privileged user (i.e. kni).

- Verify connectivity using the ping module in Ansible

  ansible -i inventory/hosts provisioner -m ping
  provisioner.example.com | SUCCESS => {
      "ansible_facts": {
          "discovered_interpreter_python": "/usr/libexec/platform-python"
      },
      "changed": false,
      "ping": "pong"
  }
- Re-run the Ansible playbook
$ ansible-playbook -i inventory/hosts playbook.yml
6.2. Permission Denied Trying To Connect To Host
Another very common error is getting a permission denied error similar to:
$ ansible-playbook -i inventory/hosts playbook.yml
PLAY [IPI on Baremetal Installation Playbook] *****************************************************************************************************
TASK [Gathering Facts] ****************************************************************************************************************************
fatal: [provisioner.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: rlopez@provisioner.example.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).", "unreachable": true}
PLAY RECAP ****************************************************************************************************************************************
provisioner.example.com : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
The above issue is typically related to a problem with your ansible.cfg file. Either it does not exist, has errors inside it, or you have not copied your SSH public key onto the provisioner.example.com system. Notice that the Ansible playbook attempted to use the rlopez user instead of the kni user, since the local ansible.cfg did not exist and the remote_user parameter had not yet been set to kni (the privileged user).
- When working with the Ansible playbook, ensure you have an ansible.cfg located in the same directory as your playbook.yml file. The contents of the ansible.cfg should look similar to the below, with the exception of changing your inventory path (the location of the inventory directory) and potentially your privileged user if not using kni.

  $ cat ansible.cfg
  [defaults]
  inventory=/path/to/baremetal-deploy/ansible-ipi-install/inventory
  remote_user=kni
  [privilege_escalation]
  become=true
  become_method=sudo
- Next, ensure that you have copied your public SSH key from your local system to the privileged user via the ssh-copy-id command.

  ssh-copy-id kni@provisioner.example.com

  When prompted, enter the password of your privileged user (i.e. kni).

- Verify connectivity using the ping module in Ansible

  ansible -i inventory/hosts provisioner -m ping
  provisioner.example.com | SUCCESS => {
      "ansible_facts": {
          "discovered_interpreter_python": "/usr/libexec/platform-python"
      },
      "changed": false,
      "ping": "pong"
  }
- Re-run the Ansible playbook
$ ansible-playbook -i inventory/hosts playbook.yml
6.3. Dig lookup requires the python ‘dnspython’ library and it is not installed
One of the tasks in the node_prep
role captures your API VIP and the
Ingress VIP of your environment using a lookup
via dig
. It does
this DNS query using the dnspython
library.
This error is a little deceiving because the dnspython
package
does not need to be installed on the remote server
(i.e. provisioner.example.com
) but the package must be installed on
your local host that is running the Ansible playbook.
TASK [node_prep : fail] ************************************************************************************************************
skipping: [provisioner.example.com]
TASK [node_prep : Verify DNS records for API VIP, Wildcard (Ingress) VIP] **********************************************************
fatal: [provisioner.example.com]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'dig'. Error was a <class 'ansible.errors.AnsibleError'>, original message: The dig lookup requires the python 'dnspython' library and it is not installed"}
PLAY RECAP *************************************************************************************************************************
provisioner.example.com : ok=2 changed=0 unreachable=0 failed=1 skipped=3 rescued=0 ignored=0
The above issue can be fixed by simply installing python3-dns on your local system (assuming you are using an OS such as Fedora or Red Hat Enterprise Linux).
On a local host running Red Hat 8.x, run:
# sudo dnf install python3-dns
On a local host running Red Hat 7.x, run:
# sudo yum install python2-dns
On a local host running Fedora, run:
# sudo dnf install python3-dns
Re-run the Ansible playbook
$ ansible-playbook -i inventory/hosts playbook.yml
6.4. Missing python netaddr library
The Ansible playbook takes advantage of certain filters such as the
ipaddr
filter. In order to use this filter, your localhost running the
Ansible playbook requires the python netaddr
library.
The error when running the playbook looks like the following:
TASK [node_prep : Fail if Python modules are missing] ******************************************************************************
Tuesday 05 May 2020 19:30:19 +0000 (0:00:00.512) 0:00:13.829 ***********
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Missing python module(s) ['netaddr'] on localhost\n"}
The above issue can be fixed by simply installing python3-netaddr on your local system (assuming you are using an OS such as Fedora or Red Hat Enterprise Linux).
On a local host running Red Hat 8.x, run:
# sudo dnf install python3-netaddr
On a local host running Red Hat 7.x, run:
# sudo yum install python2-netaddr
On a local host running Fedora, run:
# sudo dnf install python3-netaddr
Re-run the Ansible playbook
$ ansible-playbook -i inventory/hosts playbook.yml
6.5. Shared connection closed on provision host when installing packages
When deploying in an environment where subscription manager is not being used and a local repository is set up on the provision host (because the provision host is offline), you may see the following error.
TASK [node_prep : Install required packages] ************************************************************************************************
Thursday 07 May 2020 17:04:21 +0000 (0:00:00.152) 0:00:11.854 **********
fatal: [provisioner.example.com]: FAILED! => {"changed": false, "module_stderr": "Shared connection to provisioner.example.com closed.\r\n", "module_stdout": "[Errno 101] Network is unreachable\r\n\r\n{\"msg\": \"Nothing to do\", \"changed\": false, \"results\": [], \"rc\": 0, \"invocation\": {\"module_args\": {\"name\": [\"firewalld\", \"tar\", \"libvirt\", \"qemu-kvm\", \"python3-devel\", \"jq\", \"ipmitool\", \"python3-libvirt\", \"python3-lxml\", \"python3-yaml\", \"NetworkManager-libnm\", \"nm-connection-editor\", \"libsemanage-python3\", \"policycoreutils-python3\", \"podman\"], \"state\": \"present\", \"update_cache\": true, \"allow_downgrade\": false, \"autoremove\": false, \"bugfix\": false, \"disable_gpg_check\": false, \"disable_plugin\": [], \"disablerepo\": [], \"download_only\": false, \"enable_plugin\": [], \"enablerepo\": [], \"exclude\": [], \"installroot\": \"/\", \"install_repoquery\": true, \"install_weak_deps\": true, \"security\": false, \"skip_broken\": false, \"update_only\": false, \"validate_certs\": true, \"lock_timeout\": 30, \"conf_file\": null, \"disable_excludes\": null, \"download_dir\": null, \"list\": null, \"releasever\": null}}}\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 0}
The error basically means that dnf was not able to load particular plugins, specifically the product-id and subscription-manager plugins. However, since this is a local repository with offline access, we will want to disable these plugins when this error occurs.
On the provision host, if you run the following command:
[kni@provisioner ~]$ sudo dnf info dnf
Updating Subscription Management repositories.
Unable to read consumer identity
[Errno 101] Network is unreachable
Last metadata expiration check: 0:08:49 ago on Thu 07 May 2020 08:11:19 PM UTC.
Installed Packages
Name : dnf
Version : 4.2.7
Release : 7.el8_1
Architecture : noarch
Size : 1.7 M
Source : dnf-4.2.7-7.el8_1.src.rpm
Repository : @System
From repo : rhel-8-for-x86_64-baseos-rpms
Summary : Package manager
URL : https://github.com/rpm-software-management/dnf
License : GPLv2+ and GPLv2 and GPL
Description : Utility that allows users to manage packages on their systems.
: It supports RPMs, modules and comps groups & environments.
To ensure the issue is plugin related, we can attempt to run the same command with plugins disabled as such:
[kni@provisioner ~]$ sudo dnf info dnf --disableplugin=product-id,subscription-manager
Last metadata expiration check: 0:11:17 ago on Thu 07 May 2020 08:11:19 PM UTC.
Installed Packages
Name : dnf
Version : 4.2.7
Release : 7.el8_1
Architecture : noarch
Size : 1.7 M
Source : dnf-4.2.7-7.el8_1.src.rpm
Repository : @System
From repo : rhel-8-for-x86_64-baseos-rpms
Summary : Package manager
URL : https://github.com/rpm-software-management/dnf
License : GPLv2+ and GPLv2 and GPL
Description : Utility that allows users to manage packages on their systems.
: It supports RPMs, modules and comps groups & environments.
Notice that the portion stating "Unable to read consumer identity [Errno 101] Network is unreachable" no longer appears.
For this fix to be permanent, modify the /etc/yum.conf file and include plugins=0 in the [main] section of the configuration file.
[kni@provisioner ~]$ cat /etc/yum.conf
[main]
gpgcheck=1
installonly_limit=3
clean_requirements_on_remove=True
best=True
plugins=0
7. Gotchas
7.1. Using become: yes within ansible.cfg or inside playbook.yml
This Ansible playbook takes advantage of the ansible_user_dir variable. As such, it is important to note that if the privilege escalation become: yes is used within your ansible.cfg or within the playbook.yml file, it will modify the home directory to that of the root user (i.e. /root) instead of using the home directory of your privileged user, kni, which is /home/kni.
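As a sketch of the pattern to avoid, a play such as the following would cause {{ ansible_user_dir }} to resolve to /root rather than /home/kni:

  ---
  - name: IPI on Baremetal Installation Playbook
    hosts: provisioner
    become: yes   # privilege escalation here makes ansible_user_dir resolve to /root
    collections:
      - redhatci.ocp
    roles:
      - node_prep
      - installer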
7.2. Failed to connect to bus: No such file or directory
The Ansible playbook creates two containers (when enabled) to store a mirrored registry and a caching webserver.
When these containers are created, the playbook also creates a
systemd
unit file to ensure these containers are restarted upon
the reboot of the host serving them.
Since these are systemd
user services, when logging into a
system to attempt a command such as
systemctl --user status container-cache.service
for the
webserver or systemctl --user status container-registry.service
for the mirrored registry, you may get an error such as:
[kni@provisioner ~]$ systemctl --user status container-cache
Failed to connect to bus: No such file or directory
This error indicates that the DBUS_SESSION_BUS_ADDRESS environment variable is not set.
In order to set this variable, export it as follows:
export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$(id -u)/bus"
Once that has been set, if you re-attempt the systemctl
command, you should
see output as follows:
[kni@provisioner ~]$ systemctl --user status container-cache.service
● container-cache.service - Podman container-cache.service
   Loaded: loaded (/home/kni/.config/systemd/user/container-cache.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-01 19:52:04 UTC; 49min ago
  Process: 36380 ExecStart=/usr/bin/podman start rhcos_image_cache (code=exited, status=0/SUCCESS)
 Main PID: 36410 (conmon)
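To make this variable persist across logins, it can be appended to the kni user's ~/.bashrc (a sketch):

  [kni@provisioner ~]$ echo 'export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$(id -u)/bus"' >> ~/.bashrc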
Appendix A: Using Ansible Tags with the playbook.yml
As this playbook continues to grow, there may be times when it is useful to run specific portions of the playbook rather than running everything the Ansible playbook offers.
For example, a user may only want to run the networking piece of the playbook or create just the pull-secret.txt file, or just clean up the environment — just to name a few.
As such the existing playbook has many tags that can be used for such purposes. By running the following command you can see what options are available.
$ ansible-playbook -i inventory/hosts playbook.yml --list-tasks --list-tags
playbook: playbook.yml
play #1 (provisioner): IPI on Baremetal Installation Playbook TAGS: []
tasks:
include_tasks TAGS: [validation]
include_tasks TAGS: [subscription]
include_tasks TAGS: [packages]
include_tasks TAGS: [network]
include_tasks TAGS: [network_facts]
include_tasks TAGS: [user]
include_tasks TAGS: [services]
include_tasks TAGS: [firewall]
include_tasks TAGS: [storagepool]
include_tasks TAGS: [clusterconfigs]
include_tasks TAGS: [powerservers]
include_tasks TAGS: [cleanup, getoc]
include_tasks TAGS: [extract, pullsecret]
include_tasks TAGS: [rhcospath]
include_tasks TAGS: [cache]
include_tasks TAGS: [installconfig]
include_tasks TAGS: [metal3config]
include_tasks TAGS: [customfs]
include_tasks TAGS: [manifests]
include_tasks TAGS: [extramanifests]
include_tasks TAGS: [cleanup]
include_tasks TAGS: [install]
TASK TAGS: [cache, cleanup, clusterconfigs, customfs, extract, extramanifests, firewall, getoc, install, installconfig, manifests, metal3config, network, network_facts, packages, powerservers, pullsecret, rhcospath, services, storagepool, subscription, user, validation]
To break this down further, the following is a description of each tag.
tag | description
---|---
validation | It is always required. It verifies that everything in your environment is set and ready for OpenShift deployment and sets some required internal variables
subscription | subscribe via Red Hat subscription manager
packages | install required packages for OpenShift
network | setup the provisioning and baremetal network bridges and bridge slaves
network_facts | regather networking facts of the environment
user | add the remote user to the libvirt group
services | enable appropriate services for OpenShift
firewall | set firewall rules for OpenShift
storagepool | define, create, auto start the default storage pool
clusterconfigs | directory that stores all configuration files for OpenShift
powerservers | power off all servers that will be part of the OpenShift cluster
getoc | get the appropriate oc binary
extract | extract the OpenShift installer
pullsecret | copy the pull secret into place
rhcospath | set the RHCOS path
cache | tasks related to enabling RHCOS image caching
installconfig | generates the install-config.yaml
metal3config | generates the metal3-config.yaml
customfs | deals with customizing the filesystem via ignition files
manifests | create the manifests directory
extramanifests | include any extra manifests files
install | Deploy OpenShift
cleanup | clean up the environment within the provisioning node. Does not remove networking
A.1. How to use the Ansible tags
The following is an example of how to use the --tags option. In this example, we will just install the packages on the provision node.
ansible-playbook -i inventory/hosts playbook.yml --tags "validation,packages"
The example above runs the validation and package-installation portions of the Ansible playbook. Only the tasks with these specific tags will run.
Due to the dependencies in the playbook, the validation tag is always required.
A.2. Skipping particular tasks using Ansible tags
In the event that you want to always skip certain tasks of the playbook, this can be done via the --skip-tags option.
We will use a similar example as above, where we want to skip the network setup and the package installation.
ansible-playbook -i inventory/hosts playbook.yml --skip-tags "network,packages"
Appendix B: Using a proxy with your Ansible playbook
When running behind a proxy, it is important to properly set up the environment so that you can run the Ansible playbook. To use a proxy with the Ansible playbook, set the appropriate variables within your inventory/hosts file. These values will also be included within your generated install-config.yaml file.
# (Optional) Provide HTTP proxy settings
#http_proxy=http://USERNAME:PASSWORD@proxy.example.com:8080
# (Optional) Provide HTTPS proxy settings
#https_proxy=https://USERNAME:PASSWORD@proxy.example.com:8080
# (Optional) comma-separated list of hosts, IP Addresses, or IP ranges in CIDR format
# excluded from proxying
# NOTE: OpenShift does not accept '*' as a wildcard attached to a domain suffix
# i.e. *.example.com
# Use '.' as the wildcard for a domain suffix as shown in the example below.
# i.e. .example.com
#no_proxy_list="172.22.0.0/24,.example.com"
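Note that these inventory variables configure the playbook tasks and the deployed cluster. If the system running the playbook must itself traverse the proxy (for example, to clone the repository or install collections), you can also export the standard proxy variables in that shell first; the values below are placeholders:

  $ export http_proxy=http://USERNAME:PASSWORD@proxy.example.com:8080
  $ export https_proxy=https://USERNAME:PASSWORD@proxy.example.com:8080
  $ export no_proxy="172.22.0.0/24,.example.com"
  $ ansible-playbook -i inventory/hosts playbook.yml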