RHEL 9.1 + OCP 4.13.10 installation failed #277

Open
shashank-6777 opened this issue Oct 11, 2023 · 23 comments
Labels
bug Something isn't working

Comments

@shashank-6777

shashank-6777 commented Oct 11, 2023

Bug description

Hi Team,

I am trying to install OCP 4.13.10 on RHEL 9.1 using the latest Crucible framework, but the installation fails every time. I found the two errors below on the failed OCP nodes.

Can you please let me know how I can solve these issues?

OpenShift version

other (provide in the description)

Assisted Installer version

v2.12.1

Relevant log output

**Error 1:-**

Oct 11 11:01:28 master-super1.nec.nfvi.localdomain next_step_runne[2488]: time="11-10-2023 11:01:28" level=info msg="Executing timeout [30 chronyc -n sources]" file="execute.go:39"
Oct 11 11:01:28 master-super1.nec.nfvi.localdomain next_step_runne[2488]: time="11-10-2023 11:01:28" level=error msg="Failed to get NTP sources" file="ntp_synchronizer.go:170" error="chronyc exited with non-zero exit code 1: \nchronyc: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by chronyc)\nchronyc: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by chronyc)\n"
Oct 11 11:01:28 master-super1.nec.nfvi.localdomain next_step_runne[2488]: time="11-10-2023 11:01:28" level=error msg="Step execution failed (exit code -1): <ntp-synchronizer-475163b7>, command: <ntp_synchronizer>, args: <[{\"ntp_source\":\"172.90.12.210\"}]>. Output:\nstdout:\n\n\nstderr:\nchronyc exited with non-zero exit code 1: \nchronyc: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by chronyc)\nchronyc: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by chronyc)\n\n" file="step_processor.go:110" request_id=1bbccc67-dce2-45e7-98ca-6199828dcb8f
**Error 2:-**

Oct 11 11:02:37 master-super1.nec.nfvi.localdomain next_step_runne[2488]: time="11-10-2023 11:02:37" level=error msg="Step execution failed (exit code 255): <installation-disk-speed-check-9ee10436>, command: <sh>, args: <[-c id=`podman ps --quiet --filter \"name=disk_performance\"` ; test ! -z \"$id\" || timeout 480.00 podman run --privileged --rm --quiet -v /dev:/dev:rw -v /var/log:/var/log -v /run/systemd/journal/socket:/run/systemd/journal/socket --name disk_performance registry.nec.nfvi.localdomain:5000/ocpmetal/assisted-installer-agent@sha256:6d782042debb951b202c80869f2dd30fffb9272010a46d77ea39ce8f6a862162 disk_speed_check '{\"path\":\"/dev/disk/by-path/pci-0000:04:00.0\"}']>. Output:\nstdout:\n{\"path\":\"/dev/disk/by-path/pci-0000:04:00.0\"}\n\nstderr:\nCould not get I/O performance for path /dev/disk/by-path/pci-0000:04:00.0 (fio exit code: 1, stderr: fio: io_u error on file /dev/disk/by-path/pci-0000: No space left on device: write offset=4194304, buflen=896\n)\n" file="step_processor.go:110" request_id=ce192de1-5656-4c65-84dc-2f5656822981

Inventory file

all:
  vars:
    ##################################
    # Assisted Install Configuration #
    ##################################
    # These options configure Assisted Installer and the resulting cluster
    # https://generator.swagger.io/?url=https://raw.githubusercontent.com/openshift/assisted-service/58a6abd5c99d4e41d939be89cd0962433849a861/swagger.yaml
    # See section: cluster-create-params

    # Cluster name and dns domain combine to give the cluster namespace that will contain OpenShift endpoints
    # e.g. api.clustername.example.lab, worker1.clustername.example.lab
    cluster_name: nec
    base_dns_domain: nfvi.localdomain

    # OpenShift version (4.6.16, 4.7.52, 4.8.43, 4.9.45, 4.10.37, 4.11.24, 4.12.31, or 4.13.10)
    openshift_full_version: 4.13.10

    # Virtual IP addresses used to access the resulting OpenShift cluster
    api_vip: 172.90.12.242 # the IP address to be used for api.clustername.example.lab and api-int.clustername.example.lab
    ingress_vip: 172.90.12.243 # the IP address to be used for *.apps.clustername.example.lab

    ## Allocate virtual IPs via DHCP server. Equivalent to the vip_dhcp_allocation configuration option of Assisted Installer
    vip_dhcp_allocation: false

    # The subnet on which all nodes are (or will be) accessible.
    machine_network_cidr: 172.90.12.0/24

    # The IP address pool to use for service IP addresses
    service_network_cidr: 172.30.0.0/16

    # Cluster network settings. You are unlikely to need to change these
    cluster_network_cidr: 10.128.0.0/14 # The subnet, internal to the cluster, on which pods will be assigned IPs
    cluster_network_host_prefix: 23 # The subnet prefix length to assign to each individual node.

    # # Cluster network provider. Cannot be changed after cluster is created.
    # # The default is OpenShift SDN unless otherwise specified.
    # network_type: OVNKubernetes
    # network_type: OpenShiftSDN
    network_type: OVNKubernetes
    populate_operator_catalog: true
    mirror_packages:
      - advanced-cluster-management
      - local-storage-operator
      - odf-operator
      - openshift-gitops-operator
      - loki-operator
      - kiali-ossm
      - jaeger-product
      - servicemeshoperator
      - netobserv-operator
      - ptp-operator
      - sriov-network-operator

    os_images:
      - openshift_version: '4.13'
        cpu_architecture: x86_64
        url: http://{{ hostvars['http_store']['ansible_host'] }}/discovery/rhcos-4.13.10-x86_64-live.x86_64.iso
        rootfs_url: http://{{ hostvars['http_store']['ansible_host'] }}/discovery/rhcos-4.13.10-x86_64-live-rootfs.x86_64.img
        version: 413.92.202308210212-0

    release_images:
      - openshift_version: '4.13'
        cpu_architecture: x86_64
        url: registry.nec.nfvi.localdomain:5000/ocp4/openshift4:4.13.10-x86_64
        version: 4.13.10

    assisted_service_image_repo_url: registry.nec.nfvi.localdomain:5000/ocpmetal
    ignore_cached_image_hash_file: true

    listen_address: 172.90.12.241


    ######################################
    # Prerequisite Service Configuration #
    ######################################

    # Proxy settings. These settings apply to: Assisted Installer, Day1 clusters and Day2 clusters.
    # This assumes the host where the AI runs and the OpenShift cluster share the same proxy settings.
    # http_proxy: ""
    # https_proxy: ""
    # no_proxy: ""

    # Flags to enable/disable prerequisite service setup
    # You will need to ensure alternatives are available for anything that will not be automatically set up
    setup_ntp_service: false
    setup_dns_service: true
    setup_pxe_service: false
    setup_registry_service: false # Only required for a Restricted Network installation
    setup_http_store_service: true
    setup_assisted_installer: true # default is true you may wish to turn it off if multiple users are using the same instance.


    # NTP Service
    # ntp_server is the address at which the NTP service is (or will be) available
    ntp_server: 172.90.12.210
    # ntp_server_allow is the range of IPs the NTP service will respond to
    ntp_server_allow: 172.90.12.0/24 # not required if setup_ntp_service is false


    # Mirror Registry Service parameters for a Restricted Network installation

    # use_local_mirror_registry controls if the install process uses a local container registry (mirror_registry) or not.
    # Set this to true to use the mirror registry service set up when `setup_registry_service` is true.
    use_local_mirror_registry: true

    # HTTP Store Configuration
    # ISO name must include the `discovery` directory if you have a SuperMicro machine
    discovery_iso_name: "discovery/{{ cluster_name }}/discovery-image.iso"

    # discovery_iso_server must be discoverable from all BMCs in order for them to mount the ISO hosted there.
    # It is usually necessary to specify different values for KVM nodes and/or physical BMCs if they are on different subnets.
    discovery_iso_server: "http://{{ hostvars['http_store']['ansible_host'] }}"

    ############################
    # Local File Configuration #
    ############################

    path_base_dir: /playbook/

    repo_root_path: "{{ path_base_dir }}/crucible" # path to repository root

    # Directory in which created/updated artifacts are placed
    fetched_dest: "{{ repo_root_path }}/fetched"

    # Configure possible paths for the pull secret
    # first one found will be used
    # note: paths should be absolute
    pull_secret_lookup_paths:
      - "{{ fetched_dest }}/pull-secret.txt"
      - "{{ repo_root_path }}/pull-secret.txt"

    # Configure possible paths for the ssh public key used for debugging
    # first one found will be used
    # note: paths should be absolute
    ssh_public_key_lookup_paths:
      - "{{ fetched_dest }}/ssh_keys/{{ cluster_name }}.pub"
      - "{{ repo_root_path }}/ssh_public_key.pub"
      - ~/.ssh/id_rsa.pub

    # Set the base directory to store ssh keys
    ssh_key_dest_base_dir: "{{ path_base_dir }}"
    # The retrieved cluster kubeconfig will be placed on the bastion host at the following location
    kubeconfig_dest_dir: "{{ path_base_dir }}"
    kubeconfig_dest_filename: "{{ cluster_name }}-kubeconfig"
    kubeadmin_dest_filename: "{{ cluster_name }}-kubeadmin.vault.yml"
    # You can comment out the line below if you want the kubeadmin credentials to be stored in plain text
    # kubeadmin_vault_password_file_path: "{{ repo_root_path }}/kubeadmin_vault_password_file"

    ############################
    #    LOGIC: DO NOT TOUCH   #
    # vvvvvvvvvvvvvvvvvvvvvvvv #
    ############################

    # pull secret logic, no need to change. Configure above
    local_pull_secret_path: "{{ lookup('first_found', pull_secret_lookup_paths) }}"
    pull_secret: "{{ lookup('file', local_pull_secret_path) }}"

    # ssh key logic, no need to change. Configure above
    local_ssh_public_key_path: "{{ lookup('first_found', ssh_public_key_lookup_paths) }}"
    ssh_public_key: "{{ lookup('file', local_ssh_public_key_path) }}"

    # provided mirror certificate logic, no need to change.
    local_mirror_certificate_path: "{{ (setup_registry_service == true) | ternary(
        fetched_dest + '/' + (hostvars['registry_host']['cert_file_prefix'] | default('registry')) + '.crt',
        repo_root_path + '/mirror_certificate.txt')
      }}"
    mirror_certificate: "{{ lookup('file', local_mirror_certificate_path) }}"

    openshift_version: "{{ openshift_full_version.split('.')[:2] | join('.') }}"

    is_valid_single_node_openshift_config: "{{ (groups['nodes'] | length == 1) and (groups['masters'] | length == 1) }}"

    ############################
    # ^^^^^^^^^^^^^^^^^^^^^^^^ #
    #    LOGIC: DO NOT TOUCH   #
    ############################


  children:
    bastions: # n.b. Currently only a single bastion is supported
      hosts:
        bastion:
          ansible_host:  172.90.12.241 # Must be reachable from the Ansible control node

    # Configuration and access information for the pre-requisite services
    # TODO: document differences needed for already-deployed and auto-deployed
    services:
      hosts:
        assisted_installer:
          ansible_host: 172.90.12.241
          host: 172.90.12.241
          port: 8090 # Do not change
          dns_servers:
            - 172.90.12.241

        registry_host:
          ansible_host: 172.90.12.241
          registry_port: 5000
          registry_fqdn: registry.nec.nfvi.localdomain # use in case of different FQDN for the cert
          registry_namespace: ocp4 # This is the default, use only in case the registry namespace name is different
          registry_image: openshift4 # This is the default, use only in case the registry image name is different
          cert_common_name: "{{ registry_fqdn }}"
          cert_country: US
          cert_locality: Raleigh
          cert_organization: Red Hat, Inc.
          cert_organizational_unit: Lab
          cert_state: NC

          # Configure the following secret values in the inventory.vault.yml file
          REGISTRY_HTTP_SECRET: "{{ VAULT_REGISTRY_HOST_REGISTRY_HTTP_SECRET | mandatory }}"
          disconnected_registry_user: "{{ VAULT_REGISTRY_HOST_DISCONNECTED_REGISTRY_USER | mandatory }}"
          disconnected_registry_password: "{{ VAULT_REGISTRY_HOST_DISCONNECTED_REGISTRY_PASSWORD | mandatory }}"

        dns_host:
          ansible_host: 172.90.12.241
          # upstream_dns: 8.8.8.8 # an optional upstream dns server
          # The following are required for DHCP setup
          # use_dhcp: true
          # use_pxe: false
          # dhcp_range_first: 10.60.0.101
          # dhcp_range_last:  10.60.0.105
          # prefix: 24
          # gateway: 10.60.0.1

        http_store:
          ansible_host: 172.90.12.241

        tftp_host:
          ansible_host: 172.90.12.241
          tftp_directory: /var/lib/tftpboot/

        ntp_host:
          ansible_host: 172.90.12.241

    vm_hosts:
      hosts:
        vm_host1: # Required for using "KVM" nodes, ignored if not.
          ansible_user: root
          ansible_host: 172.90.12.241
          host_ip_keyword: ansible_host # the varname in the KVM node hostvars which contains the *IP* of the VM
          images_dir: "/var/lib/libvirt/images/" # directory where qcow images will be placed.
          vm_bridge_ip: 172.90.12.241 # IP for the bridge between VMs and machine network
          vm_bridge_name: caas-br
          SETUP_VM_BRIDGE: false
          vm_bridge_interface: bond0.902 # Interface to be connected to the bridge. DO NOT use your primary interface.
          dns: 172.90.12.241 # DNS used by the bridge
          # ssl cert configuration
          # sushy_fqdn: ... # use in case of different FQDN for the cert
          cert_vars_host_var_key: registry_host # Look up cert values from another host by name (excluding cert_common_name)
          # or
          # cert_country: US
          # cert_locality: Raleigh
          # cert_organization: Red Hat, Inc.
          # cert_organizational_unit: Lab
          # cert_state: NC

        vm_host2: # Required for using "KVM" nodes, ignored if not.
          ansible_user: root
          ansible_host: 172.90.12.141
          host_ip_keyword: ansible_host # the varname in the KVM node hostvars which contains the *IP* of the VM
          images_dir: "/var/lib/libvirt/images/" # directory where qcow images will be placed.
          vm_bridge_ip: 172.90.12.141 # IP for the bridge between VMs and machine network
          vm_bridge_name: caas-br
          SETUP_VM_BRIDGE: false
          vm_bridge_interface: bond0.902 # Interface to be connected to the bridge. DO NOT use your primary interface.
          cert_vars_host_var_key: registry_host # Look up cert values from another host by name (excluding cert_common_name)
          dns: 172.90.12.241


    # Describe the desired cluster members
    nodes:
      # A minimum of three master nodes are required. More are supported.
      # Worker nodes are not required, but if present there must be two or more.
      #
      # Node Required Vars:
      # - role
      #     - Must be either "master" or "worker", and must match the group
      #
      # - mac
      #     - The MAC address of the node, used as a hardware identifier by Assisted Installer.
      #     - The value set here will be used when creating VMs and must be unique within the network.
      #
      # - vendor
      #     - One of "Dell", "HPE", "Lenovo", "SuperMicro", "KVM", "PXE" as the supported BMC APIs.
      #     - "KVM" identifies a node as a VM to be created. If a "KVM" node is present,
      #       then a "vm_host" must be defined in the node and a host with that name must exist
      #       inside the "vm_hosts" group.
      #     - "PXE" identifies a node as a baremetal that needs to boot from PXE.
      #
      # - bmc_address
      # - bmc_user
      # - bmc_password
      #     - details for the BMC that controls the node.
      #     - Must be set to the vm_host for "KVM" nodes.
      #
      # Static IP Vars:
      #   See docs/inventory.md: Network configuration section
      #
      # Optional Vars:
      # - vm_spec
      #     - Specifications for the node:
      #          - cpu_cores
      #          - ram_mib
      #          - disk_size_gb
      #
      # - installation_disk_path
      #     - The value set here will be used by Assisted Installer as the installation disk device
      #       for a given host.
      #     - The value must be a path to the disk device, e.g. /dev/sda
      #     - If not specified, Assisted Installer will pick the first enumerated disk device for a
      #       given host.
      vars:
        # Set the login information for any BMCs. Note that these will be SET on the vm_host virtual BMC.
        bmc_user: "{{ VAULT_NODES_BMC_USER | mandatory }}"
        bmc_password: "{{ VAULT_NODES_BMC_PASSWORD | mandatory }}"
        dns1: "172.90.12.241"
        gateway: "172.90.12.254"
        mask: 24
        network_config:
          dns_server_ips:
            - "{{ dns1 }}"
          interfaces:
            -
              addresses:
                ipv4:
                  -
                    ip: "{{ ansible_host }}"
                    prefix: "{{ mask }}"
              dhcp: false
              mac: "{{ mac }}"
              name: enp1s0
              state: up
              type: ethernet
          routes:
            -
              address: "{{ gateway }}"
              destination: 0.0.0.0/0
              interface: enp1s0
      children:
        masters:
          vars:
            role: master
            vendor: KVM # this example is a virtual control plane
            bmc_address: "172.90.12.241:8082" # port can be changed using sushy_tools_port on the vm_host
            vm_host: vm_host1
            vm_spec:
              cpu_cores: 18
              ram_mib: 40960
              disk_size_gb: 120
          hosts:
            master-super1:
              ansible_host: 172.90.12.244
              mac: "DE:AD:BE:EF:C0:2C"
              installation_disk_path: /dev/vda

              # # Uncomment to set custom BMC credentials for the node
              # # These variables must be set in the inventory.vault.yml file
              # bmc_user: "{{ VAULT_NODES_SUPER1_BMC_USER | mandatory }}"
              # bmc_password: "{{ VAULT_NODES_SUPER1_BMC_PASSWORD | mandatory }}"

            master-super2:
              ansible_host: 172.90.12.245
              mac: "DE:AD:BE:EF:C0:2D"
              installation_disk_path: /dev/vda

            master-super3:
              ansible_host: 172.90.12.246
              mac: "DE:AD:BE:EF:C0:2E"
              installation_disk_path: /dev/vda

        workers:
          vars:
            role: worker
            vendor: KVM # this example is a virtual control plane
            bmc_address: "172.90.12.141:8082" # port can be changed using sushy_tools_port on the vm_host
            vm_host: vm_host2
            vm_spec:
              cpu_cores: 16
              ram_mib: 40960
              disk_size_gb: 120
          hosts:
            ocs-worker1:
              ansible_host: 172.90.12.247
              mac: "DE:AD:BE:EF:C0:1C"

              # # Uncomment to set custom BMC credentials for the node
              # # These variables must be set in the inventory.vault.yml file
              # bmc_user: "{{ VAULT_NODES_SUPER1_BMC_USER | mandatory }}"
              # bmc_password: "{{ VAULT_NODES_SUPER1_BMC_PASSWORD | mandatory }}"
              installation_disk_path: /dev/vda

            ocs-worker2:
              ansible_host: 172.90.12.248
              mac: "DE:AD:BE:EF:C0:1D"
              installation_disk_path: /dev/vda


            ocs-worker3:
              ansible_host: 172.90.12.249
              mac: "DE:AD:BE:EF:C0:1E"
              installation_disk_path: /dev/vda

Required statements

  • I have removed all sensitive details from the attached logs and inventory files.
  • I acknowledge that Red Hat does not provide commercial support for the content of this repository.
  • I acknowledge that any assistance is offered purely on a best-effort basis, as resource permits.
shashank-6777 added the bug label on Oct 11, 2023
@arjuhe
Collaborator

arjuhe commented Oct 12, 2023

At the moment Crucible isn't compatible with RHEL 9. We are currently in the planning stages of this work.

@shashank-6777
Author

At the moment Crucible isn't compatible with RHEL 9. We are currently in the planning stages of this work.

@arjuhe thanks for the information. The README mentions a RHEL 9.1-based bastion node, so I thought Crucible supported RHEL 9.
Anyway, thanks for your support.

@nocturnalastro
Collaborator

nocturnalastro commented Oct 16, 2023

As this is happening on the nodes, I don't think this is an issue with the bastion. Have you tried this inventory with a RHEL 8 bastion?

Perhaps it's an issue with the rootfs version. What versions have you used for your 4.13 os_image and release_image?

@shashank-6777
Author

Hi, thanks for your response.

No @nocturnalastro, I didn't try this with a RHEL 8-based bastion. I actually wanted to test this framework with a RHEL 9.x-based bastion.
I have tried different combinations, i.e. both 4.13.0 and 4.13.10.

@shashank-6777
Author

shashank-6777 commented Oct 19, 2023

Hi @nocturnalastro,

I have tried multiple combinations, but this framework has some issue with 4.13.10. We have been using this framework for more than a year with 4.8.x to 4.11.x and never faced such an issue.

I have tried the combinations below.
combination 1:
VMhost :- 9.1
bastion :- 9.1
OCP : 4.13.10
Arch:- x86_64
rhcos images:-
live :- rhcos-4.13.10-x86_64-live.x86_64.iso, rhcos-4.13.5-x86_64-live.x86_64.iso, rhcos-4.13.0-x86_64-live.x86_64.iso
rootfs:- rhcos-4.13.10-x86_64-live-rootfs.x86_64.img, rhcos-4.13.5-x86_64-live-rootfs.x86_64.img, rhcos-4.13.0-x86_64-live-rootfs.x86_64.img

combination 2:
VMhost :- 8.6
bastion :- 9.1
OCP : 4.13.10
Arch:- x86_64
rhcos images:-
live :- rhcos-4.13.10-x86_64-live.x86_64.iso, rhcos-4.13.5-x86_64-live.x86_64.iso, rhcos-4.13.0-x86_64-live.x86_64.iso
rootfs:- rhcos-4.13.10-x86_64-live-rootfs.x86_64.img, rhcos-4.13.5-x86_64-live-rootfs.x86_64.img, rhcos-4.13.0-x86_64-live-rootfs.x86_64.img

combination 3:
VMhost :- 8.6
bastion :- 8.6
OCP : 4.13.10
Arch:- x86_64
rhcos images:-
live :- rhcos-4.13.10-x86_64-live.x86_64.iso, rhcos-4.13.5-x86_64-live.x86_64.iso, rhcos-4.13.0-x86_64-live.x86_64.iso
rootfs:- rhcos-4.13.10-x86_64-live-rootfs.x86_64.img, rhcos-4.13.5-x86_64-live-rootfs.x86_64.img, rhcos-4.13.0-x86_64-live-rootfs.x86_64.img

Every time I get a similar issue. Requesting you to please check it once if you have time.
Attaching error snippets for your reference.
(screenshot: error1)
(screenshot: error2)

However, my chrony server is working fine, and my vm_host has 3.2 TB of disk space for the VMs.

@shashank-6777
Author

Hi @nocturnalastro,

Good morning. Today when I tried to reinstall, I checked my host details in Assisted Installer and found that it reports my disk (the ODD) as too small; however, my installation disk is vda (HDD) with 250 GB of space. (screenshot attached)

@nocturnalastro
Collaborator

It's fine, as sr0 is not the installation disk. That is probably a virtual CD or something like that.

@shashank-6777
Author

@nocturnalastro thanks for your response. Do you have any idea why it is failing, then?
Thanks in advance.

@nocturnalastro
Collaborator

nocturnalastro commented Oct 25, 2023

@shashank-6777 Try bumping the ai_version to a newer version; it looks like the latest is v2.26.0.
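
For reference, a minimal sketch of how that might look in the inventory, assuming the variable is indeed called ai_version and sits under all.vars alongside the other Assisted Installer settings (check Crucible's docs/defaults for the exact key):

    all:
      vars:
        # hypothetical placement; confirm the variable name against Crucible's documentation
        ai_version: v2.26.0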

@shashank-6777
Author

@nocturnalastro thanks for the suggestion.
I tried the same, and now I can see the installation has triggered successfully.

So does that mean this ai_version is supported with 4.13.x?
Also, if I try with a bastion node based on RHEL 9, will it work as well?

Thanks in advance

(screenshot attached)

@nocturnalastro
Collaborator

nocturnalastro commented Oct 25, 2023

So does that mean this ai_version is supported with 4.13.x?
Looks like there was a bug which has been fixed in the newer versions of Assisted Installer.

Also, if I try with a bastion node based on RHEL 9, will it work as well?
If you get to the point of booting, then it probably would.
We've had issues with missing packages in RHEL 9 which meant we couldn't get to booting.

@shashank-6777
Author

@nocturnalastro I got your point.
Thanks for your support.

@novacain1
Member

@shashank-6777 were you using VirtIO-based disks on your bastion? I have at times seen terrible write-induced latencies when not using disks backed by VirtIO. This also applies with older firmware on the SAS/SCSI-based controllers that may be in use.

In general, we recommend SAS- or NVMe-class drives on the VM host for this. Mechanical spinners (HDDs) offer much lower IOPS for both reads and writes, and are oftentimes insufficient for the low-latency requirements of etcd (control plane/supervisor nodes) in OpenShift. I personally use SAS-based SSD drives by default, otherwise NVMe-class block devices.
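
If you want to sanity-check the VM host's storage before reinstalling, here is a rough sketch of an etcd-style fsync latency test with fio, run on the VM host against the directory backing the VM images (the images_dir from the inventory; the 22m size and 2300-byte block size follow the commonly cited etcd fio check, and the ~10 ms 99th-percentile fdatasync guideline is a general etcd recommendation, not a threshold Crucible or the installer enforces):

    # write 22 MiB in 2300-byte blocks, issuing fdatasync after every write
    fio --rw=write --ioengine=sync --fdatasync=1 \
        --directory=/var/lib/libvirt/images \
        --size=22m --bs=2300 --name=etcd-perf-check
    # then check the fsync/fdatasync latency percentiles in the output
    # (a p99 well above ~10 ms suggests the backing disks are too slow for etcd)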

@shashank-6777
Author

@novacain1 thanks for your response.

were you using VirtIO-based disks on your bastion? I have at times seen terrible write-induced latencies when not using disks backed by VirtIO. This also applies with older firmware on the SAS/SCSI-based controllers that may be in use.
Yes, I am using VirtIO-based disks; the physical disks are SAS-based HDDs.

In general, we recommend SAS- or NVMe-class drives on the VM host for this. Mechanical spinners (HDDs) offer much lower IOPS for both reads and writes, and are oftentimes insufficient for the low-latency requirements of etcd (control plane/supervisor nodes) in OpenShift. I personally use SAS-based SSD drives by default, otherwise NVMe-class block devices.
I understand and totally agree with you, but it's just an internal test setup.

@shashank-6777
Author

shashank-6777 commented Oct 30, 2023

@nocturnalastro, yes, the installation failed/got stuck after "installation step successful" with a RHEL 9.1-based bastion.

Just wanted to check if you have any planned date to release this framework with 9.x support.

@nocturnalastro
Collaborator

@shashank-6777 We're currently working on it; it shouldn't be too much longer.

@shashank-6777
Author

@nocturnalastro thanks for the update. Please let me know when I can test it in my lab.

@nocturnalastro
Collaborator

nocturnalastro commented Nov 30, 2023

@shashank-6777 PR #293 has the changes that got Crucible working for us on RHEL 9. Note there are some changes in the README, as some packages have moved to different locations in the RHEL 9 repos.

@shashank-6777
Author

@nocturnalastro I hope you are doing well.

Is RHEL 9.1 officially supported by Crucible now, or do we still have to wait?

@novacain1
Member

@shashank-6777 I have had success with the PR mentioned, using a KVM host based on RHEL 9.3 recently. In my use case, that server was also the machine I launched the playbooks from.

@shashank-6777
Author

@novacain1 thanks for the update and your support as always. Let me try it in my lab. I will try to host my masters on a different machine.

@shashank-6777
Author

shashank-6777 commented Jan 24, 2024

@novacain1 I tried to install the cluster with RHEL 9.2 but am getting the error below, even though the expected packages are already installed on the bastion node.

[root@sh-lab-sc5f-bastion-4 crucible]# rpm -qa |grep ansible
ansible-core-2.14.2-5.el9_2.x86_64
[root@sh-lab-sc5f-bastion-4 crucible]#
[root@sh-lab-sc5f-bastion-4 crucible]# cat /etc/redhat-release
Red Hat Enterprise Linux release 9.2 (Plow)
[root@sh-lab-sc5f-bastion-4 crucible]#
[root@sh-lab-sc5f-bastion-4 crucible]# uname -r
5.14.0-284.48.1.el9_2.x86_64
[root@sh-lab-sc5f-bastion-4 crucible]# uname -a
Linux sh-lab-sc5f-bastion-4.14-nec.nfvi.localdomain 5.14.0-284.48.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jan 4 03:49:47 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@sh-lab-sc5f-bastion-4 crucible]#
[root@sh-lab-sc5f-bastion-4 crucible]# ll /usr/bin/python
python python3 python3.11 python3.9

ERROR:-

TASK [validate_inventory : Assert required vars are correctly typed] *****************************************************************************************************************************************
fatal: [localhost -> {{ validation_host | default('bastion') }}]: FAILED! =>
  msg: 'The conditional check ''(hostvars[item][''mac''] | ansible.utils.hwaddr(''bool'')) == true'' failed. The error was: Failed to import the required Python library (netaddr) on sh-lab-sc5f-bastion-4.14-nec.nfvi.localdomain''s Python /usr/bin/python3.11. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter'

PLAY RECAP ***************************************************************************************************************************************************************************************************
localhost                  : ok=8    changed=0    unreachable=0    failed=1    skipped=4    rescued=0    ignored=0

@nocturnalastro
Collaborator

It can be confusing with multiple Python versions around. It is likely using python3.9. Try python3.9 -c "import netaddr"; if that gives you an ImportError, try installing it with pip; if that doesn't clean it up, try the same with python3.11. It's likely that the rpm package for netaddr and the Python version that Ansible is using are mismatched.
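
For example, a quick way to check and fix this on the bastion (a sketch; adjust the interpreter paths to whichever one Ansible reports in the error message, /usr/bin/python3.11 in the output above):

    # see which interpreter(s) can already import netaddr
    python3.9 -c "import netaddr" && echo "netaddr OK for python3.9"
    python3.11 -c "import netaddr" && echo "netaddr OK for python3.11"

    # install it for the interpreter Ansible is actually using
    python3.11 -m pip install netaddr

    # or point Ansible at the interpreter that already has it, e.g. in the inventory:
    # ansible_python_interpreter: /usr/bin/python3.9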
