
User-defined meta-data can create malformed YAML #13853

Open · 3 tasks done
holmanb opened this issue Aug 1, 2024 · 12 comments

Labels: Bug (Confirmed to be a bug)

@holmanb (Member) commented Aug 1, 2024

Required information

  • Distribution: Ubuntu Noble
  • The output of "snap list --all lxd core20 core22 core24 snapd":
Name    Version      Rev    Tracking       Publisher   Notes
core20  20240227     2264   latest/stable  canonical✓  base,disabled
core20  20240416     2318   latest/stable  canonical✓  base
core22  20240111     1122   latest/stable  canonical✓  base,disabled
core22  20240408     1380   latest/stable  canonical✓  base
lxd     6.1-0d4d89b  29469  latest/stable  canonical✓  disabled
lxd     6.1-c14927a  29551  latest/stable  canonical✓  -
snapd   2.62         21465  latest/stable  canonical✓  snapd,disabled
snapd   2.63         21759  latest/stable  canonical✓  snapd
  • The output of "lxc info":
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- init_preseed_storage_volumes
- metrics_instances_count
- server_instance_type_info
- resources_disk_mounted
- server_version_lts
- oidc_groups_claim
- loki_config_instance
- storage_volatile_uuid
- import_instance_devices
- instances_uefi_vars
- instances_migration_stateful
- container_syscall_filtering_allow_deny_syntax
- access_management
- vm_disk_io_limits
- storage_volumes_all
- instances_files_modify_permissions
- image_restriction_nesting
- container_syscall_intercept_finit_module
- device_usb_serial
- network_allocate_external_ips
- explicit_trust_token
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: holmanb
auth_user_method: unix
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB+zCCAYCgAwIBAgIQdDx+LXwGuHE6lUh7Eidt5jAKBggqhkjOPQQDAzAxMRww
    GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMREwDwYDVQQDDAhyb290QGFyYzAe
    Fw0yMTEwMjExNDE4MzlaFw0zMTEwMTkxNDE4MzlaMDExHDAaBgNVBAoTE2xpbnV4
    Y29udGFpbmVycy5vcmcxETAPBgNVBAMMCHJvb3RAYXJjMHYwEAYHKoZIzj0CAQYF
    K4EEACIDYgAEZVKG/5oSol3bL/KYIaIag7xM7QEAUe0KsNcW44JNMRWWjKEC1bYy
    RPf7dabQywL2pNeiWYUPpXtEzQEMthpCrFH1tYWwCxbab0I8xXP5nio+qyEoZ76B
    qIwept8PNb9xo10wWzAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUH
    AwEwDAYDVR0TAQH/BAIwADAmBgNVHREEHzAdggNhcmOHBH8AAAGHEAAAAAAAAAAA
    AAAAAAAAAAEwCgYIKoZIzj0EAwMDaQAwZgIxAMLGwnrmbcb2QpQusAGqqYR7/tri
    dnZFXK0w7sbpndc+9XMuoKpEf9VOVCh90EQtdgIxAOJeO3egwenHJ9S4CVyrK0ON
    lKbu/QQBW0XJ77VVIKKP/OIOyAIJXncOkOxip5XMEQ==
    -----END CERTIFICATE-----
  certificate_fingerprint: 78d858acdbbb797d36863a910368bc41311b2c5eb1c3b11287c0966c7f58c962
  driver: qemu | lxc
  driver_version: 8.2.1 | 6.0.0
  instance_types:
  - virtual-machine
  - container
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.8.0-38-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "24.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: arc
  server_pid: 172560
  server_version: "6.1"
  server_lts: false
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: cephfs
    version: 17.2.7
    remote: true
  - name: cephobject
    version: 17.2.7
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.48.0
    remote: false
  - name: powerflex
    version: 1.16 (nvme-cli)
    remote: true
  - name: zfs
    version: 2.2.2-0ubuntu9
    remote: false
  - name: btrfs
    version: 5.16.2
    remote: false
  - name: ceph
    version: 17.2.7
    remote: true

Issue description

Problem

The user-defined meta-data key gets appended as a string to the LXD-provided meta-data. This means duplicate keys can be introduced, which creates a configuration that isn't well defined: both versions 1.1 and 1.2 of the YAML spec require mapping keys to be unique, and this violates that requirement.

The configuration received by cloud-init:

{'_metadata_api_version': '1.0',
 'config': {'user.meta-data': 'instance-id: test_2'},
 'devices': {'eth0': {'hwaddr': '00:16:3e:e3:ed:2c',
                      'name': 'eth0',
                      'network': 'lxdbr0',
                      'type': 'nic'},
             'root': {'path': '/', 'pool': 'default', 'type': 'disk'}},
 'meta-data': '#cloud-config\n'
              'instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e\n'
              'local-hostname: oracular\n'
              'instance-id: test_2'}

Cloud-init's implementation uses PyYAML, which takes the last occurrence of a duplicated key. That happens to produce the desired outcome (the user can override the default meta-data), but it relies on behavior that the YAML spec leaves undefined and that is specific to one library. If cloud-init ever moved to a different YAML library, this behavior could break or would need to be worked around manually.
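
To make the fragility concrete, here is a minimal Python sketch (using the values from the payload above) of how PyYAML handles the duplicated key:

import yaml  # PyYAML, the parser cloud-init currently uses

doc = (
    "instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e\n"
    "local-hostname: oracular\n"
    "instance-id: test_2\n"
)

# PyYAML silently keeps the last occurrence of a duplicated mapping key,
# so the user-provided instance-id "wins" - but only by accident.
print(yaml.safe_load(doc))
# {'instance-id': 'test_2', 'local-hostname': 'oracular'}

A stricter library (for example, ruamel.yaml in its default configuration) rejects the same document with a duplicate-key error, since the spec requires mapping keys to be unique.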

To create a path toward standards-compliant YAML while preserving backwards compatibility, we could do the following:

  1. cloud-init could be updated to make values in metadata['config']['user.meta-data'] override values in metadata['meta-data']. This wouldn't change cloud-init's current behavior, which ignores the values in metadata['config']. We could optionally check for a bump to the value in _metadata_api_version before doing this, but that isn't strictly required, since the two behaviors are currently functionally identical. (A sketch of this option follows the list.)

  2. Once stable distributions have this update, we could update the API to no longer append user meta-data to the default meta-data (and bump the meta-data API version, if desired). While making this change, we might also want to drop the #cloud-config comment; it isn't necessary because meta-data isn't part of cloud-config.
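
Here is a minimal sketch of option 1 from cloud-init's side (the helper name merged_meta_data is hypothetical; the real change would live in cloud-init's LXD datasource):

import yaml

def merged_meta_data(metadata: dict) -> dict:
    # Parse the LXD-provided meta-data, then let keys from
    # user.meta-data take precedence over LXD's defaults, without
    # relying on the YAML parser's duplicate-key handling.
    merged = yaml.safe_load(metadata.get("meta-data") or "") or {}
    user = metadata.get("config", {}).get("user.meta-data")
    if user:
        merged.update(yaml.safe_load(user) or {})
    return merged

# If desired, the new behavior could additionally be gated on a bumped
# _metadata_api_version, though as noted above that is not strictly
# required today.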

canonical/cloud-init#5575

Information to attach

  • Container log (lxc info NAME --show-log)
Name: cloudinit-0801-1919380a56vdl6
Status: RUNNING
Type: container
Architecture: x86_64
PID: 1040232
Created: 2024/08/01 13:19 MDT
Last Used: 2024/08/01 13:42 MDT

Resources:
  Processes: 69
  CPU usage:
    CPU usage (in seconds): 6
  Memory usage:
    Memory (current): 83.53MiB
    Swap (current): 28.00KiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: vethd9b8b75f
      MAC address: 00:16:3e:9a:8b:f6
      MTU: 1500
      Bytes received: 115.82kB
      Bytes sent: 5.29kB
      Packets received: 454
      Packets sent: 52
      IP addresses:
        inet:  10.161.80.194/24 (global)
        inet6: fd42:80e2:4695:1e96:216:3eff:fe9a:8bf6/64 (global)
        inet6: fe80::216:3eff:fe9a:8bf6/64 (link)
    lo:
      Type: loopback
      State: UP
      MTU: 65536
      Bytes received: 404B
      Bytes sent: 404B
      Packets received: 4
      Packets sent: 4
      IP addresses:
        inet:  127.0.0.1/8 (local)
        inet6: ::1/128 (local)

Log:

lxc cloudinit-0801-1919380a56vdl6 20240801194228.855 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.855 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.857 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.857 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.782 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.782 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.795 ERROR    attach - ../src/src/lxc/attach.c:lxc_attach_run_command:1841 - No such file or directory - Failed to exec "user.meta-data"
lxc cloudinit-0801-1919380a56vdl6 20240801194325.518 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194325.518 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194417.803 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194417.803 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801195046.604 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801195046.604 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801201625.883 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801201625.883 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
  • Container configuration (lxc config show NAME --expanded)
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Ubuntu 20.04 LTS server (20240730)
  image.os: ubuntu
  image.release: focal
  limits.cpu.allowance: 50%
  user.meta-data: 'instance-id: test_2'
  volatile.base_image: c19cc6a8469b596aae092a3953e326ed01e1183a25bff1d26145a85a2272767e
  volatile.cloud-init.instance-id: 7d26c435-da56-405c-9b04-9ad98f550736
  volatile.eth0.host_name: vethd9b8b75f
  volatile.eth0.hwaddr: 00:16:3e:9a:8b:f6
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: a097111b-15e4-45e4-aa31-a6da707012a8
  volatile.uuid.generation: a097111b-15e4-45e4-aa31-a6da707012a8
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
time="2024-07-22T10:13:23-06:00" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority will be ignored. Please use per-device limits.priority instead"
time="2024-07-31T07:48:45-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:49:29-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:50:07-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:50:07-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:50:26-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:50:33-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
@tomponline (Member) commented:

Hi @holmanb

I'm afraid I'm not really following what it is that LXD needs to change here.

Also, not sure if relevant, but the user.* prefix is deprecated for cloud-init config and the currently supported keys start with cloud-init.; see https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-options-cloud-init

tomponline added the Incomplete (Waiting on more information from reporter) label on Aug 2, 2024
@holmanb (Member, Author) commented Aug 2, 2024

Thanks for the response @tomponline!

> I'm afraid I'm not really following what it is that LXD needs to change here.

This is the offending line. See the commit on this branch for the change that I am proposing.

I'm happy to submit a PR for this, but we need to release a change in cloud-init first to accommodate this expectation. This is why I filed a bug report rather than just a PR: I want to make sure the proposed solution is acceptable before moving forward with it.

> Also, not sure if relevant, but the user.* prefix is deprecated for cloud-init config and the currently supported keys start with cloud-init.; see https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-options-cloud-init

The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

$ lxc launch ubuntu:noble me -c cloud-init.meta-data=instance-'id: test_1'
Creating me
Error: Failed instance creation: Failed creating instance record: Unknown configuration key: cloud-init.meta-data
$ lxc launch ubuntu:noble me -c user.meta-data=instance-'id: test_1'
Creating me
Starting me                               

similarly:

$ lxc config set me user.meta-data=instance-'id: test_1'    
$ lxc config set me cloud-init.meta-data=instance-'id: test_1'
Error: Invalid config: Unknown configuration key: cloud-init.meta-data

If you want to deprecate the user.meta-data key as well for uniformity, I could potentially make cloud-init support a new cloud-init.meta-data key while making this change. Let me know.

@tomponline (Member) commented:

> I'm happy to submit a PR for this, but we need to release a change in cloud-init first to accommodate this expectation. This is why I filed a bug report rather than just a PR: I want to make sure the proposed solution is acceptable before moving forward with it.

Thanks!

Will this break users of LXD guests with older versions of cloud-init?

@tomponline (Member) commented:

> The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

Hrm, that is curious; I wasn't expecting that. I'd need to dig into the commit history and original pull requests to understand why this wasn't originally changed to have a cloud-init. prefix like the other keys, as it seems like it should have been.

@tomponline (Member) commented Aug 5, 2024

> This isn't necessary because meta-data isn't part of cloud-config.

Could you please explain this statement? I'm confused about why a key being used by cloud-init isn't part of cloud-config.

@tomponline (Member) commented:

> To create a path toward standards-compliant YAML while preserving backwards compatibility, we could do the following:
>
> 1. cloud-init could be updated to make values in `metadata['config']['user.meta-data']` override values in `metadata['meta-data']`. This wouldn't change cloud-init's current behavior, which ignores the values in `metadata['config']`. We could optionally check for a bump to the value in `_metadata_api_version` before doing this, but that isn't strictly required, since the two behaviors are currently functionally identical.
>
> 2. Once stable distributions have this update, we could update the API to no longer append user meta-data to the default meta-data (and bump the meta-data API version, if desired). While making this change, we might also want to drop the `#cloud-config` comment; it isn't necessary because meta-data isn't part of cloud-config.

I suspect we'll need option 1 at least, and then potentially land the change proposed in 2 only for the 6.x series of LXD.

@holmanb (Member, Author) commented Aug 5, 2024

> Will this break users of LXD guests with older versions of cloud-init?

This would break any user who provides a custom instance-id (duplicate key) on an older version of cloud-init, since cloud-init would then see the old key where it didn't before.

From a cloud-init perspective, fixes for bugs come in new releases, so the typical stability and support recommendation is "upgrade to the latest version". If we want to avoid breaking old instances, I could update the proposal above to increment the API revision number.

> The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

> Hrm, that is curious; I wasn't expecting that. I'd need to dig into the commit history and original pull requests to understand why this wasn't originally changed to have a cloud-init. prefix like the other keys, as it seems like it should have been.

Agreed. Let me know if you'd like to go that route.

> This isn't necessary because meta-data isn't part of cloud-config.

> Could you please explain this statement? I'm confused about why a key being used by cloud-init isn't part of cloud-config.

Cloud-config isn't required for any of the keys: vendor-data, user-data, or meta-data.

Cloud-config is just one of cloud-init's configuration formats. Several configuration formats are available for user-data and vendor-data, including cloud-config and even a plain shell script:

config:
...
  user.user-data: |
    #!/usr/bin/bash
    echo hello | tee -a /tmp/example.txt

With the above example a user would see:

$ lxc exec me -- cat /tmp/example.txt
hello

User-data is provided by the user for the purpose of configuring an instance. Vendor-data is likewise intended to be provided by the cloud/vendor for the purpose of configuring an instance with cloud-specific information. Both vendor-data and user-data can use any of the configuration formats mentioned above.

Meta-data doesn't follow any of the above formats and is not intended to be a configuration format for the instance. Instead, it is supposed to tell cloud-init just a few pieces of information about the instance: its instance-id, region, etc. The lines are blurred a bit because a couple of the keys it supports overlap with cloud-config. One of the overlapping keys is local-hostname, which is used by LXD and probably adds to the confusion here. Neither key is defined in cloud-init's cloud-config schema.
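
For contrast, a complete meta-data document is typically just a small mapping of instance facts (values taken from the payload above), with no format directive at all:

instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e
local-hostname: oracular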

> I suspect we'll need option 1 at least, and then potentially land the change proposed in 2 only for the 6.x series of LXD.

That sounds fine to me. Let me know if my responses here or further digging reveal anything new that suggests we shouldn't go forward with this proposal. This PR is my proposal for option 1, if you'd like to take a look.

@tomponline (Member) commented:

@holmanb Hi, would you mind booking a meeting to discuss this issue? Thanks

@holmanb (Member, Author) commented Sep 17, 2024

> @holmanb Hi, would you mind booking a meeting to discuss this issue? Thanks

I just saw this while checking back on the status of this issue. I'd be happy to.

@tomponline (Member) commented Sep 25, 2024

Thanks for the call @holmanb

As discussed, you can change the instance-id exposed to cloud-init via LXD's devlxd meta-data API (https://documentation.ubuntu.com/lxd/en/latest/dev-lxd/#meta-data) by changing volatile.cloud-init.instance-id; see:

https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-volatile:volatile.cloud_init.instance-id

To change local-hostname, rename the instance.
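
For example (a sketch only; both keys are documented per the links above, and the instance may need to be stopped before these changes can be applied):

$ lxc config set me volatile.cloud-init.instance-id=test_2   # custom instance-id
$ lxc move me me2                                            # renaming changes local-hostname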

I also think we should remove the user.meta-data key from LXD's code base entirely, as it is currently undocumented and appears to have been due for removal in LXD 4.21 but was not removed, apparently due to an oversight:

> I believe there is also a user.meta-data config key which is tied to cloud-init. Did we just forget to mention it here and in the issue, or must this remain as user.meta-data?

> We will not keep that configuration key moving forward. It's always been a very odd one with no real use cases, so it will just go away completely.

https://discuss.linuxcontainers.org/t/lxd-first-class-cloud-init-support/12559/18

See also
https://discuss.linuxcontainers.org/t/lxd-4-21-has-been-released/12860#reworked-cloud-init-support-4

Removed from docs here:

> As far as I understood, it's because there's no reason for using it: the user.meta-data key was originally added to set the instance name, but that isn't necessary anymore (and also doesn't work).

#11433 (comment)

There is also an issue confirming its removal here (although there's some confusion between user.user-data and user.meta-data in that thread):

#10417

tomponline added the Bug (Confirmed to be a bug) label and removed the Incomplete (Waiting on more information from reporter) label on Sep 25, 2024
tomponline added this to the lxd-6.2 milestone on Sep 25, 2024

@holmanb (Member, Author) commented Sep 25, 2024

Thanks @tomponline for the discussion. The volatile key and instance rename should meet our needs.

Cloud-init has one test, which I recently added, that depends on setting the instance ID via the user.meta-data key. I will update it to use the volatile key later today; it is a trivial change.

I just submitted a PR against cloud-init to update cloud-init's lxd documentation per our conversation.

@blackboxsw commented:

@holmanb @tomponline we have a second use case for user.meta-data in integration testing of LXD: it allows cloud-init to inject default SSH public-key configuration into all images launched with a profile, without colliding with or being overwritten by the cloud-init.user-data provided to a system at launch. This now-undocumented feature of user.meta-data is reminiscent of the behavior of clouds like Azure, EC2, and OpenStack, which allow project owners or teams to set per-project SSH public keys authorized for SSH into those VMs. If user.meta-data goes away, then at minimum the integration test runners for Ubuntu Pro and cloud-init will be forced to use cloud-init.user-data or cloud-init.vendor-data to set up such authorized keys.
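
For reference, one possible migration path for that use case (a sketch only; the key content here is illustrative, and ssh_authorized_keys is a standard cloud-config key) would set per-profile authorized keys via profile-level vendor-data instead:

$ lxc profile set default cloud-init.vendor-data="#cloud-config
ssh_authorized_keys:
  - ssh-ed25519 AAAA... test@example"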
