Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugins/ocp: Use structure for ocp smart log #2604

Merged
merged 3 commits into from
Jan 8, 2025
Merged

Conversation

sc108-lee
Copy link
Contributor

printed log tested (same), except below
Remove wrong duplicated prints from stdout
NVMe base errata version
NVMe command set errata version

@igaw
Copy link
Collaborator

igaw commented Dec 20, 2024

I've merged #2577, so this one needs a rebase. Thanks!

@hmi-jeon hmi-jeon force-pushed the smart branch 6 times, most recently from 1b03f68 to b7a4dd9 Compare December 23, 2024 05:18
@hmi-jeon
Copy link
Contributor

Hi @igaw

I rebased it. Thanks!

@igaw
Copy link
Collaborator

igaw commented Jan 3, 2025

I am not seeing any layout changes:

struct ocp_smart_extended_log {
        __u8                       physical_media_units_written[16]; /*     0    16 */
        __u8                       physical_media_units_read[16]; /*    16    16 */
        __u8                       bad_user_nand_blocks_raw[6]; /*    32     6 */
        __le16                     bad_user_nand_blocks_normalized; /*    38     2 */
        __u8                       bad_system_nand_blocks_raw[6]; /*    40     6 */
        __le16                     bad_system_nand_blocks_normalized; /*    46     2 */
        __le64                     xor_recovery_count;   /*    48     8 */
        __le64                     uncorrectable_read_err_count; /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        __le64                     soft_ecc_err_count;   /*    64     8 */
        __le32                     end_to_end_detected_err; /*    72     4 */
        __le32                     end_to_end_corrected_err; /*    76     4 */
        __u8                       system_data_used_percent; /*    80     1 */
        __u8                       refresh_counts[7];    /*    81     7 */
        __le32                     user_data_erase_count_max; /*    88     4 */
        __le32                     user_data_erase_count_min; /*    92     4 */
        __u8                       thermal_throttling_event_count; /*    96     1 */
        __u8                       thermal_throttling_current_status; /*    97     1 */
        __u8                       dssd_errata_version;  /*    98     1 */
        __le16                     dssd_point_version;   /*    99     2 */
        __le16                     dssd_minor_version;   /*   101     2 */
        __u8                       dssd_major_version;   /*   103     1 */
        __le64                     pcie_correctable_err_count; /*   104     8 */
        __le32                     incomplete_shoutdowns; /*   112     4 */
        __u8                       rsvd116[4];           /*   116     4 */
        __u8                       percent_free_blocks;  /*   120     1 */
        __u8                       rsvd121[7];           /*   121     7 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        __le16                     capacitor_health;     /*   128     2 */
        __u8                       nvme_base_errata_version; /*   130     1 */
        __u8                       nvme_cmdset_errata_version; /*   131     1 */
        __u8                       rsvd132[4];           /*   132     4 */
        __le64                     unaligned_io;         /*   136     8 */
        __le64                     security_version;     /*   144     8 */
        __le64                     total_nuse;           /*   152     8 */
        __u8                       plp_start_count[16];  /*   160    16 */
        __u8                       endurance_estimate[16]; /*   176    16 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        __le64                     pcie_link_retaining_count; /*   192     8 */
        __le64                     power_state_change_count; /*   200     8 */
        __le64                     lowest_permitted_fw_rev; /*   208     8 */
        __u8                       rsvd216[278];         /*   216   278 */
        /* --- cacheline 7 boundary (448 bytes) was 46 bytes ago --- */
        __le16                     log_page_version;     /*   494     2 */
        __u8                       log_page_guid[16];    /*   496    16 */

        /* size: 512, cachelines: 8, members: 41 */
} __attribute__((__packed__));
struct ocp_smart_extended_log {
        __u8                       physical_media_units_written[16]; /*     0    16 */
        __u8                       physical_media_units_read[16]; /*    16    16 */
        __u8                       bad_user_nand_blocks_raw[6]; /*    32     6 */
        __le16                     bad_user_nand_blocks_normalized; /*    38     2 */
        __u8                       bad_system_nand_blocks_raw[6]; /*    40     6 */
        __le16                     bad_system_nand_blocks_normalized; /*    46     2 */
        __le64                     xor_recovery_count;   /*    48     8 */
        __le64                     uncorrectable_read_err_count; /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        __le64                     soft_ecc_err_count;   /*    64     8 */
        __le32                     end_to_end_detected_err; /*    72     4 */
        __le32                     end_to_end_corrected_err; /*    76     4 */
        __u8                       system_data_used_percent; /*    80     1 */
        __u8                       refresh_counts[7];    /*    81     7 */
        __le32                     user_data_erase_count_max; /*    88     4 */
        __le32                     user_data_erase_count_min; /*    92     4 */
        __u8                       thermal_throttling_event_count; /*    96     1 */
        __u8                       thermal_throttling_current_status; /*    97     1 */
        __u8                       dssd_errata_version;  /*    98     1 */
        __u8                       dssd_point_version[2]; /*    99     2 */
        __u8                       dssd_minor_version[2]; /*   101     2 */
        __u8                       dssd_major_version;   /*   103     1 */
        __le64                     pcie_correctable_err_count; /*   104     8 */
        __le32                     incomplete_shoutdowns; /*   112     4 */
        __u8                       rsvd116[4];           /*   116     4 */
        __u8                       percent_free_blocks;  /*   120     1 */
        __u8                       rsvd121[7];           /*   121     7 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        __le16                     capacitor_health;     /*   128     2 */
        __u8                       nvme_base_errata_version; /*   130     1 */
        __u8                       nvme_cmdset_errata_version; /*   131     1 */
        __u8                       rsvd132[4];           /*   132     4 */
        __le64                     unaligned_io;         /*   136     8 */
        __le64                     security_version;     /*   144     8 */
        __le64                     total_nuse;           /*   152     8 */
        __u8                       plp_start_count[16];  /*   160    16 */
        __u8                       endurance_estimate[16]; /*   176    16 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        __le64                     pcie_link_retaining_count; /*   192     8 */
        __le64                     power_state_change_count; /*   200     8 */
        __le64                     lowest_permitted_fw_rev; /*   208     8 */
        __u8                       rsvd216[278];         /*   216   278 */
        /* --- cacheline 7 boundary (448 bytes) was 46 bytes ago --- */
        __le16                     log_page_version;     /*   494     2 */
        __u8                       log_page_guid[16];    /*   496    16 */

        /* size: 512, cachelines: 8, members: 41 */
};

This is with gcc on x86_64. How do you get a different layout?

@hmi-jeon
Copy link
Contributor

hmi-jeon commented Jan 6, 2025

struct is 4bytes align.
If packed is not used, align padding will occur in dssd_point_version.
So, I aligned to not use packed.
__u16 -> __u8[2]

  packed packed X packed X (__u16 -> __u8 [2])
size 512 520 512

image

@hmi-jeon hmi-jeon force-pushed the smart branch 4 times, most recently from 0e2127f to eb9a59b Compare January 7, 2025 08:45
@hmi-jeon
Copy link
Contributor

hmi-jeon commented Jan 7, 2025

Hi @igaw

I resolved the conflict.
Thanks.

printed log tested (same), except below
Remove wrong duplicated prints from stdout
  NVMe base errata version
  NVMe command set errata version

Signed-off-by: Steven Seungcheol Lee <sc108.lee@samsung.com>
@igaw
Copy link
Collaborator

igaw commented Jan 7, 2025

Urgh! This is a nasty thing in the spec.

Anyway, you can't just cast the bytes to a __le16 type, because some architectures will horrible fail if you try to access a non aligned data type. x86 allows though.

You need to copy the value into a proper aligned buffer before you can access it with le16_to_cpu.

And the compiler should actually warn but since you add a hard cast on it, you silence it up. That's why I really like to see __packed go away. This is one of the reason __packed is almost always wrong.

@hmi-jeon
Copy link
Contributor

hmi-jeon commented Jan 8, 2025

Hi @igaw

I thought it was right to move the data to uint16_t as you said, so I fixed it.
If you have a better solution, please let me know.

Additionally, I updated ocp v2.6 spec for smart information extended.
Thank you for your review.

hmi-jeon and others added 2 commits January 8, 2025 14:37
If the dssd point, minor version are declared as __le16, the alignment
will be broken. Remove __packed keyword.
Change parameter void to ocp_smart_extended_log.

Reported-by: Steven Seungcheol Lee <sc108.lee@samsung.com>
Signed-off-by: Minsik Jeon <hmi.jeon@samsung.com>
Log Identifier C0h.
Spec Documents:
https://www.opencompute.org/documents/datacenter-nvme-ssd-specification-v2-6-2-pdf

Signed-off-by: Minsik Jeon <hmi.jeon@samsung.com>
Co-authored-by: Steven Seungcheol Lee <sc108.lee@samsung.com>
@igaw igaw merged commit d4b5f77 into linux-nvme:master Jan 8, 2025
16 of 17 checks passed
@igaw
Copy link
Collaborator

igaw commented Jan 8, 2025

I thought it was right to move the data to uint16_t as you said, so I fixed it. If you have a better solution, please let me know.

Looks good. As far I am aware the memcpy approach is the canonical way to address the alignment issue.

Additionally, I updated ocp v2.6 spec for smart information extended. Thank you for your review.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants