Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check group id, metric name, and group name uniqueness #382

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ All notable changes to this project will be documented in this file.

What's changed

* Detect duplicate group ids and report them. ([#XXX](...) by lquerel).
* Add `escape_square_brackets` into `comment_formats` markdown configuration. ([#XXX](...) by lquerel).
* Add `enforce_trailing_dots` into the `comment_formats` configuration. ([#XXX](...) by lquerel).
* Add support for `indent_type` in both the comment filter and the `comment_formats` configuration. ([#XXX](...) by lquerel).
Expand Down
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion crates/weaver_diff/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,9 @@ rust-version.workspace = true

[dependencies]
walkdir.workspace = true
serde_json.workspace = true
similar = "2.5.0"


[lints]
workspace = true

Expand Down
4 changes: 3 additions & 1 deletion crates/weaver_diff/allowed-external-types.toml
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,6 @@
# SPDX-License-Identifier: Apache-2.0
# This is used with cargo-check-external-types to reduce the surface area of downstream crates from
# the public API. Ideally this can have a few exceptions as possible.
allowed_external_types = []
allowed_external_types = [
"serde_json::*",
]
316 changes: 316 additions & 0 deletions crates/weaver_diff/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,9 @@

//! This crate provides bare minimum support for colorized string differencing.

use serde_json::Value;
use similar::TextDiff;
use std::cmp::Ordering;
use std::collections::HashSet;
use std::fs;
use std::path::Path;
Expand Down Expand Up @@ -117,9 +119,172 @@ pub fn diff_dir<P: AsRef<Path>>(expected_dir: P, observed_dir: P) -> std::io::Re
Ok(are_identical)
}

/// Canonicalizes a JSON string by parsing it into a `serde_json::Value` and then canonicalizing it.
///
/// # Returns
/// - The canonicalized and prettyfied JSON string.
pub fn canonicalize_json_string(value: &str) -> Result<String, serde_json::error::Error> {
let json_value: Value = serde_json::from_str(value)?;
let canonicalized = canonicalize_json(json_value);
serde_json::to_string_pretty(&canonicalized)
}

/// Recursively canonicalizes a JSON value by sorting objects and arrays.
/// - Objects: Keys are sorted lexicographically (handled automatically by `BTreeMap`).
/// - Arrays:
/// - Arrays of primitive values are sorted based on their values.
/// - Arrays containing objects or arrays are sorted based on their canonical form.
/// - Primitive Values: Returned as is.
///
/// # Returns
/// - A new `serde_json::Value` that is the canonical form of the input value.
pub fn canonicalize_json(value: Value) -> Value {
match value {
// Primitive types are returned as is.
Value::Null | Value::Bool(_) | Value::Number(_) | Value::String(_) => value,
Value::Array(arr) => {
// Recursively canonicalize each item in the array.
let mut sorted_items: Vec<Value> = arr.into_iter().map(canonicalize_json).collect();
// Sort the array using `compare_values` to ensure consistent ordering.
sorted_items.sort_by(compare_values);
Value::Array(sorted_items)
}
Value::Object(map) => {
// Recursively canonicalize each value in the object.
let sorted_map = map
.into_iter()
.map(|(k, v)| (k, canonicalize_json(v)))
.collect::<serde_json::Map<String, Value>>();
Value::Object(sorted_map)
// No need to sort keys; BTreeMap keeps them sorted.
}
}
}

/// Compares two `serde_json::Value` instances for sorting purposes.
/// Defines a total ordering among JSON values to allow sorting within arrays.
/// The order is defined as:
/// `Null < Bool < Number < String < Array < Object`
///
/// # Returns
/// - An `Ordering` indicating the relative order of `a` and `b`.
fn compare_values(a: &Value, b: &Value) -> Ordering {
match (a, b) {
// Both values are `Null`.
(Value::Null, Value::Null) => Ordering::Equal,
// `Null` is less than any other type.
(Value::Null, _) => Ordering::Less,
(_, Value::Null) => Ordering::Greater,

// Both values are booleans.
(Value::Bool(a_bool), Value::Bool(b_bool)) => a_bool.cmp(b_bool),
// `Bool` is less than `Number`, `String`, `Array`, and `Object`.
(Value::Bool(_), _) => Ordering::Less,
(_, Value::Bool(_)) => Ordering::Greater,

// Both values are numbers.
// Compare numbers as floating-point values.
(Value::Number(a_num), Value::Number(b_num)) => a_num
.as_f64()
.partial_cmp(&b_num.as_f64())
.unwrap_or(Ordering::Equal), // Handle NaN cases gracefully.
// `Number` is less than `String`, `Array`, and `Object`.
(Value::Number(_), _) => Ordering::Less,
(_, Value::Number(_)) => Ordering::Greater,

// Both values are strings.
(Value::String(a_str), Value::String(b_str)) => a_str.cmp(b_str),
// `String` is less than `Array` and `Object`.
(Value::String(_), _) => Ordering::Less,
(_, Value::String(_)) => Ordering::Greater,

// Both values are arrays.
(Value::Array(a_arr), Value::Array(b_arr)) => compare_arrays(a_arr, b_arr),
// `Array` is less than `Object`.
(Value::Array(_), _) => Ordering::Less,
(_, Value::Array(_)) => Ordering::Greater,

// Both values are objects.
(Value::Object(a_obj), Value::Object(b_obj)) => compare_objects(a_obj, b_obj),
}
}

/// Compares two JSON arrays element-wise for sorting purposes.
/// Arrays are compared based on their elements:
/// - First, by length.
/// - Then, by comparing each pair of elements in order.
///
/// # Returns
/// - An `Ordering` indicating the relative order of `a` and `b`.
fn compare_arrays(a: &[Value], b: &[Value]) -> Ordering {
// Compare lengths first.
let len_cmp = a.len().cmp(&b.len());
if len_cmp != Ordering::Equal {
return len_cmp;
}

// Compare each pair of elements.
for (a_item, b_item) in a.iter().zip(b.iter()) {
let ord = compare_values(a_item, b_item);
if ord != Ordering::Equal {
return ord;
}
}

// All elements are equal.
Ordering::Equal
}

/// Compares two JSON objects by their sorted key-value pairs for sorting purposes.
/// Objects are compared based on:
/// - Their number of key-value pairs.
/// - The keys and associated values in lexicographical order.
///
/// # Returns
/// - An `Ordering` indicating the relative order of `a` and `b`.
fn compare_objects(
a: &serde_json::Map<String, Value>,
b: &serde_json::Map<String, Value>,
) -> Ordering {
// Compare number of key-value pairs.
let len_cmp = a.len().cmp(&b.len());
if len_cmp != Ordering::Equal {
return len_cmp;
}

// Iterate over key-value pairs (keys are sorted in BTreeMap).
let mut a_iter = a.iter();
let mut b_iter = b.iter();

loop {
match (a_iter.next(), b_iter.next()) {
(Some((a_key, a_val)), Some((b_key, b_val))) => {
// Compare keys.
let key_cmp = a_key.cmp(b_key);
if key_cmp != Ordering::Equal {
return key_cmp;
}
// Compare associated values.
let val_cmp = compare_values(a_val, b_val);
if val_cmp != Ordering::Equal {
return val_cmp;
}
}
// Both iterators have reached the end.
(None, None) => break,
// This case should not occur since lengths are equal.
_ => unreachable!("Maps have the same length; this should not happen."),
}
}

// All key-value pairs are equal.
Ordering::Equal
}

#[cfg(test)]
mod tests {
use super::*;
use serde_json::json;

#[test]
fn test_diff_output() {
Expand Down Expand Up @@ -148,4 +313,155 @@ mod tests {
diff_dir(&expected_dir, &observed_dir).expect("Failed to diff directories");
assert!(!are_identical);
}

#[test]
fn test_canonicalize_primitives() {
let null = Value::Null;
assert_eq!(canonicalize_json(null.clone()), null);

let boolean = Value::Bool(true);
assert_eq!(canonicalize_json(boolean.clone()), boolean);

let number = Value::Number(serde_json::Number::from(42));
assert_eq!(canonicalize_json(number.clone()), number);

let string = Value::String(String::from("hello"));
assert_eq!(canonicalize_json(string.clone()), string);
}

#[test]
fn test_canonicalize_empty_array_and_object() {
let empty_array = json!([]);
assert_eq!(canonicalize_json(empty_array.clone()), empty_array);

let empty_object = json!({});
assert_eq!(canonicalize_json(empty_object.clone()), empty_object);
}

#[test]
fn test_canonicalize_array_of_primitives() {
let array = json!([3, 1, 2]);
let expected = json!([1, 2, 3]);
assert_eq!(canonicalize_json(array), expected);
}

#[test]
fn test_canonicalize_array_of_arrays() {
let array = json!([[3, 1, 2], [6, 5, 4]]);
let expected = json!([[1, 2, 3], [4, 5, 6]]);
assert_eq!(canonicalize_json(array), expected);
}

#[test]
fn test_canonicalize_array_of_objects() {
let array = json!([
{"b": 2, "a": 1},
{"d": 4, "c": 3}
]);
let expected = json!([
{"a": 1, "b": 2},
{"c": 3, "d": 4}
]);
assert_eq!(canonicalize_json(array), expected);
}

#[test]
fn test_canonicalize_nested_structures() {
let json_value = json!({
"b": [3, 1, 2],
"a": {"y": 2, "x": 1},
"c": [{"b": 2}, {"a": 1}]
});
let expected = json!({
"a": {
"x": 1,
"y": 2
},
"b": [
1,
2,
3
],
"c": [
{"a": 1},
{"b": 2}
]
});
assert_eq!(canonicalize_json(json_value), expected);
}

#[test]
fn test_canonicalize_numbers() {
let array = json!([3.1, 2.2, 1.3]);
let expected = json!([1.3, 2.2, 3.1]);
assert_eq!(canonicalize_json(array), expected);
}

#[test]
fn test_canonicalize_mixed_types_in_array() {
let array = json!([null, "string", 2, true, 3]);
let expected = json!([null, true, 2, 3, "string"]);
assert_eq!(canonicalize_json(array.clone()), expected);
}

#[test]
fn test_canonicalize_mixed_array() {
let array = json!([{"a": 2}, [3, 1], "b", 4]);
let expected = json!([4, "b", [1, 3], {"a": 2}]);
assert_eq!(canonicalize_json(array), expected);
}

#[test]
fn test_canonicalize_complex() {
let json_value = json!({
"z": [{"b": [3, 2, 1]}, {"a": [6, 5, 4]}],
"y": {"b": {"d": 4, "c": 3}, "a": {"b": 2, "a": 1}},
"x": [9, 7, 8]
});
let expected = json!({
"x": [7, 8, 9],
"y": {
"a": {"a": 1, "b": 2},
"b": {"c": 3, "d": 4}
},
"z": [
{"a": [4, 5, 6]},
{"b": [1, 2, 3]}
]
});
assert_eq!(canonicalize_json(json_value), expected);
}

#[test]
fn test_compare_values() {
let a = json!(1);
let b = json!(2);
assert_eq!(compare_values(&a, &b), Ordering::Less);

let a = json!("apple");
let b = json!("banana");
assert_eq!(compare_values(&a, &b), Ordering::Less);

let a = json!([1, 2, 3]);
let b = json!([1, 2, 4]);
assert_eq!(compare_values(&a, &b), Ordering::Less);

let a = json!({"a": 1});
let b = json!({"a": 2});
assert_eq!(compare_values(&a, &b), Ordering::Less);
}

#[test]
fn test_handling_of_special_numbers() {
let array = json!([0.0, -0.0, 1.0, -1.0]);
let expected = json!([-1.0, -0.0, 0.0, 1.0]);
assert_eq!(canonicalize_json(array), expected);
}

#[test]
fn test_unicode_strings() {
let array = json!(["éclair", "apple", "Æther", "zebra"]);
let expected = json!(["apple", "zebra", "Æther", "éclair"]);
assert_eq!(canonicalize_json(array), expected);
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"Err":{"DuplicateAttributeId":{"group_ids":["registry.messaging_bis","registry.messaging"],"attribute_id":"messaging.batch.message_count"}}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Test that duplicate group ids are detected and reported.
This test must fail.
Loading
Loading