fabtests: New fabtest fi_flood to test over subscription of resources #10427

nikhilnanal · 2024-09-30T21:19:31Z

  1. MR cache based registration tests: register and send in batch and sequential modes while flooding the cache beyond its maximum size.
  2. Test receipt of unexpected messages by overwhelming the receiver.

aingerson · 2024-09-30T21:30:46Z

fabtests/Makefile.am

@@ -41,6 +41,7 @@ bin_PROGRAMS = \
 	functional/fi_rdm_stress \
 	functional/fi_multi_recv \
 	functional/fi_bw \
+	functional/fi_flood \


This test just adds a new mode to the bw test. I would replace/rename the bw test and add the new testing mode inside; there is no need to create a whole new test.

FWIW, AWS CI has a flood_peer test that reuses fi_bw: https://github.com/ofiwg/libfabric/blob/main/fabtests/pytest/efa/test_flood_peer.py#L6

aingerson · 2024-09-30T21:31:06Z

fabtests/common/shared.c

@@ -3270,6 +3270,7 @@ void show_perf(char *name, size_t tsize, int iters, struct timespec *start,
 	printf("%8.2fs%10.2f%11.2f%11.2f\n",
 		elapsed / 1000000.0, bytes / (1.0 * elapsed),
 		usec_per_xfer, 1.0/usec_per_xfer);
+	printf("-----------------------------------------\n");


Remove the random prints throughout this PR (there are a handful).

aingerson · 2024-09-30T21:31:15Z

fabtests/functional/flood.c

@@ -0,0 +1,319 @@
+/*
+ * Copyright (c) 2019 Intel Corporation.  All rights reserved.


Remove year

aingerson · 2024-09-30T21:32:05Z

fabtests/functional/flood.c

+		return ret;
+
+	if (opts.machr)
+		show_perf_mr(opts.transfer_size, opts.window_size, &start, &end, 1,


Remove the performance reporting since this is a functional test and has a hardcoded sleep to force unexpected messages. Replace with a PASS/FAIL print

aingerson · 2024-09-30T21:33:30Z

fabtests/functional/flood.c

+	if (ret)
+		return ret;
+
+	ret = ft_tx(ep, remote_fi_addr, 1, &tx_ctx);


See the recently added option that does this: FT_OPT_NO_PRE_POSTED_RX

aingerson · 2024-09-30T22:10:58Z

fabtests/functional/flood.c

+static void mr_close(struct ft_context *ctx_arr, int window_size)
+{
+	for (int i = 0; i < window_size; i++)
+	{


drop brackets

aingerson · 2024-09-30T22:12:27Z

fabtests/functional/flood.c

+
+	return ret;
+}
+static void mr_close(struct ft_context *ctx_arr, int window_size)


Rename to something that describes what's happening a bit more - this makes it sound like it's closing a single MR

aingerson · 2024-09-30T22:12:41Z

fabtests/functional/flood.c

+}
+static void mr_close(struct ft_context *ctx_arr, int window_size)
+{
+	for (int i = 0; i < window_size; i++)


Declare variables outside of for loop
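For reference, a minimal sketch of how the helper might look with the style points raised on it applied (loop counter declared outside the `for`, no braces around the single statement, a name that signals it closes a whole array of MRs). The name `close_mr_array` and the `mock_*` types and macro are hypothetical stand-ins so the snippet compiles on its own; the real code would keep using `struct ft_context` and `FT_CLOSE_FID` from fabtests.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct ft_context and its MR handle,
 * used only so this sketch is self-contained. */
struct mock_mr {
	int open;
};

struct mock_context {
	struct mock_mr *mr;
};

/* Hypothetical stand-in for FT_CLOSE_FID: mark the handle closed and
 * clear the pointer, skipping entries that are already NULL. */
#define MOCK_CLOSE_FID(fid)			\
	do {					\
		if (fid) {			\
			(fid)->open = 0;	\
			(fid) = NULL;		\
		}				\
	} while (0)

/* Close every MR in the array (illustrative name, not from the PR). */
static void close_mr_array(struct mock_context *ctx_arr, int window_size)
{
	int i;

	for (i = 0; i < window_size; i++)
		MOCK_CLOSE_FID(ctx_arr[i].mr);
}
```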

aingerson · 2024-09-30T22:14:08Z

fabtests/functional/flood.c

+	mr_close(tx_ctx_arr, opts.window_size);
+	mr_close(rx_ctx_arr, opts.window_size);
+
+	printf("sequential memory registration:\n");


Make your test prints consistent - capitalize first word, remove new line, and then print pass or fail in your out

printf("%s\n", ret ? "FAIL" : "PASS");
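A minimal sketch of that print pattern, assuming each test phase can be wrapped in a function that returns 0 on success: print the capitalized label without a newline, run the phase, then finish the line with PASS or FAIL. The names `run_step`, `result_str`, `step_ok`, and `step_fail` are hypothetical and not part of fabtests.

```c
#include <stdio.h>

/* Map a return code to the result string: 0 is PASS, anything else FAIL. */
static const char *result_str(int ret)
{
	return ret ? "FAIL" : "PASS";
}

/* Print "<Label>: ", run the phase, then complete the line with the
 * result. Returns the phase's return code unchanged. */
static int run_step(const char *label, int (*step)(void))
{
	int ret;

	printf("%s: ", label);
	ret = step();
	printf("%s\n", result_str(ret));
	return ret;
}

/* Hypothetical phases used only to exercise the helper. */
static int step_ok(void)
{
	return 0;
}

static int step_fail(void)
{
	return -1;
}
```

With this shape, a call such as `run_step("Sequential memory registration", step_ok)` keeps the label and the result on one line, which matches the consistency requested above.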

aingerson · 2024-09-30T22:14:26Z

fabtests/functional/flood.c

+	{
+		FT_CLOSE_FID(ctx_arr[i].mr);
+	}
+}


add line between functions

aingerson · 2024-09-30T22:15:07Z

fabtests/functional/flood.c

+	printf("sequential memory registration:\n");
+	ft_start();
+	if (opts.dst_addr) {
+		for (int i = 0; i < opts.window_size; i++) {


Declare outside

aingerson · 2024-09-30T22:15:28Z

fabtests/functional/flood.c

+			if (ret)
+				return ret;
+
+			ft_post_tx_buf(ep, remote_fi_addr,  opts.transfer_size,


Does this return something?

always returns 0

aingerson · 2024-09-30T22:16:21Z

fabtests/functional/flood.c

+			ret = wait_check_rx_bufs();
+			if (ret)
+				return ret;
+		}


Add rx buf close here

aingerson · 2024-09-30T22:16:53Z

fabtests/functional/flood.c

+		show_perf(NULL, opts.transfer_size, opts.window_size, &start, &end, 1);
+
+	return ret;
+}


aingerson · 2024-09-30T22:17:20Z

fabtests/functional/flood.c

+	if (!opts.dst_addr)
+		sleep(sleep_time);
+
+	ft_start();


Drop performance print and also timers

zachdworkin · 2024-10-01T00:02:19Z

Timeout failure
server: fi_flood -e rdm -v -T 1 -p "tcp" -s n1
client: fi_flood -e rdm -v -T 1 -p "tcp" -s n1 n2

fi_eq_sread(): common/shared.c:1169, ret=-4 (Interrupted system call)
server: fi_flood -e msg -v -T 1 -p "tcp" -s n1
client: fi_flood -e msg -v -T 1 -p "tcp" -s n1 n2

1. MR cache based registration tests: register and send in batch and sequential modes while flooding the cache beyond its maximum size.
2. Test receipt of unexpected messages by overwhelming the receiver.
Signed-off-by: nikhil nanal <nikhil.nanal@intel.com>

shijin-aws · 2024-10-03T22:38:22Z

bot:aws:retest

nikhilnanal force-pushed the mr_cache branch 3 times, most recently from 8e087e8 to e21b1f5 on September 30, 2024 21:23

nikhilnanal requested a review from aingerson on September 30, 2024 21:23

aingerson reviewed on Sep 30, 2024

nikhilnanal force-pushed the mr_cache branch from e21b1f5 to 471aba1 on September 30, 2024 22:10

aingerson reviewed on Sep 30, 2024

nikhilnanal force-pushed the mr_cache branch from 471aba1 to 46f7478 on October 1, 2024 00:02

nikhilnanal force-pushed the mr_cache branch from 46f7478 to cb5cb65 on October 1, 2024 07:54

nikhilnanal force-pushed the mr_cache branch from cb5cb65 to 23587c8 on October 2, 2024 16:02
