Exponential processing time increase with case statement nested parentheses #1075

Durbatuluk1701 · 2024-10-22T20:11:32Z

Throughout I am utilizing CakeML v2274.

Given two programs, one with and one without nested parentheses in case statements:

fun fib n = 
case (n = 1) of
	True => 1
| False =>
case (n = 2) of
	True => 2
| False =>
case (n = 3) of
	True => 3
| False =>
case (n = 4) of
	True => 5
| False =>
case (n = 5) of
	True => 8
| False =>
case (n = 6) of
	True => 13
| False =>
case (n = 7) of
	True => 21
| False =>
case (n = 8) of
	True => 34
| False =>
case (n = 9) of
	True => 55
| False =>fib (n - 1) + fib (n - 2)

vs.

fun fib n = 
case (n = 1) of
	True => 1
| False =>(
case (n = 2) of
	True => 2
| False =>(
case (n = 3) of
	True => 3
| False =>(
case (n = 4) of
	True => 5
| False =>(
case (n = 5) of
	True => 8
| False =>(
case (n = 6) of
	True => 13
| False =>(
case (n = 7) of
	True => 21
| False =>(
case (n = 8) of
	True => 34
| False =>(
case (n = 9) of
	True => 55
| False =>fib (n - 1) + fib (n - 2)
))))))))

The time it takes to run cake < fib.cml > /dev/null for the one WITH parentheses will take much longer than the one without.

In particular, I generated the following benchmark showing the exponential increase given the nesting depth:

Code used to generate benchmark

import math
import time
import subprocess
import matplotlib.pyplot as plt


def fib_spec(i):
    if i <= 1:
        return 1
    else:
        return fib_spec(i - 1) + fib_spec(i - 2)

def file_name(n, parens):
    return f"./test_dir/fib_{str('no_') if not parens else '_'}parens_{n}.cml"

def create_file_with_n(n, parens):
    with open(file_name(n, parens), "w+") as fd:
      fd.write(f"fun fib n = \n")
      for i in range(1, n):
          fd.write(f"case (n = {i}) of\n\tTrue => {fib_spec(i)}\n| False =>")
          if i == n - 1:
              fd.write("fib (n - 1) + fib (n - 2)\n")
              close_parens = ")" * (n - 2)
              if parens:
                  fd.write(close_parens)
          else:
              if parens:
                  fd.write("(\n")
              else:
                  fd.write("\n")


def time_file_exec(n, parens):
    create_file_with_n(n, parens)
    print(f"Processing fib({n}) with parens={parens}")
    fd = open(file_name(n, parens), "r")
    start = time.time()
    subprocess.run(["cake"], stdin=fd, stdout=subprocess.DEVNULL)
    end = time.time()
    return math.floor((end - start) * 1000)

min_iters = 5
max_iters = 20

x = [i for i in range(min_iters,max_iters)]
parens = [time_file_exec(i, True) for i in range(min_iters,max_iters)]
print(parens)
no_parens = [time_file_exec(i, False) for i in range(min_iters,max_iters)]

# Make an array for x vs. parens and x vs. no_parens
plt.plot(x, parens, label="With Parens")
plt.plot(x, no_parens, label="Without Parens")
plt.xlabel('depth')
plt.ylabel('Time (ms)')

# make the scale logarithmic base 2
plt.yscale('log', base=2)
# make y scale tick marks appear
plt.yticks([math.pow(2, i) for i in range(8, 15)])

# save the plot to a file
plt.savefig("fib_plot.png")

Another thing I will mention, I tried testing this with if statements and did not see the same exponential increase

The text was updated successfully, but these errors were encountered:

Durbatuluk1701 · 2024-10-24T16:08:34Z

FYI: I have tested this with CakeML v2648 and it still seems to be an issue.

tanyongkiam · 2024-10-24T16:50:12Z

Thanks for the report and the code.

I looked into it and this looks like exponential behavior in the parser, so perhaps something @mn200 could comment on.

For example:

finished: start up            
finished: lexing and parsing   --- 24654 milliseconds
...

finished: start up            
finished: lexing and parsing   --- 47164 milliseconds
finished: type inference       --- 15 milliseconds
...

(the rest of the compiler seems to run normally after parsing)

Durbatuluk1701 · 2024-10-24T17:06:45Z

For example:

finished: start up            
finished: lexing and parsing   --- 24654 milliseconds
...

Would you mind mentioning how you got that debug output? Would likely prove helpful for me in other work using CakeML.

tanyongkiam · 2024-10-24T17:12:40Z

Sure, you will need to bootstrap the CakeML compiler with its debugging output turned on.

If you have the latest release files and cake built, you can run this:

./cake --sexp=true --emit_empty_ffi=true --skip_type_inference=true < cake-sexpr-64 > cake_debug.S

This will produce cake_debug.S which you can then turn into an executable with:

gcc -O2 cake_debug.S basis_ffi.c  -DEVAL -DDEBUG_FFI -o cake_debug -lm

This version of the compiler will print the debugging output when compiling files but note that --repl and related flags will NOT work.

mn200 · 2024-10-25T00:22:46Z

I will try to look into this!

mn200 · 2024-10-29T03:59:08Z

So, the problem is not solved yet, but I think I understand the issue now. It is to do with the way the case-syntax is disambiguated into our context-free grammar.

The root of it is a classic SML problem with how the naïve grammar would see

case e1 of pat1 => case e2 of pat2 => body2 | pat3 => body3

as ambiguous (which case does pat3 belong to?). This is possible to resolve, but the PEGification of the unambiguous CFG appears to be forcing backtracking in an awkward way. Further investigation still required.

Durbatuluk1701 · 2024-10-29T16:12:08Z

Thank you for looking into it, glad the issue seems to have been located and hopefully a fix should not be too difficult

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exponential processing time increase with case statement nested parentheses #1075

Exponential processing time increase with case statement nested parentheses #1075

Durbatuluk1701 commented Oct 22, 2024

Durbatuluk1701 commented Oct 24, 2024

tanyongkiam commented Oct 24, 2024

Durbatuluk1701 commented Oct 24, 2024

tanyongkiam commented Oct 24, 2024

mn200 commented Oct 25, 2024

mn200 commented Oct 29, 2024

Durbatuluk1701 commented Oct 29, 2024

Exponential processing time increase with case statement nested parentheses #1075

Exponential processing time increase with case statement nested parentheses #1075

Comments

Durbatuluk1701 commented Oct 22, 2024

Durbatuluk1701 commented Oct 24, 2024

tanyongkiam commented Oct 24, 2024

Durbatuluk1701 commented Oct 24, 2024

tanyongkiam commented Oct 24, 2024

mn200 commented Oct 25, 2024

mn200 commented Oct 29, 2024

Durbatuluk1701 commented Oct 29, 2024