Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential processing time increase with case statement nested parentheses #1075

Open
Durbatuluk1701 opened this issue Oct 22, 2024 · 7 comments

Comments

@Durbatuluk1701
Copy link

Throughout I am utilizing CakeML v2274.

Given two programs, one with and one without nested parentheses in case statements:

fun fib n = 
case (n = 1) of
	True => 1
| False =>
case (n = 2) of
	True => 2
| False =>
case (n = 3) of
	True => 3
| False =>
case (n = 4) of
	True => 5
| False =>
case (n = 5) of
	True => 8
| False =>
case (n = 6) of
	True => 13
| False =>
case (n = 7) of
	True => 21
| False =>
case (n = 8) of
	True => 34
| False =>
case (n = 9) of
	True => 55
| False =>fib (n - 1) + fib (n - 2)

vs.

fun fib n = 
case (n = 1) of
	True => 1
| False =>(
case (n = 2) of
	True => 2
| False =>(
case (n = 3) of
	True => 3
| False =>(
case (n = 4) of
	True => 5
| False =>(
case (n = 5) of
	True => 8
| False =>(
case (n = 6) of
	True => 13
| False =>(
case (n = 7) of
	True => 21
| False =>(
case (n = 8) of
	True => 34
| False =>(
case (n = 9) of
	True => 55
| False =>fib (n - 1) + fib (n - 2)
))))))))

The time it takes to run cake < fib.cml > /dev/null for the one WITH parentheses will take much longer than the one without.

In particular, I generated the following benchmark showing the exponential increase given the nesting depth:

Code used to generate benchmark
import math
import time
import subprocess
import matplotlib.pyplot as plt


def fib_spec(i):
    if i <= 1:
        return 1
    else:
        return fib_spec(i - 1) + fib_spec(i - 2)

def file_name(n, parens):
    return f"./test_dir/fib_{str('no_') if not parens else '_'}parens_{n}.cml"

def create_file_with_n(n, parens):
    with open(file_name(n, parens), "w+") as fd:
      fd.write(f"fun fib n = \n")
      for i in range(1, n):
          fd.write(f"case (n = {i}) of\n\tTrue => {fib_spec(i)}\n| False =>")
          if i == n - 1:
              fd.write("fib (n - 1) + fib (n - 2)\n")
              close_parens = ")" * (n - 2)
              if parens:
                  fd.write(close_parens)
          else:
              if parens:
                  fd.write("(\n")
              else:
                  fd.write("\n")


def time_file_exec(n, parens):
    create_file_with_n(n, parens)
    print(f"Processing fib({n}) with parens={parens}")
    fd = open(file_name(n, parens), "r")
    start = time.time()
    subprocess.run(["cake"], stdin=fd, stdout=subprocess.DEVNULL)
    end = time.time()
    return math.floor((end - start) * 1000)

min_iters = 5
max_iters = 20

x = [i for i in range(min_iters,max_iters)]
parens = [time_file_exec(i, True) for i in range(min_iters,max_iters)]
print(parens)
no_parens = [time_file_exec(i, False) for i in range(min_iters,max_iters)]

# Make an array for x vs. parens and x vs. no_parens
plt.plot(x, parens, label="With Parens")
plt.plot(x, no_parens, label="Without Parens")
plt.xlabel('depth')
plt.ylabel('Time (ms)')

# make the scale logarithmic base 2
plt.yscale('log', base=2)
# make y scale tick marks appear
plt.yticks([math.pow(2, i) for i in range(8, 15)])

# save the plot to a file
plt.savefig("fib_plot.png")

fib_plot

Another thing I will mention, I tried testing this with if statements and did not see the same exponential increase

@Durbatuluk1701
Copy link
Author

FYI: I have tested this with CakeML v2648 and it still seems to be an issue.

@tanyongkiam
Copy link
Contributor

Thanks for the report and the code.

I looked into it and this looks like exponential behavior in the parser, so perhaps something @mn200 could comment on.

For example:

finished: start up            
finished: lexing and parsing   --- 24654 milliseconds
...
finished: start up            
finished: lexing and parsing   --- 47164 milliseconds
finished: type inference       --- 15 milliseconds
...

(the rest of the compiler seems to run normally after parsing)

@Durbatuluk1701
Copy link
Author

For example:

finished: start up            
finished: lexing and parsing   --- 24654 milliseconds
...

Would you mind mentioning how you got that debug output? Would likely prove helpful for me in other work using CakeML.

@tanyongkiam
Copy link
Contributor

Sure, you will need to bootstrap the CakeML compiler with its debugging output turned on.

If you have the latest release files and cake built, you can run this:

./cake --sexp=true --emit_empty_ffi=true --skip_type_inference=true < cake-sexpr-64 > cake_debug.S

This will produce cake_debug.S which you can then turn into an executable with:

gcc -O2 cake_debug.S basis_ffi.c  -DEVAL -DDEBUG_FFI -o cake_debug -lm

This version of the compiler will print the debugging output when compiling files but note that --repl and related flags will NOT work.

@mn200
Copy link
Contributor

mn200 commented Oct 25, 2024

I will try to look into this!

@mn200
Copy link
Contributor

mn200 commented Oct 29, 2024

So, the problem is not solved yet, but I think I understand the issue now. It is to do with the way the case-syntax is disambiguated into our context-free grammar.

The root of it is a classic SML problem with how the naïve grammar would see

case e1 of pat1 => case e2 of pat2 => body2 | pat3 => body3

as ambiguous (which case does pat3 belong to?). This is possible to resolve, but the PEGification of the unambiguous CFG appears to be forcing backtracking in an awkward way. Further investigation still required.

@Durbatuluk1701
Copy link
Author

Thank you for looking into it, glad the issue seems to have been located and hopefully a fix should not be too difficult

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants