Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add F# support via fslex and fsyacc #404

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

gdziadkiewicz
Copy link

@gdziadkiewicz gdziadkiewicz commented Nov 29, 2021

Five years ago I used bnfc (it's amazing, big thanks to all ppl involved in creating it) for the first time to generate frontend for my first compiler. Time issues didn't allow me to add F# support first and in the end, I manually adjusted the OCaml output to work with F#. For the five recent years, I made numerous attempts to finalize the F# support PR, but there was always something that made me fail and start from the beginning. Well, not anymore. Here I am sharing something that should have been shared a lot quicker.

As fslex is a port of ocamllex and fsyacc of ocamlyacc the code is based on OCaml code.

TODO list:

  • get to know if @andreasabel is willing to take in new language backend
  • add tests (similar to OCaml?) WIP
  • confirm if it is ok to add new dependency interpolate, if not remove it
  • if interpolate gets a green light then use it to make boilerplate code more readable using it
  • if interpolate gets a green light then consider removing string-qq as it can be replaced by interpolate
  • go through the whole added code and clean up todos
  • make CI builds green
  • use this version of the code to build test base for the F# in preparation for the rewrite/refactor according to the chosen approach WIP
  • rewrite/refactor
  • get # fixed in fslex and module name in fsyacc WIP

EDIT:
Due to problems with building on GHC 9.2.1 I decided to remove interpolate. It (or string-interpolate) can always be added later, in separate PR, and used to refactor.

During the work, I plan to make notes of the differences between fslex/fsyacc and ocamllex/ocamlyacc for future reference.

@gdziadkiewicz gdziadkiewicz changed the title Add F# Add F# support via fslex and fsyacc Nov 29, 2021
@gdziadkiewicz
Copy link
Author

@andreasabel what is your opinion on adding F# support via fslex and fsyacc?

From my point of view, it would bring back dotnet to the table for bnfc after removing C# and compared to gplex and gppg that were used in C# support, fslex and fsyacc are used to build F# compiler and as results are maintained by people maintaining the compiler, so the risk of them getting outdated, unsupported and hard to use with new dotnet versions is much smaller.

@andreasabel andreasabel added the F# label Dec 5, 2021
@andreasabel
Copy link
Member

andreasabel commented Dec 6, 2021

Hello @gdziadkiewicz, thank you for your initiative!

I was planning to evaluate your PR yesterday, but brew install nuget got me into a rabbit hole with brew occupying my CPU for 6 hours...

Let's start discussion anyway, here are some points:

  1. It would be nice to have a F# backend, together with a maintainer. For accepting a new backend I would require a maintenance pledge for at least 3 years. Some backends got removed short of a maintainer: C#, CNF, ... (In case of C# it was also the bad availability of the C# lex/yacc.)
  2. There used to be a F# backend, before my times, not sure how serious it was, but it got removed in 2.7: Remove the F# backend #16.
  3. Backends should share code to reduce the maintenance burden. BNFC has accumulated a lot of technical dept, due to careless forking of backend by cut-and-paste. Each change was very tedious to push through all the backends. I worked hard to undo these sins of the past, in particular by unifying C, C++/STL and C++/NoSTL as much as possible. This is still WIP, but for the parser, [ fix #349 ] Make parser reentrant by using reentrant lexer #351 accomplished it. (See also my requests on PR [WIP] Issues/293 use new c++ feature and some fix #389.)
    Cutting the story short, the direction I am heading is towards parametrized backends to share code where possible. In the case of F#, I think a lot of code could be shared with the OCaml backends. This could take different paths: Either to get F# by parametrizing the existing OCaml backend, or to get a new OCaml backend by parametrizing your F# backend, or to synthesize F# and OCaml to something new, taking the best of both worlds.
  4. With a student, I am working on a reimplementation, BNFC3. This project is still private. Backends will have to be ported to the new BNFC frontend (which is arguably better structured than the old one).
  5. There is a common testsuite for the backends in testing/ (start reading e.g. here). The F# backend would have to be added here.
  6. I might need some help with testing the PR, e.g. installing the prerequisites. What would be a good tutorial on FsLex/Yacc?

P.S.: Good you got rid of interpolate, its dependencies are too heavy, see sol/interpolate#17.

@gdziadkiewicz
Copy link
Author

@andreasabel

  1. I'm willing to join as a maintainer. How does a maintainer pledge look like?
  2. I kind of expect that it did make use of the fact that the early F# versions were able to compile a subset of OCaml. I will take a look and see if there is something worth salvaging there.
  3. That makes sense. I will check out the PRs you linked and investigate the possible approaches:
    a) parametrizing new F# backend
    b) parametrizing existing OCaml backed
    c) fusion
    Expect me to return here with input on them.
  4. Ok, if this (in one form or another) gets in you can expect me to work on/help with the porting when BNFC3 gets public.
  5. I will focus on that now (and on 6.). Before starting with the rewrite/refactor from 3. I would like to have "good" tests for the OCaml (or feel of them if they are already there) and the new F# thing.
  6. Sorry to hear that the experience of getting dotnet up and running on your machine turned out to be so painful. Expect me (tomorrow or a day after tomorrow) to provide you here with battle-tested instructions on how to get the necessary tool, compile the code and run it. The good part about it is that compared to the mono times it is a lot easier.

p.s. Having a reliable, maintained interpolation-providing package that's not deps heavy would be really nice. Did you like that neat-interpolation mentioned in the linked issue?

@gdziadkiewicz
Copy link
Author

gdziadkiewicz commented Dec 7, 2021

Differences between fslex/fsyacc and ocamllex/ocamlyacc:

  1. F# and OCaml have different std libs which means that boilerplate and usage of boilerplate (e.g. Hashmaps and using them)
    must be different (or unified by an adapter in one of them?)
  2. fslex does not support # (or did not in the past, confirmation with current version needed). Docs state that it does but real life and code strongly suggested that it does not. I will recheck that and open PR with an update to fslex docs if this is still the case.
  3. Current version of fsyacc requires some special dance (specifying module name as a parameter in fsproj file) around module declaration. This should be fixed in fsyacc and workaround in bnfc while it is still ongoing.
  4. Casing conventions for F# and OCaml are different, using OCaml conventions causes warnings and/or errors for F# (TODO elaborate)

@andreasabel
Copy link
Member

@gdziadkiewicz worte:

1. I'm willing to join as a maintainer. How does a maintainer pledge look like?

Great! This pledge is nothing formal, it is the serious promise that you will maintain the F# backend for at least 3 years.

2. I kind of expect that it did make use of the fact that the early F# versions were able to compile a subset of OCaml. I will take a look and see if there is something worth salvaging there.

I don't really know if something can be salvaged, but I thought I point this out to you in case you want to do the archeology to see how things were done then.

3. That makes sense. I will check out the PRs you linked and investigate the possible approaches:
   a) parametrizing new F# backend
   b) parametrizing existing OCaml backed
   c) fusion
   Expect me to return here with input on them.

Great!

4. Ok, if this (in one form or another) gets in you can expect me to work on/help with the porting when BNFC3 gets public.

Excellent. I am hoping to make some progress on this early next year.

5. I will focus on that now (and on 6.). Before starting with the rewrite/refactor from 3. I would like to have "good" tests for the OCaml (or feel of them if they are already there) and the new F# thing.

Yes, the testsuite is the most important thing. This might also solve point 6, if you provide a script or instructions that install dotnet and the required tools.

To integrate the testsuite into CI is also a project I have been postponing for too long already, but maybe it can wait until BNFC3. I have not made any concrete plans how to migrate the testsuite, but I am resolved to migrate it one way or the other.

6. Sorry to hear that the experience of getting dotnet up and running on your machine turned out to be so painful. Expect me (tomorrow or a day after tomorrow) to provide you here with battle-tested instructions on how to get the necessary tool, compile the code and run it. The good part about it is that compared to the mono times it is a lot easier.

It is not the fault of dotnet, but the fault of my homebrew setup, so no need to apologize. But a pointer to get started would still be nice.

p.s. Having a reliable, maintained interpolation-providing package that's not deps heavy would be really nice. Did you like that neat-interpolation mentioned in the linked issue?

No, I passed by there, but did not find a "shopwindow" that would get me interested (see nikita-volkov/neat-interpolation#32).

@gdziadkiewicz
Copy link
Author

  1. I started working on adding f# backend to the existing tests and it looks like they already point out some problems with my implementation. WIP
  2. Requirements:

    dotnet cli tool and dotnet SDK for version defined in https://github.com/BNFC/bnfc/pull/404/files#diff-07ee170ba626d1934086f864c3ec7da48aa99c2de545fc2299e76298a50a13ffR122 which is net5.0

    How to get dotnet

    I recommend using https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-install-script for installing dotnet and
    the required SDK. It is portable, easy to use, does not require admin rights, works well with CIs. The only downside that I
    encountered is the lack of update tracking (which is something that for example installation via apt-get on Ubuntu provides).
    curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 5.0
    export PATH=~/.dotnet:$PATH
    Tested on my Ubuntu and Windows (in the dotnet-install.ps1 flavor). Hope to hear from you if it works on your OS.

    Usage:

    In the directory containing generated files:
    1. Run dotnet build (this compiles the code and from our perspective asserts that the generated code at least compiles)
    2. Run dotnet run -- <path-to-the-input-file> (this runs the compiled program (also recompiles if code changed), running it allows us the check if it terminates in reasonable time and if the output is as expected)

@gdziadkiewicz
Copy link
Author

Work on getting all the existing tests passing still in progress. 23/30 of the parametrized tests are passing. During the process, I'm making notes on the fragile, easy-to-break parts of the code that should be replaced during the rewrite.

@gdziadkiewicz
Copy link
Author

Differences between fslex/fsyacc and ocamllex/ocamlyacc:

  1. F# and OCaml have different std libs which means that boilerplate and usage of boilerplate (e.g. Hashmaps and using them)
    must be different (or unified by an adapter in one of them?)
  2. fslex does not support # (or did not in the past, confirmation with current version needed). Docs state that it does but real life and code strongly suggested that it does not. I will recheck that and open PR with an update to fslex docs if this is still the case.
  3. Current version of fsyacc requires some special dance (specifying module name as a parameter in fsproj file) around module declaration. This should be fixed in fsyacc and workaround in bnfc while it is still ongoing.
  4. Casing conventions for F# and OCaml are different, using OCaml conventions causes warnings and/or errors for F# (TODO elaborate)
  1. I investigated the adapter option and decided not to do it (it would bloat the generated F# code). I will go with parametrization of the boilerplate on BNFC side.
  2. I confirmed that this is still the case and decided to add the support instead of removing it from the docs.
  3. I will fix that in fsyacc.

@gdziadkiewicz
Copy link
Author

I'm done with the tests and started working on the new version which parametrizes the OCaml backend.

andreasabel added a commit to gdziadkiewicz/bnfc that referenced this pull request Mar 3, 2022
andreasabel added a commit to gdziadkiewicz/bnfc that referenced this pull request Mar 3, 2022
@andreasabel
Copy link
Member

Thanks for the PR! I built and tested it.
Atm, two tests are not yet working.

The first is 266_define:

[TEST] Parameterized tests:F#:F#:266_define
§ Generate
bnfc -mMyMakefile --fsharp /Users/abel/project/open-source/bnfc/testing/regression-tests/266_define/test.cf
§ Build
dotnet build
error running: dotnet build
exit status: 1
stderr: 
@@@ Error! (3812ms)

This is probably due to missing sanitization, if the user uses let as a definition. I have fixed similar problems in the Ocaml backend.

testing/regression-tests/266_define/fsharp/AbsTest.fs(27,9): error FS0010: Unexpected keyword 'let' or 'use' in pattern [testing/regression-tests/266_define/fsharp/Test.fsproj]

The other test is 278_Keywords:

/regression-tests/278_Keywords/fsharp$ dotnet run -- ../good01.in
Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at FSharp.Text.Parsing.Implementation.interpret[tok,a](Tables`1 tables, FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 initialState)
   at FSharp.Text.Parsing.Tables`1.Interpret[char](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState)
   at ParTest.engine[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in /Users/abel/project/open-source/bnfc/testing/regression-tests/278_Keywords/fsharp/ParTest.fs:line 8749
   at ParTest.pProgram[cty](FSharpFunc`2 lexer, LexBuffer`1 lexbuf) in /Users/abel/project/open-source/bnfc/testing/regression-tests/278_Keywords/fsharp/ParTest.fs:line 8751
   at TestTest.parse(TextReader c) in /Users/abel/project/open-source/bnfc/testing/regression-tests/278_Keywords/fsharp/TestTest.fs:line 15
   at TestTest.main(String[] args) in /Users/abel/project/open-source/bnfc/testing/regression-tests/278_Keywords/fsharp/TestTest.fs:line 30

@andreasabel
Copy link
Member

@gdziadkiewicz Currently the generated Makefile only has the clean goal. Could it be extended by a

  • goal that calls dotnet build whenever on of the dependencies is updated
  • reruns BNFC if the grammar changed (similar to what the Haskell backend does)?

@gdziadkiewicz
Copy link
Author

gdziadkiewicz commented Mar 4, 2022 via email

@gdziadkiewicz
Copy link
Author

I'm sorry for the long silence. I added the missing goals, but have some questions about them as I rarely use make (I will add attach them to lines).

testParserRule = Makefile.mkRule tgt deps [ "dotnet build" ]
where
tgt :: String
tgt = buildTarget opts
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a name that is not a file name. Is this a problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if the target is a file, then make can use its time-stamp to decide on rebuilding.
But I guess you have your reasons to not put a file there (maybe it is too complicated or fragile).
I am OK with that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a moment I believed that there won't be a common file name for Linux and Windows but this turned out to be false. I will fix it.

phonyRule = vcat
[ "# List of goals not corresponding to file names."
, ""
, Makefile.mkRule ".PHONY" [ "all", "clean", "distclean" ] []
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My testParserRule also does not correspond to a file name. Should I add it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please.

mkFile withLang "Test" "" opts,
mkFile withLang "" "fsproj" opts,
utilFile opts,
makeFile ]]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the code from Haskell as a reference for this part, but I don't understand why it removes the makefile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed debatable. I think the rationale was that distclean should remove all generated files, and the Makefile was one of them. However, this could be dangerous if the user edited the Makefile later. Atm, the user also has to edit the distclean goal then.
Personally, I do not use the distclean, but there is a test for it in bnfc-system-tests. Thus, I would just keep it like this.
A smarter cleaning logic would be another project: store hashes along generated files and only automatically delete stuff that hasn't changed since generated last (checkable using the hash)...

Copy link
Member

@andreasabel andreasabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

testParserRule = Makefile.mkRule tgt deps [ "dotnet build" ]
where
tgt :: String
tgt = buildTarget opts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if the target is a file, then make can use its time-stamp to decide on rebuilding.
But I guess you have your reasons to not put a file there (maybe it is too complicated or fragile).
I am OK with that.

phonyRule = vcat
[ "# List of goals not corresponding to file names."
, ""
, Makefile.mkRule ".PHONY" [ "all", "clean", "distclean" ] []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please.

mkFile withLang "Test" "" opts,
mkFile withLang "" "fsproj" opts,
utilFile opts,
makeFile ]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed debatable. I think the rationale was that distclean should remove all generated files, and the Makefile was one of them. However, this could be dangerous if the user edited the Makefile later. Atm, the user also has to edit the distclean goal then.
Personally, I do not use the distclean, but there is a test for it in bnfc-system-tests. Thus, I would just keep it like this.
A smarter cleaning logic would be another project: store hashes along generated files and only automatically delete stuff that hasn't changed since generated last (checkable using the hash)...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants