updating broken links + semantic validation + chisel method + small changes
Showing 11 changed files with 211 additions and 19 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,24 @@ | ||
# The Chisel Method | ||
|
||
_To be written_ | ||
The Chisel method gives parser projects a single, repeatable approach. Following it keeps architectural decisions consistent across projects and helps developers become familiar with a new project quickly. | ||
|
||
This method is called Chisel because building a parser is about getting the information out of the code, just as a sculptor uses a chisel to "take" the statue out of the marble. Also, since Strumenta means tools in Latin, Chisel takes its place as one of the most important tools in the company's toolset. | ||
|
||
The Chisel method is based on the following principles: | ||
1. We define a clear goal, which is shared by the Client and the Language Engineering Team. This goal is objective and not subject to interpretation. | ||
2. At each step, the Language Engineering Team should clearly understand where we are and what should be done next: parsing, and then refining the produced AST. Tool support should facilitate every step, removing the friction caused by repetitive tasks. The StarLasu Libraries can be used for this, since they provide user-friendly APIs and cross-language integration. | ||
3. We ensure frictionless adoption by providing all the support needed to integrate the parser into a Language Engineering Pipeline, for example by automatically generating documentation and by following a method that simplifies training. | ||
![image.png](chiselMethod.png) | ||
|
||
The Chisel method is based on the following steps: | ||
|
||
1. Create a parser using the StarLasu Libraries. To do that, you need to: | ||
- Define a grammar; | ||
- Use the StarLasu libraries to generate the AST; | ||
- Refine the AST; | ||
- Have a testing strategy; | ||
- Have a set of examples that can be used to measure the parser's progress. | ||
2. Write a semantics module for more advanced codebase analysis and symbol resolution (optional). | ||
3. Write a code generation module, if the final goal is to generate code for a target language (optional). | ||
|
||
Chisel is both a method and a transformative approach to parser development, ensuring efficiency, adaptability, and sustainability. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,86 @@ | ||
# Code Generation | ||
|
||
_To be written_ | ||
Code Generation modules are useful for generating new files programmatically. This is the case when building a transpiler, where text (code) has to be generated from an AST representation. | ||
|
||
Currently Code Generation modules can be written only with Kolasu. | ||
|
||
## Setup | ||
|
||
The Code Generation implementation should be kept separate from the other modules, in a module called code-generation. This module typically depends on an ast module. Below is an example `build.gradle.kts` file for a code generation module. | ||
Note that the StarLasu Gradle plugin is currently private. | ||
``` kotlin | ||
import com.strumenta.starlasugradleplugin.addGitHubPackagesRepo | ||
|
||
plugins { | ||
id("org.jetbrains.kotlin.jvm") | ||
id("java-library") | ||
id("org.jlleitschuh.gradle.ktlint") | ||
id("maven-publish") | ||
id("antlr") | ||
id("com.github.johnrengelman.shadow") | ||
id("com.google.devtools.ksp") | ||
id("com.strumenta.starlasu") | ||
} | ||
|
||
dependencies { | ||
api(project(":ast")) | ||
implementation("org.apache.logging.log4j:log4j-api-kotlin:1.2.0") | ||
implementation("org.apache.logging.log4j:log4j-api:2.20.0") | ||
implementation("org.apache.logging.log4j:log4j-core:2.20.0") | ||
|
||
implementation("commons-io:commons-io:2.13.0") | ||
implementation("com.google.code.gson:gson:2.10.1") | ||
|
||
implementation("com.github.ajalt.clikt:clikt:3.5.0") | ||
} | ||
|
||
``` | ||
|
||
## Generation rules | ||
|
||
The code generator class should be a subclass of `ASTCodeGenerator` and needs to override the `registerRecordPrinters` function. This function contains a `recordPrinter` for each AST node type; the implementation of each `recordPrinter` determines the output generated for that node type. | ||
|
||
``` kotlin | ||
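class MyCodeGenerator : ASTCodeGenerator() { | ||
    override fun registerRecordPrinters() { | ||
        // One record printer per AST node type; `it` is the node being printed | ||
        recordPrinter<MyNode> { | ||
            print(it.field) | ||
            indent() | ||
            println(it.field2) | ||
            dedent() | ||
            // Print each child, separated by a new line, with no prefix or postfix | ||
            printList(prefix = "", postfix = "", elements = it.children, separator = "\n") | ||
            // Print the text only when the flag is true | ||
            printFlag(it.flag, "Flag is true") | ||
        } | ||
    } | ||
} | ||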
``` | ||
|
||
The methods used to generate code are simple to use: | ||
- `print` prints a string; | ||
- `println` prints a string followed by a new line; | ||
- `printList` prints a list of nodes; you can specify a prefix, a postfix and a separator; | ||
- `printFlag` prints a string only if a boolean flag is true; | ||
- `indent` increases the indentation level that `print` and `println` apply; | ||
- `dedent` decreases the indentation level that `print` and `println` apply. | ||
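To make the interplay between indentation and printing concrete, here is a minimal, self-contained sketch of a printer with the same method names. `MiniPrinter` and `renderBlock` are hypothetical stand-ins written for this illustration; they are not the Kolasu implementation, whose behavior may differ in details.

```kotlin
// Hypothetical stand-in for a printer with indentation support; not the Kolasu API.
class MiniPrinter(private val indentUnit: String = "    ") {
    private val sb = StringBuilder()
    private var level = 0
    private var atLineStart = true

    fun print(text: String) {
        if (text.isEmpty()) return
        // Indentation is emitted only when a new line starts.
        if (atLineStart) {
            sb.append(indentUnit.repeat(level))
            atLineStart = false
        }
        sb.append(text)
    }

    fun println(text: String = "") {
        print(text)
        sb.append('\n')
        atLineStart = true
    }

    fun indent() {
        level++
    }

    fun dedent() {
        check(level > 0) { "dedent() called without a matching indent()" }
        level--
    }

    // Prints the text only when the flag is true.
    fun printFlag(flag: Boolean, text: String) {
        if (flag) print(text)
    }

    // Prints prefix, then the elements joined by the separator, then postfix.
    fun <T> printList(
        elements: List<T>,
        prefix: String = "",
        postfix: String = "",
        separator: String = ", ",
        printElement: (T) -> Unit
    ) {
        print(prefix)
        elements.forEachIndexed { index, element ->
            if (index > 0) print(separator)
            printElement(element)
        }
        print(postfix)
    }

    fun text(): String = sb.toString()
}

fun renderBlock(): String {
    val p = MiniPrinter()
    p.println("begin")
    p.indent()
    p.printList(listOf("a", "b", "c"), prefix = "(", postfix = ")") { p.print(it) }
    p.println()
    p.printFlag(true, "flagged")
    p.println()
    p.dedent()
    p.println("end")
    return p.text()
}
```

In this sketch the indentation is applied lazily, at the first `print` after a line break, which is one common design choice: calling `indent()` in the middle of a line only affects the lines that follow.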
|
||
## Testing | ||
|
||
Before writing tests, you need to know what the generated code should look like. | ||
|
||
Unit testing can be done by building an AST manually, generating code from it, and comparing the generated output with the expected output stored in a file or string. | ||
|
||
It is also good practice to have end-to-end tests, which can follow one of two methods: | ||
|
||
1. AST to AST testing: | ||
- Parse an input file that contains the original code to the target AST representation; | ||
- Parse an input file that contains the expected code to the expected AST representation; | ||
- Compare the expected AST with the generated AST. | ||
|
||
This testing method is useful for large examples, where it is hard to write the expected output manually. It tests code generation in a more abstract way, by checking that the produced AST matches the expected one. However, it does not cover code styling and formatting. | ||
|
||
2. AST to code testing: | ||
- Parse a string that contains the original code to the target AST representation; | ||
- Generate the code from the target AST; | ||
- Compare the expected output with the generated output. | ||
|
||
This testing method is useful for smaller examples and covers the styling and formatting of the code. It tests code generation in a more concrete way, by checking that the produced code matches the expected code. |
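The difference between the two end-to-end methods can be sketched with a toy language made of integer sums. Everything below (`Expr`, `parse`, `generate` and the two helper functions) is hypothetical and written only for this illustration. In this toy the source and target languages coincide, so the parser alone produces the target AST.

```kotlin
// Hypothetical mini-language: sums of integers such as "1+2 + 3".
sealed interface Expr
data class Num(val value: Int) : Expr
data class Add(val left: Expr, val right: Expr) : Expr

// Toy parser: splits on '+' and builds a left-associative tree.
fun parse(code: String): Expr =
    code.split("+")
        .map<String, Expr> { Num(it.trim().toInt()) }
        .reduce { acc, next -> Add(acc, next) }

// Toy code generator with a fixed formatting style (spaces around '+').
fun generate(expr: Expr): String = when (expr) {
    is Num -> expr.value.toString()
    is Add -> "${generate(expr.left)} + ${generate(expr.right)}"
}

// AST to AST: compare the trees, so formatting differences are ignored.
fun astToAstMatches(original: String, expected: String): Boolean =
    parse(original) == parse(expected)

// AST to code: compare the generated text, so formatting is covered too.
fun astToCodeMatches(original: String, expected: String): Boolean =
    generate(parse(original)) == expected
```

With these helpers, `astToAstMatches("1+2", "1 + 2")` holds because both inputs yield the same tree, while `astToCodeMatches("1+2", "1+2")` fails because the generator always emits spaces around `+`.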
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,15 @@ | ||
# The Dual Code Model APIs | ||
|
||
_To be written_ | ||
This section is more theoretical than the others. It provides a high-level overview of the dual code model APIs and the approach followed in the development of the StarLasu tools. | ||
It covers the concepts of homogeneous and heterogeneous APIs, and how to use and leverage them in the StarLasu tools. | ||
|
||
## Homogeneous APIs | ||
|
||
In Kolasu, [Nodes](https://github.com/Strumenta/kolasu/blob/main/core/src/main/kotlin/com/strumenta/kolasu/model/Node.kt) are the basic building blocks of an AST. They represent the different elements of the language being parsed. | ||
All node instances share a defined set of properties and methods, such as the origin, the parent, etc. This is what we call a homogeneous API. | ||
|
||
Homogeneous APIs are important for developing generic tools, such as the ongoing effort to build an algorithm that finds idioms within a language. Using the homogeneous API, we can develop a tool that works for any language that has an AST representation. | ||
## Heterogeneous APIs | ||
|
||
On the other hand, each Node can also have its own specific set of properties and methods. For instance, a Node that represents an if statement can have `condition` and `body` properties. | ||
This is what we call a heterogeneous API. |
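The two kinds of API can be illustrated with a minimal, self-contained sketch. `SimpleNode`, `Literal`, `IfStatement` and `countNodes` are hypothetical names introduced here; Kolasu's real `Node` class offers much more than this.

```kotlin
// Hypothetical minimal node model; not the Kolasu Node class.
abstract class SimpleNode {
    var parent: SimpleNode? = null
    abstract val children: List<SimpleNode>

    // Homogeneous API: available on every node, whatever the language.
    fun walk(): Sequence<SimpleNode> = sequence {
        yield(this@SimpleNode)
        children.forEach { yieldAll(it.walk()) }
    }
}

class Literal(val text: String) : SimpleNode() {
    override val children: List<SimpleNode> = emptyList()
}

// Heterogeneous API: `condition` and `body` exist only on if statements.
class IfStatement(val condition: Literal, val body: List<SimpleNode>) : SimpleNode() {
    override val children: List<SimpleNode> = listOf<SimpleNode>(condition) + body

    init {
        // Wire up the parent link that the homogeneous API exposes.
        children.forEach { it.parent = this }
    }
}

// A generic tool written against the homogeneous API only:
// it works for any tree of SimpleNode, regardless of the node types involved.
fun countNodes(root: SimpleNode): Int = root.walk().count()
```

A language-specific tool would instead read `condition` and `body` directly, which is more convenient but ties the tool to that one language.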
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,21 @@ | ||
# Semantic Enrichment | ||
|
||
_To be written_ | ||
People who are new to parser development may attempt to enforce type coherence at the grammar level, for example by specifying that a variable declared with type int can only have a subset of the expression types as its initial value. This does not work well: it increases the complexity of the grammar and still does not achieve the desired constraints. | ||
|
||
Symbol Resolution is the process of giving a meaning to the each name in the source | ||
code. For example: | ||
In general, it is preferable to be less strict in the grammar and Parse Tree to AST Mapping, identifying discrepancies as the final AST processing step. | ||
|
||
* to connect a given use of a variable to its declaration; | ||
* * to reconstruct the definition of a SQL column from an alias in a subquery; | ||
* * to identify a user-defined therapy plan in a medical support DSL. | ||
The benefits of doing so are: | ||
|
||
Symbol resolution relies on the naming facilities in StarLasu. | ||
- Simpler grammars; | ||
- Better error messages; | ||
|
||
Semantic checks are mechanisms that identify discrepancies in a syntactically correct input. They are an advanced feature and are not implemented in most parsers, only when specifically requested or needed. | ||
|
||
## Semantic Checks | ||
|
||
- [](SymbolResolution.md); | ||
- Type System checks; | ||
- Other checks such as: | ||
- Two symbols with the same name declared in the same scope; | ||
- Variables used before being declared; | ||
- etc... |
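As an illustration of the last group of checks, the following self-contained sketch detects duplicate declarations and uses before declaration in a single scope. The flat statement list and all names (`Statement`, `VarDeclaration`, `VarUse`, `SemanticIssue`, `checkScope`) are assumptions made for this example; a real implementation would work on the AST and its scopes.

```kotlin
// Hypothetical flat AST for a single scope; names are illustrative only.
sealed interface Statement {
    val line: Int
}

data class VarDeclaration(val name: String, override val line: Int) : Statement
data class VarUse(val name: String, override val line: Int) : Statement

data class SemanticIssue(val message: String, val line: Int)

// Walks the statements in order and reports issues instead of failing fast,
// so that a syntactically correct input always produces a complete report.
fun checkScope(statements: List<Statement>): List<SemanticIssue> {
    val declared = mutableSetOf<String>()
    val issues = mutableListOf<SemanticIssue>()
    for (statement in statements) {
        when (statement) {
            is VarDeclaration ->
                // `add` returns false when the name is already present.
                if (!declared.add(statement.name)) {
                    issues += SemanticIssue(
                        "'${statement.name}' is already declared in this scope",
                        statement.line
                    )
                }
            is VarUse ->
                if (statement.name !in declared) {
                    issues += SemanticIssue(
                        "'${statement.name}' is used before being declared",
                        statement.line
                    )
                }
        }
    }
    return issues
}
```

Collecting issues with their line numbers, rather than rejecting the input in the grammar, is what makes the better error messages mentioned above possible.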
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,8 @@ | ||
# Serialization | ||
|
||
In addition to what is described here, there is also EMF serialization, which is discussed separately. See [EMF](https://github.com/Strumenta/StarLasu/blob/main/documentation/emf.md). | ||
In addition to what is described here, there is also EMF serialization, which is discussed separately. See [EMF](EMFInteroperability.md). | ||
|
||
StarLasu supports exporting ASTs to JSON and XML. | ||
StarLasu supports exporting ASTs to JSON and XML. | ||
Additionally, Kolasu supports import from and export to the [LionWeb](https://github.com/Strumenta/kolasu/blob/main/lionweb/src/main/kotlin/com/strumenta/kolasu/lionweb/LionWebModelConverter.kt) format. | ||
|
||
_See in [Kolasu](https://github.com/Strumenta/kolasu/tree/master/core/src/main/kotlin/com/strumenta/kolasu/serialization)_. | ||
_See in [Kolasu](https://github.com/Strumenta/kolasu/tree/main/serialization/src/main/kotlin)_. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Symbol Resolution | ||
|
||
The objective of symbol resolution is to link name-based textual references to the corresponding node in the Abstract Syntax Tree (AST). StarLasu supports this process with the following building blocks: | ||
|
||
* `PossiblyNamed` and `Named` interfaces can be implemented for nodes that can be referenced - see [Naming](Naming.md); | ||
* `ReferenceByName` properties can be defined in nodes to represent links to other nodes; | ||
* `SymbolResolver` instances can be configured to specify symbol resolution logic for each property or node type; | ||
* `Scope` instances are used to resolve each reference in the AST; | ||
|
||
## Representing references among nodes | ||
|
||
References between nodes are implemented using `ReferenceByName` instances in StarLasu. These keep track of the relationship between a `name` and the `referred` node, which might be absent until the symbol resolution phase and must be a sub-type of `PossiblyNamed`. | ||
|
||
In [Kolasu](https://github.com/Strumenta/kolasu), for example, we can define a node `Person` containing a reference `friend` towards another `Person` as follows: | ||
```kotlin | ||
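import com.strumenta.kolasu.model.Node | ||
import com.strumenta.kolasu.model.PossiblyNamed | ||
import com.strumenta.kolasu.model.ReferenceByName | ||
data class Person( | ||
    override val name: String, | ||
    val friend: ReferenceByName<Person> // <-- reference to another `Person` | ||
) : Node(), PossiblyNamed | ||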
``` | ||
Instances can then be created by providing the `name` of the referred `Person` instance. The referenced object itself can be provided as the `initialReferred` value if known, or left unresolved until symbol resolution. | ||
```kotlin | ||
// reference by name using `name` only | ||
val first: Person = Person("first", friend = ReferenceByName("second")) | ||
// reference by name using `initialReferred` value and `name` | ||
val second: Person = Person("second", friend = ReferenceByName("first", first)) | ||
``` | ||
In general, references can be resolved using one or more candidates, as follows: | ||
```kotlin | ||
// `tryToResolve` is called on the reference, not on the node that owns it | ||
second.friend.tryToResolve(first) // <-- `first` is the only candidate | ||
second.friend.tryToResolve(listOf(first, second)) // <-- list of candidates | ||
``` | ||
While it is possible to manually implement symbol resolution by traversing the AST and updating the `referred` value for each `ReferenceByName` property, StarLasu provides support for the declarative specification of symbol resolution rules, as shown in the next section. | ||
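For comparison, manual resolution can be sketched as follows. This is a self-contained illustration in which `Ref`, `PersonNode` and `resolveFriends` are hypothetical stand-ins for the concepts described above, not the Kolasu classes.

```kotlin
// Hypothetical stand-in for a by-name reference; not the Kolasu ReferenceByName class.
class Ref<T : Any>(val name: String, var referred: T? = null) {
    val resolved: Boolean
        get() = referred != null
}

class PersonNode(val name: String, val friend: Ref<PersonNode>)

// Manual resolution: index every named node, then fill in each reference.
// Returns the names that could not be resolved.
fun resolveFriends(people: List<PersonNode>): List<String> {
    val byName = people.associateBy { it.name }
    val unresolved = mutableListOf<String>()
    for (person in people) {
        val ref = person.friend
        if (ref.resolved) continue // already resolved, e.g. at construction time
        ref.referred = byName[ref.name]
        if (!ref.resolved) unresolved += ref.name
    }
    return unresolved
}
```

Even in this small case the traversal, the name index and the error reporting are all hand-written, which is the boilerplate that a declarative specification aims to remove.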
|
||
## Declarative symbol resolution | ||
|
||
As mentioned in the previous section, symbol resolution can be implemented manually as a tree-traversal algorithm. However, StarLasu eases this task and lets the developer focus on language-specific concerns by providing rules for each reference in a given AST. | ||
|
||
Symbol resolution rule specifications consist of three parts: | ||
|
||
* __guard__ - the reference property for which we want to provide a scope; | ||
* __context__ - the node from which we want to compute the scope; | ||
* __body__ - the actual scope definition, i.e. `Scope` instance; | ||
|
||
Each rule produces a `Scope` instance that is used to resolve a given property. Given a property, StarLasu adopts a precise rule resolution schema. Considering `Person::friend`, for example, the following steps will be performed: | ||
|
||
1) look up a property-based rule having `Person::friend` as guard and `Person` as context; | ||
2) look up a property-based rule having `Person::friend` as guard and any ancestor of the `Person` node as context; | ||
|
||
As soon as one rule is found, the symbol resolver will use it to resolve the reference. | ||
|
||
In our example, we could define the candidates for the `friend` reference as all `Person` instances contained in the `CompilationUnit` of the AST, as follows: | ||
```kotlin | ||
val symbolResolver = symbolResolver { | ||
    // property-based rule for Person::friend property | ||
    scopeFor(Person::friend) { | ||
        scope { | ||
            it.findAncestorOfType(CompilationUnit::class.java) | ||
                ?.walk() | ||
                ?.filterIsInstance<Person>() | ||
                ?.forEach(this::define) | ||
        } | ||
    } | ||
} | ||
``` |
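The rule lookup order described earlier (first the node itself, then its ancestors) can be sketched independently of the library. The `TreeNode`, `RuleKey` and `RuleRegistry` types below are hypothetical and only model the lookup; the real resolver is configured through the DSL shown above, and its internals may differ.

```kotlin
import kotlin.reflect.KClass

// Hypothetical node types used only to model the lookup order.
open class TreeNode(val parent: TreeNode? = null)
class CompilationUnitNode : TreeNode()
class PersonRefNode(parent: TreeNode) : TreeNode(parent)

// A rule is keyed by its guard (property name) and its context (node type).
data class RuleKey(val guard: String, val context: KClass<out TreeNode>)

class RuleRegistry {
    private val rules = mutableMapOf<RuleKey, String>()

    fun register(guard: String, context: KClass<out TreeNode>, ruleName: String) {
        rules[RuleKey(guard, context)] = ruleName
    }

    // Step 1 checks the node itself; step 2 climbs through its ancestors.
    // The first rule found wins.
    fun lookup(guard: String, node: TreeNode): String? {
        var current: TreeNode? = node
        while (current != null) {
            rules[RuleKey(guard, current::class)]?.let { return it }
            current = current.parent
        }
        return null
    }
}
```

Because the search starts at the node and moves outward, a rule registered for the node's own type takes precedence over one registered for an enclosing node type.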