Single-File-Compiler-From-Scratch-Recursive-Descent-LR-and-NASM-asm-gen

Note: This is a very rough implementation of a recursive descent (RD) parser. I've only kept it because it was added to the Arctic Code Vault 2020; for a better code base, see my repository "Calc-".

C-like to NASM-like assembly compiler in one file (kinda works? I haven't tested the generated code), combining Lexer -> Recursive Descent Parser -> Code Gen. Written in C++ using only the standard library, with hand-written parsers and a nice command line coloring system! All in only 1328 lines of code!

Grammar:


 NUM-LIT: [0-9]+ ['b'|'q']? 'u'?
CHAR-LIT: ''' [\0-\255] '''
typespec: 'unsigned'? ('byte' | 'word' | 'dword' | 'qword' | 'void' ) 'ptr'?
typespec-list: typespec (',' typespec)*
expr-list: expr (',' expr)*
param-list : IDENT ':' typespec (',' IDENT ':' typespec)*
atom-expr: NUM-LIT
	  : CHAR-LIT
	  : IDENT ('++' | '--' | '(' expr-list? ')' | '<' typespec-list '>')?
	  : ('++' | '--') IDENT
	  : ('-' atom-expr | '~' atom-expr | '!' atom-expr | '&' atom-expr | '*' atom-expr)
	  : '(' expr ')'


mul-div-mod-expr:   atom-expr (('*' | '/' | '%') atom-expr)*

add-sub-expr:       mul-div-mod-expr (('+' | '-') mul-div-mod-expr)*

shift-expr:         add-sub-expr (('<<'|'>>') add-sub-expr)*

lt-gt-lte-gte-expr: shift-expr(('<?' | '>?' | '<=' | '>=') shift-expr)*

ee-ne-expr:         lt-gt-lte-gte-expr(('=' | '!=')  lt-gt-lte-gte-expr)*

bit-and-expr:       ee-ne-expr('&' ee-ne-expr)*

bit-xor-expr:       bit-and-expr('^' bit-and-expr)*

bit-or-expr:        bit-xor-expr('|' bit-xor-expr)*

log-and-expr:       bit-or-expr('&&' bit-or-expr)*

log-or-expr:        log-and-expr('||' log-and-expr)*

stmt_expr: 'ref' expr '=' expr
         : 'let' IDENT ('=' | '=>') expr
         : 'def' IDENT '(' param-list? ')' ':' typespec '{' expr_block '}'
         : 'call' expr '(' param-list? ')'
         : 'ret' expr
         : 'if' '(' expr ')' '{' expr_block '}' ('else' '{' expr_block '}')?
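
The layered expression rules above map naturally onto recursive descent: one function per precedence tier, each calling the next tighter tier. The following is only a minimal sketch under simplifying assumptions, not the code from this repository: it covers just the atom, mul-div-mod, add-sub and shift tiers, handles only plain decimal numbers, unary minus and parentheses, and evaluates the expression directly instead of emitting NASM. The names ExprParser and evalExpr are made up for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical mini-parser: one member function per grammar tier.
// Each tier parses its operands by calling the next tighter tier.
struct ExprParser {
    std::string src;
    std::size_t pos = 0;

    explicit ExprParser(std::string s) : src(std::move(s)) {}

    void skipSpaces() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // Consumes `tok` if it comes next in the input; reports whether it did.
    bool accept(const std::string& tok) {
        skipSpaces();
        if (src.compare(pos, tok.size(), tok) == 0) { pos += tok.size(); return true; }
        return false;
    }

    // Subset of atom-expr: NUM-LIT | '-' atom-expr | '(' expr ')'
    long long atom() {
        if (accept("-")) return -atom();
        if (accept("(")) {
            long long v = shift();
            if (!accept(")")) throw std::runtime_error("expected ')'");
            return v;
        }
        if (pos >= src.size() || !std::isdigit(static_cast<unsigned char>(src[pos])))
            throw std::runtime_error("expected a number");
        long long v = 0;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
            v = v * 10 + (src[pos++] - '0');
        return v;
    }

    // mul-div-mod-expr: atom-expr (('*' | '/' | '%') atom-expr)*
    long long mulDivMod() {
        long long lhs = atom();
        for (;;) {
            if (accept("*")) { lhs *= atom(); continue; }
            bool isDiv = accept("/");
            if (!isDiv && !accept("%")) return lhs;
            long long rhs = atom();
            if (rhs == 0) throw std::runtime_error("division by zero");
            lhs = isDiv ? lhs / rhs : lhs % rhs;
        }
    }

    // add-sub-expr: mul-div-mod-expr (('+' | '-') mul-div-mod-expr)*
    long long addSub() {
        long long lhs = mulDivMod();
        for (;;) {
            if (accept("+")) lhs += mulDivMod();
            else if (accept("-")) lhs -= mulDivMod();
            else return lhs;
        }
    }

    // shift-expr: add-sub-expr (('<<' | '>>') add-sub-expr)*
    long long shift() {
        long long lhs = addSub();
        for (;;) {
            if (accept("<<")) lhs <<= addSub();
            else if (accept(">>")) lhs >>= addSub();
            else return lhs;
        }
    }
};

// Parses the whole string as a shift-expr and rejects trailing input.
long long evalExpr(const std::string& text) {
    ExprParser p(text);
    long long v = p.shift();
    p.skipSpaces();
    if (p.pos != p.src.size()) throw std::runtime_error("unexpected trailing input");
    return v;
}
```

Because shift-expr sits below add-sub-expr in the grammar, an input such as 1 + 2 << 3 groups as (1 + 2) << 3. The remaining tiers (comparison, bitwise, logical) would follow the same loop pattern.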

Summary:

Types are:

byte, word, dword, qword, void, signed/unsigned versions of these and pointer versions of these.

A value that ends in 'u', e.g. 18u, is unsigned. If it then has 'b', 'w' or 'q', it is a byte, word or qword value respectively; if no size is specified, it's a dword. Lastly, if it has a 'p', it's a constant pointer (a memory address).

If a value is entered as 'x', with x being an ASCII character between 0-255, the corresponding ASCII value is read into a byte value.

For e.g.:


15u : unsigned dword,
15ub : unsigned byte,
15ubp : unsigned byte pointer,
15b : signed byte,
15bp : signed byte ptr,
'a' : signed byte,
'@' : signed byte, ....
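
The suffix rules above can be sketched as a small classifier. This is a hedged illustration, not the lexer from this repository: it assumes the suffix order used in the examples (optional 'u', then an optional size letter, then an optional 'p'), which differs from the NUM-LIT rule in the grammar, and the names NumLit, Width and parseNumLit are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class Width { Byte, Word, Dword, Qword };

// Hypothetical result record for a numeric literal such as "15ubp".
struct NumLit {
    unsigned long long value = 0;
    bool isUnsigned = false;
    Width width = Width::Dword;  // default when no size suffix is given
    bool isPointer = false;
};

// Returns true and fills `out` if `text` is digits followed by the
// optional suffixes 'u', one of 'b'/'w'/'q', and 'p', in that order.
bool parseNumLit(const std::string& text, NumLit& out) {
    std::size_t i = 0;
    NumLit lit;

    if (text.empty() || !std::isdigit(static_cast<unsigned char>(text[0]))) return false;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
        lit.value = lit.value * 10 + static_cast<unsigned long long>(text[i++] - '0');

    if (i < text.size() && text[i] == 'u') { lit.isUnsigned = true; ++i; }

    if (i < text.size()) {
        switch (text[i]) {
            case 'b': lit.width = Width::Byte;  ++i; break;
            case 'w': lit.width = Width::Word;  ++i; break;
            case 'q': lit.width = Width::Qword; ++i; break;
            default: break;  // no size suffix: stays a dword
        }
    }

    if (i < text.size() && text[i] == 'p') { lit.isPointer = true; ++i; }

    if (i != text.size()) return false;  // unknown trailing characters
    out = lit;
    return true;
}
```

With this sketch, "15ubp" comes back as an unsigned byte pointer with value 15, and a bare "15" as a signed dword. Overflow of very long digit strings is not checked here.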

Statements:


'if' and 'else' do what you would expect,
'let' allows you to set/define variables: '=' sets an existing variable and '=>' defines a new one.
'ref' writes through a pointer, basically this in C++: *(ptr) = x;
'&' gets the address (returns ptr values).
'def' defines function.

Syntax:


if ( expr ) { body },
if (expr) { body } else { body },
let identifier = expr,
let identifier => expr,
ref expr = expr,
def identifier ( (param-name : param-type)* ) : type { body },
& expr,

Function overloads

When using 'def' to define a function, the function is indexed in the symbol table not just by its identifier or name but by its complete signature, that is, a name composed of its identifier plus the list of its parameter types, in order. For example, a function called 'test' that takes two bytes and a char would be 'test(byte,byte,char)' in the symbol table. Therefore you can have as many functions with the same name as you'd like, as long as they have different parameter types or combinations thereof.

Then, in order to call a specific overload you use '<' and '>' as follows:


test<byte>(x) : calls test's byte overload with x,
test<char>(x) : calls test's char overload with x,
test<char, byte, dword>(x) : calls test's (char, byte, dword) overload with x,

Syntax:

function-name < param-type-list >(args);
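
To make the signature-keyed symbol table concrete, here is a minimal sketch of how such a lookup could work. It is an assumption about the general approach, not the actual data structure in TBC+.cpp; makeSignature, FunctionInfo and SymbolTable are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Builds a key such as "test(byte,byte,char)" from a name and its
// parameter types, in declaration order.
std::string makeSignature(const std::string& name,
                          const std::vector<std::string>& paramTypes) {
    std::string key = name + "(";
    for (std::size_t i = 0; i < paramTypes.size(); ++i) {
        if (i != 0) key += ",";
        key += paramTypes[i];
    }
    key += ")";
    return key;
}

// Hypothetical per-function record; a real compiler would store more.
struct FunctionInfo {
    std::string returnType;
};

class SymbolTable {
public:
    // Registers an overload; returns false if that exact signature exists.
    bool define(const std::string& name,
                const std::vector<std::string>& paramTypes,
                const std::string& returnType) {
        return functions_
            .emplace(makeSignature(name, paramTypes), FunctionInfo{returnType})
            .second;
    }

    // Looks up the overload named by an explicit type list, as in
    // test<byte>(x); returns nullptr if no such overload was defined.
    const FunctionInfo* lookup(const std::string& name,
                               const std::vector<std::string>& paramTypes) const {
        auto it = functions_.find(makeSignature(name, paramTypes));
        return it == functions_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, FunctionInfo> functions_;
};
```

Keying on the full signature means two functions called test never collide unless their parameter type lists are identical, which is why the call site has to spell out the type list between '<' and '>'.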

Finally

Always end a statement with a ';'. It doesn't matter if two or more statements are on the same line, as long as they each end in ';' (kinda free-form).

Examples:

This works:


let x => 5;
let y => x * 2;
let z => x + y * (x * 4 + (2 + x));
let x = 67;
def test(x : dword) : dword { let a => x * x; ret a + x; }
test(x + z);
if (z >= 2) { let z = 4; };
else { let z = 6; };
def test(x : dword) : dword
{
if (x >= 2) {
ret 1;
};
else {
ret -1;
};
}

test(55);

Kinda works but not entirely (function overloading):


def test(x : byte) : byte { let a => x * x; ret a + x; }
call test(x + z);
call test(x + z);

Doesn't compile properly (type-checking doesn't work but I'm too lazy to fix it):


def pt => &x;
ref pt => 6;

Basically, most features work except for pointers (these don't compile/transpile properly to NASM assembly) and function overloading. This is a little personal project that I don't intend to finish, but it's interesting nonetheless, so I'm posting the full source code.

Note

I know that the source code is organized terribly; that is because this was a one-day project. It's anything but a serious project. It's meant to be organized kinda like a C program, but using C++ for strings and other features found there. There's tons of stuff you would never do in production code because it leads to very hard to debug code (like this!)

ncortiz/Compiler-Test

Folders and files

Latest commit

History

Repository files navigation

Single-File-Compiler-From-Scratch-Recursive-Descent-LR-and-NASM-asm-gen

Grammar:

Summary:

Types are:

For e.g.:

Statements:

Syntax:

Function overloads

Then, in order to call a specific overload you use '<' and '>' as follows:

Syntax:

Finally

Examples:

This works:

Kinda works but not entirely (function overloading):

Doesn't compile properly (type-checking doesn't work but I'm too lazy to fix it):

Note

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages