Global variable code generation #170

Merged
merged 15 commits into from
Jul 7, 2024

Conversation

leewei05
Contributor

@leewei05 leewei05 commented Jun 24, 2024

  • Introduce is_global member for DeclNode and ExprNode.
  • Update is_global in type checking stage.
  • Code generation support for global variables and global arrays. I'll add global structs and unions after finishing their code generation in block scope.
  • Since global variables are declared and initialized outside of main's scope, we need a way to store their initial values and types. I therefore added a new struct, GlobalVarInitVal, with value and type members, and a function, GenerateQBEInit, that generates the corresponding QBE IR.
export data $a = align 4 { w 1, w 2, w 3, }
export data $c = align 4 { z 4 }
export
function w $main() {
@start.1
	%.5 =l alloc4 8
@body.2
	%.1 =w loadw $c
	%.2 =l extsw 0
	%.3 =l mul %.2, 4
	%.4 =l add $a, %.3
	storew 2, %.4
	storew 1, %.5
	%.6 =l add %.5, 4
	storew 1, %.6
	ret 0
}
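
A minimal sketch of how a value/type pair could be turned into the data definitions shown above. The struct layout, the GenerateQBEInit signature, and the parameter names here are assumptions for illustration, not the code in this PR; the sketch also omits the trailing comma that appears in the real output.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for GlobalVarInitVal: one initializer value plus its
// QBE type letter ("w" for a 32-bit word, "z" for zero-filled bytes).
struct GlobalVarInitVal {
  long value;
  std::string type;
};

// Hypothetical counterpart of GenerateQBEInit: renders one exported QBE data
// definition from a symbol name, an alignment, and the collected initializers.
std::string GenerateQBEInit(const std::string& name, int align,
                            const std::vector<GlobalVarInitVal>& vals) {
  std::string out =
      "export data $" + name + " = align " + std::to_string(align) + " { ";
  for (std::size_t i = 0; i < vals.size(); ++i) {
    if (i != 0) {
      out += ", ";
    }
    out += vals[i].type + " " + std::to_string(vals[i].value);
  }
  out += " }";
  return out;
}
```

Under these assumptions, an `int a[3] = {1, 2, 3};` would be passed as three `w` entries, and an uninitialized `int c;` as a single `z 4` entry.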

P.S. I also found a bug in struct and union declarations. I'll fix that first before moving on to struct and union code generation.

@leewei05 leewei05 self-assigned this Jun 24, 2024
@leewei05 leewei05 mentioned this pull request Jun 24, 2024
3 tasks
@leewei05 leewei05 requested a review from Lai-YT June 24, 2024 08:23
Collaborator

@Lai-YT Lai-YT left a comment

If I understand correctly, in the type checker, declarations and their corresponding InitExpr are set as global if they are in file scope. Additionally, IdExpr and ArrSubExpr are set as global if they refer to global identifiers. However, since global variables can only be initialized with constant expressions (such as constant integers and enums), is it possible that we don't need to mark them specially? Constant expression nodes hold their value, so we can directly access those values when generating code, which means GlobalVarInitVal can be removed as well. For ArrSubExpr, we can check whether they refer to a global array during code generation.

With these changes, only DeclNode and IdExprNode need to hold the is_global data member. Additionally, we can merge the id_is_global map with the symbol table, making it one of its fields.
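
To make the last suggestion concrete, here is a rough sketch of a symbol table entry that carries the flag itself. SymbolEntry, SymbolTable, Add, and Probe are hypothetical names chosen for illustration and may not match the project's scope and symbol table classes.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical entry: the flag lives next to the type instead of in a
// separate id_is_global map.
struct SymbolEntry {
  std::string type;
  bool is_global = false;
};

// Hypothetical per-scope table; a table created for file scope marks every
// symbol added to it as global.
class SymbolTable {
 public:
  explicit SymbolTable(bool is_file_scope) : is_file_scope_{is_file_scope} {}

  void Add(const std::string& id, std::string type) {
    entries_[id] = SymbolEntry{std::move(type), is_file_scope_};
  }

  // Returns the entry for `id`, or nullptr if this scope does not declare it.
  const SymbolEntry* Probe(const std::string& id) const {
    auto it = entries_.find(id);
    return it == entries_.end() ? nullptr : &it->second;
  }

 private:
  bool is_file_scope_;
  std::unordered_map<std::string, SymbolEntry> entries_;
};
```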

Please consider these possible modifications, and let me know if there's anything I might have misunderstood!

@leewei05
Contributor Author

leewei05 commented Jul 6, 2024

If I understand correctly, in the type checker, declarations and their corresponding InitExpr are set as global if they are in file scope. Additionally, IdExpr and ArrSubExpr are set as global if they refer to global identifiers. However, since global variables can only be initialized with constant expressions (such as constant integers and enums), is it possible that we don't need to mark them specially?

Not possible with the current infrastructure. We still need to mark DeclNode and ExprNode with the is_global data member; otherwise the compiler doesn't know which kind of output to generate, since a global variable declaration and a variable declaration in block scope produce completely different code.
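
As a simplified illustration of why the flag matters, the sketch below picks between the two outputs for a single int declaration. VarDecl and GenerateDecl are hypothetical names, and the real generator numbers its temporaries rather than naming them after the variable.

```cpp
#include <string>

// Hypothetical, flattened view of an initialized int declaration.
struct VarDecl {
  std::string id;
  int init;
  bool is_global;
};

// File scope becomes a QBE data definition; block scope becomes a stack
// allocation followed by a store of the initial value.
std::string GenerateDecl(const VarDecl& decl) {
  const std::string init = std::to_string(decl.init);
  if (decl.is_global) {
    return "export data $" + decl.id + " = align 4 { w " + init + " }";
  }
  return "%" + decl.id + " =l alloc4 4\nstorew " + init + ", %" + decl.id;
}
```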

Constant expression nodes hold their value, so we can directly access those values when generating code, which means GlobalVarInitVal can be removed as well. For ArrSubExpr, we can check whether they refer to a global array during code generation.

It is doable for VarDeclNode but not for ArrDeclNode. As you can see in our AST dump output, VarDeclNode's initializer is an IntConstExprNode, so we can cast the decl.init node to IntConstExprNode and use its constant value. However, ArrDeclNode's initializer is a series of InitExprNodes, so we would need some way to get the integer value from the bottom level, unless you want every AST node to have a constant value member, which I'd rather avoid. And I haven't considered the case of SimpleAssignmentExprNode yet. Unless we can store some kind of data structure like llvm::Value*, I think we would still need an additional data structure, such as GlobalVarInitVal, to implement global variable code generation.

       DeclStmtNode <14:3>
          VarDeclNode <14:7> b: int
            IntConstExprNode <14:11> 2: int
          VarDeclNode <14:14> c: int
            IntConstExprNode <14:18> 0: int
          ArrDeclNode <14:21> d: int[4]
            InitExprNode <14:29> int
              IntConstExprNode <14:29> 1: int
            InitExprNode <14:29> int
              IdExprNode <14:32> b: int
            InitExprNode <14:29> int
              SimpleAssignmentExprNode <14:37> int
                IdExprNode <14:35> c: int
                IntConstExprNode <14:39> 3: int
            InitExprNode <14:29> int
              IntConstExprNode <14:42> 4: int
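
A rough sketch of the "read the constant from the bottom level" idea, to show where it stops working. The node classes below are stripped-down stand-ins, not the project's real AST, and CollectConstInits is a hypothetical helper (C++17 for std::optional).

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal AST nodes; the real classes carry more state.
struct ExprNode {
  virtual ~ExprNode() = default;
};
struct IntConstExprNode : ExprNode {
  explicit IntConstExprNode(int v) : val{v} {}
  int val;
};
struct IdExprNode : ExprNode {
  explicit IdExprNode(std::string name) : id{std::move(name)} {}
  std::string id;
};
struct InitExprNode {
  std::unique_ptr<ExprNode> expr;
};

// Returns every initializer value, or nullopt as soon as one initializer is
// not a plain integer constant (for example an IdExprNode such as `b`).
std::optional<std::vector<int>> CollectConstInits(
    const std::vector<InitExprNode>& inits) {
  std::vector<int> vals;
  for (const auto& init : inits) {
    const auto* c = dynamic_cast<const IntConstExprNode*>(init.expr.get());
    if (c == nullptr) {
      return std::nullopt;
    }
    vals.push_back(c->val);
  }
  return vals;
}
```

With an initializer list like the one dumped above, the second element is an IdExprNode, so this helper gives up; handling it would need either constant evaluation across node kinds or a separate structure that records the values.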

Additionally, we can merge the id_is_global map with the symbol table, making it one of its fields.

  1. This would require some changes in scope.cpp, and we would need to keep the global symbol table alive from the type checking stage until the code generation stage.
  2. This approach also assumes that we can get the id of our expression. Consider, for example, a global pointer variable:
int b = 10;
int *ptr = &b;

int main() {
    *ptr;
    return 0;
}

There's a UnaryExprNode in the way of getting to the IdExprNode, so a direct dynamic_cast doesn't work and we cannot get the id to look up the symbol in the symbol table.

TransUnitNode <1:1>
  ExternDeclNode <1:1>
    DeclStmtNode <1:1>
      VarDeclNode <1:5> b: int
        IntConstExprNode <1:9> 10: int
  ExternDeclNode <2:1>
    DeclStmtNode <2:1>
      VarDeclNode <2:6> ptr: int*
        UnaryExprNode <2:12> int* &
          IdExprNode <2:13> b: int
  ExternDeclNode <4:1>
    FuncDefNode <4:5> main: int ()
      CompoundStmtNode <4:12>
        ExprStmtNode <5:5>
          FuncCallExprNode <5:5> int
            IdExprNode <5:5> __builtin_print: int (int)
            ArgExprNode <5:21> int
              UnaryExprNode <5:21> int *
                IdExprNode <5:22> ptr: int*
        ReturnStmtNode <6:5>
          IntConstExprNode <6:12> 0: int
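
The cast problem can be reproduced with a few lines. The classes below are simplified stand-ins for the real AST nodes, and DirectId is a hypothetical helper, so treat this as a sketch of the failure mode only.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, minimal AST nodes.
struct ExprNode {
  virtual ~ExprNode() = default;
};
struct IdExprNode : ExprNode {
  explicit IdExprNode(std::string name) : id{std::move(name)} {}
  std::string id;
};
struct UnaryExprNode : ExprNode {
  UnaryExprNode(char op_kind, std::unique_ptr<ExprNode> expr)
      : op{op_kind}, operand{std::move(expr)} {}
  char op;
  std::unique_ptr<ExprNode> operand;
};

// Returns the identifier only if `expr` is itself an IdExprNode; any wrapping
// node, such as the UnaryExprNode for `*ptr`, makes the cast yield nullptr.
const std::string* DirectId(const ExprNode& expr) {
  const auto* id_expr = dynamic_cast<const IdExprNode*>(&expr);
  return id_expr != nullptr ? &id_expr->id : nullptr;
}
```

Calling DirectId on the node for `*ptr` returns nullptr, while calling it on that node's operand returns "ptr", so a symbol table lookup would first have to unwrap the expression.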

In conclusion, I agree that not every AST node needs the is_global data member, so I can restrict it to DeclNode and ExprNode. As for your other suggestions, I'd prefer not to adopt them. Since LLVM has already been ported as one of our backends, I'd prefer to focus on that. QBE will still be supported alongside LLVM for future features, but I don't want to do too much hacking on it.

@leewei05
Contributor Author

leewei05 commented Jul 6, 2024

Btw, I haven't rebased or implemented the LLVM part yet! 🫡

Collaborator

@Lai-YT Lai-YT left a comment

I see. 🙆‍♂️ Let's only restrict the scope of is_global and keep the others as-is.

@leewei05 leewei05 requested a review from Lai-YT July 6, 2024 10:36
Collaborator

@Lai-YT Lai-YT left a comment

Thank you for implementing these; this is yet another complicated PR. 😭

@leewei05 leewei05 requested a review from Lai-YT July 7, 2024 13:52
@leewei05 leewei05 requested a review from Lai-YT July 7, 2024 15:08
Collaborator

@Lai-YT Lai-YT left a comment

LGTM 🚀

@Lai-YT Lai-YT merged commit 9c7ee3a into fruits-lab:main Jul 7, 2024
8 checks passed
@leewei05 leewei05 deleted the global-code-gen branch July 7, 2024 15:44