A VHDL implementation of the Minimal Machine processor taught at the Karlsruhe Institute of Technology.
- Introduction
- Features
- VHDL Component description
- How to get this running on your FPGA?
- MIMA Specification
- Advanced description of the MIMA architecture
- Mentions
- More stuff to work on
The Minimal Machine is not just a hypothetical processor anymore. With this project the MIMA becomes a real processor, capable of running even complex programs up to a virtual machine or even an operating system. I've provided not just the processor itself, but embedded it in an environment with memory, a 512 by 256 pixel memory mapped screen, a ps/2 interface to connect a keyboard, a register monitoring suite and clock speed control. The memory mapped screen and the monitoring suite are accessible via a VGA interface and can be switched between. To fully enjoy this design, your FPGA board should be equipped with four buttons, a VGA connector and if possible a ps/2 connector.
- register monitoring suite: The monitoring suite lets you display all the necessary information on your VGA monitor. It shows all the register states, the current micro instruction and the address of the next micro instruction to be executed. It also displays the MIMAs clock frequency. To fully enjoy this, clock the processor down to a slow speed, load a program and enjoy!
- memory mapped screen: The screen is 512x256 pixels in size. To display something you can have the MIMA write to the addresses 0x04000 to 0x5FFF. Every bit in that area represents a pixel on screen, the organization is as follows: Every word in this segment is 16 bit wide, one word represents a 16 pixel array on screen. 32 such arrays make up one row of the screen.
- memory: The MIMA can address 1MiB of Memory containing 24 bit data words. This is far too much for a FPGA that a student could afford. I decided to scale this down and split the memory in distinct segments to serve different purposes:
- heap spans from 0x00001 to 0x03FFF, 16K locations, 16 bit words, not sign extended (yet)
- screen memory map spans from 0x04000 to 0x05FFF, 8K locations, 16 bit words, write only
- keyborad mapping is a single memory location at 0x06000, read only
- program rom spans from 0x08000 to 0x0FFFF, 32K locations, 24 bit words, read only
- address 0x00000 is reserved for a JMP 0x08000 instruction, which is executed first at startup and will let the MIMA perform a jump to the program ROM
- clock control: Assign the speed_up and speed_down inputs of the design to buttons and change the clock speed in a range from 1 Hz to 20 MHz.
- keyboard support: The keyboard can be connected via a ps/2 interface. The ASCII value of the last pressed key is written to location 0x06000 in memory and can be accessed there.
The Project is separated into two folders. The mima_processor folder only contains the VHDL files for the MIMA processor itself, without any additional functionality, implemented exactly how it's specified. This design does not use any cores that have to be regenerated, so it should work by selecting the mimaProcessor.vhd as top module.
The mima_environment folder contains vhdl files that give the above described additional functionality. This one is a bit trickier to get running, as it requires the regeneration of some cores. First you will need to include the files from both folders into your project and select the "MIMAEnvironment.vhd" as top module.
Next you will have to regenerate the clock frequency core located in the top Module "MIMAEnvironment.vhd". The output clock must be running at 40 MHz, this is required for the VGA syncer component to generate the proper VGA timing. Next up are three cores located in the "MemoryController.vhd". These cores provide the MIMA with memory. As my developement FPGA was not equipped with enough RAM blocks, I had to shrink the MIMAs accessible memory space. There are three memory regions the MIMA can access: The program ROM, a heap area and the screen memory. The program memory in my design is read only and was generated using distributed ROM. Program ROM data words are 24 bits wide. The heap area is a simple single port ram, containing 16 bit words. For the screen memory use a simple dual port ram, the MIMA can write to it while the syncer component has to be able to acess that ram asynchronously to retrieve the correct pixel data. The screen memory words are 16 bit in width as well.
Finally choose a program, load it into the program ROM and enjoy! I have provided a few example programs in the example_programs folder, these are stored in xilinx .coe files. If you are working with a different fpga brand you should be fine by opening it with a standard editor and copying out the program code. Apart from that feel free to write own programs and run them, just remember that the program memory starts at address 0x8000!
Make sure all memory cores have exactly one cycle delay for read access (the screen memory too)! Doing otherwise might result in reading the wrong data from memory!
- Acc: Accumulator
- X: first ALU operand register
- Y: second ALU operand register
- Z: ALU result register
- One: constant 1
- IAR: instruction address register
- IR: instruction register
- SAR: storage address register
- SDR: storage data register
ALU control | Operation |
---|---|
000 | do nothing (0 --> Z) |
001 | X + Y --> Z |
010 | rotate X to the right --> Z |
011 | X and Y --> Z |
100 | X or Y --> Z |
101 | X xor Y --> Z |
110 | not X --> Z |
111 | Z <-- (X == Y) ? -1 : 0 |
20 bit operand format:
OpCode | Operand
oooo xxxx xxxx xxxx xxxx xxxx
No operand format:
OpCode |
1111 oooo xxxx xxxx xxxx xxxx
OpCode | Mnemonik | Description |
---|---|---|
0 | LDC c | c --> Acc |
1 | LDV a | [a] --> Acc |
2 | STV a | Acc --> [a] |
3 | ADD a | Acc + [a] --> Acc |
4 | AND a | Acc and [a] --> Acc |
5 | OR a | Acc or [a] --> Acc |
6 | XOR a | Acc xor [a] --> Acc |
7 | EQL a | if (Acc == [a]) {-1 --> Acc} else {0 --> Acc} |
8 | JMP a | jump to address a |
9 | JMN a | jump to address a if Acc < 0 |
A | LDIV a | [[a]] --> Acc |
B | STIV a | Acc --> [[a]] |
C | JMS a | jump subroutine (see below) |
D | JIND a | jump indirect (see below) |
E | free | |
F0 | HALT | stops the minimal machine |
F1 | NOT | not Acc --> Acc |
F2 | RAR | rotates Acc on the the right --> Acc |
F3 - FF | free |
- Bits shifted out right within a RAR command will be pushed in at the left.
- The instruction JMS target saves the address of the succeeding instruction (return address) to the address given by target and initiates a jump to target + 1.
- JIND target initiates a jump to the address which is stored at the target address. (JMP [target])
- [a]: value at the address a, [[a]]: value at the adress pointed at by [a]
This implementation strictly follows the specifications that were provided in the lectures and tutorials taught at the KIT. Even the micro instructions are the original ones and not optimized, though this might be addressed in the future. This way it is ensured that the implemented processor closely represents the capabilities of the one that is taught in the lectures. In addition it is easier using the monitoring suite to follow what is happening.
The first micro instructions executed are located starting from address 0 in the micro instruction ROM, which is the FETCH function. It will load the next machine instruction from memory into the IR. After an instruction fetch, the next micro instruction to be executed resides at address 255. This instruction is special, as the control unit will now look up the function entry point of the micro function, which is located somewhere in the micro instruction ROM and execute the first instruction of that function next. The function entry points are stored in the function entry ROM and looked up based off the OpCode of the machine instruction that was just loaded in the IR. From there on the function will be executed and finally jump back to the FETCH function.
The micro instructions are the instructions controlling which registers can write to the bus, read from the bus, control the ALU operations and send read or write requests to the memory. Each micro instruction also has a NEXT_ADDR field specifying the address of the next micro instruction to be executed. The micro instruction is a 28 bit wide vector where every bit has a specific function, as described below.
MICRO INSTRUCTION FORMAT:
Ar Aw X Y Z E Pr Pw Ir Iw Dr Dw S CCC R W RR NEXT_ADDR
x x x x x x x x x x x x x xxx x x 00 xxxxxxxx
- You might have noticed that the micro instruction contains two reserved bits, which are not used and thus not implemented or further explained.
Data bus access; for convenience read means reading from the bus, then setting the register to that value and write means writing to the bus:
Ar: Accu reads
Aw: Accu writes
X: X register reads
Y : Y register reads
Z: Z register writes
E: One register writes
Pr: IAR reads
Pw: IAR writes
Ir: IR reads
Iw: IR writes
Dr: SDR reads
Dw: SDR writes
S: SAR reads
For the ALU control:
CCC: ALU control signals
For memory access:
R: read request
W: write request
The actual used addresses in the micro instrucrion ROM:
ADDR: location of the instraction in the micro instruction ROM
NEXT_ADDR: next address to execute
All the currently supported micro instructions implemented in hardware:
ADDR |OPC |FUNC |Ar Aw X Y Z E Pr Pw Ir Iw Dr Dw S CCC R W |NEXT_ADDR
0 |FETCH |IAR -> SAR, X; R = '1' |0 0 1 0 0 0 0 1 0 0 0 0 1 000 1 0 |1
1 |E -> Y; R = '1' |0 0 0 1 0 1 0 0 0 0 0 0 0 000 1 0 |2
2 |ADD; R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 001 1 0 |3
3 |Z -> IAR |0 0 0 0 1 0 1 0 0 0 0 0 0 000 0 0 |4
4 |SDR -> IR |0 0 0 0 0 0 0 0 1 0 0 1 0 000 0 0 |255
8 |LDC c |IR -> Acc |1 0 0 0 0 0 0 0 0 1 0 0 0 000 0 0 |0
12 |LDV a |IR -> SAR; R = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |13
13 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |14
14 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |15
15 |SDR -> Acc |1 0 0 0 0 0 0 0 0 0 0 1 0 000 0 0 |0
20 |STV a |Acc -> SDR |0 1 0 0 0 0 0 0 0 0 1 0 0 000 0 0 |21
21 |IR -> SAR; W = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 0 1 |22
22 |W = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 0 1 |23
23 |W = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 0 1 |0
28 |ADD a |IR -> SAR; R = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |29
28 |Acc -> X; R = '1' |0 1 1 0 0 0 0 0 0 0 0 0 0 000 1 0 |30
30 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |31
31 |SDR -> Y |0 0 0 1 0 0 0 0 0 0 0 1 0 000 0 0 |32
32 |ADD |0 0 0 0 0 0 0 0 0 0 0 0 0 001 0 0 |33
33 |Z -> Acc |1 0 0 0 1 0 0 0 0 0 0 0 0 000 0 0 |0
36 |AND a |IR -> SAR; D = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |37
37 |Acc -> X; R = '1' |0 1 1 0 0 0 0 0 0 0 0 0 0 000 1 0 |38
38 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |39
39 |SDR -> Y |0 0 0 1 0 0 0 0 0 0 0 1 0 000 0 0 |40
40 |AND |0 0 0 0 0 0 0 0 0 0 0 0 0 011 0 0 |41
41 |Z -> Acc |1 0 0 0 1 0 0 0 0 0 0 0 0 000 0 0 |0
44 |OR a |IR -> SAR; D = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |45
45 |Acc -> X; R = '1' |0 1 1 0 0 0 0 0 0 0 0 0 0 000 1 0 |46
46 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |47
47 |SDR -> Y |0 0 0 1 0 0 0 0 0 0 0 1 0 000 0 0 |48
48 |OR |0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 |49
49 |Z -> Acc |1 0 0 0 1 0 0 0 0 0 0 0 0 000 0 0 |0
52 |XOR a |IR -> SAR; D = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |53
53 |Acc -> X; R = '1' |0 1 1 0 0 0 0 0 0 0 0 0 0 000 1 0 |54
54 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |55
55 |SDR -> Y |0 0 0 1 0 0 0 0 0 0 0 1 0 000 0 0 |56
56 |XOR |0 0 0 0 0 0 0 0 0 0 0 0 0 101 0 0 |57
57 |Z -> Acc |1 0 0 0 1 0 0 0 0 0 0 0 0 000 0 0 |0
60 |EQL a |IR -> SAR; R = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |61
61 |Acc -> X; R = '1' |0 1 1 0 0 0 0 0 0 0 0 0 0 000 1 0 |62
62 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |63
63 |SDR -> Y |0 0 0 1 0 0 0 0 0 0 0 1 0 000 0 0 |64
64 |EQL |0 0 0 0 0 0 0 0 0 0 0 0 0 111 0 0 |65
65 |Z -> Acc |1 0 0 0 1 0 0 0 0 0 0 0 0 000 0 0 |0
68 |JMP a |IR -> IAR |0 0 0 0 0 0 1 0 0 1 0 0 0 000 0 0 |0
72 |JMN a |E -> X |0 0 1 0 0 1 0 0 0 0 0 0 0 000 0 0 |73
73 |ROR; Acc -> Y |0 1 0 1 0 0 0 0 0 0 0 0 0 010 0 0 |74
74 |Z -> X |0 0 1 0 1 0 0 0 0 0 0 0 0 000 0 0 |75
75 |AND |0 0 0 0 0 0 0 0 0 0 0 0 0 011 0 0 |76
76 |Z -> Y |0 0 0 1 1 0 0 0 0 0 0 0 0 000 0 0 |77
77 |EQL; R -> Y |0 0 0 1 0 0 0 0 0 1 0 0 0 111 0 0 |78
78 |Z -> X |0 0 1 0 1 0 0 0 0 0 0 0 0 000 0 0 |79
79 |AND; IAR -> Y |0 0 0 1 0 0 0 1 0 0 0 0 0 011 0 0 |80
80 |NOT; Z -> SDR |0 0 0 0 1 0 0 0 0 0 1 0 0 110 0 0 |81
81 |Z -> X |0 0 1 0 1 0 0 0 0 0 0 0 0 000 0 0 |82
82 |AND; SDR -> X |0 0 1 0 0 0 0 0 0 0 0 1 0 011 0 0 |83
83 |Z -> Y |0 0 0 1 1 0 0 0 0 0 0 0 0 000 0 0 |84
84 |OR |0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 |85
85 |Z -> IAR |0 0 0 0 1 0 1 0 0 0 0 0 0 000 0 0 |0
88 |NOT |Acc -> X |0 1 1 0 0 0 0 0 0 0 0 0 0 000 0 0 |89
89 |NOT |0 0 0 0 0 0 0 0 0 0 0 0 0 110 0 0 |90
90 |Z -> Acc |1 0 0 0 1 0 0 0 0 0 0 0 0 000 0 0 |0
96 |RAR |Acc -> X |0 1 1 0 0 0 0 0 0 0 0 0 0 000 0 0 |97
97 |ROR |0 0 0 0 0 0 0 0 0 0 0 0 0 010 0 0 |98
98 |Z -> Acc |1 0 0 0 1 0 0 0 0 0 0 0 0 000 0 0 |0
100 |LDIV a |IR -> SAR; R = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |101
101 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |102
102 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |103
103 |SDR -> SAR; R = '1' |0 0 0 0 0 0 0 0 0 0 0 1 1 000 1 0 |104
104 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |105
105 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |106
106 |SDR -> Acc |1 0 0 0 0 0 0 0 0 0 0 1 0 000 0 0 |0
112 |STIV a |IR -> SAR; R = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |113
113 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |114
114 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |115
115 |SDR -> SAR |0 0 0 0 0 0 0 0 0 0 0 1 1 000 0 0 |116
116 |Acc -> SDR; W = '1' |0 1 0 0 0 0 0 0 0 0 1 0 0 000 0 1 |117
117 |W = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 0 1 |118
118 |W = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 0 1 |0
124 |JMS a |IAR -> SDR |0 0 0 0 0 0 0 1 0 0 1 0 0 000 0 0 |125
125 |IR -> SAR, X; W = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 0 1 |126
126 |E -> Y; W = '1' |0 0 0 1 0 1 0 0 0 0 0 0 0 000 0 1 |127
127 |ADD; W = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 001 0 1 |128
128 |Z -> SAR; R = '1' |0 0 0 0 1 0 0 0 0 0 0 0 1 000 1 0 |129
129 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |130
130 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |131
131 |SDR -> IAR |0 0 0 0 0 0 1 0 0 0 0 1 0 000 0 0 |0
136 |JIND a |IR -> SAR; R = '1' |0 0 0 0 0 0 0 0 0 1 0 0 1 000 1 0 |137
137 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |138
138 |R = '1' |0 0 0 0 0 0 0 0 0 0 0 0 0 000 1 0 |139
139 |SDR -> IAR |0 0 0 0 0 0 1 0 0 0 0 1 0 000 0 0 |0
255 |reserved for the cu to jump |0 0 0 0 0 0 0 0 0 0 0 0 0 000 0 0 |0
|to the decoded functions
- The MIMA architecture diagram displays a negative flag from the accumulator register, but it is nowhere specified how it is addressed. Therefore, it is not implemented, leading to the JMN instructions implementation being a bit more complex. To function correctly the JMN instruction uses the SDR as temporary register, therefore the memory controller must only write to the SDR after a read request!
- The above described instructions are not optimized to closely mimic the instructions taught in the lectures. But due to the implementation of the bus using multiplexers, for instance the OR instruction could be shortened by writing the Accumulator and the SDR to the bus simultaniously, not using the ALU at all!
- Memory reads or writes take 3 cycles.
- After pulling the Read signal high for 3 cycles, the read data has to be present in the SDR beginning of the 4th cycle.
- The read and write signals can already be pulled high in the same cycle that is still written to the SAR or SDR register.
The last point might lead to confusion and misbehaviour if not considered, as it is perfectly valid to write to the SAR or SDR registers already at the same cycle the write or read signals for the memory are high. The problem is that registers are only updated at the rising edge of the next cycle, but the write and read signals are routed directly from the control unit out of the processor, meaning that the memory controller has to take care of delaying these two signals by one cycle, otherwise faulty reads or worse, writes to wrong memory locations could occur. For example take a look at the implementation of the STV instruction:
STV a:
1: Acc -> SDR
2: IR -> SAR; W = '1'
3: W = '1'
4: W = '1'
Consider the memory controller wouldn't add the necessary delay stage for the write signal, the controller would get the signal to write already in cycle 2, whereas the SAR is updated at the beginning of cycle 3. Thus the memory would write to the address that was stored in the SAR before cycle 3.
- Three files from the ps/2 interface are from eewiki
- To compile your MIMA programs online check out this awesome project
- design a compiler to run higher level programs
- optimize hardware and micro instructions to create a faster design
- develop an extended instruction set
- single step mode
- make heap memory sign extend on reads