Experimental / Suggested calling convention
« Last post by Jubatian on August 03, 2014, 09:43:35 PM »
After quite a while of experimenting with the RRPGE CPU, maybe it is time to determine some useful calling conventions, that is, how function calls should be formatted, and how higher level languages may work with them.
Note that the specification lists instruction timings for the CPU, which I will rely on here. Although I have not yet released my sketch of the processor's internal operation, it exists (it defines a weakly pipelined construction), so these timings have some grounding.
How functions work on this CPU
The RRPGE CPU defines specific opcodes for function calls (JFA, JFR and JSV), which all work on the same basic idea. They establish a new stack frame, and may also carry up to 16 function parameters as an extension to the opcode. Function entry normally takes 9 cycles, then each parameter fetch takes another 4 cycles. Since the number of parameters is not known when the new stack frame is established (which also involves pushing the return address), the return address will point at the first function parameter. This is no problem since the parameters can be formatted as NOPs, so after return the CPU simply skims over them at a rate of 1 cycle / instruction. So in total each parameter normally takes 5 cycles. The parameters are accessible on the stack (simply by BP relative addressing) within the function body.
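The cycle figures above can be put into a small cost model. The Python sketch below only illustrates the arithmetic: the function and constant names are made up for this example, and the cost of the return opcode itself is left out since no figure is given for it here.

```python
ENTRY_CYCLES = 9        # function entry, as stated above
PARAM_FETCH_CYCLES = 4  # per parameter, while the call opcode executes
PARAM_SKIM_CYCLES = 1   # per parameter, skimmed over as a NOP after return
MAX_PARAMS = 16         # the call opcode carries up to 16 parameters


def call_overhead_cycles(param_count):
    """Cycles for entry plus parameter handling (return opcode excluded)."""
    if not 0 <= param_count <= MAX_PARAMS:
        raise ValueError("the call opcode carries 0 to 16 parameters")
    per_param = PARAM_FETCH_CYCLES + PARAM_SKIM_CYCLES
    return ENTRY_CYCLES + param_count * per_param


for n in (0, 1, 4, 16):
    print(n, call_overhead_cycles(n))
```

So under these figures a four parameter call would cost 29 cycles of overhead, and a full 16 parameter call 89.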
There is no way to build a new stack frame manually (there is no push and pop), so this is the only way to do a function call. However it leaves many things open, which need to be pinned down to get a functional calling convention.
Components of a calling convention to consider
How the function must be called is defined above, however the following further details are left to the user program to work out:
- How to produce a return value. The RFN (return from function) opcode does not mandate anything for this.
- Which registers should be preserved, which should be allowed to be clobbered during the execution of the function.
- The function call opcode allows for a variable number of arguments. How to exploit this in a defined manner.
- For object-oriented high level languages, a means of passing a "this" pointer should be defined.
Some more on the stack
After the JFR / JFA opcode, at the start of the function, the stack pointer (SP) is set up to point above the last parameter. This ensures that should anything happen (for example an interrupt using the user mode stack), the function's stack is not clobbered. When working with the stack, the stack pointer needs to be adjusted as necessary to keep this property (that it always points after the last used value).
Note that since the SP points after the last parameter on entry, its value is the number of parameters passed to the function (SP is always to be interpreted BP relative). To read this value, somewhere is needed to hold it, but no such place is available without clobbering something (such as one of the registers). This needs to be considered when designing the calling convention. Note however that normally, for functions with a fixed parameter count, reading the SP may not be necessary at all.
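To make the SP behaviour more tangible, here is a toy Python model of a single stack frame. It is only a sketch of the idea described above: the class and method names are invented for illustration, and real frames of course live in CPU memory rather than in a Python list.

```python
class Frame:
    """Toy model of one stack frame; every address is BP relative."""

    def __init__(self, params):
        self.words = list(params)  # parameters sit at BP+0 .. BP+(n-1)
        self.sp = len(self.words)  # SP points just after the last parameter

    def reserve(self, count):
        """Claim local words and raise SP so an interrupt cannot clobber them."""
        first = self.sp
        self.words.extend([0] * count)
        self.sp += count
        return first


frame = Frame([10, 20, 30])
param_count = frame.sp       # must be saved somewhere before SP is moved
local = frame.reserve(2)     # two local words, starting at BP+3
print(param_count, local, frame.sp)
```

Once `reserve` has moved SP the parameter count can no longer be read from it, which is why a function taking a variable number of arguments needs a free register to stash it in first.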
What to clobber if anything: some notes on the CPU registers
The RRPGE CPU has 8 general purpose registers, which by use may be divided into 3 groups:
- Registers A, B and D: These registers cannot be used as pointers.
- Register C: Some operations may use it as carry, or produce a carry in it. Cannot be used as a pointer.
- Registers X0, X1, X2 and X3: Only these registers may be pointers.
The role differences end here: apart from these, all 8 registers operate the same way, and are available in any operation alike.
Allowing a register to be clobbered takes it away from the caller, while making the life of the function code easier (no need to save / restore it). The return value convention may also be taken into account when deciding what to permit for clobbering: obviously if the return value is generated into a register, that register need not be preserved. The supervisor call specification, which uses R3 and sometimes C for return, may also be followed here.
Allowing C to be clobbered is a quite natural decision. Due to its special purpose, which only this register can fulfil, it is not very likely to be viable for holding values in the long term, so the caller won't really miss it if functions trash it.
The pointer registers (X0 - X3) on the other hand are well suited to long-term use, however their special role also makes it quite likely that they will be needed in function bodies as well.
Should all parameters be passed in the call parameter list
As described above, a function parameter normally takes 5 cycles (4 cycles when passing, 1 cycle when skimming over after return). If the same parameter were passed in a register, it would take only 3 cycles, assuming a simple MOV is needed to load it. This is not a huge gain, given that an average instruction takes 4 cycles.
Another problem with passing parameters in registers is that it potentially takes registers away from the caller. Also, unless a freely usable (clobber-able) register remains for the function, it would have trouble extracting the value of SP (for dealing with a variable number of arguments).
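The size of the gain is easy to check with a few lines of Python. This is just the arithmetic from the paragraphs above with hypothetical names; it assumes that function entry still costs 9 cycles and that every register parameter needs exactly one 3 cycle MOV.

```python
ENTRY_CYCLES = 9       # function entry, from the description of the call opcodes
LIST_PARAM_CYCLES = 5  # 4 to fetch + 1 to skim over after return
MOV_CYCLES = 3         # assumed: one simple MOV per register parameter


def call_cost(list_params, register_params=0):
    """Call overhead in cycles for a mix of call list and register parameters."""
    return (ENTRY_CYCLES
            + list_params * LIST_PARAM_CYCLES
            + register_params * MOV_CYCLES)


all_in_list = call_cost(3)
one_in_register = call_cost(2, 1)
print(all_in_list, one_in_register, all_in_list - one_in_register)
```

Moving one of three parameters into a register saves 2 cycles out of 24 in this model, which is half the cost of an average instruction.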
Where to return
Since the RRPGE CPU manages stack frames quite rigidly, producing a return value on the stack is not possible. The only reasonable way to return is by using registers.
The supervisor calls are defined to use R3 and sometimes C to return. This might be followed, but may be deviated from as well. Using C for return is a quite natural choice since this is also the register the caller can most easily do without (see above, at clobbering). However if it would be nice to give the function a pointer register to work with (which is even necessary for supporting a high-level object-oriented language), using that for return is also quite natural.
For returning 32 bit values, using C as the high part of the return would also work out well in function bodies if they can generate this part simply by an instruction producing a carry.
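As an illustration of the carry idea, the following Python sketch models a 16 bit addition whose overflow ends up in C, and how a caller would put the two halves of a 32 bit return together. The helper names are invented here, and pairing C with X3 as the low word is one possible choice assumed for the example.

```python
MASK16 = 0xFFFF


def add16_with_carry(a, b):
    """16 bit add returning (carry, low_word), like an add leaving its carry in C."""
    total = (a & MASK16) + (b & MASK16)
    return total >> 16, total & MASK16


def join_c_x3(c, x3):
    """Caller side view of a 32 bit return: high word in C, low word in X3."""
    return ((c & MASK16) << 16) | (x3 & MASK16)


c, x3 = add16_with_carry(0xFFFF, 0x0002)
print(hex(join_c_x3(c, x3)))
```

The function body needs no extra work to produce the high word here: the carry of the final addition already is the high part of the result.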
The calling convention
On this basis I would suggest the following calling convention for general use:
- All parameters are to be passed as part of the function call opcode; 32 bit parameters are passed high word first.
- Registers C, X3 and XH3 may be clobbered by the function.
- Register XM3 (pointer mode for X3) can be assumed by the function to be set to PTR16I (16 bit, post-incrementing), and the function must preserve this (XM must not be clobbered).
- Return value (if there is any) may be generated in X3 (16 bits), or C:X3 (32 bits, high word in C).
- For object-oriented use, the "this" pointer is passed in X3 (and may be clobbered).
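The parameter ordering rule can be sketched as follows. This Python helper is hypothetical and only shows the word order; it assumes that each 16 bit word counts as one of the up to 16 parameters, so a 32 bit parameter takes two slots.

```python
MAX_PARAM_WORDS = 16  # the call opcode carries up to 16 parameters


def encode_params(params):
    """Flatten (value, bits) pairs into the 16 bit words following the call opcode."""
    words = []
    for value, bits in params:
        if bits == 32:
            words.append((value >> 16) & 0xFFFF)  # high word first
            words.append(value & 0xFFFF)          # then the low word
        elif bits == 16:
            words.append(value & 0xFFFF)
        else:
            raise ValueError("only 16 and 32 bit parameters are modelled")
    if len(words) > MAX_PARAM_WORDS:
        raise ValueError("too many parameter words for one call opcode")
    return words


print([hex(w) for w in encode_params([(0x12345678, 32), (0x00AB, 16)])])
```

Within the function body the high word of the first parameter would then be found at BP+0 and its low word at BP+1.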
A variable number of arguments is supported by this convention, since register C is free in the function body to hold the initial value of SP (the parameter count).
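A function taking a variable number of arguments could then look roughly like the Python model below. It is only a sketch: the list stands in for the BP relative parameter area, `c` stands in for register C receiving SP on entry, and the function name is made up.

```python
def variadic_sum(frame_words):
    """Model of a callee summing all of its 16 bit parameters."""
    c = len(frame_words)       # stands in for copying SP into C on entry
    total = 0
    for offset in range(c):    # parameters sit at BP+0 .. BP+(c-1)
        total += frame_words[offset] & 0xFFFF
    # 32 bit result returned as C:X3 (high word in C, low word in X3)
    return (total >> 16) & 0xFFFF, total & 0xFFFF


print(variadic_sum([1, 2, 3]))
print(variadic_sum([0xFFFF, 0x0001]))
```

Note that C serves first as the parameter count and, at the end, as the high word of the return value, so the count has to be consumed before the result is produced.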
The object-oriented convention of having the function accept the "this" pointer in X3 suits most uses, in that it is not very likely that the caller would want to call methods on the same object repeatedly (so the fact that X3 may be clobbered, or be part of the return value, should usually not be an obstacle for the caller).
The 16 bit post-incrementing mode that functions can expect for X3 is usually the most useful pointer mode. It allows walking the data sequentially. Fixing it to this value both allows the function to rely on it, and makes it unnecessary to save it even if the function wants to use another pointer mode (since it can simply restore it to 16 bit post-incrementing). In 16 bit pointer modes the corresponding XH part is not used, so when giving X3 to the functions, it is not very likely that the caller would need its high part in XH3. So functions are allowed to use (clobber) it freely, making it somewhat easier to work with sub-word addressing.
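The sequential walk that the post-incrementing mode makes possible can be modelled like this in Python. The class is a toy stand-in for X3 in PTR16I mode over word addressed memory; its name, the object layout (a length word followed by items) and the wrap of the address at 16 bits are all assumptions made for the example.

```python
class PostIncPointer16:
    """Toy model of X3 in 16 bit post-incrementing mode over a word array."""

    def __init__(self, memory, address):
        self.memory = memory
        self.address = address

    def read(self):
        value = self.memory[self.address]
        self.address = (self.address + 1) & 0xFFFF  # step to the next word
        return value


# Hypothetical object at address 8: a length word followed by that many items.
memory = [0] * 8 + [3, 100, 200, 300]
x3 = PostIncPointer16(memory, 8)  # "this" pointer handed over in X3
length = x3.read()
items = [x3.read() for _ in range(length)]
print(length, items, x3.address)
```

A method receiving "this" in X3 can read its object's fields one after another this way without any explicit address arithmetic.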
Note: Post edited to reflect current specifications (at the time of edit, 00.015.001), where this calling convention is present.