Threaded code: Variations

Address lists: cfa’s or pfa’s?

In my introduction to threaded code I mentioned that colon word bodies consist of a list of cfa’s (code field addresses), but that sometimes it makes sense to use pfa’s instead. We are going to explore that idea here.

First, let’s recap the definitions:

The address of a code field is called, unsurprisingly, a code field address – usually abbreviated cfa.

The address of a parameter field is called, also unsurprisingly, a parameter field address – usually abbreviated pfa.

In a DTC system, a cfa points to a code field, which contains machine code.

In an ITC system, a cfa points to a code field, which points to machine code.

The main reason for using cfa’s in address lists is that it’s the obvious thing to do. The cfa points to the code field, which we either jump to (DTC) or jump through (ITC). Why would we want to use anything else?

The answer is suggested not by the code of NEXT, but by NEST, VARIABLE, and CONSTANT – the “runtime” routines for colon words, variables, and constants, respectively. Let’s look at them again:

  DTC_NEST: Push IP onto call stack
            Set IP to W + sizeof(Jump)
            Execute NEXT

  ITC_NEST: Push IP onto call stack
            Set IP to W + address size
            Execute NEXT

  VARIABLE: Add to W the size of the code field (ie, calculate pfa)
            Push W onto the data stack (push pfa)
            Execute NEXT

  CONSTANT: Add to W the size of the code field (calculate pfa)
            Fetch value pointed to by W into W (fetch contents of parameter field)
            Push W onto the data stack
            Execute NEXT

Whether calculating the new IP in NEST, or calculating the address of the value in VARIABLE and CONSTANT, we are always adding an offset to W. Because of how NEXT works, W contains the cfa. We want the pfa.

Whenever a programmer creates a new data type that has an assembler runtime (using ;code) they have to do this same arithmetic on W. Why not do it one place?

The natural “one place” is in NEXT. So what would we change? Our aim is to exit NEXT with W containing a pfa instead of a cfa.

Let’s look at DTC first. Here is DTC NEXT:

  NEXT:  Fetch into W the address pointed to by IP
         Increment IP by address size
         Jump to W

We have conflicting requirements here. For DTC we have to jump to the code field. We have no choice there. But somehow we want W to point to the parameter field – one memory cell higher. How to do both?

If the address list contains cfa’s, then the first step of NEXT will put a cfa into W. Can we jump to W and increment it in one instruction? Not likely.

What about using pfa’s in the address list. Does this help? W will have the right value when we exit from NEXT, but can we jump to W minus an offset? It depends on the machine. And it might make NEXT longer – possibly significantly longer.

On some machines the best way to do DTC NEXT doesn’t involve W at all. For example, on the MSP430 the shortest DTC NEXT is one instruction:

  mov pc, (ip)+

This does the auto-increment of IP and the jump to the fetched cfa, but leaves to the runtimes the computation of the pfa. This approach requires that code fields contain calls rather than jumps in order to “capture” the pfa somewhere (it gets pushed onto the machine’s call stack).

DTC doesn’t offer a lot of wiggle room. Maybe ITC is better. Let’s take a look.

Here is ITC NEXT:

  NEXT:  Fetch into W the address pointed to by IP
         Increment IP by address size
         Fetch into X the address pointed to by W
         Jump to X

As we did with DTC, let’s start with the case where address lists contain cfa’s. Just as with DTC NEXT, the first step of ITC NEXT loads a cfa into W. On a machine with auto-increment addressing we can have the “Fetch into X” step increment W so that when we get to NEST (or one of the other “runtimes”) W will contain the pfa.

It might take an extra cycle (it does on the MSP430) but because in every case except executing a code word we will have to take at least a cycle to do arithmetic on W, it might make sense to auto-increment instead. It’s also less error-prone to do it, correctly, in one place. (This is what I have done in the MSP430 ITC kernel.)

If we don’t have auto-increment (eg, most true RISC machines – ARM (excepting v6-M) doesn’t count ;-) but can use a negative offset in our memory access instructions, then it makes sense to use pfa’s in our address lists instead of cfa’s. If we do this, the first step of NEXT loads W with a pfa, which is what we want. But the “Fetch into X” step needs to load from the cfa instead, so we offset backwards by the size of an address. This sounds convoluted, but it’s actually the most efficient approach.

Here is that version of ITC NEXT:

  NEXT:  Fetch into W the address pointed to by IP (this is a pfa, not a cfa!)
         Increment IP by address size
         Fetch into X the address pointed to by (W - address size)
         Jump to X

This is what I have done for the RISC-V ITC NEXT.

You have to work with the strengths and against the weaknesses of your architecture – not the other way around! There is no fixed answer. Using cfa’s is not always the “perfect and beautiful” approach. Nor is using pfa’s. It pays to tinker and draw pictures and pay attention to instruction cycle-times and sizes.