Macro Machines

1.     What is a "Macro Machine"?
2.     What is a Macro Machine good for?
3.     How can one TM simulate another TM?
4.     How are Macro Machines constructed?
4.1     Multi-Symbol Macro Machines
4.2     Backward-Symbol Macro Machines
4.3     Pushing Multiple Macro Machines
5.     But, how are Transitions of Macro Machines defined?
5.1     Transitions for Multi-Symbol Macro Machines
5.2     Failure to define a Transition
5.3     How to detect endless Loops with limited Tapes?
5.4     Transitions for Backward-Symbol Macro Machines
6.     Macro Machines and Tape Cell Repetition Counts
7.     Lazy Transition Computation
8.     Further Reading

You may also want to consult my glossary.

1. What is a "Macro Machine"?

A "macro machine" (MM) is a Turing Machine (TM), which is constructed in a certain way. The construction of a MM is always based on some other TM, which the MM shall "simulate".

The MM techniques described here are used in the awk program to generate simulations of many record TMs. That awk program is accessible from the simulations page.

2. What is a Macro Machine good for?

In short: it shall be faster.

If you want to run a TM like this monster till the end, there is not much hope to do it with special hardware, but also not with software which does not somehow accelerate the simulation. MMs are an important first step to accelerate simulation.

MMs also can help to prove non-halting, see e.g. this TM with 80-bit wide symbols.

3. How can one TM simulate another TM?

Let us remember, what constitutes a TM (including a MM):

States that have no transitions defined, are halt states: the TM just does not know how to proceed.

The TM that simulates another one is said to be "on top of" the other TM, or to be "above" the other TM.

The states of the higher TM need not be identical with the lower TM's state, but they map to the lower TM's states: for every higher state there is a lower state.

The symbols and tape contents of the higher TM need not be identical with the lower TM's symbols and tape contents, but they map to the lower TM: any tape of the higher TM can be converted back into a tape for the lower TM.

Whenever the "higher" TM does a single step, the resulting configuration (state, tape and head position) maps to a configuration of the lower TM, which the lower TM would have reached after 1 or more of its steps. I.e. the higher TM groups a bunch of the transitions (steps) of the lower TM into a single transition (step).

Due to this grouping effect the higher machine obviously can be faster than the lower machine. While we have some overhead to determine the higher machine's transitions, we can remember them, and that pays off when we use them multiply. And there is no reason why we should not go on and put yet another simulation TM on top of the just constructed one. The new TM may be significantly faster than the old one.

The first TM, that is not built on top of another TM, is called the "basic TM." Selecting a MM for simulation on top of another TM is called "pushing" the (higher) MM "onto" the (lower) TM.

That is still quite abstract. Read on...

4. How are Macro Machines constructed?

While the abstract definition above leaves a lot of room for creative constructions, there are two main construction methods (as used by the author), decribed in the next two sections.

4.1 Multi-Symbol Macro Machines

A multi-symbol Macro Machine (MS-MM) combines a fixed number of tape symbols (say K of them, K>=1) into its new symbols. How large K is, is up to you. Such a MM is called a K-MM (for some number K).

This way we build up a new, larger set of symbols. When we e.g. "push" a MS-MM with K=3 "onto" a TM with 2 symbols we now have a 3-MM with 2K=8 new symbols. Accordingly the tape cells of our higher TM now each contain a new symbol, each worth (standing for) a series of K lower symbols.

Now we have a problem with our new K-MS-MM: the read/write head of the higher TM identifies the current tape cell, which the higher TM is going to read for the next transition (step). That is fine. But, we must be able to map this tape with the higher symbols back to some tape with the lower symbols... including the position of the head! Now, which one of the K lower symbols inside the current higher symbol is it? Where is the head of the lower TM just now, exactly?

As we will see below, it is sufficient to deal with just two cases: either it is the leftmost, or it is the rightmost of the K lower symbols inside the current higher symbol.

We say: the higher TM looks at its current symbol (the one under its head) either from left (to right), or from right (to left), and we often write it this way:

           symbol <state      looking at right end: orientation=left  = <
    state> symbol             looking at left  end: orientation=right = >
We say "the state is oriented", i.e. it has a direction attached.

If the lower TM does not yet have oriented states, we now double the set of states by pairing all lower TM states with either of { > , < }. From state B we derive the two new states B> and <B this way.

In case the lower TM already has oriented states we need not double the state set by attaching orientation: we trivially map the higher TM's states to the lower ones, 1 to 1, and derive the orientation of the higher state from the lower states. How this is sufficient will become clearer, when we show how to define the higher TM's transitions.

4.2 Backward-Symbol Macro Machines

A backward-symbol Macro Machine (BS-MM) is always based on a TM with oriented states and pulls the symbol immediately "behind" the head into its states. These states now form a (much) larger set. Such a MM is called a bck-MM. If the lower machine not yet has oriented states, we just use a MS-MM with K=1, first.

The symbol "behind" a state with orientation left (<) is right of the current symbol:

    X <state Y Z
Here, X is the current symbol (under the head), and Y is the symbol "behind" the head. The higher BS-MM transforms this into:
    X <state(Y) Z
where "<state(Y)" now is just one state of the higher TM!

Note that while we build a different set of states, we do not change the set of symbols, here.

At first, all this looks like a very strange idea, and... is it allowed at all? Yes, it is. Remember, we are free in our definition of higher TMs, as long as we can provide the necessary mappings of symbols, states and head positions. That we can do that is shown below.

4.3 Pushing Multiple Macro Machines

When we e.g. first push a MS-MM with K=3, then a BS-MM onto that one (with a 3-bit symbol pulled into the state), and then again a MS-MM now with K=2, we obtain what we call a "3-bck-2-MM". It has 6-bit symbols in its tape, and a 3 bit backward context in its states.

If we would like to have 3-bit symbols, but 6 bit backward context, we could achieve this with a "3-bck-bck-MM".

5. But, how are Transitions of Macro Machines defined?

Well, it is not that hard. Let's see: what is a transition for? It shall define what happens, when a TM is in a certain state and faces a certain symbol under its head. Ok, let our higher TM (a macro machine) face this situation. What can we do?

We translate (map) this situation back to the lower TM's terms, and use the lower TM's transitions to find out, what happens. After we have done "enough" steps with the lower TM, we then map back the result into the terms of the higher TM. How many are "enough" is explained below.

5.1 Transitions for Multi-Symbol Macro Machines

Say, we have a MS-MM with K=3 on top of a basic TM with 2 symbols and no orientation attached to its states. Assume this TM is in state, say S>, and faces symbol, say 110,

    S> 110 
corresponds to lower machine terms:
    S: [1] 1 0 
where the brackets indicate the position of the head. We now have a "limited tape", i.e. unlike normally, where a tape is unbounded at both ends, here we are bounded at both ends. We do not know what is left or right of the shown tape fragment. So what? As long, as we know, what to do... let's do it. Let us assume that the lower TM does some 4 steps to transform the two 1-symbols at the left and mid position into a 1 and a 0, arriving at the right 0:
    X: 1 0 [0]
The lower TM's transition now demands to write a 1, move right, and go into state T. We still can do that, but we also now fall off the limited tape we have:
    T: 1 0 1 [ ]
That is the most we can stuff into the higher TM's transition. Obviously, the old current symbol is left, and the next symbol to the right is now going to determine what happens next.

We now map this back into terms of the higher TM. Since we just fell off at the right side, the higher TM's symbol to the right, which will be looked at next, has to be looked at with orientation "right", i.e. starting with its leftmost subsymbol. From this we know the orientation of the new higher state:

       101 T>
Ok, we are ready to define our new higher TM's transition:
    input state:    S>
    input symbol:   110
    output symbol:  101
    head move:      right
    output state:   T>

Now we can see the reason, why it is sufficient for the higher TM to map to lower TM head positions at the leftmost or the rightmost sub-symbol. Those are reached, when we fall off a limited tape, so we need them. But while we do not fall off the limited tape, we just refuse to stop with the definition of the higher TM's transition. So, now we also know how many steps are "enough steps": we do the maximum possible.

Cool, eh?

Wait a moment... does this work always? No, sorry, it does not. Why this isn't that bad, is explained next.

5.2 Failure to define a Transition

Our attempt to define a transition for the higher TM by running the lower TM on a limited tape can fail for two different reasons:

We discuss both cases and show, why and how this does not stop us to like MMs.

When the lower TM happens to halt inside the process of defining a new higher transition, we just say: "so what?" The sole purpose of the higher TM was to quickly find out what the lower TM "does". And we did find out, didn't we? So, we just map the higher TM's configuration back to the lower TM terms, and let it complete its job without any further help from the higher TM. Bingo.

When the lower TM fails to leave the limited tape we set up for it... we conclude that the lower TM does not halt at all, but rather repeats some complete configuration over and over. It is a non-halter!

This can be proven by showing that there are only finitely many different configurations (state, tape content and head position) that can happen while we run the lower TM on the limited tape. After that many steps at least one of them must have been repeated. Hence we arrived at a proof for the lower TM to not halt, and we are done, again, aren't we? There is no point in going on with the higher TM, and so it does not hurt that we failed to define yet another transition for it.

5.3 How to detect endless Loops with limited Tapes?

Again, we have a problem, this time a practical one. How do we detect such an endless loop while running a lower TM on a limited tape? The above sketched proof directly indicates a method: we just compute the number of possible configurations of the limited tape, say N, and stop when we have done N steps without the lower TM halting or falling off the limited tape.

That works, but unfortunately N can become very large. Much too large to do so many steps. And the loop (if it exists) often shows up much faster.

We can try to remember all intermediate configurations, and after each step look up the new one whether we have seen it, already. But when the loop is long, we need a lot of memory, and also start to slow down considerably.

A much better solution is to run two instances of the lower TM, a "fast" one, and an auxiliary "slow" one. After two steps of the fast instance we let the slow instance do one step, and now compare their configurations. If they are identical, we obviously have detected a loop. The fascinating fact is, that we will detect all loops by this procedure. The proof is left as an exercise.

With this fast/slow approach we need a very limited amount of additional memory, slow down the simulation not even by a factor of 2, and detect the loop before the slow machine works on the second round of the cycle. Yes, there are other methods to do it, but I happen to very much like this technique.

5.4 Transitions for Backward-Symbol Macro Machines

The method to determine a new transition for a BS-MM is a lot like that for a MS-MM, which is shown above. We map the current state and symbol into a limited tape in terms of the lower TM, run that until the head falls off the tape, and map the result back into terms of the higher TM.

Handling of failures is the same as for MS-MMs.

The different part are the details of mapping down and up. Remember that the lower states are oriented, and the symbols are the same for both TMs. So we just split/join the special symbol pulled into the higher state:

    (b)S> x 
is mapped into:
    b S> x 
which is a limited tape with two cells.

Remarkable is, that we start approximately in the middle of the limited tape, not at either end of it. As a consequence, the minimum amount of steps we must do to complete the higher transition is the size of the symbol (steps and size in terms of the basic TM). A MS-MM may fall off the limited tape immediately, after a single step of the basic TM.

Especially counter-like TMs most of the time do many steps within a relatively small fragment of the tape, switching forth and back between the left and the right part of this tape fragment quite often. Such behaviour can be accelerated much better by BS-MMs than by pure MS-MMs. But the benefits are not limited to counter-like TMs, other TMs can benefit as well.

Let us go on with the lower TM on the limited tape. Assume we fall off the limited tape like this:

    <T c y 
By pulling the backward symbol (c) into the state we map that back into:
    <T(c) y 
and we are ready to define the new transition:
    input state:    (b)S>
    input symbol:   x
    output symbol:  y
    head move:      left
    output state:   <T(c)

6. Macro Machines and Tape Cell Repetition Counts

Despite all the acceleration we already gain from MMs as described above, the most important technique is still missing: explicit repetition counts in the cells of the tape. This chapter tells about them.

When we look at record TMs with 5 or 6 states (and 2 symbols) we observe that all of them (so far as we know them) show a very obvious repetative behaviour regarding both, tape contents and also transitions executed.

As an example look at the current 5-state champion, where we see repeated occurrences of "100" at the left end of the tape (step 130), repeated occurrences of "1" at the right end of the tape (step 107), and increasingly long runs of states B and D sweeping forth and back through the right block of "1"s. Looking more closely (e.g. after 108 steps) we also find repeated state series of (A,C,E). Obviously it is a good idea to use a 3-MM for simulation.

After 47,164,584 steps (shortly before halting) this 5-state champion faces the following configuration:

    E: 1 1 0 [1] 1 1 ...... 1 1 1
                     <----> 12279 more "1"s
Our 3-MM may represent this as:
    110 E> 111 111 ...... 111 111
                   <----> 4091 more "111"s
From the definition of the TM it is easy to find out what happens next. Since we have the transitions (E,1 -> 0,R,A), (A,1 -> 1,R,C) and (C,1 -> 0,R,E) each block of 3 "1"s is transformed in 3 steps into a "010", running through it from left to right, and we are back in state E. Hence the according 3-MM transition is (E>,111 -> 010,R,E>).

Obviously, this 3-MM will perform this same transition 4095 times: the state does not change, and the next 4095 macro tape symbols are all the same "111". The result of these 4095 macro steps is:

    110 010 010 ...... 010 010 E>
                <----> 4091 more "010"s
To do all these 4095 steps one by one still is quite tedious and boring, although it is already better with the 3-MM as with the original TM, which needed 12285 steps for the same.

But, look what we have done above: instead of really writing down 4095 times the "111", we "cheated" and used "......" together with a repetition count, indicating how many of those symbols the dots stand for. Then we went on to show how all of these 4095 symbols get processed the same way, moving the head from the left of this heap to the right of this heap. Wouldn't it be nice, if our MM would be as smart as we are?

That would be nice indeed, since we could convert 4095 symbols "111" all at once into 4095 symbols "010", moving the head accordingly, and leaving the state alone.

Well, in order to catch the above kind of repetitive bahaviour, the idea is to change the way we represent (implement) the tape. Instead of just plainly placing symbol after symbol, one by one, we now count blocks of repeated (equal) symbols, and into each of the new tape cells we put two quantities: the symbol, and its repetition count (at least 1).

That may sound like a lot of overhead, and also not easy to maintain. But it can be done. How to efficiently maintain such a tape, shall be described in another page not yet written. Just now lets assume we can efficiently maintain such a tape.

Ok, how do we write such a tape? A common notation humans use for a long time, already, is to place the repetition counts as superscripts to the symbols:
    1101 E> 1114095
And since the superscript "1" does not change anything, it is omitted by convention. Since this is the same notation as for exponentiation, tape cell repetition counts also are called "exponent"s. Equipped with explicit exponents in our tape, now every MM-transition, that does not change the state, can work on the cell's symbol independant of the exponent! In the above configuration the next macro step yields:
    110 0104095 E>
And this would not have changed for another exponent: we leave the exponent alone, and just change the symbol. You may look at this record TM simulated as a 3-MM with exponents in the tape, and see how well this works.

On the other hand, in case the MM-transition does change the state, we do not gain speed from the exponent: we split off a single symbol (decrementing the exponent) and process normally. Still, the shorter tape representation may well be worth it.

The acceleration and tape compression achieved by exponents is so important, that the author decided to run MMs with this feature always enabled. The stipulation to run record machines until they halt does not leave much of a choice, until we invent something else to help us performing the huge amount of steps and representing the really long tapes needed e.g. by this 6-state TM #r.

7. Lazy Transition Computation

A normal TM has all its transitions predefined. For a MM we have to compute them, as described above. The transitions of a MM can be somewhat different from normal transitions: they may (1) halt somewhere inside the macro symbol, or even (2) not terminate at all, i.e. loop inside the macro symbol.

Aside from those peculiarities, we may ask at what time the MM transitions should be computed? Would it make a difference to compute them all immediately (as early as possible), or to compute them "lazily", when they are needed the first time (i.e. as late as possible)?

The answer is: no, it does not make a difference. Regardless of the point in time we compute a MM transition, we always yield the exact same result. And when we run the MM, we do not have a choice: we need exactly that next transition. We cannot work around it, and use some other transition.

Now, since we can be lazy, we also should do so. The complete number of transitions for a MM is always finite, but can easily be a huge number. A K-bit N-state MM has 2K*N*2 transitions. For a 3-bit 5-state MM that are just 40 transitions, but with each additional bit in the macro symbols this number doubles, and for 80-bit symbols there are far too many to compute them all. And most probably we will only need a very small fraction of those to actually run the MM.

Another aspect is, that the deterministic nature of our MM transitions allows us to consider them being a part of the MM itself. We may think of them as readily computed, immutable, and integral part of the MM definition. That we lazily compute the transitions not before needed, is just an implementation detail (albeit an important one), it does not change our concept.

We might even decide to use a cache of limited size to remember the already computed transitions, and from time to time just forget about some already computed transition. We may recompute it, nothing is lost. So, we can trade space for time, here.

8. Further Reading

Shortly after writing (most of) the above text in December 2004 a relevant web search came up with a (short) paper by Alex Holkner: "Acceleration Techniques for Busy Beaver Candidates" (taken from the Proceedings of the Second Australian Undergraduate Students' Computing Conference, 2004). Holkner describes the MM technique, and also the next logical step, which Holkner calls "Proof-Machines". That technique is already incorporated in my awk programs for TM simulation, but not (yet) described here.

This completes our survey through the world of "macro machines."

To the busy beaver page of Heiner Marxen.
To the home page of Heiner Marxen.
Valid HTML 3.2!

$Date: 2010/06/11 18:36:24 $