Search This Blog

Sunday, February 28, 2016

Debugging... (Part 1)

For a long time I have wondered why no one gives the process of debugging any serious academic credence.  Sure, there are lots of "proof" types around trying to prove that programs might be correct (if in fact the criteria for designing the program has no bugs), but really nothing I have seen in the last decade comes close to addressing "debugging."

By "debugging" I mean the process of correcting a program which performs in some way incorrectly.

So let's break this down:

For something to debug let's consider a basic Turing machine (see this for the Wikipedia description which works well for what I am about to describe).

The details, I think, are really not very important so long as the machine is logically consistent.

The key for this is the representation of a program (in this case a Turing machine program) as a set of tuples.  We can use the Wikipedia state table as an example (from Wikipedia):
Example: state table for the 3-state 2-symbol busy beaver reduced to 5-tuples
Current stateScanned symbolPrint symbolMove tapeFinal (i.e. next) state5-tuples
A01RB(A, 0, 1, R, B)
A11LC(A, 1, 1, L, C)
B01LA(B, 0, 1, L, A)
B11RB(B, 1, 1, R, B)
C01LB(C, 0, 1, L, B)
C11NH(C, 1, 1, N, H)
So what does this give us?

Really a simple set of data representing the operation of the Busy Beaver program (I don't think, at least for the moment, we really even care here what the program actually does):

{ S1, A, 0, 1, R, B }
{ S2, A, 1, 1, L, C}
{ S3, B, 0, 1, L , A }
{ S4, B, 1, 1, R, B }
{ S5, C, 0, 1, L, B }
{ S6, C, 1, 1, N, H }

S here uniquely identifies the row as a state.  Of course, we still need more:

Some kind of tape and head, for example.  Here we can consider a simple, infinitely extensible (in both directions) array of tuples:

{ H, Header n }
{ T1, Vn-1 }
{ T2, Vn }
{ T3, Vn + 1 }
...

In this case the location of the Turing machine head (tagged with H) is represented by tuple n (T tags tuple rows, V is the specific tape value).  We can insert new tuples as needed or move the "Header" by changing its value.  I think we really only care that the implementation is logically consistent with what Turing machines are supposed to do.

We also need to know the current state:

{ S, s }

Where s is the current state.

So given a description as above we can construct a representation of the operation of the Turing machine:

The Program part does not change:

Program = {
  { S1, A, 0, 1, R, B },
  { S2, A, 1, 1, L, C},
  { S3, B, 0, 1, L , A },
  { S4, B, 1, 1, R, B },
  { S5, C, 0, 1, L, B },
  { S6, C, 1, 1, N, H },
}

Operationally we only care about what things look like at each given step, or cycle.  So we can imagine a representation that looks something like this:

Operation = {
  { Cycle, 1,
    { S, s },
    { H, n },
    { T1, Vn-1 },
    { T2, Vn },
    { T3, Vn + 1 } },
  { Cycle, 2,
    { S, s },
    ... }
  }
}

Each cycle of the machine appends new information to Operation.

Again, exactly how we represent this is unimportant.

So finally, when I say we are going to talk about debugging I am going to represent the problem of "debugging" as follows:

First, I have the expectation that there is some agreed and correct representation of Operation.  I may only be interested in some specific Cycle or Cycles that represent an "answer", I may be interested in an entire set of Operation Cycles, or I may be interested in whatever the pattern is so long as the machine does not Halt.

I require that I can represent the Cycle or Cycles with specific patterns, e.g., something like the following if I have a specific structure in mind.

 { Cycle, c,
    { S, s },
    { H, n },
    { T1, Vn-1 },
    { T2, Vn },
    { T3, Vn + 1 } },
   ... }

In this case I have an expected answer, say adding 2 + 2 so I would expect to see a specific pattern representing 4.  What that pattern is is not important.  What is important is that I can describe the resultant pattern unambiguously.

The pattern might represent all the Cycles, or only a subset (for example a program that repeats an answer over and over could potentially also be correct if I am only looking for the specific value represented by a pattern of Cycles.  The pattern might also be Operation = .  Here we use ∅ to represent unknown content, in this case Cycles, i.e., we don't know what the output of the program will be.

If I don't have a representation for Operation then I will say that instead of debugging a program I am writing (or creating a program).

We can think of writing a program as "debugging"  an empty Program = (equal to) ∅ if Operation =/= (not equal to)  ∅.  For Program we use ∅ to represent an unknown set states.

So to summarize there are four cases:

Program = ∅, Operation =  ∅ - We don't know what we are doing.

Program = ∅, Operation =/=  ∅ - We know what we want to see as output (at least at some point) but don't have a means to get there, i.e., we are writing a program from scratch.

Program =/= ∅, Operation =  ∅ - We don't know what the program we have will do.  We can likely enter the next case (below) by operating the program to create the situation where Operation =/=  ∅.

Program =/= ∅, Operation =/=  ∅ - We have a program and some data but we don't know if the two are coherent.  We use coherent to describe a state where we must, if we can, perform or check the operations to determine if what we see in Operation is actually the result of the Program.

We define "debugging" as aligning Program with Operation such that there is coherence between the two.

Note that if Operation is invalid or inconsistent we are no longer debugging the program but instead constructing a description of a programming result, i.e., effectively creating or repairing a specification for a Program.