“We always knew that we were spied on.” Yawn. If someone wants to achieve a goal, there are ways. (Bad) dreams have become reality.

In the Star Trek movies we get to see what people imagine. In order to achieve it you have to try. Still a dream.

So how do we get to tell our computers what to do just like in Star Trek and what the hell does that have to do with TDD?

The basics

In our daily uxebu calls over the last weeks, various aspects of testing have come up, and instead of finishing our “dailies” in 15 minutes they take much longer.

Some history: after our internal workshop with J.B. Rainsberger a bit more than three years ago, the TDD mindset has settled into our day-to-day workflows. We ran into the usual walls when starting TDD, and we improved a lot over time. I always like to say that learning TDD was like learning to program anew. Even my kids were surprised to hear that after about 20 years of programming I was learning it kind of from scratch again.

TDD/BDD and all their brothers and sisters are meant to drive your code’s design by the requirements. That means it is important to get a good feeling for how to build those requirements, how to group them and how to write them out in order to let the production code evolve from them. Esoteric? Well, not really.

Try it out in a coding dojo, or even better a code retreat, where you pair with different people throughout a day, and you will realize there is no right and wrong. There is no one way of approaching things. You work on the same (seemingly simple) task with five different people and you approach the problem in five different ways. Communication with all the stakeholders is key. And one of the stakeholders is the future maintainer of the code you are writing. But how do you communicate with this person (who might be you yourself)? You do that through good code.


Good code is only a restriction that we humans need in order to continuously maintain efficient communication with machines, which unfortunately don’t understand us too well yet. The Transformation Priority Premise (TPP) is a way that can help us keep the complexity at a minimum. Out of this, the Absolute Priority Premise (APP) derived by Micah Martin tries to absolutely quantify the human readability of source code. After having this spin lots of rounds in my head, and after inspiring George to analyze this deeper in a bachelor thesis, it is the automatability that struck me much deeper.

In his article “The Transformation Priority Premise” Uncle Bob for the first time made me feel that creating code can be structured by applying small, well-described steps, which he calls transformations. TPP builds on the assumption that code evolves from simple to complex, which is tightly coupled to how you define the requirements, i.e. your tests in TDD. He also states that the way you build and order your requirements makes a difference for the resulting production code. And by applying the transformations described in his article, it feels like there is a strategy for finding a (discrete) path from requirement to production code.
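To make the idea of a transformation concrete, here is a hedged sketch of how the first two steps of the roman numerals kata might look when each new failing test forces one small transformation. The function names and the exact transformations shown are my illustration, not a quote from the article.

```javascript
// Illustration only: two early TPP-style steps, each driven by a test.

// Test 1: toRoman(1) === 'I'
// Transformation ({} -> constant): the simplest code that passes.
function toRomanStep1(n) {
  return 'I';
}

// Test 2: toRoman(2) === 'II' fails against the constant, so a
// transformation towards something slightly more complex is applied
// (here: constant -> expression that uses the argument).
function toRomanStep2(n) {
  return 'I'.repeat(n);
}
```

The point is not the roman numerals themselves, but that every step is small and well described enough that you could imagine naming it and putting it in a catalog.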

Assuming this can be automated (more about the how later), only the building of requirements would remain a manual task. A tool that knows how to do that could even support the requirements-building process in such a way that it tells you to try to define a simpler failing requirement next. This might be a help for structuring the requirements (but I haven’t thought about that too much yet).

Automatic code creation – The How

Let’s take a simple task to analyze the “how” of automatically creating the source: the roman numerals kata, a simple problem. But everyone in programming knows there is an endless number of ways to solve it. So we have to limit that in order to get to a discrete set of transformations. Break!!!
Let’s get practical here. The plan:

  • Do the roman numerals kata, let’s say, ten times, ideally with ten different pairs/people.
  • Extract all transformations applied into a catalog. Assuming every solution takes about twenty steps, that makes 200 transformations in our catalog.
  • Write an engine that can apply all transformations and find the one that makes all existing tests pass.
  • Let the human implement a test and define the initial production code, which is mostly just a constant.
  • Now let the engine run against the production code each time a human has written a new failing test. The engine hopefully picks the one transformation that makes the test pass, reports back the priority (the complexity that was applied in order to pass the test), and you write the next test.

At each iteration the engine would actually be able to apply a completely new series of transformations, starting from the first one, in order to get to even better code compared to just applying one new transformation to the previous result. In order to make the engine fast, the test runs have to be amazingly fast, more about that later.
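A minimal sketch of what such an engine loop could look like. The catalog shape (entries with a `priority` and an `apply` function) and the `runTests` callback are my assumptions for illustration, not an existing tool.

```javascript
// Sketch: try catalogued transformations in priority order (simplest
// first) until one produces code that makes all tests pass.
function findPassingTransformation(catalog, code, runTests) {
  const byPriority = [...catalog].sort((a, b) => a.priority - b.priority);
  for (const transformation of byPriority) {
    const candidate = transformation.apply(code); // new production code
    if (runTests(candidate)) {                    // all tests green?
      return { code: candidate, priority: transformation.priority };
    }
  }
  return null; // no catalogued transformation makes the tests pass
}
```

Sorting by priority mirrors TPP’s idea of preferring the simplest transformation that gets the tests green; reporting the priority back is what would tell the human how much complexity the last test bought.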
If that theory can be proven, it would be very interesting to see

  • if the engine does always get to a result at all
  • the resulting code
  • if the engine can be optimized to find the best code.

If the engine really has a couple of successful runs just on roman numerals, it would be exciting to give it another problem. Maybe the set of transformations has to be extended, and the catalog might need an update first, which might require another set of transformations from human solutions of another kata. I’ll stop speculating here, but feel free to spend some brain cycles on the future; I would be very keen to hear opinions.


One problem that I think is very obvious is that the transformations we build the catalog with have to be more specific than what is described in TPP. Converting an unconditional into a conditional involves creating the conditional expression, which is not part of a transformation definition and can be as simple as “if (a)” or a bit more complex such as “if (a > 1)” and so on. Maybe the building of conditionals needs to be defined as a kind of sub-transformation too.
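One way to picture such sub-transformations is as an ordered list of condition builders, from simplest to more complex. The ordering and naming here are my guess at what a catalog entry might contain, not anything defined in TPP.

```javascript
// Hypothetical sub-transformations for building a conditional
// expression, ordered from simplest to more complex.
const conditionBuilders = [
  (a) => `if (${a})`,           // truthiness check
  (a) => `if (${a} > 1)`,       // comparison against a literal
  (a, b) => `if (${a} > ${b})`, // comparison against another variable
];
```

An engine could then walk this list the same way it walks the main catalog: try the simplest condition first, and only reach for a more complex one when the tests demand it.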

Another problem is that the production code might require more input data, such as data structures, or knowledge of how to use external dependencies, which can be as simple as a function like “Math.sqrt”. There seems to be lots of room here to make this entire dream fail :).

Finding out whether the resulting code is side-effect free seems very challenging. Can this be determined by defining the right transformations? Can side effects be found by some kind of analysis at all? If so, could they be prevented from the beginning?

As long as such a system cannot create “all the code” we need, there might be a high need for the engine to create human-readable code in order to keep it maintainable, since maintaining code is still our daily business right now. Though if the engine had no restrictions in getting tasks done, good code would no longer be a necessity; only the best code would matter, and that is something a machine can learn if we teach it right.

Fast tests

Problems and possible complete failures of this approach aside, one requirement that not only this approach would need is fast tests. The engine would have to run the tests very often, and without test execution engines that can parallelize tests this will mostly be a really painful process. That actually also leads to the necessity for the engine to just consume tests and even allow itself to reorder them, in order to apply transformations on the requirements in a different order. Which brings in the requirement that the tests must be even much faster, to get to a result within a human lifetime at all.

But besides all the dreaming, fast tests are a pain in many places nowadays. The reduction of test execution time is not only a matter of improving the test runners; it’s first and foremost a problem of writing the right tests, applying things such as dependency injection and proper abstraction, but also properly splitting up responsibilities, which requires knowledge in the domain.
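As one small example of what dependency injection buys you here: a test can replace a slow external dependency with an in-memory fake and stay fast. The service and store below are made up for illustration.

```javascript
// Sketch: the service receives its store instead of creating it, so a
// test can inject a fake and avoid hitting a real database.
function createUserService(store) {
  return {
    register(name) {
      store.save({ name }); // production: a database; test: an in-memory fake
    },
  };
}

// In a test, the fake makes this run in microseconds:
const saved = [];
const service = createUserService({ save: (user) => saved.push(user) });
service.register('Ada');
```

The abstraction boundary (here: the `store` parameter) is exactly the kind of design decision that requires domain knowledge, which is why fast tests are not just a tooling problem.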


But back to the automation and automatic code creation. My friend Alex showed me Tierra (video), which was created by a biologist and, in simple words, flips some bits to mutate existing code in order to see what happens. This is literally done at the assembler level, which means that flipping bits is possible, but in JavaScript land (where I would do it) we can’t do that. Therefore I tried to define the bits that we flip by using the transformation catalog. A couple of questions about that are still open. For example, where does mutation come into play? But I just don’t know enough about that yet to even try to answer that question.
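A sketch of what “flipping bits” via the catalog could mean: mutation picks a random transformation from the catalog and applies it to the current code. The catalog shape (entries with an `apply` function) is my assumption for illustration.

```javascript
// Sketch: instead of flipping machine-level bits as Tierra does,
// "mutation" picks a random transformation from the catalog and
// applies it to the current code.
function mutate(code, catalog, random = Math.random) {
  const index = Math.floor(random() * catalog.length);
  return catalog[index].apply(code);
}
```

Injecting the `random` function makes the mutation step itself testable with a deterministic stand-in, which ties back to the fast-tests point above.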


Of course all this didn’t stop at the doable parts I described above. Alex logically said that once this works, we just need to apply some AI in order to have computers create the tests from what we tell the computer. The computer then creates the production source, so we get to full round-trip code generation, and what we see in Star Trek is no longer just a dream :).

I would like to quote from an internal chat of ours that got kicked off by my tweet about my discussion with Alex yesterday on the topic described in this blog article.

The chat quotes that matter are these:

  • Nikolai: I am sure people once said, flying would never work :))))
  • David: There’s (by now) well-known physics behind flying. It’s not magic (any more).
  • Nikolai: the “by now” is key :)

Keep dreaming, but let’s try some things.