Monday, August 22, 2016

A tester’s thoughts on characterization testing

Michael Feathers recently posted something about characterization testing on his blog. The term is not new, in fact it has been in use since at least 2007, yet I stumbled over something in this particular blog post. Since I read Katrina Clokie's post about human centered automation at the same time, the two topics kind of merged together in my head and got me thinking.
So what is this blog post going to be? Basically it is my stream of thought about characterization testing, written down to see if I can make sense of it. Hopefully someone else benefits from this, too.

characterization testing is an exploratory testing technique

Let’s start with what characterization testing actually is, at least to my understanding. Characterization testing is a technique that facilitates writing unit tests to check and document what an existing code base actually does, and it is therefore especially useful when dealing with legacy code. Note that it does not matter whether the checked behaviour is also the wanted behaviour.
The created checks are used to find out what a system does and then to automatically check that it still works as before while you refactor the code base. If you want to dig deeper into characterization tests and how to create them, I suggest you read Michael’s initial blog post or go to Alberto Savoia’s four-piece article series, which starts here and ends with a tool that can create characterization tests automatically.
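To make this a bit more concrete, here is a minimal sketch in Python with pytest of what such a check could look like. The legacy function is made up for illustration; in a real project it would be some inherited production code you do not fully understand yet.

def legacy_shipping_cost(weight_kg, express):
    # made-up stand-in for convoluted legacy logic nobody remembers the reason for
    cost = 4.90 if weight_kg <= 2 else 4.90 + (weight_kg - 2) * 1.10
    if express:
        cost = cost * 2 - 0.01
    return round(cost, 2)

def test_characterizes_express_shipping_for_heavy_parcel():
    # Written by first asserting an obviously wrong value, running the check,
    # and then copying the actual result into the assertion. The check now
    # documents what the code does today, not what it should do.
    assert legacy_shipping_cost(weight_kg=5, express=True) == 16.39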

Michael starts his blog post with the following statement before he moves on to characterization testing itself: “We use [the word testing] for many things, from exploratory testing and manual testing to unit testing and other forms of automation. The core issue is that we need to know that our code works, and we’ve lumped together a variety of practices and put them under this banner.”

I have a problem with that statement, and this was the kicker that started my thoughts. Namely, I disagree with the claim that exploratory testing is there to make sure “that our code works”, because this is not how I see exploratory testing. I use exploratory testing to find out how a system works. Whether my findings represent desired behaviour or result in a series of bugs is often up for debate with my teammates.

The ultimate reference for exploratory testing to me is Elisabeth Hendrickson’s book Explore It!. I own a German translation, therefore I cannot quote directly and will summarise instead. Right at the beginning of the book she writes that a test strategy should answer two questions:
  1. Does the software behave as planned under the specified conditions?
  2. Are there further risks?
The first one deals a lot with knowing “our code works” as Michael puts it. The second one goes further and also explores (sic!) the system in more detail than just checking it against a specification. Risks are found by learning what the system actually does and using this as an input for even further exploration. 
I think you already know where I am going with this: if exploratory testing is there to learn about risks by learning how the system at hand behaves, doesn’t this mean that characterization testing is an exploratory testing technique? Elisabeth’s book even has a whole chapter dedicated to exploring existing (aka legacy) systems, which is precisely what Michael uses characterization testing for.

In this case I think the terms black box testing and white box testing are helpful: while Elisabeth describes mainly black box testing techniques in her book, I see characterization testing as a white box technique for exploration at unit level. Combine Elisabeth’s techniques with Michael’s characterization testing and you have a very powerful framework to start working on a legacy system; still, I see characterization testing as a part of, not an addition to, exploratory testing.

You can read Meike Mertsch’s blog post Exploratory testing while writing code to see how a tester with an exploratory mind works with code while testing, although it might not be characterization testing in the strictest sense. Meike also translated Explore It! into German.

If you look at characterization testing as a white box exploratory testing technique, it has a unique property compared to all the black box techniques in Elisabeth’s book: it creates automated checks, which can be seen as a form of documentation of the current system behaviour.

characterization tests are fascinating for testers

This is the point where I have to say that I am a big fan of characterization testing when dealing with legacy systems. Developers who have to refactor the system benefit from the checks directly, because they gain confidence that they did not change the system behaviour in unexpected ways. Testers can use existing characterization tests as a starting point for finding out more about the system.

I don’t know about you, but to me finding or writing characterization checks raises the question why the system behaves that way. What is this behaviour good for, and what does it lead to in the bigger picture of the overall system? Characterization checks can be an input for exploratory testing sessions or fuel discussions with developers, product managers or users. They are an invitation to explore even when they don’t fail and are therefore a good example of checks that help you learn about the system even if the build is green.

As a tester I have encountered two fallacies regarding characterization tests in the past. The first one is not fixing bugs because the bugfix breaks a characterization test. Remember that you cannot know whether the checked behaviour is correct or wrong. I saw someone wanting to commit code, but reverting it because it broke some checks. Only later did we find out that the checked behaviour was actually faulty.
The second one is the exact opposite: you know that the checks only pin down the current state and you are very confident your new code works better than the old one, so when the checks break you adjust them to match your code and commit everything together. Guess what: the old behaviour was correct and you just introduced a bug.
Since characterization testing comes with all the pros and cons of unit testing (fast & cheap vs. checking only a small part of the system), the situation can even change over time: the checked behaviour is correct until a new feature is implemented, after which it is wrong. The build, however, stays green.

ageing characterization and regular checks 

Characterization checks do not just come into existence; in fact, Michael and Alberto both wrote down rules for when and how to create them. While developers work on a legacy system, characterization checks are not the only unit checks they create. There are also regular checks for new code, created using TDD, which check for desired behaviour. Both kinds of checks end up in the code base and in continuous integration. In time you may no longer know whether a check stems from characterization testing or TDD. In this sense characterization checks themselves can become legacy code, which is hard to deal with.

Imagine entering a project and finding 1000 automated checks, 250 of which are characterization checks while the rest are regular checks. If one of the characterization checks fails it is not necessarily a bug; if one of the others fails it most certainly is. Only you cannot see which is which. If the person who wrote the check is no longer on the project, you have to treat every failing check as a characterization check and always investigate whether you found a bug or not. A way to mitigate this is to follow Richard Bradshaw’s advice to state the intent of a specific check. If you do this you know whether a check is a characterization check or not.
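One lightweight way to record that intent, sketched below in Python with pytest, is to tag each check with a custom marker. The marker names and the checks are made up for illustration, and the markers would need to be registered in pytest.ini.

import pytest

def legacy_round(value):
    # stand-in for some inherited rounding logic in the legacy code base
    return float(f"{value:.2f}")

@pytest.mark.characterization  # documents current behaviour; a failure means "investigate"
def test_legacy_rounding_of_invoice_totals():
    # observed behaviour, not necessarily the desired one
    assert legacy_round(2.675) == 2.67

@pytest.mark.specification  # written via TDD against an agreed requirement; a failure is most likely a bug
def test_vat_is_added_to_net_total():
    assert round(100 * 1.19, 2) == 119.0

A naming convention or a comment serves the same purpose; the point is that a future team member can tell from the check itself whether a red build means “behaviour changed, investigate” or “most likely a bug”.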

Furthermore I have the feeling that a lot of checks become characterization checks over time. When they were written in the first place there was a reason for creating them exactly as they are, checking for a specific behaviour. Now, one or two project member generations later, they are still there and document a specific system behaviour. The people who knew why they were created and why the system behaves like this are gone. The checks have become characterization checks.

This is maybe what Katrina is facing in her project. She writes about a test suite that has been with the project longer than any of the testers, hence they don’t know why certain logic is coded into it. Katrina uses this as an example of why they do not automate after mastery. I tend to disagree a little bit: the initial team members might very well have automated after mastery, I cannot know for sure, yet the knowledge of why has been lost over time. Moving away from Katrina’s example, this happens quite often: testers inherit checks from previous testers.

I like to think of a project as a body of knowledge, not just the people, but the project itself. There is a lot of knowledge about the system, the users and the workflows in the project’s Confluence, in the specific build setup and in the automated checks. From the project’s perspective I see the automated checks as a form of codified prior knowledge.
The current team is left with this prior knowledge and now has the problem of finding out why the system behaves like that. Otherwise they risk running into one of the two problems I mentioned earlier: being reluctant to change behaviour that needs changing or introducing bugs by ignoring the checks. This is a tough exercise, because finding out why a system does what it does is usually very challenging.

Conclusion

Characterization testing is a white box exploratory testing technique and a very powerful tool when dealing with legacy systems. As a tester you should make sure characterization checks are marked as such and try to find out why a system behaves as a characterization check says it does.

Sources:




Friday, May 13, 2016

TestBash - A Summer Camp for Testers

I actually want to write in English on my blog, simply so that more people can read my thoughts and give me feedback on them. So far I have only written one entry, so I suppose there are no fixed rules yet.
It matters a lot to me to write my TestBash experience report in German, because there are already very, very many English reports and TestBash is already well known in the English-speaking testing community.

I want to show German-speaking testers who have not yet looked for conferences beyond the national borders what they are missing out on. Perhaps language plays a role in that, so I want to keep the entry barrier as low as possible and am writing this entry in German.


What is TestBash?


TestBash is a conference by and for software testers that has been held in Brighton for several years; this year it took place on March 10 and 11. It is organised by the “Ministry of Testing”, which has made it its mission to give testers the opportunity to develop themselves and to network with each other. The Ministry collects and aggregates blogs about testing, offers a platform for webinars with the DoJo, hosts discussion forums and organises events, the most prominent of which is TestBash.
TestBash has meanwhile become so successful that it is being exported to other cities: there was one in New York, this year there will also be TestBashes in Philadelphia and Manchester, and while I am writing this, people on Twitter are asking whether a TestBash in Scotland wouldn’t be a good idea.
You can picture a TestBash a little bit like a summer camp for testers. There is not only the conference day, but also a day before it on which you can book workshops. In addition, the Ministry tries to organise further training and workshops for the days before TestBash, often for example the Rapid Software Testing course by James Bach and/or Michael Bolton. As if that were not enough, there is a get-together or a games night in one of Brighton’s pubs every evening to bring testers together.


What makes TestBash special?


One special feature of the programme immediately catches the eye: TestBash is a single-track conference, so all attendees see the same talks. On the one hand this takes away the attendees’ subconscious worry of always missing the most interesting talk, and on the other hand it ensures that every attendee has something to talk about with every other attendee to get a conversation going.
But what defines TestBash most is its heart. In the weeks before the conference the organisers take almost touchingly good care of the attendees and provide you with every piece of information you could possibly imagine.
This includes the most important of all TestBash rules: at TestBash nobody is alone. Newcomers are encouraged to approach others. Experienced TestBash visitors are expected to actively integrate lonely testers into the group. The numerous pub visits naturally help to lower inhibitions here.
On top of that, the conference is also designed to give young or shy testing talents a stage in the most literal sense of the word. The last hour of a TestBash is reserved for the 99 Second Talks. Anyone may go on stage and talk for 99 seconds about a topic they would like to share. The basic idea is to take away testers’ fear of giving a talk: 99 seconds are not long and show that it is not that bad at all. Many speakers who are renowned today (Richard Bradshaw, Dan Billing, ...) started with a 99 Second Talk.
Among the main talks there were a total of four of which I know for certain that they were the speaker’s first talk in front of an audience. That is an exceptionally high ratio, and it, too, is intentional. The so-called “first-time speakers” are given the opportunity to earn their first spurs in front of a friendly and well-disposed audience instead of being rejected for lack of experience.


What was talked about at TestBash?


After spending so many words on TestBash as an event, I of course still owe you an answer: what is the conference like in terms of content?
In my opinion the quality of the talks was on average very high; I particularly liked that testers shared their experiences or spoke about topics that moved them. At conferences you often see talks with a strong promotional character, trying to sell products or services. Fortunately that is not the case at TestBash.

I would like to go into a few of the talks in a bit more detail. If you want to see the talks, you can register for the DoJo on the Ministry of Testing website. Then all talks and many, many more great things are available as videos.

  • Testing or Hacking? Real Advice on Effective Security Testing Strategies – Dan Billing
    Security testing was never my favourite kind of testing, but seeing the passion with which Dan talks about it immediately makes me want to get started with it. What stuck with me most were the dark user stories that every piece of software has and that you should look out for: “As a Hacker I can …”
  • Test/QA a Gate Keeper’s Experience – Michael Wansley.
    Michael Wansley has a Grammy for the song Thrift Shop in his cabinet AND is a software tester at Microsoft. Michael’s stage presence is unbelievable and presumably honed on the big music stages of this world. Michael was one of the testers for the infamous Windows Vista and talked about how he saw his job back then: as a doorman who told the worst versions of the software “You’re not getting in here”, or rather “you’re not getting out of here”.
    His understanding of testing led to controversial discussions afterwards, and not only with attendees on site: on Twitter he got into an exchange with Michael Bolton (the tester, not the singer) that was interesting to read.
  • A Pairing Experiment – Katrina Clokie
    Katrina’s talk was an experience report about how she used pairing in her project to foster exchange between testers. In agile teams there is often only one tester, who does not have the opportunity to exchange ideas with other testers. Katrina therefore set up joint testing sessions with testers from different teams, with great success. I have since run a few pairing experiments in my own project: tester with tester, tester with developer or tester with PO. So far it has always been worthwhile for both sides to experience the other’s way of working up close. Some bugs that developer and tester had each struggled with on their own could suddenly be reproduced when they worked together.
  • Accepting Ignorance – The Force of a Good Tester – Patrick Prill
    Sometimes you have to travel all the way to England to get to know people who go to work every day less than 2000 m away from you. At TestBash Patrick gave his very first talk, and you could not tell. He was funny, he was confident and he knew what he was talking about. Patrick told how, after almost ten years of mechanically working through test plans, he discovered how many ways of testing there are and thereby rediscovered his passion. He told how he recognised his ignorance, in the sense of lack of knowledge, and how he turned it into a driver to become a better tester: of course by striving to expand his knowledge, but also by being situationally aware of his ignorance.
  • Do Testers Need a Thick Skin? Or Should We Admit We’re Simply Human? – Nicola Sedgwick.
    Nicola’s talk was the most personal and most moving talk of the day. She spoke about the many demons that plague her as a tester: the battles she fights with herself (“A bug in production? How could that happen? Am I worthless as a tester?”) and also with external, organisational constraints. Nicola spoke very movingly about how much all of this got to her and how much strength it cost her. Eventually she accepted that she does not have to fight every battle and that she is allowed to protect herself as a human being. She shared with us the new perspectives this opened up for her (she takes workshops on “working with developers”; do her developers also take workshops on “working with testers”?) and the measures she identified for herself that help her regain strength.
  • 99 second Talk about his Card Game - Beren Van Daele
    Sometimes 99 seconds are enough for a good story. Beren talked about a storytelling game in the form of a card game that he is currently developing and that is meant to help testers reflect on and talk about their daily work. At TestBash he was looking for playtesters. Unfortunately I did not catch him that evening, but we have been in lively exchange ever since: he sent me his cards, I tried them out with colleagues at work, and we have sent pages of emails back and forth with possible rules and use cases. Although we did not know each other at all, we were immediately able to work well together, and without TestBash we would probably never have met.
So much for my experiences from my first TestBash. I can only recommend that every tester go there. I have already booked my own ticket for Manchester in the autumn, this time including an Open Space discussion.

Tuesday, March 29, 2016

Why I still like pyramids

Here is the Testing Pyramid ...


The testing pyramid is a widely used approach towards testing in agile projects, yet it is starting to get a bad reputation with testers like Richard Bradshaw, who, according to John Stevenson, proclaimed the pyramid dead at MEWT [1], or James Bach, who states “it has little to do with testing” [2]. So why is that?

To elaborate on this we have to go a little bit into the pyramid’s history.

It started out as the test automation pyramid [3] by Mike Cohn, with just three layers: unit, service and UI (see Fig. 1). Mike used it to express that you should have a lot of automated unit test scripts, a smaller amount of service test scripts and only a few UI test scripts, because as you go up the pyramid the respective automation scripts become less cost effective. This is in fact a valid point.
Over time the three layers have basically been expanded to five by splitting up the service layer, a cloud for all manual test sessions has been added on top and the “automation” in the title has been dropped; a prominent example [4] can be found on Alister Scott’s blog. I will not go into further detail regarding the pyramid’s history since other people have already done this. If you want more information, just watch this 10-minute video by Richard and John [5], which also turns the infamous ice-cream cone from an anti-pattern into a pattern.

Fig. 1: From Test Automation Pyramid to Testing Pyramid


… and here is what’s wrong with it.


The first clue that something might be wrong is right there in the name and the visual, hidden in plain sight: the “automation” in the name got lost over time, yet the pyramid is still predominantly about automation, stating that all layers are automated while all manual testing is shoved into this big, cloudy thing on top. This tremendously overstates the value of automation and underrates any manual testing effort by basically saying: “Do very, very much automation … oh, and some other stuff.” This automation-heavy approach to testing might be the target of James Bach’s criticism when you look at his definition of testing and checking [6]. The pyramid is all about (machine) checking.
The strong focus on automation - ironically - might also be one reason why the pyramid is so popular among agile practitioners, since it resonates well with people who have their roots deep in software development, and I was no exception. Another reason for the pyramid's popularity is most likely the strong visualisation and the directly actionable advice you can take from it: a pyramid is a perfect form to say “do lots of the stuff at the bottom and less of the rest as you reach the top”.

A second argument made against the pyramid is that it marks the top layers as less important than the bottom layers. This is why John Stevenson proposes an alternative model: the test execution model, which treats all layers equally and is described here [7] (blog) or, for the more audio-visual among you, here [8] (YouTube). Although I think this is a very good model for learning and adjusting your testing during execution, I also see a difference to the pyramid. The pyramid is targeted more at a strategic level and wants to express how you can distribute your testing, while the test execution model to me focuses more on ... well ... test execution. Hence I do not really see them as alternatives to each other, nor as mutually exclusive. The testing pyramid makes no statement about each layer’s importance; the sole reasoning behind it is cost effectiveness.

This brings us to the third main critique, which is for example addressed by Todd Gardner in his talk about Case Studies in Terrible Testing [9]. Note that Todd seems to use Mike Cohn’s pyramid, but renames the UI layer to system test, so it is possibly not entirely the same. Furthermore, Todd's slides and his subsequent interview with Joe Colantonio [10] at least indicate a mild misconception about testing on his side, too. In his third case study he states that they had the perfect pyramid, meaning the right amount of automated checks and the best technologies on each layer, yet the project was a failure because nobody addressed the market risk; a layer for user testing was missing. One might argue that Todd fell prey to the automation-heavy approach the pyramid suggests and that a good tester focused on manual testing might have told them that. Which, of course, cannot be counted on the pyramid’s pro side either, as I explained above.
Nevertheless the point he makes is very valid: the pyramid does not take risk into account at all and takes its distribution advice solely from a cost analysis of check creation. His best example for this are the websites he creates, which carry little to no algorithmic or functional risk, but mainly user-related risk.

In conclusion, the test pyramid heavily narrows testing down to automation and has a distribution of layers that is basically defined only by money, taking neither risks nor anything else into account. So when you look at the evidence, is Richard right and is it time for the testing pyramid to die?


Another way to look at it


In his talk at TestBash Brighton 2016 John Stevenson encouraged us to take existing models and make them our own, to change and adapt them to better suit our project needs. I ask myself: why not extend the pyramid this very courtesy instead of starting to dig its grave? Especially since I think the pyramid’s success did not come out of nowhere: the pyramidal form is a powerful visualisation of your testing approach, easily explained to others and a very good framing device for yourself. Just print it out, hang it on the wall, and you have a very good reminder of how you want to organise your testing efforts right above your desk, all in one picture. It definitely beats 20-page test strategy documents. So there are merits.
A lot of the criticism revolves around the concrete way the pyramid looks right now: the specific layers it consists of, the reasoning behind this exact build-up. This is the point where we can adapt the model. I look at the current pyramid mainly as one instance of a possible testing approach; when I step back and abstract it, I see the following:
    •  do lots of this, less of that for a specific reason
    • don’t skip anything on here entirely
    • everything not on here is not in your focus (at least for now)
If you look at it that way the initial pyramid from Mike Cohn becomes an instance of this approach:
    • do lots of unit tests, fewer service tests, because unit tests are the most cost effective
    • do few UI tests, but don’t skip them entirely, they add value to your project
    • this pyramid makes no statement about manual testing efforts
This way of looking at pyramids makes them much more flexible: I do not need to slavishly stick to the layers the initial testing pyramids consist of and I don’t need to focus entirely on automation anymore, yet the pyramid as a strong visualisation and framing device stays intact. I can even change the reasoning behind the layer distribution: “I do lots of beta testing and less unit testing because customer acceptance and usage are really what makes or breaks my app.” This way you can build up your pyramid of layers to address certain project risks if you want.

Take Todd’s fifth case study as an example: he claims the testing pyramid does not help him here, since he faces mostly market and orchestration risks, but nearly no interface risks. I agree with Todd that the original distribution does not help him much, but what did he end up doing? He invested heavily in user and integration tests and has some mild unit testing going on, while skipping system tests completely. I might argue that he still has a pyramid in place, but it is assembled differently:


Fig. 2: A pyramid for Todd's 5th project


Immediately you have a strong visual representation of Todd’s testing efforts and a guideline for this project’s testing approach, as shown in Figure 2. You can verbally express it like this:
    • do lots of user tests, fewer integration tests, since user acceptance poses more risk than integrating services
    • do few unit tests, but don’t skip them entirely, since there are some algorithmically challenging parts
    • don’t invest in system tests, since there is no risk here
Note that this pyramid says nothing about automation anymore; how exactly the user or unit tests are composed and which role automation plays there is entirely up to the testers in this project. You might even try to use John’s test execution model [7] while executing your tests. Furthermore, the cloud on top is gone, because it always just seemed like an add-on for testing efforts that had simply been forgotten in the pyramid itself.
And you can tweak your pyramid even further without losing its benefits with another simple trick: how about color coding the layers to emphasise certain aspects you want to focus on? However, be careful not to overdo it: a five-layered pyramid with each layer in a different color might end up confusing you. I tend to use color to indicate how much automated checking I want to do on a specific layer.
A wonderful and popular real life example of an adjusted pyramid is the mobile testing pyramid [11] by Daniel Knott: it flips several stages around and uses color coding to emphasise on which stages automation might be put to good use and which are dominated by manual testing - albeit still tool enhanced, of course. 

So instead of clinging to the very specific current instances of the testing pyramid another way to look at it might be to lift this model up to a more generic and abstract level as depicted in the picture below:

Fig. 3: Template for a project specific testing pyramid


The degree of automation in Figure 3 is just one example of information you can convey via color coding; you may choose something different. If you approach the testing pyramid like this you lose one of the things that made it successful: you cannot just take a pyramid as a manual anymore that tells you to build lots of unit tests and only a few UI tests in your project.
Instead you have to come up with a specific pyramid of your own. There are a lot of different ways to do this, e.g. the simple risk analysis Todd did, or you can use James Bach’s Heuristic Test Strategy Model [12]. Once you have done this you benefit from the pyramid’s strengths. It is easy to explain to and discuss with others, and the simple yet strong visual helps you keep your testing efforts in line with your chosen approach. If, for example, you spend way more effort on UI testing than your pyramid indicates you should, you can easily see this and reassess your project by asking yourself some questions: Are you still testing the right thing? Shouldn’t you spend this effort on another layer, where it adds more value to your project? Or is there a flaw in your pyramid and you really should spend all this effort on UI testing? If so: what other implications does this realisation have for your approach?

What does it mean: lots of this and less of that?


When you frame the pyramid like I did in the last section you stumble over quite an interesting follow-up question. You visualised, for example, that you want to do lots of user testing and less unit testing. But what does it mean to do lots of something or less of something else?
In the traditional test automation pyramid it basically boiled down to the number of test scripts: if you count the test scripts on the unit layer, there should be significantly more of them than on the service layer, and you should have only a few test scripts on the UI layer. So figuring out whether you are applying the pyramid as intended is a task as simple as counting. Although, to be fair, this is criticised itself, since the number of test scripts does not necessarily reflect how much work went into them. Yet it remains a good indicator.
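As a toy illustration, and assuming a hypothetical directory layout with one folder of test scripts per layer, the counting could be as simple as this:

from pathlib import Path

def count_test_scripts(root="tests"):
    # count automated test scripts per layer, assuming one sub-folder per layer
    return {
        layer.name: sum(1 for _ in layer.rglob("test_*.py"))
        for layer in Path(root).iterdir()
        if layer.is_dir()
    }

# e.g. {'unit': 412, 'service': 97, 'ui': 12} would match the classic pyramid shape
print(count_test_scripts())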

However, counting test scripts does not work anymore when every layer can consist of both automated checks and manual testing tasks. You can still count the number of automated test scripts, but you cannot count manual testing tasks as easily. As a first approach you might want to count the number of manually performed test cases, but this is flawed even if you overlook the “apples to oranges” comparison between automated and manual test scripts. In today’s projects, test cases designed for testers to execute manually hardly capture everything a tester does; some even argue test cases do not reflect testing very well [13]. A tester does much more than creating and then ticking off checklists, for example performing test sessions, product explorations or interacting with other team members. As a result there is no easy measurement and comparison anymore that expresses the difference between the layers in easy-to-digest numbers.

One might think that effort spent on a layer is a good metric here, but this is a false friend, too. 250 beta testers will definitely outmatch every other layer within hours in terms of effort, even if the beta test is not the bottom layer. What might come closest is the effort spent in the core team for that layer: for example, not measuring the testing effort of all 250 beta testers (which is admittedly a little unjust), but rather taking into account setting up the beta test, deciding which purpose it should have in the project, deciding which groups get which versions, analysing the feedback, scripting testers with different charters, ...
The truth is I have no easy answer to this. As a rule of thumb I say I am willing to spend the most manpower, brainpower or money on the bottom layer, since this is the one closest to my heart with regard to the reasoning behind my pyramid in my project. And I am aware that this is the exact opposite of why Mike Cohn put unit tests at the bottom.


Conclusion 

The test automation pyramid is criticised for very valid reasons and is not a very good model to use anymore in its exact appearance. However, I think if you take a step back and look at the pyramidal form in general, you can still use its benefits to add value to your project - and that is why I still like pyramids.


Sources: