Lessons Learned from Medical Testing

lessons-learned-from-med-testing-01

[Wikimedia, 2015]

Application Testing is about determining whether a given piece of software “behaves” as expected by its sponsors and end users. These stakeholders have a legitimate right to their expectations, because the system should have been engineered with the objective of satisfying their requirements. The diagram below represents the testing process as a black box:

lessons-learned-from-med-testing-02.jpg

In light of the above, it is easy to see that Application Testing is a particular instance of a more general mathematical problem: Hypothesis Testing. As such, it can be tuned to minimise either false positives or false negatives, but not both. The two kinds of error, illustrated by the small sketch after the list below, are the following:

  • False Negative: the error that occurs when a buggy application wrongly passes a test.
  • False Positive: the error that occurs when a correct application wrongly fails a test.
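
To make the two definitions concrete, here is a minimal Python sketch (entirely illustrative; the function and argument names are my own) that maps the ground truth about the application and the verdict of a test to one of the four classic outcomes:

    def classify_outcome(app_is_buggy, test_failed):
        """Map ground truth and test verdict to one of the four classic outcomes."""
        if app_is_buggy and test_failed:
            return "true positive"    # the defect is correctly detected
        if app_is_buggy and not test_failed:
            return "false negative"   # a buggy application wrongly passes the test
        if not app_is_buggy and test_failed:
            return "false positive"   # a correct application wrongly fails the test
        return "true negative"        # a correct application rightly passes the test

    print(classify_outcome(app_is_buggy=True, test_failed=False))   # false negative
    print(classify_outcome(app_is_buggy=False, test_failed=True))   # false positive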

This is a very well-known problem, and it finds application in other domains, including statistics and medicine.

lessons-learned-from-med-testing-03

Adapted from [Wikimedia, 2015]

Let us consider, for instance, the case of medical tests. How did the scientific community manage to strike the right balance between sensitivity and specificity? To answer this question, one has to answer a preliminary one: what is worse, wrongly detecting an illness or failing to detect it? Clearly, the latter is worse. Failing to detect a contagion can result in the rapid spread of infectious diseases, loss of lives, and high healthcare costs. For this reason, medical tests are designed to minimise false negatives. What is the downside? It is, clearly, the cost of managing false positives. Whenever a critical infection is detected, medical tests are repeated in order to minimise the chance of an error.
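
The effect of repetition can be quantified. Under the simplifying assumption that the repeated tests are independent (an assumption made here purely for illustration), the probability that every repetition is a false positive shrinks geometrically, as this small Python sketch shows:

    def combined_false_positive_rate(single_fp_rate, repetitions):
        """Probability that all independent repetitions report a false positive."""
        return single_fp_rate ** repetitions

    # A test with a 5% false positive rate, repeated three times,
    # wrongly confirms the finding in only 0.0125% of cases.
    for n in (1, 2, 3):
        print(n, combined_false_positive_rate(0.05, n))

Independence rarely holds exactly in practice, but the sketch captures why repeating a cheap test is an effective way of containing the cost of false positives.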

Now, let us come back to our domain, Application Testing. What is the worse scenario, false positives or false negatives? If an application fails a test but is actually fine, costs arise because the failure needs to be investigated by software development and, sometimes, by business users. But that is the end of it. Conversely, if an application has a severe problem which remains undetected until deployment to production, that is an entirely different story, and it can result in a full range of critical consequences, including compliance breaches, financial loss, reputational damage, and so on. So, in the financial sector, just like in medicine, testing must be optimised to minimise the chance of false negatives.

A question naturally arises: how do we fare in this regard? Luckily, I believe, in the financial sector we are doing alright: Application Testing here is clearly aligned with this risk management strategy. However, there is still room for improvement. As we have seen above, false positives also have a cost. Do we have to pay it in its entirety, or can we do something to reduce it? To continue the parallel with medicine, I shall argue that false positives, like cholesterol, come in two flavours: the good ones and the bad ones. The good ones originate from unavoidable causes, like technical glitches, human error in test case design or execution, or mistakes earlier in the software life cycle, such as wrong requirements or a wrong understanding of user requirements. These errors are part of the game, and can only be mitigated to a certain extent.

But there is another category of false positives which I think is bad, and which can be reduced: those generated by a lack of domain expertise. Technical testers base their judgement on the evidence collected during test execution. But this evidence is oftentimes not self-explanatory: interpretation is required. And this interpretation can only go as far as the business domain experience of the tester allows. Enterprise-wide application landscapes implement very sophisticated workflows and support complex business scenarios. The behaviour of these applications changes depending on user rights, user role, client type, and many more criteria. What is actually the intended behaviour of an application can easily be mistaken for an error.

So, what is my pragmatic recipe to cut down the bad cholesterol in application testing, that is, to reduce the cost of false positives? I believe the answer is closely related to the evolution of the testing profession. Nowadays, a good test engineer is someone with both technical skills and business domain knowledge. This unique blend of skills is certainly precious and makes application testing a science and an art at the same time. But new trends are changing this. Automation is increasingly being pursued because of cost pressure and the need for greater agility and reliability. But automation comes with skills challenges of its own, to the effect that, as is generally recognised, more software-engineering-savvy personnel will be required in testing. And this is good. But, at the same time, once the amount of manual activity is reduced, a new opportunity will arise to inject more business-savvy personnel into testing. The transition, the way I see it, can be represented like this:

lessons-learned-from-med-testing-04.jpg

To sum up, more automation will drive optimisation in terms of minimising false negatives, whereas business domain expertise will reduce the cost of false positives. This evolution of the testing profession into specialist roles is what is required to apply the lessons learned from medical testing to application testing in the financial sector. I will be happy to receive your feedback on this admittedly unconventional view of the future of the testing profession.

References

Wikimedia, 2015, https://commons.wikimedia.org/wiki/File:Infant_Body_Composition_Assessment.jpg, accessed on 24.07.2015
Morelli M, 2013, https://alaraph.com/2013/09/26/the-not-so-simple-world-of-user-requirements/, accessed on 07.08.2015

Test Automation and the Data Bottleneck

img-01

[Wikimedia]

Introduction

The topic of automation has seen renewed interest in the financial industry, following the recent hype around the industrialisation of IT and lean banking. The rationale is that there are tasks which are better taken care of by automatons than by humans. These tasks are composed of actions that are executed repeatedly, and the quality of their outcomes can be negatively affected by even the smallest operational mistake. A significant proportion of tests belongs to this category. Not all of them, of course, because human intellect still excels at more creative testing endeavours, exploratory testing being just one example. But other kinds of tests, like regression tests, make very good candidates for automation. Practitioners and managers know all too well that a well-rounded battery of regression tests can indeed prevent defects from being introduced in a new release, with demonstrable positive effects on quality. But they also know that manual regression testing is inefficient, error prone, and expensive. Awareness is therefore rising of the need to increase the degree of test automation. In this essay I will argue that, before thinking about tooling, solutions, and blueprints, there is a key success factor that must be addressed: avoiding the data bottleneck. I will first explain what it is, and why it can jeopardise even the most promising automation exercise. After that, I will introduce an architecture which can tackle this issue, and I will show that it brings additional advantages along the way. I will begin by introducing the data bottleneck.

The Data Bottleneck
If we allow ourselves an abstraction exercise, we can see testing as a finite state automaton. We start from a state {s1} and, after executing a test case TC1, we leave the system in state {s2}. A test case is a transition in our conceptual finite state diagram.

img-02a

In order to be able to execute a test case, the initial state {s1} must satisfy some pre-conditions. When the test case is executed, the ending state {s2} may or may not satisfy the post-conditions. In the former case we say that the test case has succeeded; in the latter, that it has failed. Now, what does this have to do with automation? An example will clarify it. Let us consider the case of a credit request submitted by a client of a certain kind (e.g. a private client, female). The pre-conditions of the test case require that no open credit request exists for a given client when a new request is submitted. From the diagram above we see that there is no transition between states {s2} and {s1}. What does this mean? It means that business workflows are not engineered to be reversible. If the test case creates a credit request and then fails, there is no way to execute it again, because no new business case can be created in the application for this client until the open request is cancelled. Now, there are cases in which the application can actually execute actions which recover a previous state. But in the majority of cases this is not possible. In banking, logical data deletion is used instead of physical deletion. Actions are saved in history tables recording a timestamp and the identity of the user, for future reference by auditors. In cases like this, the initial state of a test case cannot be recovered automatically, and sometimes not even manually. What one would need is a full database recovery to an initial state from which all test cases can be re-executed. This is the only way; other approaches to data recovery are not viable because of the way applications are designed and because of applicable legislation.
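
To make the finite-state view concrete, here is a small hypothetical Python sketch of the credit request example (the class and names are mine, not those of any real banking system). It shows why TC1 cannot simply be re-run: after the first execution the pre-condition no longer holds, and there is no transition back to {s1}.

    class CreditWorkflow:
        """Toy model of a banking workflow with logical, not physical, deletion."""

        def __init__(self):
            self.open_requests = set()   # clients with an open credit request
            self.history = []            # audit trail: entries are never deleted

        def submit_credit_request(self, client_id):
            # Pre-condition: no open credit request may exist for this client.
            if client_id in self.open_requests:
                raise RuntimeError("pre-condition violated: open request for " + client_id)
            self.open_requests.add(client_id)
            self.history.append(("submit", client_id))  # kept for auditors

    def test_case_tc1(workflow, client_id="client-42"):
        """TC1: transition from {s1} (no open request) to {s2} (one open request)."""
        workflow.submit_credit_request(client_id)
        # ... post-condition assertions would go here ...

    wf = CreditWorkflow()
    test_case_tc1(wf)      # first run: pre-condition holds, transition {s1} -> {s2}
    try:
        test_case_tc1(wf)  # second run: blocked, the workflow cannot go back to {s1}
    except RuntimeError as err:
        print("re-execution blocked:", err)

The only way to run TC1 again is to bring the whole database back to the state it was in before the first run, which is exactly the recovery problem discussed below.
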
Above we have seen that data recovery is a key pre-condition for automation. Now we will see why legacy environments often stand in the way of tackling this issue efficiently. Oftentimes, the business data of a financial institution is stored in a mainframe database. And when it is not on a mainframe, the odds are it is in an enterprise-class database such as Oracle. What do an Oracle database on Unix/Linux and DB2 on a mainframe have in common? Technology-wise, very little. Cost-wise, a lot: neither comes cheap. The practical implication is that only a few database environments are made available for testing, and they must be shared among testing teams. This makes it impracticable to offer automatic database recovery procedures, because of the synchronisation and coordination required. What happens in reality is that test engineers have to carefully prepare their test data, hoping that no interference from their colleagues will affect their test plan. And, what is worse, after they are finished with their tests, the data is no longer in a condition suitable for re-executing the same battery of test cases. Another round of manual data preparation is required.
One may wonder whether it is really impossible to reduce the degree of manual activity involved. The point is that, so long as access to databases is mediated by applications, and applications abide by the business workflow rules (and applicable legislation), recoverability of data is not an option. Are we stuck, then? Is it possible to achieve automatic data recovery without breaking the secure data access architecture? My contention is that there is indeed a viable solution to this problem. It is outlined in the following section.

Proposed Solution: On-Demand Synthetic Test Environments
Automated tests take place in synthetic environments, that is, environments where no client data is available in the clear. Therefore, the focus of this solution will be on these environments, which are the relevant ones when it comes to efficiency, cost optimisation, regressions and, ultimately, quality.
The safest way to recover a database to a desired consistent state is to use snapshots. A full snapshot of the database in a consistent state is taken, and this “golden image” is kept as the desired initial state of a battery of automated tests. Using the finite state representation, we can describe this concept in the following way:
img-03a

The diagram shows that any time a battery of automated test cases terminates, it can be executed again and again, simply by recovering the initial desired state. To be more precise, the recovery procedure can take place not only at the end of the test battery, but at any desired intermediate state. This is particularly useful when a test case fails and the battery must be re-executed after a fix is released. The diagram can be amended like this:

img-04b
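
As an illustration of how a test battery could be wrapped around such a recovery step, here is a hypothetical Python sketch. The restore_golden_image function is only a stand-in for whatever snapshot mechanism the database platform actually provides; it is not a real API.

    GOLDEN_IMAGE = "synthetic-env-golden-v1"   # hypothetical snapshot label

    def restore_golden_image(snapshot):
        """Stand-in for the platform-specific recovery procedure (storage snapshot, restore job, ...)."""
        print("restoring database snapshot:", snapshot)

    def run_battery(test_cases):
        """Run the battery from the golden image and report whether every test case passed."""
        restore_golden_image(GOLDEN_IMAGE)   # always start from the desired initial state
        return all(test_case() for test_case in test_cases)

    battery = [lambda: True, lambda: True]
    if not run_battery(battery):
        # a fix would be released here, then the battery is simply re-executed
        run_battery(battery)
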
So, we have solved the problem of recovering the database to a desired consistent state, which enables automatic (and manual) re-execution of test cases. Is this all? What if other test engineers are also working on the same database environment? What would be the effect on their test case executions if someone else inadvertently swept their data away by running an automatic database recovery procedure? It would be catastrophic, to say the least, and would bring about major disruption. How can this problem be fixed? What is needed is a kind of “sandboxing”: environments should be allocated so that only authorised personnel can run their test cases against the database, and no one else. Only the owner of such an environment should be in a position to order the execution of an automatic database recovery procedure. How can this be achieved? An effective way is to offer on-demand test environments which can be allocated temporarily to a requestor. This sounds very much like a private cloud. Below are the key attributes of an ideal solution to the on-demand test environment problem (a small illustrative sketch follows the list):

  • Test environments shall be self-contained.
    Applications, data and interfaces shall be deployed as a seamlessly working unit.
  • Allocation of test environments shall be done using a standard IT request workflow.
    For example, opening a ticket in ServiceNow or what have you.
  • Test environments are allocated for a limited period of time.
    After the allocation expires, servers are de-allocated. After a configurable interval, data is destroyed.
  • During the whole duration of the environment reservation, an automatic database recovery procedure shall be offered.
    This procedure may be executed by IT support whenever a request is submitted using a standard ticket. An internal Operational Level Agreement shall be defined. For example, full-database recovery is executed within a business day of the request.
  • The TCO of the solution shall be predictable and flat with respect to the number of environments allocated.
    Traditional virtual environments are allocated indefinitely and can easily become expensive IT zombies. Zombie environments still consume licenses, storage and other computing resources. Conversely, the solution proposed prevents these zombie environments from originating in the first place.
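
Purely as an illustration of the attributes listed above (the field names, the 30-day duration and the retention interval are assumptions of mine, not part of any specific product), the reservation model could be captured like this:

    from dataclasses import dataclass
    from datetime import date, timedelta

    @dataclass
    class EnvironmentReservation:
        """On-demand, self-contained synthetic test environment allocated to a single owner."""
        owner: str
        environment_id: str
        allocated_on: date
        duration_days: int = 30                     # allocation is always time-boxed
        data_retention_days: int = 7                # data destroyed this long after expiry
        golden_image: str = "synthetic-golden-v1"   # snapshot used by the recovery procedure

        @property
        def expires_on(self):
            return self.allocated_on + timedelta(days=self.duration_days)

        def may_request_recovery(self, requestor, on):
            """Only the owner may order a database recovery, and only while the reservation is active."""
            return requestor == self.owner and on <= self.expires_on

    res = EnvironmentReservation("team-credit", "ENV-0042", date(2015, 6, 1))
    print(res.expires_on)                                               # 2015-07-01
    print(res.may_request_recovery("team-credit", date(2015, 6, 15)))   # True
    print(res.may_request_recovery("someone-else", date(2015, 6, 15)))  # False

In a real deployment the reservation itself would be created through the standard IT request workflow mentioned above, and de-allocation and data destruction would be triggered automatically once the expiry date is reached.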

The logical representation of the proposed solution infrastructure is the following.

img-05b

To sum up, these are the advantages of the proposed approach:

  • It enables full test automation, making it truly possible to re-execute batteries of test cases using an automatic database recovery procedure in sandboxed database instances.
  • It gives 100% control over TCO and keeps testing spend on IT within defined limits.
  • It allows testing spend to be attributed to projects with high precision.
  • It increases the overall quality of testing results.
  • It eliminates cases of interference among independent test runs.
  • It allows test teams to be involved earlier in the software life cycle.
  • It saves infrastructure costs because computing clouds allow for transparent workload distribution, with the effect of running more (virtual) servers on the same physical infrastructure.

Conclusions

In this essay I have articulated the data bottleneck problem as it relates to test automation. First, I gave a general introduction to the topic. Second, I explained why this problem can put test automation initiatives in jeopardy. Last, I proposed a solution based on the concept of on-demand synthetic test environments and showed why I believe this is the way forward. The interested reader is welcome to contact me to share feedback or to delve deeper into the discussion.

References

Wikimedia, 2015, https://commons.wikimedia.org/wiki/File:Buckeye_automatic_governor_%28New_Catechism_of_the_Steam_Engine,_1904%29.jpg, accessed on 01.06.2015