On the Use of Job Interview Puzzles

Monty Open Door

[Wikimedia; 2013]

Hiring the right person for the job requires experience, craft, time and money. Even in the most favourable circumstances, determining the best candidate for an opening remains very difficult. The consequences of a wrong hire can be dire, ranging from failing to achieve project objectives to ending up with a long-term incompatibility with the team or the company culture. Because of this, companies invest substantial resources in devising the best possible hiring process. The challenge is to define a repeatable process which, when executed by expert recruiters, leads systematically to good results. That is the theory. Now, how do hiring teams actually proceed with the selection of IT personnel? Methods vary, but one technique has become particularly popular among hyped technology companies: the use of job interview puzzles. This essay focuses on this very technique, with the intent to demystify it and, possibly, determine its value and applicability.

Before we delve deep into the topic, I must define what I mean by puzzle problems. There are many different such problems, but what they have in common is that they are at the same time easy to understand and apparently difficult to solve. Secondly, they cannot be answered using the skills required for the position at hand, except, possibly, creative problem solving. Examples abound on the internet: the interested reader can use the search keywords “sample job interview puzzles” on any search engine, like www.duckduckgo.com.

For the sake of argument, I will consider job interview puzzles as a peculiar kind of test. As such, they can be tuned to minimise either one of the following errors:

  1. False Positive: the error one incurs when a bad candidate is selected.
  2. False Negative: the error one incurs when a good candidate is rejected.

Even though people from HR (or Human Operations, as someone with a lot of imagination is calling them) like to tell the story that they “care for false positives and negatives”, classical logic and common sense advise quite the contrary. One can only optimise a test against false positives or against false negatives, but not both. This is a very well known problem, with applications in several disciplines, including statistics and medicine, to name a few. Medical tests, for instance, are carefully tuned to strike the optimal balance between sensitivity and specificity. Typically, such tests are optimised to minimise the risk that a disease goes undetected. Hence, they can give false positives. It is for this reason that positive tests are usually repeated.

Now, let us come back to our problem space: interview puzzle tests. Here, it is easy to see that puzzles are optimised to minimise false positives: if one solves the given problem (without cheating), it cannot be the case that her cognitive intelligence comes from nowhere. She must have the required degree of cognitive intelligence. Conversely, if someone is unable to produce evidence of the required cognitive intelligence during the interview, she will not make it. But this can have two causes: either the candidate actually fails to meet the set requirements, or she simply had a bad performance for some reason, despite meeting the requirements. Why can a good candidate fail a puzzle test she would have solved correctly in dozens of other circumstances? Well, it depends. She may not have slept the night before because she was excited about the interview of her life. Or she did not sleep enough because she had to catch that early morning flight. Or for any other reason, because she is human and humans fail sometimes. But the hiring team does not care to determine which case it is, perhaps giving her a second chance. Why should they, so long as they have enough good people in the pipeline?
Among them, there will certainly be someone in better shape, able to produce the required evidence. The problem is that it may very well be the case that, among the rejected good candidates, there are some who are a better match than Mr Rambo, who on that particular day was in better shape to solve a puzzle, but happens to be an annoying presence and a nuisance for any team. “Hey, wait a minute”, the recruiter will want to object by now. She will probably advance that: (1) the candidate is purposefully put under stress, to see how she puts up with it, as in a real work situation; (2) one must use a criterion after all, to tell the good from the bad or, in politically correct terms, to tell the best match from the rest. However, I reject both objections and advance that they are untenable. Let us start with (1): if someone fails to produce evidence of her qualities under exceptional circumstances, this is no evidence whatsoever that after a life of successes but one unlucky interview, this candidate is going to start, for this very reason, a series of failures. This is simply a ridiculous argument, and betrays an oversized ego: the recruiter is implicitly advancing that dozens of university professors, former satisfied employers and clients were all wrong about the qualities of this candidate, and the only absolute truth worthy of consideration is the result of the holy puzzle test. So holy and so true that she feels entitled to reject, on these very grounds, a candidate with a history of documented successes (and if she did not have it, why was she invited for an interview in the first place?). I will now proceed to point (2). The argument that a criterion is a criterion neglects such a thing as ethics. Discarding a good candidate is bad for the candidate, even though it may be good for the hiring company, which just considers this an ineliminable part of a process optimised to avoid false positives and, with them, reduce risk.
Ethics mandates that the hiring process shall not place the hiring company in a position of unilateral advantage over the candidate. But optimising the test against false positives does just that. Is there a way out of this clash of viewpoints? Perhaps. There can be one only if the likelihood of rejecting a good candidate is reasonably small, statistically. So, before the last word can be said on the use of puzzles, let us determine the likelihood of the morally uncomfortable mistake. How can we possibly do this? A famous formula will help. I will use Bayes’ Theorem, named after the English mathematician and Presbyterian minister Thomas Bayes (1701-1761) [Wikipedia; 2013]. This is the formula, in its simplest form [Wikipedia; 2013b]:

P(A|B) = P(B|A) · P(A) / P(B)

The meaning is this [Wikipedia; 2013b; Bayesian interpretation]:

In the Bayesian (or epistemological) interpretation, probability measures a degree of belief. Bayes’ theorem then links the degree of belief in a proposition before and after accounting for evidence. (…)

For proposition A and evidence B,

  • P(A), the prior, is the initial degree of belief in A.
  • P(A|B), the posterior, is the degree of belief having accounted for B.
  • the quotient P(B|A)/P(B) represents the support B provides for A.

In our case, I will set:

  • A = “candidate X is a bad match”
  • B = “candidate X has failed the job interview puzzle”

I will now apply Bayes’ Theorem to determine the probability that candidate X is a bad match, given that she has failed the puzzle problem. In order to apply the formula, I must first determine the values of P(B|A), P(A) and P(B). Let us start with P(A), which is the prior probability that candidate X is a bad match. Candidates are only invited to a job interview after a careful screening. Their resume is analysed, and their titles are (assuming the hiring company is a serious one) verified. Further, oftentimes a phone interview precedes the on-site puzzle surprise. If the candidate has a documented brilliant academic record, a history of successes, and validated written references, she very likely deserves, if not the job, at least to be shortlisted. P(A) is the probability that: (1) the academic record is fake, or dozens of university professors have been damn wrong over a prolonged period of time, spanning years; (2) the references are fake, and former managers and clients are not respectable people and wrote false reference letters. Uhm, this is crazy, isn’t it? What can be the prior probability of all this? I would say zero, but I cannot, because that would be tantamount to begging the question. To give an advantage to the puzzle interview true believers, I will generously concede that there may be one case in 1,000. So, we have:

P(A) = 0.1%                [1]

Let us now determine P(B|A), that is, the probability that candidate X fails the test, given that she is a bad match. This probability is fairly high, say 99.5%: it is the probability that someone without the necessary problem solving skills fails a problem specifically designed to stress those skills to the max.

P(B|A) = 99.5%        [2]

Now it is the turn of P(B): the probability that a candidate, whose resume has been screened, whose academic record and references have been checked, and who has successfully passed the preliminary job interview, fails the puzzle test. The fairness of the selection process hinges on the apparently reasonable assumption that P(B) is very, very low. But this assumption is false, because people with the requested endowment of intellectual resources may not be at their best on that particular day: they are likely to have caught an early flight, they are techies, and oftentimes they need time to familiarise themselves with a new environment full of new people. I will again make a generous concession: I will assume that these exceptionally unlucky circumstances materialise in no more than 2% of cases:

P(B) = 2%                  [3]

Now we have got the figures. Let us compute P(A|B) = (0.995 × 0.001) / 0.02 = 0.04975, which is approximately 5%. Wait a moment: we have determined that P(A|B) ≈ 5%. What??? 5%??? This astonishing result is worth a pause. The probability that candidate X, whose resume has been checked successfully, whose academic titles are genuine, whose references are validated, and who has successfully passed a preliminary phone interview, is a bad match, given that she has failed a given job interview puzzle, is about 5%. The puzzle test priests will now complain that they do not actually invest the time to check all these things beforehand, because they can only afford this effort for the shortlisted candidates. After all, they have an infallible collection of puzzle tests, which will filter out bad candidates before HR even thinks of validating their dossier (resume, references, academic record). Sure, but along the way, I contend, they may have filtered out very good candidates as well. True believers will now want to remark, with a pitiful expression, that they are very sorry about that, but their job is to select the best person for the job, not to save unlucky candidates, however good they may be. They are not a charity, after all. The point is that, in so doing, they are running an unethical process which is biased, by design, in favour of the hiring company and against the single applicant. Of course, this selection will find a very good match, but at the cost of potentially discarding more good candidates than ethics would allow.
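The computation above can be sketched in a few lines of Python. The probabilities are, of course, this essay’s own illustrative assumptions, not empirical data:

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem in its simplest form: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# The figures assumed in the text:
p_a = 0.001          # [1] prior: candidate X is a bad match
p_b_given_a = 0.995  # [2] a bad match fails the puzzle
p_b = 0.02           # [3] any screened candidate fails the puzzle

p_a_given_b = posterior(p_b_given_a, p_a, p_b)
print(f"P(A|B) = {p_a_given_b:.5f}")  # P(A|B) = 0.04975, i.e. about 5%
```

Anyone sceptical of the arithmetic can plug in their own priors: even conceding P(A) ten times higher, the posterior stays well below certainty.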

Conclusions

One may be tempted to think that the implications of this essay lead nowhere. One needs criteria to select candidates, however imprecise these may be, the true believer will say. And she will go on with: “imprecision is a fact of life: nobody is perfect. But we are doing our best. If we have made it, you can make it too: your success is only a few puzzle tests away”. But, to be constructive, how can one define an effective hiring procedure which is at the same time ethical, rather than grossly biased in favour of the hiring company? My answer is that puzzle problems can actually be utilised, on condition that a candidate with the right credentials can only be discarded after failing a series of such tests, not just one. A popular interview scheme is this:

A popular, but unethical, interview process

Using the results from this essay, one may think of alternative, more balanced, implementations of the selection process. For example the following:

A revised interview process

The revised process above is only an example, but it contains important amendments with respect to the previous one. For a start, the preliminary resume screening is now in depth for every candidate with the required qualifications and experience. Second, there are two preliminary job interviews rather than one, but there is no single instance of an interview puzzle with the potential to interrupt the ongoing selection of a candidate. Last, if the candidate meets the criteria so far, she is invited for a run of on-site interviews. If one is really in love with puzzle tests, here they can be used: one cannot always be unlucky. But the candidate has the right to spend the night before close to the hiring location, and the hiring company must pay the expenses within reasonable limits. Under this proviso, puzzle tests can be used without constituting an unfair bias of the selection process in favour of the hiring company and against candidates. To sum up, this essay has disproved the following misconceptions:

  • Misconception 1: Job interview puzzles can replace serious resume screening. We have seen that, technically speaking, they can, but at the cost of running an unethical process which is disreputable for a mainstream company and, actually, for any company.
  • Misconception 2: Tests can be optimised at the same time for false positives and false negatives. Here there is little to say. Someone with this crazy belief simply lacks the cultural background to understand what a test is. One can “really care for both false positives and false negatives”, but the act of caring can neither reverse nor cancel the laws of mathematics and common sense.
  • Misconception 3: A candidate failing a puzzle test is a bad match. As we have seen above, Bayes’ Theorem demystified this contention without mercy. This is a plainly wrong belief, and a real shame. Good people deserve to work for better companies than these.

Bibliography

The False Myth of Scientific Management

[Wikimedia; 2013]

Management theories and practices are increasingly focussing on implementing a systematic measurement of “objective” indicators of performance, inspired by the axiomatic tenet “you cannot manage what you cannot measure”. Enterprises are measuring the performance of processes, they are measuring service availability, they are even measuring performance indicators of employees. Measures are everywhere. Such measures are typically referred to in the literature as Key Performance Indicators (KPIs). There are handbooks of KPIs, collections of KPIs, and all this, according to their holy fathers, will eventually free the discipline of management from the “plague” of subjectivity and its alleged worrisome weaknesses. Whenever practical results fail to meet expectations, despite religious adherence to this theory, true believers amend the implementation adding more and more KPIs, requiring and enforcing even stricter allegiance to their dogma.

But however much individuals and organisations are bent to the rule of the wild KPI priests, the likelihood of achieving the promised results keeps looking more like utopia than scientific determinism. In some cases, there is a correlation between the adoption of KPI-based management practices and results, in the sense that such results materialise in a time interval overlapping with the measurement of KPIs. But this is no scientifically acceptable evidence for the existence of a cause-effect relation between the two.

So, the question arises, why should we stick to a theory of management which is so often disproved and which is continuously confronted with exceptions requiring constant defensive explaining from its own priests? First, the theory is not actually disproved, because it is not a scientific theory. Only scientific theories can be disproved[1]. And scientific management is not a scientific theory, as we will see later in more detail. But the point remains, why are we so incredibly fascinated by measures and objectivity? The excerpt below will give us the insight we need:

 “We in the western world in the 21st century are children of science: our lives are dominated by the products of science and by the powers that they place in our hands. We are still biological creatures, but unlike any other species we have stepped out of the environment of nature, and into a new environment of our own making – one created by science and technology. All the primitive physical threats to life – hunger, cold, disease, darkness, distance – have been beaten back, leaving us free to redesign our social lives in ways that would have seemed inconceivable two centuries ago. The visible signs of this mastery are the machines with which we have harnessed the forces of nature, and which have now become for us indispensable tools for living. But these machines are only symbols of something much deeper – of our understanding of the laws of nature, an understanding which has been gained slowly and painfully over thousands of years. Science has been one of the great intellectual quests of human history, but unlike the philosophical quest or the religious quest, it has had consequences that are intensely practical, indeed it has reshaped our lives.”

[Whitfield, Peter; 2012; Kindle Locations 54-61]

Got it? We are fascinated by science, which is the dominating western religion of the 21st century. The reasons for this fascination abound. As the excerpt above has clearly explained, science has had practical consequences on our lives, mostly for the better. And we certainly cannot blame physicists and mathematicians for madmen’s (or politicians’) use of their discoveries.

I need not go through each and every endeavour of science to substantiate my contention that in the West we are fascinated by it. I will just mention a few examples, among thousands. The discovery of antibiotics has saved, and is still saving, millions of lives. But the advances of medical science have also dramatically reduced the consequences of aggressions and wounds:

  • “advances in medical technology since 1970 have prevented approximately three out of four murders”.

[Christensen et al.; 2012; Kindle Locations 2851-2852]

  • “a wound that would have killed a soldier nine out of ten times in World War II, would have been survived nine out of ten times by United States soldiers in Vietnam. This is due to the great leaps in battlefield evacuation and medical care technology between 1940 and 1970.”

[Christensen et al.; 2012; Kindle Locations 2854-2860]

  • “-c. 1840: Introduction in Hungary of washing hands and instruments in chlorinated lime solution reduces mortality due to “childbed fever” from 9.9 percent to .85 percent -c. 1860: Introduction by Lister of carbolic acid as germicide reduced mortality rate after major operations from 45 percent to 15 percent”

[Christensen et al.; 2012; Kindle Locations 2879-2884]

And it was not by chance that the Americans chose to exhibit their superiority by landing on the Moon in 1969. It was a safe bet for them to assume that a public exhibition of the magical powers of American science would impress and subdue the rest of the world for years to come. So, we have plenty of evidence of our cultural bias. Now, how can this bias influence our management practices, up to the point of transforming otherwise brilliant people into dull measurement agents of improbable pseudo-scientific observables? We can find a convincing answer in the use of mathematics, which is the universally recognised language of science. But beware: not the one with a capital ‘M’; only its bare approximation, the arithmetic of the bean counter. The excerpt below, referring to Newton’s Principia Mathematica, again gives invaluable insight:

“Newton’s title is important in another sense too, in that it recognises the role of mathematics in natural philosophy, the sense that nature must be an ordered system, that number, regularity and proportion are built into the fabric of the universe, and that mathematics can provide the key to our understanding of it.”

[Whitfield, Peter; 2012; Kindle Locations 134-136]

So, everything is clear by now: the universal use of KPIs, true believers think, will transform enterprises into “an ordered system” benefiting from the very same “regularity” which is “built into the fabric of the universe”. This is a truly impressive utopia, isn’t it? It is only a pity that the whole theory is based on false metaphysical assumptions. I will now go through the misconceptions which underlie this false myth of scientific management.

Misconception 1: Using figures and measuring KPIs is the same as using the Scientific Method

The Scientific Method can be summarised with reasonable approximation using this sequence of steps:

  1. Define a question
  2. Gather information and resources (observe)
  3. Form an explanatory hypothesis
  4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner
  5. Analyze the data
  6. Interpret the data and draw conclusions that serve as a starting point for new hypotheses
  7. Publish results
  8. Retest (frequently done by other scientists)

[Wikipedia; 2013]

With the scientific method, observation of phenomena leads scientists to formulate hypotheses. These hypotheses include posits about the existence of entities (ontological hypotheses) and about the laws governing them. Why did Newton care for the observables called acceleration, mass and force? Because he posited, and then proved, the existence of a mathematical law relating them to one another. His very well known second law of motion states that F = m·a. Newton’s theory is a typical example of a scientific theory, in that observation of the reproducible phenomenon it describes can confirm (or reject) its validity. If the law were false, a number of documented experimental exceptions would disprove it, determining its withdrawal from the body of recognised scientific knowledge.

The first difference between KPIs and scientific observables comes immediately to mind. KPIs are typically not related to one another by laws like the laws of physics. They are measured with the implicit assumption that they are immediately meaningful on their own. For example, let us assume a manager would like to measure the efficiency of a resolver group. She might want to measure such KPIs as “number of incidents resolved in a day” and “number of incidents whose estimated closing date has been exceeded by more than 20%”. The implicit assumption here is that there exists such an entity as “the efficiency of her resolver group”, and that measuring the KPIs above alone is the same as measuring it. But, wait a moment: we do not have any scientific law here. We have, at most, a very basic intuition. Without a law relating force, mass and acceleration, Newton could just as well have measured the colour of the sky, or counted the ants killed by the falling apple, to determine its mass. When a manager measures KPIs without an experimental law relating them to the entity she seeks to measure, she is doing the same as counting the ants killed by a falling apple to determine its mass.

Let us go further. Scientists measure observables knowing the error function in advance. In other words, the acceleration of a falling body on planet Earth is 9.81 m/s² +/- a given epsilon, that is, the error function. A measure is only relevant when the error range is known in advance. If I know that I am driving at 85 mph +/- 2%, I can safely assume that I am within the 90 mph limit. But if I did not know the error function of the speedometer in my car, how could I possibly determine whether I am within the speed limit or not? It is thanks to the awareness of error ranges that we can trust the cockpits of our cars, and it is thanks to it that car makers generally tune their speedometers to mark speed a bit in excess, to keep drivers on the safe side. Conversely, KPIs are measured without knowledge of their associated error function. Consequently, even if KPIs were actually useful on their own (and as we have seen above, they need not be), measuring them without knowing their associated error function would still make them useless. Saying that one’s team is solving 99% of the incidents within the estimated date, without knowing the error function, is like saying I am driving at 85 mph +/- an unknown amount. For example, there may be a percentage of incidents which are not tracked strictly following the process (not so unlikely). If one does not know what this percentage is, the error function is unknown, and the KPI above has the same precision as my children’s toy speedometer.
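To make the point concrete, here is a minimal sketch, with invented numbers, of how an unknown share of untracked incidents turns a precise-looking KPI into a wide interval:

```python
def true_rate_bounds(measured_rate, untracked_fraction):
    """Worst-case bounds on the true on-time resolution rate when an
    unknown share of incidents bypasses the tracking process entirely.
    The untracked incidents could all be on time, or all be late."""
    tracked = 1.0 - untracked_fraction
    lower = measured_rate * tracked                      # all untracked are late
    upper = measured_rate * tracked + untracked_fraction # all untracked are on time
    return lower, upper

# "99% of incidents solved within the estimated date", but 10% of
# incidents are not tracked following the process (a made-up figure):
lo, hi = true_rate_bounds(0.99, 0.10)
print(f"true rate lies somewhere in [{lo:.1%}, {hi:.1%}]")  # [89.1%, 99.1%]
```

A reported 99% thus tells us only that the true rate is somewhere between roughly 89% and 99%: the toy speedometer, in figures.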

So long as the practice remains for managers to make decisions based on indicators measured in this way, it is no surprise that our companies, our economy and, ultimately, our lives are unnecessarily endangered.

Misconception 2: The Performance is the KPI

Let us assume that the entity to be measured is “the quality of a service desk”. Since the definition is admittedly vague, the poor manager of this service desk will have a hard time when she is assessed (and possibly rewarded) based on her objectively measured improvements. To achieve this, her boss will likely define a bunch of KPIs, which will be regarded as objective evidence. Why objective? Because they are expressed with figures, and this, the big boss will maintain, gives them the special sacrality of absolute mathematical truths. For the simple fact that KPIs are expressed in figures, we are asked to accord them the same respect as if they were, say, Maxwell’s equations, or Peano’s axioms. Now, defenders of this use of KPIs would probably admit that the KPI in itself is obviously not the same as “the quality of the service desk”, just as my shoe size and my health insurance ID are not the same identical entity as the person I look at in the mirror every morning when I shave. However, the argument goes, if the number of open incidents decreases, we know that quality increases; if the average time spent fixing an incident decreases, quality increases. So, even if the measured KPIs are not exactly the same as “the quality of the service desk”, defenders will say, they provide clear and actionable information to help the manager make the right decisions to improve it. Fair enough. However, I still counter this argument, because it implicitly assumes that there is a linear law relating such KPIs with the quality that needs to be optimised. But such a law need not be linear at all. It might very well be polynomial, logarithmic, or what have you. What does it mean?

It means that a manager trying to optimise quality Q by optimising single KPIs related to Q may be improving that quality Q by a very small percentage, depending on the mathematical law regulating the interdependence of Q and the KPI at hand. To conclude, saying that by improving a given KPI k1 we know we are improving quality Q1 is a very imprecise way to measure the effect of k1 on Q1, because the achieved improvement is regulated by a mathematical law which is unknown both to the manager whose performance is assessed and to her assessor. The defender of KPIs could object that measuring something is better than nothing. I reject this stance, because every rational person will agree that, rather than collecting arbitrary numbers and basing decisions on them, it is a lot better not to collect them at all.
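A toy sketch of this point: the same doubling of a KPI translates into very different gains in Q depending on the (unknown) law relating them. The two laws below are arbitrary examples, chosen only for illustration:

```python
import math

def quality_linear(k):
    return 2.0 * k            # Q proportional to the KPI

def quality_log(k):
    return math.log(1.0 + k)  # Q flattens out as the KPI grows

k_before, k_after = 10.0, 20.0  # the KPI k1 is doubled
for law in (quality_linear, quality_log):
    gain = (law(k_after) - law(k_before)) / law(k_before)
    print(f"{law.__name__}: Q improves by {gain:.0%}")
# quality_linear: Q improves by 100%
# quality_log: Q improves by 27%
```

Without knowing which law holds, the manager cannot tell whether doubling the KPI doubled the quality or barely moved it.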

Misconception 3: What you can Measure you can Manage

This is a typical logical fallacy. From the principle “you cannot manage what you cannot measure” it does not follow “what you can measure you can manage”. Let us see why.

Let’s define the following propositions:

Measure(x) = “I can measure x”

Manage(x) = “I can manage x”

Using definitions above, “you cannot manage what you cannot measure” becomes:

not Measure(x) -> not Manage(x)

By the same token, “what you can measure you can manage” becomes:

Measure(x) -> Manage(x)

But from not A(x) -> not B(x) it does not follow that A(x) -> B(x). Rather than use truth tables, I will give simple evidence of this fallacy.

Let us define A(x) = “person x has water to drink” and B(x) = “person x can stay healthy”.

The plain truth “a person without water cannot stay healthy” would be expressed as:

not A(x) -> not B(x)

If A(x) -> B(x) held, we would have “a person with water to drink will stay healthy”. But this is false (one may very well have a lot of water and no food whatsoever). Therefore the implication is mistaken. The correct implication would have been:

(not A(x) -> not B(x) ) -> (B(x)->A(x))

Coming back to our case, we would obtain: “what you can manage you can measure”, which looks like a more reasonable statement.
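The whole argument can also be checked mechanically, by enumerating every truth assignment, for instance in Python:

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

for a, b in product([False, True], repeat=2):
    premise = implies(not a, not b)   # not A -> not B
    contrapositive = implies(b, a)    # B -> A  (the valid conclusion)
    assert premise == contrapositive  # equivalent on every assignment

# The water example is a counterexample to the fallacy A -> B:
# A = "has water to drink" is true, B = "stays healthy" is false.
a, b = True, False
assert implies(not a, not b)  # the premise holds...
assert not implies(a, b)      # ...but the fallacious converse fails
print("not A -> not B is equivalent to B -> A, not to A -> B")
```

Four assignments suffice to show the equivalence with the contrapositive, and a single one to refute the converse.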

Conclusions

This essay is an attack on the use of the adjective “scientific” as a qualifier for the discipline of management. Admittedly, my almost exclusive focus on KPIs may have led the reader into believing that it is an attack on KPIs, which it is not. I would really like to clarify that this essay is not against KPIs; it is against the pretence that their use can lend the adjective “scientific” to a non-scientific discipline like management. Again, not being scientific need not be a disadvantage, or a defect. It is only that the Western children of science cannot admit to being subjective sometimes, to acting based on gut feelings and other intangible elements. Let alone horoscopes and the like. But they do. They are humans like you and I. To be precise, the expression Scientific Management, or Scientific Business Management, is an oxymoron and should rather be re-defined as Technical Business Management. KPIs are (sometimes useful) techniques upon which one can indeed build a framework for managing a business. And, possibly, a successful one. So, why all this fuss about an apparently negligible imprecision like using the adjective “scientific” rather than “technical”? Because by using “scientific”, managers demand a degree of respect and allegiance which is due to Science with a capital “S”. But they do not deserve it. They don’t, because:

  • They are not scientists, and their acts are not supervised and controlled by their peers in the way scientists’ work is in institutions like universities, which have hundreds of years of experience in the verification of research work.
  • They base decisions on KPIs, which can impact jobs, revenue and other critical aspects, and they cannot hide behind the alleged scientific nature of their data. A decision is an act of free will and, as such, implies full accountability, for the good or the bad. No assumed determinism can diminish personal accountability and moral responsibility.
  • Scientific truths are neither good nor bad. Their uses are. KPIs, too, are neither good nor bad, but their use is. So, using KPIs does not convey the moral agnosticism of a mathematical truth to a manager’s decision. No way.

To conclude, there are very useful techniques which may help achieve better results in business management. These techniques may be good or may be bad, depending on how they are used. But please, do not call this Science. It isn’t, I’m sorry.

Bibliography

Christensen et al.; 2012. Christensen, Loren and Grossman, Lt. Col. Dave, “Evolution of Weaponry: A brief look at man’s ingenious methods of overcoming his physical limitations to kill”, Kindle Edition, 2012-08-21.

Thomas S. Kuhn; 1996. The Structure of Scientific Revolutions, Third Edition, The University of Chicago Press, ISBN 0-226-45808-3.

Whitfield, Peter; 2012. “The History of Science”, Naxos Audiobooks. Kindle Edition, 2012-07-26.

Wikimedia; 2013, https://commons.wikimedia.org/wiki/File:Mad_scientist_caricature.png, accessed 5 October 2013; this file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported

Wikipedia; 2013, http://en.wikipedia.org/wiki/Scientific_method, accessed 2 October 2013


[1] Actually, the fact that scientific theories can be disproved does not imply that a single exception may cause their dismissal. Rather, it is a slow process, especially for “mainstream” scientific theories:

“once it has achieved the status of paradigm, a scientific theory is declared invalid only if an alternate candidate is available to take its place. […] That remark does not mean that scientists do not reject scientific theories, or that experience and experiment are not essential to the process in which they do so. But it does mean - what will ultimately be a central point - that the act of judgement that leads scientists to reject a previously accepted theory is always based upon more than a comparison of that theory with the world.”

[Thomas S. Kuhn; 1996; pp. 77-91]

Clearly, this does not diminish the import of my contention. It only says that, in order to succeed in dismissing the religious use of KPIs, there need to be sufficient documented counterexamples, and an alternative theory. This may take time, but it is not an impossible challenge.

The Not So Simple World of User Requirements

User requirements are oftentimes expressed in natural language, especially in the early phases of a project, and this fact has led many practitioners into believing that requirements management is a simple activity. It is so simple, they think, that all one needs to do is to write them down once and for all at the beginning, and go straight to the realization phase.

We are so used to natural language that when we write requirements, we sometimes forget that good requirements must have certain qualities like consistency, completeness, and computability, to name a few. But these concerns are usually ascribed to the realm of more formal specifications and are otherwise often neglected.

The complexity of requirements management is sometimes dismissed as a “philosophical topic”, with the implied negative meaning that it is purely theoretical and has no practical consequences. But this opinion, as it happens, is fairly wrong. It is wrong because, as everyone with experience knows, requirements mistakes are particularly difficult to fix in the later phases of a project, and the implied costs increase very fast.

This essay is aimed at giving a high-level overview of requirements management, with a focus on those aspects which can help clarify some of the most commonly held and, sadly, dangerous misconceptions on this delicate topic. First, a definition of user requirements is given. Next, the concept of requirements elicitation is introduced. After that, I will introduce Requirements Analysis and I will briefly touch on Requirements Engineering. These elements will allow me to progress with the explanation of the paradigm shift from sequences of project phases to iterations. In the conclusions I will summarise some consequences of this essay, in terms of misconceptions disproved.

Before I proceed, I would like to give a formal definition of what I mean exactly when I use the term user requirements.

What are User Requirements

User requirements are statements expressing desired qualities of a system to be realised. They are the primary means of describing what the system shall do, and how it shall behave. Requirements can be expressed by people representing different viewpoints, and this is reflected in their granularity and content. There can be general requirements, described in terms of business objectives to be achieved, services to be activated or revamped, and so on. This class of requirements is typically articulated by sponsors. A sponsor is someone who has a problem to solve, and is ultimately accountable for it. But there can also be other kinds of requirements, expressed by different stakeholders[1] from their specific viewpoint. The granularity of such requirements varies, and ranges from technical requirements to security requirements, legal requirements, availability requirements (SLA[2]s), and so on and so forth. All viewpoints are relevant for the successful realisation of the requested system.

Requirements Elicitation

“Over the years we have fooled ourselves into believing that users know what the requirements are and that all we have to do is interview them. It is true that the systems we build should support users and that we can learn about user interaction from the users themselves. However, it is even more important that systems support the mission for which they are built”[3].

The fact that someone has a problem to solve does not imply in any way that they are able to articulate it by writing, for example, a requirements document. The reason is simple: writing requirements is already a first step into the solution space, and someone with a problem oftentimes does not have a clue how to solve it. There can of course be exceptions but, as a general rule, one must be prepared to be confronted with initial requests which are rather unqualified and imprecise. This implies that requirements engineers should not be passive receivers of someone else’s requests, but be actively involved in their definition. This is why the expression “requirements elicitation” recurs in the literature. What does that mean? It means that gathering requirements is more of an interactive negotiation than a static exchange of information. The difficulty in clarifying and interpreting written requests is familiar to everybody who has ever been part of one of those never-ending e-mail exchanges. Dozens of replies are quickly generated until eventually there is so much confusion that the only viable option remains a face-to-face meeting with the relevant counterparts. When it comes to discussing user requirements, a similar communication pattern applies. Hence, requirements cannot simply be gathered, or written. They need to be elicited. To sum up, requirements elicitation is the process whereby an initial high-level request is iteratively improved by adding details, completing missing parts, resolving ambiguities, and renegotiating unsatisfiable wishes. Such a process requires two parties: a stakeholder and a requirements engineer. The skill of the requirements engineer lies in putting the stakeholder in a position to express their viewpoint with clarity. In this respect, it is important to notice that requirements elicitation cannot be done focussing exclusively on the logical level. This can of course be useful for the identification of general gaps.
But if one is to pursue truly successful requirements elicitation, a deep understanding of, and experience with, the class of systems to be realised is necessary. Contrary to what some may think, there is little room here for generalists. What is the reason for this? The reason is the existence of implicit requirements.

Implicit Requirements

When one has to specify the attributes of a system, the description must end at some point. One must be allowed to take some obvious truths for granted. Naturally, the fact that some things may safely remain unsaid, or unwritten, depends on several strict conditions: for example, that the requested solution belongs to a category of systems having a well-known set of general qualities, so common within a cultural group that they can safely be left implicit. If one buys a car, the fact that there be a steering wheel, or seats, or what have you, is a safe assumption. It would not be as safe an assumption if one were to specify the requirements of a car like the one you and I drive every day to a visiting Martian. But Martians are not the only group one can imagine having a different cultural bias than one’s own. There is clearly no scientific way to pre-determine the degree of cultural affinity between the requestor and the requirements engineer. Hence, there are no golden rules concerning the determination of what can safely be left unsaid. Except one. Ask, ask, ask. If the requirements engineer is knowledgeable about the solution domain, she can elicit the specification of the things unsaid which can make a difference. Elicitation is all about posing the right questions. The requestor may not even be aware of the existence of certain qualities, or may simply take it for granted that the engineer shares her judgement in specifying them. Let us consider the case of someone wanting to buy a car, say John. John goes to a nearby car dealer and asks for a comfortable car with an automatic gearbox. He adds that his favourite maker is “ABC”. The salesman, say Derek, asks: “Which kind of gearbox would you like?”. John: “An automatic one”. Derek: “I see. If you prefer a sporty gearbox, I advise you to consider a double clutch.
If you like a more relaxed driving experience, you could go for a traditional gearbox with a torque converter. ABC allows you to choose either one”. John: “In this case I prefer a traditional automatic gearbox.”

In this example, had Derek not asked John the question about the kind of automatic gearbox, John would not have had all the elements to make an informed decision. It was clearly not his fault that he neglected to specify this detail. He was simply unaware of the existence of these two distinct kinds of automatic gearbox, even though he was absolutely clear about his wish to drive a car with automatic transmission. Derek’s question allowed John to ponder how best to qualify his request, and to bring it closer to his actual wish. Could a good requirements engineer without car knowledge have done as good a requirements elicitation as Derek’s? You judge. Admittedly, the example above might seem far-fetched. But consider a question like “Would you like the new system to connect to the existing user registry, or to use its internal one?”. This may sound more familiar. And in the same way as John need not be a car expert to buy a car, a project sponsor need not be an expert in LDAP to require a new CRM.

Requirements Analysis

In the initial phase of requirements elicitation, the best way to formalise user requirements is to use natural language. Such a choice has several advantages. First, it allows for ease of understanding by a broader range of stakeholders, who may not have an education in formal systems or languages. Second, it shortens revision cycles with diverse stakeholders, making it easier to reach consensus. The downside, on the other hand, is ambiguity. Requirements elicitation can actually detect and fix only part of the inconsistencies or ambiguities, because its focus is more on completeness. Consequently, after the requirements have been elicited, it is necessary to carefully analyse them, and clarify all the sources of equivocation. Requirements analysis is aimed at translating the initial specification of qualities into a clear, consistent, and realistic description of the desired system.

Requirements analysis is a delicate step and encompasses several aspects, including:

  1. delimiting the perimeter of the system to be realised
  2. detecting and resolving ambiguous statements
  3. assessing the overall consistency of the desired qualities
  4. determining the existence of a computable solution to the request

Understanding the boundaries of the solution is a fundamental way to reduce risk and manage expectations. Systems do not exist in a vacuum; they are embedded in an environment composed of other technology artifacts and human beings. The solution is likely to require interaction with some of them, and the delimitation of its perimeter helps identify the interfaces with precision.

Natural language, unlike computer languages, is ambiguous, and that is both its beauty and the main constituent of its expressiveness. It is indeed unfortunate that, when it comes to budgets and schedules, the results of a misunderstanding or semantic ambiguity are a lot less fascinating, or funny, than in other contexts like human relations or poetry. An unclear requirement is very likely to cause delays, budget overruns, unrealistic expectations and, in some cases, loss of jobs. For these reasons, clarification of user requirements is probably the wisest thing a requirements engineer can do. Since misunderstanding of requirements tends to generate a snowball effect, timeliness is key to resolving a dangerous situation before it is too late to mitigate its consequences.

Requirements express wishes and, as such, may suffer from the inconsistencies of all things human. Inconsistencies may be apparent or intrinsic. In the former case, requirements express qualities which are often irreconcilable, but not in the particular case at hand, where they can actually be pursued together. An example follows. Let us assume a manager submitted the following requirement to the hiring agency: “I would like to hire software engineers with outstanding productivity, able to write software with very few defects”. High productivity is oftentimes accompanied by poor quality, and this happens in so many cases that one may be tempted to assume it is also true for software engineering. However, studies reveal that it is not quite so. Actually, the most productive engineers write software with fewer defects[4]. In the latter case, that is, intrinsic inconsistency, there is little else to do than renegotiate the requirements with their author. With luck, the negotiation succeeds, and a fixed version of the requirements document is released. In the worst case, the negotiation fails, and it is necessary to escalate the problem. Ultimately, if an unsatisfiable request persists, the best thing to do is to decline the assignment. If the requirements engineer is not in a position to do so, this can easily translate into a very dangerous situation, in which a project team is held hostage in pursuit of an impossible dream that is bound to be hugely expensive.
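
For small sets of requirements, the distinction between apparent and intrinsic inconsistency can even be mechanised. The sketch below is a toy illustration only: the quality names and constraints are hypothetical, and real requirements of course carry far richer semantics than boolean satisfiability.

```python
from itertools import product

# Hypothetical desired qualities, modelled as boolean variables.
QUALITIES = ["fast_response", "full_audit_logging", "runs_on_legacy_hw"]

# Stakeholder wishes and domain knowledge, as predicates over an assignment.
CONSTRAINTS = [
    lambda a: a["fast_response"],            # sponsor: must be fast
    lambda a: a["full_audit_logging"],       # auditor: must log everything
    # domain knowledge: full logging on legacy hardware is too slow
    lambda a: not (a["fast_response"]
                   and a["full_audit_logging"]
                   and a["runs_on_legacy_hw"]),
]

def consistent(constraints, qualities):
    """Brute-force check: is there an assignment satisfying all constraints?"""
    for values in product([True, False], repeat=len(qualities)):
        assignment = dict(zip(qualities, values))
        if all(c(assignment) for c in constraints):
            return assignment  # a witness: the qualities are reconcilable
    return None                # intrinsic inconsistency: renegotiate

witness = consistent(CONSTRAINTS, QUALITIES)
# A witness exists: dropping the legacy hardware reconciles
# fast response with full audit logging.
```

Here the check exposes the conflict as apparent rather than intrinsic: relaxing a single constraint (the legacy hardware) yields a satisfiable set, which is exactly the kind of material one brings back to the stakeholders for renegotiation.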

When ambiguities are resolved, the perimeter of the solution is defined, and the consistency of the requirements is assessed, one may be tempted to think everything crucial has been taken care of. However, that is not always the case. The reason is that solutions based on software artifacts require the existence of an algorithm capable of solving the problem described by the requirements. If one thinks about such beautiful pieces of software engineering as Adobe Photoshop, Google Maps, search engines, avionic applications, or artificial intelligence, it could seem that everything is theoretically possible in software. However, this popular belief is deeply mistaken. What a computer, or automaton of any present or future kind, can do is execute computable functions. At this point, some readers will ask themselves: are not all functions computable? Well, the answer is a definite no. The explanation of this important result may prove difficult for those readers new to mathematical logic, but the important thing to grasp is only its essence, leaving details to professional mathematicians or philosophers. The fact remains that not all problems admit of an algorithmic solution, and the requirements engineer needs to familiarise herself with this important truth.

For those who wish to know more, I will briefly sketch the argument. The details are very well presented in mathematical logic books[5]. The following argument can be skipped without compromising the understanding of the remainder of this essay.

The argument starts with the definition of an effectively computable function:

A function is effectively computable if there are definite, explicit rules by following which one could in principle compute its value for any given arguments[6].

Mathematician and father of computer science Alan Mathison Turing introduced an idealised computing machine, named after him: the Turing Machine. The formal definition of such an automaton (which I omit here for the sake of simplicity) easily shows that all Turing-computable functions are effectively computable; Turing’s thesis is that all effectively computable functions are Turing-computable [Boolos et al; 2003; page 33]. Consequently, if Turing’s thesis holds true, a function is effectively computable if-and-only-if it is Turing-computable. An essential property of the set of all Turing machines is that it is enumerable. This intuitively means that there are as many Turing machines as there are natural numbers: infinitely many, but of a kind of infinity which is of a lesser degree than the infinity of all definable functions, computable or otherwise[7].
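
The nonenumerability result rests on Cantor’s diagonal argument, which can be illustrated with a small sketch. Assume, for contradiction, that all functions from positive integers to positive integers could be listed; the “diagonal” function d(n) = f_n(n) + 1 then differs from every listed function at some argument, so the list cannot be complete. The three sample functions below are, of course, only a stand-in for such a hypothetical enumeration.

```python
def make_enumeration():
    """A sample (necessarily incomplete) list of functions on positive integers."""
    return [
        lambda n: n,      # f_1: identity
        lambda n: 2 * n,  # f_2: doubling
        lambda n: n * n,  # f_3: squaring
    ]

def diagonal(enumeration):
    """Cantor's diagonal: a function differing from the n-th function at n."""
    def d(n):
        return enumeration[n - 1](n) + 1
    return d

fs = make_enumeration()
d = diagonal(fs)
for i, f in enumerate(fs, start=1):
    assert d(i) != f(i)  # d escapes every function in the list
```

Since this construction works against any proposed enumeration, no complete enumeration of such functions exists; and because the Turing machines are enumerable, some of these functions must be non-computable.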

This is a remarkable result, because it leads to the philosophical consequence that human beings are able to define infinitely more problems than the ones which admit of an algorithmic solution. One could object that, even though this may be an amusing, if weird, topic of mathematical logic, the practical consequences on requirements elicitation are close to null. However, there is no evidence whatsoever that all non-computable functions need be devoid of practical interest[8]. For all we know, we may be subject to an amazingly high number of requests for which no computer program can implement a solution. The bottom line is that, when confronted with user requirements, one must be careful before saying the fateful “yes, we can do it”.
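
The classic example of a practically interesting non-computable problem is the Halting Problem mentioned in footnote [8]. The following sketch of Turing’s argument is, again, only illustrative: it assumes a hypothetical total oracle `halts(f, x)` and shows how any concrete candidate (here, one that always answers “no”) is refuted by a self-referential program.

```python
def make_troublemaker(halts):
    """Given a claimed halting oracle, build the program that refutes it."""
    def troublemaker(f):
        if halts(f, f):    # if f(f) is predicted to halt...
            while True:    # ...run forever instead
                pass
        return "halted"    # ...otherwise, halt immediately
    return troublemaker

# Whatever halts(troublemaker, troublemaker) answers is wrong:
#   True  -> troublemaker(troublemaker) loops forever
#   False -> troublemaker(troublemaker) halts at once
# We can exhibit the refutation against a concrete candidate oracle:
always_no = lambda f, x: False
t = make_troublemaker(always_no)
result = t(t)  # halts, contradicting the oracle's answer "False"
```

An oracle answering True would instead send t(t) into the infinite loop (so it is not executed here); either way, no total `halts` can answer correctly on its own troublemaker, which is why no program can decide halting in general.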

Requirements Engineering

Once requirements have been clearly articulated and their elicitation has reached an advanced state, requirements engineering begins. This process is about translating descriptions of qualities expressed in natural language into a more structured form, with the intent to come closer to the precision and clarity of a mathematical description. In other words, requirements engineering is aimed at filling the gap between the world of humans and the world of automatons. The link between these two distant worlds, the means of bridging these apparently irreconcilable ontologies, is Use Case analysis. The term Use Case is now very popular, and it is used in many ways. However, contrary to what too many practitioners nowadays believe, it is not a general term anyone can use light-heartedly. It is a technical term first defined with precision by Dr. Ivar Jacobson in his seminal book “Object-Oriented Software Engineering”[9]. Other uses of the term only contribute to the unnecessary confusion with which recent literature is all too often plagued. This essay does not aim at giving yet another explanation of what Use Case analysis is. Excellent sources abound[10]. Practical approaches have also emerged which can bring a lot of added value to the practising requirements engineer. Most notably, the web site http://www.volere.org provides a comprehensive collection of resources, ranging from a thorough requirements specification template to business analysis resources.

In the remainder of this essay we will see how, in reality, the steps described above do not occur as an ordered sequence. Rather, they are part of an iterative cycle. This fact has been recognised relatively recently, and its past neglect has been a source of much distress and failure in project management practices.

From Sequences to Iterations: the Paradigm Shift

In the early days of methodologies, a commonly-held belief was that requirements can and should be fixed at the beginning of a project, and rarely, if at all, amended in subsequent phases. The rationale for this was that, on the one hand, there was a superficial and simplified understanding of how humans interact in real life and, on the other hand, changes in requirements were identified as a primary cause of project failure in terms of budget overruns or missed deadlines. Several influential authors in the area of methodologies had an engineering background and applied concepts derived from formal methods to the modelling of communication between people. With this unambiguous but simplistic semantics of human-to-human communication, gathering requirements was conceived of as little more than passing a message from stakeholder A (the requestor) to stakeholder B (the requirements engineer[11]). This message-exchange metaphor was reflected in the most influential method of the time: the Waterfall method, introduced by Winston Royce in the seventies. A high-level view of this model is provided below.

The picture above clearly shows that the definition of requirements was intended as the first phase of a project, on which all that was to come next had to rely, with religious faith in their absolute correctness. The assumption was that, if the requirements are correct, the design will also be correct and, therefore, so will the implementation. Certainly, the economic context of the seventies was less volatile than it has become in the subsequent age of the information revolution and the economic crisis. Even so, the assumption that the external context could be considered frozen once the requirements had been formally accepted was doomed. The fact is that requirements are not mathematical axioms, existing in the pure and incorruptible world of ideas. Requirements are things human and, like all things of this kind, they are informed by the context in which they have been defined. The consistency of a set of requirements is of course a desired attribute, because without it one could never do such a thing as project management. However, this desired stability can only be achieved in practice over relatively short periods. The reasons vary, and may, for example, include:

  • changes in constraints
  • changes in macroeconomic context
  • new technology options become available, which impact the solution
  • the technology or approach chosen proves not up to expectations
  • competitors make an unexpected move
  • increased awareness of the business objectives
  • change of priorities

In recognition of the need for a cyclical renegotiation and re-verification of initial requests and assumptions, more recent methods have introduced the concept of iterations. The Unified Method is an example of a thorough process which embodies the concept of iterative-incremental development. Such a method was first described in [Jacobson et al; 1998]. The picture below shows that requirements are dealt with not only during project inception, but also in all subsequent phases.

There is of course more requirements engineering in the first project phases, but adjustments take place all the way through the transition phase. This model is a lot more realistic and is based on a deeper understanding of the dynamics of communications between stakeholders during the project life-cycle. A further evolution in the understanding of the requirements life cycle is embodied in TOGAF ADM (The Open Group Architecture Framework Architecture Development Method).

The TOGAF ADM is centered around requirements management, and emphasises the separation of concerns which is typical of the distinct phases of the architecture life cycle. The strength of this method is that, contrary to its predecessors, it makes the cyclical nature of requirements explicit, and accounts for the cultural aspects of an enterprise, which before the emergence of TOGAF had traditionally been neglected. The import of this innovation is evident if we consider that, oftentimes, what determines the viability of an option, and what the constraints to the solution space are, is to be found more in the history and culture of an enterprise than in the requirements document.

In recognition of the criticality of these aspects, TOGAF gives them first-class attention, to the effect that an architecture principles document is in the list of defined architecture artifacts to be produced prior to proceeding with the solution design. The existence of artifacts like the architecture principles document contributes to the formal specification of otherwise implicit requirements, making them relevant for the development of the enterprise architecture. Making cultural aspects explicit has the beneficial effect of reducing risk, because the overall consistency of the requirements can now be checked against an augmented requirements specification, dramatically reducing the need for making assumptions. As everyone knows, excessive use of assumptions is a symptom of scarce information, and is a primary source of risk. Fewer assumptions, less risk.

Conclusions

Requirements management is oftentimes dismissed as a simple activity which can easily be settled once and for all at the beginning of a project. In actual fact, it is not like this at all. As we have seen in this essay, commonly held beliefs are plagued with dire misconceptions, which are a primary cause of much failure and distress among project management practitioners and architects alike. I will recap the two main misconceptions disproved above:

  • Misconception nr 1: Everything can be done in software
  • Misconception nr 2: Since changes in requirements are a reason for project failure, the only way to avoid the problem is to define them once and for all, and never change them until the end of the project.

Bibliography

Boolos et al; 2003        Boolos G. S., Burgess J. P., Jeffrey R. C. “Computability and Logic”, 4th edition, Cambridge University Press, 2003, ISBN 0-521-00758-5.

DeMarco and Lister; 1999        DeMarco T. and T. Lister. “Peopleware. Productive Projects and Teams”. Dorset House Publishing. 1999. ISBN 0-932633-43-9

Jacobson et al; 1993        Jacobson I, Christerson M., Jonsson P. and Övergaard G. “Object-Oriented Software Engineering: A Use-Case Driven Approach”, Reading. MA: Addison Wesley Longman, 1992, (Revised 4th printing, 1993).

Jacobson et al; 1998        Jacobson I, Booch G., Rumbaugh J. “The Unified Software Development Process”, Addison Wesley Longman, 1998, ISBN 0-201-57169-2


[1] “Stakeholders are anyone who has an interest in the project. Project stakeholders are individuals and organizations that are actively involved in the project, or whose interests may be affected as a result of project execution or project completion. They may also exert influence over the project’s objectives and outcomes.” [https://en.wikipedia.org/wiki/Project_stakeholder; accessed 24 May 2013]

[2] Service Level Agreements

[3] [Jacobson et al; 1998; p. 112]

[4] [DeMarco and Lister; 1999; p. 47]

[5] “Computability and Logic”, 4th edition, G. S. Boolos, J. P. Burgess, R. C. Jeffrey, Cambridge University Press, 2003, ISBN 0-521-00758-5.

[6] [Boolos et al; 2003; p. 23]

[7] It can be proved that the set of functions from positive integers to positive integers is nonenumerable. [Boolos et al; 2003; page 35]

[8] For the interested reader, I recommend reading the explanation of the “Halting Problem”. [Boolos et al; 2003; pp. 35-40].

[9] [Jacobson et al; 1993]

[10] For instance [Jacobson et al; 1998]

[11] In those years, the term used was “Business Analyst”.