On the Use of Job Interview Puzzles

[Figure: Monty Open Door] [Wikimedia; 2013]

Hiring the right person for the job requires experience, craft, time and money. Even in the most favourable circumstances, determining the best candidate for an opening remains very difficult. The consequences of a wrong hire can be dire, ranging from a failure to achieve project objectives to a long-term incompatibility with the team or the company culture. Because of this, companies invest substantial resources in devising the best possible hiring process. The challenge is to define a repeatable process which, when executed by expert recruiters, systematically leads to good results.

This is the theory. Now, how do hiring teams actually proceed with the selection of IT personnel? Methods vary, but one technique has become particularly popular among hyped technology companies: the use of job interview puzzles. This essay focuses on this very technique, with the intent to demystify it and, possibly, determine its value and applicability.

Before we delve deep into the topic, I must define what I mean by puzzle problems. There are many different such problems, but they have two things in common. First, they are at the same time easy to understand and apparently difficult to solve. Second, they cannot be answered using the skills required for the position at hand, except, possibly, creative problem solving. Examples abound on the internet. The interested reader can enter the keywords “sample job interview puzzles” in any search engine, like www.duckduckgo.com.

For the sake of argument, I will consider job interview puzzles as a peculiar kind of test. As such, they can be tuned to minimise either one of the following errors:

  1. False Positive: the error one incurs when a bad candidate is selected.
  2. False Negative: the error one incurs when a good candidate is rejected.
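
The two error types can be made concrete with a small sketch. The candidate scores, the `CANDIDATES` list and the `count_errors` helper below are hypothetical and serve only as an illustration: each candidate has a puzzle score and a (normally unknowable) flag saying whether she is a good match, and a pass mark decides who is selected.

```python
# Hypothetical data: (puzzle score, is_good_match) for eight candidates.
CANDIDATES = [
    (9, True), (8, True), (7, False), (6, True),
    (5, True), (4, False), (3, True), (2, False),
]


def count_errors(candidates, pass_mark):
    """Return (false_positives, false_negatives) for a given pass mark."""
    # False positive: a bad candidate who reaches the pass mark.
    false_positives = sum(
        1 for score, good in candidates if score >= pass_mark and not good
    )
    # False negative: a good candidate who falls below the pass mark.
    false_negatives = sum(
        1 for score, good in candidates if score < pass_mark and good
    )
    return false_positives, false_negatives


for pass_mark in (3, 6, 8):
    fp, fn = count_errors(CANDIDATES, pass_mark)
    print(f"pass mark {pass_mark}: {fp} false positives, {fn} false negatives")
```

With this made-up data, raising the pass mark removes false positives only by adding false negatives, and lowering it does the opposite.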

Even though people from HR (or Human Operations, as someone with a vivid imagination calls them) like to tell the story that they “care for false positives and negatives”, classical logic and common sense advise quite to the contrary. One can only minimise either false positives or false negatives, but not both. This is a very well-known problem, and it arises in several sciences, including statistics and medicine, to name a few. Medical tests, for instance, are carefully tuned in order to strike the optimal balance between sensitivity and specificity. Typically, such tests are optimised to minimise the risk that a disease goes undetected. Hence, they can give false positives. It is for this reason that positive tests are repeated.

Now, let us come back to our problem space: interview puzzle tests. Here, it’s easy to see that puzzles are optimised to minimise false positives: if one solves the given problem (without cheating), her solution cannot have come from nowhere. She must have the required degree of cognitive intelligence. Conversely, if someone is unable to produce evidence of the required cognitive intelligence during the interview, she will not make it. But this can have two causes: either the candidate actually fails to meet the set requirements, or she simply performed badly for some reason, despite meeting the requirements.

Why can a good candidate fail a puzzle test she would have solved correctly in dozens of other circumstances? Well, it depends. She may not have slept the night before because she was excited about the interview of her life. Or she did not sleep enough because she had to catch that early morning flight. Or for any other reason, because she is human and humans fail sometimes. But the hiring team does not care to determine which case it is, and perhaps give her a second chance. Why should they, so long as they have enough good people in the pipeline?
Among them, there will certainly be someone in better shape, able to produce the required evidence. The problem is that, among the rejected good candidates, there may very well be some who are a better match than Mr Rambo, who on that particular day was in better shape to solve a puzzle but happens to be an annoying presence and a nuisance for any team.

“Hey, wait a minute”, the recruiter will want to object by now. She will probably argue that: (1) the candidate is purposefully put under stress, to see how she copes with it, as in a real work situation; (2) one must use some criterion after all, to tell the good from the bad or, in politically correct terms, to tell the best match from the rest. However, I reject both objections as untenable.

Let us start with (1): if someone fails to produce evidence of her qualities under exceptional circumstances, this is no evidence whatsoever that, after a life of successes and a single unlucky interview, this candidate is going to start, for this very reason, a series of failures. This is simply a ridiculous argument, and it betrays an oversized ego: the recruiter is implicitly claiming that dozens of university professors, satisfied former employers and clients were all wrong about the qualities of this candidate, and that the only absolute truth worthy of consideration is the result of the holy puzzle test. So holy and so true that she feels entitled to reject, on these very grounds, a candidate with a history of documented successes (and if the candidate did not have one, why was she invited for an interview in the first place?).

I will now proceed with point (2). The argument that a criterion is a criterion neglects such a thing as ethics. Discarding a good candidate is bad for the candidate, even though it may be good for the hiring company, which just considers this an ineliminable part of a process optimised to avoid false positives and, with them, reduce risk.
Ethics mandates that the hiring process shall not place the hiring company in a position of unilateral advantage over the candidate. But optimising the test against false positives does just that. Is there a way out of this clash of viewpoints? Perhaps. There can be one only if the likelihood of rejecting a good candidate is reasonably small, statistically. So, before the last word can be said on the use of puzzles, let us determine the likelihood of the morally uncomfortable mistake.

How can we possibly do this? A famous formula will help. I will use Bayes’ Theorem, named after the English mathematician and Presbyterian minister Thomas Bayes (1701-1761) [Wikipedia; 2013]. This is the formula, in its simplest form [Wikipedia; 2013b]:

P(A|B) = P(B|A) · P(A) / P(B)

The meaning is this [Wikipedia; 2013b; Bayesian interpretation]:

In the Bayesian (or epistemological) interpretation, probability measures a degree of belief. Bayes’ theorem then links the degree of belief in a proposition before and after accounting for evidence. (…)

For proposition A and evidence B,

  • P(A), the prior, is the initial degree of belief in A.
  • P(A|B), the posterior, is the degree of belief having accounted for B.
  • the quotient P(B|A)/P(B) represents the support B provides for A.

In our case, I will set:

  • A = “candidate X is a bad match”
  • B = “candidate X has failed the job interview puzzle”

I will now apply Bayes’ Theorem in order to determine the probability that candidate X is a bad match, given that she has failed the puzzle problem. In order to apply the formula, I must first determine the values of P(B|A), P(A) and P(B).

Let’s start with P(A), which is the prior probability that candidate X is a bad match. Candidates are only invited to a job interview after a careful screening. Their resume is analysed, and their titles are (assuming the hiring company is a serious one) verified. Further, oftentimes a phone interview precedes the on-site puzzle surprise. If the candidate has a documented brilliant academic record, a history of successes, and validated written references, she is very likely to deserve, if not the job, at least to be shortlisted. P(A) is the probability that: (1) the academic record is fake, or dozens of university professors have been damn wrong over a prolonged period of time, spanning years; (2) the references are fake, and former managers and clients are not respectable people and wrote false reference letters. Uhm, this is crazy, isn’t it? What can be the prior probability of all this? I would say zero, but I cannot, because that would be tantamount to begging the question. To give an advantage to the puzzle interview true believers, I will generously concede that there may be one case in 1,000. So, we have:

P(A) = 0.1%    [1]

Let us now determine P(B|A), that is, the probability that candidate X fails the test if she is a bad match. The probability is fairly high, say 99.5%: it is the probability that someone without the necessary problem-solving skills fails a problem specifically designed to stress them to the max.

P(B|A) = 99.5%    [2]

Now it’s the turn of P(B): the probability that a candidate whose resume has been screened, whose academic record and references have been checked, and who has successfully passed the preliminary job interview, fails the puzzle test. The fairness of the selection process hinges on the apparently reasonable assumption that P(B) is very, very low. But this assumption is false: people with the requested endowment of intellectual resources may not be at their best on that particular day; they are likely to have caught an early flight; they are techies, and oftentimes they need time to familiarise themselves with a new environment full of new people. I will again make a generous concession, and grant that these exceptionally unlucky circumstances will not materialise in more than 2% of cases:

P(B) = 2%    [3]

Now we have got the figures. Let us compute P(A|B) = (0.995 × 0.001) / 0.02 = 0.04975, which is approximately 5%. Wait a moment, we have determined that P(A|B) = 5%. What??? 5%??? This astonishing result is worth a pause: the probability that candidate X, whose resume has been checked successfully, whose academic titles are genuine, whose references are validated, and who has successfully passed a preliminary phone interview, is a bad match, given that she has failed a given job interview puzzle, is about 5%. In other words, under the figures above, about 95 out of 100 screened candidates who fail the puzzle are actually good matches.

The puzzle test priests will now complain that they did not actually invest the time to check all these things beforehand, because they can only afford this effort for the shortlisted candidates. After all, they have an infallible collection of puzzle tests, which will filter out bad candidates before HR even thinks of validating their dossier (resume, references, academic record). Sure, but along the way, I contend, they may have filtered out very good candidates as well. True believers will now want to remark, with a pitying expression, that they are very sorry for that, but their job is to select the best person for the job, not to save unlucky candidates, however good they may be. They are not a charity after all. The point is that, in so doing, they are running an unethical process which is biased, by design, in favour of the hiring company and against the individual applicant. Of course, this selection will find a very good match, but at the cost of potentially discarding more good candidates than ethics would allow.
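
The arithmetic behind the 5% figure can be checked in a few lines. This is only a sketch: the function and variable names are mine, and the three inputs are the essay's own estimates [1], [2] and [3], not measured data.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


P_A = 0.001          # [1] prior: candidate X is a bad match
P_B_GIVEN_A = 0.995  # [2] a bad match fails the puzzle
P_B = 0.02           # [3] a screened candidate fails the puzzle

p_bad_given_fail = posterior(P_B_GIVEN_A, P_A, P_B)
p_good_given_fail = 1 - p_bad_given_fail

print(f"P(bad match | failed puzzle)  = {p_bad_given_fail:.3%}")
print(f"P(good match | failed puzzle) = {p_good_given_fail:.3%}")
```

Changing the three inputs shows how sensitive the conclusion is to them: the posterior grows as the prior P(A) grows, and shrinks as P(B) grows.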

Conclusions

One may be tempted to think that the implications of this essay lead nowhere. One needs criteria to select candidates, however imprecise these may be, the true believer will say. And she will go on with “imprecision is a fact of life: nobody is perfect. But we are doing our best. If we have made it, you can make it too: your success is only a few puzzle tests away”. But, to be constructive, how can one define a hiring procedure which is effective and at the same time ethical, rather than grossly biased in favour of the hiring company? My answer is that puzzle problems can actually be used, on condition that a candidate with the right credentials can only be discarded after failing a series of such tests, not just one. A popular interview scheme is this:

[Figure: A popular, but unethical, interview process]

Using the results from this essay, one may think of alternative, more balanced, implementations of the selection process. For example the following:

[Figure: A revised interview process]

The revised process above is only an example, but it contains important amendments with respect to the previous one. First, the preliminary resume screening is now in depth for every candidate with the required qualifications and experience. Second, there are two preliminary job interviews rather than one, but no single interview puzzle has the potential to interrupt the ongoing selection of a candidate. Last, if the candidate meets the criteria so far, she is invited for a run of on-site interviews. If one is really in love with puzzle tests, here they can be used: one cannot always be unlucky. But the candidate has the right to spend the night before close to the hiring location, and the hiring company must pay the expenses within reasonable limits. Only under this proviso can puzzle tests be used without constituting an unfair bias of the selection process in favour of the hiring company and against candidates.

To sum up, this essay has disproved the following misconceptions:

  • Misconception 1: Job interview puzzles can replace serious resume screening. We have seen that, technically speaking, they can, but at the cost of running an unethical process which is disreputable for a mainstream company and, actually, for any company.
  • Misconception 2: Tests can be optimised against false positives and false negatives at the same time. Here there is little to say. Someone with this crazy belief simply lacks the cultural background to understand what a test is. One can “really care for both false positives and false negatives”, but the act of caring can neither reverse nor cancel the laws of mathematics and common sense.
  • Misconception 3: A candidate failing a puzzle test is a bad match. As we have seen above, Bayes’ Theorem debunks this contention without mercy. This is a plainly wrong belief, and a real shame. Good people deserve to work for better companies than these.
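
The proposal made in the conclusions, that a candidate with the right credentials be discarded only after failing a series of puzzles, can be given a rough number. The sketch below reuses figures [1] to [3] and derives the chance that a good match fails a single puzzle from the law of total probability. It then assumes, and this assumption is mine rather than the essay's, that the outcomes of successive puzzles are independent. The names are hypothetical.

```python
# Figures [1]-[3] from the essay.
P_A = 0.001          # prior: candidate is a bad match
P_B_GIVEN_A = 0.995  # a bad match fails one puzzle
P_B = 0.02           # a screened candidate fails one puzzle

# Implied by the law of total probability:
# P(B) = P(B|A) * P(A) + P(B|not A) * (1 - P(A))
P_B_GIVEN_GOOD = (P_B - P_B_GIVEN_A * P_A) / (1 - P_A)


def p_good_fails_all(n_puzzles):
    """Probability that a good match fails every one of n puzzles.

    ASSUMPTION (not in the essay): outcomes are independent across puzzles.
    """
    return P_B_GIVEN_GOOD ** n_puzzles


for n in (1, 2, 3):
    print(f"{n} puzzle(s): {p_good_fails_all(n):.6%}")
```

Under these assumptions, the chance of wrongly discarding a good match drops from roughly 1.9% with a single puzzle to under 0.04% with two. Independence is optimistic when all puzzles are taken on the same bad day, which is one more reason for the proviso that the candidate be given a proper night's rest.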

Bibliography