The notion of identity is implicit in everyday discourse on an incredible variety of subjects. When one says “my car”, what one means is not a car of the same make, model, colour, as the car one has left in a nearby parking lot. What one means, is exactly the same very car, the single one object which is spatially and temporally continuous with the car one has just driven. This relation is referred to in the literature as numerical identity. As a first approximation, “Numerical identity can be characterised, (…) as the relation everything has to itself and to nothing else. “, (Stanford; 2014a)
Everybody is familiar with this relation. We all can (barring abnormal circumstances) recognise our beloved relatives, friends and pets quite easily. But our recognition abilities are not limited to living beings. We are also usually able to tell inanimate objects from one another quite unproblematically. This ability is so common that sometimes one forgets on which principles it is grounded. So, the question arises, how do we do to determine whether two entities are identical or not? In order to answer this question, we need to start from the basics. The relation of identity is clearly an equivalence relation, because it satisfies the following properties:
- for every entity e, e is identical to itself (reflexivity)
- for any given two entities e and k, if e is identical to k, then k is identical to e (symmetry)
- for any given three entities e, k, and z, if e is identical to k and k is identical to z, then k is identical to z (transitivity)
Is this all? Clearly not. There are plenty of equivalence relations which are not identity relations. To give just a couple of examples:
- “Has the same birthday as” on the set of all people.
- “Has the same cosine” on the set of all angles.
So, what does the identity relation require in addition to being an equivalence relation? It requires Leibniz’s Law: the principle of the indiscernibility of identicals, which says that “if x is identical with y then everything true of x is true of y.” (Stanford; 2014a). To sum up, when in this essay I will refer to the relation of identity, I will intend the equivalence relation which satisfies Leibniz’s principle of the indiscernibility of identicals. Let us now try to apply the notion of identity to software. To be more specific, in this acceptation, software has to be intended as computer programmes1. And now the fun will begin! Let us consider a given personal productivity application P1 installed on one’s laptop at work. If the user of P1 has another application P2 installed at home, by the same vendor, with the same major version, are P1 and P2 one and the same software? If we stick to Leibniz’s Law, we must see if there is at least a property which is valid of P1 which is not valid of P2, or viceversa. Of course there is at least one. “Being installed on the same hardware” is clearly unsatisfied. And one can easily find many other properties which hold true of P1, but not for P2, and the other way round. For example, “having the same detailed configuration”. The installation at work will very likely be pre-configured in conformance with security policies and guidelines which at home may not apply in exactly the same way (thanks God). But this preliminary result is puzzling. Isn’t it? Most people would say that P1 and P2 are the same programme. And if we apply Leibniz’s Law in other special cases, the puzzlement can go even further. For example, let us consider a computer programme written in a given programming language, say C++. In order to be executed on a given hardware, this programme must be compiled. Compilation is a process whereby a computer programme P3 undergoes a series of correctness checks (syntactical and, to some extent, semantic), and is then “translated” into a sequence of machine instructions which can be executed on that particular hardware. Now, let us call P3-Win our computer programme compiled on a Windows machine, and P3-Ux the programme compiled on a Unix system. Are P3-Ux and P3-Win the same programme? If we consider that the source code is the same line-by-line, statement by statement, character by character, we may be tempted to say yes. But property “being compiled in the same set of machine instructions” is clearly not satisfied. So, according to Leibniz’s Law, P3-Win and P3-Ux are not numerically identical2. Is this really so puzzling, in the end? I don’t think so. Everybody with experience knows that programmes may behave differently on different platforms. So, even though P3 is clearly identical to itself, its instantiations aren’t identical with one another.
At this point, one may think, the problem is with the instantiations of software, not software itself. In our examples above, it was the installed application, or the compiled programme which led us into trouble. Are we safe if we just consider source code? At first sight, yes. But if we look more closely, not even source code can get us out of trouble when it comes to identity. There are still two problems. The first one is that source code does not exist in the vacuum: is it either saved on a hard disk, in a computing cloud, a flash memory, or is printed on paper. Again, two instances of a programme P, which are saved in two storage systems, or two folders in the same hard disk, or are printed on different sheets of paper, fail to satisfy Leibniz’s Law. The second problem is that even if we could find a way to make sense of the relation of identity for software, we would still need to answer the related question of what it means for a programme P to continue to be the same programme over time. Let us consider a programme consisting of, say, 3’000 statements. Let us now assume that this programme is modified and only the following statement is updated3, and nothing else:
i = i+1;
Every person with an understanding of programming would say that after this simple edit the programme is still the same. After all, the algorithm computed is the same; the operational semantics is the same. Everything except the syntax looks the same. But as we have seen above, Leibniz’s Law would not be satisfied, because property “having the same syntax” would not hold true4.
We have collected plenty of evidence that the relation of identity is not applicable to software. It is simply undefined. We cannot speak of numerical identity of software in any meaningful way. But how can we live with this finding? How can a software vendor justify the same license cost for two packages of a computer programme, if these cannot be proven numerically identical? How can one treat this strange substance, software, without such a basic relation as identity, which we use unproblematically with a lot of other entities we interact with on a daily basis?5
Luckily, there is, I think, a way out of this puzzle. This way out is the relation of equality. When people say that a programme P1 and a programme P2 are the same, what they really mean is not that they be numerically identical. What they mean is that they are equal. Equality is a weaker relation than identity. Here, it is not necessary that two entities satisfy exactly the same predicates: Leibniz’s Law need not apply. Equality is about satisfying a set of relevant properties which are the only ones which really matter to the observer for the determination whether or not such two entities can be exchanged. It is interesting to see that, although equality is a weaker relation than identity, it is still an equivalence relation6. The following definition will help us understand the difference between identity and equality more in depth:
The terms “equality” (Gr. isotes, Lat. aequitas, aequalitas, Fr. égalité, Ger. Gleichheit), “equal,” and “equally” signify a qualitative relationship. ‘Equality’ (or ‘equal’) signifies correspondence between a group of different objects, persons, processes or circumstances that have the same qualities in at least one respect, but not all respects, i.e., regarding one specific feature, with differences in other features. ‘Equality’ needs to thus be distinguished from ‘identity’ — this concept signifying that one and the same object corresponds to itself in all its features (…)
For physical objects, people usually use implicit but reliable criteria of equality. For example, a five dollars banknote in my wallet is considered to have the same value as a five dollar banknote in my wife’s wallet, despite the fact that they may differ in some respects like year printed, colour alterations due to ageing, or status of preservation overall. However, as I will show, if we try to apply equality to software, finding a commonly agreed set of properties is not so easy. As every head of software development knows very well, the notion of equality used by a manager and the notion of equality used by a software engineer are at odds. Let us consider the following fictitious excerpt of a discussion between a manager, and the head of software development:
Manager: How much will it cost to develop the new e-commerce platform internally?
Head of Development: It will cost X $
Manager: No way I’m doing it internally: I can have it done by software house ACME for 30% of that amount.
What equality criterion is the manager using above? Very likely, her criterion is “the software provider claims the software will satisfy the requirements document”. This is indeed a risky bet on her part, but it is the way outsourcing decisions are made every day. Why do I contend that this is a risky bet? Well, for a start, it is difficult to compare two things which do not even exist at the moment the decision is made. Second, satisfying a set of requirements is seldom black and white: compliance with requirements comes in degrees. For instance, test plans do not usually all fail or succeed. Some test cases fail, some others succeed, and when the ones which are perceived (or contractually defined) as the most important ones succeed, we say that requirements are satisfied. Moreover, requirements tend to be amended during a project, because they are seldom 100% correct at inception. Furthermore, there are such things as implicit requirements (Morelli; 2013), which are not formally defined, but are assumed because it is safe to give them for granted in a given cultural environment. But this assumption is only safe so long as the cultural affinity of the extended team is high. And a software house needs not be in the same cultural area as its customers. Therefore, even if all the requirements in the requirements document were satisfied by two applications, it may still be the case that implicit requirements are not equally satisfied by both of them. In which case, the two applications cannot be said equal, not even from the short-sighted viewpoint of our fictitious manager above. But this is not all. Below is an admittedly limited sample of other properties which a software engineer can legitimately consider in order to compare two computer programmes addressing the same specification:
- Having the same algorithmic complexity in time and space
- Having the same scalability with regard to number of concurrent users
- Having the same scalability with regard to number of concurrent transactions
- Having the same maintainability in terms of side effects per change
- Having comparable quality (e.g. cyclomatic complexity)
- Having comparable results when exposed to security penetration tests
- Having comparable levels of service availability when deployed to the same hardware
- Supporting the same deployment options
- Being compliant to the same applicable regulations
- Respecting the same standards
- Requiring the same skillset to be maintained
The list above is open-ended and could be expanded at will. The bottom line is that, even if we give up using the relation of identity and stick to the weaker notion of equality, it appears that when it comes to software, sameness is really in the eyes of the beholder. What are differentiating factors for the software engineer, are not differentiating factors for the manager. And, to be intellectually honest, the other way round. For example, a software engineer may not be so sensitive to issues of cost. Or, at least, not as much as a manager. What this story is teaching, in the end, is that criteria of equality are affected by such factors as education, role, cultural bias, an so on and so forth. Given that judgements on equality are done by human beings, we cannot escape this result: equality without formal definition of criteria is an undefined concept. So, is equality a relation which we can use in the comparison of software? My contention is that equality is indeed a useful and usable way to approach comparison of software. But it is only useful and relevant on condition that:
- Equality criteria are formally defined, shared and agreed with
- Equality criteria are measurable
- The error function of the measure is known in advance
In our fictitious case above, the manager is not making an informed decision unless she has formally defined, shared and has stakeholders agree on a set of equality criteria. The sentence “they will do the same programme for 50$” or what have you, is completely indefensible if it is not qualified in abidance with the provisions of the findings above.
This essay has explored the subtle notion of sameness applied to software. As a first attempt to define the semantic of software sameness I have analysed the relation of identity. First, I have defined what is a relation of identity; second, I have shown that this relation cannot be applied to software in any meaningful way. After that, I have introduced and explained the relation of equality. This relation has proved more useful and I have advanced that it is the best candidate semantics for the notion of sameness in software. However, my contention is that this is only true if the criteria for equality are formally defined, shared and agreed on. This proviso is a critical one and it must be satisfied in order to be able to profit from the use of the equality relation in matters of software sameness.
Morelli, Mauro, (2013), https://alaraph.com/2013/09/26/the-not-so-simple-world-of-user-requirements/, accessed 17.03.2014
Parfit, Derek, (1971), `Personal Identity´, Philosophical Review, pp. 8-27, vol. 80, no.1.
────────, (1987/1984), Reasons and Persons, Oxford University Press, 1987.
Stanford, (2014a), http://plato.stanford.edu/entries/identity/, accessed 13.03.2014
Stanford, (2014b), http://plato.stanford.edu/entries/equality/, accessed 17.03.2014
Wikipedia, (2014), https://en.wikipedia.org/wiki/Equivalence_relation, accessed 13.03.2014
(1) Some, like Enterprise Architects, distinguish between computer programmes, applications, and technical components. However, in this essay I will use these terms interchangeably, because for the metaphysics of software, an algorithm is an algorithm and the rest is irrelevant.
(2) One may object that if we consider interpreted languages or byte-code languages the problem does not arise. But this contention is wrong, because the interpreter or the virtual machine, are themselves compiled computer programmes. The problem has only been moved one level of indirection away.
(3) For the sake of simplicity, we must also assume that this statement is not nested in a control structure like, for instance, a “for” cycle.
(4) I’m not sure whether the machine code generated by the compiler would be the same either.
(5) There are other known problematic cases for the identity relation. One is personal identity. As Philosophy Professor Derek Parfit contends in (Parfit; 1971) and (Parfit; 1984), identity is not what really matters in issues of survival. In such cases, he says, we know everything we need to understand the situation, without having to appeal to the notion of identity.
(6) Because it is clearly reflexive, symmetrical, and transitive.