The degree of complexity of a system may be crudely defined as the amount of information needed for describing the structure and function of the system. But what do we understand by ‘information’? In this article I introduce elementary information theory, highlight the inverse relationship between information and entropy, and discuss the computational nature of all natural phenomena.
Leibniz (1675) was amongst the earliest known investigators of complexity. He argued that a worthwhile theory of anything has to be ‘simpler than’ the data it explains. Otherwise, either the theory is useless, or the data are ‘lawless’. The criterion ‘simpler than’ is best understood in terms of information theory, particularly its more recently developed offshoot, namely algorithmic information theory.
Weiner (1948) developed a statistical theory of ‘amount of information’, in which the unit amount of information was defined as that transmitted by a single decision between equally probable alternatives.
Claude Shannon (1948) is regarded as the father of modern information theory. He provided a quantitative or numerical measure of information, introducing the term ‘bit’ as the short form for ‘binary digit’. He took the bit as the unit of information, and asked the question: How many bits are needed to express any integer Ω? Let I denote the information function introduced by Shannon to define the quantity of information needed to specify an arbitrary integer Ω. It was defined by him as I = log2 Ω bits.
Shannon also addressed the question: What is the amount of missing information before we perform an experiment the result of which we cannot predict with certainty? He first calculated the missing information, IN, for N independent performances of the same experiment. Suppose Pi is the probability that, in a single performance, a particular result i will be obtained. For N performances of the experiment, and for large N, the fraction of times the result i is obtained will become close to the probability Pi. But some information is still missing, because we do not know the sequence in which the various results will appear as we perform the experiment N times. Shannon calculated this missing information, now called the Shannon information.
Shannon’s ideas are also applicable to thermodynamics. The equation Shannon derived for the missing information is practically the same as the equation for the Boltzmann-Gibbs entropy described in Part 3. Just as Gibbs enumerated the various microstates possible for a given macrostate (cf. Part 3), Shannon enumerated the various possible sequences in which the results of N independent performances of the same experiment can appear. The underlying combinatorial problem is the same in the two cases. Naturally, the end equations have the same structure. The units and dimensions for the Boltzmann-Gibbs equation and the Shannon equation depend on the interpretation we place on the events or states enumerated. The two equations really say the same thing, give or take a proportionality constant.
In the Shannon formulation, information, or rather the lack of it (i.e. ignorance), occupies centre stage. In information theory, one works with Shannon entropy, rather than the Boltzmann-Gibbs entropy of thermodynamics. The two terms describe the same quantity, except that Shannon entropy has a larger domain of applicability. Shannon entropy is the negative of Shannon information (I = -S). The negative sign reflects the fact that entropy is a measure of ignorance or lack of information (the ‘missing information’), and its negative, i.e. negative entropy or negentropy, is a measure of available information.
THE COMPUTATIONAL NATURE OF ALL NATURAL PHENOMENA
In science we use mathematical rules for describing natural phenomena. This paradigm has played a crucial role in the advancement of science for several centuries. However, the advent of computers has led to a paradigm shift. The rules and laws for any system can be viewed as corresponding to computer programs, and the behaviour of the system can be viewed as corresponding to a computation. I quote Seth Lloyd (2006):
‘The natural dynamics of a physical system can be thought of as a computation in which a bit not only registers a 0 or 1 but acts as an instruction: 0 means ‘do this’ and 1 means ‘do that’. The significance of a bit depends not just on its value but on how that value affects other bits over time, as part of the continued information processing that makes up the dynamical evolution of the universe.’
The computational approach to basic science and mathematics has also made us realize that there are limits to how far we can carry out our reasoning processes for understanding or describing natural phenomena. Beyond a limit, it is like in Alice in Wonderland:
‘Do cats eat bats? Do cats eat bats?’ Sometimes she asked, ‘Do bats eat cats?’ For you see, as she couldn’t answer either question, it didn’t much matter which way she put it.’
There are limits to how much we can compute for complex systems. Certain kinds of complexity are just irreducible. Let us see what that means.
In 1931, Kurt Gödel proved his incompleteness theorem, which demonstrates that mathematics contains true statements which cannot be proved. In other words, some mathematical statements are true for no reason. Gödel’s work went against the belief expressed in 1900 by Hilbert that there is a theory of everything for mathematics. Hilbert (as also Leibniz) had visualized a world in which all truths can be calculated and proved by a ‘universal coding’. This coding comprised of a few dozen axioms, and it was sought to systematize all mathematics by deriving all mathematical truths from these axioms in a ‘mechanical’ way. Gödel proved that this is not possible. His theorem says that no formal system encompassing elementary arithmetic can be at the same time both consistent and complete: Within any sufficiently powerful and noncontradictory system of language, logic, or arithmetic, it is possible to construct true statements that cannot be proved within the boundaries of the system itself.
According to Gregory Chaitin, Gödel’s proof ‘does not show that mathematics is incomplete. More precisely, it shows that individual formal axiomatic mathematical theories fail to prove the true statement “This statement is unprovable.” These theories therefore cannot be “theories of everything” for mathematics. The key question left unanswered by Gödel is: Is this an isolated phenomenon, or are there many important mathematical truths that are unprovable?’
According to Chaitin’s algorithmic information theory, which I shall outline shortly, some mathematical facts cannot be compressed into a theory because they are too complicated (in terms of the so-called ‘algorithmic complexity’).
TURING’S HALTING PROBLEM
Hilbert had identified the so-called decision problem for defining a closed mathematical universe: Is there a decision procedure that, given any statement expressed in the given language, will always produce either a finite proof of that statement, or else a definite finite construction that refutes it, but never both? In other words, can a precise ‘mechanical procedure’ distinguish between provable and disprovable statements within a given system?
The answer to this question required a way to define ‘mechanical procedure’. Alan Turing (1936) provided an answer. He proved, among other things, that the notions of ‘mechanical procedure’, ‘effectively calculable’, and ‘recursive functions’ were actually one and the same thing. The most famous notion Turing introduced was that of the universal computer. He demonstrated that all digital computers are fundamentally equivalent, regardless of what is there in their innards. It is all 1s and 0s underneath. He used this approach to prove that unsolvable mathematical problems do exist. He showed that there can be no general procedure to decide whether a self-contained computer program will eventually halt always (i.e. come to a conclusion about a posed problem, and then stop). In other words, he proved that the halting problem in unsolvable. Sometimes the only way to answer the question of whether a computer program will halt or not is by actually running it to see if it halts or not. There is no computer program that can take another computer program and determine with certainty whether the first program halts or not.
Turing invented an imaginary device now called the Turing machine. It consisted of a black box (it could be a typewriter or a human being) able to read and write a finite alphabet of symbols to and from an unbounded length of paper tape, and capable of changing its own ‘m-configuration’ or ‘state of mind’. In the context of computational science, Turing introduced the all-important notions of discrete time and discrete state of mind. This made logic a sequence of cause-and-effect steps, and mathematical proof a sequence of discrete logical steps. The Turing machine embodies a relationship between a sequence of symbols in space and a sequence of events in time. Each step in the relationship between the tape and the Turing machine is governed by what we call a program. The program or algorithm provides instructions to the machine for every conceivable situation. As has become increasingly clear in a very graphical manner by the work of Stephen Wolfram (2002), even a simple program can generate a complicated output. Complicated-looking behaviour need not require a complicated state of mind.
Turing argued that all symbols, all information, all meaning, and all intelligence that can be described in words or numbers can be encoded (and transmitted) as binary sequences of finite length. In contrast to this, Gödel’s theorem on undecidability in mathematics states that in all but the simplest mathematical systems there may be propositions that cannot be proved or disproved by any finite mathematical or logical process; the proof of a given proposition may possibly call for an infinitely large number of logical steps.
However, Turing did prove, like Gödel, that there were self-referential questions a universal Turing machine could not answer. This leads to an interesting possibility. Perhaps outsiders can answer such questions which are self-referential and therefore may be unanswerable by the Turing machine. That brings us eerily close to the problem of ‘free will’.
Gödel had shown that a system that has the capability for self-reference can end up with paradoxes in logic. Turing proved something similar, namely that self-reference results in uncomputability in computers. These statements may have implications for humans. We have a strong proclivity for self-reference. And we have ‘free will’, or the seeming ability to make decisions which are unpredictable even by ourselves. We cannot always tell in advance what decision we shall take in a given situation; we exercise unpredictable choice. I quote Seth Lloyd (2006): ‘The inscrutable nature of our choices when we exercise free will is a close analogue of the halting problem: once we set a train of thought in motion, we do not know whether it will lead anywhere at all. Even if it does lead somewhere, we don’t know where that somewhere is until we get there’.
ALGORITHMIC INFORMATION THEORY
Algorithmic information theory (AIT) was founded by Kolmogorov (1965) and Chaitin (1987). In the Shannon formulation outlined above, we considered the question: How many bits are needed to express an integer? Such considerations have been generalized to determine the number of bits needed to encode any given piece of information. For example, one bit is sufficient to encode a single yes/no answer. Now consider an infinite sequence of numbers 1, 2, 3, … The number of bits needed to encode them all is absurdly large, in spite of the fact that the information content of this sequence of numbers is really very small. There should be a better way of quantifying this information.
AIT remedies this situation by asking and answering the question: What is the size (information content in terms of number of bits) of the computer algorithm needed to generate a specified set of data (in the present case the infinite sequence 1, 2, 3, …)? The minimum number of bits (i.e. strings of zeroes and ones) needed to store the program to ‘explain’ or generate the data is called the algorithmic information content of the data.
Let us take up another example. Consider two numbers, both requiring, say, a million bits for specifying them to the desired accuracy. Let one of them be an arbitrary random number, which means that there is no defining pattern for specifying it. Let the other number be π (= 3.14159…..). The second number has very small algorithmic information content because a small computer program can be written for outputting it to a desired level of precision. By contrast, a random number (say 1.47373..59) has a much higher algorithmic information content: The shortest program for outputting it has information content not very different from that of the number itself:
No significantly smaller program can generate this sequence of digits. The digit stream in this case has no redundancy, and is said to be incompressible. Such digit streams are called irreducible or algorithmically random.
Thus there are limits to the powers of logic and reason. Chaitin has shown that certain facts are not just computationally irreducible; they are logically irreducible as well. The proof of their ‘truth’ must be in the form of additional axioms, without any reasoning.
THERMODYNAMIC ASPECTS OF INFORMATION
Processing of information (e.g. the flipping of a bit) requires energy. This simple-looking and ‘obvious’ statement has had quite a history. We saw in Part 3 that, if two parts of a closed system are at the same temperature, they cannot spontaneously develop a difference of temperature. In 1871, James Clerk Maxwell introduced his ‘demon’ which could apparently act in contradiction to this requirement of the second law of thermodynamics:
Suppose we introduce an imaginary partition between two parts of a well-insulated vessel filled with some gas, with a little window in the partition. The imaginary window has an imaginary shutter, which can be opened and shut by the Maxwell demon, as desired. The demon has the powers to see the molecules (it is a demon, after all!). It opens the shutter when it sees a fast molecule about to enter the window, say from right to left. Similarly, it opens the shutter when it sees a slow molecule about to enter the window from left to right. At all other times the imaginary window is kept closed. Over time, there will be a preponderance of faster molecules on the left side, implying a higher temperature compared to the right side which has a larger number of slower molecules. Thus a temperature difference would be created between the two parts of the gas, where none existed before, apparently without expenditure of energy, and in violation of the second law.
The fact is that there will be no violation of the second law, because there is a fallacy in the argument. What we are dealing with is really not an isolated system. It was pointed out by Leon Brillouin, among others, that, to ‘see’ the molecules and thus to make a distinction between fast and slow ones, the demon must use a torch, which means an impingement of photons from the outside (and therefore an expenditure of energy). In other words, we are really dealing with a system which accepts energy from the outside and creates the temperature difference.
THE SZILARD ENGINE
After the Maxwell demon was ‘exorcised’, a ‘pressure demon’ or ‘trapdoor demon’ was created to counter some of the arguments advanced for killing Maxwell’s ‘temperature demon’. Even this fellow could be annihilated. But then a question arose: Is it that although simple mechanical demons cannot violate the second law of thermodynamics, intelligent demons can? Leo Szilard (1929), for example, imagined an intelligent being operating an engine, which could act in apparent violation of the second law. His thought experiment is known as the Szilard engine, and has been discussed recently by Charles Bennett (1987). It comprises of a horizontal cylinder that is provided with friction-less pistons at both ends, and contains a single molecule. The cylinder is equipped with a partition that can be inserted or removed at will, as also devices for observing the contents of the two halves of the cylinder; there is also a memory for storing the information.
First the partition is inserted, trapping the molecule in any of the two halves of the cylinder. The measuring devices determine whether the molecule is in the left half or the right half, and this information is stored in the memory.
The piston from the empty half is then pushed till it touches the partition. The movement of the massless piston requires no work because it is pushing against vacuum. Next the partition is withdrawn.
The trapped molecule, during the course of its kinetic motion in the cylinder, will strike against the piston, thus pushing it back. In other words, the one-molecule gas ‘expands’ against the piston, thus doing work. The energy lost by the molecule as it works against the piston can be made up in the form of heat from the surroundings.
In due course, under sustained bombardment by the molecule, the piston returns to its original position near one end of the cylinder. The memory is then erased, and the cycle can start all over again. This device appears to be able to convert heat from the surroundings into work, in apparent violation of the second law.
The paradox of the Szilard engine was resolved as follows by invoking some results of quantum physics, and of the thermodynamics of information-processing. In essence, the answer came from the modern theory of computation of information:
- Light, heat, and other forms of radiation consist of photons, and it is impossible to detect less than one photon.
- For detecting the position of the molecule in the Szilard engine (i.e. whether it is on the left or the right half of the cylinder), it is not possible to use a probing photon of too low an energy. The cylinder and the molecule in it have a certain temperature, and are suffused with a photon bath characteristic of that temperature, courtesy Kirchhoff’s laws. The probing photon must be of a high enough energy to make the molecule ‘visible’. That is, the probing photon must come from a glowing filament having a temperature higher than that of the cylinder containing the molecule.
- Szilard (1929, 1964) had brought out the connection between entropy and information. He had shown that the entropy associated with a unit of information, namely with one bit, is k log 2. The observation of the molecule by the probing or illuminating photon involves expenditure of energy. This was in line with Szilard’s belief that acquiring a given amount of information amounts to producing a corresponding amount of entropy.
- It was established by the work of Rolf Landauer on the thermodynamics of data processing that data-processing operations like copying of data from one device to another are like measurements: One device acquires information about the state of the other.
- It was believed (till around 1960) that data-processing operations are thermodynamically irreversible operations, and that any kind of data operation requires the generation and removal of at least one bit’s worth of heat for every bit of data to be processed. Landauer showed that, whereas some data operations are indeed thermodynamically expensive, others need not have a lower thermodynamic limit.
- Landauer argued that different logical states of a computer must correspond to different physical states of the hardware of the computer. For example, clearing an n-bit memory amounts to compressing many logical states to one, rather like a piston compressing a gas.
- It follows that it is not possible to clear the memory register without generating heat and adding to the entropy of the surroundings in an irreversible manner.
- Landauer identified several thermodynamically irreversible operations. They all discard information about the past state of the computer; i.e. they are ‘logically irreversible’.
- The memory-erasure step in the operation of the Szilard engine is logically irreversible. It compresses (destroys) definite information about the molecule being on the left (or on the right) to the ‘do not know’ state. Therefore the engine cannot reset its memory to a blank state without adding at least one bit of entropy to the surroundings. And this converts the work done by the piston to heat. Nothing comes for free!
- Thus the explanation why the Szilard engine, or for that matter the Maxwell demon, does not violate the second law is that, in order to ‘see’ a molecule, the engine or the demon must first forget the results of previous measurements, and this act of discarding information is thermodynamically costly.
To conclude, we can now give a somewhat differently worded answer to the question: ‘Why is terrestrial complexity increasing all the time?’ The answer is that this is happening because the Sun is bombarding our ecosphere with information (in the form of negative-entropy radiation), some of which gets stored or trapped in increasingly complex ways. As more and more information builds up, the degree of complexity goes on increasing. It has been estimated by Tribus and McIrvine (1971) that ~1.6 x 1015 megawatt-hours of energy is radiated to outer space by the Sun. This carries with it the capacity to decrease entropy (at the average terrestrial temperature) by ~3.2 x 1022 joules per degree Kelvin per year; or 1038 bits per second. As entropy decreases, information builds up, and therefore the degree of complexity builds up.
Some additional technical details can be found in my article here.
Information is not independent of who is communicating it with whom. I shall discuss the anthropocentric nature of information in the next article.
Will Rogers had something interesting to say about information:
It ain’t what you don’t know, that counts;
it’s what you know that ain’t so.