by Eduard de Jong
In the wake of the Bitcoin phenomenon, the term “block chain,” which describes a critical, technical aspect of the Bitcoin payment system, is presented by Bitcoin adherents as a technical innovation on par with the invention of the transistor, accrediting it with a similar scope of fundamental change in society. This short write-up attempts to demystify some of the mythical thinking around block chains.
The technological advances that have led to the block chain stem from two different approaches in the 1980s for repeatedly applying a cryptographic hash function.[1] The first approach came from Ralph Merkle,[2] who used the hash function to construct a binary “tree” of hashes with each of the “leaves” of the hashes used as a one-time-only key to create a special kind of public key signature. The tree is build by hashing each pair of leaves together and by continually pairing the results of this until a single hash value is obtained. The final, single hash value, called the “root” of the Merkle tree, is—indirectly—a hash over all the data in the leaves.
The second approach came from Leslie Lamport.[3] He created a way to generate one-time passwords to secure computer logins over an insecure network. It all starts with computing a series of hash functions, starting from a random number and subsequently using the previous function’s output as a starting point. The original random number is then stored as the master password, and the list of hash values is given to the user. The result of the last hash computation serves as the first one-time password. The next time a password is needed the input value in the last computation is used. Subsequently, each of the inputs to a hash computation in the chain is used as a new one-time password, until the last computation, when the master password is reached. At this time a new master password is generated and a new hash chain calculated. Its values are sent to the user as a new sequence of one-time passwords. The sequence of one-time passwords is called a “Lamport-chain,” or simply a “hash chain.”
A Bitcoin block chain can be seen as a combination of a Merkle-tree with a Lamport-chain: each one-way computation uses the result of the previous computation as input in order to make a chain (Lamport), and uses an additional hash value as input (Merkle) to add new data.
Chains, or trees, of hash functions have been used in different proposals for the implementations of electronic money. For example, e-Cash, the electronic money implemented by DigiCash, uses a specially constructed tree of hash values to encode the value of a money transfer protected by a digital signature over the root of the tree. Ted Pederson,[4] working with DigiCash in the European Cafe project, describes the use of a single chain as being similar to the Lamport-chain passwords in the way it reveals a preimage with the payment of small amounts.
This paper provides the theoretical basis for analysing the security of payments protected by a one-way chain. Stanford,[5] together with the present author, suggests a practical implementation for electronic cash whereby each preimage in a hash chain is used as a payment of a unit value. In this electronic cash system unit values are specified by the merchant, while the security is provided by the hash chain. This also protects the moneys received, which are cryptographically bound to the merchant, as well as the process of redemption.[6] In 2000, a hash chain was proposed to securely record the progression of operations. In the same year, this author[7] filed a patent for the use of a chain of one-way functions to protect both communication with a smart card and the operations inside it. The hash chain was built by a sequence of hashes: the first hash sending input data to a smart card; the second hash, which was computed from the first, was changed in the card memory, and the third hash was created over the memory record and any output data. The chain could be extended to include any number of subsequent interactions with the smartcard.
Using the hash chain in this way, with each new hash value in the chain adding a hash over additional input data, is a very secure way to record events. It establishes a time line in which earlier events are being included in the chain of later event. An example of such an event is the secure version control of documents like business contracts or a set of program files. Git is such a software code version control system that uses a hash chain. By binding the present with the past, a hash chain can provide non-repudiation[8] for any transaction that has been recorded in the chain, as long as the most recently computed hash value is known, or available, to all parties involved in the transaction. Storing the hash value in a smartcard is one way of making it available for non-repudiation.
As we have seen, the Bitcoin block chain follows a well-established tradition in securing data. The use of a hash chain for securing financial data in Bitcoin, however, differs from the cases mentioned above. In the earlier systems the hash values are used to represent a unit value of electronic cash that is securely transferred from payer to payee. The hash value is the equivalent of a physical coin. Bitcoin, on the other hand, is a ledger system in which, as at all present-day financial institutions, a central database keeps records of the amount of money a user has to spend. In Bitcoin, the central database is implemented in a distributed fashion and each processor in the distributed systems maintains a copy of the full database. After performing an update of the database, consolidating one or more transactions, a processor computes the next hash in a hash chain with the data used in the update. The newly computed hash value is used to securely synchronise the database copies between all the distributed processors. The hash value acts as a synchronisation token sent to all distributed processors by the one consolidating processor that computed it. Each distributed processor already knows the previous hash value and has also received all new data incorporated in the new hash value. For Bitcoin, the new data is a set of all the hash values, each computed over the details of a transaction that a user has made. The synchronisation token is computed as a hash value over data that is known to every processor. Each of the processors can verify that the synchronisation token is valid by recomputing the hash function.
Synchronisation between processors is essential for a distributed (database) system to work correctly. The second aspect of the implementation of a distributed system is coordination between the processors, i.e. determining which processor does what. For a ledger system like that of Bitcoin, coordination determines which processor has the lead in updating the database. In the Bitcoin system, coordination of processors can be compared to the throwing of a dice: the processor with the highest number of eyes wins and is the only one to update the database. In fact, Bitcoin uses a very clever trick to simultaneously update the database and randomly select which processor is the one that will generate the synchronisation token.
In Bitcoin, this combined operation of throwing die, consolidating the database, and computing the synchronisation token is called “mining,” the distributed processor a “miner,” and the synchronisation token a “blockchain.” The normal computation of a hash function in a computer would roughly take the same amount of time in each processor, which is typically much shorter than the time needed to communicate a message to all distributed processors. However, this process is too fast to be used to compute a synchronisation token and so the computation of the hash chain in Bitcoin is done differently. The block chain is computed by repeatedly computing the hash function from transaction data and additional random input until the resulting hash value has a particular form. On average the computation of the block chain takes about 10 minutes. Block chain computations are effectively synchronised; each processor starts its computations after receiving a synchronising block chain and uses all transactions submitted by the users since the computation of the received blockchain started. The first processor to find a hash value with the correct form to be a block chain has obtained the next synchronisation token and broadcasts this value to all other processors. They then synchronise the database and restart their computing to get the next hash in the chain. The block chain beauty is that the computation used to randomly pick the lead processor amongst the set of distributed processors is the very same computation used to consolidate the database and in the creation of the synchronisation token.
The process of the computation of a block chain from a consolidating and synchronising function could be used for any distributed implementation of a shared database. However, it is one that uses a lot of energy as each of the processors is actively computing the next hash value in the chain for 10 minutes each time, while applying all its processing power. All energy spent by all these processors, bar that of the “lucky” one, is wasted. Even without considering the environmental consequences of wasting energy, it seems hard to scale such a distributed system to a very large size, especially if used commercially. However, there is a body theoretical and practical knowledge on the implementation of distributed systems and cheaper techniques do exist. As the Bitcoin system is supported by a community of miners that are seemingly primarily motivated by ideals and ideology, considerations of environmental and commercial effects may be irrelevant, in particular at the current size of the system.
However, the Bitcoin system is not without its commercial aspect. Independently from the prime consolidating and synchronising functions of the blockchain, computation is also used to add a small, fixed value to the pool of spendable money. The newly added money is allocated to the miner that computes the block chain. Being a miner in the Bitcoin system is like partaking in a lottery in which the price of each lottery ticket is the cost of computing the hashes. The purpose of the lottery is to provide a financial incentive to acting as a miner. An interesting consequence of this lottery aspect of mining comes from the creation of additional value out of thin air. This forces Bitcoin to operate as a closed financial system, using its own currency, where money creation has no macroeconomic consequences.
The block chain does a decent job of consolidating and synchronising the distributed database when implementing the Bitcoin ledger. However, consolidation and synchronisation are not sufficient for the correct operation of the ledger: it merely makes sure that all instances of the database contain the result of all transactions in the right order. It does not ensure that the transactions are in fact correct. A correct transaction is one in which, for instance, the account paying money has enough funds and the owner of the account is the person authorising the transfer. Before a transaction can be consolidated in the database, it has to be semantically evaluated in a process separated from updating the database. The transaction has to perform this evaluation correctly for the database to be correct. To summarise: there is no semantic relationship between the creation of a block chain and the additional data hashed into it. Hence, the blockchain does not guarantee the security of the Bitcoin system. Security in the Bitcoin system is based on making all transactions public while each miner semantically verifies every transaction before adding it to the data to be hashed. This system assumes the correct implementation of software at each miner, and that this software has not been modified unnoticed by the miner, e.g. by a worm. The Bitcoin system does not have a built-in mechanism to recover from such a breach of security.
To summarise this writeup, the block chain used by the Bitcoin system is a form of hash chain that is very suitable for distributed computation with continually added data to be hashed where all data is collected by each distributed hashing node. Despite a distributed implementation the Bitcoin system is a centralised ledger of financial transactions.[9]
[1] A cryptographic hash function is a computation using a message as input to compute another message in such a way that it is practically not feasible to compute the input when only the output is known. The result of computing is called a hash value or a hash, and in some cases the input is called a ‘preimage.’ For a good, cryptographically strong, hash function only brute force, that is trying all possible inputs, is the only possible way to determine the input that goes with a given output. For a hash function the number of bits in the output determines the level of security, as that determines the number of brute-force tries. The input for a hash function can have any length, for instance in a Merkle tree it is twice the output size.
[2] R. Merkle. Secrecy, authentication and public key systems/ A certified digital signature. Ph.D. dissertation, Dept. of Electrical Engineering, Stanford University, 1979.
[3] L. Lamport, “Password Authentication with Insecure Communication”, Communications of the ACM vol. 24, no.11, pp 770-772, 1981
[4] T. Pedersen, “Electronic payments of small amounts,” Proc. of Security Protocols Workshop, Lecture Notes in Computer Science, Vol.1189, Springer Verlag, pp.59– 68, 1997.
[5] C. Stanford, E. de Jong, “System and method of cryptographically protecting communications” patent EP0904581, filed may 24, 1996.
[6] In 2014 this so-called n-Count system for electronic money has been deployed by Transaxiom Ltd, as LET system in Canada.
[7] E. de Jong, “method and system of communicating devices, and devices therefor, with protected data transfer” patent US7828218, filed July 20, 2000
[8] Non-repudiation is the impossibility to deny that an event has happened or a more specific a contract has been signed. An electronic signature, created using a public key system, typically provide non-repudiation. As explained in the text a hash chain, which is much easier to compute than an electronic signature, can also provide non-repudiation.
[9] A central register is a precondition for a correct ledger system.