I often see people asking how Bitcoin/the blockchain works, and all the resources I’ve seen are either too technical or way too superficial. So I thought I’d finally try to fill the gap.
In this guide I’ll try to explain the fundamental principles of Bitcoin (and more generally, of any blockchain), leaving out the details that aren’t necessary for understanding. So you won’t find everything needed to build a working implementation here (that’s definitely not the idea), and I will take shortcuts as long as they don’t hurt the logical coherence of the whole system. Hopefully you’ll still find enough detail to be able to say “OK, I get how the magical blockchain works, and it’s definitely not magical, nor even that complex”.
Concepts that you need to know
And by “know”, I mean that you don’t need all the details (though you might find it interesting to dig into them yourself), but you do need some basic knowledge of these concepts: asymmetric cryptography, digital signatures and hashing.
Concept 1: asymmetric cryptography
By the way, “crypto” is short for “cryptography”, not for “cryptocurrency” (which is already short for “cryptography-based currency”).
Imagine Alice wants anyone in the world to be able to send her encrypted messages, without having to give each person a unique key (as she would with symmetric encryption). She can generate a key pair: a private key, which she keeps secret, and a public key, which she can publish on her website, her TikTok profile or wherever.
Then if Bob wants to write to Alice, he can encrypt his message with Alice’s public key, and send it, even publicly: only Alice, with her private key, will be able to decrypt it. That’s the point of asymmetric encryption: anyone can encrypt with the public key, but only the person who has the corresponding private key can decrypt.
Such properties come from hard maths problems. For instance, for the RSA algorithm, if I oversimplify a lot, the private key is a pair of very big prime numbers, P and Q, and the public key is their product N. The asymmetry rests on the fact that finding P and Q given only N is a very hard problem, so in practice you can create one function that encrypts based on N, and another that decrypts based on P and Q. Again, this is very oversimplified, but that’s the concept.
If you want to study this further, see Wikipedia’s RSA page and Wikipedia’s public-key cryptography page.
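To make this a bit more concrete, here is a toy sketch of the idea in Python. The primes 61 and 53, the exponent 17 and the function names are my own choices for illustration; real keys use primes hundreds of digits long, and real RSA adds padding that is left out here.

```python
# Toy RSA with tiny primes. Illustration only: never use this for real data.
P, Q = 61, 53                  # the secret primes (private side)
N = P * Q                      # 3233, published as part of the public key
E = 17                         # public exponent, assumed for this example
PHI = (P - 1) * (Q - 1)        # only computable by someone who knows P and Q
D = pow(E, -1, PHI)            # private exponent (needs Python 3.8+)


def encrypt(message: int) -> int:
    """Anyone can do this: it only needs the public values N and E."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Only the key owner can do this: D is derived from P and Q."""
    return pow(ciphertext, D, N)


secret = 65                    # a message encoded as a number smaller than N
ciphertext = encrypt(secret)
assert ciphertext != secret
assert decrypt(ciphertext) == secret
```

With numbers this small, anyone can factor N by hand and recover P and Q; the whole scheme only holds up when N is far too big to factor.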
Concept 2: digital signature
If you skipped asymmetric cryptography, I’m afraid you’ll need to go back to it first.
As we saw, Alice has her private key which she keeps secret. Imagine now that she wants to sign a message. First, she writes the message. Then, using her private key, she can apply a signing algorithm on the message, and obtain a signature. She then sends the message with the signature (note that she could encrypt it on top, or not).
Bob receives the message. Using Alice’s public key, he can apply the corresponding verification algorithm to the message and signature to confirm that the signature is valid and comes from Alice’s private key. He can do this even though he doesn’t know the private key, as this is all based on asymmetric cryptography.
And if Alice decides to publish her signed message publicly, anyone can use the public key to confirm that she signed it indeed. The “message” doesn’t have to be text. It can be any data. For instance, on this page, KeePass publishes signatures for their software releases (they are in the “[OpenPGP ASC]” files). They also publish hashes, which will be our next concept.
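As a hedged sketch of the sign/verify round trip, here is textbook RSA signing with a toy key (primes 61 and 53, exponent 17, all chosen by me for illustration). The “message” is just a small number here; real schemes use far larger keys and sign a padded digest of the data.

```python
# Toy RSA signature. Illustration only, not secure.
P, Q = 61, 53
N, E = P * Q, 17                       # Alice's public key
D = pow(E, -1, (P - 1) * (Q - 1))      # Alice's private key (Python 3.8+)


def sign(message: int) -> int:
    """Alice: needs the private exponent D."""
    return pow(message, D, N)


def verify(message: int, signature: int) -> bool:
    """Bob, or anyone else: needs only the public values N and E."""
    return pow(signature, E, N) == message


message = 1234                              # must be smaller than N in this toy
signature = sign(message)
assert verify(message, signature)           # genuine message: accepted
assert not verify(message + 1, signature)   # altered message: rejected
```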
If you want to study this further, see Wikipedia’s digital signature page
Concept 3: hashing
If you’ve ever heard of MD5, that’s a hashing function, and if you know the concept, you probably know enough for this part. If not, read on.
A hash function can be applied to input data of any size (it can be an empty string but it can also be a several gigabyte file), and returns a fixed-size value (some hash functions return variable size hashes, but these are not common), named “hash digest”, or just “digest” or “hash”. For instance, an MD5 hash is 128 bits long (i.e. 16 bytes), and a SHA-256 hash is, as the name suggests, 256 bits long.
For cryptographic use, a simple hash function isn’t enough: we need a cryptographic hash function, which is basically any hash function that has these important properties:
– given random inputs, the output values must have a uniform probability distribution (i.e., it looks random)
– given a hash value, it is “impossible” to find the input value (except by brute force)
– given an input, it is “impossible” to find another input that has the same hash
– it is “impossible” to find two inputs that have the same hash
This means, notably, that 2 messages with only 1 letter changed will have a totally different hash. For instance, the MD5 of “hello” is 5d41402abc4b2a76b9719d911017c592 and the MD5 of “Hello” is 8b1a9953c4611296a827abf8c47804d7.
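If you want to check this yourself, Python’s standard `hashlib` module is enough. The helper names below are mine; the sketch prints the two MD5 digests and checks that the digest size stays fixed whatever the input size.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 digest of a string, as 32 hexadecimal characters (128 bits)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha256_hex(text: str) -> str:
    """SHA-256 digest of a string, as 64 hexadecimal characters (256 bits)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# One changed letter gives a completely different digest.
print(md5_hex("hello"))
print(md5_hex("Hello"))

# The digest size does not depend on the input size.
for text in ["", "hello", "x" * 1_000_000]:
    assert len(md5_hex(text)) == 32
    assert len(sha256_hex(text)) == 64
```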
MD5 and SHA-256 were both designed as cryptographic hash algorithms, but MD5 has been known to be badly vulnerable for a long time now, so don’t use it where you do need a cryptographic hash. I used MD5 for my example simply because its hashes are shorter than those of state-of-the-art cryptographic hash functions.
If you want to study this further, see Wikipedia’s page on cryptographic hash functions
I didn’t mention it earlier and this isn’t a necessary detail for the purpose of this guide, but hashing is notably used in the process of digital signing: typically, a signing function will first produce a hash of the message to be signed, and then sign the hash instead of signing the whole message. This is because signing algorithms typically have limitations that make them weaker (or even vulnerable) if you use them on a large amount of data, and also they are much slower than a hashing function, so by signing just a hash you go faster.
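Here is a minimal sketch of that hash-then-sign idea, assuming a toy RSA key built from two Mersenne primes and the usual exponent 65537 (my choices; still far too small for real use). Whatever the size of the document, the signing step only ever sees a fixed-size number derived from its SHA-256 digest.

```python
import hashlib

# Toy key for illustration only.
P, Q = 2**61 - 1, 2**31 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))      # needs Python 3.8+


def digest_as_number(data: bytes) -> int:
    """SHA-256 digest as an integer, reduced modulo N so it fits the toy key.
    The reduction is only needed because this N is so small."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sign the digest, not the data itself."""
    return pow(digest_as_number(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recompute the digest and compare it with what the signature encodes."""
    return pow(signature, E, N) == digest_as_number(data)


document = b"x" * 5_000_000             # 5 MB of data, one small signature
signature = sign(document)
assert verify(document, signature)
assert not verify(document + b"!", signature)
```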
On to the Bitcoin block chain
Before building our block chain, let’s summarize the problem: building a system to store and move “money” (bitcoins), in a context where you can’t trust people (so the transactions must be verifiable based on math / cryptography / hard proofs). Also, reward people that make the system run (hence the “miners” get BTC from fees plus a block reward) but avoid an infinite inflation (hence the block reward gets lower and lower as time passes).
Part 1: spending money
We have to start building somewhere. And what better place to start than what money is for: spending!
Alice, again, has a private key and, obviously, the matching public key. Let’s call this key pair “ALICE001”. This is also her Bitcoin address. That’s right, a Bitcoin address is simply a public key. Or more precisely, the hash of a public key (this difference doesn’t matter much, except that it means the public key itself can stay secret until you spend from the address).
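To illustrate “address = hash of a public key”, here is a deliberately simplified sketch. The `toy_address` function and the placeholder key bytes are hypothetical; actual Bitcoin addresses apply SHA-256 followed by RIPEMD-160 and then a checksummed text encoding, which I skip here.

```python
import hashlib


def toy_address(public_key: bytes) -> str:
    """Simplified stand-in for address derivation: a truncated SHA-256 digest.
    Not the real Bitcoin address format."""
    return hashlib.sha256(public_key).hexdigest()[:40]


# Placeholder bytes standing in for Alice's real public key.
alice_public_key = b"ALICE001 public key (placeholder bytes)"
alice_address = toy_address(alice_public_key)

# Anyone who is later shown the public key can check it matches the address,
# but the address alone does not reveal the public key.
assert toy_address(alice_public_key) == alice_address
```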
Alice can have as many Bitcoin addresses as she wants, and she stores them in her wallet: basically, a Bitcoin wallet is just a file containing all the private keys of your addresses. It actually stores more stuff, but the strict minimum is the private keys, and notably it doesn’t contain coins. The coins don’t really move, they just get assigned to different Bitcoin addresses.
Let’s say the blockchain currently records 1 bitcoin assigned to “ALICE001”. Because this information is written publicly in the blockchain, everyone in the world knows that “ALICE001” owns 1 bitcoin.
Alice wants to send 0.5 bitcoin to Bob. She finds that Bob’s address is “BOB001”. She then creates a message that says: “I send 0.5 BTC to BOB001, I give 0.0001 BTC as network fee, and the remaining 0.4999 BTC go back to ALICE001” (NB: she could send to more addresses at the same time, and she could also send the remainder to a new ALICE002 address instead of ALICE001 to improve security and privacy a bit). She uses her ALICE001 private key to sign the message, and she sends all this to the Bitcoin network.
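The arithmetic of that message can be sketched as follows. The dictionary layout and field names are hypothetical (a real transaction is a binary structure, and the fee is implied by the difference between what goes in and what comes out), and amounts are counted in satoshis, the smallest unit, to avoid floating-point rounding.

```python
from decimal import Decimal

SATOSHIS_PER_BTC = 100_000_000   # 1 BTC = 100,000,000 satoshis


def btc(amount: str) -> int:
    """Convert a decimal BTC amount written as a string into whole satoshis."""
    return int(Decimal(amount) * SATOSHIS_PER_BTC)


transaction = {
    "from": "ALICE001",
    "available": btc("1"),                  # what ALICE001 currently owns
    "outputs": {
        "BOB001": btc("0.5"),               # the payment
        "ALICE001": btc("0.4999"),          # the change going back to Alice
    },
    "fee": btc("0.0001"),                   # left for the miner
}


def is_balanced(tx: dict) -> bool:
    """Outputs plus fee must use up exactly the amount being spent."""
    return sum(tx["outputs"].values()) + tx["fee"] == tx["available"]


assert is_balanced(transaction)
```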
Step 1 complete: now everyone knows that ALICE001 transferred 0.5 BTC to BOB001. But this isn’t over.
Part 2: recording the spend
The world knows that ALICE001 sent 0.5 BTC to BOB001, but this isn’t enough, because nobody can (or should) be trusted. For instance, with her original balance of 1 BTC, ALICE001 could send 1 BTC to CAROL001 and 1 BTC to DAVE001. Each of these transactions looks valid on its own, and since we cannot trust the time at which ALICE001 claims to have made them, we don’t know which one came “first” and is valid, and which one came later and must be rejected.
Let’s go back to the end of part 1, so Alice didn’t try to make a mess and only sent one transaction (0.5 BTC to BOB001), and she broadcast it to the Bitcoin network.
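Here is a small sketch of why ordering matters, using a simple balance table. That table is my simplification (Bitcoin tracks unspent transaction outputs rather than balances), and the function names are hypothetical.

```python
def is_valid(tx: dict, balances: dict) -> bool:
    """A transaction is acceptable if the sender owns at least the amount sent."""
    return balances.get(tx["from"], 0) >= tx["amount"]


def apply_tx(tx: dict, balances: dict) -> dict:
    """Return the new balances once the transaction has been recorded."""
    updated = dict(balances)
    updated[tx["from"]] -= tx["amount"]
    updated[tx["to"]] = updated.get(tx["to"], 0) + tx["amount"]
    return updated


balances = {"ALICE001": 1}
to_carol = {"from": "ALICE001", "to": "CAROL001", "amount": 1}
to_dave = {"from": "ALICE001", "to": "DAVE001", "amount": 1}

# Taken separately, both transactions look fine...
assert is_valid(to_carol, balances)
assert is_valid(to_dave, balances)

# ...but once one of them is recorded, the other becomes impossible.
after_carol = apply_tx(to_carol, balances)
assert not is_valid(to_dave, after_carol)
```

Which of the two ends up valid depends entirely on which one is recorded first, and that is the question the rest of the system has to settle.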
The Bitcoin network consists of nodes, which are people running Bitcoin Core. Among those, some are people who just run it in order to have a local copy of the blockchain (for instance to send transactions themselves), and some are “miners”.
Miners gather all new transactions, select the ones they like (typically those with the highest fees, or rather the highest fee/size ratio), pack them into a block (in Bitcoin’s case, the maximum size of a block is 1,000,000 bytes, but a miner can decide not to fill it), and then… they try to get their block accepted.
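One plausible selection strategy is a greedy pass over the waiting transactions, sorted by fee per byte. This is only a sketch of that idea, not what any particular mining software does; the transaction fields and the sample values are made up.

```python
MAX_BLOCK_SIZE = 1_000_000   # bytes, as stated above


def select_transactions(pending: list, max_size: int = MAX_BLOCK_SIZE) -> list:
    """Pick transactions by decreasing fee/size ratio until the block is full."""
    by_fee_rate = sorted(pending, key=lambda tx: tx["fee"] / tx["size"], reverse=True)
    selected, used = [], 0
    for tx in by_fee_rate:
        if used + tx["size"] <= max_size:
            selected.append(tx)
            used += tx["size"]
    return selected


# Hypothetical waiting transactions: fee in satoshis, size in bytes.
pending = [
    {"id": "a", "fee": 10_000, "size": 250},   # 40 satoshis per byte
    {"id": "b", "fee": 500, "size": 500},      # 1 satoshi per byte
    {"id": "c", "fee": 30_000, "size": 400},   # 75 satoshis per byte
]

# With room for only 700 bytes, the low-fee transaction has to wait.
chosen = select_transactions(pending, max_size=700)
assert [tx["id"] for tx in chosen] == ["c", "a"]
```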
Since Alice included a decent fee, most miners will probably put her transaction in their next block soon.
But then, how to decide which block gets accepted as the next block?
Part 3: building and inserting a block
Schematically, a block contains (this isn’t the full list, to keep things concise):
– the hash of the previous block (this is where the “chain” part is, by the way: each block refers to another parent block… and just like that we have a chain of blocks!)
– the timestamp
– the target difficulty (more on that a few lines below)
– all the valid transactions the miner decided to include (up to the maximum size)
– a transaction that rewards the miner’s address with all the block’s fees plus the fixed block reward (initially 50 BTC but divided by 2 every 210,000 blocks: this is how, as time passes, the total number of BTC added into the system will get very close to 21 million)
– plus a small arbitrary part
Then the miner computes a SHA-256 hash of all this.
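The “chain” part can be sketched in a few lines. Serializing the block as JSON and hashing it once with SHA-256 is my simplification (Bitcoin hashes a compact binary header instead), and the block fields and sample transactions are hypothetical. The point is only that each block stores the hash of its parent, so changing an old block breaks every link after it.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 of the block's content, serialized in a stable way."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(previous_hash: str, transactions: list) -> dict:
    return {"previous_hash": previous_hash, "transactions": transactions}


def chain_is_consistent(chain: list) -> bool:
    """Every block must reference the actual hash of the block before it."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


first = make_block("0" * 64, ["reward to MINER001"])
second = make_block(block_hash(first), ["ALICE001 -> BOB001: 0.5"])
third = make_block(block_hash(second), ["BOB001 -> CAROL001: 0.1"])
chain = [first, second, third]
assert chain_is_consistent(chain)

# Rewriting history in the middle of the chain is immediately detectable.
second["transactions"][0] = "ALICE001 -> MALLORY001: 0.5"
assert not chain_is_consistent(chain)
```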
And here let me introduce the target difficulty: Bitcoin was designed with the idea that a new block should be inserted, on average, every 10 minutes. With that in mind, every 2016 blocks, the Bitcoin network sets the target difficulty that the next blocks must meet, based on the time it took to mine those last 2016 blocks and on the previous difficulty (to avoid overly strong fluctuations).
Long story short, the difficulty corresponds to how many zeroes the hash should start with. As we saw earlier, the hash is unpredictable, so the only way to find a hash that starts with enough zeroes is to modify the block and try again. The block can change if a new transaction is added in the meantime, or otherwise simply by incrementing the “small arbitrary part” at the end of the list.
So what a miner does, to get their block accepted, is increment the “small arbitrary part” many, many times, compute the hash each time, and hope it starts with enough zeroes.
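As a rough sketch of that adjustment: scale the previous difficulty by how much faster or slower than planned the last 2016 blocks were mined. Treating difficulty as a plain number is a simplification, and the factor-4 limit on each adjustment is how I understand Bitcoin’s rule, so take it as an assumption.

```python
TARGET_BLOCK_TIME = 10 * 60                 # seconds
RETARGET_INTERVAL = 2016                    # blocks
EXPECTED_TIMESPAN = TARGET_BLOCK_TIME * RETARGET_INTERVAL   # two weeks


def next_difficulty(current_difficulty: float, actual_timespan: float) -> float:
    """Scale the difficulty so the next 2016 blocks take about two weeks."""
    # Limit each adjustment to a factor of 4 in either direction (assumed rule).
    clamped = min(max(actual_timespan, EXPECTED_TIMESPAN / 4), EXPECTED_TIMESPAN * 4)
    return current_difficulty * EXPECTED_TIMESPAN / clamped


# Blocks came twice as fast as planned: the difficulty doubles.
assert next_difficulty(100, EXPECTED_TIMESPAN / 2) == 200
# Blocks came exactly on schedule: nothing changes.
assert next_difficulty(100, EXPECTED_TIMESPAN) == 100
```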
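That trial-and-error loop fits in a few lines. This is a sketch under simplifying assumptions: the block is a plain string, the difficulty is counted in leading hexadecimal zeroes, and the hash is a single SHA-256, whereas Bitcoin compares the hash of a binary header against a numeric target. The `mine` function and its parameters are my own names.

```python
import hashlib


def mine(block_content: str, leading_zeroes: int):
    """Increment the 'small arbitrary part' until the hash starts with enough zeroes."""
    prefix = "0" * leading_zeroes
    arbitrary_part = 0
    while True:
        candidate = f"{block_content}|{arbitrary_part}".encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return arbitrary_part, digest
        arbitrary_part += 1


content = "previous hash + timestamp + difficulty + transactions"
arbitrary_part, digest = mine(content, leading_zeroes=4)
print(arbitrary_part, digest)
```

Each extra hexadecimal zero multiplies the expected number of attempts by 16, while checking a proposed block still takes a single hash. That imbalance is what lets every node verify a miner’s work cheaply.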
Once a miner finds a new block with a hash that has enough zeroes, it sends it as fast as it can to the network (in case another miner finds a good block more or less at the same time, the one that is lucky enough to spread first eventually wins), where each node will verify it (be it a miner or not) and pass it on.
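Coming back to the block reward from the list at the start of this part, the “very close to 21 million” figure can be checked with a short loop. I count in satoshis (1 BTC = 100,000,000 satoshis) and assume each halving rounds down to a whole satoshi; the function names are mine.

```python
SATOSHIS_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000                     # blocks between two halvings
INITIAL_REWARD = 50 * SATOSHIS_PER_BTC         # 50 BTC, in satoshis


def block_reward(height: int) -> int:
    """Fixed reward (in satoshis) for the block at a given height."""
    return INITIAL_REWARD >> (height // HALVING_INTERVAL)


def total_supply() -> int:
    """Sum of all block rewards until the reward rounds down to zero."""
    total, reward = 0, INITIAL_REWARD
    while reward > 0:
        total += reward * HALVING_INTERVAL
        reward //= 2
    return total


assert block_reward(0) == 50 * SATOSHIS_PER_BTC
assert block_reward(210_000) == 25 * SATOSHIS_PER_BTC
print(total_supply() / SATOSHIS_PER_BTC)       # just under 21 million BTC
```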
Part 4: disagreements
As I briefly discussed, it’s possible that 2 different new blocks are “mined” by 2 different miners almost at the same time, and it’s also possible that Alice sends several conflicting transactions that each look valid on their own (this is referred to as a double spend).
In case of 2 new blocks, first, the one that is based on the longest chain wins.
If they both have the same parent (so they have the same chain length), a fork will occur, where a part of the network will work on adding on top of one block while part of the network will work on adding on top of the other block. Eventually though, one of the 2 chains will grow longer than the other, and the shorter chain will be dropped.
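The rule itself is tiny when sketched in code. Chains are plain lists of block names here, and measuring a chain by its number of blocks follows the description above (as far as I know, real nodes compare the total work accumulated in each chain, which usually amounts to the same thing). Keeping the first chain on a tie is my assumption about how a node behaves while the fork lasts.

```python
def pick_chain(known_chains: list) -> list:
    """Return the longest chain; on a tie, max() keeps the first one listed."""
    return max(known_chains, key=len)


common = ["block0", "block1"]
chain_a = common + ["block2a"]       # one miner's new block
chain_b = common + ["block2b"]       # a competing block with the same parent

# Same length: the fork persists and this node stays on the chain it saw first.
assert pick_chain([chain_a, chain_b]) is chain_a

# Someone mines on top of block2b: that chain is now longer and wins.
chain_b = chain_b + ["block3b"]
assert pick_chain([chain_a, chain_b]) is chain_b
```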
As for Alice, if she sends multiple conflicting transactions, eventually one of them will be included in the blockchain. Once this has happened, the other transactions won’t be valid anymore (as they are now impossible) and will eventually be dropped. If a fork occurs at the same time, though, it’s possible that one transaction is included in one chain, and a different transaction in another chain. Eventually though, only one chain will remain and will be the one that decides which transaction is kept.
The number of confirmations of a transaction is the number of blocks in the chain from the one containing it onwards: the block that includes the transaction counts as the first confirmation, and each block added on top adds one more. The higher this number is, the surer you can be that the transaction is permanently part of the main chain. Nowadays, many exchanges and merchants require 6 confirmations before considering a transaction final. This is completely arbitrary, but also a reasonable value, even though for small transactions you’re probably fine with just one confirmation.
Part 5: getting it started
Just for the fun of it, I asked ChatGPT to tell us about the first block. I’ll put the whole answer, uncut, because I found it amusing how even GPT gets a bit philosophical in its answers.
The first Bitcoin block, also known as the “genesis block,” was mined by Bitcoin’s creator, Satoshi Nakamoto, on January 3, 2009. This block, with the block height of 0, served as the starting point of the Bitcoin blockchain.
The content of the genesis block was unique compared to subsequent blocks. It contained a single transaction, which awarded 50 bitcoins to a specific Bitcoin address. This transaction is often referred to as the “coinbase transaction” and served as the issuance of the initial bitcoins.
In addition to the coinbase transaction, the genesis block included a message embedded in the block’s coinbase parameter, which reads: “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks.” This message is widely interpreted as a commentary on the instability of the traditional financial system and serves as a timestamp for the creation of the Bitcoin network.
Overall, the genesis block laid the foundation for the decentralized and trustless system of peer-to-peer electronic cash that Bitcoin represents.
So all in all, just make a somewhat arbitrary first block, and then pile up new blocks on top of it as above-described, and there you have the Bitcoin blockchain.
I used ChatGPT for that part mainly as a suggestion that, in case you need more details on a specific point, you can ask some of these large language models, as this is something they should be good at detailing, given that it’s something technical, with source code and documentation available, and that it’s old enough (way older than the cut-off point in training data for those models). But the rest was, as Aldo Stérone would point out, “written by a human enjoy it while it lasts”.
Closing remarks
Except for the first one about storage, these are mostly random rants, so feel free to skip.
If you have questions, or if you think I missed something important, the comments are, as always, down there.
See you in another book-length post.
Storing all this
Nowadays, most blocks are close to (or at) their maximum capacity of 1MB. Meaning the blockchain grows roughly by 1MB every 10 minutes, or 144MB per day, or… more than 50GB per year.
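The back-of-the-envelope calculation, spelled out (assuming every block is full and that blocks arrive exactly every 10 minutes):

```python
BLOCK_SIZE_MB = 1
BLOCKS_PER_DAY = 24 * 60 // 10                 # one block every 10 minutes

mb_per_day = BLOCK_SIZE_MB * BLOCKS_PER_DAY    # 144 MB
gb_per_year = mb_per_day * 365 / 1000          # about 52.6 GB

print(mb_per_day, gb_per_year)
```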
We definitely need a full record of all that: otherwise we can’t verify the transaction history, so we can’t verify who owns which bitcoin, and either the system collapses or we have to trust someone who hands us starting values other than the very first block. But the whole concept is “trust no one”.
So we need many people running Bitcoin Core and opting to store the whole blockchain.
So we need many people running Bitcoin Core and dedicating 500 GB to it as of January 2024 + 50 additional GB every year.
Enough said.
This also means that the Bitcoin network currently operates pretty much at max capacity. A workaround to this is the Lightning Network, which is basically an extra layer on top of the blockchain, to perform smaller and faster transactions outside of the blockchain.
If you want to study this further, see Wikipedia’s page on Lightning Network
Binance, Coinbase, Kraken, and the like
These services were originally created to exchange bitcoins for dollars, euros, etc. Nowadays however, many people don’t keep their keys themselves, instead they trust these platforms to hold their bitcoins for them. Remember how the initial problem was to create a trust-less system? OOPS.
Ponzis and shitcoins
As I mentioned at the top, and as you now have (hopefully) realized, the blockchain is, in itself, a simple concept. Very ingenious and creative (hence hard to invent) but simple (thus easy to implement). It does use complex cryptographic stuff, but in practice, you use ready-made, open-source (and FLOSS) libraries for these. On top of that, Bitcoin Core itself is open-source, as cryptography-centered software should be.
As a result, making a copycat is easy.
As you may have guessed, whoever creates the first block gets “some” spare change, plus then that person gets to mine “a few” more easy blocks alone, and later with a slowly growing amount of early adopters. As such, people have long criticized Bitcoin as being a bit of a ponzi scheme. I think I’ve even heard that as an argument in favor of the “G1” (pronounced “June”) alternative currency, which claims to avoid this pitfall itself, barely a few years ago. I agree with that criticism, but on the other hand, 1) how else could they have gotten it started and 2) this is now part of a more and more distant past.
Anyway, since making a copycat is easy and being the first (or an early adopter) is a jackpot, there is an incessant influx of “alternative coins”, aka “altcoins”, aka shitcoins. All of them with founders hoping to be the head of their own ponzi. While Bitcoin couldn’t avoid starting a bit like a ponzi, those altcoins certainly could: by not being created in the first place. But eh, that’s the modern-day casino, where people feel like they’re “trading” rather than gambling on which shitcoin will still have buyers next week… but I must be getting old and grumpy 🤷