Immortalizing Scientific Publications
Bitcoin OP_RETURNs
This October I had the pleasure of sitting on three panels at the Camp Nakamoto Bitcoin retreat on Sandy Island in Lake Winnipesaukee. The topic was the recently revived concern over inscriptions on the Bitcoin blockchain. This is not a new debate. Satoshi Nakamoto had this debate with many early Bitcoin developers more than a decade ago, and the conclusion was that it's impossible to stop in a censorship-resistant peer-to-peer network. However, there is an argument for the community not to actively encourage or enable large file storage, given the risks it brings to such a network.
https://www.campnakamoto.com/
This is a very important topic for censorship-resistant scientific publication, but any solution to this problem also creates the capacity to post illegal content (porn, offensive memes, etc.).
This is distinctly different from traditionally posting data on a website. Bitcoin doesn't let you post a PDF or an actual image. You have to convert it into raw bytes (displayed as hex on chain) and chop the data into allowable sizes that fit into multiple transactions. So any data on the chain needs a decoder that can reconstruct an image or a large file from multiple transactions, and the peer-to-peer network circulating this data has no a priori knowledge of how to decode it. The “I know it when I see it” argument is hard to satisfy here.
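Here is a minimal Python sketch of what “chopping up” a file and decoding it again involves. It assumes an 80-byte payload per OP_RETURN (the old default) and uses a made-up 4-byte header holding the chunk index and chunk count. Bitcoin defines no such convention, which is the point: only someone who knows the scheme can put the pieces back together.

```python
import struct

# Illustrative sketch; the header layout and function names are hypothetical.
PAYLOAD_LIMIT = 80              # old default OP_RETURN payload size, in bytes
HEADER = struct.Struct(">HH")   # hypothetical header: chunk index, total chunk count


def split_into_payloads(data: bytes, limit: int = PAYLOAD_LIMIT) -> list[bytes]:
    """Cut data into pieces that each fit one OP_RETURN, each prefixed with the header."""
    body = limit - HEADER.size  # bytes of file data that fit after the header
    pieces = [data[i:i + body] for i in range(0, len(data), body)] or [b""]
    return [HEADER.pack(index, len(pieces)) + piece for index, piece in enumerate(pieces)]


def reassemble(payloads: list[bytes]) -> bytes:
    """Rebuild the original bytes from payloads, which may arrive in any order."""
    chunks: dict[int, bytes] = {}
    total = 0
    for payload in payloads:
        index, total = HEADER.unpack_from(payload)
        chunks[index] = payload[HEADER.size:]
    if set(chunks) != set(range(total)):
        raise ValueError("missing or unexpected chunks")
    return b"".join(chunks[i] for i in range(total))


if __name__ == "__main__":
    image = bytes(range(256)) * 2  # 512 bytes standing in for an image file
    payloads = split_into_payloads(image)
    print(len(payloads), "transactions needed; first payload:", payloads[0].hex())
    # Transactions are not guaranteed to be mined in order, so the header carries the order.
    assert reassemble(list(reversed(payloads))) == image
```

A relaying node only ever sees individual 80-byte blobs; without knowing the header convention it has no way to tell that they add up to an image.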
A paper published in 2018 discusses the various methods of encoding arbitrary data in Bitcoin, and how some of this data may in fact be illegal if you can work out the encoding used.
We have been using this inscription functionality for nearly a decade at Medicinal Genomics. Since the cannabis field is censored and you can't notarize cannabis genetics at a federally regulated notary or bank, our solution was to sequence a plant's genome, hash that large file into a 32-byte fingerprint with SHA-256, and include that hash in the OP_RETURN of a Bitcoin transaction. This creates an immutable timestamp of when those genetics existed, and it has actually been used in court to resolve the theft of cannabis genetics.
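A sketch of that notarization step in Python, assuming the genome is a single local file. It hashes the file in chunks (genome files are too large to read at once) and wraps the 32-byte digest in an OP_RETURN output script. Funding, signing and broadcasting the transaction are left to a wallet or node; the function names are illustrative, not Medicinal Genomics' actual tooling.

```python
import hashlib
import os
import tempfile

# Illustrative sketch; function names are hypothetical.
OP_RETURN = 0x6A


def sha256_file(path: str, chunk_size: int = 1 << 20) -> bytes:
    """SHA-256 of a file of any size, read 1 MiB at a time so it never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.digest()  # 32 raw bytes


def op_return_script(payload: bytes) -> bytes:
    """OP_RETURN output script for a small payload (direct push, so at most 75 bytes)."""
    if len(payload) > 75:
        raise ValueError("payloads over 75 bytes need OP_PUSHDATA1; not handled in this sketch")
    # For pushes of 1-75 bytes the push opcode is simply the byte count (0x20 for 32 bytes).
    return bytes([OP_RETURN, len(payload)]) + payload


if __name__ == "__main__":
    # Stand-in for a genome file; a real one would be gigabytes of sequence data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"ACGT" * 250_000)
    script = op_return_script(sha256_file(tmp.name))
    os.remove(tmp.name)
    print(script.hex())  # "6a20" followed by the 64 hex digits of the fingerprint
```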
The debate has been reheated by the recent Bitcoin Core v30 release, which raised the default OP_RETURN limit from 80 bytes to 100 KB. This is certainly a large change, but it should be known that even before it there were mechanisms to put up to 400 KB into witness data, and most bitcoiners don't want the part of the chain that holds the UTXOs (unspent transaction outputs) to be filled with non-financial data. The UTXO set is the core data people need to validate new transactions: it records which addresses own which coins. OP_RETURN outputs are provably unspendable, so they never enter the UTXO set, and OP_RETURN data can be pruned by lightweight nodes while they still validate who owns which coins. So the new v30 policy is meant to encourage people to move these inscriptions into the part of the chain that causes the least bloat for lightweight wallets and nodes. Think of it as realizing you can't stop skateboarding, so you build a park and encourage people to skate and graffiti in a designated space.
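A minimal sketch of why OP_RETURN data does not burden the UTXO set: a node only needs to remember outputs that can still be spent, and an output script that starts with OP_RETURN can never be spent. The classes below are a hypothetical toy model with no real serialization, not Bitcoin Core's coins database.

```python
from dataclasses import dataclass, field

# Toy model; class and field names are hypothetical.
OP_RETURN = 0x6A


@dataclass
class TxOut:
    value_sats: int
    script_pubkey: bytes


@dataclass
class UtxoSet:
    """Toy UTXO set keyed by (txid, output index)."""
    coins: dict[tuple[str, int], TxOut] = field(default_factory=dict)

    def add_transaction(self, txid: str, outputs: list[TxOut]) -> None:
        for index, out in enumerate(outputs):
            # A script beginning with OP_RETURN can never be spent, so there is nothing
            # to remember: the output, and the data it carries, stays out of the UTXO set.
            if out.script_pubkey[:1] == bytes([OP_RETURN]):
                continue
            self.coins[(txid, index)] = out

    def spend(self, txid: str, index: int) -> TxOut:
        """Remove and return a coin; raises KeyError if it is unknown or already spent."""
        return self.coins.pop((txid, index))


if __name__ == "__main__":
    utxos = UtxoSet()
    utxos.add_transaction("ab" * 32, [
        TxOut(50_000, bytes([0x00, 0x14]) + bytes(20)),  # placeholder P2WPKH-style payment output
        TxOut(0, bytes([OP_RETURN, 0x20]) + bytes(32)),  # 32-byte hash commitment
    ])
    print(len(utxos.coins))  # 1: only the payment output is tracked
```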
One of the seminal works on spam reduction in such networks was Adam Back's Hashcash. It forces a CPU expense for each email, so that spamming 100,000 email addresses costs you real money.
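A toy version of the Hashcash idea: the sender must find a nonce whose hash starts with a given number of zero bits, which takes about 2^bits attempts, while the recipient checks it with a single hash. Adam Back's original used SHA-1 and a specific stamp format; this sketch assumes SHA-256 (as Bitcoin uses) and a simplified `resource:nonce` string.

```python
import hashlib
from itertools import count

# Simplified Hashcash-style proof of work; stamp format and names are hypothetical.


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a hash."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def mint_stamp(resource: str, bits: int) -> tuple[int, bytes]:
    """Try nonces until SHA-256("resource:nonce") starts with `bits` zero bits (~2**bits tries)."""
    for nonce in count():
        digest = hashlib.sha256(f"{resource}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce, digest


def verify_stamp(resource: str, nonce: int, bits: int) -> bool:
    """Checking a stamp costs the recipient a single hash."""
    digest = hashlib.sha256(f"{resource}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


if __name__ == "__main__":
    nonce, digest = mint_stamp("alice@example.com", 16)  # about 65,000 hashes on average
    print(nonce, digest.hex(), verify_stamp("alice@example.com", nonce, 16))
```

Each extra bit doubles the sender's expected work. At 20 bits, mailing 100,000 addresses means roughly 100,000 × 2^20 ≈ 10^11 hashes, while each recipient still pays for only one.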
Satoshi referenced Adam's work in the Bitcoin white paper, and it played a role in Satoshi's proof-of-work design; Bitcoin likewise attaches a cost per byte to every transaction. So this “spam” is being paid for, and according to many students of the Austrian economist Carl Menger, value is subjective: if someone is paying for “spam,” it's only spam in the eyes of the protester, not the payer.
The moment someone pays (in sats/byte) for each byte in the blockchain, it becomes financial transaction information in the eyes of “team Menger.”
This debate actually dates back to Bitcoin's genesis block, where Satoshi him/herself inscribed their own ‘graffiti’ into the very first Bitcoin transaction. This was placed in the coinbase (not an OP_RETURN).
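The genesis message can be read back with a few lines of Python. The hex below is the scriptSig of the genesis block's coinbase input as it appears on chain (the encoded difficulty bits, the number 4, then the headline); the parser just walks Bitcoin's data-push opcodes. It is an illustrative sketch, not a full script interpreter.

```python
# Genesis block coinbase scriptSig, as stored on chain.
GENESIS_COINBASE_SCRIPTSIG = bytes.fromhex(
    "04ffff001d0104455468652054696d65732030332f4a616e2f32303039"
    "204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64"
    "206261696c6f757420666f722062616e6b73"
)


def parse_pushes(script: bytes) -> list[bytes]:
    """Split a push-only script into the data items it pushes."""
    items, i = [], 0
    while i < len(script):
        opcode = script[i]
        i += 1
        if opcode <= 0x4B:                   # OP_0 through 0x4b: opcode is the byte count
            size = opcode
        elif opcode in (0x4C, 0x4D, 0x4E):   # OP_PUSHDATA1/2/4: little-endian length follows
            width = {0x4C: 1, 0x4D: 2, 0x4E: 4}[opcode]
            size = int.from_bytes(script[i:i + width], "little")
            i += width
        else:
            raise ValueError(f"opcode 0x{opcode:02x} is not a data push")
        if i + size > len(script):
            raise ValueError("script ends in the middle of a push")
        items.append(script[i:i + size])
        i += size
    return items


if __name__ == "__main__":
    *_, message = parse_pushes(GENESIS_COINBASE_SCRIPTSIG)
    print(message.decode("ascii"))  # The Times 03/Jan/2009 Chancellor on brink of second bailout for banks
```

The same parser reads an OP_RETURN payload once the leading `0x6a` (OP_RETURN) byte of the output script is dropped.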
In fact, most financial transactions carry some metadata. Your credit card statement rarely says just “X paid Y”; it usually includes why, or what you bought.
On the flip side, blockchains are NOT great databases. To enable decentralization, every effort is made to keep transactions as compact as possible, because every node in the network must maintain a copy of the chain, which keeps growing (by 2-4 MB every 10 minutes) and currently stands at 693 GB.
No one wants a blockchain so large that only Google can host it. We want it to run on a Raspberry Pi so that extreme decentralization can be achieved; otherwise it drifts toward centralization, which leads to Zuckerberg-style censorship.
Why should scientists care about this esoteric topic?
Bitcoin has been running for more than 15 years, and no one has ever been able to censor a block. As a result, it has become the ledger securing $2.4T worth of money. This makes it an attractive place to etch data you need to immortalize, as long as you do it in a manner that abides by Bitcoin's consensus rules and pay the fees for such storage and mining.
You can't place large data here! Blocks are limited in size, and standard relay policy limits a single transaction to roughly 100-400 KB. As it stands, the chain takes up 693 GB of disk space, and 15 years from now it will be roughly three times as large (~2 TB). Given the pace of storage improvements described by Kryder's law (like Moore's law for CPUs, but for disk density), this shouldn't really be a problem. But very large blocks can become difficult to relay to all the nodes, and we want to keep the hardware footprint as cheap as possible so that running a node stays affordable.
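A quick check of the 15-year projection. It assumes the 693 GB starting size from the text, one block every ten minutes, and a constant average block size; the three averages tried (1.5, 2 and 4 MB) are assumptions ranging up to the ~4 MB ceiling.

```python
# Back-of-the-envelope chain growth; average block sizes are assumptions.
BLOCKS_PER_DAY = 24 * 60 // 10  # one block roughly every 10 minutes -> 144
DAYS_PER_YEAR = 365.25


def projected_chain_gb(current_gb: float, avg_block_mb: float, years: float) -> float:
    """Chain size after `years`, assuming a constant average block size (decimal units)."""
    new_blocks = BLOCKS_PER_DAY * DAYS_PER_YEAR * years
    return current_gb + new_blocks * avg_block_mb / 1000  # 1000 MB per GB


if __name__ == "__main__":
    for avg_mb in (1.5, 2.0, 4.0):
        print(f"{avg_mb} MB blocks -> {projected_chain_gb(693, avg_mb, 15):,.0f} GB after 15 years")
```

At a 1.5 MB average the chain lands near 1.9 TB, roughly triple today's size; sustained full 4 MB blocks would push it toward 3.8 TB.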
Given these limitations, this is a great tool for placing a hash of your published paper and a link to it. You can't and shouldn't store the whole paper, but you can store a SHA-256 hash (32 bytes) of it, which is a unique mathematical fingerprint of your file. If you change one byte of that file, the resulting SHA-256 hash is completely different. You cannot derive the paper from the hash, because hash functions are one-way (paper → hash). This is why a link is required to give context to the hash. Anyone can run shasum -a 256 on your PDF or PNG and confirm that the hosted paper matches the hash on the BTC chain, which proves the timestamp and that nothing has been tampered with.
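The `shasum -a 256` check can also be scripted. A minimal Python sketch, assuming you already have the hex hash read from the OP_RETURN and a downloaded copy of the file; it also shows how many hash bits flip when a single byte of the paper changes. Function names are hypothetical.

```python
import hashlib
import os
import tempfile

# Illustrative sketch; function names are hypothetical.


def file_sha256_hex(path: str) -> str:
    """Same hex digest that `shasum -a 256 <file>` prints before the filename."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_onchain(path: str, onchain_hex: str) -> bool:
    """True if the hosted file is byte-for-byte what was fingerprinted on chain."""
    return file_sha256_hex(path) == onchain_hex.strip().lower()


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    paper = b"%PDF-1.7 stand-in bytes for a published paper"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(paper)
    onchain = hashlib.sha256(paper).hexdigest()  # what would have been etched on chain
    print("hosted file matches:", matches_onchain(tmp.name, onchain))
    os.remove(tmp.name)

    tampered = bytes([paper[0] ^ 0x01]) + paper[1:]  # change a single byte
    changed = bit_difference(hashlib.sha256(paper).digest(), hashlib.sha256(tampered).digest())
    print(changed, "of 256 hash bits changed")  # on average about half of them flip
```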
Once it is on chain, you have an immutable timestamp of the paper's existence and, ideally, a link to a decentralized data store like Nostr or IPFS that holds the paper.
This is our DNA contamination paper posted on Nostr, which is a decentralized alternative to Twitter. Nostr doesn't accept PDFs, but you can convert your PDF to a PNG file, upload it to your Nostr account, and post the output of shasum -a 256 <PNG FILE>.
In the event Nostr ever dies, you can put a link to an IPFS copy of this file in a Bitcoin OP_RETURN.
You can use a hardware wallet like Trezor to send a transaction to yourself.
In the upper right corner there are three dots (…). Click them and choose “Attach message” to write text into an OP_RETURN.
At the moment Trezor still defaults to the old 80-byte OP_RETURN limit, but that is enough for this short link to NCBI, which cost 21 cents.
You will note that the data are displayed in hexadecimal on chain. It's very hard to make an image from 80 bytes, but if you were determined you could submit multiple transactions with contiguous OP_RETURNs that probably only you would know how to reassemble. This gets easier with 100 KB OP_RETURNs, and it raises the question: “Do you know it when you see it?”
I'm not a huge fan of making the OP_RETURN large. I just need a hash and a link to a paper; anything larger invites sloppy use of precious block space and perhaps controversial image data. Keep in mind that even without the new OP_RETURN limit, people have been stuffing data into the witness, which is not ideal, and some schemes encode data in fake outputs that permanently pollute the UTXO set. Some of this data is already contested as illegal in many jurisdictions, but if the network is decentralized there is no single server to shut off.
So now we have this text embedded into the Bitcoin Blockchain for eternity.
https://pubmed.ncbi.nlm.nih.gov/40913499/
Speicher, Rose, McKernan
Courageous Truth Unacceptable Jessica
This transaction is now confirmed on chain in Block 919977.
There are other services, like Mara Slipstream, that assemble a transaction directly at the miner level, bypassing the peer-to-peer network that gossips transactions around the world. So boycotting large OP_RETURNs at the node relay level can be bypassed, but these miners are much more centralized and could probably be “zuckerberged” with a state visit to implement filtering. This invites a slippery slope: government filtering at the miner level would slide into sanctioning transactions that aren't controversial, just politically unfavorable, like those of the truckers in Canada. Household miners like the ~1 TH/s BitAxe exist to make this attack surface less impactful.
This transaction isn't ideal, because we are linking to NCBI, and NCBI can choose to censor papers. It rarely removes even retracted papers, but government shutdowns have been known to shut down support for PubMed.
A better solution would be to point to the Nostr link, but that link is too long to fit in the old 80-byte OP_RETURN limit.
To understand why this is important for censorship-free science, see this podcast with Efrat Fenigson.
If you are not a fan of 100 KB OP_RETURNs, there are things you can do as a node owner. You can choose not to relay transactions whose OP_RETURN exceeds a certain size. This is more of a protest move: you need about 90% of nodes to run it for it to work, and with miners willing to accept OP_RETURN transactions directly, it will likely be bypassed.
You can do this in either Bitcoin Core v30 or Bitcoin Knots, the node software that Bitcoin relay nodes run. Bitcoin Knots defaults to a 42-byte OP_RETURN limit and Core v30 defaults to 100 KB. I'm in the middle: I want enough space for a hash and a link, but not enough to embed an image. The link is important because we want to encourage a layer 2 for large data. Get it off chain so it's not copied to every node in the world.
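To see where a hash-and-link payload falls between these defaults, a small Python check. The limits are the 42-byte Knots and 100 KB Core v30 defaults from the text, Core's earlier 83-byte default, and 200 bytes standing in for the 200-1000 byte compromise suggested below. It assumes every limit counts the whole output script, as Bitcoin Core's `-datacarriersize` does, and looks at a single OP_RETURN output; the function names are hypothetical.

```python
# Maximum OP_RETURN output script size, in bytes, under the defaults discussed in the text.
# Assumption: every limit counts the whole script (OP_RETURN + push prefix + data), which is
# how Bitcoin Core measures -datacarriersize (its old default of 83 leaves 80 bytes of data).
POLICIES = {
    "Knots default": 42,
    "Core before v30": 83,
    "suggested compromise": 200,
    "Core v30": 100_000,
}


def push_prefix_len(n: int) -> int:
    """Bytes of opcode/length prefix needed to push n bytes of data."""
    if n <= 75:
        return 1  # the push opcode is the byte count itself
    if n <= 0xFF:
        return 2  # OP_PUSHDATA1 + 1-byte length
    if n <= 0xFFFF:
        return 3  # OP_PUSHDATA2 + 2-byte length
    return 5      # OP_PUSHDATA4 + 4-byte length


def op_return_script_size(payload: bytes) -> int:
    return 1 + push_prefix_len(len(payload)) + len(payload)


def policies_that_relay(payload: bytes) -> list[str]:
    """Names of the policies above under which one OP_RETURN with this payload is relayed."""
    size = op_return_script_size(payload)
    return [name for name, limit in POLICIES.items() if size <= limit]


if __name__ == "__main__":
    link = b"https://primal.net/e/nevent1qqsq0stndpynym9fayqey3faylxcu8cxjxsfag387rch782u2z6ndhqdtw58j"
    hash_hex = b"a486c8542c0938222520a20d6db33121a8eebe329716ceb5221848f4bb63d2cf"
    candidates = {
        "raw 32-byte hash": bytes.fromhex(hash_hex.decode()),
        "hash as 64 hex characters": hash_hex,
        "Nostr link + hex hash": link + hash_hex,
    }
    for label, payload in candidates.items():
        print(f"{label}: {op_return_script_size(payload)} bytes, relayed by {policies_that_relay(payload)}")
```

A raw 32-byte hash passes even the 42-byte Knots default, and storing the hash as raw bytes rather than hex text halves its footprint. The link plus hex hash (over 150 script bytes) needs something like the compromise range.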
Once you realize you can place a hash and a link to a scientific paper in BTC and etch them into eternity, you begin to see that this is the ideal network for a “peer-to-peer” peer-review system, where all reviewer comments are placed on Nostr or IPFS and then hashed and stashed on BTC. This disintermediates the journals, which are a centralized attack surface for peer review given their conflicted advertising revenue from pharmaceutical companies. In the current system, your $3-10K article processing fee goes to the journal, which keeps the copyright to your work. The reviewers are often anonymous, and their comments are often hidden from the public (though not in all journals). The reviewers get zero of these article processing fees.
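One possible way to hash and stash many reviewer comments without paying for one transaction per comment (an assumption, not something described above) is to fold their hashes into a Merkle tree and put only the 32-byte root in a single OP_RETURN. Each reviewer can later prove their comment was included using a handful of sibling hashes. The 0x00/0x01 prefixes keep leaf hashes and interior hashes from being confused; all names are hypothetical.

```python
import hashlib

# Illustrative Merkle-tree batching; names and prefixes are hypothetical choices.


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _leaf(document: bytes) -> bytes:
    return _sha256(b"\x00" + document)  # 0x00 prefix marks a leaf


def _parent(left: bytes, right: bytes) -> bytes:
    return _sha256(b"\x01" + left + right)  # 0x01 prefix marks an interior node


def _next_level(level: list[bytes]) -> list[bytes]:
    parents = [_parent(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2:
        parents.append(level[-1])  # an odd node out is carried up unchanged
    return parents


def merkle_root(documents: list[bytes]) -> bytes:
    """Fold any number of documents into one 32-byte commitment for an OP_RETURN."""
    if not documents:
        raise ValueError("need at least one document")
    level = [_leaf(doc) for doc in documents]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(documents: list[bytes], index: int) -> list[tuple[str, bytes]]:
    """Sibling hashes showing that documents[index] is committed to by the root."""
    level = [_leaf(doc) for doc in documents]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append(("left" if sibling < index else "right", level[sibling]))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(document: bytes, proof: list[tuple[str, bytes]], root: bytes) -> bool:
    node = _leaf(document)
    for side, sibling in proof:
        node = _parent(sibling, node) if side == "left" else _parent(node, sibling)
    return node == root


if __name__ == "__main__":
    reviews = [b"Reviewer 1: accept", b"Reviewer 2: minor revisions", b"Reviewer 3: reject"]
    root = merkle_root(reviews)
    print("OP_RETURN payload:", root.hex())
    print(all(verify_proof(r, merkle_proof(reviews, i), root) for i, r in enumerate(reviews)))
```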
In an ideal peer-review market, you need pricing signals to optimize it. Reviewers need to get paid, and their payment should reflect their quality and punctuality. If you need a fast lane for publication, there needs to be a pricing signal to triage urgency. A global network like Bitcoin enables international settlement of reviewer incentives and transparent, open review.
We did this with the Cannabis Genome Project in 2018.
Addendum-
Since Trezor would not allow an OP_RETURN larger than 80 bytes, I turned to my own node and constructed an OP_RETURN with the following text: the Primal link to the PNG file of our paper, followed by the SHA-256 hash of that file.
https://primal.net/e/nevent1qqsq0stndpynym9fayqey3faylxcu8cxjxsfag387rch782u2z6ndhqdtw58j
a486c8542c0938222520a20d6db33121a8eebe329716ceb5221848f4bb63d2cf
I constructed my own BTC transaction on my node, funded it at 5 sats/byte, and handed it directly to a miner via Mara Slipstream for inclusion in a block, bypassing the peer-to-peer gossip network, where filtering of such transactions might limit its reach.
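A rough sketch of what the OP_RETURN output itself costs at the 5 sats/byte used here, where “byte” means virtual byte (vB) as in current fee estimates. An output is serialized as an 8-byte amount, a CompactSize script length and the script; outputs are non-witness data, so each byte is a full vbyte. Inputs, the change output and the transaction header add to the real total and are left out. The 153-byte case assumes the link and the hex hash were concatenated with no separator.

```python
import math

# Illustrative fee arithmetic for a single OP_RETURN output; function names are hypothetical.


def compact_size_len(n: int) -> int:
    """Bytes used by Bitcoin's CompactSize encoding of n."""
    if n < 0xFD:
        return 1
    if n <= 0xFFFF:
        return 3
    if n <= 0xFFFF_FFFF:
        return 5
    return 9


def push_len(n: int) -> int:
    """Bytes of opcode/length prefix needed to push n bytes of data."""
    if n <= 75:
        return 1
    if n <= 0xFF:
        return 2
    if n <= 0xFFFF:
        return 3
    return 5


def op_return_output_vbytes(payload_len: int) -> int:
    """Size of one OP_RETURN output: 8-byte amount + script length + script."""
    script_len = 1 + push_len(payload_len) + payload_len  # OP_RETURN + push prefix + data
    return 8 + compact_size_len(script_len) + script_len


def op_return_fee_sats(payload_len: int, sats_per_vbyte: float) -> int:
    """Fee attributable to the OP_RETURN output alone, rounded up to whole sats."""
    return math.ceil(op_return_output_vbytes(payload_len) * sats_per_vbyte)


if __name__ == "__main__":
    for label, size in [("32-byte raw hash", 32), ("80-byte payload", 80), ("link + hex hash", 153)]:
        print(f"{label}: {op_return_output_vbytes(size)} vB, {op_return_fee_sats(size, 5)} sats at 5 sat/vB")
```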
This demonstrates that regardless of where the Bitcoin Core v30 versus Bitcoin Knots debate lands, we have a process for converting PDFs to PNGs, posting them on Nostr, and etching the link to the Nostr content into BTC together with a hash of the posted PNG file.
The PNG file downloaded from NOSTR is below.
If you have 5 hours to spare, you can watch two debates on this topic. I really enjoyed both of them and believe a compromise would be a datacarriersize of 200-1000 bytes. That is enough to build a layer 2 on Nostr or IPFS.

Thanks very much Kevin. Very interesting topic. I am just getting up to speed with BlockChain. Are you selecting Bitcoin because the system is so robust and has never been compromised? Are there other block chains out there that could be utilized? I believe the video channel Odysee is built on blockchain technology. Thanks again for all your amazing work and for always telling the truth. Peace.
On the overburdening-with-data of Bitcoin OP:
"The best place to hide a needle is in a pile of needles."
It is just vulnerable to being swamped with crap.
;-(