Cacophony of Coincidences
Mistakes were not made
The Mystery ORF under discussion in this excellent article is 1252 amino acids long.
Jessica’s article builds on our previous Substack and is a great read on what exactly Spidroin peptides are.
So I have a question for the more mathematically inclined audience members. The length of this Overlapping ORF stuns every bioinformatics person I show it to.
What is the frequency of Overlapping ORFs as a function of length?
ChatGPT is not going to save you here.
Here is a simple back-of-the-envelope calculation I did to approximate the odds that we are playing with…
3 out of 64 Codons are stop codons.
If you pick codons randomly, like flipping a 64-sided coin, you should run into a stop codon about once every 21 to 22 codons (64/3 ≈ 21.3).
61/64 do not stop the amino acid synthesis.
So 95.3% of the time (61/64), you will have a coding codon, not a stop.
What are the odds of flipping this 95.3% coin 1252 times in a row and never seeing a stop codon?
(61/64)^1252, which is about 7.86E-27.
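For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same coin-toss model. It assumes exactly what the calculation above assumes (every codon drawn uniformly and independently from all 64), and the variable and function names are my own for illustration. It works in log space so the tiny probability is easy to inspect for other ORF lengths too.

```python
import math

STOP_CODONS = 3
TOTAL_CODONS = 64
ORF_LENGTH_AA = 1252  # length of the Mystery ORF in amino acids

# Chance that one uniformly random codon is NOT a stop (61/64).
p_sense = (TOTAL_CODONS - STOP_CODONS) / TOTAL_CODONS

# Average spacing between stops under this model (mean of a geometric distribution).
mean_codons_per_stop = TOTAL_CODONS / STOP_CODONS


def log10_prob_open(n_codons: int) -> float:
    """log10 of the chance that n_codons random codons contain no stop."""
    return n_codons * math.log10(p_sense)


log10_p = log10_prob_open(ORF_LENGTH_AA)
p_open = 10 ** log10_p

print(f"P(no stop per codon) = {p_sense:.4f}")
print(f"Mean codons per stop = {mean_codons_per_stop:.1f}")
print(f"log10 P(open, 1252)  = {log10_p:.2f}")
print(f"P(open, 1252)        = {p_open:.2e}")
```

Calling `log10_prob_open` with other lengths gives the same model's answer to the "frequency as a function of length" question, for example 40 codons for the longest spike overlap mentioned below.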
Does this number resonate with any literature on Overlapping ORFs?
The longest ones in the SARS-CoV-2 spike are 120 bases, or 40 amino acids. The longest in the whole SARS-CoV-2 genome is 294 bases.
So it’s very unlikely this occurred as an Ooops.
Let’s steelman this and take a paper from the viromania crowd: Ed Holmes.
They looked at all viral genomes in NCBI.
Note the Y axis is a log scale and nothing exceeds 3 kb. The Mystery ORF is 3759 bp long.
So no overlapping ORF this long has been detected in any virus discovered to date.
I also asked a well-cited bioinformatics colleague this question: someone very skilled in building DNA assemblers, a hard-core quant person. They had a much better suggestion for getting to the root of the answer, one that considers the codon frequencies in the top strand and how much freedom those codons have to wobble without producing a stop codon on the bottom strand. I suspect this approach will yield an even smaller probability than the entirely random coin-toss model above, because the codons in the top strand constrain which codons are possible on the reverse strand.
Given the controversial nature of this topic, I have left the author of the advice below anonymous, but I am happy to de-anon the quote at any time.
SnapGene does give us the codon frequencies for both ORFs.
So if anyone is brave enough to refine the estimate using the advice above, all the data is here. I suspect the answer will still be an astronomically low probability that this is a by-chance Ooops.
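As a starting point for that refinement, here is a rough Monte Carlo sketch in Python. It is not my colleague's method, just one way the idea could be set up. It samples a random top-strand ORF from a codon usage table, reverse complements it, and measures how often stop codons show up in each of the three bottom-strand frames. The `codon_weights` table is a PLACEHOLDER (uniform over the 61 sense codons), not the SnapGene numbers, and every name in it is hypothetical. It also assumes top-strand codons are independent of each other and treats bottom-strand stops as independent when extrapolating to 1252 codons, both of which are simplifications.

```python
import itertools
import math
import random

BASES = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ALL_CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOPS]

# PLACEHOLDER: uniform usage over sense codons. Swap in real per-codon
# counts for the top-strand ORF (e.g. from a codon usage report).
codon_weights = {codon: 1.0 for codon in SENSE_CODONS}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_rate_by_frame(n_codons: int, weights: dict, rng: random.Random) -> dict:
    """Sample a stop-free top-strand ORF and return, for each of the three
    bottom-strand frames, the fraction of triplets that are stop codons."""
    codons = list(weights)
    sampled = rng.choices(codons, weights=[weights[c] for c in codons], k=n_codons)
    bottom = reverse_complement("".join(sampled))
    rates = {}
    for frame in range(3):
        triplets = [bottom[i:i + 3] for i in range(frame, len(bottom) - 2, 3)]
        rates[frame] = sum(t in STOPS for t in triplets) / len(triplets)
    return rates


rng = random.Random(0)  # fixed seed so the run is repeatable
rates = stop_rate_by_frame(200_000, codon_weights, rng)

for frame, rate in rates.items():
    # Extrapolate to 1252 codons, assuming stops occur independently.
    log10_p = 1252 * math.log10(1 - rate)
    print(f"bottom frame {frame}: stop rate {rate:.4f}, log10 P(open for 1252) = {log10_p:.1f}")
```

With real codon counts in `codon_weights`, the per-frame stop rates replace the flat 3/64 in the coin-toss model, which is the constraint the advice above is pointing at.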
Meanwhile, over on FBIbook, simply posting our preprint with ‘No Comment’ got the post torn down, even though the FDA, the EMA and Health Canada have all admitted these sequences are in fact in the vaccine.
When I ask to ‘Learn more’ or ‘Request review’, I get this apology.
So in conclusion, this is highly unlikely to have occurred by chance, and even when the centralized health agencies come to learn of this, the factcheckers will be sure to tell you…
Which brings me to some entertainment to close this out.
Make Jim Scott’s ‘The Crime’ go viral.
‘The thought police are at the door. Could this be 1984?’