The EMA documents are wealth of information.
There are two things you won’t find in there.
1)A qPCR curve or description of the qPCR master mix they used for their qPCR assessment of the dsDNA contamination. Nor do they describe what CT is called! Their primers are disclosed but they span T7 to the Kozak consensus sequence (106bp) but there is no vector primer pair. Its a SYBR green based method. This can be problematic as they are targeting a region that has the highest RNA/DNA ratio in the sample. This may lead to under estimating the DNA levels with qPCR. They should have an amplicon targeting the vector and one targeting the insert and preferably one in the insert that isn’t known for having the highest transcriptional activity. That was a clever move.
But here is the zinger…
2)You will not find any raw sequence data. They mention they have RNA-Sequencing data but they don’t share it. Instead they summarize it as the following.
They claim to have a minimum of 100,000X coverage over the mRNA transcript. They then go on to claim they also were able to confirm the vector sequence!
Read that again…
Their RNA-Sequencing data picked up the plasmid! RNA-sequencing often selects against sequencing DNA.
With Illumina sequencing, 30X coverage is usually required to confirm clinical grade plasmids.
Often people sequence much deeper than 30X as it is so affordable to do so but using less than 30X coverage would likely run into some criticisms in peer review. This group used 1,000X coverage for plasmids on Illumina but you could certainly get away with 50-100X. Sequencing plasmids on Illumina is like sipping out of fire hose and it is hard to get less than 100X on a plasmid unless a lot of RNA is obscuring it.
So Pfizers own admission to the EMA concludes they have plasmid contamination!
Exactly how much they won’t share with you but if you assume they have 100,000X over the transcript and at least 30X coverage over the plasmid to “confirm it” then the ratio of 100,000X mRNA /30X plasmid = 3,333 RNA/DNA. Keep in mind they are performing RNA sequencing and most methods favor sequencing RNA over DNA so this ratio is very likely biased in the favor of RNA.
Despite the bias against sequencing DNA that it is still over the EMA’s limit of 1/3030 and no one at the EMA picked up on this subtlety. I missed it as well on my first tour through this 216 page document.
Our team has received a lot of critique over our publication process of this contamination work. I’m still at odds trying to understand the double standards people have. Our sequencing data of Pfizer’s vaccines is public. Theirs is not. There is evidence in their communication with the EMA that they have DNA Plasmid contamination that is high enough for them to see it in an RNA-Sequencing process that selects against sequencing DNA. A defect we picked up and confirmed as citizen scientists but are accused of publishing non-peer reviewed work.
We have also noticed that the people critiquing our work, never critiqued Andrew Fires work sequencing the vaccines. Their lab never put the raw Illumina sequence public nor applied qPCR or other tools to verify it. Only if the results present something they don’t agree with do the peer review knives come out.
This is pure irony in the end.
Pfizer ultimately providing the confirmation of our sequencing efforts is rich.
To quote Elon Musk: “The most entertaining outcome is the most likely”.
PS- David Wiseman and an appropriately skeptical commenter below (@adelaidean) pushed back and asked why Pfizer can’t claim the sentence only partially confirms the sequence of just the transcript. You could read that sentence to be referring to the confirmation of not the noun in that sentence (Eam1104i linearized pST4-1525 reference plasmid) but to be referring to the noun in the previous sentence (the BNT162b2 transcript). It would be poor form but a potential rebuttal of theirs. It’s a good question. The answer to it lies in some familiarity of best practices in the bioinformatics space.
The right thing to do with such data is to de novo assembly it with a tool like MegaHit. I’m certain they did this as it would inform on E.coli background and other artifacts at play that are very informative to their process.
But…
Once you have reduced yourself to read mapping you have introduced a massive bias as you are only asking for confirmation of what you are looking for. The moment you have agreed to commit such a sin, you wouldn’t do it half way. You especially wouldn’t do it half way, if it forced you to disclose your vector sequence. So if you going to read map to confirm just the transcript, you would map the reads to just the public 4284bp transcript. Since that sequence is already public, you have no additional disclosure risk.
Additionally the Eam1104i site is not in the transcript and they call that site out by name implying they have sequence confirmation of it.
If you read the rest of the EMA documents you will see they were asking a lot of questions about the plasmid vector confirmation as Pfizer outsourced that process. They also had lots of questions about the Eam1104i digestion process. I think this was a way to give the EMA assurances they had these under control without having to disclose the coverage of that plasmid. They did share the Vector map (Its missing SV40) but no sequence of it.
Finally, Our data shows that at 100,000X min read coverage over the transcript, you would get enough reads over the vector to have over 30X coverage. They have the depth to see it.
“These reads were mapped to the Eam1104i-linearized pST4-1525 reference plasmid and the sequence identity was confirmed.”
If they only had sequence coverage over the transcript, this sentence should read
“the sequence identity over the transcript was confirmed”.
Otherwise the statement implies the whole plasmid reference was confirmed. There is no reason for a bioinformatics person to mention that the plasmid reference sequence has been confirmed if they didn’t mean exactly that as the BNT162b2 reference is a citable public reference.
Thank you. I printed your paper "Sequencing of bivalent . . ." I believe WCH provided it. I also printed, a second time, "Intracellular Reverse Transcription . . ." Alden et al Marinis and "The Efficacy of Covid-19 Boosters . . . " Ophir et al McCullough papers. The very first research I did in April of 2020 was on Reverse Transcription, Luciferase, and Ferritin. Thank you again for doing all that you do. PS I So want to run these papers over to my alma mater MS Integrative Genomics dept but they all drank the koolaid and even wrote for and recieved an NIH grant to study variants. I was specifically told this in person when I presented Dr Richard Flemming's Jmol Spike coformation to my microbiology professor back in May of 2021. He looked freaked out so I took the paper back. My micro prof did allow me send him one of Dr McCulloughs very 1st hour long video presentation though. Later I tried to present data to the Soc/Psych dept and they called the popo on me. The popo were cool and volunteered to take a look at Dr David Martin's documents. I did not have a group backing me so I turned them down at the time. I then focused my attention on informing the public elementary Superintendent and the elementary school nurses. The nurses were not allowed to talk to me and referred me to the charge nurse. The Superintendent did listen and took down a bunch of the alt websites for legit data that I offered him. He also disclosed PODS -Points of Dispersement Sites at schools that trumped state laws for schools that took covid cares money. I also called the state Covid health dept and provided them with a direct link to the WHO's Implied Consent document stating that if you sent your child to school on that day. They said they would review it and that they were not aware of it. I also called all the local blood banks. Managerial were very unpleasent towards me too but I hope I made a bit of a difference.
Keep giving them hell! 💪