Nepetalactone Newsletter

Failure of the linearization reaction in the Pfizer bivalent vaccine manufacturing process

anandamide.substack.com

Anandamide
Mar 14
Expression vector map disclosed to the EMA without any DNA Sequence provided.

The EMA documentation from Sasha Latypova discloses Pfizer's linearization strategy, which uses a Type IIS restriction enzyme known as Eam1104I. Restriction enzymes cut DNA based on a specific sequence pattern. Eam1104I cuts at 5’ CTCTTCN^NNN 3’ on the top strand and 3’ GAGAAGNNNN^N 5’ on the bottom strand. This can be seen in the Pfizer-provided vector map at base 4,280, or roughly the 6:00 position above.

Figure 1. Eam1104I recognition sequence
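
To make the cut geometry concrete, here is a minimal Python sketch that scans a sequence for the recognition site on either strand and reports where each strand would be cut. It only encodes the pattern quoted above (CTCTTC, cutting 1 nt past the site on the top strand and 4 nt past it on the bottom strand). The function names and the toy sequence are made up for illustration, and the real vector sequence is not used here.

```python
RECOGNITION = "CTCTTC"  # recognition sequence quoted in the text
TOP_OFFSET = 1          # recognition strand is cut 1 nt past the site (CTCTTCN^NNN)
BOTTOM_OFFSET = 4       # opposite strand is cut 4 nt past the site


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cut_sites(seq):
    """Return (orientation, site_start, top_cut, bottom_cut) for each site.

    Coordinates are 0-based on the top strand. A cut position is the number
    of top-strand bases lying to the left of the cut. Positions can fall
    outside the sequence if a site sits right at an end of a linear toy input.
    """
    seq = seq.upper()
    n = len(RECOGNITION)
    rc_site = reverse_complement(RECOGNITION)
    sites = []
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window == RECOGNITION:
            # Site reads 5'->3' on the top strand; cuts fall to the right.
            sites.append(("+", i, i + n + TOP_OFFSET, i + n + BOTTOM_OFFSET))
        elif window == rc_site:
            # Site reads 5'->3' on the bottom strand; cuts fall to the left.
            sites.append(("-", i, i - BOTTOM_OFFSET, i - TOP_OFFSET))
    return sites


# Hypothetical toy sequence, not the vector.
demo = "AAAA" + "CTCTTC" + "ACGTACGT"
print(find_cut_sites(demo))
print(find_cut_sites(reverse_complement(demo)))
```

The two cut positions differ by 3 bases, which is the 3 nt overhang implied by the N^NNN notation.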

Let's look through the deep sequencing in IGV and see if we can find intact molecules. The upper left is an obvious place where Eam1104I is cutting, as you can see from the consistent end points in the molecules we sequenced. The upper right is another group of reads where the cut sites are intact (grey = sequence that matches the reference).

Figure 2. IGV view of the reads near the cut sites. Green reads are paired reads whose insert size diverges from the average (105bp). Grey reads are reads that map with the expected insert size.

Let’s zoom in on that so you can see the details. Sequencing OGs will notice there is more sequencing error near these sites as we near the poly As and some sequencing adaptors (dim ATCGs in the grey reads). What does that mean?

Figure 3. IGV view of just the properly paired reads

Polymerase slippage occurs with many polymerases (strand-displacement polymerases exhibit less of this artifact). They stutter in long stretches of the same nucleotide, like poly As and poly Ts. They ‘slip’ out of frame during amplification and you get an echo effect in the sequence data. PCR exacerbates the problem. Slippage in Sanger sequencing is shown in Figure 4.

Figure 4. Taken from the Nucleics website, depicting polymerase slippage in Sanger sequencing reads. Recall that Sanger sequencing is not a single-molecule method. It’s an ensemble sequencing method where each peak represents millions of copies of a template, and once they start to disagree or stutter you see peaks on top of peaks, like the G base at 170bp.
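
Since slippage is tied to long runs of a single nucleotide, a quick way to flag risky regions in a reference is to list its homopolymer runs. The sketch below does that with the standard library only. The `min_len` threshold and the toy sequence are arbitrary choices for illustration, not values from this analysis.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=8):
    """Return (start, base, length) for each run of one base >= min_len long.

    start is 0-based. min_len is an arbitrary illustrative threshold.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Hypothetical toy sequence with a 30 nt poly A and a 10 nt poly T.
toy = "ACGT" + "A" * 30 + "GC" + "T" * 10 + "C"
print(homopolymer_runs(toy, min_len=10))
```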

There are two PCR steps in the sequencing method:

1) Amplification of the library after the Illumina adaptors are ligated on

2) Bridge PCR on the Illumina array

Step 1 creates inter-read disagreements.

Step 2 creates intra-read disagreements.

Figure 5. Bridge PCR on the Illumina arrays creates clusters of 1000 molecules which should be clonally identical… unless there is PCR slippage during bridge PCR.

PCR step 1 (library amplification) can create high-quality reads that disagree with each other on the length of the poly A and the neighboring sequence.

PCR step 2 (bridge PCR) creates sequencing clusters that contain discordant DNA and are not fluorescently pure. This shows up as spectral noise on the cluster being sequenced, because not all molecules in the cluster agree. The result is low-quality read locations.

Figure 6. Paired reads which are orientated across the ends of the reference. These are known as outie reads, as the green reads point outward from the reference sequence. The reads on the left point 5’→3’ to the left. The reads on the right point 5’→3’ to the right. They are joined by a thin green line, which shows they came from the same spot on the cluster array. The same bridge PCR molecule has reads that point away from each other. That only happens when the template is a circle and the reads are actually pointing at each other. The linear reference is creating an illusion of mis-oriented paired reads.
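
The outie logic in Figure 6 can be sketched in a few lines of Python. The idea is that a pair which looks outward-facing on a linear reference has a perfectly ordinary insert size once you let the template wrap around the end of the reference. The function names, the `max_insert` cutoff, and the 7,000bp reference length are all hypothetical, since the vector length is not given here.

```python
def circular_insert_size(fwd_start, rev_end, ref_len):
    """Insert size implied if the template wraps past the reference end.

    fwd_start: 0-based leftmost position of the forward-strand read
    rev_end:   0-based exclusive end position of the reverse-strand read
    """
    return (ref_len - fwd_start) + rev_end


def classify_pair(fwd_start, rev_end, ref_len, max_insert=500):
    """Label a read pair as 'innie', 'outie_circular', or 'other'.

    max_insert is an arbitrary illustrative cutoff for a plausible insert.
    """
    if rev_end > fwd_start:
        # Reverse read lies to the right of the forward read: reads face inward.
        return "innie" if rev_end - fwd_start <= max_insert else "other"
    # Reverse read lies to the left: reads face outward on a linear reference.
    wrapped = circular_insert_size(fwd_start, rev_end, ref_len)
    return "outie_circular" if wrapped <= max_insert else "other"


REF_LEN = 7000  # hypothetical reference length for illustration only

print(classify_pair(1000, 1105, REF_LEN))  # ordinary pair in the middle
print(classify_pair(6950, 60, REF_LEN))    # pair spanning the reference ends
print(circular_insert_size(6950, 60, REF_LEN))
```

In real data the same information comes from the strand flags and mate positions in the alignment file; this sketch only shows the arithmetic behind the interpretation.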

Since a sequence of 110 As is like a jigsaw puzzle that’s all one color, you want some edges on these sequences that help define them as edge pieces or provide some signature.

You want reads that anchor outside of the poly A on both sides of the 110bp poly A region, so you have some signature to map the reads. Since we have a very short library (250bp), we can’t expect many reads to bridge the poly A, because ~150bp of the 250bp is Illumina adaptors.

Figure 7. Depiction of Illumina libraries. Indexes are DNA barcodes used to multiplex different libraries on the same sequencing run.

This is why we are being very cautious about calculating the % circular DNA. If you have very short insert libraries, you won’t sample many reads that span a 110bp un-mappable poly A region. You need 100bp + 110bp + 100bp so you have unique 100mers on each side of the poly A. A 460bp library would be preferred, but we might be able to live with 360bp (50bp anchor + 110bp poly A + 50bp anchor + 150bp Illumina adaptors).
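
The library-size arithmetic above is easy to check in a few lines of Python. This is just a sketch of the numbers quoted in the text (a 110bp poly A and roughly 150bp of adaptors). The function names are made up for illustration.

```python
ADAPTOR_BP = 150  # approximate total Illumina adaptor length, from the text
POLY_A_BP = 110   # poly A tract length, from the text


def library_size_needed(anchor_bp):
    """Library size needed to span the poly A with a unique anchor on each side."""
    return anchor_bp + POLY_A_BP + anchor_bp + ADAPTOR_BP


def insert_size(library_bp):
    """Insert size left once the adaptors are subtracted from the library size."""
    return library_bp - ADAPTOR_BP


def max_anchor(library_bp):
    """Largest anchor (bp per side) an insert centered on the poly A can carry."""
    return max(0, (insert_size(library_bp) - POLY_A_BP) // 2)


print(library_size_needed(100))  # preferred library size
print(library_size_needed(50))   # minimum we might live with
print(insert_size(250))          # what the current library gives
print(max_anchor(250))           # anchor available on each side today
```

With a 250bp library the insert is shorter than the poly A itself, which is why so few reads can bridge it.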

Figure 8. Reference for the length of the Illumina adaptors

We got very lucky to capture any junctions at all, and for that reason any estimate of circularity will be an underestimate. You'll notice our average insert size was 105bp.

Figure 9. Samtools stats of the library insert size calculated from mapping the reads to the reference.

250bp library - 150bp adaptors = 100bp inserts.

Figure 10. Agilent Tape Station 4200 electrophoresis of the fragment libraries generated from the vaccine DNA. Next attempt at this will use larger insert sizes and fragment the DNA less aggressively.

As you can see, while we learned a lot from these RNase A libraries, we now know there is a long poly A in the vector, and to properly span it we need larger insert libraries. This is as easy as cutting the 3-minute fragmentation step down to 1 minute, which should give us larger inserts for our next Illumina run. That should provide a less biased assessment of circular vs linear DNA.

Since the linearization step uses a single cut site, they could very easily quantitate the efficiency of this step with a qPCR assay that spans the cut site. The cut site is very close to the poly A tract which might make this a bit tricky to design.
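
As a rough illustration of how such an assay could be read out, here is a sketch of a standard delta-Ct calculation. The assumptions are mine, not Pfizer's or the EMA documentation's: one amplicon spans the cut site (so it only amplifies uncut molecules), a second reference amplicon elsewhere on the vector amplifies cut and uncut molecules alike, and both amplify with the same efficiency. The function name and the Ct values are hypothetical.

```python
def uncut_fraction(ct_spanning, ct_reference, efficiency=2.0):
    """Estimate the fraction of template still intact across the cut site.

    ct_spanning:  Ct of the amplicon that spans the cut site (uncut only)
    ct_reference: Ct of an amplicon elsewhere on the vector (cut + uncut)
    efficiency:   fold amplification per cycle; 2.0 assumes perfect doubling
    """
    return efficiency ** (ct_reference - ct_spanning)


# Hypothetical Ct values: the spanning amplicon comes up 3 cycles later.
print(uncut_fraction(ct_spanning=23.0, ct_reference=20.0))
```

In practice the two amplicons would need to be validated for matching efficiency, and the nearby poly A tract may limit where the spanning primers can sit.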

They should also better disclose what assay they are using to detect residual DNA from the DNase I step. The next post will touch on this step of the process and what we can learn by using DNases and RNases in qPCR.

9 Comments
Janiesaysyay
Mar 15·edited Mar 15

This pucing tour is depressing but, nonetheless I pfind it pfascinating. Thank you for all you are doing. Miss being able to see soothing kitty pics on my feed to help me cope. When the world is melting down, there's nothing like a cat to reassure you life is still purrty great.

Mongol
Mar 15

Ben from USMortality failed to reproduce the genome of Wuhan-Hu-1 using MEGAHIT, but I think it might be because he didn't do any quality control or trimming or merging of the reads: https://usmortality.substack.com/p/sars-cov-2-genome-assembly, https://github.com/USMortality/Megahit-SARS-CoV-2/blob/master/megahit.sh. Are you able to modify his shell script so it produces the correct result?
