Microbial DNA sequencing

Microbial DNA sequencing: individual clones and metagenomics

In 2010 the Oxford Genomics Centre partnered with the Modernising Medical Microbiology consortium in a bid to translate advances in sequencing technologies into diagnostic tools for tracking the spread of hospital infections. A major focus was to develop a robust and cost-effective solution for preparing and sequencing large numbers of bacterial DNA samples. As a result of this collaboration we are now able to offer a library preparation service ideally suited to high volumes of low-complexity samples.

The majority of whole genome sequencing performed at the OGC is for microbial samples. When sequencing bacterial DNA, the GC content can affect the quality of the data obtained. Our workflow has been specifically selected to cope with extreme genomes, from C. difficile (29% GC) to Mycobacterium tuberculosis (65% GC).

The library preparation steps are based on a miniaturised version of a standard gDNA protocol: fragmentation, end repair, adapter ligation and index-incorporating PCR amplification. Although the depth of coverage required depends on the analysis, the small genome sizes mean it is often necessary to multiplex large numbers of libraries for sequencing on a single lane. We routinely use a unique dual indexing strategy (avoiding the index misallocation issue), commonly pooling 192 samples to be run using 150 bp paired-end sequencing on either the HiSeq 4000 or the NovaSeq 6000 (72-93 Gb per unit).
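As a rough guide to what this pooling level means for coverage, the sketch below estimates mean depth per sample from the quoted yield per unit. It assumes that "Gb" means gigabases, that all libraries are pooled equally, and that no reads are lost to demultiplexing or filtering. The genome sizes are approximate illustrative values and the function name is hypothetical; none of this is part of the OGC workflow.

```python
def mean_depth(yield_gb: float, n_samples: int, genome_size_mb: float) -> float:
    """Estimate mean depth of coverage per sample.

    Assumes perfectly even pooling and no read loss, so real values
    will usually be lower and will vary between samples.
    """
    if n_samples <= 0 or genome_size_mb <= 0:
        raise ValueError("n_samples and genome_size_mb must be positive")
    bases_per_sample = yield_gb * 1e9 / n_samples
    return bases_per_sample / (genome_size_mb * 1e6)


# Approximate genome sizes in Mb (illustrative assumptions, not from the text).
GENOME_SIZES_MB = {"C. difficile": 4.3, "M. tuberculosis": 4.4}

for organism, size_mb in GENOME_SIZES_MB.items():
    low = mean_depth(72, 192, size_mb)   # lower end of the quoted yield
    high = mean_depth(93, 192, size_mb)  # upper end of the quoted yield
    print(f"{organism}: roughly {low:.0f}x to {high:.0f}x mean depth")
```

With these assumed inputs the estimate lands at roughly 85x to 113x per sample. Larger genomes or bigger pools reduce this proportionally.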

This library preparation technique has since been rolled out for use in bacterial metagenomics to allow identification of multiple bacterial species within the same sample.

If you are interested in this service, here are some points to consider:

Batch size

We use liquid handling robotics for all steps in the protocol. For this reason it is more appropriate for us to handle sample numbers based on 96- and 384-well formats. Smaller sample numbers and partial plates will be considered; please contact us to discuss your needs.

Sample submission

Due to the high sample numbers involved it is not appropriate to submit material in tubes. All samples should be submitted using appropriate plates. Details of suitable plates can be found in our sample submission requirements.


Although our protocol will accommodate some variation, it is preferable that purified gDNA is normalised to 30 ng/µl, with 60 µl provided for each sample (this allows us to repeat the prep if necessary). The minimum input mass is 100 ng, and the maximum input volume is 34 µl. Although it depends on sample quality, an increased failure rate is observed with input masses below 100 ng.
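The input limits above imply a lowest usable concentration of about 2.9 ng/ul (100 ng in 34 ul). The following sketch works through that arithmetic and the dilution needed to reach the preferred 30 ng/ul. The function names are hypothetical, and the assumption that samples are diluted from a more concentrated stock is ours rather than a stated OGC procedure.

```python
MIN_INPUT_NG = 100.0       # minimum input mass from the text
MAX_INPUT_UL = 34.0        # maximum input volume from the text
TARGET_NG_PER_UL = 30.0    # preferred normalised concentration
SUBMIT_VOLUME_UL = 60.0    # preferred volume per sample


def input_volume_ul(conc_ng_per_ul: float, mass_ng: float = MIN_INPUT_NG) -> float:
    """Volume needed to deliver mass_ng, or ValueError if it exceeds the limit."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    volume = mass_ng / conc_ng_per_ul
    if volume > MAX_INPUT_UL:
        raise ValueError(
            f"{volume:.1f} ul needed, above the {MAX_INPUT_UL:.0f} ul maximum"
        )
    return volume


def dilution_to_target(stock_ng_per_ul: float,
                       final_volume_ul: float = SUBMIT_VOLUME_UL):
    """Return (stock_ul, diluent_ul) giving final_volume_ul at the target concentration."""
    if stock_ng_per_ul < TARGET_NG_PER_UL:
        raise ValueError("stock is already below the target concentration")
    stock_ul = TARGET_NG_PER_UL * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


print(f"Lowest usable concentration: {MIN_INPUT_NG / MAX_INPUT_UL:.2f} ng/ul")
print(f"Volume used at 30 ng/ul: {input_volume_ul(30.0):.1f} ul")
print("Dilution from a 90 ng/ul stock (stock ul, diluent ul):",
      dilution_to_target(90.0))
```

At the preferred concentration only a few microlitres are consumed per prep, which is why 60 ul leaves ample material for a repeat.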


The concentration of all samples will be measured but, to minimise cost, the assessment of quality is limited to a representative number of samples within each batch. Final library QC is limited to concentration, which is used to determine appropriate multiplex pooling.

Failure rate

Variations in sample material quality, elution buffers and GC content will influence the success of the library. For this type of prep we normally allow for 10% failure before reporting back to our customers. What we consider to be a ‘failed’ library is based on our normal expectation and experience. If less than 10% of the libraries fail QC (or following discussion with the customer), those identified as ‘failed’ will be added to the pool by default and will often be represented in the data. However, the actual representation (determined through demultiplexed read counts) will depend on the quality of the library.
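The decision rule described above can be sketched as follows. This is an illustration only: the function names, the action labels and the treatment of a rate of exactly 10% (here, reported back) are assumptions, and the QC criteria that define a ‘failed’ library are not modelled.

```python
from typing import Dict, List, Tuple


def review_batch(qc_pass: Dict[str, bool],
                 threshold: float = 0.10) -> Tuple[float, str, List[str]]:
    """Return (failure_rate, action, failed_sample_names) for one batch.

    Below the threshold, failed libraries are pooled with the rest;
    otherwise the batch is flagged for discussion with the customer.
    """
    if not qc_pass:
        raise ValueError("batch is empty")
    failed = sorted(name for name, ok in qc_pass.items() if not ok)
    rate = len(failed) / len(qc_pass)
    action = "pool_all" if rate < threshold else "report_to_customer"
    return rate, action, failed


def representation(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of demultiplexed reads assigned to each sample."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned")
    return {name: count / total for name, count in read_counts.items()}


# Hypothetical 96-sample batch in which five libraries fail QC.
batch = {f"S{i:02d}": i > 5 for i in range(1, 97)}
rate, action, failed = review_batch(batch)
print(f"failure rate {rate:.1%}, action: {action}, failed: {failed}")
```

After sequencing, `representation` applied to the per-sample read counts shows how far a pooled ‘failed’ library falls below the even share of 1/N.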

If this level of failure is unacceptable for your metagenomic studies, there is a standard gDNA protocol that could be used instead. This approach is not scaled down and includes more QC steps, enabling issues to be identified earlier in the process and reducing the failure rate of poorly performing samples. There is a higher cost associated with this, so for large sample numbers it may still not be appropriate. An alternative might be to consider 16S amplicon variable region sequencing, which is another way to detect species in a metagenomic sample. We use a protocol that targets the V3 and V4 regions through two rounds of PCR, the second PCR effectively completing the library so that it is sequencer-ready after our QC. These libraries are suitable for running at 300 bp PE on the MiSeq, which for this library type typically produces 10-17 million reads. The longer read lengths cover the full targeted region and allow microbial identification.

Microbial RNA sequencing: metatranscriptomics

As with metagenomics, it is also possible to take a microbial sample and sequence the entire transcriptome. For this we would select the mRNA fraction of the RNA before conversion to cDNA and synthesis of an Illumina sequencing library. As with the genomic approaches listed above, we routinely use unique dual indexing for this library type to avoid the index misallocation issue. Although it is dependent on sample number, we would typically sequence this on the HiSeq 4000 or NovaSeq 6000 using 150 bp PE reads (240 million reads per unit).
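To see how sample number interacts with the quoted output, the sketch below divides 240 million reads per unit across a pool. It assumes even pooling and no read loss, and it does not try to resolve whether the figure counts single reads or read pairs. The target of 20 million reads per sample is a hypothetical example, not an OGC recommendation.

```python
READS_PER_UNIT = 240_000_000  # figure quoted in the text


def reads_per_sample(n_samples: int, reads_per_unit: int = READS_PER_UNIT) -> int:
    """Even share of one unit's reads for each of n_samples libraries."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return reads_per_unit // n_samples


def max_samples_per_unit(target_reads: int,
                         reads_per_unit: int = READS_PER_UNIT) -> int:
    """Largest pool size that still gives each sample at least target_reads."""
    if target_reads <= 0:
        raise ValueError("target_reads must be positive")
    return reads_per_unit // target_reads


print("Reads per sample in a 24-sample pool:", reads_per_sample(24))
print("Samples per unit at a 20 million read target:",
      max_samples_per_unit(20_000_000))
```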

Whichever approach you choose, our methods for library production have been optimised to be competitive on cost, speed and efficiency, and are frequently reviewed for further improvements. If you have any questions relating to microbial genome sequencing, or would like to discuss your next sequencing project, please contact us.

In addition, the Oxford Centre for Microbiome Studies (OCMS) has been established to accelerate studies into microbial communities by providing support and expertise. You may wish to contact them to discuss your project.

As of 2020, the Oxford Genomics Centre no longer supports the HiSeq 4000 platform.

Authors: Simon Engledow and Lorna Witty