Next Generation Sequencing in the context of today’s science.

For $23,000 and in 3 days, you can generate 8 giga-bases (billion nucleotides) of sequence using next-gen sequencing technology, a feat that would take $23 million and 4 years with a conventional DNA sequencer. The sequencers used to complete the human, nematode (C. elegans) and maize genomes produce on the order of 6 million bases per day. Next-Gen sequencers such as the 454 produce 500 million bases per day, and the more recent Illumina sequencer produces more than sixteen times that.
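As a rough check of the "4 years" figure, the sketch below divides the 8 giga-base output by the roughly 6 million bases per day of a conventional sequencer. It assumes a single instrument running every day of a 365-day year and checks throughput only, not cost. The constant and function names are illustrative.

```python
# Throughput figures quoted in the text; constant names are illustrative.
NEXTGEN_BASES = 8_000_000_000           # 8 giga-bases from one Next-Gen run
NEXTGEN_DAYS = 3                        # days to produce them
CONVENTIONAL_BASES_PER_DAY = 6_000_000  # ~6 million bases/day (genome-project era)


def days_to_sequence(total_bases: int, bases_per_day: float) -> float:
    """Days needed to produce total_bases at a constant daily throughput."""
    return total_bases / bases_per_day


# Assumes one conventional instrument running every day of a 365-day year.
days = days_to_sequence(NEXTGEN_BASES, CONVENTIONAL_BASES_PER_DAY)
years = days / 365

# Daily Next-Gen output relative to the conventional instrument.
speedup = (NEXTGEN_BASES / NEXTGEN_DAYS) / CONVENTIONAL_BASES_PER_DAY

print(f"{days:.0f} days (~{years:.1f} years); {speedup:.0f}x the daily output")
# prints "1333 days (~3.7 years); 444x the daily output"
```

About 1,333 days, or 3.7 years, is consistent with the rounded figure of 4 years given above.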

Competing in today’s scientific environment, whether commercial or academic, requires the latest technology. Move beyond studying markers and genes to full genome, transcriptome and epigenome characterization. Next-Gen sequencing even enables previously impractical experiments in environmental genomics, population genomics and metagenomics.

While this technology offers new promise to life science and medical researchers, it also creates complex challenges. The members of Cofactor have significant experience analyzing terabytes of data and exploiting the unique strengths of each Next-Gen platform. Check our prices and availability, and contact us to get started.

Flipping NextGen: using biological systems to characterize NextGen sequencing technologies

By: Jarret Glasscock*, Ryan Richt and Matt Hickenbotham

Background

DNA sequencing technologies have advanced significantly, resulting in next generation (NextGen) sequencing platforms that currently produce 12 gigabases per sequencing run (and growing), 5 orders of magnitude more data than the platforms used for the Human Genome Project.

Results

A broad range of genomes was surveyed to assess the characteristics needed to analyze these biological systems sufficiently. For genome re-sequencing projects, we found that 15 bp reads were needed to uniquely map 98% of loci in many bacteria, whereas more complex genomes needed 20 bp before their curves reached asymptotes at lower percentages, leaving only a fraction of these genomes uniquely characterized (Figure 1). Transcriptomes, on the other hand, were much less variable and required fewer bases (x) to uniquely map a much larger percentage (y) of their sequence space. For example, more than 98% of the complex human transcriptome could be uniquely characterized with as few as 20 bp. Finally, de novo sequencing (i.e., without a reference) would require that half of each read's length be enough to define a unique position, so that contigs can be extended sufficiently during assembly. For example, systems uniquely defined by 20–25 bp reads would need 40–50 bp reads for de novo characterization. As of 2009, short-read NextGen sequencing technologies had moved to 50 bp and beyond, ushering in what is expected to be the start of a revolution in genomics.
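The uniqueness analysis summarized in Figure 1 can be approximated with a simple k-mer count. The sketch below is a hypothetical reconstruction, not the authors' pipeline. It treats every k-length window of a reference as a simulated error-free read and counts that read as uniquely mapped only if its k-mer occurs exactly once across both strands. It then applies the abstract's de novo rule of thumb: reads must be at least twice the unique length. The function names, the both-strand counting and the toy sequence are assumptions, and the 98% threshold is taken from the text.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Maps each base to its complement (uppercase A/C/G/T only; other symbols are left unchanged).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def unique_fraction(seq: str, k: int) -> float:
    """Fraction of forward-strand k-mers (simulated error-free reads of length k)
    that occur exactly once among the k-mers of both strands."""
    if not 1 <= k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    forward = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    rc = reverse_complement(seq)
    counts = Counter(forward)
    counts.update(rc[i:i + k] for i in range(len(rc) - k + 1))
    unique = sum(1 for kmer in forward if counts[kmer] == 1)
    return unique / len(forward)


def uniqueness_curve(seq: str, ks: Iterable[int]) -> Dict[int, float]:
    """Read length k -> fraction of positions uniquely defined (x vs. y in Figure 1)."""
    return {k: unique_fraction(seq, k) for k in ks}


def min_unique_length(curve: Dict[int, float], threshold: float) -> Optional[int]:
    """Smallest read length whose unique fraction reaches threshold, or None."""
    for k in sorted(curve):
        if curve[k] >= threshold:
            return k
    return None


def min_de_novo_read_length(unique_length: int) -> int:
    """Rule of thumb from the abstract: reads must be at least twice the unique length."""
    return 2 * unique_length


toy = "AACAAC"  # hypothetical toy reference containing a repeated 3 bp unit
curve = uniqueness_curve(toy, range(1, 5))
print(curve)  # {1: 0.0, 2: 0.2, 3: 0.5, 4: 1.0}
k_min = min_unique_length(curve, 0.98)
print(k_min, min_de_novo_read_length(k_min))  # 4 8
```

Building a Python list of every k-mer is practical only for small sequences. A genome-scale survey would need a more memory-efficient index, such as a suffix array or a compact k-mer counter.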

Conclusion

These results establish a lower bound on the sequence length (x) required to conduct re-sequencing, transcriptome, and de novo sequencing projects. The asymptotic nature of the results also provides a guide to the percentage of the total space (y) we might expect to define in genomes and transcriptomes of similar size and complexity.

Figure 1. Percent of genome or transcriptome (Y-axis) uniquely defined by a read length (X-axis).