When choosing how many reads you need to purchase for your experiment, several factors come into play. Remember that you do not have to decide this by yourself: Cofactor specialists will ensure you get the right number of reads for your experimental goal at no additional charge.
First, read start sites fall randomly across the genome, so the distribution of final read depth across sites closely follows a Poisson distribution. What does this mean? If you sequence 1x as many bases as your genome has sites, you will cover only about 63% (1 - 1/e) of the genome sites in the average experiment, because many reads overlap each other while other sites receive no reads at all. Therefore, you need to sequence more to overcome these random effects.
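The 63% figure can be checked with a few lines of Python. This is a minimal sketch that assumes an idealized Poisson model (uniformly random read placement, no GC or mappability bias); the function name `fraction_covered` is chosen here for illustration.

```python
import math

def fraction_covered(coverage: float, min_depth: int = 1) -> float:
    """Expected fraction of sites with depth >= min_depth,
    assuming per-site depth is Poisson with mean `coverage`."""
    below = sum(
        math.exp(-coverage) * coverage**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below

if __name__ == "__main__":
    for c in (1, 2, 5, 10, 30):
        print(f"{c:>2}x: {fraction_covered(c):.4f} of sites covered at least once")
```

Raising `min_depth` shows how much extra sequencing is needed before most sites are covered several times rather than just once.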
Second, although many next-gen platforms have low per-base error rates (about 1 in 1,000), when you consider how many sites you are sequencing (millions or billions), those little errors add up: at 1 error in 1,000, a billion sequenced bases contain about a million miscalls. To get correct calls at a majority of sites, you need to sequence more.
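To see why extra depth helps with errors, the sketch below estimates how often erroneous reads would make up at least half of the reads at a site. It assumes independent errors at a fixed rate of 1 in 1,000 and ignores heterozygous sites and systematic errors, so it is a rough upper bound under stated assumptions rather than a variant-calling model. The function name and the 3-billion-site genome are illustrative.

```python
from math import comb

def prob_majority_wrong(depth: int, error_rate: float = 1e-3) -> float:
    """Probability that at least half of `depth` reads are erroneous at a site
    (binomial model, independent errors; ties are counted as wrong)."""
    need = (depth + 1) // 2  # smallest read count that is >= depth / 2
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(need, depth + 1)
    )

if __name__ == "__main__":
    genome_sites = 3_000_000_000  # assumed genome size, for illustration only
    for depth in (1, 3, 5, 9):
        p = prob_majority_wrong(depth)
        print(f"depth {depth}: about {genome_sites * p:,.2f} sites expected to be miscalled")
```

Even a few reads per site cut the expected number of miscalled sites by orders of magnitude, which is part of the reason single-pass coverage is not enough.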
Therefore, for standard diploid genomic-type projects, you need about 30x coverage of the genome. For de novo assembly, you need more like 100x coverage for current short-read assemblers to give decent results.
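Converting a target fold coverage into a number of reads is simple arithmetic: reads = coverage x genome size / read length. The sketch below assumes single-end reads of uniform length that all map to the genome; the genome size and read length in the example are placeholder values, not recommendations.

```python
import math

def reads_needed(coverage: float, genome_size: int, read_length: int) -> int:
    """Number of reads required to reach `coverage`-fold average depth."""
    return math.ceil(coverage * genome_size / read_length)

if __name__ == "__main__":
    genome_size = 3_000_000_000  # assumed, roughly human-sized genome
    read_length = 100            # assumed read length in bases
    for label, cov in (("resequencing", 30), ("de novo assembly", 100)):
        n = reads_needed(cov, genome_size, read_length)
        print(f"{label}: {cov}x needs about {n:,} reads")
```

For paired-end data, each pair contributes twice the read length, so the number of pairs is half the number of reads computed here.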
Third, if you are sequencing from a mixed population of samples, you need to sequence enough to ensure you have sampled from all constituent individuals. If you are sequencing a library of transformants, you need to sequence enough to observe your low-level transformants several times. For these types of experiments you may need a fold coverage of 10 to 100 thousand.
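The depth needed to see a rare member of a mixed population can be estimated with the same Poisson reasoning. The sketch below assumes that reads at a site are drawn independently and that a member present at a given `frequency` contributes that fraction of the reads; the thresholds (at least 3 reads, 95% confidence) and function names are illustrative choices.

```python
import math

def prob_seen_at_least(depth: int, frequency: float, min_reads: int = 3) -> float:
    """Probability of observing a member at `frequency` in >= min_reads reads,
    using a Poisson approximation with mean depth * frequency."""
    lam = depth * frequency
    below = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_reads))
    return 1.0 - below

def depth_needed(frequency: float, min_reads: int = 3, confidence: float = 0.95) -> int:
    """Smallest depth at which the member is seen >= min_reads times
    with probability >= confidence (doubling, then binary search)."""
    hi = 1
    while prob_seen_at_least(hi, frequency, min_reads) < confidence:
        hi *= 2
    lo = hi // 2  # lo is known to fail (or is 0); hi is known to succeed
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if prob_seen_at_least(mid, frequency, min_reads) >= confidence:
            hi = mid
        else:
            lo = mid
    return hi

if __name__ == "__main__":
    for freq in (1e-2, 1e-3, 1e-4):
        print(f"frequency {freq:g}: depth of about {depth_needed(freq):,}x")
```

Under these assumptions, a member present at 1 in 10,000 needs a depth in the tens of thousands, consistent with the range quoted above.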
Fourth, if you are sequencing from RNA, not all sites have the same expression level. In fact, gene expression in almost all organisms approximately follows a power-law distribution with a parameter of about 2. What does this mean? A few genes are very highly expressed, while most contribute only a small share of the total RNA fragments. Because the distribution is so lopsided, you need to sequence much more in total so that you have a sufficient number of counts for lowly expressed genes to identify and quantitate them accurately. For the human transcriptome, 40 million reads is a good starting place.
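The effect of a lopsided expression distribution can be illustrated with a toy simulation. It assumes 20,000 genes whose relative abundances are drawn from a Pareto distribution with shape 1 (one reading of "power law with a parameter of about 2", where the density falls off as x^-2), and it compares expected read counts against an arbitrary threshold of 10 reads per gene. None of these numbers come from real data, and the sketch is not meant to reproduce the 40 million figure.

```python
import random

def fraction_detected(total_reads, n_genes=20_000, min_count=10, seed=0):
    """Fraction of simulated genes whose expected read count is >= min_count.

    Abundances are drawn from Pareto(shape=1), an assumed stand-in for a
    power law with exponent 2; reads are split in proportion to abundance.
    """
    rng = random.Random(seed)
    abundance = [rng.paretovariate(1.0) for _ in range(n_genes)]
    total = sum(abundance)
    detected = sum(1 for a in abundance if total_reads * a / total >= min_count)
    return detected / n_genes

if __name__ == "__main__":
    for reads in (10_000, 100_000, 1_000_000, 10_000_000, 40_000_000):
        frac = fraction_detected(reads)
        print(f"{reads:>11,} reads: {frac:.1%} of genes reach 10 expected reads")
```

Because a handful of highly expressed genes absorb most of the reads in this model, the fraction of genes with usable counts grows slowly until the total read count becomes large.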
For larger and de novo projects, you should also consider a mixture of several platforms, long and short reads, and single and paired-end/mate-pair reads of various insert sizes. Cofactor software and experts can help you choose this mixture in a cost-optimal way for a variety of extant and simulated genomes.