Cyre2067
The Living Force
I thought this article was interesting because it's blowing up Nature's homepage like a Tunguska blast, and because of the cosmic leap it makes between the data and its conclusion. First off, my grandfather died of small-cell lung cancer (SCLC) and he was a life-long tobacco smoker. His method of preference was pipe tobacco, unfiltered, and of questionable quality and chemical content. So I don't doubt that chemicals within tobacco can cause cancer. What irks me about this paper is its basic assumption that lung cancer is caused by 'tobacco smoking' as a 'lifestyle choice'. No regard is given to the chemical content or quality of the tobacco, or to other lifestyle-based exposure to chemical toxins.
Left out the methods/references for brevity. 1 mutation per 15 cigarettes? Puh-lease. The amount of mutagenic exposure you get every day is dramatic, with sunlight and oxygen being the primary culprits. So while I like this paper for the technology used and the raw science done, the interpretation is wayyyy off and leaves a lot to be desired. There's another article, basically the 'short PR version' of the paper, which might be worth a casual review now that you've gotten to look at the 'science'.
Here it is.

I also don't expect anyone without a background in science to really read it, but I post it here for anyone interested with some comments as I saw fit.
Nature advance online publication 16 December 2009 | doi:10.1038/nature08629; Received 17 September 2009; Accepted 30 October 2009; Published online 16 December 2009
A small-cell lung cancer genome with complex signatures of tobacco exposure
Erin D. Pleasance1, Philip J. Stephens1, Sarah O’Meara1,2, David J. McBride1, Alison Meynert3, David Jones1, Meng-Lay Lin1, David Beare1, King Wai Lau1, Chris Greenman1, Ignacio Varela1, Serena Nik-Zainal1, Helen R. Davies1, Gonzalo R. Ordoñez1, Laura J. Mudie1, Calli Latimer1, Sarah Edkins1, Lucy Stebbings1, Lina Chen1, Mingming Jia1, Catherine Leroy1, John Marshall1, Andrew Menzies1, Adam Butler1, Jon W. Teague1, Jonathon Mangion2, Yongming A. Sun4, Stephen F. McLaughlin5, Heather E. Peckham5, Eric F. Tsung5, Gina L. Costa5, Clarence C. Lee5, John D. Minna6, Adi Gazdar6, Ewan Birney3, Michael D. Rhodes4, Kevin J. McKernan5, Michael R. Stratton1,7, P. Andrew Futreal1 & Peter J. Campbell1,8
1. Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
2. Life Technologies, Warrington WA3 7QH, UK
3. European Bioinformatics Institute, Hinxton CB10 1SD, UK
4. Life Technologies, Foster City, California 94404, USA
5. Life Technologies, Beverley, Massachusetts 01915, USA
6. University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
7. Institute of Cancer Research, Sutton, Surrey SM2 5NG, UK
8. Department of Haematology, University of Cambridge CB2 2XY, UK
Correspondence to: Peter J. Campbell1,8 Correspondence and requests for materials should be addressed to P.J.C. (Email: pc8@sanger.ac.uk).
Abstract
Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3–8 of CHD7 in frame, and another two lines carrying PVT1–CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.
This is really cool: the technology they used for this project allows for rapid sequencing of a genome and comparison across cell types. So they took normal cells from the patient and compared them, via this technology, to a cancer line established from the same SCLC patient. What kills me is that they assume SCLC is caused by 'tobacco smoke' and completely ignore all the toxic additives in most cigarettes. Not to mention our horrible air quality, water quality, and overall toxic overdose that we get just by living in the environment.
More than 1 billion people worldwide smoke tobacco1. With 20× greater risk of developing lung cancer than non-smokers and increased risk of many other tumour types, a smoker’s lifestyle choice represents the most significant carcinogenic exposure confronting health services today. Tobacco smoke contains more than 60 mutagens that bind and chemically modify DNA2, 3, and these brand the lung cancer genome with characteristic mutational patterns. Point mutations in, for example, TP53 and KRAS show different signatures between smokers and non-smokers with lung cancer2, 3, 4. However, such studies have been limited to a few genes, and it is unclear how representative these findings are of mutational processes across the whole genome5. In vitro assays and mouse models have been important tools for testing the mutagenicity of individual chemical constituents of tobacco smoke, but are of limited value for generalizing to the complexity of smoking behaviours, systemic metabolism and cancer development in humans. Massively parallel sequencing technologies promise the capacity to paint a genome-wide portrait of mutation in human cancer. Such data will provide unprecedented insights into the relative contributions of different tobacco carcinogens to mutation in vivo, the effects of local DNA structure on mutability and the cellular defence mechanisms against exogenous mutagens.
Lung cancer is the leading cause of cancer-related deaths worldwide, developing in more than a million new patients annually6. Small-cell lung cancer (SCLC), representing 15% of cases, is a distinct subtype associated with a typical clinical picture of early metastasis, initial response to chemotherapy but subsequent relapse, and a 2-year survival of <15% (ref. 7). Several tumour suppressor genes are inactivated, including TP53 (80–90% of cases8), RB1 (60–90% of cases9, 10) and PTEN (13% of cases11). Infrequent activating mutations have been found in PIK3CA, EGFR and KRAS (all 10% or lower; http://www.sanger.ac.uk/genetics/CGP/cosmic/), and MYC is amplified in 20% of cases.
The development of massively parallel sequencing technologies makes it feasible to catalogue all classes of somatically acquired mutation in a cancer, including base substitutions12, 13, insertions and deletions (indels)12, 13, copy number changes14 and genomic rearrangements14. Reports from high-coverage sequencing of two acute myeloid leukaemia genomes have been published, which have concentrated on detecting point mutations in exons and regulatory regions12, 13. Here, we report the first detailed analysis of a human cancer classically associated with tobacco smoking, giving unprecedented insights into the mutational burden associated with this lifestyle choice. Such analyses highlight the advances that will be made in our understanding of the pathogenesis of cancer as we sequence hundreds to thousands of human tumours15.
Sequencing of a SCLC cell line
Most small-cell lung cancers are not surgically resected7, meaning that cell lines are an indispensable resource for studying this disease. NCI-H209 is an immortal cell line derived from a bone marrow metastasis of a 55-year-old male with SCLC, taken before chemotherapy16. The smoking history of the patient is not recorded16. However, the specimen showed histologically typical small cells with classic neuroendocrine features: >97% of such tumours are associated with tobacco smoking17, 18. An Epstein–Barr-virus-transformed lymphoblastoid line, NCI-BL209, has been generated from the patient. NCI-H209 has been extensively characterized by spectral karyotyping, capillary sequencing and high-resolution copy-number array (http://www.sanger.ac.uk/genetics/CGP/cosmic/).
There are also some methodological issues I have. First, they don't have the patient's smoking history? Really? What if he wasn't a smoker? They address that in the next line, but without his smoking history I feel like the whole correlation of SCLC to smoking, especially in this paper, is shoddy at best. They do give some supporting info, but again I'm not sure how well researched the science is because I haven't checked it out myself.
Using the SOLiD platform, we generated 25-base-pair (bp) short-read, mate-pair shotgun sequences from the tumour and matched normal genomes. On the basis of detailed power calculations, we estimated that tumour and normal genomes should be sequenced to 30-fold depth to identify somatically acquired genetic variants with high sensitivity and distinguish them from both sequencing errors and germline polymorphisms (Fig. 1a). In total, 112 gigabases (Gb; 39× coverage) from the tumour and 90 Gb (31×) from the normal were aligned to the reference genome (Fig. 1b).
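A quick sanity check on those coverage figures, as a minimal sketch. Fold coverage is just aligned bases divided by genome size. The 2.9 Gb genome size below is an assumption (roughly the size of the human reference), not a number given in the paper, and the function name is made up for illustration.

```python
# Assumed size of the human reference genome in gigabases.
# This value is NOT taken from the paper; it is an illustrative assumption.
GENOME_SIZE_GB = 2.9


def fold_coverage(aligned_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Average sequencing depth = aligned bases / genome size."""
    return aligned_gb / genome_gb


tumour_depth = fold_coverage(112)  # paper reports 39x for the tumour
normal_depth = fold_coverage(90)   # paper reports 31x for the normal

print(f"tumour ~{tumour_depth:.1f}x, normal ~{normal_depth:.1f}x")
```

Under that assumed genome size, both figures land close to the reported 39× and 31×, above the 30-fold target from their power calculations.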
Bioinformatic algorithms were developed to identify somatically acquired genetic variation from the sequencing data (Supplementary Fig. 1 and Supplementary Tables 1–5), subjected to rigorous validation by polymerase chain reaction (PCR) and capillary sequencing. We had previously identified 29 base substitutions, of which 22 (76%) were called by our algorithm from the SOLiD sequencing data (Supplementary Results and Supplementary Table 6). A total of 79 novel coding substitutions and 354 randomly chosen genome-wide variants called by the algorithm were also tested. A total of 77 (97%) of the coding substitutions and 333 (94%) of the random variants were confirmed as genuine somatic mutations (Supplementary Table 7). Under the conditions given here, small indels are difficult to detect and neither of two known indels in coding sequence was identified. Of putative somatic indels that were called, the true-positive rate was 25% by capillary sequencing (Supplementary Results and Supplementary Table 8). Therefore, only somatic indels which were confirmed by capillary sequencing are reported here. All somatic genomic rearrangements called by anomalous read pairs were validated by PCR and capillary sequencing across the breakpoint, as previously described14.
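The validation percentages quoted above can be reproduced from the counts in the same paragraph. This is only arithmetic on the numbers the paper states; the dictionary labels are my own wording.

```python
def rate(confirmed: int, tested: int) -> float:
    """Percentage of tested variants that were confirmed."""
    return 100 * confirmed / tested


# (confirmed, tested) pairs as quoted in the paper's validation paragraph.
validation = {
    "known substitutions recalled by the algorithm": (22, 29),
    "novel coding substitutions confirmed": (77, 79),
    "random genome-wide variants confirmed": (333, 354),
}

percentages = {name: rate(c, t) for name, (c, t) in validation.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct:.0f}%")
```

Note that the first figure is a sensitivity (how many known mutations were found), while the other two are confirmation rates for calls the algorithm made.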
For point mutations in coding regions, we found the previously described RB1 C706F mutation, known to abrogate protein function19, and the mutation that disrupts a splice site in TP53. Combined loss of RB1 and TP53 is a characteristic feature of SCLC, confirming that NCI-H209 is genetically typical of this disease. One G>T transversion generated a premature stop codon in MLL2. We have observed clustering of truncating mutations in this gene, a histone methyltransferase, in renal cancer20. Of coding variants, 94 are predicted to change amino acids, and 36 are synonymous. Because cancer is a clonal disease in which the phenotypic consequences of mutation are subject to Darwinian natural selection, accumulation of mutations conferring selective advantage on cancer subclones will manifest as an excess of non-synonymous mutations. However, the observed non-synonymous:synonymous ratio of 2.61:1 is not significantly different from that expected by chance (P = 0.3), suggesting that the majority of coding variants do not confer a selective advantage to the cancer.
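The 2.61:1 figure follows directly from the two counts given in that paragraph. A minimal check, using only the paper's numbers:

```python
# Counts of coding variants as quoted in the paper.
nonsynonymous = 94  # predicted to change an amino acid
synonymous = 36     # silent changes

ratio = nonsynonymous / synonymous
print(f"non-synonymous:synonymous = {ratio:.2f}:1")
```

The ratio alone says nothing about selection. The paper's point is that it is not significantly different from the ratio expected by chance (P = 0.3), and that expected value is not something this snippet computes.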
So what they say above is that loss of two proteins key to DNA repair, RB1 and p53, is present in their artificially generated cell line. These proteins are also broken in SCLC, so their cell line is 'genetically typical' of SCLC. That's a stretch, imho. However, it is to be expected if the cell line was generated from an SCLC patient's cancerous lung cells. I would say it is genetically typical in this particular regard; they're making a much broader statement.
Owing to the limited throughput of capillary sequencing, there has previously been little attempt to explore regulatory regions of the genome for potential oncogenic mutations. To address this, we extracted somatic substitutions occurring within 2 kilobases (kb) either side of known transcription start sites, which would generally include gene promoters. Mutations were evenly distributed across the 4-kb regions (Supplementary Fig. 2A). We applied hidden Markov models to predict which substitutions might affect transcription factor binding sites. The distribution observed was no different to that seen in random, simulated sets of ‘mutations’ (Supplementary Fig. 2B), indicating that, analogous to substitutions in coding sequence, most of those found in regulatory regions are selectively neutral to the cancer. Nonetheless, as with coding mutations, there may be a small number that alter transcription factor binding and affect gene regulation, thus providing phenotypic variation for selection to act upon. For example, a T>G mutation 49 bp upstream of the transcription start site of a gene in the RAS oncogene family, RAB42, is predicted to have significant disruptive effects on a potential binding motif for the RAS-responsive RREB1 transcription factor (P = 3 × 10^-98; Supplementary Fig. 2C).
Taken together, these data indicate that most of the mutations in coding and promoter regions of the NCI-H209 genome are passenger events, conferring no selective advantage to the cells. Ranking algorithms can be useful to prioritize variants for further study, but the key evidence for identifying driver mutations is recurrence in independent tumour samples, supplemented by functional studies.
Multiple mutation signatures in NCI-H209
Tobacco smoke contains more than 60 carcinogens which bind and chemically modify DNA, characteristically forming bulky adducts at purine bases (guanine and adenine)3. Adducts distort the DNA helix and, if not corrected by nucleotide excision repair or other pathways, allow non-Watson–Crick pairing during DNA replication. The physicochemical properties of the mutagen determine which adduct is formed, what repair mechanism is induced and which mis-pairing is permissible3. The substantial mutational load carried in the NCI-H209 genome allows us to discern with great statistical power several distinct mutation signatures—genomic records of the medley of mutagens deposited in the airways and lungs by tobacco smoking.
Again, very generic terms here. 'Tobacco smoke', well, what kind? From which product? Smoked how? No due consideration is given to air/water/food/environmental toxins, which make up the bulk of mutagens we are exposed to here on BBM. They also immediately assume all these mutations were caused by 'tobacco smoke', another stretch. Prove it? Not in this paper....
G>T/C>A transversions were the commonest change observed (34%), followed by G>A/C>T (21%) and A>G/T>C (19%) transitions (Fig. 2a). This distribution is remarkably similar to the pattern of substitutions observed in TP53 in SCLC cases curated from the published literature (Supplementary Fig. 3). This implies first that the NCI-H209 genome is typical of SCLC, and therefore of tobacco-associated mutational profiles, and second that most mutations were acquired in vivo, not during cell culture. G>T transversions caused by polycyclic aromatic hydrocarbons occur more frequently at methylated CpG dinucleotides in vitro and in TP53 (refs 21, 22). To explore this genome-wide, we compared the base preceding G>T mutations with the base before wild-type guanines in NCI-H209 (Fig. 2b). CpG dinucleotides were significantly enriched among the G>T mutation set compared to controls (odds ratio (OR), 1.5; 95% confidence interval (CI), 1.3–1.6; P < 0.0001). We can use the fact that only 10–20% of CpG dinucleotides in CpG islands are constitutively methylated compared with 60–70% outside of CpG islands23 to assess how cytosine methylation affects mutations at the neighbouring guanine (Fig. 2c). G>T mutations at CpG dinucleotides were significantly more likely to be found outside CpG islands than expected by chance (OR, 1.8; 95% CI, 1.1–2.8; P = 0.02), suggesting that these transversions do indeed preferentially occur at methylated CpGs.
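For anyone puzzled by the paired notation (G>T/C>A and so on): a substitution seen on one strand is the complementary substitution on the other, so the twelve possible base changes collapse into six classes. The sketch below reports each change from the purine (G or A) side, which matches how the paper labels its classes. The function and variable names are hypothetical, not taken from the authors' pipeline.

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def collapse(ref: str, alt: str) -> str:
    """Report a substitution from the purine side (G or A).

    If the reference base is a pyrimidine (C or T), switch to the
    complementary strand so that C>A is reported as G>T, and so on.
    """
    if ref in ("C", "T"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


# All twelve possible single-base substitutions reduce to six classes.
classes = sorted(
    {collapse(r, a) for r in "ACGT" for a in "ACGT" if r != a}
)
print(classes)
```

This is why the later sections talk about only six classes of mutation (G>T, G>A, G>C, A>G, A>T, A>C).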
We next assessed the base preceding the guanine for G>A and G>C mutations (Fig. 2b). For G>A transitions, marked enrichment of CpG dinucleotides was observed in the mutation set compared with wild-type guanines in the genome (OR, 4.0; 95% CI, 3.7–4.3; P < 0.0001), and these showed a strong propensity to occur outside CpG islands (OR, 2.6; 95% CI, 1.6–4.1; P < 0.0001). This is consistent with the well-described phenomenon of spontaneous deamination of methylated cytosine to thymine. Although G>C transversions showed a similar enrichment for CpG context (OR, 2.2; 95% CI, 1.9–2.5; P < 0.0001), these were significantly more likely to occur within CpG islands (OR, 0.6; 95% CI, 0.4–1.0; P = 0.05), indicating that the carcinogen responsible targets unmethylated CpG dinucleotides. In keeping with previous reports24, 25, we found that the guanine base in G>C transversions was more frequently followed by an adenine than expected by chance (OR, 1.4; 95% CI, 1.3–1.5; P < 0.0001).
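The odds ratios in these two paragraphs come from 2×2 tables: mutated versus wild-type guanines, split by whether a C precedes them. The paper does not give the underlying counts here or say how the confidence intervals were computed, so the sketch below uses made-up counts and a standard Wald interval on the log odds ratio, purely to show the shape of the calculation.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a = mutated guanines in CpG context,   b = mutated guanines elsewhere
    c = wild-type guanines in CpG context, d = wild-type guanines elsewhere
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for the Wald (Woolf) interval.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# HYPOTHETICAL counts for illustration only; not from the paper.
or_value, ci_low, ci_high = odds_ratio_ci(a=30, b=70, c=20, d=80)
print(f"OR {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An OR above 1 means CpG context is over-represented among the mutations. The same table, split by inside versus outside CpG islands, gives the second set of ORs they quote.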
For mutations involving adenines, fewer substitutions of all classes were seen at GpA dinucleotides than expected by chance (P < 0.0001; Fig. 2d), and A>T and A>G occurred significantly more frequently at TpA than expected (P < 0.0001). Among somatically acquired indels, single-base-pair insertions were more likely to be gains of A or T nucleotides than C or G (8:1). Curiously, single base deletions favoured loss of C/G nucleotides, rather than A/T (26:12), and there was a propensity for the C/G deletions to occur at CC or GG dimers or longer (18 out of 26). In contrast to the frequency of indels at runs of A or T nucleotides, deletions at C or G tracts are not well described, and our findings may reflect a distinct mutation signature.
Thus, the sequence context of the ~23,000 mutations in the NCI-H209 genome provides tremendous power to identify multiple distinctive mutation signatures, not evident from targeted re-sequencing studies of limited genomic regions.
Imprint of two DNA repair pathways
Several pathways can repair DNA lesions caused by exogenous carcinogens. Bulky adducts on purines are the predominant form of DNA damage induced by tobacco carcinogens, and can be sufficiently disruptive to impede RNA polymerase when they occur on the transcribed strand of genes. Stalled RNA polymerases can recruit the nucleotide excision repair machinery, leading to excision of the altered nucleotide, preventing mutation. In studies of TP53 mutations in lung cancer, G>T transversions occur more frequently on the non-transcribed strand2, 5, suggesting that many of the same lesions occurring on the transcribed strand are correctly identified and removed by the cell. We found that guanine and adenine substitutions are generally less frequent on the transcribed than the non-transcribed strand (Supplementary Fig. 4), confirming that purines seem to be the major target of carcinogens in tobacco smoke.
The above is typical of almost all chemical carcinogens, not just those found in tobacco smoke. More broad strokes. No wonder they're on the front page of Nature.
We next correlated mutation prevalence to gene expression (Fig. 2e). For a given level of gene expression, the effects of transcription-coupled repair are revealed by the significant separation of curves for mutations on the transcribed and non-transcribed strands. We found evidence for significant transcription-coupled repair for G>T transversions (P < 0.0001), as well as A>G (P = 0.003) and A>T (P = 0.03), possibly G>C (P = 0.08), but not G>A (P = 0.3) or A>C (P = 0.8) mutations. Thus, the extent of transcription-coupled repair differs for the various classes of mutation, presumably reflecting differences in the ability of the transcription-coupled repair machinery to recognize and/or repair different adduct lesions.
For most mutations, there seems to be another novel expression-linked repair pathway that operates on both strands and is at least as numerically important as transcription-coupled repair. Thus, significantly lower mutation prevalence, on both transcribed and non-transcribed strands, was observed in more highly expressed genes for G>T (P < 0.0001), G>A (P < 0.0001), G>C (P < 0.0001) and A>T (P < 0.0001). Again, there are some interesting differences across mutation classes in the relative contributions of the two repair pathways. For A>G mutations, only transcribed strand mutations decreased with higher gene expression, suggesting that transcription-coupled repair is the more important pathway for preventing such events. In contrast, G>A mutations occurred equally on transcribed and non-transcribed strands, but mutations on both strands were significantly reduced in more highly expressed genes, indicating that the novel expression-linked repair pathway is more important than transcription-coupled repair here.
Taken together, these data imply that at least two separate DNA repair pathways have been enlisted for protection of the NCI-H209 genome, notwithstanding the difficulties in extrapolating cell line expression levels to in vivo expression during cancer progression. The fact that the two pathways have operated with differing efficacy across the six classes of mutation implies that the lesions have distinct physicochemical effects on DNA structure, with variable recognition and excision by the genome surveillance machinery.
Genomic rearrangements and copy number
We identified 58 somatically acquired genomic rearrangements in the NCI-H209 genome. These include 18 (31%) deletions and 9 (16%) tandem duplications. The majority of rearrangements, however, cannot be ascribed to classical structural variant patterns, due to the considerably greater complexity of somatically acquired rearrangements compared to germline events. This is exemplified by a set of rearrangements incorporating regions from chromosomes 1p32-36 and 4q25-28 (Fig. 3). Here, most of the intrachromosomal rearrangements are in inverted orientation, but cannot be classical inversions because they demarcate copy number changes and do not have reciprocal breakpoints. By similar reasoning, most interchromosomal rearrangements also seem to be unbalanced. Other clusters of unbalanced rearrangements were found in NCI-H209, including chromosomes 3q and 5q, and we have seen this phenomenon in many other solid tumour genomes.
The numbers here aren't too significant. There's another issue with this cell line: it's been transformed, which basically means they took normal cells from the patient and genetically tweaked them into a stable cell line. A stable cell line is basically a more 'cancerous' version of what you started with. So how many of these mutations were there originally, and how many did they create? That remains unknown and unaddressed in this paper.
Chromosomal rearrangements can juxtapose two genes: if they are in the same orientation with an intact open reading frame, an oncogenic fusion gene may result. In NCI-H209, a predicted in-frame fusion gene was created by a 240-kb deletion on chromosome 16, adjoining the first two exons of CREBBP with the 3′ portion of BTBD12, a gene involved in repair of double-stranded DNA breaks26, 27. Notably, in acute myeloid leukaemia, CREBBP is recurrently fused with MYST3 (ref. 28). PCR with reverse transcription (RT–PCR) showed that the predicted CREBBP–BTBD12 fusion transcript is expressed in NCI-H209, but not in 55 other SCLC cell lines. The significance of the predicted fusion gene with respect to cancer development is therefore unclear.
CHD7 rearrangements in SCLC cell lines
Intrachromosomal rearrangements can also result in internal rearrangements of genes, through loss or duplication of exons. A 39-kb tandem duplication was found in CHD7, predicted to lead to in-frame duplication of exons 3–8 (Fig. 4a). We previously identified a massively amplified and highly expressed fusion gene comprising exons 1–3 of PVT1, a non-coding RNA gene immediately downstream of MYC, and exons 4–38 of CHD7 in another SCLC cell line, NCI-H2171 (ref. 14). This raises the possibility that CHD7 rearrangements may be recurrent in SCLC. Using multiplex ligation-dependent probe amplification, we identified a further SCLC cell line (LU-135) with internal exon copy number alterations, among 63 lines screened (Supplementary Fig. 5). LU-135 was therefore studied by mate-pair sequencing (Fig. 4b). This demonstrated that, as for NCI-H2171, the CHD7 amplicon was linked to MYC amplification. One breakpoint predicted the existence of a fusion gene between exon 1 of PVT1 and exons 14–38 of CHD7 (Fig. 4c), and as demonstrated by RT–PCR across the breakpoint, this transcript is expressed. In keeping with genomic amplification and active expression of the PVT1 locus, NCI-H2171 and LU-135 show particularly elevated levels of CHD7 transcripts (Fig. 4d). SCLC cell lines on average show a log2 greater expression of CHD7 than both non-small-cell lung cancer lines and other tumour types (P < 0.0001).
Thus, CHD7 is rearranged in three SCLC cell lines. Two carry a PVT1–CHD7 fusion gene in the setting of MYC amplification. PVT1 is a non-coding gene immediately downstream of MYC, and may itself be a transcriptional target of the MYC protein29. Insertion of CHD7 into this locus with subsequent amplification gives the double hit of increased gene copy number and regulatory elements for a co-amplified transcription factor, explaining the massive overexpression seen in these cell lines. PVT1 is recurrently rearranged in variant Burkitt’s lymphoma translocations30, and may be oncogenic31. The NCI-H209 rearrangement is predicted to duplicate one of the two chromodomains. CHD7 is a chromatin remodeller, promoting enhancer-mediated transcription through association with histone H3K4 methylation32. Histone modifiers have been implicated as cancer genes33, and a family member, CHD5, may function as a tumour suppressor gene34. Recurrent rearrangements of CHD7 in SCLC would be an interesting extension of this theme if functional studies and genomic analyses of primary samples confirm our data.
Discussion
The compendium of somatic alterations in a cancer genome is shaped by multiple intrinsic and extrinsic processes, including exposure to mutagens, selective pressures active in the tissue microenvironment, genomic instability and DNA repair pathways15. The advent of massively parallel sequencing heralds an era in which unbiased, genome-wide mutation screens allow the consequences of these processes to be discerned and decoded. Even in this single lung cancer genome, we can identify several distinctive point mutation patterns, reflecting the cocktail of carcinogens present in cigarette smoke, as well as signatures of the partially successful attempts of the cell’s surveillance machinery to repair DNA damage. The complete catalogue of somatically acquired mutations in a given cancer harbours the subset of variants that drive the neoplastic phenotype, and a likely candidate, CHD7 rearrangement, has emerged from the NCI-H209 genome.
Tobacco smoke deposits many hundreds of chemicals in the airways and lungs. Each carcinogen-associated mutation represents the consequence of three processes: chemical modification of a purine by a mutagen, failure to repair the lesion by genome surveillance pathways and incorrect nucleotide incorporation opposite the distorted base during DNA replication. G>T transversions are the commonest substitution in NCI-H209, mutations previously linked to polycyclic aromatic hydrocarbons3 and acrolein22 in tobacco smoke. We found enrichment of G>T mutations at CpG dinucleotides, especially outside CpG islands, supporting in vitro evidence that these carcinogens preferentially bind methyl-CpG dinucleotides21, 22. Polycyclic aromatic hydrocarbons containing a cyclopentane ring have been associated with G>C transversions35. We found them enriched at CpG dinucleotides, but, in contrast to G>T and G>A mutations, our data indicate that unmethylated CpGs are the target here, underscoring the remarkable statistical power genome-wide mutation screens give for delineating mutation spectra.
We can also infer signatures of DNA repair in cancer genomes. Transcription-coupled repair is induced by stalling of RNA polymerase at bulky adducts on the transcribed strand, and we saw evidence for this process in NCI-H209. We have also discovered the imprint of a novel and more general form of expression-linked repair, through which mutation frequency is reduced on both strands in highly expressed genes. The expression-linked decrease in mutation frequency may reflect global genomic nucleotide excision repair, in which distorting adducts are corrected genome-wide36. Why this pathway should be more effective in highly transcribed regions is unclear. One possibility is that single-stranded (ss)DNA formed on both strands during transcription facilitates recognition of the adduct, and there is some evidence that components of the nucleotide excision repair pathway can recognize adducts in ssDNA37. Strikingly, some mutation types were repaired almost exclusively by transcription-coupled repair (A>G), some showed evidence for only the more general expression-linked repair (G>A), whereas others had features of both mechanisms (G>T, A>T). Such differences are presumably determined by the physicochemical properties of the multitudinous adducts induced by carcinogens present in tobacco smoke.
On average, lung cancer develops after 50 pack-years of smoking38 (where a pack-year is 7,300 cigarettes, representing the number smoked in a pack a day for a year). Candidate gene re-sequencing studies suggest that the mutation prevalence in NCI-H209 is similar to that of primary lung cancers39, 40. If the majority of mutations derive from the mélange of mutagens present in tobacco smoke, the clone of cells that ultimately becomes cancerous would acquire, over its lifetime, an average of one mutation for every 15 cigarettes smoked. If this is the case in a localized cluster of cells, then the number of mutations acquired across the whole bronchial tree from even one cigarette must be substantial. The data presented here demonstrate the power of whole-genome sequencing to disentangle the many complex mutational signatures found in cancers induced by tobacco smoke.
I left out the methods and references for brevity. One mutation per 15 cigarettes? Puh-lease. The number of mutagens you're exposed to every day is dramatic, with sunlight and oxygen being the primary culprits. So while I like this paper for the technology used and the raw science done, the interpretation is wayyyy off and leaves a lot to be desired. There's another article, basically the 'short PR version' of the paper, which might be worth a casual review now that you've gotten to look at the 'science'.
Here it is.
Cancer genomes reveal risks of sun and smoke
Sequencing of skin and lung cancers show that many mutations could be prevented.
Brendan Borrell
Researchers have completed the genetic sequences of two types of cancer — skin cancer and small-cell lung cancer — revealing that the genomes bear the hallmarks of their respective carcinogens: sun and smoke. Worldwide, the two diseases kill a total of nearly 250,000 people each year, despite the fact that they are largely preventable.
Tumours develop when a normal cell's DNA is damaged, allowing that cell to proliferate unchecked. By sequencing and cataloguing all the mutations in a single tumour type from multiple individuals, scientists aim to identify the genes that are most susceptible to damage, to understand the processes underlying DNA repair, and to develop drugs that counteract certain types of damage.
Scientists from the Cancer Genome Project at the Wellcome Trust Sanger Institute in Hinxton, near Cambridge, UK, and their collaborators at partner institutions describe the genetic sequences of cell lines derived from patients with small-cell lung cancer1 or malignant melanoma2. The studies are published online today in Nature.
These papers mark the completion of the fourth and fifth cancer-cell genomes to be sequenced, and come just one year after a team from Washington University School of Medicine in St Louis published the first cancer genome, from a patient with leukaemia3. The breast-cancer genome was published by a Canadian-led consortium in October this year4, and dozens more sequences are expected to come out of The Cancer Genome Atlas Program of the US National Cancer Institute in Bethesda, Maryland — a project that is slated to receive US$275 million over the next two years from the National Institutes of Health.
"We are in the middle of an explosive development in cancer-genome sequencing," says Matthew Meyerson, a cancer-genomics expert at the Dana-Farber Cancer Institute in Boston, Massachusetts, who was not involved in the research. "Whole-genome sequencing is the wave of the future for both cancer-gene discovery and, eventually, for cancer diagnosis."
Fifteen cigarettes, another mutation
Peter Campbell, a haematologist and cancer-genomics expert at the Sanger Institute who worked on the latest studies, says that the number of genetic mutations they identified — 33,345 for melanoma and 22,910 for lung cancer — was remarkable. The mutations were not distributed evenly throughout the genome — many were present outside of gene-coding regions, suggesting that cells had repaired damaged DNA in those key regions.
Campbell says that the findings help to answer lingering questions about whether carcinogens cause most mutations directly, or if cancer itself contributes to the mutations by disrupting the function of DNA-repair mechanisms. The team found that most mutations were single-base DNA substitutions that could be traced to the carcinogenic effects of chemicals in tobacco smoke (in the case of the small-cell lung cancer genome) or ultraviolet light (in the melanoma genome), supporting the idea that these two cancers are largely preventable. The team estimates that every 15 cigarettes smoked results in a DNA mutation. "Every pack of cigarettes is like a game of Russian roulette," Campbell says. "Most of those mutations will land where nothing happens in the genome and won't do major damage, but every once in a while they'll hit a cancer gene."
The lung-cancer study also identified one recurrent mutation — a duplication of the chromatin-remodelling gene CHD7, which regulates the activity of other genes. The team had already identified the existence of this mutation in 2008, but the current study1 confirms its presence in three independent cell lines. Such recurrent mutations could point to key cancer genes that may be useful drug targets.
Some scientists, however, are more circumspect about the benefits of cancer-genome sequencing. Steve Elledge, an expert in DNA damage and cancer genetics at Harvard Medical School in Boston, Massachusetts, was impressed with the new analysis but says that the potential impact on cancer diagnosis and treatment will not be fully felt until scientists have hundreds of sequences at hand — a costly prospect. "It's still very expensive, and I think all these efforts should be coupled with an equal amount of effort on studying gene function," he says.
Corrected:
This article previously stated that each cigarette smoked could result in an estimated 15 DNA mutations. In fact, the typical smoker would acquire one mutation for every 15 cigarettes smoked.
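For anyone who wants to sanity-check the "one mutation per 15 cigarettes" figure, here is a rough back-of-envelope sketch in Python. It uses only numbers quoted above (50 pack-years, 7,300 cigarettes per pack-year, 22,910 mutations in the lung cancer genome) and assumes, as the paper does with its "if", that all of those mutations come from tobacco smoke. The constant and function names are mine, not the paper's, and I'm only guessing that this is how the authors got their number.

```python
# Back-of-envelope check of the "one mutation per 15 cigarettes" estimate.
# All inputs are figures quoted in the paper and news article above.
# Assumption: every counted mutation is smoke-derived (the paper's "if").

CIGARETTES_PER_PACK_YEAR = 7_300   # a pack of 20 a day for 365 days
PACK_YEARS_TO_CANCER = 50          # average quoted in the paper
LUNG_GENOME_MUTATIONS = 22_910     # mutations reported for the lung cancer genome


def cigarettes_per_mutation(pack_years, mutations):
    """Return lifetime cigarettes smoked divided by the mutation count."""
    total_cigarettes = pack_years * CIGARETTES_PER_PACK_YEAR
    return total_cigarettes / mutations


if __name__ == "__main__":
    ratio = cigarettes_per_mutation(PACK_YEARS_TO_CANCER, LUNG_GENOME_MUTATIONS)
    print(f"Roughly one mutation per {ratio:.1f} cigarettes")
```

The result lands a little under 16, in the same ballpark as the quoted 15. In other words, the headline number looks like nothing more than total cigarettes divided by total mutations; it says nothing about what actually caused any individual mutation.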



; Like Homeland Security, with their command center the WHO. http://www.cdc.gov/tobacco/global/gtss/tobacco_atlas/pdfs/tobacco_atlas.pdf