If you’ve taken an introductory genetics course, you probably know a little bit about the techniques and tools that geneticists use to connect diseases with underlying genes. You’ve probably learned that it is a long and tedious process to go from knowing the phenotype (outward manifestation) of a disease to the underlying genetic cause. (Don’t worry if you have no prior knowledge of this; I will try to explain everything here!) What you might not be aware of is that breakthroughs in the past couple of years have shortened the gene-hunting process by several orders of magnitude: in some cases, you can now go from disease to gene in just two weeks! The McGill Genome Innovation Centre is one of the institutions leading the efforts in this area of research, and I’m fortunate enough to have the opportunity to take part in it.
Until recently, geneticists relied mainly on “linkage studies” to narrow down the candidate genes that contribute to a disease, and researchers still use them today. To understand how linkage works, we need to revisit the biological process of meiosis. During the formation of sperm and eggs, our cells undergo meiosis, which halves the genetic material in each cell. When an egg is fertilized by a sperm, the full genome is reconstituted. As a result, our somatic (non-sex) cells carry, for the most part, two copies of every gene, one from our mom and one from our dad. What’s important is that during meiosis, genetic material is switched around, or recombined, between the paternally and maternally inherited chromosomes; this gives rise to genetic variation in the offspring. The idea of linkage is that genes closer together on a chromosome are less likely to be separated by recombination; in other words, they tend to be passed on to the next generation together.
We can estimate how close together two genes are on a chromosome by looking at how frequently they recombine in meiosis: the less often they are separated, the closer together they are. In the same vein, if a genetic marker lies close to a disease gene, it tends to be inherited along with the disease. Scientists use this fact to narrow down candidate disease regions by looking at hundreds of genetic markers (DNA sequences whose locations are known) and at how closely each one is linked to the disease phenotype. This technique is quite limiting, since you need a large family pedigree to work out inheritance patterns. Also, linkage can narrow down the region, but it cannot by itself pinpoint the gene. It took scientists 10 years to find the gene for Huntington’s disease after figuring out its approximate location in the genome!
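To make the idea of linkage concrete, here is a toy sketch in Python. It assumes we have already counted, for each marker, how many offspring in a pedigree show the marker and the disease separated by recombination. The marker names and counts are made up for illustration, and real linkage analysis uses more careful statistics than a simple fraction.

```python
def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of offspring in which the marker and the disease were separated."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must be between 0 and total")
    return recombinants / total


# Hypothetical counts: (recombinant offspring, total offspring scored).
marker_counts = {
    "marker_A": (48, 100),  # close to 50%: behaves as if unlinked
    "marker_B": (12, 100),
    "marker_C": (3, 100),   # rarely separated from the disease
}

fractions = {
    name: recombination_fraction(recombinants, total)
    for name, (recombinants, total) in marker_counts.items()
}

# The marker with the lowest recombination fraction is the best guess
# for the neighbourhood of the disease gene.
closest = min(fractions, key=fractions.get)
print(closest, fractions[closest])
```

A marker that is separated from the disease about half the time tells us nothing about location, while one that almost always travels with the disease points to a nearby region.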
So, what has changed? In recent years, sequencing technology has advanced rapidly. We can now sequence an entire genome within a few weeks. We can speed up the process even more by sequencing only the exons, where most disease-causing mutations are thought to be located. This process is called “exome sequencing” and is much faster because the exons make up only about 1/60 of the human genome, which cuts the amount of data drastically. Whereas in the past looking at DNA sequences was only feasible if we had a very narrow candidate region, geneticists can now look at the entire genome (or exome) at once. By comparing the DNA sequences of an affected individual with those of unaffected controls, we can pick out mutations that may contribute to disease.
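The comparison between an affected individual and unaffected controls can be sketched as a set operation. This is only an illustration under simple assumptions: each variant is reduced to a position plus the reference and altered base, the `Variant` type and all coordinates are hypothetical, and a real pipeline would also weigh sequencing quality, inheritance pattern and predicted effect.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """A hypothetical, minimal description of a single DNA difference."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_variants(affected: Set[Variant],
                       controls: Iterable[Set[Variant]]) -> Set[Variant]:
    """Keep variants seen in the affected individual but in none of the controls."""
    seen_in_controls = set().union(*controls)
    return affected - seen_in_controls


# Made-up variant calls for one affected individual and two controls.
affected = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 1500, "G", "A"),
    Variant("chr2", 200, "C", "T"),
}
control_1 = {Variant("chr1", 1000, "A", "G")}
control_2 = {Variant("chr2", 200, "C", "T"), Variant("chr3", 50, "T", "C")}

candidates = candidate_variants(affected, [control_1, control_2])
print(candidates)
```

Only the variant at position 1500 on the made-up `chr1` survives, because the other two also appear in healthy controls and are therefore unlikely to explain the disease.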
Sounds simple, right? Well, not quite. Keep in mind that there are 3 billion nucleotides in our genome. Individuals also differ naturally at millions of locations within the genome, and most of those differences do not contribute to disease. The only viable way to analyze this much data is by computer, and the size and complexity of the genome present computational challenges of their own.
My research focuses on one aspect of these challenges. I’m currently helping to develop methods to detect copy number variations (CNVs), a specific type of genomic mutation. These are large deletions and duplications of parts of our genome; some are benign and some contribute to disease. I’m mostly interested in rare CNVs that have large effects in Mendelian diseases, which are disorders caused by the malfunction of a single gene. The first step is to get a sense of what CNVs look like in the data. Next, we must think of ways to distinguish rare mutations that cause disease from common mutations that do not. Only then can we start writing programs to pick out these features automatically from the huge data sets that we collect. So far we’ve had some success in picking out disease-causing mutations, but we are constantly looking to reduce false positives and improve the performance of our program.
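One common way to look for CNVs in sequencing data is to compare read depth, that is, how many sequencing reads cover each exon in a sample versus in controls. The sketch below illustrates that general idea only; it is not necessarily the method used in the project described here. The exon names, depths and thresholds are all hypothetical, and it assumes the depths have already been normalized for how deeply each sample was sequenced.

```python
import math
from statistics import median
from typing import Dict, List


def call_cnvs(sample_depth: Dict[str, float],
              control_depths: List[Dict[str, float]],
              del_threshold: float = -0.7,   # hypothetical cutoff
              dup_threshold: float = 0.4     # hypothetical cutoff
              ) -> Dict[str, str]:
    """Flag exons whose depth differs strongly from the control median."""
    calls = {}
    for exon, depth in sample_depth.items():
        baseline = median(control[exon] for control in control_depths)
        # A pseudocount of 1 avoids taking the logarithm of zero.
        log_ratio = math.log2((depth + 1) / (baseline + 1))
        if log_ratio <= del_threshold:
            calls[exon] = "deletion"
        elif log_ratio >= dup_threshold:
            calls[exon] = "duplication"
    return calls


# Made-up normalized read depths per exon.
sample = {"exon_1": 100, "exon_2": 50, "exon_3": 150}
controls = [
    {"exon_1": 98, "exon_2": 100, "exon_3": 100},
    {"exon_1": 102, "exon_2": 104, "exon_3": 100},
    {"exon_1": 100, "exon_2": 96, "exon_3": 100},
]

calls = call_cnvs(sample, controls)
print(calls)
```

Losing one of two copies roughly halves the depth (a log2 ratio near -1), while gaining a third copy raises it by about half (a log2 ratio near 0.58), which is why the two thresholds are not symmetric. Loosening them catches more true CNVs but also more false positives, which is exactly the trade-off mentioned above.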
Yuhao is a U2 Computer Science and Biology Joint program student, and the managing editor of MSURJ.