Genotype imputation based on discriminant and cluster analysis

Mahmoud, Medhat

dc.contributor.author	Mahmoud, Medhat
dc.date.accessioned	2012-11-09T13:30:37Z
dc.date.available	2012-11-09T13:30:37Z
dc.date.copyright	2012
dc.date.issued	2012-11-09
dc.identifier.uri	http://hdl.handle.net/11250/186135
dc.description.abstract	The recent development of high-throughput systems for genotyping SNPs in eukaryotes has led to an extraordinary amount of research activity, particularly in areas such as whole-genome selection of livestock and genome-wide association studies for the detection of quantitative trait loci (Van Tassell et al., 2008). Recent technological advances allow us to rapidly genotype more than 10 million SNPs in an individual, accounting for 10% of the estimated number of common SNPs (more than 1% minor allele frequency) across the population. Because of missing SNPs, true associations might be missed if the causal SNP is not genotyped or if the causal variant is unknown. SNP imputation is important for reducing the cost of re-sequencing, and for cases where genotyping all animals under consideration is too costly or not feasible because DNA is not available for all of them. Computational algorithms and statistical methods have been developed to account for some of the unobserved variants. These methods rest on the observation that SNPs in close proximity to one another in the genome tend to be correlated, that is, in non-random association (linkage disequilibrium). Several powerful methods to impute missing SNP genotypes already exist that, apart from the genotypic information at the locus of interest, use only pedigree data (Gengler, 2007, 2008), only surrounding markers (fastPHASE; Scheet and Stephens, 2006), or both (Li and Jiang, 2003; Kong et al., 2008; Meuwissen and Goddard, 2010; Mulder et al., 2010b). The mixed model method presented by Gengler et al. (2007) uses BLUP to find the missing gene content conditional on the genotypic information of relatives. “Several articles have described comparisons of imputation methods with respect to computational efficiency and the accuracy of results” (Pei YF, 2008; Yu Z, 2007; Nothnagel M, 2009).
Overall, MACH, BEAGLE, and IMPUTE have been shown to have similar accuracy, and all of these programs have been shown to outperform other imputation methods such as fastPHASE (Scheet P, 2006) and PLINK (Purcell S, 2007). Consequently, we perceived a substantial need to propose a new technique for SNP imputation that applies linear discriminant analysis and clustering analysis algorithms. Using simulated data sets, we investigated how linkage disequilibrium (LD), the minor allele frequency (MAF) of untyped SNPs, marker density (MD), reference sample size (n), and the number of SNPs in each haplotype block affect the imputation accuracy rate (AR) and the performance of linear discriminant analysis and clustering analysis as SNP imputation methods.	no_NO
dc.language.iso	eng	no_NO
dc.publisher	Norwegian University of Life Sciences, Ås
dc.subject	SNP Imputation	no_NO
dc.subject	Clustering	no_NO
dc.subject	Linear discrimination	no_NO
dc.title	Genotype imputation based on discriminant and cluster analysis	no_NO
dc.type	Master thesis	no_NO
dc.subject.nsi	VDP::Mathematics and natural science: 400::Basic biosciences: 470::Genetics and genomics: 474	no_NO
dc.subject.nsi	VDP::Agriculture and fishery disciplines: 900::Agriculture disciplines: 910::Livestock breeding, rearing, reproduction: 912	no_NO
dc.source.pagenumber	57	no_NO

Files in this item

Filename: Mahmoud2012.pdf
Size: 2.839Mb
Format: PDF

This item appears in the following collection(s)

Master's theses (IHA) [318]