[Study on genetic structure differences and adjustment strategies in different areas of China].
Zhu M., Lyu J., Yu CQ., Jin GF., Guo Y., Bian Z., Robin W., Iona M., Chen ZM., Shen HB., Hu ZB., Li LM., China Kadoorie Biobank Collaborative Group.
Objective: To describe the genetic structure of populations in different areas of China and to compare strategies for controlling confounding by genetic structure in cohort studies. Methods: Using genome-wide association study (GWAS) data from 4 500 samples from 10 areas of the China Kadoorie Biobank (CKB), we performed principal components analysis, plotted the first and second principal components in two dimensions, and compared them with each sample's area of origin to characterize the genetic structure of samples from different areas of China. Based on the CKB cohort data, we generated a simulated data set with the features of cluster sampling, namely genetic structure differences and extensive kinship, and evaluated the effect of different analysis strategies, including the traditional analysis scheme and a mixed linear model, on the inflation factor (λ). Results: Genetic structure differed significantly among areas of China. The distribution of the principal components of population genetic structure was broadly consistent with the geographical distribution of the project areas: the first principal component corresponded to latitude and the second to longitude. The simulated data showed a high false positive rate (λ=1.16). Neither adjusting for the principal components of genetic structure nor performing area-specific subgroup analysis controlled λ effectively (λ>1.05), whereas a mixed linear model adjusting for the kinship matrix controlled λ effectively (λ=0.99), whether or not the principal components were additionally adjusted for. Conclusions: Genetic structure differs substantially among populations in different areas of China. In molecular epidemiology studies, bias caused by population genetic structure needs careful handling. For large cohort data with complex genetic structure and extensive kinship, a mixed linear model is necessary for association analysis.
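Illustration (not part of the original abstract): the abstract reports the inflation factor λ but does not state how it was computed. The sketch below assumes the common median-based definition, in which λ is the median of the observed 1-df chi-square association statistics divided by the median expected under the null (about 0.455). The function names and the simulated p-values are hypothetical and serve only to show how values such as λ≈1 or λ≈1.16 arise.

```python
import random
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of the chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_pvalue(p):
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; only z**2 is used
    return z * z


def inflation_factor(pvalues):
    """Median-based genomic inflation factor (assumed definition, see lead-in)."""
    observed = [chi2_from_pvalue(p) for p in pvalues]
    return median(observed) / CHI2_1DF_MEDIAN


def simulate_pvalues(n_tests, variance_scale, seed):
    """Hypothetical test statistics: z-scores whose variance is inflated
    by `variance_scale` (1.0 means a well-calibrated null)."""
    rng = random.Random(seed)
    sd = variance_scale ** 0.5
    pvalues = []
    for _ in range(n_tests):
        z = rng.gauss(0.0, sd)
        pvalues.append(2.0 * _STD_NORMAL.cdf(-abs(z)))
    return pvalues


calibrated = inflation_factor(simulate_pvalues(20000, 1.0, seed=0))
inflated = inflation_factor(simulate_pvalues(20000, 1.16, seed=0))
print(f"lambda, calibrated statistics: {calibrated:.2f}")
print(f"lambda, inflated statistics:   {inflated:.2f}")
```

With well-calibrated statistics λ should land close to 1, and scaling the variance of the statistics by 1.16 should give λ close to 1.16, which is the level of inflation the abstract reports for the unadjusted simulation.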
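Illustration (not part of the original abstract): the principal components step can be sketched on toy data. This is a minimal, pure-Python example and not the authors' pipeline. It assumes two hypothetical groups with made-up allele frequencies, standardizes each SNP, builds a sample-by-sample relationship matrix, and extracts the top two components by power iteration. All names and parameters are illustrative.

```python
import math
import random


def simulate_genotypes(n_per_group, n_snps, freqs, seed=1):
    """Toy genotypes (0/1/2 allele counts) for groups with given allele frequencies."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, f in enumerate(freqs):
        for _ in range(n_per_group):
            rows.append([(rng.random() < f) + (rng.random() < f) for _ in range(n_snps)])
            labels.append(label)
    return rows, labels


def standardize_columns(genotypes):
    """Center and scale each SNP column; SNPs with no variation are dropped."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in col) / n)
        if sd > 0:
            columns.append([(g - mean) / sd for g in col])
    return columns


def relationship_matrix(columns):
    """Sample-by-sample matrix K = Z Z^T / m from standardized SNP columns."""
    n, m = len(columns[0]), len(columns)
    K = [[0.0] * n for _ in range(n)]
    for col in columns:
        for i in range(n):
            for k in range(i, n):
                K[i][k] += col[i] * col[k] / m
    for i in range(n):
        for k in range(i):
            K[i][k] = K[k][i]  # fill the lower triangle by symmetry
    return K


def top_components(K, n_components=2, n_iter=200, seed=0):
    """Leading eigenvectors of K by power iteration with orthogonalization."""
    rng = random.Random(seed)
    n = len(K)
    found = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(n_iter):
            for u in found:  # remove directions that were already extracted
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            v = [sum(K[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        found.append(v)
    return found


genotypes, labels = simulate_genotypes(20, 100, freqs=(0.2, 0.8))
pcs = top_components(relationship_matrix(standardize_columns(genotypes)))
for label in (0, 1):
    scores = [s for s, l in zip(pcs[0], labels) if l == label]
    print(f"group {label}: mean PC1 score = {sum(scores) / len(scores):+.3f}")
```

In this toy setting the first component separates the two groups, which is the same kind of pattern the abstract describes when the leading components track geography. Real analyses would use dedicated software on many more markers and samples.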