Genetic Validation of Psoriasis Phenotyping in UK Biobank Supports the Utility of Self-Reported Data and Composite Definitions for Large Genetic and Epidemiological Studies

The Journal of Investigative Dermatology (2023)

Abstract
In dermatology and elsewhere, GWAS meta-analyses now routinely include data from large-scale population-based biobanks (Zhou et al., 2022). Many examples (Boutin et al., 2020; Han et al., 2020; Mitchell et al., 2022; Tachmazidou et al., 2019) have used data from UK Biobank, a study of >500,000 participants aged 40–69 years at recruitment with self-reported and electronic health record–derived clinical diagnoses (Bycroft et al., 2018).
However, correct interpretation of genetic or epidemiological associations identified in biobank data should acknowledge that cases selected via study-specific self-report and electronic health record procedures may be subject to misclassification, or may on average have a different disease phenotype from cases ascertained in a specialist clinical setting, which are those typically used in molecular studies of disease processes (Cai et al., 2020). We focus on chronic plaque psoriasis, reporting a framework that uses genetic effect size estimates to evaluate the consistency between candidate biobank phenotypes and psoriasis diagnosed by a specialist physician. Specifically, we assess the degree to which candidate biobank definitions capture nonpsoriasis cases, or (presumably milder) psoriasis cases with lower genetic liability than typical specialist-ascertained cases. We do this by regressing estimated genetic effect sizes at established psoriasis susceptibility loci against reference values obtained from a previous GWAS of psoriasis case cohorts in which recruitment was based on in-person specialist diagnosis (Tsoi et al., 2017) (Figure 1). Our inverse variance–weighted regression slope estimates a lower bound for the positive predictive value (minPPV) for true psoriasis cases among participants selected by the candidate biobank definition (full details are provided in Supplementary Materials and Methods).
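The slope estimate described above can be illustrated with a minimal sketch of an inverse variance-weighted regression through the origin. The function name `ivw_slope`, the toy effect sizes, and the choice of first-order weights (1 / SE² of the candidate estimates) are assumptions made for illustration; the analysis actually performed is the one described in the Supplementary Materials and Methods.

```python
import math


def ivw_slope(beta_ref, beta_cand, se_cand):
    """Inverse variance-weighted regression through the origin.

    Regresses candidate-phenotype effect sizes on reference effect sizes,
    weighting each locus by 1 / se_cand**2 (first-order weights, an
    assumption of this sketch). Returns the slope and its fixed-effect
    standard error.
    """
    weights = [1.0 / se ** 2 for se in se_cand]
    num = sum(w * x * y for w, x, y in zip(weights, beta_ref, beta_cand))
    den = sum(w * x * x for w, x in zip(weights, beta_ref))
    return num / den, math.sqrt(1.0 / den)


# Toy log odds ratios (invented values): candidate effects are 70% of reference.
beta_ref = [0.10, 0.20, 0.30]
beta_cand = [0.07, 0.14, 0.21]
se_cand = [0.02, 0.02, 0.03]

slope, se_slope = ivw_slope(beta_ref, beta_cand, se_cand)
ci_low = slope - 1.96 * se_slope
ci_high = slope + 1.96 * se_slope
print(f"slope = {slope:.3f}, 95% CI: {ci_low:.3f} to {ci_high:.3f}")
```

A slope near 1 would indicate effect sizes consistent with the reference; a slope well below 1 indicates depressed effect sizes in the candidate definition.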
We validated our approach on dermatologist-derived case-control psoriasis GWASs and on simulated case-control cohorts with a known misclassification rate (Supplementary Materials and Methods, Supplementary Figure S1, and Supplementary Table S4). We then applied our method to UK Biobank (unrelated White British participants after quality control; N = 336,733), in which psoriasis cases can be defined using a single data source (self-reporting, linked general practitioner [GP] diagnoses, or Hospital Episode Statistics [HES]; Table 1, Supplementary Table S1) or combinations thereof. Among single-source candidate psoriasis definitions, self-reported psoriasis (N_SRP = 4,244) was most concordant with specialist-diagnosed psoriasis (minPPV_SRP = 66.9%, 95% confidence interval [CI]: 61.2–72.6%), and more so when combined with a self-reported psoriasis-relevant medication (N = 1,927; minPPV = 73.9%, 95% CI: 65.2–82.6%). Psoriasis definitions from HES identified fewer psoriasis cases (N_HESany = 1,726) and were less concordant (minPPV_HESany = 57.9%, 95% CI: 48.9–66.8%).
GP-based psoriasis definitions were least concordant with specialist diagnosis (N_GP = 5,768; minPPV_GP = 46.4%, 95% CI: 40.5–52.3%), albeit improving when multiple GP diagnoses were required (N_GP2 = 2,422; minPPV_GP2 = 58.6%, 95% CI: 50.3–66.9%).
Table 1. List of Selected Candidate UK Biobank Psoriasis Phenotypes, with Abbreviations, Case Numbers before and after Genotyping QC, IVW Estimate, and Power to Detect a Common (MAF = 30%) Risk Factor of Weak Effect (OR = 1.1)
Abbreviation | Phenotype Description | Number of Psoriasis Cases (All) | Number of Psoriasis Cases (Genotyped, White British Unrelated) | IVW Regression Slope (∼minPPV), Mean (95% CI) (vs Selected Controls, n = 141,279) | Power to Detect Common Weak Effect (vs Selected Controls)
Single data source
SRP | Self-reported psoriasis | 6,110 | 4,244 | 0.669 (0.612–0.726) | 0.478
SRPM | Self-reported psoriasis and medication relevant to psoriasis | 2,750 | 1,927 | 0.739 (0.652–0.826) | 0.296
HESmain | Psoriasis as main diagnosis in linked HES | 449 | 289 | 0.605 (0.422–0.788) | 0.077
HESsec | Psoriasis as secondary diagnosis in linked HES | 2,300 | 1,532 | 0.587 (0.491–0.683) | 0.175
HESany | Psoriasis as main or secondary diagnosis in linked HES | 2,593 | 1,726 | 0.579 (0.489–0.668) | 0.178
GPraw | Psoriasis diagnosis in linked GP data, using read codes corresponding to ICD-10 psoriasis codes in UK Biobank mapping file | 11,560 | 7,956 | 0.324 (0.279–0.370) | 0.243
GP | Psoriasis diagnosis in linked GP data, using curated list of read codes | 8,444 | 5,768 | 0.464 (0.405–0.523) | 0.340
GP2 | Two or more psoriasis diagnoses in GP data using curated read codes | 3,472 | 2,422 | 0.586 (0.503–0.669) | 0.242
GP3 | Three or more psoriasis diagnoses in GP data using curated read codes | 1,984 | 1,389 | 0.614 (0.515–0.714) | 0.172
Combined data sources
1-SRP-HESany | Any one of SRP or HESany | 7,568 | 5,194 | 0.624 (0.570–0.677) | 0.499
1-SRP-GP | Any one of SRP or GP | 12,616 | 8,647 | 0.517 (0.471–0.563) | 0.543
1-SRP-GP2 | Any one of SRP or GP2 | 8,320 | 5,786 | 0.615 (0.559–0.670) | 0.538
1-SRP-HESany-GP | Any one of SRP, HESany or GP | 13,666 | 9,316 | 0.508 (0.463–0.553) | 0.561
1-SRP-HESany-GP2 | Any one of SRP, HESany or GP2 | 9,546 | 6,574 | 0.585 (0.535–0.636) | 0.542
2-SRP-HESany-GP | Any two of SRP, HESany or GP | 3,056 | 2,122 | 0.721 (0.638–0.805) | 0.303
All-SRP-HESany-GP | All three of SRP, HESany or GP | 425 | 300 | 0.818 (0.616–1.020) | 0.105
2-SRP-SRM-HESany-GP | Any two of SRP, SRM, HESany or GP | 5,080 | 3,499 | 0.696 (0.628–0.763) | 0.443
2-SRP-SRM-HESany-GP2 | Any two of SRP, SRM, HESany or GP2 | 4,291 | 2,965 | 0.726 (0.660–0.792) | 0.416
3-SRP-SRM-HESany-GP | Any three of SRP, SRM, HESany or GP | 1,599 | 1,122 | 0.771 (0.675–0.866) | 0.216
All-SRP-SRM-HESany-GP | All four of SRP, SRM, HESany or GP | 262 | 185 | 0.870 (0.637–1.104) | 0.084
Phenotypes incorporating PsA codes
SRP+PsA | Self-reported psoriasis or psoriatic arthritis | 6,636 | 4,603 | 0.664 (0.606–0.721) | 0.503
SRPM+PsA | Self-reported psoriasis or PsA, and psoriasis-relevant medication | 3,013 | 2,107 | 0.747 (0.661–0.832) | 0.330
HESany+PsA | Psoriasis or PsA as main or secondary diagnosis in linked HES | 3,388 | 2,272 | 0.616 (0.530–0.703) | 0.252
GP+PsA | Psoriasis or PsA diagnosis in linked GP data using curated read codes | 8,808 | 6,024 | 0.457 (0.398–0.517) | 0.348
1-SRP-HESany-GP+PsA | Any one of SRP+PsA, HESany+PsA or GP+PsA | 14,475 | 9,864 | 0.510 (0.464–0.555) | 0.582
2-SRP-HESany-GP+PsA | Any two of SRP+PsA, HESany+PsA or GP+PsA | 3,719 | 2,579 | 0.713 (0.629–0.797) | 0.348
2-SRP-SRM-HESany-GP+PsA | Any two of SRP+PsA, SRM+PsA, HESany+PsA or GP+PsA | 5,692 | 3,917 | 0.688 (0.620–0.756) | 0.468
Abbreviations: CI, confidence interval; GP, general practitioner; HES, Hospital Episode Statistics; IVW, inverse variance-weighted; MAF, minor allele frequency; minPPV, lower bound of positive predictive value for psoriasis phenotype (i.e., IVW regression slope); OR, odds ratio; PsA, psoriatic arthritis; QC, quality control; SRP, self-reported psoriasis; SRPM, self-reported psoriasis-relevant medication.
We recognize that the large sample sizes afforded by biobank studies may offset limitations in phenotype stringency when considering statistical power to detect novel genetic and epidemiological associations (Supplementary Figure S2). We therefore estimated the power to detect an association with a novel psoriasis risk factor (population frequency 0.3, odds ratio 1.1; Table 1); details and results for other scenarios are presented in Supplementary Materials and Methods and Supplementary Figure S3. Among single-source candidate definitions in UK Biobank, self-reported psoriasis demonstrated the highest power for discovery (power_SRP = 47.8%), substantially higher than the larger but less concordant GP-based definition (power_GP = 34.0%).
We then considered composite psoriasis definitions based on multiple data sources. Requiring a single coding from any source conferred limited agreement with specialist-defined psoriasis (minPPV_1-SRP-HESany-GP = 50.8%, 95% CI: 46.3–55.3%) but yielded case numbers large enough that statistical power for discovery exceeded that of all other definitions (N_1-SRP-HESany-GP = 9,316; power_1-SRP-HESany-GP = 56.1%). Requiring two independent corroborative codings improved concordance with specialist-defined psoriasis to ∼70% (minPPV_2-SRP-HESany-GP = 72.2%, 95% CI: 63.8–80.5%; minPPV_2-SRP-SRM-HESany-GP = 69.6%, 95% CI: 62.8–76.3%), although power (power_2-SRP-HESany-GP = 30.3%; power_2-SRP-SRM-HESany-GP = 44.3%) remained lower than that of the top-performing single-source definition (power_SRP = 47.8%).
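The trade-off between case numbers and phenotype stringency discussed above can be explored with a rough power sketch. It is not the calculation used in this study: the function `allelic_power`, the normal approximation to an allelic Wald test, the default significance threshold, and the idea of scaling the log odds ratio by an `attenuation` factor (for example an IVW slope) are all assumptions made for illustration, so its output should not be expected to reproduce the power values in Table 1.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def allelic_power(n_cases, n_controls, freq, odds_ratio,
                  alpha=0.05, attenuation=1.0):
    """Approximate power of a two-sided Wald test on a per-allele log odds ratio.

    freq is treated as the risk-allele frequency in controls. attenuation
    scales the true log odds ratio to mimic effect size depression under a
    less stringent case definition (an assumption of this sketch).
    """
    log_or = attenuation * log(odds_ratio)
    # Allele frequency in cases implied by the (attenuated) odds ratio.
    odds_cases = exp(log_or) * freq / (1.0 - freq)
    freq_cases = odds_cases / (1.0 + odds_cases)
    # Variance of the log odds ratio from a 2 x 2 table of allele counts.
    var = (1.0 / (2 * n_cases * freq_cases * (1.0 - freq_cases))
           + 1.0 / (2 * n_controls * freq * (1.0 - freq)))
    z = log_or / sqrt(var)
    norm = NormalDist()
    crit = norm.inv_cdf(1.0 - alpha / 2.0)
    return norm.cdf(z - crit) + norm.cdf(-z - crit)


# Case numbers and slopes taken from Table 1; the threshold is an assumption.
for label, n_cases, slope in [("SRP", 4244, 0.669), ("GP", 5768, 0.464)]:
    power = allelic_power(n_cases, 141279, 0.3, 1.1, attenuation=slope)
    print(f"{label}: approximate power = {power:.3f}")
```

Under these assumptions, a larger but more attenuated case set can have lower power than a smaller, more concordant one, which is the qualitative pattern reported for the GP-based and self-reported definitions.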
UK Biobank participants with psoriasis codings across all data sources demonstrated high concordance (minPPV_All-SRP-HESany-GP = 81.8%, 95% CI: 61.6–102.0%; minPPV_All-SRP-SRM-HESany-GP = 87.0%, 95% CI: 63.7–110.4%) with CIs crossing 100%. This is consistent with our positive control GWASs, which had slope estimates between 0.9 and 1.1 with CIs crossing 1 (Supplementary Table S4), the smallest cohort (n = 464 cases) being the only exception.
Our estimated minPPV of self-reported psoriasis in UK Biobank (67%) is much higher than that previously reported for self-reported psoriasis in 23andMe (36%) (Tsoi et al., 2017). This may be due to ascertainment differences: rather than completing an online questionnaire, UK Biobank participants are interviewed by a trained research nurse and are required to have seen a doctor for each reported condition (UK Biobank, 2012; UK Biobank resource 100235: the verbal interview within ACE centres, https://biobank.ctsu.ox.ac.uk/showcase/ukb/docs/TouchscreenQuestionsMainFinal.pdf). Primary care and hospital data may have lower estimated minPPVs than self-reporting owing to misclassification, because psoriasis is difficult for nonspecialists to distinguish from other common lesional skin diseases. Alternatively, patients diagnosed through primary care or hospital episodes (in which most recorded diagnoses are secondary) may have milder psoriasis on average, with consequently reduced genetic liability, than those included in dermatologist-diagnosed psoriasis GWASs; previous work showed that 90% of psoriasis primary care diagnoses were subsequently confirmed by GP reviewers (Seminara et al., 2011).
The relatively low regression slope estimates for UK Biobank psoriasis indicators may represent not only case misclassification but also a lower genetic liability for psoriasis among patients with mild disease than among those with severe disease. We recognize that without a formal validation exercise, methods such as those presented here are unable to distinguish between these scenarios. However, given that most molecular research into psoriasis biology is conducted in patients with moderate-severe psoriasis, our inverse variance–weighted slope estimates remain valuable as a measure of aggregate genetic risk among cases, equivalent to a positive predictive value for dermatologist-ascertained psoriasis.
The optimal psoriasis definition for future genetic and epidemiological investigations will depend on the specific research aims. In UK Biobank, we recommend that discovery research, in which statistical power is a priority, defines cases using any self-reported or electronic health record psoriasis coding (our maximum statistical power of 58% should be interpreted in the context of contributing to larger meta-analyses); studies requiring accurate effect size estimates and high concordance with dermatologist-diagnosed psoriasis are encouraged to use two or more data sources. We also recommend the inclusion of psoriatic arthritis diagnostic codes, which increases sample size with minimal drop-off in concordance (Supplementary Table S2). It remains unclear whether concordance is unaffected because the psoriatic arthritis–only participants have cutaneous psoriasis that is not coded in UK Biobank or because psoriatic arthritis shares genetic risk loci with cutaneous psoriasis.
In UK Biobank, a definition requiring only self-reporting of psoriasis balances both high diagnostic validity and statistical power; generalization of this finding to other datasets may depend on the ascertainment method. To facilitate such assessments, we have demonstrated here an approach to assess the composition of psoriasis diagnoses when assembling future cohorts from large electronic health record/questionnaire-based biobank studies.
The UK Biobank resource is available to bona fide researchers for health-related research in the public interest (https://www.ukbiobank.ac.uk/enable-your-research). Biomarkers of Systemic Treatment Outcomes in Psoriasis (BSTOP) data are available for approved research use by making an application to the BSTOP Data Access Committee (https://www.kcl.ac.uk/lsm/research/divisions/gmm/departments/dermatology/research/stru/groups/bstop/documents).
Jake R. Saklatvala: http://orcid.org/0000-0003-0836-4928
Ken B. Hanscombe: http://orcid.org/0000-0002-3715-6805
Satveer K. Mahil: http://orcid.org/0000-0003-4692-3794
Lam C. Tsoi: http://orcid.org/0000-0003-1627-5722
James T. Elder: http://orcid.org/0000-0003-4215-3294
Jonathan N. Barker: http://orcid.org/0000-0002-9030-183X
Michael A. Simpson: http://orcid.org/0000-0002-8539-8753
Catherine H. Smith: http://orcid.org/0000-0001-9918-1144
Nick Dand: http://orcid.org/0000-0002-1805-6278
SKM reports departmental income from AbbVie, Almirall, Eli Lilly, Novartis, Sanofi, and UCB, outside the submitted work. CHS is principal investigator on MRC (PSORT) and EC-funded consortia with multiple industry partners (see PSORT.org.uk, BIOMAP-IMI.eu, and HIPPOCRATES-IMI.eu for up-to-date listings of contributory partners), and is a co-supervisor of PhD studentships through MRC/industry collaboration (Boehringer Ingelheim GmbH). The remaining authors state no conflict of interest.
This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement number 821511 (Biomarkers in Atopic Dermatitis and Psoriasis). The JU receives support from the European Union’s Horizon 2020 research and innovation program and the European Federation of Pharmaceutical Industries and Associations. This publication reflects only the author's view and the JU is not responsible for any use that may be made of the information it contains. This research has been conducted using the UK Biobank resource (approved project 15147) and uses data provided by patients and collected by the NHS as part of their care and support. ND received funding from Health Data Research UK (MR/S003126/1), which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation, and the Wellcome Trust. SKM is funded by an MRC Clinical Academic Research Partnership Award (MR/T02383X/1). The authors would like to thank the Psoriasis Association for ongoing support and funding since the inception of BSTOP (reference: RG2/10: RG2/10). The authors also acknowledge the invaluable support of the National Institute for Health and Care Research through the clinical research networks and its contribution in facilitating recruitment to BSTOP. 
Members of the BSTOP Study Group who contributed to the collection of valuable clinical information and samples for profiling (excluding individually named authors of this work) are Nadia Aldoori, Mahmud Ali, Alex Anstey, Fiona Antony, Charles Archer, Suzanna August, Periasamy Balasubramaniam, Kay Baxter, Anthony Bewley, Alexandra Bonsall, Victoria Brown, Katya Burova, Aamir Butt, Mel Caswell, Sandeep Cliff, Mihaela Costache, Sharmela Darne, Emily Davies, Claudia DeGiovanni, Trupti Desai, Bernadette DeSilva, Victoria Diba, Eva Domanne, Harvey Dymond, Caoimhe Fahy, Leila Ferguson, Maria-Angeliki Gkini, Alison Godwin, Fiona Hammonds, Sarah Johnson, Teresa Joseph, Manju Kalavala, Mohsen Khorshid, Liberta Labinoti, Nicole Lawson, Alison Layton, Tara Lees, Nick Levell, Helen Lewis, Calum Lyon, Sandy McBride, Sally McCormack, Kevin McKenna, Serap Mellor, Ruth Murphy, Paul Norris, Caroline Owen, Urvi Popli, Gay Perera, Nabil Ponnambath, Helen Ramsay, Aruni Ranasinghe, Saskia Reeken, Rebecca Rose, Rada Rotarescu, Ingrid Salvary, Kathy Sands, Tapati Sinha, Simina Stefanescu, Kavitha Sundararaj, Kathy Taghipour, Michelle Taylor, Michelle Thomson, Joanne Topliffe, Roberto Verdolini, Rachel Wachsmuth, Martin Wade, Shyamal Wahie, Sarah Walsh, Shernaz Walton, Louise Wilcox, and Andrew Wright.
Conceptualization: ND; Data Curation: SKM, CHS; Formal Analysis: JRS, KBH; Funding Acquisition: JNB, MAS, CHS, ND; Investigation: JRS, KBH; Methodology: JRS, ND; Project Administration: CHS, ND; Resources: LCT, JTE; Supervision: JNB, MAS, CHS, ND; Writing - Original Draft Preparation: JRS, ND; Writing - Review and Editing: JRS, KBH, SKM, LCT, JTE, JNB, MAS, CHS, ND
This project used UK Biobank data under approved project number 15147. UK Biobank is a prospective study with >500,000 participants aged 40–69 years when recruited in 2006–2010 (Bycroft et al., 2018). The study has collected and continues to collect extensive phenotypic and genotypic detail about its participants, including data from questionnaires, physical measures, sample assays, accelerometry, multimodal imaging, genome-wide genotyping, and longitudinal follow-up for a wide range of health-related outcomes. Linkage to health record data comprises Hospital Episode Statistics data, which document hospital inpatient visits, and primary care data (currently available for 230,105 participants). The UK Biobank study was approved by the National Health Service National Research Ethics Service (approval nos. 11/NW/0382, 16/NW/0274), and all participants provided written informed consent.
We specified nine candidate psoriasis definitions based on a single-source data type (Table 1): two based on self-reported illnesses and medications, three based on linked Hospital Episode Statistics data, and four based on linked primary care data. Full details of the codes included are given in Supplementary Table S1. Self-reported medications include prescription medications being taken regularly by participants at the time of their assessment center visit (UK Biobank field 20003). The full list of medications reported by participants self-reporting psoriasis was reviewed by dermatologists (SKM, CHS) to identify relevant psoriasis medications. Linked primary care data include two types of read code: readV2 and readCTV3 (NHS Digital, 2020; https://digital.nhs.uk/services/terminology-and-classifications/read-codes). We included codes of both types. We further distinguished candidate primary care phenotypes based on read codes that corresponded to International Classification of Diseases, 10th Revision, L40 codes in a UK Biobank mapping file (Table 1: GPraw) (UK Biobank, 2021; UK Biobank resource 592, https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=592) from those based on a previously validated list of psoriasis read codes (Table 1: GP) (Seminara et al., 2011). Validated read code lists in readV2 format were mapped to readCTV3 using the UK Biobank mapping file (UK Biobank, 2021).
We further considered candidate psoriasis definitions based on combining data sources. These ranged from broader definitions in which a single psoriasis coding across data sources would be sufficient, to stricter definitions requiring psoriasis codings from multiple data sources (Table 1). Because psoriatic arthritis typically presents with skin lesions, we considered additional candidate psoriasis definitions based on expanded lists of self-report, primary care (Ogdie et al., 2013), and Hospital Episode Statistics codes that included psoriatic arthritis (Table 1; Supplementary Table S1).
The UK Biobank central team performed genotype calling and imputation. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array (n ∼50,000) and the Affymetrix UK Biobank Axiom array (n ∼450,000) (Bycroft et al., 2018).
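The composite case definitions described above (a coding in any one source, in any two sources, or in all sources) amount to counting how many data sources flag each participant. The following is a minimal sketch of that logic; the function `composite_cases`, the dictionary layout, and the participant identifiers are hypothetical and are not taken from the study's code.

```python
from collections import Counter


def composite_cases(sources, min_sources):
    """Return participants flagged as cases by at least min_sources sources.

    sources maps a source label (e.g. "SRP", "HESany", "GP") to the set of
    participant identifiers with a psoriasis coding in that source.
    """
    counts = Counter()
    for ids in sources.values():
        counts.update(set(ids))  # each source counts once per participant
    return {pid for pid, n in counts.items() if n >= min_sources}


# Invented identifiers for illustration only.
sources = {
    "SRP": {1, 2, 3},
    "HESany": {2, 3, 4},
    "GP": {3, 5},
}

any_one = composite_cases(sources, 1)               # analogous to 1-SRP-HESany-GP
any_two = composite_cases(sources, 2)               # analogous to 2-SRP-HESany-GP
all_three = composite_cases(sources, len(sources))  # analogous to All-SRP-HESany-GP
print(sorted(any_one), sorted(any_two), sorted(all_three))
```

Raising `min_sources` shrinks the case set, which mirrors the trade-off between case numbers and concordance seen in Table 1.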
Based on quality control metrics provided by UK Biobank, we removed samples that exhibited gender mismatch, excess relatedness, heterozygosity, or missingness > 5% and extracted individuals determined by UK Biobank to form an unrelated subset of homogeneous (White British) ancestry. We then removed additional individuals with low call rates (<98%) in well-called (>90%) markers, giving 336,814 samples for subsequent analysis (336,733 after withdrawals). Genome-wide imputation was performed by the UK Biobank central team using IMPUTE2 software and a reference panel derived from UK10K and 1000 Genomes phase 3 haplotypes (Howie et al., 2011; Howie et al., 2009). For subsequent analysis, we considered variants with imputation R² > 0.7 and minor allele frequency > 0.5%.
We performed association testing at 35 variants of interest (see the subsequent section) for each candidate psoriasis definition we generated. Each definition provided a set of participants to be considered as psoriasis cases. For unaffected controls, we included participants who had linked primary care data and were negative for psoriasis under all candidate psoriasis definitions (n = 141,279). We fitted a logistic regression for each variant using PLINK v2.0 (Chang et al., 2015), using 20 ancestry principal components and genotyping array as covariates.
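The variant filter and the selection of unaffected controls described above can be sketched as simple set and threshold operations. The function names, record fields (`id`, `r2`, `maf`), and toy values are hypothetical; the association testing itself was run in PLINK and is not reproduced here.

```python
def filter_variants(variants, min_r2=0.7, min_maf=0.005):
    """Keep variant IDs with imputation R2 > min_r2 and MAF > min_maf."""
    return [v["id"] for v in variants
            if v["r2"] > min_r2 and v["maf"] > min_maf]


def select_controls(with_gp_data, case_sets):
    """Participants with linked primary care data who are not a case
    under any candidate definition ("selected" controls)."""
    any_case = set().union(*case_sets)
    return set(with_gp_data) - any_case


# Invented records for illustration only.
variants = [
    {"id": "var_a", "r2": 0.95, "maf": 0.30},
    {"id": "var_b", "r2": 0.65, "maf": 0.20},   # fails the R2 threshold
    {"id": "var_c", "r2": 0.90, "maf": 0.004},  # fails the MAF threshold
]
with_gp_data = {1, 2, 3, 4, 5, 6}
case_sets = [{1, 2}, {2, 7}, {5}]  # e.g. cases under SRP, HESany, GP

kept = filter_variants(variants)
controls = select_controls(with_gp_data, case_sets)
print(kept, sorted(controls))
```

Participant 7 in the toy data is a case without linked primary care data, so it appears in neither the control set nor the pool from which controls are drawn.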
To derive a reference genetic instrument representative of dermatologist-diagnosed psoriasis, summary statistics from seven dermatologist-derived case-control GWASs (totaling 13,229 cases and 21,543 controls) (Tsoi et al., 2017) were analyzed using an inverse variance–weighted (IVW) fixed effect meta-analysis. We identified 38 independent genome-wide significant (P < 5 × 10^−8) associations at least 1 Mb apart. This excluded associations in the major histocompatibility complex region on chromosome 6: the strong association between HLA-C*06:02 and psoriasis age of onset means that estimated effect sizes at this locus are strongly influenced by ascertainment strategy, and comparison across studies is complex. Of the 38 lead variants, 35 were available in the UK Biobank imputed genetic dataset, whereas the remaining 3 were unavailable with no suitable proxy found using the LDLink platform (Machiela and Chanock, 2015).
For each candidate UK Biobank psoriasis definition, effect sizes (betas) at the 35 lead variants were regressed against effect sizes from the reference instrument (Supplementary Table S5), weighted by inverse variance to give higher weight to loci with more confident effect size estimates, using function mr_ivw from R package MendelianRandomization (version 0.5.0) (Yavorska and Burgess, 2017). The slope of the regression line indicates how depressed, on average, the effect sizes of the candidate psoriasis phenotype are in comparison with established psoriasis effect sizes (Figure 1); a slope of 1 would indicate effect sizes consistent with those already established for dermatologist-derived psoriasis (full results in Supplementary Table S2).
To understand how accuracy and statistical power would be affected by a less stringent definition of controls, we fitted alternative regression models in which all participants not positive for the candidate psoriasis definition were included as “unselected” controls (Supplementary Table S3). We observed slightly higher regression slopes when using selected controls than when using unselected controls (Supplementary Figure S4).
With the assumption that the unaffected control group of UK Biobank participants was representative of the control datasets in dermatologist-derived psoriasis GWAS, we considered that any effect size depression relative to the reference genetic instrument could be driven by the inclusion of misclassified individuals within the psoriasis cases (compared with the specialist-diagnosed psoriasis cohorts from the study by Tsoi et al.). This may be through incorrect self-report or misdiagnosis by nondermatologists in linked Hospital Episode Statistics or primary care data. At any single locus, the degree of effect size depression will depend on the positive predictive value (PPV; true positives / [true positives + false positives]) of the candidate UK Biobank definition but also on the allele frequency and the magnitude of the established effect. The relationship between the PPV and the IVW regression slope is therefore complex, and we undertook simulations to inform our interpretation of IVW regression slopes.
Using PLINK v1.9 (Chang et al., 2015), we simulated genetic datasets representing affected and unaffected individuals for 35 SNVs with effect size and frequency equivalent to those in the reference genetic instrument. In each simulation, we included 125,000 controls from simulated unaffected individuals, and 5,000 cases that were a mix of simulated affected (true posi
Keywords
CI,GP,HES,minPPV,SRP