Assessing the importance of primary care diagnoses in the UK Biobank.
Clifton L., Liu X., Collister JA., Littlejohns TJ., Allen N., Hunter DJ.
The UK Biobank (UKB) has made general practitioner (GP) data (censoring date 2016-2017) available for approximately 45% of the cohort, whilst hospital inpatient and death registry (referred to as "HES/Death") data are available cohort-wide through 2018-2022, depending on whether the data come from England, Wales, or Scotland. We assessed the importance of case ascertainment via different data sources in UKB for three diseases that are usually first diagnosed in primary care: Parkinson's disease (PD), type 2 diabetes (T2D), and all-cause dementia. Including GP data at least doubled the number of incident cases in the subset of the cohort with primary care data (e.g. from 619 to 1390 for dementia). Among the 786 dementia cases that were captured only in the GP data before the GP censoring date, only 421 (54%) were subsequently recorded in HES. Therefore, estimates of the absolute incidence or risk-stratified incidence are misleadingly low when based only on the HES/Death data. For incident cases present in both HES/Death and GP data during the full follow-up period (i.e. until the HES censoring date), the median time difference between an incident diagnosis of dementia being recorded in the GP data and in the HES/Death data was 2.25 years (i.e. recorded 2.25 years earlier in the GP records). Similar lag periods were observed for PD (median 2.31 years earlier) and T2D (median 2.82 years earlier). For participants with an incident GP diagnosis, only 65.6% of dementia cases, 69.0% of PD cases, and 58.5% of T2D cases had their diagnosis recorded in HES/Death within 7 years of the GP diagnosis. The effect estimates (hazard ratios, HRs) of established risk factors for the three health outcomes mostly remained in the same direction, with a similar strength of association, whether cases were ascertained using HES only or with GP data added. The confidence intervals of the HRs became narrower when GP data were added, owing to the increased statistical power from the additional cases.
In conclusion, it is desirable to extend both the coverage and follow-up period of GP data to allow researchers to maximise case ascertainment of chronic health conditions in the UK.
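The lag and coverage figures reported above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the record layout (a GP diagnosis date paired with a HES/Death date or None), and the toy dates are all hypothetical, and the simple proportion ignores censoring, which a full analysis would need to handle.

```python
from datetime import date
from statistics import median

DAYS_PER_YEAR = 365.25  # assumption: average year length used to convert days to years


def lag_years(gp_date, hes_date):
    """Years from the GP record to the HES/Death record (positive = GP earlier)."""
    return (hes_date - gp_date).days / DAYS_PER_YEAR


def summarise(cases, window_years=7.0):
    """Summarise hypothetical (gp_date, hes_date_or_None) pairs.

    Returns the median lag among cases present in both sources and the
    share of all GP-diagnosed cases with a HES/Death record within the window.
    Censoring is ignored here for simplicity.
    """
    lags = [lag_years(gp, hes) for gp, hes in cases if hes is not None]
    within = sum(1 for lag in lags if lag <= window_years)
    return {
        "median_lag_years": median(lags) if lags else None,
        "share_within_window": within / len(cases) if cases else None,
    }


# Toy records, invented for illustration only (not UK Biobank data).
toy_cases = [
    (date(2010, 1, 1), date(2012, 1, 1)),  # HES/Death about 2 years after GP
    (date(2011, 6, 1), date(2011, 6, 1)),  # recorded on the same day
    (date(2009, 3, 1), date(2017, 3, 1)),  # recorded 8 years later, outside the window
    (date(2012, 9, 1), None),              # never recorded in HES/Death
]

summary = summarise(toy_cases)
print(summary)
```

On these toy records, two of the four GP-diagnosed cases have a HES/Death record within 7 years, and the median lag among the three cases present in both sources is just under 2 years.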