medRxiv. 2020 Sep 25:2020.09.24.20201228. doi: 10.1101/2020.09.24.20201228. Preprint.
Background
The COVID-19 epidemic of 2019-20 is caused by the novel coronavirus SARS-CoV-2. Since the first cases were described in December 2019, the virus has infected over 10 million individuals and caused at least 500,000 deaths worldwide. The virus is undergoing rapid mutation, and two major clades of sequence variants have emerged. This study sought to determine whether SARS-CoV-2 sequence variants are associated with differing outcomes among COVID-19 patients in a single medical system.

Methods
Whole-genome SARS-CoV-2 RNA sequences were obtained from isolates collected from patients registered in the University of Washington Medicine health system between March 1 and April 15, 2020. Demographic and baseline medical data, along with outcomes of hospitalization and death, were collected. Statistical and machine learning models were applied to determine whether viral genetic variants were associated with hospitalization or death.

Findings
Full-length SARS-CoV-2 sequences were obtained from 190 subjects with clinical outcome data. Of these, 35 (18.4%) were hospitalized and 14 (7.4%) died from complications of infection. A total of 289 single nucleotide variants were identified. Clustering methods demonstrated two major viral clades, which could be readily distinguished by 12 polymorphisms in 5 genes. A trend toward higher rates of hospitalization among patients infected with Clade 2 was observed (p=0.06). Machine learning models using patient demographics and comorbidities achieved an area under the curve (AUC) of 0.93 for predicting hospitalization. Adding viral clade or sequence information did not significantly improve the outcome prediction models.

Conclusion
SARS-CoV-2 shows substantial sequence diversity in a community-based sample, with two dominant clades of virus in circulation. Among patients sufficiently ill to warrant testing for the virus, no significant difference in hospitalization or death could be discerned between clades in this sample. Major risk factors for hospitalization and death, for either major clade, include patient age and comorbid conditions.
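The following sketch illustrates what an AUC value such as 0.93 measures. For a binary outcome like hospitalization, the AUC equals the probability that a randomly chosen hospitalized patient receives a higher predicted risk score than a randomly chosen non-hospitalized patient, with ties counting as one half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations. They are not the study's models or data, and the study's own AUC computation is not described here.

```python
def roc_auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) formulation: the fraction of
    (positive, negative) pairs in which the positive case scores higher,
    counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = hospitalized, 0 = not.
labels = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.2, 0.4, 0.1, 0.35, 0.6, 0.3]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfectly separating the two outcome groups.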