Epidemiological features of MERS cases
From September 2012 to June 2020, a total of 2562 laboratory-confirmed MERS cases were reported. After excluding 112 cases with incomplete data, 2450 confirmed MERS cases, together with 150 derived case clusters, were included in subsequent analyses (Table 1). The median age was 53 years old (IQR: 38‒65), and 69.4% of cases were male. Healthcare workers accounted for 13.7% (335/2450) of the total patients. The median time from disease onset to diagnosis was 5 days (IQR: 3‒8). Death occurred in 802 patients, leading to a case fatality rate (CRF) of 32.7% (95% CI: 30.9‒34.6%). Among the 1453 patients with known exposure history, 356 (24.0%) reported animal contact. Zoonotic infections (i.e., countries in transmission categories i and ii) only occurred in the Middle East, although cases with animal contact had also been imported into Europe and Southeast Asia (Fig. 1). Among countries with locally infected patients, the proportion of cases with animal contact exceeded 50% in Qatar, followed by 15‒29% in Oman, United Arab Emirates (UAE), and Saudi Arabia (Fig. 1).
Saudi Arabia, Republic of Korea and UAE together reported the vast majority of the cases and case clusters (Additional file 1: Table S2 and Figure S1‒S2). Patients from the three countries shared similar age and sex distributions, but Saudi Arabia had the highest CFR (35.1%), probably due to a higher proportion of underlying conditions (Table 1). Republic of Korea had the highest proportion (> 99%) of hospital infections and the shortest time (median = 4 days) from disease onset to diagnosis, yet its CFR (18.8%) is comparable to that of UAE (16.1%). UAE had the highest proportion of asymptomatic infections and the longest survival time among fatal cases (Table 1).
Cases with animal contact were older, more likely to be male, and more likely to have underlying conditions and a longer delay from disease onset to diagnosis, in comparison to cases without (Additional file 1: Table S4). These characteristics of cases with animal contact could partially explain their significantly higher CFR (35.1% vs 24.3%, P < 0.001) than those without animal contact (Additional file 1: Table S4). However, among fatal cases, those with animal contact had a slightly longer survival time than those without (median = 11.5 vs 9 days, P < 0.001). Cases with only patient contact were more likely to be health-care workers and more likely to be asymptomatic (Additional file 1: Table S4). The proportion of cases with animal contact was increasing from 2012 to 2018 and seemed to stabilize thereafter. Epidemics peaked mostly between April and September (Additional file 1: Figure S3). In contrast, cases with animal contact were more likely to occur from January to March. Seasonality differed slightly between countries, e.g., peaks of zoonotic infections in UAE were also observed in May. The peak of the global epidemic in June of 2015 was due to the opportunistic outbreak in Republic of Korea, which does not necessarily reflect seasonality in the endemic setting.
Determinants for case fatality
All the eight individual-level factors (age group, sex, living in the Middle East, healthcare worker, underlying conditions, animal contact history before onset, delay from disease onset to diagnosis and onset year) were significantly associated with mortality in the univariate analysis, with age and underlying condition as the leading factors with OR > 5 (Table 2). We identified a multivariate model containing all eight main effects, as well as two-way interactions among age group, sex and animal contact. After controlling for other factors, underlying condition was associated with 3.5-fold increase in the risk (OR = 3.5, 95% CI: 2.5–4.89), whereas being a healthcare worker was protective (OR = 0.33, 95% CI: 0.21–0.54). OTC ≥ 5 days increased the risk by 31% (95% CI: 5–63%). Compared to 2012–2014, the risk of death was higher during 2015–2016 (OR = 1.63, 95% CI: 1.27–2.09) but lower during 2017–2019 (OR = 0.70, 95% CI: 0.53–0.92).
Odds ratios with regard to each of age group, sex and animal contact while controlling for the other two are shown in Additional file 1: Table S5–S7. The older age group (≥ 65 years) was associated with a higher risk of death in general, with significant ORs ranging 1.64–10.52 for all combinations of sex and animal contact history (Additional file 1: Table S5). However, the age effect was more prominent among patients without animal contact. Among patients without animal contact, the adjusted OR were 5.65 (95% CI: 3.75–8.50) for males and 10.52 (95% CI: 6.46–17.11) for females, compared to 1.64 (95% CI: 1.03–2.62) for males and 3.06 (95% CI: 1.59–5.86) for females among those with animal contact. From these estimates, it is also clear that the age effect was more prominent among female patients compared to male patients. Sex did not affect the risk of death significantly, although males tended to have higher risk among the younger age group without animal contact (adjusted OR = 2.34, 95% CI: 1.57–3.49, Additional file 1: Table S6). A history of animal contact was associated with a higher risk (adjusted OR = 2.97, 95% CI: 1.10–7.98) among female cases < 65 years but with a lower risk (adjusted OR = 0.31, 95% CI: 0.18–0.51) among male cases ≥ 65 years old (Additional file 1: Table S7). The model-estimated effects of age group, sex, and underlying conditions on CFR are in line with the observed CFRs stratified by each of these variables (Additional file 1: Figure S4). The CFR decreased gradually since 2015.
After the screening of correlations, 20 out of the 34 socioenvironmental variables (Additional file 1: Table S3) remained for further analysis: population density, elevation, camel density, 8 variables regarding land covers (cropland, forest, shrubland, grassland, wetland, bareland, waterbody, urban), 6 ecoclimatic variables (bio1, bio2, bio3, bio4, bio5, bio12), transportation (railway, main road) and number of hospitals.
The first case was reported in Bisha, central-west Saudi Arabia in September 2012. The disease spread more rapidly towards the east (UAE and Oman) than towards other directions (Fig. 2a). The diffusion appears to be acerating in recent years. At the second administrative level, 12 of the 14 eco-geographic variables were associated with the spread of MERS in the univariable Cox regressions (Table 3). In the multivariate analysis, positive associations with the disease diffusion were found for seven factors, and the top three drivers are intersection with main roads [adjusted hazard ratio (HR) = 15.45, 95% CI: 2.11–113.26, P = 0.007], intersection with railways (adjusted HR = 2.33, 95% CI:1.37–3.97, P = 0.002) and elevation (adjusted HR = 2.40, 95% CI: 1.47–3.91, P < 0.001). A higher coverage of cropland seemed to have impeded the disease diffusion (adjusted HR = 0.51, 95% CI: 0.27–0.95, P = 0.034). We overlaid the land coverage and transportation networks with the spatiotemporal distribution of the first reported cases in each space unit (Fig. 2b). In the Arabian Peninsula, intersection with major roads and railroads was clearly associated with earlier invasion.
Phylogeny and phylogeographic analysis of whole-genome sequences
In total, 499 MERS-CoV full-genome sequences were obtained from GenBank, including 251 sequences from human patients, 237 from camel, seven from bat, three from hedgehog, and one from Lama glama (llama). These sequences were collected between 2011 and 2019 from 15 countries, and 90.0% of them were from Middle East. IQ-Tree selected the GTR and FreeRate with ten categories as the best substitution model for these sequences.
An initial analysis showed that sequences from bat and hedgehog form a separate clade distant from the main clade of sequences from humans, camels and Lama glama (Additional file 1: Figure S5a), confirming that camel is the major zoonotic reservoir of MERS-CoV for spillover to human. We excluded the ten sequences from bat and hedgehog from subsequent analyses. The main clade of human, camel and llama strains, named clade C, contains five subclades numbered C1–C5, with C5 being the largest subclade with 398 sequences (Fig. 3, Additional file 1: Figure S5b). Overall, sequences from human and camel mixed throughout the whole tree, indicating multiple introduction events from camel to human. Nevertheless, the human and camel sequences sampled after 2016 in our database were genetically distant from each other. The root ancestor of clade C, dated back to January 2007 (confidence interval: April 2006–September 2008), was 49.3% likely from camel and 50.7% likely from human. The case mortality rates differed between clades in the phylogenetic tree. C5 was associated with a higher mortality rate than other clades (a difference of 1% in CFR, P-value = 2.0 × 10–4, Fig. 3). A sub-clade of C5, C5.1, showed an even higher mortality rate than clades C1–C4 (a difference of 4% in CFR, P-value < 2.2 × 10–16). Compared to subclades C1–C4, we found non-synonymous mutations in regions encoding the ORF3 protein (P86L) and the NS4B protein (M6T) among the C5.1 sequences as well as in regions encoding the 1AB protein (S6737N), the NS4A protein (P106S), and the Membrane protein (V69I) among other C5 sequences.
The spatiotemporal transmission pattern of clade C was characterized by intense local migration within the Middle East and occasional long distance exportation (Movie 1; Additional file 1: Figure S6). The top three most likely locations of the inferred root ancestor were Riyadh of Saudi Arabia, the Nile Delta region and Jordan with posterior probabilities of 31%, 17% and 12%, respectively. Riyadh appeared to be the major source exporting infections both locally and internationally. It was estimated with 99% posterior probability as the location of common ancestral node for subclades C3, C4, and C5 which cover 97.5% of the collected sequences. Early exportation of the virus to Egypt and Jordan likely occurred before 2010. The circulation of MERS-CoV among dromedary camels in East Africa possibly started before 2010. The model inferred migration of the virus from Egypt to Ethiopia during 2011–2013 and subsequently to Kenya during 2014–2017, partly supported by a serological study conducted in Egypt in 2013 that found both domestic dromedary camels and those imported from Ethiopia were seropositive . Intense migration of the virus from Riyadh towards local cities in Saudi Arabia, Abu Dhabi in UAE and Europe started during 2011–2012. Abu Dhabi soon joined Riyadh as the second hub exporting the virus to other Middle East cities as well as to Europe. The opportunistic exportation events from the Middle East to the United States in 2014 and to East Asia in 2015 were correctly captured by the model.
To identify key genetic sites affecting the transmissibility of MERS-CoV from animals to human, we performed positive selection analysis on the collected sequences. Based on the phylogenetic tree, we focused on two branches with potentially high selection pressure (Additional file 1: Figure S5a), the branch separating the hedgehog-related clade from the clades among bat, camel and human (branch A), and the one separating the bat-related sequences from clade C among camel and human (branch B). We tested whether specific sites underwent positive selection along branches A or B using the branch-site test model implemented in the codeML program of the PAML package. For branch A, we identified three proteins (NS3, 1AB polyprotein and nucleoprotein) and 59 sites in the 1AB polyprotein under positive selection (Additional file 1: Table S8). For branch B, we identified two different proteins (ORF8b and spike glycoprotein) and eight sites in the spike glycoprotein under positive selection. Three of the eight spike glycoprotein sites, 77:Y, 486:H and 636:Q have not been previously reported. No positive selection was detected in the spike glycoprotein for branch A.