NB: When considering information concerning groups of people with ME/CFS and those with long COVID, it is important to remember that ME/CFS is a symptom-based clinical diagnosis, not a mechanistic one. It is clear there is a high degree of shared pathophysiology between ME/CFS and long COVID, and the two diagnostic labels are not mutually exclusive. Importantly, some individuals with long COVID meet ME/CFS diagnostic criteria or have a dual diagnosis.
Key points
- There are many difficulties faced when attempting to estimate the prevalence of long COVID (LC). In this, the first of two articles, ME Research UK highlights the following issues:
- There is no validated biomarker to detect LC. Rather, LC is identified based on the presence and duration of certain symptoms, and there are several different published definitions. Notably, the published definitions of LC are not consistent, meaning that the findings from prevalence studies using different definitions may not be comparable.
- Prevalence of LC varies depending on the method used to identify the study population, for example through self-report, hospital records, or GP records. Prevalence also varies depending on where the data come from, for example LC clinics compared with a sample of the general population.
- There is also variation in LC prevalence based on the type of study design used. For example, studies which follow participants over time may estimate a different prevalence to those which look at a single time point.
- While estimates of the prevalence of LC exist, they are limited by a number of factors such as the definition used, how representative the study population is of the general population, and how reliable the methods are.
- Prevalence estimates change over time, and vary between countries (and between different research studies), meaning that it is not always appropriate to directly compare estimates, or to extrapolate figures beyond the populations for which they were initially calculated.
Introduction
In this, the first of two articles discussing the difficulties faced when attempting to reach a consensus regarding a prevalence estimate for long COVID (LC), ME Research UK will highlight:
- Differences between definitions of LC and the impact this has on research.
- The influence of using different methods to identify cases of LC.
- Difficulties in identifying cases of LC, especially now COVID-19 testing is no longer routine.
- How study design can influence the prevalence of a disease.
Differences in definitions used
Ideally, LC would be reliably diagnosed using a measurable biological indicator (biomarker) over a set period of time (more than 12 weeks according to NICE) following a confirmed COVID-19 infection. Such a test would clearly distinguish LC from other diseases, and would have a high ability both to detect those with LC (sensitivity) and to identify those without the disease (specificity).
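To make the two terms concrete, the short sketch below calculates sensitivity and specificity for an imagined test. The counts are hypothetical and chosen purely for illustration; they do not come from any LC study, and no such validated test currently exists.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of people WITH the condition whom the test correctly identifies."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of people WITHOUT the condition whom the test correctly rules out."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts (assumptions for illustration only):
# 100 people with the condition, 900 people without it.
tp, fn = 90, 10    # people with the condition: detected vs missed
tn, fp = 850, 50   # people without the condition: ruled out vs wrongly flagged

print(f"Sensitivity: {sensitivity(tp, fn):.0%}")  # 90%
print(f"Specificity: {specificity(tn, fp):.0%}")  # 94%
```

A test with low specificity would wrongly label people without LC as cases, inflating a prevalence estimate; a test with low sensitivity would miss true cases and deflate it.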
However, as with ME/CFS, there is no validated biomarker to detect LC. Rather, LC is identified based on the presence and duration of certain symptoms, and there are several different published definitions of LC, including: NASEM 2024, WHO 2021, the United States National Center for Health Statistics (US-NCHS) definition, the United Kingdom Office for National Statistics (UK-ONS) definition, and NICE 2020.
Notably, the published definitions of LC are not consistent. This means that when research studies use different definitions to identify the illness, the resulting prevalence estimates may not be comparable. Regrettably, some studies also use definitions developed for the purpose of their research rather than published definitions, further complicating matters.
Differences in prevalence according to definition have been illustrated in a study by Wisk and colleagues, published in 2025. Here, researchers considered five different LC definitions, taken from previously published papers, and applied them to 4,575 participants from the INSPIRE cohort in the US: 3,521 who had a history of COVID-19 (those with “self-reported symptoms suggestive of acute SARS-CoV-2 infection at the time of a SARS-CoV-2 test” followed by a further test confirming history of infection) and 1,054 who did not.
Results showed that depending on the criteria used, the prevalence of LC amongst those who had a history of COVID-19 infection ranged from:
- 30.8% to 42.0% at 3 months after COVID infection.
- 14.2% to 21.9% at 6 months after COVID infection.
Interestingly, Wisk and colleagues also applied the symptom-based criteria to the COVID-negative population, and found that similar proportions met the long COVID criteria:
- 28.08% to 40.32% at 3 months.
- 14.60% to 23.27% at 6 months.
These findings not only demonstrate variation in prevalence by definition, but also suggest that criteria which require the presence of only one long COVID symptom may not be a reliable method for identifying true cases of the illness.
It is worth noting that some of the LC ‘cases’ identified among those who were ‘COVID-negative’ may have been in people who had experienced an asymptomatic COVID-19 infection, or in those who did not take a COVID test. Only those who “self-reported symptoms suggestive of acute SARS-CoV-2 infection at the time of a SARS-CoV-2 test” were eligible for the further testing to confirm COVID history. Participants who did not self-report the relevant symptoms, or who had not done a COVID test despite having the illness, would therefore have been allocated to the COVID-negative group despite potentially having a history of infection.
Differences depending on the method of identification
As with all prevalence estimates, the prevalence of LC varies depending on the method used to identify the study population, including those with the disease.
Methods of identification could include self-report through an online survey, hospital records, GP records (and, within these, medical codes or free-text notes), recruitment or data from LC clinics, or a sample of the general population.
Differences in prevalence estimates by method of identification have been clearly demonstrated by a study using electronic health record data in Scotland. Results indicated that, even within the health record data, the prevalence estimate varied substantially depending on the method used to identify cases among the 4,676,390 participants.
Clinical codes identified fewest cases (1,092, 0.02%), followed by free text (8,368, 0.2%), sick notes (14,469, 0.3%), and what was termed an ‘operational definition’ based on patterns of clinical interactions recorded in the electronic health records (64,193, 1.4%).
Overall, 1.7% were identified as having LC using one or more methods of identification.
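The percentages above follow directly from dividing each case count by the 4,676,390 participants. The sketch below reproduces that arithmetic using only the counts quoted in this article; the variable and function names are our own and are not taken from the study.

```python
# Figures quoted above from the Scottish electronic health record study.
POPULATION = 4_676_390

cases_by_method = {
    "clinical codes": 1_092,
    "free text": 8_368,
    "sick notes": 14_469,
    "operational definition": 64_193,
}


def prevalence_percent(cases: int, population: int) -> float:
    """Prevalence expressed as a percentage of the study population."""
    return 100 * cases / population


for method, cases in cases_by_method.items():
    print(f"{method}: {prevalence_percent(cases, POPULATION):.2f}%")

# If no person had been identified by more than one method, the four
# counts could simply be added together to give the overall figure.
total_if_no_overlap = sum(cases_by_method.values())
print(f"sum of all four methods: {total_if_no_overlap:,} "
      f"({prevalence_percent(total_if_no_overlap, POPULATION):.2f}%)")
```

This prints 0.02%, 0.18%, 0.31% and 1.37%, which match the rounded figures reported above. Adding the four counts gives 88,122 (1.88%), only slightly more than the 1.7% identified by one or more methods, which suggests that relatively few people were picked up by more than one method.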
Interestingly, although there was limited overlap in the cases identified by the different measures, all measures considered indicated a similar trend in the prevalence of LC over time.
Difficulties identifying cases of LC
As COVID-19 is still prevalent but routine testing has ceased, identifying cases of LC is becoming more complex. While cases of LC which developed following a symptomatic COVID-19 infection accompanied by a positive COVID test are more clear-cut, those which arise following mild COVID symptoms and no COVID test, an unknown virus with COVID-like symptoms, or an asymptomatic COVID infection not confirmed with a test, are much harder to capture. These difficulties in identifying cases of LC may give the impression that the illness is becoming rarer, but in reality any apparent decline in new cases could reflect limitations in measurement.
It is also worth noting that research suggests that the symptoms of long COVID and ME/CFS overlap, and studies have shown that a proportion of individuals with LC meet ME/CFS criteria such as the Canadian Consensus Criteria (CCC) or the 2021 NICE guidelines. However, not everyone with LC meets diagnostic criteria for ME/CFS. Whether or not a person has undiagnosed long COVID does not affect whether or not they qualify for an ME/CFS diagnosis (as ME/CFS is a symptom-based diagnosis).
Variation by study design
There is also variation in LC prevalence based on the type of study design used:
- Cross-sectional: provides a snapshot at a single point in time, but does not show how the number of people with LC changes over time.
- Longitudinal: takes repeated measures over time of the number of people with LC. This gives an indication of how prevalence changes, but is limited by loss to follow-up, which may bias the results, especially if those with LC are systematically more or less likely to drop out.
- Case-control: often produces an “artificial” prevalence, as the ratio of cases to controls is typically set by the researcher rather than reflecting the true population. This makes case-control studies better for identifying risk factors than for measuring population burden.
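The case-control point can be shown with a few lines of arithmetic. In the sketch below, the participant numbers are hypothetical: the share of ‘cases’ in the sample depends entirely on how many controls the researchers choose to recruit, not on how common LC is in the population.

```python
def case_share(n_cases: int, n_controls: int) -> float:
    """Proportion of a case-control sample made up of cases."""
    return n_cases / (n_cases + n_controls)


# Hypothetical recruitment plans (assumptions for illustration only).
# The same 200 people with LC are recruited in both designs.
one_control_per_case = case_share(n_cases=200, n_controls=200)
three_controls_per_case = case_share(n_cases=200, n_controls=600)

print(f"1 control per case:  {one_control_per_case:.0%} of the sample are cases")     # 50%
print(f"3 controls per case: {three_controls_per_case:.0%} of the sample are cases")  # 25%
```

Neither 50% nor 25% says anything about how many people in the wider population have LC, which is why a figure of this kind should not be read as a prevalence estimate.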
Whilst a well-designed study without any limitations can provide an accurate prevalence estimate, any flaws in design can lead to systematic error which can distort findings.
For example:
- If the sample population is not representative of the target population the prevalence may be skewed. For example, if only those who were admitted to a hospital with a COVID-19 infection are included, the prevalence of LC may be artificially high.
- Studies with high non-response rates can overestimate prevalence if individuals with LC are more likely to participate than those without.
- Using different diagnostic tests (e.g., self-reported surveys versus laboratory tests) can significantly alter prevalence estimates, with screening and self-reported information tending to overestimate prevalence compared with the use of strict LC criteria.
- Clearly defining those with LC is essential, as including ineligible individuals as ‘cases’ can overestimate the prevalence.
Importantly, it is unusual for these ‘flaws’ to be due to researcher error; rather, they are normally due to factors such as insufficient resources or poor quality of the available data. It is important that any limitations, alongside their implications, are clearly reported by the researchers in the discussion section of their papers.
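The non-response example in the list above can be sketched numerically. The figures below are hypothetical and are not drawn from any study: they assume a true prevalence of 5%, and that people with LC are twice as likely to respond to a survey as people without it.

```python
def observed_prevalence(
    true_prevalence: float,
    response_rate_cases: float,
    response_rate_non_cases: float,
) -> float:
    """Prevalence among survey responders when response rates differ by LC status."""
    responding_cases = true_prevalence * response_rate_cases
    responding_non_cases = (1 - true_prevalence) * response_rate_non_cases
    return responding_cases / (responding_cases + responding_non_cases)


# Hypothetical inputs (assumptions for illustration only).
estimate = observed_prevalence(
    true_prevalence=0.05,          # 5% of the population truly have LC
    response_rate_cases=0.60,      # 60% of people with LC respond
    response_rate_non_cases=0.30,  # 30% of people without LC respond
)
print(f"Prevalence among responders: {estimate:.1%}")  # 9.5%
```

Under these assumptions the survey would report a prevalence of about 9.5%, almost double the true figure of 5%. If both groups responded at the same rate, the estimate would match the true prevalence.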
Summary
While estimates of the prevalence of LC exist, they are limited by a number of factors such as the definition used, how representative the study population is of the general population, and how reliable the methods are. Additionally, prevalence estimates change over time, and vary between countries (and between different research studies), meaning that it is not always appropriate to directly compare estimates, or to extrapolate figures beyond the populations for which they were initially calculated.
Part 2 of this article will discuss further difficulties faced when attempting to arrive at a consensus regarding the prevalence of long COVID.
