With data that come along one number at a time, it is easy to get lost in the details. To see the big picture, it helps to use a time-series graph that will draw your eye in the direction that your mind wants to go. These simple graphs reveal how the values are changing over time and thereby place each value in context, making it easier to understand. Here we will look at some time series that provide a global perspective on the Covid-19 pandemic.
Comparable numbers
With 329 million people, the United States is the third largest country in the world. China and India have more than four times the population of the United States, and fourth-place Indonesia has only 82 percent of the population of the United States. These size differences make direct comparisons misleading, which is why data are often normalized and expressed as rates per million population. While these rates provide an equitable basis for making comparisons, the per-capita rates are one step removed from the original values. A conversion is required to turn these rates into values we can expect to see in practice. This conversion is not complicated, but it nevertheless creates an obstacle for readers to either hurdle or stumble over.
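The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the rate of 100 cases per million is a hypothetical value, not a figure from the data. Only the U.S. population of 329 million comes from the text.

```python
def rate_per_million(count, population):
    """Normalize a raw count to a rate per million population."""
    return count / population * 1_000_000


def count_from_rate(rate, population):
    """Convert a rate per million back into the count it implies."""
    return rate * population / 1_000_000


US_POPULATION = 329_000_000   # from the article
HYPOTHETICAL_RATE = 100.0     # cases per million; illustrative value only

# Number of cases that this rate would mean in a country the size of the U.S.
implied_cases = count_from_rate(HYPOTHETICAL_RATE, US_POPULATION)
print(f"{HYPOTHETICAL_RATE:.0f} per million in the U.S. is {implied_cases:,.0f} cases")
```

The same rate implies a different count in every country, which is why readers must do this step before a per-capita figure becomes a number they can expect to see reported.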
…
Comments
Grouping strategy
Surely a better grouping would be groups of countries conducting similar numbers of tests. That would still be potentially misleading as what groups are being tested is important (those w/symptoms vs. general population for example.) But it would likely be more meaningful than similar populations.
If wishes were horses then
If wishes were horses then beggars would ride.
It does no good to wish for data that are not available.
The 50 states are certainly as heterogeneous as the countries grouped together here.
Dr. Wheeler's Analysis is Excellent
1. Dr. Wheeler's analysis is excellent. It does no one any good to complain about accurate or reliable data; there is no such thing, except in a theoretical world.
2. I would suggest that Dr. Wheeler should analyze the following data as well: COVID-19 deaths per 100,000 population; COVID-19 deaths per 100,000 COVID-19 patients; and these rates before and after "lockdowns". The groups could be the same countries as grouped in his present article.
On COVID-19 Data
I have been tracking COVID-19 cases and deaths in the US down to the county level for some Government dashboards, as part of a contract I'm currently on. We use the data from Johns Hopkins. I don't track data from other countries. I can tell you that there are often anomalies in the US data; some of it is structural (there are a number of states that don't turn in deaths numbers on Sundays -- and, apparently, some that don't turn them in on Mondays). Some of the anomalies are not easily explained.
A couple of questions were raised in this discussion. One was about the "testing" question. Testing is one pretty good way to measure, and if we knew more about the reliability of the tests, it would allow for quantifying some of the uncertainty. Most of the databases I've seen have fairly extensive discussions on how the data were collected. For COVID, the numbers generally reflect diagnoses. I suppose you could argue that a doctor's diagnosis is not as definitive as a lab test, but the virus doesn't care whether your doctor has access to lab tests or not - it's going to infect you either way, and doctors have to diagnose their patients' problems whether they can get the tests or not. You can also argue about whether the tests are accurate, but then we have to deal with the conditional probability arguments around false positives and false negatives, and it would probably turn out that MD diagnoses are at least as reliable as lab tests, taken as a whole.
And then there are cases that are unreported...we can only speculate about those, so I don't. I just recognize that for any number of very good reasons (and in some cases for very bad reasons), every number I get is probably low. How low, I can't know. I just know that the numbers reported are optimistic. In years to come, we will be able to do some time series analyses on deaths, and we should be able to estimate the excessive death rate during these months/years. That should be an indication of how low the death count was. Cases we probably can't know without some sort of universal antibody testing. The numbers reported by JH are "confirmed cases." I give my client all the caveats, so they know what I'm reporting.
Do I wish the data were better, cleaner, more comprehensive? You bet. I get two numbers each day for each county in the US: a confirmed cases count and a deaths count. I would love to know how many of those cases are active cases. That number isn't universally reported. Same with hospitalizations. That makes finding things like deaths per some number of patients a challenge. If I had 10 or 20 dedicated analysts who could pore over data from all the disparate sources, I might be able to get those counts on a reliable basis, and some of that is available at higher levels of aggregation, but I am looking at US Counties and States, and I have to have data that are reasonably comparable county-to-county. That would take a bigger crew than I have budget for.
There is already some evidence for the effects of lockdowns - most of it comes from Asia, where they practiced lockdowns, social distancing and masking much more rigorously than in many other countries.
Within county data?
County-level data sounds highly detailed when considering the entire country. Within the county you live, on the other hand, you would very much like to have finer data. Santa Clara County, CA, has 2 million inhabitants. Half the population lives in San Jose; 1/20, in Palo Alto. These are radically different environments but I have not been able to find data at the city or neighborhood level.
And yes, as Don Wheeler says, the number of confirmed cases is a lower bound to the number of infections, but we don't know whether the number of infections is 5 or 10 times the number of confirmed cases. All it would take to find out is testing random samples of a few thousand members of the general population. As the size of this multiplier is a function of testing practices, such studies would have to be done in multiple places. As far as I know, this has yet to be done anywhere.
As a country, we have the resources to poll constantly on voter intentions and on TV viewership but we haven't been able to do it to estimate how many of us are infected. Do we not want to know?
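As a rough illustration of why a random sample of a few thousand people would be informative, the sketch below computes the approximate 95 percent margin of error for an estimated infection proportion. The 5 percent prevalence and the sample sizes are hypothetical values chosen for the example, and the calculation assumes a simple random sample and the usual normal approximation; it ignores test error.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


ASSUMED_PREVALENCE = 0.05  # hypothetical: 5% of the population infected

for sample_size in (1_000, 3_000, 5_000):  # hypothetical sample sizes
    moe = margin_of_error(ASSUMED_PREVALENCE, sample_size)
    print(f"n = {sample_size:>5}: {ASSUMED_PREVALENCE:.1%} +/- {moe:.2%}")
```

Under these assumptions a sample of 3,000 would pin the proportion down to within roughly one percentage point, which would be enough to tell a multiplier of 5 from a multiplier of 10.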
Regarding Cases Reported
Thanks for this post, Dr. Wheeler, and Rip, thanks for your work and your reply.
You say that the number of unreported cases is not known, but you "just know that the numbers reported are optimistic." In other words, that you suspect that they are lowball figures. This has been a source of speculation among conspiracy theorists: That there are incentives for health systems to incorrectly attribute a death to COVID-19 that are related to funding or reimbursements or some such. I don't understand the details of their theory, but the upshot is that they believe there to be some structural incentive to inflate death figures, and the "mass collusion" takes the form of a structural incentive to inflate death numbers. With your experience under your contract, looking at county data and some of the details of collection methodology, do you have any opinion on this matter?
Thanks.
Jeff
Thanks for your statistical perspective
I enjoy your articles because numbers don't lie. I guess they can be manipulated to lie by some, but that's usually political statistics. :)
I'm not a numbers person but when I see your numbers, I'm truly astonished that as a country we are not doing something to get these numbers down. I don't know what the answers are, but part is individual responsibility to do the basics - wear a mask, wash your hands, socially distance. We are free in this country to do as we please, BUT ONLY if it doesn't harm others. Not taking personal responsibility can harm others in this pandemic.
Keep up your articles - hopefully we'll be able to see the US stats come down sooner rather than later.
Mary Chisholm
MicroRidge Systems
Excellent Analysis
Figure 3 of your article tells the whole story. Using the original data along with rational grouping makes it all more powerful.
Thanks
Testing?
You do not address the number of tests being conducted, and considering that most cases are asymptomatic, the 29.6% of new cases is misleading. If you can, with the numbers provided, can you address this?
July up-date
As a contributor to the first 3 columns in this series one might expect that I would no longer be surprised by the tricks Dr Wheeler pulls out of the Covid 19 bag.
However that is not the case. Each column manages to provide a new and informative perspective on this unfolding catastrophe. In spite of these continually changing perspectives, they all illustrate the same underlying message that no data have meaning apart from the context in which they have been gathered and within which they will be applied.
This latest column uses an interesting format to supply that context, while at the same time educating the reader about how to appreciate this new perspective. The coronavirus numbers themselves are so unrelentingly ominous that changing perspectives on them is an important way to keep them fresh and meaningful. Grouping data into geographically and culturally meaningful clusters that level the playing field when making comparisons across clusters avoids the temptation to massage the data to make a particular point.
These data do indeed represent the "voice of the Covid 19 process" as it unfolds as the global pandemic it has become. Viewing its growth rates within various regions of the world illustrates just how futile and irresponsible our efforts have been to contain its growth within the US. We are now in the unenviable position of having become a pariah within the international community, ironically encapsulated by the wall that a scientific analysis of Covid 19 data has imposed on us.
I can't say I look forward to what August's column will disclose since it is by now painfully obvious that we have chosen the once unthinkable option of letting the pandemic run its course. However, I do know that Don will once again find a way of presenting those data in a way that is informative, intellectually honest, and readily accessible.
Excellent overview of Dr. Wheeler's latest analysis
Thank you for your contributions earlier and also for explaining so carefully and clearly what we have found here in Dr. Wheeler's paper. You are right, we have built a wall, though maybe not the one some envisioned. My family and I have done our part, but when we do have to go out, it seems half of the people around us do not care. Fine to not care about yourself, but your family and friends? Ridiculous. The results are brutally obvious. To all that take issue with this analysis, or the previous ones, write your papers and we will discuss them. I find them to be the best analyses on the web.
Allen
Operational definitions - Measurement
What impact do you feel the financial "incentive" for labeling a death a COVID-19 death, or the reliability of the testing (a lot of false positives being reported), has on this analysis?
The data
How can the gap be so big between the US and the rest of the world, especially India, when grouped by comparable population numbers? Could it be that instead of showing that the US is more infectious, the data is really showing that the US is doing more testing than the other countries or is more transparent in the counts?
Testing the “testing hypothesis”
Let's examine the testing hypothesis as an explanation of the magnitude of the discrepancy between Covid 19 data from the US versus the rest of the world.
1) Are we really testing that many more people than the rest of the world? Admittedly, expressed as rates per million people, we test a lot, but nowhere near enough more to account for the magnitude of this difference.
2) If testing were a reasonable explanation for the number of cases we report, it should show up in our positivity rate. It doesn't.
3) If testing accounted for most of the difference between our case reports and those of other countries, then changes in our testing rates should co-vary with the magnitude of the differences between the cases reported by the US and by other countries. They don't. We continue to pull away from the pack regardless of what testing policy is in place at any given point in time.
4) I will refrain from commenting on our "transparency" as a possible explanation for our supremacy within the international community by noting that this administration's recent attempt to divert data from going directly to the CDC was reversed only after it became publicly known and caused a severe backlash.
Infections versus confirmed cases
As you state upfront, the numbers you are discussing are confirmed cases, not infections. Inside the article, however, you refer to "infection rates" for the daily number of new confirmed cases. There are two problems here:
In addition, the quality of census data varies greatly between countries. Even in the US, it is subject to politically motivated undercounts, and, even with the best intentions, many countries lack the resources to accurately count their populations. In many countries, both the number of confirmed cases and the population have such broad margins of error as to make the "confirmed cases per 100,000 inhabitants" numbers useless.
On the data Used
Every day I look at the European CDC database and extract the values for 30 countries around the world.
The population data they use comes from the World Bank, and is sufficiently accurate for the way it is used here.
This means that the counts of confirmed cases create lower bounds for the infection rates.
Think about this very carefully! The numbers of tests given in the different groups are not sufficiently different to explain the 37-fold difference found between Europe and the U.S.
Data quality
I believe your conclusions regarding the US and the Western European countries you list. I would trust Japanese data also. For the rest of the world, it's a different story.
On the European CDC website, you can download the list of data sources for all the countries they cover. Most are health ministries. For population data, the World Bank does not have agents going around the world counting people. They may make adjustments but the raw data is from official censuses.
For government statistics to be trustworthy, a country has to be democratic and rich enough. In dictatorships, the statistics say whatever the government wants to put out; in poor democracies, collecting and processing good data is unaffordable.
Thanks Don for Your Count Data Presentation
You know I am a fan of PPM (Parts Per Million). You have supported me saying it is "a robust method". Even so, people seldom understand PPM, especially when they are not used to that calculation. I am convinced that your last count data presentation will reach outside the statistical community to political people familiar with "mere counting data with graphs". I do not find any problems with the accuracy of confirmed cases, as we compare integers (first digit) and orders of magnitude (factors of 10) between countries at different phases of the progressing pandemic. I have found some simple metrics to use: If the average human life span is 80 years, the flat death rate in the population is 34 PPM per day. We have now decreased to a Covid death rate of 1 PPM in Sweden, an average of 10 per day in 10 million people. That is a pretty small proportion, 1/34, or a 3% share of the roughly 340 "normal" deaths per day on average. If we can keep that level we are very happy, but it still needs the same level of social distancing and hand washing. People here do not wear masks due to the public health recommendation that they do not help. I am one of very few who do, the two times I have gone out in public in the last four months. Officials here do not understand the calculations of mutual mask reduction of infection rate when moving around in public. As the US claims a public reduction factor of 5 for general masks, I have seen factor 45 reduction rates when both transmitter and receiver wear masks. As an electronics and acoustics engineer, I am familiar with various filter attenuators. I hope someone proves that the reduction rate with masks is at least a factor of 10. What is the infection rate attenuation of nurses' professional PPE: face shields plus face masks? Empirical data should be available by now.
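The PPM arithmetic in the comment above can be checked with a short Python sketch. It uses only the figures the commenter gives (an 80-year life span, 10 million people, 10 Covid deaths per day) and assumes, as the comment implicitly does, a stationary population in which 1/80 of the people die each year. The function names are invented for this illustration.

```python
DAYS_PER_YEAR = 365


def baseline_death_rate_ppm_per_day(life_span_years):
    """Daily deaths per million people, assuming a stationary population
    in which everyone lives exactly life_span_years (a simplification)."""
    return 1_000_000 / (life_span_years * DAYS_PER_YEAR)


def rate_ppm(count, population):
    """Express a count as parts per million of the population."""
    return count / population * 1_000_000


population = 10_000_000       # Sweden, as rounded in the comment
covid_deaths_per_day = 10     # average quoted in the comment

baseline_ppm = baseline_death_rate_ppm_per_day(80)
baseline_deaths = baseline_ppm * population / 1_000_000
covid_ppm = rate_ppm(covid_deaths_per_day, population)
covid_share = covid_ppm / baseline_ppm

print(f"Baseline: {baseline_ppm:.1f} PPM/day, about {baseline_deaths:.0f} deaths/day")
print(f"Covid:    {covid_ppm:.1f} PPM/day, {covid_share:.1%} of baseline")
```

The results agree with the rounded figures in the comment: about 34 PPM per day, about 340 deaths per day, and a Covid share of about 3 percent.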