I have been analyzing the New York Times Covid19 data from:
https://github.com/nytimes/covid-19-data/archive/master.zip
and I believe I have found some interesting things. I would
be willing to share my findings with this newsgroup if
anyone is interested.
RosemontCrest <[email protected]> wrote:
As moderator of the misc.consumers newsgroup, I represent all readers by
stating that no one is interested in your conjecture. Seriously, you
don't need permission from anyone. Fire away.
I would like to know that at least a few people are interested
in the subject, lest I become one of those pitiable people
who post their poetry and such.
On 5/25/2020 4:18 PM, root wrote:
> RosemontCrest <[email protected]> wrote:
>> As moderator of the misc.consumers newsgroup, I represent all readers by
>> stating that no one is interested in your conjecture. Seriously, you
>> don't need permission from anyone. Fire away.
> I would like to know that at least a few people are interested
> in the subject, lest I become one of those pitiable people
> who post their poetry and such.
My wife and girlfriend are both interested. So, including me, that makes
"a few."
OK, but let me know if you and yours lose interest.
Let me put one thing out there before going into details:
I conclude the NYT data are seriously flawed and should
not be used for policy decisions. Among other things
I assert that most of the recent new "cases" stem
from recounting old cases.
My approach to looking at the data begins with a mathematical
model in order to guide me. The model isn't very complicated.
Every localized contagion begins with an exponential growth.
The number of new cases is proportional to the number of
infected people. If we represent the number of infected
people as I(t), a function of time, the initial stage of
the contagion looks like:
dI/dt = R * I
which results in I(t) in the form of an exponential exp(R*t).
As time progresses, however, it becomes harder for the
contagion to find new people to infect. So the original
equation has to be modified to this:
dI/dt = R' * I * (N-I) where N is the number of people
that can be infected, and R' is another constant. I choose
to rewrite that equation, with R = R' * N, as:
dI/dt = R * I * (1-I/N) where the last term can be seen
as the probability that a person selected at random will
be uninfected. Another term has to be added to the equation
to account for infected people coming into the area
from elsewhere. I want to refer to I as internally
generated infections and E(t) as infected people that
enter the area. At least one such person has to enter
in order to begin the infection since nothing can
happen when I starts at zero.
So now the equation becomes:
dI/dt = R * I * (1-I/N) + E(t)
Casting this equation into a form applicable to the NYT
data it becomes:
(daily new cases) = R * (cases so far) * (1 - (cases so far)/N) + (incoming cases)
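For anyone who wants to follow along in Python, here is a minimal sketch of that daily update rule. The function name `simulate_cases` and every parameter value are made up for illustration; none of them come from the NYT data or from my own software.

```python
def simulate_cases(R, N, days, incoming=None, start=0.0):
    """Iterate (daily new cases) = R*I*(1 - I/N) + E(t), one day per step."""
    if incoming is None:
        incoming = lambda day: 0.0  # no infected arrivals at all
    cases = [start]
    for day in range(days):
        current = cases[-1]
        new_cases = R * current * (1.0 - current / N) + incoming(day)
        # Keep the running total from exceeding the pool N.
        cases.append(min(N, current + new_cases))
    return cases


# Hypothetical parameters: 20% daily growth, 100,000 people who can be
# infected, and a single infected arrival on day 0.
totals = simulate_cases(R=0.2, N=100_000, days=120,
                        incoming=lambda day: 1.0 if day == 0 else 0.0)
for day in (10, 30, 60, 120):
    print(day, round(totals[day]))
```

With `incoming` left out and `start=0.0` the total stays at zero forever, which is the point made above: at least one infected person has to enter the area.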
As far as the math goes the worst is over. Please hang in there
if you can.
Typically you can look at the external cases as if they result
from a process similar to radioactivity: random clicks that
happen at some average rate but are not otherwise predictable.
The running total from such a process grows only linearly in
time and, at some point, will be swamped by the exponential
growth of the internal cases. During the initial stages of the
contagion, however, E(t) cannot be ignored. Even so, for most
of my work with the NYT data I ignored E(t) altogether.
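The "random clicks" picture can be sketched as a Poisson process. The code below is only an illustration under that assumption: the helper names, the arrival rate and the growth constant are all hypothetical. It draws random daily arrivals, keeps a running total, and finds the first day on which pure exponential growth overtakes the expected number of arrivals.

```python
import math
import random


def poisson_sample(rate, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def cumulative_arrivals(rate, days, seed=0):
    """Running total of random external arrivals, one draw per day."""
    rng = random.Random(seed)
    total, totals = 0, []
    for _ in range(days):
        total += poisson_sample(rate, rng)
        totals.append(total)
    return totals


def crossover_day(rate, R, start=1.0):
    """First day on which start*exp(R*t) exceeds the expected arrivals rate*t.

    Assumes R > 0, otherwise the loop never ends.
    """
    day = 1
    while start * math.exp(R * day) <= rate * day:
        day += 1
    return day


# Hypothetical numbers: two arrivals per day on average, 20% daily growth.
arrivals = cumulative_arrivals(rate=2.0, days=30, seed=1)
print(arrivals[-1], crossover_day(rate=2.0, R=0.2))
```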
My actual model includes a time (about 5 days) over which
an infected person is not contagious, and another period
(about 28 days) after which an infected person is no
longer contagious. Neither of these two refinements is
important.
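One way to picture the two delays, and only a guess at how they might be wired in: treat a person as contagious from day 5 through day 27 after infection, so the contagious pool on any day is the cumulative count 5 days back minus the cumulative count 28 days back. The function below is a hypothetical sketch built on that assumption, not my actual model.

```python
def simulate_with_delays(R, N, days, latent=5, infectious_until=28, seed_cases=1.0):
    """Logistic growth where only cases aged latent..infectious_until-1 days spread."""
    cases = [seed_cases]  # cumulative cases, indexed by day

    def total_at(day):
        # Cumulative cases on a given day; zero before the outbreak starts.
        return cases[day] if day >= 0 else 0.0

    for day in range(1, days + 1):
        today = day - 1
        # Assumption: contagious = infected at least `latent` days ago
        # but fewer than `infectious_until` days ago.
        contagious = total_at(today - latent) - total_at(today - infectious_until)
        new_cases = R * contagious * (1.0 - cases[-1] / N)
        cases.append(min(N, cases[-1] + new_cases))
    return cases


# Hypothetical parameters, chosen only to show the shape of the curve.
delayed = simulate_with_delays(R=0.3, N=100_000, days=200)
print(round(delayed[50]), round(delayed[200]))
```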
My main interest in looking at the data was to find
a way to estimate how many people are actually infected
when some number are reported. I will go into my efforts
in that direction another time.
One last point for this chapter: since I can't show you
plots of the data I will have to describe what the data
show. If you have access to tools to manipulate data
and show results I hope you can follow along.
This is a sample of what I have to offer. Please let
me know if you still have any interest.
Thank you.
On 5/25/2020 5:27 PM, root wrote:
> This is a sample of what I have to offer. Please let
> me know if you still have any interest.
That's too much. You can stop now. We certainly don't want to see
another newsgroup destroyed by unrelated politics.
Pay no mind to Bob F. It actively participates in off-topic, political
discussions on alt.home.repair, thus destroying that newsgroup with
unrelated politics. Notice that it is the one who introduced politics to
this discussion about scientific data analysis.
I am interested to see more of your findings. Are you willing to share
your data?
RosemontCrest <[email protected]> wrote:
> I am interested to see more of your findings. Are you willing to share
> your data?
The data I work with are publicly available at:
https://github.com/nytimes/covid-19-data/archive/master.zip
The archive includes data for the US as a whole, every state and
territory, and every county. Lots to chew on.
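If you would rather use Python than Excel, a small reader along these lines should be enough to get started. It assumes the CSV layouts in the repository (us.csv with columns date,cases,deaths, and us-states.csv with an extra state column). The function name is hypothetical and the sample rows are made up, not real counts.

```python
import csv
import io


def read_cumulative(csv_text, state=None):
    """Return (dates, cases, deaths) lists from NYT-style CSV text.

    If `state` is given, keep only rows whose "state" column matches it.
    """
    dates, cases, deaths = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if state is not None and row.get("state") != state:
            continue
        dates.append(row["date"])
        cases.append(int(row["cases"]))
        # Treat a blank deaths field as zero.
        deaths.append(int(row["deaths"] or 0))
    return dates, cases, deaths


# Made-up rows in the us.csv layout, for illustration only.
sample = """date,cases,deaths
2020-05-01,1000,50
2020-05-02,1100,55
2020-05-03,1250,61
"""
dates, cases, deaths = read_cumulative(sample)
print(dates[-1], cases[-1], deaths[-1])
```

After unpacking the zip, something like `read_cumulative(open("covid-19-data-master/us.csv").read())` should work, assuming the archive unpacks into a folder of that name.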
I don't want to post if there is no interest.
I will abandon the approach I started to take in favor
of a more important and easier thread to follow.
The above source reports a daily count of reported
cases and deaths. For some time I have wondered how
many people have actually been infected when some lesser number
is reported. Let's say that when X cases have been reported
there are really M*X people infected. Can I squeeze M out of
the reported data? I think I can.
Here is the basic plan:
1. for any given data set compute the daily differences of the
number of cases.
2. divide these daily differences by the corresponding number
of cases.
3. compute the variance and SD of the daily differences.
There is a trend to the daily differences and
that trend has to be removed or corrected before
computing the SD.
4. compute the expected variance and SD of the daily
differences.
I will show how the expected SD is computed below.
5. The ratio of the first SD to the second SD is my
best estimate of M.
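The five steps can be sketched in Python roughly as follows. Several choices here are assumptions of the sketch rather than a statement of what my own software does: the trend is removed with a least-squares straight line, the expected SD uses the binomial form sqrt(p*(1-p)*C) averaged over all days, and all function names are hypothetical. It also assumes the cumulative counts are positive and growing.

```python
import math
import statistics


def daily_differences(cases):
    """Step 1: day-to-day change in the cumulative case count."""
    return [b - a for a, b in zip(cases, cases[1:])]


def detrend(values):
    """Subtract a least-squares straight line (assumed detrending method)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return [y - (y_mean + slope * (x - x_mean)) for x, y in enumerate(values)]


def estimate_m(cases):
    diffs = daily_differences(cases)                       # step 1
    totals = cases[1:]
    # Step 2: ratio of each daily change to the cases to date,
    # clamped to [0, 1] so that downward revisions cannot break the sqrt.
    ratios = [min(1.0, max(0.0, d / c)) for d, c in zip(diffs, totals)]
    observed_sd = statistics.pstdev(detrend(diffs))        # step 3
    expected_sd = statistics.mean(                         # step 4
        [math.sqrt(p * (1.0 - p) * c) for p, c in zip(ratios, totals)])
    return observed_sd / expected_sd                       # step 5


# Synthetic series: a rising trend with a made-up day-to-day wobble.
cases = [1000]
for step in range(60):
    bump = 100 + 2 * step + (40 if step % 2 else -40)
    cases.append(cases[-1] + bump)
print(round(estimate_m(cases), 2))
```

A perfectly smooth series gives a ratio of zero, since nothing is left once the trend is removed.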
I was motivated to consider this approach because the
SD of the daily differences was too large to be
explained.
In step 2 we divided the daily differences by the
number of cases to date. What does this number
mean? Let C be the number of cases and deltaC be
the daily change. I assert that deltaC/C is the
probability that one of the C cases will infect
a new person in the next day. This is a binomial
probability (p) and, for a large value of C, we can
approximate the number of new cases by a normal
distribution with SD = sqrt(p*(1-p)*C).
Typically this is a few hundred cases. In contrast
the SD from step 3 is a few thousand cases and
the ratio (M) is a number on the order of 10.
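To make the binomial SD concrete, here is a small worked example in Python. The case count, the daily change and the observed SD are round hypothetical numbers, not values taken from the NYT files.

```python
import math


def expected_sd(cases_to_date, daily_change):
    """Binomial SD sqrt(p*(1-p)*C) with p = deltaC / C."""
    p = daily_change / cases_to_date
    return math.sqrt(p * (1.0 - p) * cases_to_date)


# Hypothetical: one million cases to date, 20,000 new cases in a day.
sd = expected_sd(1_000_000, 20_000)
print(round(sd, 1))  # prints 140.0

# Hypothetical observed SD of the detrended daily differences.
observed_sd = 2_000.0
print(round(observed_sd / sd, 1))  # prints 14.3
```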
I have computed the values for each of the states
and territories and the value for the US is 14.5 or
so. There is a discrepancy in that number which
I am investigating. Whatever the value of M,
the lethality of the SARS-CoV-2 virus as determined
by deaths/cases is reduced by the factor M. If
M were 14.5 and deaths/cases was 4% then the
revised lethality would be about 0.276%, which is less
than 3 times that of ordinary seasonal flu.
The number M is vitally important.
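The arithmetic is just a division, and 0.04 / 14.5 comes to a little under 0.28%. The function name below is hypothetical.

```python
def adjusted_fatality(deaths_per_case, m):
    """Deaths per actual infection if true infections are m times reported cases."""
    return deaths_per_case / m


print(f"{adjusted_fatality(0.04, 14.5):.3%}")  # prints 0.276%
```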
There are still some things about the procedure that
bother me. I use my own software for all this, but
I have a friend who uses Excel to do the computations
at his end.
If you are familiar with Excel you can easily bring
up the data and have a look for yourself. Get back
here if you have any questions, and if you tire
of this let me know as well.
Thanks.
Thank you for the link. I remain interested and hope that others express
interest. Presenting more findings and discussion may garner more
interest. Please continue to pursue and share your endeavor.