• Looking at Covid19 data

    From root@21:1/5 to All on Mon May 25 20:33:15 2020
    I have been analyzing the New York Times Covid19 data from:

    https://github.com/nytimes/covid-19-data/archive/master.zip

    and I believe I have found some interesting things. I would
    be willing to share my findings with this newsgroup if
    anyone is interested.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From RosemontCrest@21:1/5 to root on Mon May 25 15:01:12 2020
    As moderator of the misc.consumers newsgroup, I represent all readers by stating that no one is interested in your conjecture. Seriously, you
    don't need permission from anyone. Fire away.

  • From RosemontCrest@21:1/5 to root on Mon May 25 16:25:05 2020
    On 5/25/2020 4:18 PM, root wrote:
    RosemontCrest wrote:

    As moderator of the misc.consumers newsgroup, I represent all readers by
    stating that no one is interested in your conjecture. Seriously, you
    don't need permission from anyone. Fire away.


    I would like to know that at least a few people are interested
    in the subject, lest I become one of those pitiable people
    who post their poetry and such.

    My wife and girlfriend are both interested. So, including me, that makes
    "a few."

  • From root@21:1/5 to RosemontCrest on Mon May 25 23:42:26 2020
    RosemontCrest wrote:
    On 5/25/2020 4:18 PM, root wrote:
    RosemontCrest wrote:

    As moderator of the misc.consumers newsgroup, I represent all readers by
    stating that no one is interested in your conjecture. Seriously, you
    don't need permission from anyone. Fire away.


    I would like to know that at least a few people are interested
    in the subject, lest I become one of those pitiable people
    who post their poetry and such.

    My wife and girlfriend are both interested. So, including me, that makes
    "a few."


    OK, but let me know if you and yours lose interest.

  • From root@21:1/5 to RosemontCrest on Mon May 25 23:18:13 2020
    RosemontCrest wrote:

    As moderator of the misc.consumers newsgroup, I represent all readers by stating that no one is interested in your conjecture. Seriously, you
    don't need permission from anyone. Fire away.


    I would like to know that at least a few people are interested
    in the subject, lest I become one of those pitiable people
    who post their poetry and such.

  • From root@21:1/5 to root on Tue May 26 00:27:25 2020
    root wrote:


    OK, but let me know if you and yours lose interest.

    Let me put one thing out there before going into details:
    I conclude the NYT data are seriously flawed and should
    not be used for policy decisions. Among other things
    I assert that most of the recent new "cases" stem
    from recounting old cases.

    My approach to looking at the data begins with a mathematical
    model in order to guide me. The model isn't very complicated.

    Every localized contagion begins with an exponential growth.
    The number of new cases is proportional to the number of
    infected people. If we represent the number of infected
    people as I(t), a function of time, the initial stage of
    the contagion looks like:

    dI/dt = R * I

    which results in I(t) in the form of an exponential exp(R*t).

    As time progresses, however, it becomes harder for the
    contagion to find new people to infect. So the original
    equation has to be modified to this:

    dI/dt = R * I * (N-I) where N is the number of people
    that can be infected, and R is another constant. I choose
    to rewrite that equation as:

    dI/dt = R * I * (1-I/N) where the last term can be seen
    as the probability that a person selected at random will
    be uninfected. Another term has to be added to the equation
    to account for infected people coming into the area
    from elsewhere. I want to refer to I as internally
    generated infections and E(t) as infected people that
    enter the area. At least one such person has to enter
    in order to begin the infection since nothing can
    happen when I starts at zero.

    So now the equation becomes:

    dI/dt = R * I * (1-I/N) + E(t)

    Casting this equation into a form applicable to the NYT
    data it becomes:

    (daily new cases) = R * (cases so far) * (1 - (cases so far)/N) + (incoming cases)
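
    A minimal sketch of iterating that daily form, one step per day. The
    function name, the parameter values, and the choice of a single seed
    case on day 0 for E(t) are assumptions made for illustration; none of
    them are fitted to the NYT data.

```python
def simulate(rate, capacity, days, incoming=None, start=0.0):
    """Iterate (daily new cases) = R * I * (1 - I/N) + E(t), one day per step."""
    if incoming is None:
        # Assumed E(t): one infected person arrives on day 0, nobody afterwards.
        incoming = lambda day: 1.0 if day == 0 else 0.0
    infected = start
    history = [infected]
    for day in range(days):
        new_cases = rate * infected * (1.0 - infected / capacity) + incoming(day)
        infected = min(capacity, infected + new_cases)  # never exceed N
        history.append(infected)
    return history


# Illustrative values only: R = 0.2 per day, N = 100,000 people.
curve = simulate(rate=0.2, capacity=100_000, days=120)
```

    With no incoming case at all the curve stays at zero, which is the
    point made above about I starting at zero.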

    As far as the math goes the worst is over. Please hang in there
    if you can.

    Typically you can look at the external cases as if they result
    from a process similar to radioactivity: Random clicks that
    happen at some average rate but are not otherwise predictable.
    The cumulative count from such a process grows linearly in time and,
    at some point, will be swamped by the exponential internal growth.
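
    The comparison can be sketched as follows, treating arrivals as a
    Poisson process. The arrival rate, growth rate, random seed, and
    function names are all assumptions picked for illustration.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def compare_growth(arrival_rate, growth_rate, days, seed=1):
    """Return (day, cumulative external arrivals, internal cases) per day."""
    rng = random.Random(seed)
    external_total, internal = 0, 1.0
    rows = []
    for day in range(1, days + 1):
        external_total += poisson(arrival_rate, rng)  # random "clicks"
        internal *= 1.0 + growth_rate                 # pure exponential growth
        rows.append((day, external_total, internal))
    return rows


# Illustrative values: 2 arrivals per day on average, 20% daily internal growth.
rows = compare_growth(arrival_rate=2.0, growth_rate=0.2, days=60)
# Last day on which the external total is still at least the internal count.
last_day_external_leads = max(
    (day for day, ext, internal in rows if ext >= internal), default=0
)
```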

    During the initial stages of the contagion, however, E(t)
    cannot be ignored. Even so, for most of my work with the NYT
    data I ignored E(t) altogether.

    My actual model includes a time (about 5 days) over which
    an infected person is not contagious, and another period
    (about 28 days) after which an infected person is no
    longer contagious. Neither of these two refinements is
    important.
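
    One way to read those two refinements is that a person infects others
    only between 5 and 28 days after being infected. The sketch below is
    built on that reading, which is an assumption, as are the function
    name and the parameter values.

```python
def simulate_with_window(rate, capacity, days, latent=5, contagious_until=28):
    """Daily new cases when only people infected between `latent` and
    `contagious_until` days ago can pass the infection on."""
    new_by_day = [1.0]  # day 0: a single seed case (assumed)
    total = 1.0
    for day in range(1, days + 1):
        oldest = max(0, day - contagious_until)
        newest = day - latent
        # Sum the cohorts that are inside the contagious window today.
        active = sum(new_by_day[oldest:newest + 1]) if newest >= 0 else 0.0
        new_cases = rate * active * (1.0 - total / capacity)
        new_cases = min(new_cases, capacity - total)  # never exceed N
        new_by_day.append(new_cases)
        total += new_cases
    return new_by_day


# Illustrative values only.
new_by_day = simulate_with_window(rate=0.3, capacity=10_000, days=200)
```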

    My main interest in looking at the data was to find
    a way to estimate how many people are actually infected
    when some number are reported. I will go into my efforts
    in that direction another time.

    One last point for this chapter: since I can't show you
    plots of the data I will have to describe what the data
    show. If you have access to tools to manipulate data
    and show results I hope you can follow along.

    This is a sample of what I have to offer. Please let
    me know if you still have any interest.

    Thank you.

  • From Bob F@21:1/5 to root on Mon May 25 20:15:07 2020
    That's too much. You can stop now. We certainly don't want to see
    another newsgroup destroyed by unrelated politics.

  • From RosemontCrest@21:1/5 to Bob F on Wed May 27 14:15:28 2020
    On 5/25/2020 8:15 PM, Bob F wrote:
    That's too much. You can stop now. We certainly don't want to see
    another newsgroup destroyed by unrelated politics.

    Pay no mind to Bob F. It actively participates in off-topic, political discussions on alt.home.repair, thus destroying that newsgroup with
    unrelated politics. Notice that it is the one who introduced politics to
    this discussion about scientific data analysis.

    I am interested to see more of your findings. Are you willing to share
    your data?

    Thank you.

  • From root@21:1/5 to RosemontCrest on Thu May 28 00:25:26 2020
    RosemontCrest wrote:

    Pay no mind to Bob F. It actively participates in off-topic, political discussions on alt.home.repair, thus destroying that newsgroup with
    unrelated politics. Notice that it is the one who introduced politics to
    this discussion about scientific data analysis.

    I am interested to see more of your findings. Are you willing to share
    your data?


    The data I work with are publicly available at:
    https://github.com/nytimes/covid-19-data/archive/master.zip
    The archive includes data for the US as a whole, every state and
    territory, and every county. Lots to chew on.
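
    For anyone who would rather script this than use a spreadsheet, here
    is a small sketch that groups the state-level file into one series per
    state. It assumes the us-states.csv layout (date,state,fips,cases,deaths);
    the function name is mine and the sample rows are only there to show
    the layout.

```python
import csv
import io
from collections import defaultdict


def cumulative_cases_by_state(csv_text):
    """Map each state to a date-ordered list of (date, cumulative cases)."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        series[row["state"]].append((row["date"], int(row["cases"])))
    for points in series.values():
        points.sort()  # ISO dates sort chronologically as plain strings
    return dict(series)


# Illustrative rows in the assumed layout.
sample = """date,state,fips,cases,deaths
2020-01-21,Washington,53,1,0
2020-01-22,Washington,53,1,0
2020-01-24,Illinois,17,1,0
"""
series = cumulative_cases_by_state(sample)
```

    With the real archive unpacked, the file contents would be read with
    open(...).read() and passed in the same way.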

    I don't want to post if there is no interest.

    I will abandon the approach I started to take in favor
    of a more important and easier thread to follow.

    The above source reports a daily account of the number of reported
    cases and number of deaths. For some time I have wondered how
    many people have actually been infected when some lesser number
    is reported. Let's say that when X cases have been reported
    there are really M*X people infected. Can I squeeze M out of
    the reported data? I think I can.

    Here is the basic plan:
    1. For any given data set, compute the daily differences of the
       number of cases.
    2. Divide these daily differences by the corresponding number
       of cases.
    3. Compute the variance and SD of the daily differences.
       There is a trend to the daily differences, and that trend
       has to be removed or corrected before computing the SD.
    4. Compute the expected variance and SD of the daily
       differences. I will show how the expected SD is computed
       below.
    5. The ratio of the first SD to the second SD is my
       best estimate of M.
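
    The five steps can be sketched roughly as below. Several choices here
    are assumptions rather than a description of my own software: the
    trend is removed with a least-squares straight line, the expected
    variance is averaged over the days, and days with a negative
    difference or a difference larger than the prior total are skipped.
    The sample series is made up.

```python
import math


def daily_differences(cumulative):
    """Step 1: day-over-day change in the cumulative case count."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def detrended_sd(values):
    """Step 3: SD of residuals about a least-squares straight line."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    residuals = [y - (mean_y + slope * (x - mean_x)) for x, y in enumerate(values)]
    return math.sqrt(sum(r * r for r in residuals) / (n - 2))


def expected_sd(cumulative):
    """Steps 2 and 4: binomial SD sqrt(p*(1-p)*C), averaged over days."""
    variances = []
    for prior, diff in zip(cumulative, daily_differences(cumulative)):
        if prior > 0 and 0 <= diff <= prior:
            p = diff / prior  # step 2: deltaC / C
            variances.append(p * (1.0 - p) * prior)
    return math.sqrt(sum(variances) / len(variances))


def estimate_m(cumulative):
    """Step 5: ratio of the observed SD to the expected SD."""
    return detrended_sd(daily_differences(cumulative)) / expected_sd(cumulative)


# Made-up cumulative counts, for illustration only.
example = [1000, 1100, 1180, 1300, 1390, 1520, 1600]
m_example = estimate_m(example)
```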

    I was motivated to consider this approach because the
    SD of the daily differences was too large to be
    explained.

    In step 2 we divided the daily differences by the
    number of cases to-date. What does this number
    mean? Let C be the number of cases and deltaC be
    the daily change. I assert that deltaC/C is the
    probability that one of the C cases will infect
    a new person in the next day. This is a binomial
    probability (p) and, for a large value of C, we can
    approximate the distribution of the number of new cases by
    a normal distribution with SD=sqrt(p*(1-p)*C).
    Typically this is a few hundred cases. In contrast,
    the SD from step 3 is a few thousand cases and
    the ratio (M) is a number on the order of 10.
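
    To make the sizes concrete, here is the expected SD worked through
    with round numbers. The values p = 0.1 and C = 1,000,000 are
    assumptions chosen for easy arithmetic, not values from the NYT data.

```latex
\sigma_{\text{expected}} = \sqrt{p\,(1-p)\,C}
                         = \sqrt{0.1 \times 0.9 \times 10^{6}}
                         = \sqrt{90\,000}
                         = 300
```

    If the detrended SD from step 3 were 3000 for the same series, the
    ratio would be M = 3000/300 = 10.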

    I have computed the values for each of the states
    and territories and the value for the US is 14.5 or
    so. There is a discrepancy in that number which
    I am investigating. Whatever the value of M,
    the lethality of the SARS-CoV-2 virus as determined
    by deaths/cases is reduced by the factor M. If
    M were 14.5 and deaths/cases were 4%, then the
    revised lethality would be about 0.28%, which is less
    than 3 times that of ordinary seasonal flu.
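
    The arithmetic, spelled out as a small sketch. The 0.1% figure for
    seasonal flu is an assumption implied by the comparison above; the
    function name is only for illustration.

```python
def revised_lethality(case_fatality_percent, multiplier):
    """Scale deaths/cases down by M, the ratio of actual to reported cases."""
    return case_fatality_percent / multiplier


revised = revised_lethality(4.0, 14.5)  # percent
flu_percent = 0.1                       # assumed seasonal flu lethality, percent
ratio_to_flu = revised / flu_percent
```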

    The number M is vitally important.

    There are still some things about the procedure that
    bother me. I use my own software for all this, but
    I have a friend who uses Excel to do the computations
    at his end.

    If you are familiar with Excel you can easily bring
    up the data and have a look for yourself. Get back
    here if you have any questions, and if you tire
    of this let me know as well.

    Thanks.

  • From RosemontCrest@21:1/5 to root on Wed May 27 22:02:34 2020
    Thank you for the link. I remain interested and hope that others express interest. Presenting more findings and discussion may garner more
    interest. Please continue to pursue and share your endeavor.

  • From root@21:1/5 to RosemontCrest on Thu May 28 14:44:29 2020
    RosemontCrest wrote:

    Thank you for the link. I remain interested and hope that others express interest. Presenting more findings and discussion may garner more
    interest. Please continue to pursue and share your endeavor.


    Thank you for your interest.

    Here are the results of my work so far:

    The data columns are: state or territory, M = (actual cases)/(reported
    cases), effective lethality in percent, number of reported cases, and
    number of data points.

    State/territory         M        Leth.(%)  Cases      Points
    Alabama                 9.22984  0.412334  15650      74
    Alaska                  1.94678  1.00473   416        75
    Arizona                 7.68512  0.637109  16783      121
    Arkansas                13.7653  0.142299  6180       76
    California              9.23045  0.43338   99925      122
    Colorado                6.42828  0.858365  24553      82
    Connecticut             10.1035  0.903221  41303      79
    Delaware                7.95252  0.465357  9066       76
    DistrictofColumbia      5.05766  1.05321   8334       80
    Florida                 12.7563  0.34465   52247      86
    Georgia                 12.6857  0.34598   42066      85
    Guam                    7.72439  0.068437  1139       72
    Hawaii                  1.84357  1.45675   633        81
    Idaho                   3.56433  0.836378  2699       74
    Illinois                21.224   0.208173  113486     123
    Indiana                 5.96245  1.03078   32856      81
    Iowa                    9.32092  0.280614  17999      79
    Kansas                  15.0902  0.145086  9352       80
    Kentucky                12.638   0.357062  9175       81
    Louisiana               19.0675  0.378296  38252      78
    Maine                   3.41531  1.11136   2109       75
    Maryland                11.9939  0.404626  48290      82
    Massachusetts           9.44957  0.727614  93693      115
    Michigan                12.8618  0.744241  55040      77
    Minnesota               5.48437  0.777822  21969      81
    Mississippi             6.67879  0.706156  13731      76
    Missouri                5.69143  0.999333  12437      80
    Montana                 3.51638  0.949924  479        74
    Nebraska                9.30838  0.135461  12619      99
    Nevada                  5.35033  0.93381   8059       82
    NewHampshire            4.00271  1.25849   4231       85
    NewJersey               13.1519  0.549124  155764     83
    NewMexico               4.55819  1.00166   7130       76
    NewYork                 10.298   0.769915  368669     86
    NorthCarolina           10.9867  0.302079  24188      84
    NorthDakota             3.4634   0.632874  2422       76
    NorthernMarianaIslands  4.54384  2.00071   22         59
    Ohio                    6.95347  0.887367  33006      78
    Oklahoma                5.10908  1.00832   6137       81
    Oregon                  3.91037  0.96379   3967       88
    Pennsylvania            10.2271  0.702067  72873      81
    PuertoRico              5.50629  0.723253  3324       74
    RhodeIsland             7.26354  0.595135  14210      86
    SouthCarolina           5.24323  0.821752  10416      81
    SouthDakota             7.7962   0.140552  4653       77
    Tennessee               7.61852  0.216634  20960      82
    Texas                   10.6637  0.256482  57541      104
    Utah                    3.30713  0.349506  8622       91
    Vermont                 2.58748  2.18303   967        80
    VirginIslands           4.82072  1.80381   69         73
    Virginia                16.514   0.195645  39342      80
    Washington              10.8357  0.475859  21278      126
    WestVirginia            6.0884   0.667746  1854       70
    Wisconsin               8.9827   0.369586  15923      111
    Wyoming                 3.836    0.38478   850        76

    US                      19.6869  0.302295  1.6701e+06 125

    The overall average for the states is M=8, meaning 8 actual cases
    for every reported case, and a lethality of 0.7% which is about
    seven times as lethal as seasonal flu. While I don't want to
    minimize a 0.7% death rate, it is a lot better than a 4%-6% rate.


    These numbers differ by a factor of sqrt(2) from my earlier summary
    owing to a correction for trend removal in the difference data.

    As I said above:

    There are still some things about the procedure that
    bother me.

    What bothers me is what I see as systematic reporting vagaries and
    errors. These two problems are most evident in the US data. I
    encourage anyone interested to look at the US data. The daily
    differences show a significant weekly pattern, which has been
    reported in the Wall Street Journal. This, along with a linear
    downward trend, must be rectified before the data can be used.
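
    One common way to suppress a weekly pattern is a trailing 7-day
    average of the daily differences. This is offered as a sketch of that
    general technique, not as the correction I actually applied; the
    function name and the toy series are assumptions.

```python
def seven_day_average(daily, window=7):
    """Trailing moving average; each output covers one full week of days."""
    return [
        sum(daily[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(daily))
    ]


# Toy series: five high-reporting weekdays, then a low-reporting weekend.
toy_daily = [10, 10, 10, 10, 10, 3, 3] * 3
smoothed = seven_day_average(toy_daily)
```

    Because every 7-day window of the toy series contains exactly one of
    each weekday, the weekly swing disappears from the smoothed values.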

    As you can see, the US data stand out with a very high M
    value of 19.7 or so. Using a (model-dependent) analysis I
    conclude that the US data suffer from an accumulation of
    conclude that the US data suffer from an accumulation of
    recounted cases.

    Although this does not prove recounting, take a look at
    two consecutive days in the data for California:

    2020-02-25 Humboldt California 06023 1 0
    2020-02-25 LosAngeles California 06037 1 0
    2020-02-25 Orange California 06059 1 0
    2020-02-25 Sacramento California 06067 1 0
    2020-02-25 SanDiego California 06073 1 0
    2020-02-25 SanFrancisco California 06075 3 0
    2020-02-25 SantaClara California 06085 2 0
    2020-02-25 Solano California 06095 1 0 <<<<<<

    2020-02-26 Humboldt California 06023 1 0
    2020-02-26 LosAngeles California 06037 1 0
    2020-02-26 Marin California 06041 1 0
    2020-02-26 Napa California 06055 1 0
    2020-02-26 Orange California 06059 1 0
    2020-02-26 Sacramento California 06067 3 0
    2020-02-26 SanDiego California 06073 1 0
    2020-02-26 SanFrancisco California 06075 3 0
    2020-02-26 SantaClara California 06085 2 0
    2020-02-26 Solano California 06095 11 0 <<<<<<
    2020-02-26 Sonoma California 06097 1 0

    The last two entries on each line are accumulated cases and deaths
    for each county.

    On the first day California has a total of 11 cases and Solano
    County has one of those 11. On the following day Solano has
    11 cases, Sacramento picked up 2, and Marin, Napa, and Sonoma
    1 each. Solano has a prison hospital and my guess is that some
    of the previously counted victims were transferred to Solano
    and recounted. There should be a way to keep track of
    who the victims are and not recount them if they move.
    All those recount errors in all the states accumulate
    in the US data.
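
    The day-over-day comparison can be checked with a few lines of code.
    The counts are copied from the two California listings above; the
    function name is just for illustration.

```python
# Cumulative cases per county, copied from the 2020-02-25 and 2020-02-26 rows.
day1 = {"Humboldt": 1, "LosAngeles": 1, "Orange": 1, "Sacramento": 1,
        "SanDiego": 1, "SanFrancisco": 3, "SantaClara": 2, "Solano": 1}
day2 = {"Humboldt": 1, "LosAngeles": 1, "Marin": 1, "Napa": 1, "Orange": 1,
        "Sacramento": 3, "SanDiego": 1, "SanFrancisco": 3, "SantaClara": 2,
        "Solano": 11, "Sonoma": 1}


def changes(before, after):
    """Counties whose cumulative count changed, with the size of the change."""
    return {
        county: after[county] - before.get(county, 0)
        for county in after
        if after[county] != before.get(county, 0)
    }


state_total_day1 = sum(day1.values())
state_total_day2 = sum(day2.values())
jumps = changes(day1, day2)
```

    Run on these rows, the state total goes from 11 to 26 and Solano
    alone accounts for 10 of the 15 added cases.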

    I don't have much more to say on the data. I'm willing
    to answer any questions you have if you want to look
    at the data yourself.

    Thanks for reading.
