On Fri, 13 Sep 2019 04:45:09 -0700 (PDT), Cosine <[email protected]> wrote:
Hi:
Given a method as the reference golden method, how do we statistically show that a new method is better than the reference one?
My first thought was something about a "contradiction in terms".
Pragmatically, though, I have come up with an example based
on strong measurement assumptions, and an example based
on retreating to the original sort of data that makes a clinical
predictor "golden".
Measurement Assumption: If we assume that the underlying
variable is smoothly varying across time, then you can compare
the "jitter" - noise - in two time series of (closely spaced)
measures.
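
A rough sketch of that jitter comparison, in Python. The assumptions are
mine: equally spaced readings, independent noise, and an underlying signal
smooth enough that its second differences are negligible. The function name
and the simulated data are made up for illustration.

```python
import math
import random

def jitter_sd(series):
    """Estimate the noise SD from second differences of an equally spaced series.

    For independent noise with variance s^2 on top of a smooth signal,
    y[i-1] - 2*y[i] + y[i+1] has variance of about 6*s^2 (1 + 4 + 1).
    """
    d2 = [series[i - 1] - 2 * series[i] + series[i + 1]
          for i in range(1, len(series) - 1)]
    mean_sq = sum(d * d for d in d2) / len(d2)
    return math.sqrt(mean_sq / 6.0)

# Simulated example: the same slowly varying signal measured by two methods.
rng = random.Random(0)
truth = [math.sin(t / 50.0) for t in range(500)]
method_a = [x + rng.gauss(0, 0.05) for x in truth]   # quieter method
method_b = [x + rng.gauss(0, 0.20) for x in truth]   # noisier method

sd_a = jitter_sd(method_a)
sd_b = jitter_sd(method_b)
print(f"estimated jitter: A = {sd_a:.3f}, B = {sd_b:.3f}")
```

The method with the smaller estimate is the less noisy one, but only to the
extent that the smoothness assumption holds.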
Retreat to basic data: A clinical indicator (say) is "golden" if it
predicts the eventual occurrence of some event (death?).
An alternative that proves to be more accurate in the long run
(cross-validation studies) becomes the new gold standard.
In practice, this may use ROC curves, which balance
false-negatives against false-positives. For instance, the TB
scratch-test is "positive" for a reaction larger than some conventional
and specific size. There is one ROC curve based on size. Do you
call it a different "method" (I don't) if you use a different cutoff?
A totally different method would have a different curve, and
it might reflect a different phenomenon. For instance, detecting
active TB bacteria is not the same as detecting TB antibodies.
So: What is the purpose of the test?
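
To make the cutoff point concrete, here is a minimal Python sketch. The
reaction sizes and outcomes are invented, and the function names are mine.
Each cutoff gives one point on the curve; the area under the whole curve
summarizes the measure itself, not any particular cutoff.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    A case is called positive when its score is >= the cutoff.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical data: reaction size in mm, and whether disease was confirmed.
size_mm = [2, 4, 5, 7, 9, 10, 12, 14, 15, 18]
disease = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

curve = roc_points(size_mm, disease)
area = auc(curve)
print(curve)
print(area)
```

A cutoff of 10 mm and a cutoff of 5 mm land on two different points of the
same curve. A different test would need its own scores and would trace its
own curve.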
One way I could think of is to define a statistical variable, I, and then conduct a hypothesis test to see if avg( I_new-I_ref ) >0, which means that on average the difference between the performance of the new method and that of the reference one is greater than zero.
I don't follow.
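
If what is meant is a paired comparison on the same cases, one literal
reading would be a one-sided test on the paired differences. A minimal
Python sketch, assuming a per-case performance score I where higher is
better; the scores and the function name are made up, and the exact
sign-flip test is my choice, not something from the question.

```python
from itertools import product

def paired_sign_flip_p(new, ref):
    """One-sided exact sign-flip p-value for mean(new - ref) > 0.

    Under the null of no difference, each paired difference is equally
    likely to be positive or negative, so all 2**n sign patterns are
    enumerated (practical only for small n).
    """
    diffs = [a - b for a, b in zip(new, ref)]
    observed = sum(diffs)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance so floating-point ties count as "at least as large".
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical per-case scores for the same eight cases under each method.
i_ref = [0.70, 0.65, 0.80, 0.72, 0.60, 0.75, 0.68, 0.71]
i_new = [0.74, 0.70, 0.83, 0.78, 0.66, 0.77, 0.73, 0.76]

p_value = paired_sign_flip_p(i_new, i_ref)
print(p_value)
```

That still leaves open what I should be, which is the real question.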
But how do we define one such statistical variable? Also, I heard that sometimes people would conduct hypothesis tests for multiple variables. In that case, how do we know whether the new method is better than the reference one?
The difficulties raised by "gold standards" are logical, not
statistical. If you have more than one purpose in mind, it is
very easy to suggest that there might be more than one
"gold standard" test to meet the several purposes.
--
Rich Ulrich