Forum: >>> Magnum BBS <<<

Is a larger sample size always better?

From Cosine@21:1/5 to All on Mon Sep 24 00:28:09 2018

Hi:

If we can neglect the cost of collecting the samples for a statistical experiment, is it true that the larger the sample size, the better (e.g., higher statistical power)?

Thanks,

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Jones@21:1/5 to Cosine on Mon Sep 24 14:22:21 2018

The answer is Yes, No, and Maybe.

If you start with a design that has absolutely no power to detect what
you are looking for, then increasing the number of samples won't change
that. Similarly, if you have a complicated experiment looking at
several possible effects simultaneously, then there may be ways of
increasing the overall sample-size that affect the power for only some
of these effects.

Perhaps more practically, with "an experiment" there is the underlying
idea of having to hold constant a set of conditions that you are not
interested in or know nothing about. You can imagine that taking
many samples may take a long time, during which un-monitored conditions
may wander off.

One should also remember the question of quality-control of the data
and the effect of the number of samples on how well this can be done.
If you end up with more invalid data in a dataset, then the power can
go down.

But "Yes", if you have a carefully thought-out and well-organised
experimental set-up.

From Rich Ulrich@21:1/5 to [email protected] on Tue Sep 25 02:51:14 2018

Good answer.

I'll add a question or a caution about the question itself, and
a comment about observational studies, where the warnings
are different.

I have not used the term "statistical experiment" and I'm not sure
what it means. David is answering as if it refers to randomized,
controlled experiments, and not to ones that are "observational".

I think of a third sort that I might call a "statistical experiment,"
and that is the Monte Carlo study using only computers, equations,
and parameters. Yes, bigger N helps them.
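Rich's remark that bigger N helps a pure Monte Carlo study can be seen in a minimal Python sketch. Estimating pi from random points, and the name `estimate_pi`, are illustrative choices and not something from the thread. The reported standard error is the usual binomial one for the fraction of "hits", scaled by 4; it shrinks roughly like 1/sqrt(N), and nothing in the simulated world drifts while N grows.

```python
import math
import random


def estimate_pi(n: int, seed: int = 0) -> tuple[float, float]:
    """Estimate pi from n random points in the unit square (illustrative target).

    Returns (estimate, standard_error). The fraction of points falling inside
    the quarter circle estimates pi/4; its binomial standard error is scaled by 4.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    p_hat = hits / n
    se = 4.0 * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return 4.0 * p_hat, se


for n in (100, 10_000, 1_000_000):
    est, se = estimate_pi(n)
    print(f"N={n:>9,}: estimate={est:.4f}, Monte Carlo SE={se:.4f}")
```

Going from 100 to 1,000,000 points (10,000 times as many) should cut the Monte Carlo standard error by roughly a factor of 100.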

For observational studies, the extra concern lies in ruling out
competing hypotheses, rather than drawing a strong conclusion
about a very narrow version of the question. Broadening the
scope of the questions to use different comparisons can be more
useful than simply increasing the N for the original sample.

- Thus, an observational, clinical study using records (where
cost is no big issue) might use two or three control groups, with
different definitions, instead of inflating the N in just one group.

Political surveys seldom go beyond 1600 for a local sample,
since extraneous, uncontrolled sources of error are large
enough that the real precision of the estimate can't be
improved beyond what you get with 1600 (the actual error
of which is a bit larger than the formula for the SE indicates).
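One rough way to see the survey point is to treat the total error as the textbook sampling error plus a non-sampling component that does not shrink with N. The Python sketch below assumes, purely for illustration, a non-sampling standard deviation of 2 percentage points combined in quadrature with the sampling SE; both that figure and the additive model are hypothetical, not taken from the post.

```python
import math

Z_95 = 1.96  # two-sided 95% normal critical value


def sampling_moe(n: int, p: float = 0.5) -> float:
    """Textbook 95% margin of error for a proportion from a simple random sample."""
    return Z_95 * math.sqrt(p * (1.0 - p) / n)


def total_moe(n: int, p: float = 0.5, nonsampling_sd: float = 0.02) -> float:
    """95% margin of error after adding a HYPOTHETICAL non-sampling error term.

    nonsampling_sd is an illustrative assumption standing in for coverage,
    non-response, question wording, etc., which do not shrink as n grows.
    """
    sampling_sd = math.sqrt(p * (1.0 - p) / n)
    return Z_95 * math.sqrt(sampling_sd ** 2 + nonsampling_sd ** 2)


for n in (400, 1600, 6400, 25600):
    print(f"n={n:>6}: formula MoE={sampling_moe(n):.3f}, "
          f"with non-sampling error={total_moe(n):.3f}")
```

Under these assumed numbers, a 16-fold increase beyond 1600 barely moves the total margin, even though the formula alone says the sampling margin falls by a factor of four.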

--
Rich Ulrich

From Cosine@21:1/5 to All on Tue Sep 25 19:17:03 2018

Just thought of another question. Would it be true that taking a large sample size N makes those rare events more likely to happen? Consequently, this would make us more likely to reject H0 when actually we should accept it?

After all, in doing a hypothesis test, we assign H0, Ha, and alpha. Then we take samples and get a p-value. If this p-value is less than alpha/2, we say that the result of this sample reflects the happening of an event being rarer than the extremity.
Therefore we reject the H0. But isn't it true that with a large sample, more likely those rare events would happen?

From David Jones@21:1/5 to Cosine on Wed Sep 26 05:01:22 2018

In standard situations and "in theory", as the sample size increases,
the probability of rejecting the null hypothesis if the null hypothesis
is true stays fixed (at alpha), while the probability of rejecting the
null hypothesis if the null hypothesis is false will increase (for a
fixed "alternative hypothesis"). This conclusion depends on a number of
assumptions:

(a) the evaluation of the null distribution to get the critical values
for different sample sizes is not badly affected by numerical
approximations or inaccuracies that may lead to the true "alpha" for a
given nominal alpha fluctuating as the sample-size changes;

(b) the specification of the test statistic for any given sample size
should behave sensibly... for example, that you don't just ignore any
samples beyond some fixed sample-size, and that you don't choose a
radically different test-statistic formula as the sample size changes.
Since the general theory of hypothesis testing allows you to choose any
test statistic you like, it seems difficult to justify the conclusion
without coming up with a precise definition of "using essentially the
same test statistic as the sample-size changes" that can be applied in
general ... but for standard situations the meaning may seem clear.
Even for a fairly standard situation of using a likelihood ratio test
for a complicated non-normal model, there seems to be no formal
justification that the power of the test is strictly monotonic as the
sample-size changes. (This last may be wrong.);

(c) that the sorts of poor experimental conditions discussed for
your first question do not apply.
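For one textbook case where assumptions (a) to (c) plainly hold, a two-sided one-sample z-test with known standard deviation, the opening claim can be checked exactly with the standard normal power formula. In this Python sketch the function name `rejection_probability` and the effect size of 0.2 are illustrative choices. With no true effect the rejection probability stays at alpha for every n; with a fixed true effect it climbs towards 1 as n grows.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rejection_probability(n: int, effect: float, sigma: float = 1.0,
                          alpha: float = 0.05) -> float:
    """P(reject H0: mu = mu0) for a two-sided z-test with known sigma.

    effect is the true mean minus mu0. effect = 0 means H0 is true, so the
    result is the type I error rate; otherwise it is the power.
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect * math.sqrt(n) / sigma  # mean of the z statistic under the truth
    # Reject when |Z| > z_crit, where Z ~ Normal(shift, 1).
    return STD_NORMAL.cdf(-z_crit - shift) + (1.0 - STD_NORMAL.cdf(z_crit - shift))


for n in (10, 40, 160, 640):
    print(f"n={n:>4}: P(reject | H0 true)={rejection_probability(n, 0.0):.3f}, "
          f"power at illustrative effect 0.2={rejection_probability(n, 0.2):.3f}")
```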

Some special consideration needs to be given to your sentence: "But
isn't it true that with a large sample, more likely those rare events
would happen?" It is implicit in the theory that the probability
distributions involved fully take into account any "rare events".
Rejection of the null hypothesis happens if an event happens that would
be considered rare if the null model actually holds. What is considered
"rare" is determined by what you call "alpha": the probability that the
test statistic is greater than the critical value if the null hypothesis
is true. You are free to change "alpha" (as the sample size changes) if
you want to, as the theory doesn't specify that you should use any
particular value ... you need to choose something sensible for the
particular circumstances of the decision you are trying to make.
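The point that "rare" is defined relative to the null distribution can also be checked by simulation. The Python sketch below (sample sizes, seed, and replication count are arbitrary illustrative choices) draws many datasets for which H0 is exactly true, runs a two-sided z-test with known sigma on each, and records both how often p < alpha and how spread out the sample means are. As n grows the sample means cluster more tightly, and the cut-off for a "rare" sample mean tightens with them, so the false-rejection rate stays near alpha.

```python
import math
import random
from statistics import NormalDist, pstdev

STD_NORMAL = NormalDist()


def simulate_null(n: int, reps: int = 2000, alpha: float = 0.05,
                  seed: int = 1) -> tuple[float, float]:
    """Simulate a two-sided z-test (known sigma = 1) when H0: mean = 0 is true.

    Returns (fraction of datasets in which H0 was rejected,
             spread of the sample means across datasets).
    """
    rng = random.Random(seed)
    sample_means = []
    rejections = 0
    for _ in range(reps):
        mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        sample_means.append(mean)
        z = mean * math.sqrt(n)  # z statistic, since sigma = 1 is known
        p_value = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / reps, pstdev(sample_means)


# Sample sizes chosen arbitrarily for illustration.
for n in (10, 400):
    rate, spread = simulate_null(n)
    print(f"n={n:>3}: rejection rate under H0={rate:.3f}, SD of sample means={spread:.3f}")
```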

From Rich Ulrich@21:1/5 to All on Thu Sep 27 18:11:42 2018

On Tue, 25 Sep 2018 19:17:03 -0700 (PDT), Cosine <[email protected]>
wrote:

David has provided another broad and good commentary.
I will offer my briefer thoughts on the narrow question.

> Just thought of another question. Would it be true that taking
> a large sample size N makes those rare events more likely to happen?
> Consequently, this would make us more likely to reject H0 when
> actually we should accept it?

By the simple formula for standard error, clearly, the SE is
smaller for larger N. Thus, an estimate - including the estimate
of the rates of something rare in two groups - becomes more
/precise/ with a larger N.
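This can be made concrete with the usual standard-error formula for an observed proportion, sqrt(p(1 - p)/n), and its extension to the difference between two independent groups. In the Python sketch below, the 1% and 2% event rates and the function names are hypothetical choices for illustration. Each fourfold increase in the per-group n halves the standard error, so the same true gap between the groups stands out more clearly.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of an observed proportion p from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


def se_difference(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard error of p1 - p2 for two independent groups."""
    return math.sqrt(se_proportion(p1, n1) ** 2 + se_proportion(p2, n2) ** 2)


# Hypothetical rare-event rates in two groups.
rate_a, rate_b = 0.01, 0.02
for n in (500, 2000, 8000):
    se = se_difference(rate_a, n, rate_b, n)
    print(f"n per group={n:>5}: SE of difference={se:.4f}, "
          f"gap / SE={(rate_b - rate_a) / se:.2f}")
```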

> After all, in doing a hypothesis test, we assign H0, Ha, and alpha.
> Then we take samples and get a p-value. If this p-value is less
> than alpha/2, we say that the result of this sample reflects the
> happening of an event being rarer than the extremity.

This last seems to reflect differences in vocabulary or assumptions,
because it does not make good sense to me as it is. " ... rarer than
the extremity" is not the way I speak of tests. And "happening
of an event" is not the way I say anything about the test-result
of a trial.

"If this p-value is less than alpha/2, we say that" ... my ending,
picking an arbitrary test, would be: this t-value is more extreme than
we would expect by chance, under the null hypothesis.
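For a two-sided test, "p-value less than alpha/2" matches "more extreme than we would expect by chance" only if the p-value is read as the one-tailed area beyond the observed statistic. That area being below alpha/2 is the same decision as the usual two-sided p-value being below alpha, or as |statistic| exceeding the critical value. The Python sketch below checks that the three rules agree for a z statistic, used here as a stand-in for a t statistic because the Python standard library has no t distribution; the test values are arbitrary.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def decisions(z: float, alpha: float = 0.05) -> tuple[bool, bool, bool]:
    """Three equivalent two-sided rejection rules for an observed z statistic.

    z is used as a stand-in for a t statistic (no t distribution in the stdlib).
    """
    tail_area = 1.0 - STD_NORMAL.cdf(abs(z))        # one-tailed area beyond |z|
    two_sided_p = 2.0 * tail_area                   # the usual two-sided p-value
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return tail_area < alpha / 2.0, two_sided_p < alpha, abs(z) > z_crit


for z in (0.5, 1.5, 1.9, 2.0, -2.5, 3.0):
    print(z, decisions(z))
```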

> Therefore we reject the H0. But isn't it true that with a large sample,
> more likely those rare events would happen?

Do you have a particular H0 and Ha in mind?

--
Rich Ulrich

From [email protected]@21:1/5 to All on Tue Oct 30 20:29:05 2018

I have the same question, indeed. Intuitively, I would like to say 'Yes'. But you will never know what is going on in the magic real world.

From DTowey@21:1/5 to David Jones on Fri Feb 1 09:50:31 2019

Basic illustrative examples:

I have a billion units of digital accounting records needing to be verified for correctness. I choose to select a random sample of 300. My end result is no errors discovered in the 300. What can I conclude with generally accepted statistical accuracy?

The same facts with a random sample of 600? 1200? 2400? etc.

The same facts with a random sample of only 75?

The correct answers to these examples will answer the general question, and also allow you to begin to see an answer to what the minimum sample size N is in general.

Douglas

From DTowey@21:1/5 to Cosine on Wed Jul 31 08:57:30 2019

Six months have now gone by. How sad.

You can conclude from a sample of 300, with 95% confidence, that the
error rate among the one billion units is less than 1%. The same holds
for one million, or one trillion, or whatever.

From a sample of 600, same 95% confidence, less than 1/2% error.

From a sample of 1200, same 95% confidence, less than 1/4% error.

From a sample of 2400, same 95% confidence, less than 1/8% error.

Finally, from a sample of 75, same 95% confidence, less than about 4%
error.

It is possible there is a pattern here.
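The pattern matches the exact zero-failure binomial bound. If no errors turn up in n randomly sampled records, the one-sided 95% upper confidence limit on the error rate p solves (1 - p)^n = 0.05, giving p = 1 - 0.05^(1/n), which is just under 3/n (the "rule of three"). The Python sketch below assumes sampling with replacement, or a population so large relative to n that the difference does not matter; under that assumption the population size (a million, a billion, a trillion) drops out of the calculation.

```python
def zero_error_upper_bound(n: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence limit on the error rate after 0 errors in n draws.

    Solves (1 - p) ** n == 1 - confidence for p (exact binomial, zero failures).
    Assumes sampling with replacement, or a population much larger than n.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


for n in (75, 150, 300, 600, 1200, 2400):
    exact = zero_error_upper_bound(n)
    print(f"n={n:>4}: exact bound={exact:.4%}, rule of three 3/n={3 / n:.4%}")
```

Doubling n roughly halves the bound, which is the pattern in the figures above.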

General conclusion: More is better, up to a practical maximum.

For different reasons, around 150 is the practical minimum if you
need at least a minimally reliable meaning from your sampling.

Douglas
