• realloc() - frequency, conditions, or experiences about relocation?

    From Janis Papanagnou@21:1/5 to All on Mon Jun 17 08:08:07 2024
    In a recent thread realloc() was a substantial part of the discussion. "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a
    performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.

    Janis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to Janis Papanagnou on Mon Jun 17 10:18:40 2024
    Janis Papanagnou <[email protected]> writes:

    In a recent thread realloc() was a substantial part of the discussion. "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.

    There is obviously a cost, but there is (usually) no alternative if
    contiguous storage is required. In practice, the cost is usually
    moderate and can be very effectively managed by using an exponential
    allocation scheme: at every reallocation multiply the storage space by
    some factor greater than 1 (I often use 3/2, but doubling is often used
    as well). This results in O(log(N)) rather than O(N) allocations as in
    your code that added a constant to the size. Of course, some storage is
    wasted (that /might/ be retrieved by a final realloc down to the final
    size) but that's rarely significant.
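    The O(log(N)) versus O(N) difference is easier to see in a small C model. It only simulates the capacity sequence and allocates nothing. The function names, the 16-byte starting capacity and the 16-byte additive step are illustrative assumptions, not taken from the thread.

```c
#include <stddef.h>

/* Model only: count the reallocations needed to grow a buffer until it
   holds at least n bytes.  Nothing is actually allocated. */

/* Additive policy: add a fixed step (> 0) each time the buffer is full. */
size_t reallocs_additive(size_t n, size_t start, size_t step)
{
    size_t cap = start, count = 0;
    while (cap < n) {
        cap += step;                     /* about n / step iterations */
        count++;
    }
    return count;
}

/* Geometric policy: multiply the capacity by num/den (> 1) when full. */
size_t reallocs_geometric(size_t n, size_t start, size_t num, size_t den)
{
    size_t cap = start, count = 0;
    while (cap < n) {
        size_t next = cap / den * num;   /* e.g. num=3, den=2 for 3/2 */
        if (next <= cap)                 /* keep tiny capacities moving */
            next = cap + 1;
        cap = next;                      /* about log(n) iterations */
        count++;
    }
    return count;
}
```

    With n = 1,000,000 and a start of 16, the additive policy needs 62,499 reallocations. Doubling needs 16, and the integer 3/2 policy needs 28.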

    --
    Ben.

  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Mon Jun 17 10:55:33 2024
    Malcolm McLean <[email protected]> writes:

    On 17/06/2024 10:18, Ben Bacarisse wrote:
    Janis Papanagnou <[email protected]> writes:

    In a recent thread realloc() was a substantial part of the discussion.
    "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a
    performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.
    There is obviously a cost, but there is (usually) no alternative if
    contiguous storage is required. In practice, the cost is usually
    moderate and can be very effectively managed by using an exponential
    allocation scheme: at every reallocation multiply the storage space by
    some factor greater than 1 (I often use 3/2, but doubling is often used
    as well). This results in O(log(N)) rather than O(N) allocations as in
    your code that added a constant to the size. Of course, some storage is
    wasted (that /might/ be retrieved by a final realloc down to the final
    size) but that's rarely significant.

    So can we work it out?

    What is "it"?

    Let's assume for the moment that the allocations have a semi-normal distribution,

    What allocations? The allocations I talked about don't have that
    distribution.

    with negative values disallowed. Now ignoring the first few
    values, if we have allocated, say, 1K, we ought to be able to predict the value by integrating the distribution from 1k to infinity and taking the mean.

    I have no idea what you are talking about. What "value" are you looking
    to calculate?

    --
    Ben.

  • From David Brown@21:1/5 to Janis Papanagnou on Mon Jun 17 14:15:11 2024
    On 17/06/2024 08:08, Janis Papanagnou wrote:
    In a recent thread realloc() was a substantial part of the discussion. "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.

    Janis

    Consider your target audience and their hardware, the target OS, and the realistic size of your data. If the target is a PC, you can happily
    malloc tens of MB at the start without a care, and for systems that do
    not actually allocate system memory until you try to access the area,
    there is no cost to this.

    So in many situations where you are reading and parsing data from a
    file, you can just do the initial malloc with more than enough space for
    any realistic input file. You might still implement a realloc solution
    for occasional extreme uses, and because it is nice to avoid artificial
    limits for programs, but efficiency matters a lot less in those cases.
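    The following is a minimal sketch of that approach for reading a whole stream. It starts with a generous initial capacity and falls back to doubling only if the input turns out to be larger. At the end it optionally shrinks the buffer to the exact size, as Ben suggested earlier in the thread. The function name read_all and the initial_cap parameter are hypothetical. Whether the untouched part of a large initial buffer costs anything depends on the OS.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read everything from 'in' into one heap buffer.  initial_cap is a
   generous guess; the loop only falls back to doubling if the input is
   larger.  Returns the buffer (caller frees) and sets *len_out, or
   returns NULL on allocation or read error. */
char *read_all(FILE *in, size_t initial_cap, size_t *len_out)
{
    size_t cap = initial_cap ? initial_cap : 1;
    size_t len = 0;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;

    for (;;) {
        if (len == cap) {                     /* full: double */
            if (cap > SIZE_MAX / 2) { free(buf); return NULL; }
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        size_t got = fread(buf + len, 1, cap - len, in);
        len += got;
        if (got == 0) {
            if (ferror(in)) { free(buf); return NULL; }
            break;                            /* end of input */
        }
    }

    /* Give back the unused tail.  A shrinking realloc may still move the
       block, so keep its result; if it fails, the old block stays valid. */
    if (len > 0 && len < cap) {
        char *tmp = realloc(buf, len);
        if (tmp)
            buf = tmp;
    }
    *len_out = len;
    return buf;
}
```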

    It may also be the case that, even if realloc returns a different address
    and logically copies a lot of data, this is done by smarter virtual
    memory mapping so that only the mapping changes, and the underlying
    physical RAM does not need to be copied. But I don't know if OSes and
    realloc implementations are smart enough to do that.
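    On Linux, at least, this mechanism exists. mremap(2) with MREMAP_MAYMOVE can grow a mapping and, if needed, move it to a new virtual address by rewriting page tables rather than copying the bytes. As far as I know, glibc's realloc takes this path for large blocks it obtained with mmap (requests above its mmap threshold), but check this for your libc version. The sketch below calls mremap directly. It is Linux-specific, and the function name is hypothetical.

```c
#define _GNU_SOURCE            /* mremap() and MREMAP_MAYMOVE are Linux-specific */
#include <string.h>
#include <sys/mman.h>

/* Create an anonymous mapping of old_len bytes, fill it with a pattern,
   then grow it to new_len with mremap(MREMAP_MAYMOVE).  The kernel may
   move the mapping to a new virtual address by remapping pages; the
   data comes along without a user-space copy.
   Returns 1 if the pattern survived, 0 if not, -1 on error. */
int grow_mapping_demo(size_t old_len, size_t new_len)
{
    void *p = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xAB, old_len);

    void *q = mremap(p, old_len, new_len, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) {
        munmap(p, old_len);
        return -1;
    }

    /* The first old_len bytes must be unchanged, wherever q now is. */
    const unsigned char *c = q;
    int ok = 1;
    for (size_t i = 0; i < old_len; i++)
        if (c[i] != 0xAB) { ok = 0; break; }

    munmap(q, new_len);
    return ok;
}
```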

  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Mon Jun 17 15:21:24 2024
    On 17.06.2024 08:08, Janis Papanagnou wrote:
    In a recent thread realloc() was a substantial part of the discussion. "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.

    Let me add...

    I'd assume that there's some basic allocation size defined; some
    simple test sample with a handful of bytes didn't relocate the data.
    Yet I don't know whether allocated memory is managed sequentially
    or has linked blocks. A peek into the source code might help. What
    I found is this comment for extending chunks:[*]
    * Extending forward into following adjacent free chunk.
    * Shifting backwards, joining preceding adjacent space
    * Both shifting backwards and extending forward.
    * Extending into newly sbrked space
    Going to investigate that source code[*] later...

    Janis

    [*] https://elixir.bootlin.com/glibc/glibc-2.1.2/source/malloc/malloc.c
    (line 3077 ff)
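    Before reading the glibc source, an experiment like the "simple test sample" above can show how often a given allocator moves a growing block. This is only a sketch. The function name and parameters are hypothetical. The result depends entirely on the allocator, its version, and what else lives on the heap; a lone growing block can often be extended in place. The old address is saved as a uintptr_t before each call, because using a pointer value after realloc has freed that block is undefined behaviour.

```c
#include <stdint.h>
#include <stdlib.h>

/* Grow one block from 'start' to at least 'limit' bytes, multiplying the
   requested size by 'factor' each step, and count how many realloc()
   calls returned a different address (i.e. relocated the data).
   *calls receives the number of realloc() calls.  Returns -1 on
   allocation failure. */
long count_relocations(size_t start, size_t limit, double factor, long *calls)
{
    size_t size = start;
    char *p = malloc(size);
    if (!p)
        return -1;

    long moved = 0;
    *calls = 0;
    while (size < limit) {
        size_t new_size = (size_t)(size * factor) + 1;
        uintptr_t before = (uintptr_t)p;     /* compare integers, not stale pointers */
        char *q = realloc(p, new_size);
        if (!q) { free(p); return -1; }
        if ((uintptr_t)q != before)
            moved++;
        p = q;
        size = new_size;
        (*calls)++;
    }
    free(p);
    return moved;
}
```

    Interleaving other malloc() calls between the reallocations is a simple way to make the numbers less optimistic than this best case.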

  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Mon Jun 17 15:33:47 2024
    Malcolm McLean <[email protected]> writes:

    On 17/06/2024 10:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 17/06/2024 10:18, Ben Bacarisse wrote:
    Janis Papanagnou <[email protected]> writes:

    In a recent thread realloc() was a substantial part of the discussion.
    "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a
    performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.
    There is obviously a cost, but there is (usually) no alternative if
    contiguous storage is required. In practice, the cost is usually
    moderate and can be very effectively managed by using an exponential
    allocation scheme: at every reallocation multiply the storage space by
    some factor greater than 1 (I often use 3/2, but doubling is often used
    as well). This results in O(log(N)) rather than O(N) allocations as in
    your code that added a constant to the size. Of course, some storage is
    wasted (that /might/ be retrieved by a final realloc down to the final
    size) but that's rarely significant.

    So can we work it out?
    What is "it"?

    Let's assume for the moment that the allocations have a semi-normal
    distribution,
    What allocations? The allocations I talked about don't have that
    distribution.

    with negative values disallowed. Now ignoring the first few
    values, if we have allocated, say, 1K, we ought to be able to predict the
    value by integrating the distribution from 1k to infinity and taking the
    mean.
    I have no idea what you are talking about. What "value" are you looking
    to calculate?

    We have a continuously growing buffer, and we want the best strategy for reallocations as the stream of characters comes at us. So, given we know how many characters have arrived, can we predict how many will arrive, and therefore ask for the best amount when we reallocate, so that we neither
    make too many reallocations (reallocate on every byte received) or ask for
    too much (demand SIZE_MAX memory when the first byte is received)?

    Obviously not, or we'd use the prediction. Your question was probably rhetorical, but it didn't read that way.

    Your strategy for avoiding these extremes is exponential growth.

    It's odd to call it mine. It's very widely known and used. "The one I mentioned" might be a less confusing description.

    You
    allocate a small amount for the first few bytes. Then you use exponential growth, with a factor of either 2 or 1.5. My question is whether or not we
    can be cuter. And of course we need to know the statistical distribution of the input files. And I'm assuming a semi-normal distribution, ignoring the files with small values, which we will allocate enough for anyway.

    And so we integrate the distribution between the point we are at and infinity. Then we take the mean. And that gives us a best estimate of how many bytes are to come, and therefore how much to grow the buffer by.

    I would be surprised if that were worth the effort at run time. A
    static analysis of "typical" input sizes might be interesting as that
    could be used to get an estimate of good factors to use, but anything
    more complicated than maybe a few factors (e.g. doubling up to 1MB then
    3/2 thereafter) is likely to be too messy to be useful.
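    Ben's "few factors" idea (doubling up to 1MB, then 3/2) fits in a small capacity function. In this sketch, the name next_capacity, the 64-byte minimum and the overflow handling are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Double while the buffer is below 1 MiB, then grow by 3/2.
   Returns a new capacity larger than cap, or 0 if it would overflow. */
size_t next_capacity(size_t cap)
{
    const size_t threshold = (size_t)1 << 20;   /* 1 MiB */
    const size_t min_cap = 64;                   /* hypothetical floor */

    if (cap < min_cap)
        return min_cap;
    if (cap < threshold)
        return cap <= SIZE_MAX / 2 ? cap * 2 : 0;

    size_t extra = cap / 2;                      /* grow by 3/2 */
    return cap <= SIZE_MAX - extra ? cap + extra : 0;
}
```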

    Also, the cost of reallocations is not constant. Larger ones are
    usually more costly than small ones, so if one were going to a lot of
    effort to make run-time guesses, that cost should be factored in as
    well.
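    Malcolm's proposal to integrate the distribution from the current size to infinity and take the mean amounts to estimating the conditional mean E[X | X > s]. Here X is the final size and s is the number of bytes already received. The sketch below estimates it from a sample of previously observed final sizes. The sample and the function name are hypothetical. The estimate is only as good as the sample, and for a heavy-tailed distribution it may not settle at all.

```c
#include <stddef.h>

/* Empirical E[X | X > current]: the mean of the observed final sizes
   that exceed 'current'.  If none do, the sample offers no guidance and
   'current' itself is returned. */
double conditional_mean_size(const double *sizes, size_t n, double current)
{
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] > current) {
            sum += sizes[i];
            count++;
        }
    }
    return count ? sum / (double)count : current;
}
```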

    --
    Ben.

  • From Anton Shepelev@21:1/5 to All on Mon Jun 17 18:02:49 2024
    XPost: sci.stat.math

    [cross-posted to: sci.stat.math]

    Malcolm McLean:

    We have a continuously growing buffer, and we want the
    best strategy for reallocations as the stream of
    characters comes at us. So, given we know how many
    characters have arrived, can we predict how many will
    arrive,

    Do you mean in the next bunch, or in total (till the end of
    the buffer's lifetime)?

    and therefore ask for the best amount when we reallocate,
    so that we neither make too many reallocations (reallocate
    on every byte received) or ask for too much (demand
    SIZE_MAX memory when the first byte is received)?

    Your strategy for avoiding these extremes is exponential
    growth. You allocate a small amount for the first few
    bytes. Then you use exponential growth, with a factor of
    either 2 or 1.5.

    This strategy ensures a constant ratio between the amount of
    reallocated data and the length of the buffer by making
    reallocations less frequent as the buffer grows.
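    The constant-ratio claim can be checked with a short argument plus a model. Assume every reallocation copies the whole old capacity and capacities grow by a factor r. The copied amounts then form a geometric series whose largest term is below N, so the total is below N*r/(r-1). That bound is 2N for doubling and 3N for 3/2. The model below assumes worst-case copying on every call; a realloc that extends in place copies nothing. The function name is hypothetical.

```c
#include <stddef.h>

/* Model a buffer that grows until it holds n bytes, starting at capacity
   'start' and multiplying by num/den when full.  Worst case: every
   reallocation copies the full old capacity.  Returns total bytes copied. */
unsigned long long bytes_copied(size_t n, size_t start, size_t num, size_t den)
{
    unsigned long long copied = 0;
    size_t cap = start;
    while (cap < n) {
        copied += cap;                  /* realloc moves 'cap' live bytes */
        size_t next = cap / den * num;
        if (next <= cap)
            next = cap + 1;
        cap = next;
    }
    return copied;
}
```

    For N = 1,000,000 starting at 16, doubling copies 1,048,560 bytes, which is under 2N. The 3/2 factor copies roughly 2.68 million bytes, which is under 3N. In both cases the cost per stored byte is bounded by a constant.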

    And so we integrate the distribution between the point we
    are at and infinity. Then we take the mean. And that gives
    us a best estimate of how many bytes are to come, and
    therefore how much to grow the buffer by.

    You have an a priori distribution of the buffer size (can be
    tracked on-the-fly, if unknown beforehand) and a partially
    filled buffer. The task is to calculate the a posteriori
    distribution of /that/ buffer's final size, and then to
    allocate the predicted value based on a good percentile.

    How about using a percentile instead of the mean, e.g. if
    the current size corresponds to percentile p, you allocate a
    capacity corresponding to percentile 1-(1-p)/k, where k > 1
    denotes the balance between space and time efficiency. For
    example, if the buffer has reached the 60th percentile
    and k=2, you allocate a capacity sufficient to hold
    100-(100-60)/2 = 80% of buffers.
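    The percentile rule can be sketched with an empirical distribution. Given a sorted sample of previously observed final sizes, find the fraction p that do not exceed the current size, then return the size at percentile 1-(1-p)/k. The sample, the function name, and the fallback when the sample has nothing larger are assumptions of this sketch.

```c
#include <stddef.h>

/* 'sorted' holds n previously observed final sizes in ascending order.
   If 'current' sits at percentile p of that sample, return the size at
   percentile 1 - (1 - p) / k, where k > 1 trades space for fewer
   reallocations.  If the sample has nothing larger than 'current',
   'current' is returned and the caller needs another fallback. */
size_t percentile_capacity(const size_t *sorted, size_t n,
                           size_t current, double k)
{
    if (n == 0 || k <= 1.0)
        return current;

    size_t le = 0;                       /* samples <= current */
    while (le < n && sorted[le] <= current)
        le++;
    double p = (double)le / (double)n;
    double q = 1.0 - (1.0 - p) / k;      /* target percentile in [0, 1] */

    /* 1-based rank ceil(q * n), tolerant of tiny rounding errors. */
    double r = q * (double)n;
    size_t idx = (size_t)r;
    if ((double)idx < r - 1e-9)
        idx++;
    if (idx < 1)
        idx = 1;
    if (idx > n)
        idx = n;

    size_t target = sorted[idx - 1];
    return target > current ? target : current;
}
```

    Take sizes 10, 20, ..., 100, a current size of 60 (p = 0.6) and k = 2. The function returns 80, which matches the 80% figure in the example above.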

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

  • From Anton Shepelev@21:1/5 to All on Mon Jun 17 18:10:34 2024
    Ben Bacarisse to Malcolm McLean:

    We have a continuously growing buffer, and we want the
    best strategy for reallocations as the stream of
    characters comes at us. So, given we know how many
    characters have arrived, can we predict how many will
    arrive, and therefore ask for the best amount when we
    reallocate, so that we neither make too many
    reallocations (reallocate on every byte received) or ask
    for too much (demand SIZE_MAX memory when the first byte
    is received)?

    Obviously not, or we'd use the prediction.

    Not so obvious to me, for the exponential algorithm may be
    the best when the distribution of buffer size is /not/
    known, whereas Malcolm is interested in the cases when we
    know it.

    Your strategy for avoiding these extremes is exponential
    growth.

    It's odd to call it mine. It's very widely known and used.
    "The one I mentioned" might be a less confusing description.

    I think it is a modern English idiom, which I dislike as
    well. StackOverflow is full of questions starting like:
    "How do you do this?" and "How do I do that?" They are
    informal ways of the more literary "How does one do this?"
    or "What is the way to do that?"

  • From Richard Harnden@21:1/5 to Ben Bacarisse on Mon Jun 17 16:15:27 2024
    On 17/06/2024 15:33, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 17/06/2024 10:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    On 17/06/2024 10:18, Ben Bacarisse wrote:
    Janis Papanagnou <[email protected]> writes:

    In a recent thread realloc() was a substantial part of the discussion.
    "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a
    performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.
    There is obviously a cost, but there is (usually) no alternative if
    contiguous storage is required. In practice, the cost is usually
    moderate and can be very effectively managed by using an exponential
    allocation scheme: at every reallocation multiply the storage space by
    some factor greater than 1 (I often use 3/2, but doubling is often used
    as well). This results in O(log(N)) rather than O(N) allocations as in
    your code that added a constant to the size. Of course, some storage is
    wasted (that /might/ be retrieved by a final realloc down to the final
    size) but that's rarely significant.

    So can we work it out?
    What is "it"?

    Let's assume for the moment that the allocations have a semi-normal
    distribution,
    What allocations? The allocations I talked about don't have that
    distribution.

    with negative values disallowed. Now ignoring the first few
    values, if we have allocated, say, 1K, we ought to be able to predict the
    value by integrating the distribution from 1k to infinity and taking the
    mean.
    I have no idea what you are talking about. What "value" are you looking
    to calculate?

    We have a continuously growing buffer, and we want the best strategy for
    reallocations as the stream of characters comes at us. So, given we know how
    many characters have arrived, can we predict how many will arrive, and
    therefore ask for the best amount when we reallocate, so that we neither
    make too many reallocations (reallocate on every byte received) or ask for
    too much (demand SIZE_MAX memory when the first byte is received)?

    Obviously not, or we'd use the prediction. Your question was probably rhetorical, but it didn't read that way.

    Your strategy for avoiding these extremes is exponential growth.

    It's odd to call it mine. It's very widely known and used. "The one I mentioned" might be a less confusing description.

    You
    allocate a small amount for the first few bytes. Then you use exponential
    growth, with a factor of either 2 or 1.5. My question is whether or not we
    can be cuter. And of course we need to know the statistical distribution of
    the input files. And I'm assuming a semi-normal distribution, ignoring the
    files with small values, which we will allocate enough for anyway.

    And so we integrate the distribution between the point we are at and
    infinity. Then we take the mean. And that gives us a best estimate of how
    many bytes are to come, and therefore how much to grow the buffer by.

    I would be surprised if that were worth the effort at run time. A
    static analysis of "typical" input sizes might be interesting as that
    could be used to get an estimate of good factors to use, but anything
    more complicated than maybe a few factors (e.g. doubling up to 1MB then
    3/2 thereafter) is likely to be too messy to be useful.

    Also, the cost of reallocations is not constant. Larger ones are
    usually more costly than small ones, so if one were going to a lot of
    effort to make run-time guesses, that cost should be factored in as
    well.


    I usually keep track:

    struct buffer
    {
        size_t used;      /* bytes of data currently stored */
        size_t allocated; /* bytes obtained from malloc/realloc */
        void *data;
    };

    Then, if used + new_size is more than what's already been allocated then
    a realloc will be required.

    Start with an initial allocated size that's 'reasonable' - the happy path
    will never need any reallocs.

    Otherwise multiply by some factor. Typically I just double it.
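    Here is a sketch of the append path for that bookkeeping, using the doubling policy from the post. The name buffer_append and the 64-byte first allocation are hypothetical. Because realloc(NULL, n) behaves like malloc(n), an all-zero struct works as an empty buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct buffer {
    size_t used;       /* bytes currently stored */
    size_t allocated;  /* bytes obtained from realloc */
    void *data;
};

/* Append n bytes from src, doubling the allocation until it fits.
   Returns 0 on success, -1 on failure (buffer left unchanged). */
int buffer_append(struct buffer *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > SIZE_MAX - b->used)
        return -1;                           /* size would overflow */

    size_t need = b->used + n;
    if (need > b->allocated) {
        size_t cap = b->allocated ? b->allocated : 64;  /* hypothetical first size */
        while (cap < need) {
            if (cap > SIZE_MAX / 2)
                return -1;
            cap *= 2;
        }
        void *p = realloc(b->data, cap);     /* realloc(NULL, cap) acts as malloc */
        if (!p)
            return -1;
        b->data = p;
        b->allocated = cap;
    }
    memcpy((char *)b->data + b->used, src, n);
    b->used = need;
    return 0;
}
```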

  • From Scott Lurndal@21:1/5 to Janis Papanagnou on Mon Jun 17 16:50:07 2024
    Janis Papanagnou <[email protected]> writes:
    In a recent thread realloc() was a substantial part of the discussion.
    "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a
    performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.

    I've not found a use for realloc in the last forty five years, myself.

    I suspect that the performance issues are not an issue for relatively
    small datasets, and are often exhibited during the non-performance critical 'setup' phase of an algorithm.

  • From Scott Lurndal@21:1/5 to Malcolm McLean on Mon Jun 17 16:58:52 2024
    Malcolm McLean <[email protected]> writes:
    On 17/06/2024 10:55, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:



    I have no idea what you are talking about. What "value" are you looking
    to calculate?

    We have a continuously growing buffer,

    At this point, you should be asking yourself
    if there are better alternatives for storing
    the incoming data than to a continuously growing
    dynamically allocated piecemeal buffer.

    C character stdio tends to work well for streaming applications
    (i.e. pipelines where the input is (minimally) processed and forwarded
    to the output), but not so efficiently for applications that need to
    look at the data en masse.

    Personally, I'd mmap the input file and eschew stdio completely
    and just walk through memory with the appropriate pointer.

    (mmap showed up in the late 80s, so you can pretend it
    is C90 if you like).
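    Here is a minimal sketch of that approach on a POSIX system (mmap is POSIX, not ISO C). It maps the file read-only and walks it with a pointer, in this case just counting newlines. The function name is hypothetical, and error handling is kept short.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and walk it with a plain pointer, counting
   newline characters.  No stdio and no growing buffer: the kernel pages
   the file in on demand.  Returns the count, or -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  /* mmap of length 0 fails */

    size_t len = (size_t)st.st_size;
    const char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    long lines = 0;
    for (const char *p = base, *end = base + len; p < end; p++)
        if (*p == '\n')
            lines++;

    munmap((void *)base, len);
    return lines;
}
```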

  • From David Brown@21:1/5 to Malcolm McLean on Mon Jun 17 20:11:48 2024
    On 17/06/2024 11:31, Malcolm McLean wrote:
    On 17/06/2024 10:18, Ben Bacarisse wrote:
    Janis Papanagnou <[email protected]> writes:

    In a recent thread realloc() was a substantial part of the discussion.
    "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a
    performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.

    There is obviously a cost, but there is (usually) no alternative if
    contiguous storage is required.  In practice, the cost is usually
    moderate and can be very effectively managed by using an exponential
    allocation scheme: at every reallocation multiply the storage space by
    some factor greater than 1 (I often use 3/2, but doubling is often used
    as well).  This results in O(log(N)) rather than O(N) allocations as in
    your code that added a constant to the size.  Of course, some storage is
    wasted (that /might/ be retrieved by a final realloc down to the final
    size) but that's rarely significant.

    So can we work it out?

    Let's assume for the moment that the allocations have a semi-normal distribution, with negative values disallowed. Now ignoring the first
    few values, if we have allocated, say, 1K, we ought to be able to
    predict the value by integrating the distribution from 1k to infinity
    and taking the mean.


    First, there is no reason for assuming such a distribution, other than
    saying "lots of things are roughly normal".

    Secondly, knowing the distribution gives you /no/ information about any
    given particular case. You know the distribution for the results of
    rolling two dice - does that mean you can predict the next roll?

    Thirdly, not all distributions have a mean (look up the Cauchy
    distribution if you like).

    Fourthly, even if you know the mean, it tells you nothing of use.


    Knowing a bit about the distribution of file sizes can be useful, but
    not nearly in the way you describe here. If you know that the files are
    rarely or never bigger than 10 MB, malloc 10 MB and forget the realloc.
    If you know they are often bigger than that, mmap the file and forget
    the realloc.

  • From Michael S@21:1/5 to Scott Lurndal on Mon Jun 17 20:20:57 2024
    On Mon, 17 Jun 2024 16:50:07 GMT
    [email protected] (Scott Lurndal) wrote:

    Janis Papanagnou <[email protected]> writes:
    In a recent thread realloc() was a substantial part of the
    discussion. "Occasionally" the increased data storage will be
    relocated along with the previously stored data. On huge data sets
    that might be a performance factor. Is there any experience or are
    there any concrete factors about the conditions when this relocation
    happens? - I could imagine that it's no issue as long as you're in
    some kB buffer range, but if, say, we're using realloc() to
    substantially increase buffers often it might be an issue to
    consider. It would be good to get some feeling about that internal.

    I've not found a use for realloc in the last forty five years, myself.


    Did you find a use for std::vector::resize()?
    If yes, that could be a major reason behind not finding a use for realloc(). Another possible reason is coding for environments where dynamic
    allocation is either not used at all or used only during start up.

    At least for me those are the major reasons why I have very rarely used realloc
    since I started programming professionally.

    I suspect that the performance issues are not an issue for relatively
    small datasets, and are often exhibited during the non-performance
    critical 'setup' phase of an algorithm.

  • From Scott Lurndal@21:1/5 to Michael S on Mon Jun 17 19:02:13 2024
    Michael S <[email protected]> writes:
    On Mon, 17 Jun 2024 16:50:07 GMT
    [email protected] (Scott Lurndal) wrote:

    Janis Papanagnou <[email protected]> writes:
    In a recent thread realloc() was a substantial part of the
    discussion. "Occasionally" the increased data storage will be
    relocated along with the previously stored data. On huge data sets
    that might be a performance factor. Is there any experience or are
    there any concrete factors about the conditions when this relocation
    happens? - I could imagine that it's no issue as long as you're in
    some kB buffer range, but if, say, we're using realloc() to
    substantially increase buffers often it might be an issue to
    consider. It would be good to get some feeling about that internal.

    I've not found a use for realloc in the last forty five years, myself.


    Did you find a use for std::vector::resize()?

    I'm pretty sure (checks) that I posted this reply to comp.lang.c.

    std::vector::resize() doesn't work well from C (well, I can mangle
    the names and use an explicit this pointer, but why bother?).

    If yes, that could be a major reason behind not finding a use for realloc().
    Another possible reason is coding for environments where dynamic
    allocation is either not used at all or used only during start up.

    Or because the algorithms used don't call for realloc. Or there
    are better alternatives (like mmap).

  • From Tim Rentsch@21:1/5 to Anton Shepelev on Tue Jun 18 00:09:24 2024
    Anton Shepelev <anton.txt@g{oogle}mail.com> writes:

    Ben Bacarisse to Malcolm McLean:

    [next is a comment from Malcolm]

    Your strategy for avoiding these extremes is exponential
    growth.

    It's odd to call it mine. It's very widely known and used.
    "The one I mentioned" might be a less confusing description.

    I think it is a modern English idiom, which I dislike as
    well. StackOverflow is full of questions starting like:
    "How do you do this?" and "How do I do that?" They are
    informal ways of the more literary "How does one do this?"
    or "What is the way to do that?"

    I have a different take here. First the "your" of "your
    strategy" reads as a definite pronoun, meaning it refers
    specifically to Ben and not to some unknown other party.
    (And incidentally is subtly insulting because of that,
    whether it was meant that way or not.)

    Second the use of "you" to mean an unspecified other person
    is not idiom but standard usage. The word "you" is both a
    definite pronoun and an indefinite pronoun, depending on
    context. The word "they" also has this property. Consider
    these two examples:

    The bank downtown was robbed. They haven't been caught
    yet.

    They say the sheriff isn't going to run for re-election.

    In the first example "they" is a definite pronoun, referring
    to the people who robbed the bank. In the second example,
    "they" is an indefinite pronoun, referring to unspecified
    people in general (perhaps but not necessarily everyone).
    The word "you" is similar: it can mean specifically the
    listener, or it can mean generically anyone in a broader
    audience, even those who never hear or read the statement
    with "you" in it.

    The word "one" used as a pronoun is more formal, and to me
    at least often sounds stilted. In US English "one" is most
    often an indefinite pronoun, either second person or third
    person. But "one" can also be used as a first person
    definite pronoun (referring to the speaker), which an online
    reference tells me is chiefly British English. (I would
    guess that this usage predominates in "the Queen's English"
    dialect of English, but I have very little experience in
    such things.)

    Finally I would normally read "I" as a first person definite
    pronoun, and not an indefinite pronoun. So I don't have any
    problem with someone saying "how should I ..." when asking
    for advice. They aren't asking how someone else should ...
    but how they should ..., and what advice I might give could
    very well depend on who is doing the asking.

  • From Rosario19@21:1/5 to [email protected] on Tue Jun 18 11:50:48 2024
    On Mon, 17 Jun 2024 08:08:07 +0200, Janis Papanagnou <[email protected]> wrote:

    In a recent thread realloc() was a substantial part of the discussion.
    "Occasionally" the increased data storage will be relocated along
    with the previously stored data. On huge data sets that might be a
    performance factor. Is there any experience or are there any concrete
    factors about the conditions when this relocation happens? - I could
    imagine that it's no issue as long as you're in some kB buffer range,
    but if, say, we're using realloc() to substantially increase buffers
    often it might be an issue to consider. It would be good to get some
    feeling about that internal.

    Janis

    The only problem I see is that memory which is already free should be
    the first to be used, i.e. returned by malloc or realloc, because that
    memory is already in a good position near the CPU.

  • From David Jones@21:1/5 to Anton Shepelev on Tue Jun 18 17:59:30 2024
    XPost: sci.stat.math

    Anton Shepelev wrote:

    [cross-posted to: sci.stat.math]

    Malcolm McLean:

    We have a continuously growing buffer, and we want the
    best strategy for reallocations as the stream of
    characters comes at us. So, given we know how many
    characters have arrived, can we predict how many will
    arrive,

    Do you mean in the next bunch, or in total (till the end of
    the buffer's lifetime)?

    and therefore ask for the best amount when we reallocate,
    so that we neither make too many reallocations (reallocate
    on every byte received) nor ask for too much (demand
    SIZE_MAX memory when the first byte is received)?

    Your strategy for avoiding these extremes is exponential
    growth. You allocate a small amount for the first few
    bytes. Then you use exponential growth, with a factor of
    either 2 or 1.5.

    This strategy keeps the ratio of the amount of reallocated
    data to the length of the buffer roughly constant, by making
    reallocations less frequent as the buffer grows.

    And so we integrate the distribution between the point we
    are at and infinity. Then we take the mean. And that gives
    us a best estimate of how many bytes are to come, and
    therefore how much to grow the buffer by.

    You have an a-priori distribution of the buffer size (can be
    tracked on-the-fly, if unknown beforehand) and a partially
    filled buffer. The task is to calculate the a-posteriori
    distribution of that buffer's final size, and then to
    allocate the predicted value based on a good percentile.

    How about using a percentile instead of the mean, e.g. if
    the current size corresponds to percentile p, you allocate a
    capacity corresponding to percentile 1-(1-p)/k , where k>1
    denotes the balance between space and time efficiency. For
    example, if the 60th percentile of the buffer is required
    and k=2, you allocate a capacity sufficient to hold
    100-(100-60)/2=80% of buffers.

    Based on essentially no background to this question, not much can be
    said. However, if one starts from the suggestion above to use the mean
    of some distribution (or later some percentile), one notes that the
    "mean" is just the minimiser of a quadratic cost function, so an
    improvement would be to base the choice on some more realistic cost
    function, chosen for the actual application. Given that the scenario
    apparently involves a sequence of such decisions, the obvious extension
    of the cost-based approach would be to employ some form of dynamic
    programming. Of course, this might not be appealing, in which case one
    might choose the theoretically-simple approach of tuning a policy based
    on good stochastic simulations of the situation.

  • From David Duffy@21:1/5 to Anton Shepelev on Wed Jun 19 06:48:17 2024
    XPost: sci.stat.math

    In sci.stat.math Anton Shepelev <anton.txt@g{oogle}mail.com> wrote:
    [cross-posted to: sci.stat.math]

    Malcolm McLean:

    We have a continuously growing buffer, and we want the
    best strategy for reallocations as the stream of
    characters comes at us. So, given we know how many
    characters have arrived, can we predict how many will
    arrive,

    Do you mean in the next bunch, or in total (till the end of
    the buffer's lifetime)?

    Isn't this a halting problem? Aren't the more important data:
    how much memory the user is allowed to allocate, the properties of
    the current system's memory allocation algorithm, when your stream
    will have to go to disc or other slow large volume storage, how
    the stream can be compressed on the fly (the latter might well give
    strong predictions for future storage requirements based on what
    has been read to date).

  • From Anton Shepelev@21:1/5 to All on Wed Jun 19 15:20:00 2024
    XPost: sci.stat.math

    Malcolm McLean writes that, given the log-normal distribution
    of file sizes with known parameters,

    we can work out, given that a file is at least N
    characters, what is the probability that an allocation of
    any size will contain the whole file, and how many bytes,
    on average, will be wasted.

    This is why I thought statisticians might help him: Malcolm
    wants to find the a-posteriori distribution of the size of a
    file, after it has been found to exceed N bytes. Am I right
    that if we take the remaining (N>20) part of the density
    function and re-normalise it, we shall obtain the desired
    distribution?

    My proposition was as follows:

    1. Find quantile q0 corresponding to the buffer size
    currently requested.

    2. Calculate new quantile q1 = 1-(1-q0)/k, where k>1 is
    an adjustable parameter, and use its corresponding
    value as the new allocation size.

    For example, assuming for simplicity a uniform [0,20]
    distribution of file sizes and k=2, a sequence of allocations
    may look like this:

    requested   allocated
        2       20-(20- 2)/2 = 11
       12       20-(20-12)/2 = 16
       18       20-(20-18)/2 = 19
    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Wed Jun 19 16:36:01 2024
    XPost: sci.stat.math

    Malcolm McLean <[email protected]> writes:

    No. We have to have some knowledge. And what we probably know is that the input is a file stored on someone's personal computer. And someone has published on the statistical distribution of such files.

    That's not the case that matters (to me at least). If the input is a
    file, we have a much better way of "guessing" the size than guessing and growing -- just ask for the size. Sure, we might need to make
    adjustments if the file is changing, but there is always a better
    measure than any statistical analysis.

    To some extent this seems like a solution in search of a problem.
    Growing the buffer exponentially is simple and effective.

    --
    Ben.

  • From David Brown@21:1/5 to Ben Bacarisse on Wed Jun 19 19:41:49 2024
    XPost: sci.stat.math

    On 19/06/2024 17:36, Ben Bacarisse wrote:
    Malcolm McLean <[email protected]> writes:

    No. We have to have some knowledge. And what we probably know is that the
    input is a file stored on someone's personal computer. And someone has
    published on the statistical distribution of such files.

    That's not the case that matters (to me at least). If the input is a
    file, we have a much better way of "guessing" the size than guessing and growing -- just ask for the size. Sure, we might need to make
    adjustments if the file is changing, but there is always a better
    measure than any statistical analysis.

    To some extent this seems like a solution in search of a problem.

    It seems more like a solution that doesn't exist in search of a problem
    with absurdly unrealistic requirements. And even if Malcolm's solution existed, and the problem existed, it /still/ wouldn't work - knowing the distribution of file sizes tells us nothing about the size of any given
    file.

    Growing the buffer exponentially is simple and effective.


    Yes, that's the general way to handle buffers when you don't know what
    size they should be.

    A better solution for this sort of program is usually, as you say,
    asking the OS for the file size (there is no standard library function
    for getting the file size, but it's not hard to do for any realistic
    target OS). And then for big files, prefer mmap to reading the file
    into a buffer.

    It's only really for unsized "files" such as piped input that you have
    no way of getting the size, and then exponential growth is the way to
    go. Personally, I'd start with a big size (perhaps 10 MB) that is
    bigger than you are likely to need in practice, but small enough that it
    is negligible on even vaguely modern computers. Then the realloc code is unlikely to be used (but it can still be there for completeness).

  • From Ben Bacarisse@21:1/5 to David Brown on Wed Jun 19 22:24:35 2024
    XPost: sci.stat.math

    David Brown <[email protected]> writes:

    On 19/06/2024 17:36, Ben Bacarisse wrote:
    Growing the buffer exponentially is simple and effective.

    Yes, that's the general way to handle buffers when you don't know what size they should be.

    A better solution for this sort of program is usually, as you say, asking the OS for the file size (there is no standard library function for getting the file size, but it's not hard to do for any realistic target OS). And then for big files, prefer mmap to reading the file into a buffer.

    It's only really for unsized "files" such as piped input that you have no
    way of getting the size, and then exponential growth is the way to go. Personally, I'd start with a big size (perhaps 10 MB) that is bigger than
    you are likely to need in practice, but small enough that it is negligible
    on even vaguely modern computers. Then the realloc code is unlikely to be used (but it can still be there for completeness).

    There are other uses that have nothing to do with files. I have a small dynamic array library (just a couple of functions) that I use for all
    sorts of things. I can read a file or parse tokens or input a line just
    by adding characters. Because of its rather general use, I don't start
    with a large buffer (though the initial size can be set).

    --
    Ben.

  • From Anton Shepelev@21:1/5 to All on Thu Jun 20 01:53:47 2024
    XPost: sci.stat.math

    Malcolm McLean:

    We have to have some knowledge. And what we probaby know
    is that the input is a file stored on someone's personal
    computer. And someone has published on the statistical
    distribution of such files And they have a log-normal
    distribution with a mean and a median which he gives. So
    with that informaton, we can work out, given that a file
    is at least N characters, what is the prbablity that an
    allocation of any size will contain the whole file, and
    how many bytes, on average will be wasted.

    Observe that the standard algorithm of exponential growth is
    memoryless and self-similar in that it does not depend on
    context, or the history of previous reallocations. These
    properties belong to (or even identify?) the exponential
    distribution. We can therefore assume that the
    exponential-growth strategy is ideal for exponentially
    distributed buffer sizes, and under that assumption
    determine the relation between the CDF values (p)
    corresponding to consecutive reallocations:

    p  = 1-e^(-L*x) ,   where L is the rate parameter,
    p0 = 1-e^(-L*x0) ,
    p1 = 1-e^(-L*x1) ,
    x1 = k*x0 (by our strategy), =>
    p1 = 1-e^(-L*k*x0) = 1-(1-p0)^k .

    which does not depend on the rate L, and this lets us
    generalise this approach for any distribution:

    x1 = Q( 1 - ( 1 - CDF(x0) )^k )

    where:

    x0 : the required size
    x1 : the new recommended capacity
    Q(p) : the p-Quantile of the given distribution
    CDF(x): the CDF of the given distribution
    k>1 : balance between speed and space efficiency

  • From David Brown@21:1/5 to Ben Bacarisse on Thu Jun 20 13:22:31 2024
    On 19/06/2024 23:24, Ben Bacarisse wrote:
    David Brown <[email protected]> writes:

    On 19/06/2024 17:36, Ben Bacarisse wrote:
    Growing the buffer exponentially is simple and effective.

    Yes, that's the general way to handle buffers when you don't know what size they should be.

    A better solution for this sort of program is usually, as you say, asking the OS for the file size (there is no standard library function for getting the file size, but it's not hard to do for any realistic target OS). And
    then for big files, prefer mmap to reading the file into a buffer.

    It's only really for unsized "files" such as piped input that you have no
    way of getting the size, and then exponential growth is the way to go.
    Personally, I'd start with a big size (perhaps 10 MB) that is bigger than
    you are likely to need in practice, but small enough that it is negligible on even vaguely modern computers. Then the realloc code is unlikely to be
    used (but it can still be there for completeness).

    There are other uses that have nothing to do with files.

    Of course. This comment was for the specific purposes being discussed
    here. For other uses, there can be many other structures and algorithms
    that fit better. Exponentially increasing the size when needed is a
    good general-purpose method.

    I have a small
    dynamic array library (just a couple of functions) that I use for all
    sorts of things. I can read a file or parse tokens or input a line just
    by adding characters. Because of its rather general use, I don't start
    with a large buffer (though the initial size can be set).


  • From Vir Campestris@21:1/5 to Bonita Montero on Thu Jun 20 21:08:00 2024
    On 17/06/2024 10:22, Bonita Montero wrote:

    realloc() is just a convenience function. Usually the reallocation
    can't happen in-place and a second malloc() followed by a copy and
    a free() does the same.
    For large data it would be nice if the pages being deallocated later
    would be incrementally marked as discardable after copying a portion.
    This would result in only a small portion of additional physical
    memory being allocated since the newly allocated pages become
    associated with physical pages when they're touched first. Windows has
    VirtualAlloc() with MEM_RESET for that, Linux has madvise() with
    MADV_DONTNEED.

    "Usually can't happen in place"?

    Really? It's not something I use a lot, but when it's appropriate I
    will. It's got the advantage over doing this myself that for some
    portion of calls all the run time library needs to do is change the size
    field in the structure.

    Nothing else.

    No copying, and no duplicate allocations.

    What proportion of calls can be managed by changing the size field alone depends on your workload and the platform. But I doubt there are many
    cases where it is 0%.

    Andy

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Mon Jun 24 08:40:03 2024
    On Fri, 21 Jun 2024 21:12:12 +0200, Bonita Montero wrote:

    Usually you don't resize the block with a few bytes ...

    The usual way I use realloc is to maintain separate counts of the number
    of array elements I have allocated, and the number I am actually using. A realloc call is only needed when the latter hits the former. Every time I
    call realloc, I will extend by some minimum number of array elements (e.g. 128), roughly comparable to the sort of array size I typically end up
    with.

    And then when the structure is complete, I do a final realloc call to
    shrink it down so the size is actually that used. Is it safe to assume
    such a call will never fail? Hmm ...

  • From David Brown@21:1/5 to Keith Thompson on Mon Jun 24 13:40:08 2024
    On 24/06/2024 11:55, Keith Thompson wrote:

    Something else that occurs to me: If a shrinking realloc() never fails
    in practice, then any code you write to handle a failure won't be
    tested.


    That is always a problem with allocation functions. Have you ever known
    a non-pathological malloc() to fail?

    I think, in fact, there's a good argument for ignoring the possibility
    of malloc (and calloc and realloc) failures for most PC code. There is virtually no chance of failure in reality, and if you get one, there is
    almost never a sensible way to deal with it - you just kick the can down
    the road by having functions return NULL until something gives up and
    stops the program with an error message. You might as well just let the
    OS kill the program when you try to access memory at address 0.

    I've seen more than enough error handling code that has never been
    tested in practice - including error handling code with bugs that lead
    to far worse problems than just killing the program.

    Of course such treatment is not appropriate for all allocations (or
    other functions that could fail). But often I think it is better to
    write clearer and fully testable (and tested!) code which ignores
    hypothetical errors, rather than some of the untestable and untested
    jumbles that are sometimes seen in an attempt to "handle" allocation
    failures.

  • From Tim Rentsch@21:1/5 to Malcolm McLean on Mon Jun 24 09:32:40 2024
    Malcolm McLean <[email protected]> writes:

    On 18/06/2024 08:09, Tim Rentsch wrote:

    Anton Shepelev <anton.txt@g{oogle}mail.com> writes:

    Ben Bacarisse to Malcolm McLean:

    [next is a comment from Malcolm]

    Your strategy for avoiding these extremes is exponential
    growth.

    It's odd to call it mine. It's very widely known and used.
    "The one I mentioned" might be less confusing description.

    I think it is a modern English idiom, which I dislike as
    well. StackOverflow is full of questions starting like:
    "How do you do this?" and "How do I do that?" They are
    informal ways of the more literary "How does one do this?"
    or "What is the way to do that?"

    I have a different take here. First the "your" of "your
    strategy" reads as a definite pronoun, meaning it refers
    specifically to Ben and not to some unknown other party.
    (And incidentally is subtly insulting because of that,
    whether it was meant that way or not.)

    Second the use of "you" to mean an unspecified other person
    is not idiom but standard usage. The word "you" is both a
    definite pronoun and an indefinite pronoun, depending on
    context. The word "they" also has this property. Consider
    these two examples:

    The bank downtown was robbed. They haven't been caught
    yet.

    They say the sheriff isn't going to run for re-election.

    In the first example "they" is a definite pronoun, referring
    to the people who robbed the bank. In the second example,
    "they" is an indefinite pronoun, referring to unspecified
    people in general (perhaps but not necessarily everyone).
    The word "you" is similar: it can mean specifically the
    listener, or it can mean generically anyone in a broader
    audience, even those who never hear or read the statement
    with "you" in it.

    The word "one" used as a pronoun is more formal, and to me
    at least often sounds stilted. In US English "one" is most
    often an indefinite pronoun, either second person or third
    person. But "one" can also be used as a first person
    definite pronoun (referring to the speaker), which an online
    reference tells me is chiefly British English. (I would
    guess that this usage predominates in "the Queen's English"
    dialect of English, but I have very little experience in
    such things.)

    Finally I would normally read "I" as a first person definite
    pronoun, and not an indefinite pronoun. So I don't have any
    problem with someone saying "how should I ..." when asking
    for advice. They aren't asking how someone else should ...
    but how they should ..., and what advice I might give could
    very well depend on who is doing the asking.

    Ben said

    [Restoring snipped text from Ben upthread]

    "In practice, the cost is usually moderate and can be very
    effectively managed by using an exponential allocation scheme: at
    every reallocation multiply the storage space by some factor greater
    than 1 (I often use 3/2, but doubling is often used as well)."

    So it's open and shut, and no two ways about it. Ben's strategy is exponential growth. And to be fair I use that strategy myself in
    functions like fslutp(). It's only not Ben's strategy if we mean to
    imply that Ben was the first person to use exponential growth, or the
    first to understand the mathematical implications, and of course
    that's not the case. It was all worked out by Euler long before any
    of us were born. [...]

    You have an annoying habit. Your writing often comes across as
    authoritarian and somewhat condescending. Furthermore you tend not
    to listen very well. Your response above is a case in point. You
    ignore what I'm talking about (which is not whether Ben uses an
    exponential growth strategy, or whether such a strategy is "Ben's"
    or not), and instead talk about something that is irrelevant to what
    I was saying. You have completely missed the point. Your comments
    do nothing to extend the conversation. From where I sit all they do
    is cause irritation and illustrate how muddled your thinking is.
    I'm sure this isn't the first time you've heard comments along these
    lines. It would be nice if you would make an effort to improve
    your behavior in light of these repeated comments.

  • From David Brown@21:1/5 to Tim Rentsch on Mon Jun 24 19:19:38 2024
    On 24/06/2024 18:32, Tim Rentsch wrote:

    You have an annoying habit. Your writing often comes across as
    authoritarian and somewhat condescending. Furthermore you tend not
    to listen very well.

    The irony of that post is /astounding/.

    I have met few people with a greater knowledge and insight in the C
    language than you. And I have met few with less self-insight.

  • From Scott Lurndal@21:1/5 to Malcolm McLean on Mon Jun 24 18:20:32 2024
    Malcolm McLean <[email protected]> writes:
    On 24/06/2024 12:40, David Brown wrote:

    Of course such treatment is not appropriate for all allocations (or
    other functions that could fail).  But often I think it is better to
    write clearer and fully testable (and tested!) code which ignores
    hypothetical errors, rather than some of the untestable and untested
    jumbles that are sometimes seen in an attempt to "handle" allocation
    failures.


    Baby X has bbx_malloc() which is guaranteed never to return NULL, and
    never to return a pointer to an allocation which cannot be indexed by an
    int.

    What do you mean by 'indexed by an int'? So, what happens if I index
    your allocation with -109235?

    Or did you mean to say unsigned (or positive) int less than the
    size of the allocation?

  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Mon Jun 24 22:59:23 2024
    On Mon, 24 Jun 2024 13:40:08 +0200, David Brown wrote:

    Have you ever known a non-pathological malloc() to fail?

    I was once commissioned, many decades ago, to write a multispectral image viewer to run on old MacOS. I followed my usual memory-allocation
    discipline. The client reported how he tried to open too many images at
    once, and ran out of memory; my program reported one out-of-memory error,
    gave up trying to open the rest of the files, and gracefully recovered
    without crashing.

    The program that had been supplied to him for Microsoft Windows, however,
    gave an error for *each* file it failed to open.

  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Tue Jun 25 07:06:41 2024
    On Mon, 24 Jun 2024 18:50:15 +0100, Malcolm McLean wrote:

    Baby X has bbx_malloc() which is guaranteed never to return NULL ...

    Does it actually allocate the (physical) memory?

    I wrote a memory-hog app for Android once, and found that allocating large amounts of memory space had very little impact on the system. Then when I
    added code to actually write data into those allocated pages, that’s when
    it really started to break into a sweat ...

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Tue Jun 25 07:02:39 2024
    On Mon, 24 Jun 2024 02:55:39 -0700, Keith Thompson wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    The usual way I use realloc is to maintain separate counts of the
    number of array elements I have allocated, and the number I am actually
    using. A realloc call is only needed when the latter hits the former.
    Every time I call realloc, I will extend by some minimum number of
    array elements (e.g. 128), roughly comparable to the sort of array size
    I typically end up with.

    And then when the structure is complete, I do a final realloc call to
    shrink it down so the size is actually that used. Is it safe to assume
    such a call will never fail? Hmm ...

    It's not safe to assume that a shrinking realloc call will never fail.
    It's possible that it will never fail in any existing implementation,
    but the standard makes no such guarantee.

    ...

    Having said all that, if realloc fails (indicated by returning a null pointer), you still have the original pointer to the object.

    In other words, it’s safe to ignore any error from that last shrinking realloc? That’s good enough for me. ;)

  • From Vir Campestris@21:1/5 to Bonita Montero on Tue Jun 25 11:55:02 2024
    On 25/06/2024 09:48, Bonita Montero wrote:
    Test this code with your Linux installation. For my installation
    glibc does all realloc()ations in-place. Really surprising for me.

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        void *p = malloc( 0x100000000 );
        printf( "%p\n", p );
        p = realloc( p, 1 );
        printf( "%p\n", p );
        malloc( 0x100000000 - 0x10000 );
        p = realloc( p, 0x100000000 );
        printf( "%p\n", p );
    }
    Try allocating a bunch of little items, and looking at where they are.
    They'll likely be contiguous, or evenly spaced, depending on your implementation and what "little" is.

    Then resize them all. Some will move.

    Andy.
    --
    Your C++ comment up-thread BTW is off-topic here. My favourite C++
    container is vector, and that has a reserve call so you can keep growing
    the container without lots of reallocations.

  • From Richard Damon@21:1/5 to Keith Thompson on Tue Jun 25 07:21:42 2024
    On 6/25/24 6:05 AM, Keith Thompson wrote:
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Mon, 24 Jun 2024 02:55:39 -0700, Keith Thompson wrote:
    Lawrence D'Oliveiro <[email protected]d> writes:
    The usual way I use realloc is to maintain separate counts of the
    number of array elements I have allocated, and the number I am actually
    using. A realloc call is only needed when the latter hits the former.
    Every time I call realloc, I will extend by some minimum number of
    array elements (e.g. 128), roughly comparable to the sort of array size
    I typically end up with.

    And then when the structure is complete, I do a final realloc call to
    shrink it down so the size is actually that used. Is it safe to assume
    such a call will never fail? Hmm ...

    It's not safe to assume that a shrinking realloc call will never fail.
    It's possible that it will never fail in any existing implementation,
    but the standard makes no such guarantee.

    ...

    Having said all that, if realloc fails (indicated by returning a null
    pointer), you still have the original pointer to the object.

    In other words, it’s safe to ignore any error from that last shrinking
    realloc? That’s good enough for me. ;)

    What? No, that's not what I said at all.

    Suppose you do something like:

    some_type *p = malloc(BIG_VALUE);
    // ...
    p = realloc(p, SMALL_VALUE);

    If the realloc() succeeds and doesn't relocate and copy the object,
    you're fine. If realloc() succeeds and *does* relocate the object, p
    still points to memory that has now been deallocated, and you don't have
    a pointer to the newly allocated memory. If realloc() fails, it returns
    a null pointer, but the original memory is still valid -- but again, the assignment clobbers your only pointer to it.

    I presume you can write code that handles all three possibilities, but
    you can't just ignore any errors.


    The idiom I always learned for realloc was something like:


    some_type *p = malloc(size);
    if (!p) {
        // allocation failed, do something about it. (might be just abort)
    }

    ...

    some_type *np = realloc(p, new_size);
    if (np) {
        p = np;
    } else {
        // p still points to old buffer, but you didn't get the new size
        // so do what you can to handle the situation.
    }

    // p here points to the current buffer,
    // might be the old size or the new.

  • From DFS@21:1/5 to Bonita Montero on Tue Jun 25 09:56:58 2024
    On 6/25/2024 4:48 AM, Bonita Montero wrote:
    Test this code with your Linux installation. For my installation
    glibc does all realloc()ations in-place. Really surprising for me.

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        void *p = malloc( 0x100000000 );
        printf( "%p\n", p );
        p = realloc( p, 1 );
        printf( "%p\n", p );
        malloc( 0x100000000 - 0x10000 );
        p = realloc( p, 0x100000000 );
        printf( "%p\n", p );
    }


    $ gcc -Wall montera_test.c -o mt
    montera_test.c: In function ‘main’:
    montera_test.c:10:9: warning: ignoring return value of ‘malloc’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
    10 | malloc( 0x100000000 - 0x10000 );
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


    $ ./mt
    0x7fb976f12010
    0x7fb976f12010
    0x7fb876f11010

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Wed Jun 26 00:51:40 2024
    On Tue, 25 Jun 2024 10:38:28 +0200, Bonita Montero wrote:

    Am 25.06.2024 um 09:06 schrieb Lawrence D'Oliveiro:

    I wrote a memory-hog app for Android once, and found that allocating
    large amounts of memory space had very little impact on the system.
    Then when I added code to actually write data into those allocated
    pages, that’s when it really started to break into a sweat ...

    Then android is also doing overcommit.

    It is running a Linux kernel, and that tends to be the default setup in
    Linux.

  • From Vir Campestris@21:1/5 to Bonita Montero on Wed Jun 26 12:15:33 2024
    On 25/06/2024 12:28, Bonita Montero wrote:

    The interesting part is that after doing the first realloc()
    the memory being freed isn't reused for the next malloc().

    That's entirely implementation dependent.

    Andy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Phil Carmody@21:1/5 to Keith Thompson on Fri Jun 28 11:01:38 2024
    Keith Thompson <[email protected]> writes:
    Suppose you do something like:

    some_type *p = malloc(BIG_VALUE);
    // ...
    p = realloc(p, SMALL_VALUE);

    ... If realloc() succeeds and *does* relocate the object, p
    still points to memory that has now been deallocated, and you don't have
    a pointer to the newly allocated memory.

    Surely some mistake?

    However, such self-assignments are bad for the reasons you state later;
    verify, then update.

    Phil
    --
    We are no longer hunters and nomads. No longer awed and frightened, as we have gained some understanding of the world in which we live. As such, we can cast aside childish remnants from the dawn of our civilization.
    -- NotSanguine on SoylentNews, after Eugen Weber in /The Western Tradition/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Kuyper@21:1/5 to Lawrence D'Oliveiro on Fri Jun 28 06:36:45 2024
    On 6/25/24 03:02, Lawrence D'Oliveiro wrote:
    On Mon, 24 Jun 2024 02:55:39 -0700, Keith Thompson wrote:
    ...
    Having said all that, if realloc fails (indicated by returning a null
    pointer), you still have the original pointer to the object.

    In other words, it’s safe to ignore any error from that last shrinking realloc? That’s good enough for me. ;)

    No, you misunderstand:

    q = realloc(p, SMALL_VALUE);

    Then if q is null, p still points at the originally allocated memory. If
    q is not null, then it may point at newly allocated memory, and p has an indeterminate value. You cannot go forward ignoring the possibility that
    no new object was allocated, because if you do, you have no way of
    knowing which of the two pointers you can safely dereference. You need,
    at least,

    if (q)
        p = q;

    then you can safely use p, regardless of whether realloc() allocated new memory.
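James's point can be wrapped in a small helper. This is a sketch, and the name resize_buffer is hypothetical: the result of realloc() is held in a separate variable and copied back only on success, so a failed call leaves the caller with the original, still-valid block, and a successful call never leaves the stale pointer value in use. A size of 0 is rejected because realloc(p, 0) has implementation-defined behavior in C17 and is undefined in C23.

```c
#include <stdlib.h>

/* Resize *buf to new_size bytes (new_size > 0).
   Returns 0 on success and updates *buf.
   Returns -1 on failure; *buf is unchanged and still owned by the caller. */
int resize_buffer(char **buf, size_t new_size)
{
    if (new_size == 0)
        return -1;                  /* avoid realloc(p, 0) entirely */
    char *q = realloc(*buf, new_size);
    if (q == NULL)
        return -1;                  /* old block is still valid */
    *buf = q;                       /* q may or may not equal the old address */
    return 0;
}
```

The same shape works for shrinking and growing; only the caller decides what to do on failure (keep going with the old block, or free it and bail out).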

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Kuyper@21:1/5 to Keith Thompson on Fri Jun 28 06:37:49 2024
    On 6/25/24 06:05, Keith Thompson wrote:
    ...
    Suppose you do something like:

    some_type *p = malloc(BIG_VALUE);
    // ...
    p = realloc(p, SMALL_VALUE);

    If the realloc() succeeds and doesn't relocate and copy the object,
    you're fine. If realloc() succeeds and *does* relocate the object, p
    still points to memory that has now been deallocated, and you don't have
    a pointer to the newly allocated memory. ...

    ? I believe that, in that case, p does point to the newly allocated memory.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Sat Jun 29 00:14:46 2024
    On Tue, 18 Jun 2024 11:46:36 +0100, Malcolm McLean wrote:

    Here are some real stats on file sizes, in case anyone is interested.

    Data set, / OS Log-normal median & mean, Arithmetic mean, 50% occupied
    by (< mean)

    whole data set, 9.0 KB, 730 KB, 1.5 MB < 5.4 KB
    Mac OS 8.0 KB, 533 KB, 1.4 MB < 4.9 KB
    Windows 11.5 KB, 1.0 MB, 1.7 MB < 8.3 KB
    GNU/Linux 10.8 KB, 1.7MB, 2.2 MB < 4.8 KB

    https://www.researchgate.net/publication/353066615_How_Big_Are_Peoples%27_Computer_Files_File_Size_Distributions_Among_User-managed_Collections

    I don’t see any error bars. Without those, it’s hard to attach any
    significance to the differences in figures.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to anton.txt@g{oogle}mail.com on Tue Jul 2 00:51:33 2024
    XPost: sci.stat.math

    On Mon, 17 Jun 2024 18:02:49 +0300, Anton Shepelev
    <anton.txt@g{oogle}mail.com> wrote:

    [cross-posted to: sci.stat.math]

    Anton,

    The post being responded to was originally to comp.lang.c
    which I don't subscribe to.

    I have a question that I suppose reflects on my news source,
    GigaNews, or else on my reader, Forte Agent.

    Was this thread something posted 15 or 20 years ago?

    I tried to call up the original post by clicking on the Message
    ID when looking at headers; nothing comes up when Agent goes
    online to look. The header shows multiple earlier messages;
    none of them come up for me.

    My clicking on Message ID works elsewhere. The logical and
    simple explanation is that this is a thread old enough that
    GigaNews does not have it.

    I suppose that someone else might be able to tell me, if their
    supplier goes back further or if GigaNews is somehow failing
    to show me something that is recent.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to Rich Ulrich on Tue Jul 2 03:02:07 2024
    XPost: sci.stat.math

    On 7/2/2024 12:51 AM, Rich Ulrich wrote:
    On Mon, 17 Jun 2024 18:02:49 +0300, Anton Shepelev <anton.txt@g{oogle}mail.com> wrote:

    [cross-posted to: sci.stat.math]

    Anton,

    The post being responded to was originally to comp.lang.c
    which I don't subscribe to.

    I have a question that I suppose reflects on my news source,
    GigaNews, or else on my reader, Forte Agent.

    Was this thread something posted 15 or 20 years ago?

    I tried to call up the original post by clicking on the Message
    ID when looking at headers; nothing comes up when Agent goes
    online to look. The header shows multiple earlier messages;
    none of them come up for me.

    My clicking on Message ID works elsewhere. The logical and
    simple explanation is that this is a thread old enough that
    GigaNews does not have it.

    I suppose that someone else might be able to tell me, if their
    supplier goes back further or if GigaNews is somehow failing
    to show me something that is recent.


    MID: <v4ojs8$gvji$[email protected]>

    http://al.howardknight.net/

    That gives this URL, as a copy of the message kicking off the thread.

    http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cv4ojs8%24gvji%241%40dont-email.me%3E

    Some USENET News clients can work from the MID directly, but Thunderbird does not.
    A bare MID does not work for everyone.

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to Malcolm McLean on Tue Jul 2 16:39:21 2024
    Malcolm McLean <[email protected]> writes:

    On 29/06/2024 01:14, Lawrence D'Oliveiro wrote:
    On Tue, 18 Jun 2024 11:46:36 +0100, Malcolm McLean wrote:

    Here are some real stats on file sizes, in case anyone is interested.

    Data set, / OS Log-normal median & mean, Arithmetic mean, 50% occupied
    by (< mean)

    whole data set, 9.0 KB, 730 KB, 1.5 MB < 5.4 KB
    Mac OS 8.0 KB, 533 KB, 1.4 MB < 4.9 KB
    Windows 11.5 KB, 1.0 MB, 1.7 MB < 8.3 KB
    GNU/Linux 10.8 KB, 1.7MB, 2.2 MB < 4.8 KB

    https://www.researchgate.net/publication/353066615_How_Big_Are_Peoples%27_Computer_Files_File_Size_Distributions_Among_User-managed_Collections
    I don’t see any error bars. Without those, it’s hard to attach any
    significance to the differences in figures.

    You don't need error bars because those figures indicate a
    distribution. The files are log-normally distributed with given means
    and median. So the spread is part of that data.

    There are (or should be) two different distributions. Error bars are
    intended to show the spread within the data. The log normal
    distribution is across the data.

    Now I suspect they didn't do it this way and just amalgamated all the
    file save data into one, but that explains rather than excuses the lack
    of error bars!

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to [email protected] on Tue Jul 2 11:45:25 2024
    XPost: sci.stat.math

    On Mon, 01 Jul 2024 22:10:00 -0700, Keith Thompson <[email protected]> wrote:

    Rich Ulrich <[email protected]> writes:
    On Mon, 17 Jun 2024 18:02:49 +0300, Anton Shepelev
    <anton.txt@g{oogle}mail.com> wrote:

    [cross-posted to: sci.stat.math]

    Anton,

    The post being responded to was originally to comp.lang.c
    which I don't subscribe to.

    I have a question that I suppose reflects on my news source,
    GigaNews, or else on my reader, Forte Agent.

    Was this thread something posted 15 or 20 years ago?

    I tried to call up the original post by clicking on the Message
    ID when looking at headers; nothing comes up when Agent goes
    online to look. The header shows multiple earlier messages;
    none of them come up for me.

    My clicking on Message ID works elsewhere. The logical and
    simple explanation is that this is a thread old enough that
    GigaNews does not have it.

    I suppose that someone else might be able to tell me, if their
    supplier goes back further or if GigaNews is somehow failing
    to show me something that is recent.

    The first article in this thread was posted to comp.lang.c by Janis Papanagnou on 17 Jun 2024.

    There were several followups on the same day. The direct parent of your article was cross-posted to comp.lang.c and sci.stat.math by Anton
    Shepelev (his was the first cross-posted article in the thread).

    Thanks, so it looks like a failure by GigaNews to retrieve the
    recent posts.

    I did see a bunch of cross-posted followups. I don't know C,
    and I thought there could be more context.

    Looking at original, absent posts is something I've done dozens
    of times over the years. Never a problem, except for a time or two
    with posts from the 1990s. I've used GigaNews since my regular
    ISP stopped providing Usenet access, maybe 15 years ago.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to [email protected] on Tue Jul 2 11:58:11 2024
    XPost: sci.stat.math

    On Tue, 02 Jul 2024 11:52:56 -0400, Rich Ulrich
    <[email protected]> wrote:


    Forte Agent invites me to click on the MID; asks if it is a
    mail or MID; asks if it should search the net. It still works
    when I test it on an old message in another group.


    Now it occurs to me -- It actually does make sense,
    economically, if what is searched online is limited to the
    groups I subscribe to, or (even) only the group that
    is currently active.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to Paul on Tue Jul 2 11:52:56 2024
    XPost: sci.stat.math

    On Tue, 2 Jul 2024 03:02:07 -0400, Paul <[email protected]> wrote:

    On 7/2/2024 12:51 AM, Rich Ulrich wrote:
    On Mon, 17 Jun 2024 18:02:49 +0300, Anton Shepelev
    <anton.txt@g{oogle}mail.com> wrote:

    [cross-posted to: sci.stat.math]

    Anton,

    The post being responded to was originally to comp.lang.c
    which I don't subscribe to.

    I have a question that I suppose reflects on my news source,
    GigaNews, or else on my reader, Forte Agent.

    Was this thread something posted 15 or 20 years ago?

    I tried to call up the original post by clicking on the Message
    ID when looking at headers; nothing comes up when Agent goes
    online to look. The header shows multiple earlier messages;
    none of them come up for me.

    My clicking on Message ID works elsewhere. The logical and
    simple explanation is that this is a thread old enough that
    GigaNews does not have it.

    I suppose that someone else might be able to tell me, if their
    supplier goes back further or if GigaNews is somehow failing
    to show me something that is recent.


    MID: <v4ojs8$gvji$[email protected]>

    http://al.howardknight.net/

    That gives this URL, as a copy of the message kicking off the thread.

    http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cv4ojs8%24gvji%241%40dont-email.me%3E


    Yes, that's the message I see when I plug the Message ID
    into the program at http://al.howardknight.net/

    Thanks. I'm saving that.


    Some USENET News clients can work from the MID directly, but Thunderbird does not.
    A bare MID does not work for everyone.

    Forte Agent invites me to click on the MID; asks if it is a
    mail or MID; asks if it should search the net. It still works
    when I test it on an old message in another group.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to Rich Ulrich on Tue Jul 2 15:09:25 2024
    XPost: sci.stat.math

    On 7/2/2024 11:58 AM, Rich Ulrich wrote:
    On Tue, 02 Jul 2024 11:52:56 -0400, Rich Ulrich
    <[email protected]> wrote:


    Forte Agent invites me to click on the MID; asks if it is a
    mail or MID; asks if it should search the net. It still works
    when I test it on an old message in another group.


    Now it occurs to me -- It actually does make sense,
    economically, if what is searched online is limited to the
    groups I subscribe to, or (even) only the group that
    is currently active.


    Every device has "retention", but retention is limited.

    Whether it's a search site, or a USENET server (even Forte had
    their own news server, at one time), you need retention for
    older articles to be search-able either as body text, or as
    a <mid>.

    Now that Google Groups is no longer connected to USENET,
    that's one fewer place with a decent-sized archive. Google closed
    their service, after the ThaiSpam incident. The Eternal-September
    server changed from one server to a two-server setup. The
    Transit Server had SpamAssassin loaded on it, removing ThaiSpam,
    and the second server continued to offer normal ("filtered") service.

    The comp.lang.c group was one of the groups under attack. Since
    Google was letting the spam in, now that Google is disconnected,
    the spam is gone, and the readership on CLC has gone up.

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Kuyper@21:1/5 to Paul on Tue Jul 2 16:58:14 2024
    XPost: sci.stat.math

    On 7/2/24 15:09, Paul wrote:
    ...
    Now that Google Groups is no longer connected to USENET,
    ...
    that's one fewer place with a decent-sized archive. Google closed
    their service, after the ThaiSpam incident.

    While they no longer store new messages, they still have one of the
    largest archives of old messages, and it's still available for searching.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Kuyper@21:1/5 to Paul on Tue Jul 2 16:54:46 2024
    XPost: sci.stat.math

    On 7/2/24 15:09, Paul wrote:
    ...
    Now that Google Groups is no longer connected to USENET,

    I just checked, and as I had expected, Google is still connected.

    that's one fewer place with a decent-sized archive. Google closed
    their service, after the ThaiSpam incident.

    They didn't close their service. They just stopped adding new messages
    to their archives. The messages that were stored prior to the closing
    are still available for searching.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Wed Jul 3 23:48:41 2024
    On Tue, 2 Jul 2024 10:18:32 +0100, Malcolm McLean wrote:

    The files are log-normally distributed with given means and median. So the spread is part of that data.

    That’s an assumption of the parametric fit, not a fact of the data. Error bars would indicate how close the fit is.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Mon Jul 8 19:34:56 2024
    XPost: sci.stat.math

    I had plumb forgotten about this solution of mine:

    p0 = 1-e^(-L*x0) ,
    p1 = 1-e^(-L*x1) ,
    x1 = k*x0 (by our strategy), =>
    p1 = 1-(1-p0)^k .

    which does not depend on the distribution and lets us
    generalise this approach for any distribution:

    x1 = Q( 1 - ( 1 - CDF(x0) )^k )
    where:

    x0 : the required size
    x1 : the new recommended capacity
    Q(p) : the p-Quantile of the given distribution
    CDF(x): the CDF of the given distribution
    k>1 : balance between speed and space efficiency

    Let us test it with the exponential distribution, for which:

    Q (p) = -Ln( 1 - p )/L
    CDF(x) = 1 - e^(-Lx)

    Substituting these into the equation for x1:

    x1 = Q ( 1 - ( 1 - ( 1 - e^(-Lx0) ) )^k ) =
    Q ( 1 - ( e^(-Lx0) )^k ) =
    Q ( 1 - e^(-kLx0) ) =
    -Ln( e^(-kLx0) )/L = k*x0 (QED)

    That is, my solution is a/the generalisation of the
    exponential growth strategy.

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Mon Jul 8 20:01:21 2024
    XPost: sci.stat.math

    Rich Ulrich:

    Thanks, so it looks like a failure by GigaNews to retrieve
    the recent posts.

    I did see a bunch of cross-posted followups. I don't know
    C, and I thought there could be more context.

    Characters are being read in (say, from a file) sequentially
    and stored in computer memory (RAM) in an "array" -- a
    linear data structure storing elements (our characters) in a
    sequential order, one after the other at addresses with
    increasing indexes -- somewhat like a mathematical vector.

    In order to store a character in an array, sufficient memory
    has to be "allocated" for it, but while reading we do not
    know beforehand the size of the file (or the total length of
    the sequence), and therefore increase the allocated array
    size prospectively once the previous allocation is filled.
    This operation is called `realloc' and frequently involves
    the tedious copying of the entire array to a new location
    in memory, taking time in proportion to the number of
    elements allocated so far.

    The question is to develop an optimal allocation strategy for
    a given distribution of file sizes. The fastest solution is
    to allocate a gigantic array beforehand, but it is a
    terrible waste of memory. The slowest solution is to
    reallocate for each single character read, but that is a
    terrible waste of CPU time. As I understand the problem, a
    strategy is needed that manifests some compromise between
    the extremes.
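The usual compromise is the exponential scheme suggested earlier in the thread: multiply the capacity by a constant factor (here 3/2) whenever the buffer fills, so the number of reallocations grows only logarithmically with the final length. A minimal sketch, assuming single-byte characters and a hypothetical char_buf type. It also counts how many realloc() calls actually returned a new address, comparing an address saved as uintptr_t before the call instead of inspecting the old pointer afterwards (which would be indeterminate if the block moved).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    char  *data;
    size_t len;        /* characters stored */
    size_t cap;        /* characters allocated */
    size_t reallocs;   /* realloc() calls made */
    size_t moves;      /* calls that returned a different address */
} char_buf;

/* Append one character, growing capacity by 3/2 (minimum 16) when full.
   Returns 0 on success, -1 on failure (buffer left intact). */
int buf_push(char_buf *b, char c)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap < 16 ? 16 : b->cap + b->cap / 2;
        if (new_cap < b->cap)
            return -1;                              /* size_t overflow */
        int had_block = b->data != NULL;            /* checked before realloc */
        uintptr_t old_addr = (uintptr_t)b->data;    /* saved before realloc */
        char *q = realloc(b->data, new_cap);
        if (q == NULL)
            return -1;                              /* b->data still valid */
        b->reallocs++;
        if (had_block && (uintptr_t)q != old_addr)
            b->moves++;
        b->data = q;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    return 0;
}
```

Starting from char_buf b = { NULL, 0, 0, 0, 0 }, pushing 100000 characters takes 23 reallocations with this growth rule. How many of those move the block depends entirely on the allocator, which is why the moves counter is worth reading on the platform in question.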

    Looking at original, absent posts is something I've done
    dozens of times over the years. Never a problem, except
    for a time or two with posts from the 1990s. I've used
    GigaNews since my regular ISP stopped providing Usenet
    access, maybe 15 years ago.

    Just in case, there are many totally free Usenet servers,
    e.g.:
    http://www.eternal-september.org/
    https://www.i2pn2.org/

    and even a web interface:

    https://www.novabbs.com

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to anton.txt@g{oogle}mail.com on Sun Jul 21 19:40:16 2024
    XPost: sci.stat.math

    On Mon, 8 Jul 2024 20:01:21 +0300, Anton Shepelev
    <anton.txt@g{oogle}mail.com> wrote:

    Rich Ulrich:

    Thanks, so it looks like a failure by GigaNews to retrieve
    the recent posts.

    I did see a bunch of cross-posted followups. I don't know
    C, and I thought there could be more context.

    <snip. Thanks for the details.>

    Okay, today I discovered that there was no failure by Giganews
    or by Forte Agent -- Instead, there was BEHAVIOR by Agent
    that I was not aware of.

    Today, I was looking at All Desks (Agent terminology) to see what
    was in Sent, and I noticed there were messages in Inbox -- Those
    were the messages that I thought Agent had failed to retrieve.
    (Nice discussion there.)

    I guess - every other time I've clicked on Message-ID, the old
    message was in the group where I was reading and the old one
    showed up where I was reading. I've read the Agent group for
    ages, and I don't remember this feature ever being mentioned.

    Live and learn.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Tue Jul 23 16:47:39 2024
    XPost: sci.stat.math

    Rich Ulrich:

    Today, I was looking at All Desks (Agent terminology) to see
    what was in Sent, and I noticed there were messages in
    Inbox -- Those were the messages that I thought Agent had
    failed to retrieve. (Nice discussion there.)

    Looking forward to your take on the problem. Mine is rather
    simplistic, but can be easily tested with several
    distributions. This is like extrapolation: we know the
    optimal solution for the given distribution (exponential)
    and want to devise a general method to get the optimal
    solution for any given distribution.

    Malcolm, if you are still interested, can you provide a test
    program that measures some statistics for various allocation
    strategies on various distributions, including the
    exponential?

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)