In a recent thread realloc() was a substantial part of the discussion. "Occasionally" the increased data storage will be relocated along
with the previously stored data. On huge data sets that might be a performance factor. Is there any experience or are there any concrete
factors about the conditions when this relocation happens? - I could
imagine that it's no issue as long as you're in some kB buffer range,
but if, say, we're using realloc() to substantially increase buffers
often it might be an issue to consider. It would be good to get some
feeling about that internal.
On 17/06/2024 10:18, Ben Bacarisse wrote:
> Janis Papanagnou writes:
>> In a recent thread realloc() was a substantial part of the discussion.
>> "Occasionally" the increased data storage will be relocated along
>> with the previously stored data. On huge data sets that might be a
>> performance factor. Is there any experience or are there any concrete
>> factors about the conditions when this relocation happens? - I could
>> imagine that it's no issue as long as you're in some kB buffer range,
>> but if, say, we're using realloc() to substantially increase buffers
>> often it might be an issue to consider. It would be good to get some
>> feeling about that internal.
>
> There is obviously a cost, but there is (usually) no alternative if
> contiguous storage is required. In practice, the cost is usually
> moderate and can be very effectively managed by using an exponential
> allocation scheme: at every reallocation multiply the storage space by
> some factor greater than 1 (I often use 3/2, but doubling is often used
> as well). This results in O(log(N)) rather than O(N) allocations as in
> your code that added a constant to the size. Of course, some storage is
> wasted (that /might/ be retrieved by a final realloc down to the final
> size) but that's rarely significant.

So can we work it out?
Let's assume for the moment that the allocations have a semi-normal
distribution, with negative values disallowed. Now ignoring the first few
values, if we have allocated, say, 1K, we ought to be able to predict the
value by integrating the distribution from 1k to infinity and taking the
mean.
On 17/06/2024 10:55, Ben Bacarisse wrote:
> Malcolm McLean writes:
>> So can we work it out?
>
> What is "it"?
>
>> Let's assume for the moment that the allocations have a semi-normal
>> distribution,
>
> What allocations? The allocations I talked about don't have that
> distribution.
>
>> with negative values disallowed. Now ignoring the first few
>> values, if we have allocated, say, 1K, we ought to be able to predict the
>> value by integrating the distribution from 1k to infinity and taking the
>> mean.
>
> I have no idea what you are talking about. What "value" are you looking
> to calculate?

We have a continuously growing buffer, and we want the best strategy for
reallocations as the stream of characters comes at us. So, given we know how
many characters have arrived, can we predict how many will arrive, and
therefore ask for the best amount when we reallocate, so that we neither
make too many reallocations (reallocate on every byte received) nor ask for
too much (demand SIZE_MAX memory when the first byte is received)?
Your strategy for avoiding these extremes is exponential growth. You
allocate a small amount for the first few bytes. Then you use exponential
growth, with a factor of either 2 or 1.5. My question is whether or not we
can be cuter. And of course we need to know the statistical distribution of
the input files. And I'm assuming a semi-normal distribution, ignoring the
files with small values, which we will allocate enough for anyway.
And so we integrate the distribution between the point we are at and
infinity. Then we take the mean. And that gives us a best estimate of how
many bytes are to come, and therefore how much to grow the buffer by.
Malcolm McLean writes:
> We have a continuously growing buffer, and we want the best strategy for
> reallocations as the stream of characters comes at us. So, given we know how
> many characters have arrived, can we predict how many will arrive, and
> therefore ask for the best amount when we reallocate, so that we neither
> make too many reallocations (reallocate on every byte received) nor ask for
> too much (demand SIZE_MAX memory when the first byte is received)?

Obviously not, or we'd use the prediction. Your question was probably
rhetorical, but it didn't read that way.

> Your strategy for avoiding these extremes is exponential growth.

It's odd to call it mine. It's very widely known and used. "The one I
mentioned" might be a less confusing description.

> You
> allocate a small amount for the first few bytes. Then you use exponential
> growth, with a factor of either 2 or 1.5. My question is whether or not we
> can be cuter. And of course we need to know the statistical distribution of
> the input files. And I'm assuming a semi-normal distribution, ignoring the
> files with small values, which we will allocate enough for anyway.
> And so we integrate the distribution between the point we are at and
> infinity. Then we take the mean. And that gives us a best estimate of how
> many bytes are to come, and therefore how much to grow the buffer by.

I would be surprised if that were worth the effort at run time. A
static analysis of "typical" input sizes might be interesting as that
could be used to get an estimate of good factors to use, but anything
more complicated than maybe a few factors (e.g. doubling up to 1MB then
3/2 thereafter) is likely to be too messy to be useful.
Also, the cost of reallocations is not constant. Larger ones are
usually more costly than small ones, so if one were going to a lot of
effort to make run-time guesses, that cost should be factored in as
well.
Janis Papanagnou writes:
> In a recent thread realloc() was a substantial part of the
> discussion. "Occasionally" the increased data storage will be
> relocated along with the previously stored data. On huge data sets
> that might be a performance factor. Is there any experience or are
> there any concrete factors about the conditions when this relocation
> happens? - I could imagine that it's no issue as long as you're in
> some kB buffer range, but if, say, we're using realloc() to
> substantially increase buffers often it might be an issue to
> consider. It would be good to get some feeling about that internal.

I've not found a use for realloc in the last forty five years, myself.
I suspect that the performance issues are not an issue for relatively
small datasets, and are often exhibited during the non-performance
critical 'setup' phase of an algorithm.
On Mon, 17 Jun 2024 16:50:07 GMT
Scott Lurndal wrote:
> I've not found a use for realloc in the last forty five years, myself.

Did you find use for std::vector::resize()?
If yes, that could be a major reason behind not finding use for realloc().
Another possible reason is coding for environments where dynamic
allocation is either not used at all or used only during start up.
Ben Bacarisse to Malcolm McLean:
[next is a comment from Malcolm]
>> Your strategy for avoiding these extremes is exponential
>> growth.
> It's odd to call it mine. It's very widely known and used.
> "The one I mentioned" might be a less confusing description.

I think it is a modern English idiom, which I dislike as
well. StackOverflow is full of questions starting like:
"How do you do this?" and "How do I do that?" They are
informal ways of the more literary "How does one do this?"
or "What is the way to do that?"
[cross-posted to: ci.stat.math]

Malcolm McLean:
> We have a continuously growing buffer, and we want the
> best strategy for reallocations as the stream of
> characters comes at us. So, given we know how many
> characters have arrived, can we predict how many will
> arrive,

Do you mean in the next bunch, or in total (till the end of
the buffer's lifetime)?

> and therefore ask for the best amount when we reallocate,
> so that we neither make too many reallocations (reallocate
> on every byte received) nor ask for too much (demand
> SIZE_MAX memory when the first byte is received)?
> Your strategy for avoiding these extremes is exponential
> growth. You allocate a small amount for the first few
> bytes. Then you use exponential growth, with a factor of
> either 2 or 1.5.

This strategy ensures a constant ratio of the amount of
reallocated data to the length of the buffer by making
reallocations less frequent as the buffer grows.

> And so we integrate the distribution between the point we
> are at and infinity. Then we take the mean. And that gives
> us a best estimate of how many bytes are to come, and
> therefore how much to grow the buffer by.

You have an a priori distribution of the buffer size (can be
tracked on-the-fly, if unknown beforehand) and a partially
filled buffer. The task is to calculate the a posteriori
distribution of that buffer's final size, and then to
allocate the predicted value based on a good percentile.
How about using a percentile instead of the mean, e.g. if
the current size corresponds to percentile p, you allocate a
capacity corresponding to percentile 1-(1-p)/k, where k>1
denotes the balance between space and time efficiency. For
example, if the current size is at the 60th percentile
and k=2, you allocate a capacity sufficient to hold
100-(100-60)/2=80% of buffers.
No. We have to have some knowledge. And what we probably know is that the
input is a file stored on someone's personal computer. And someone has
published on the statistical distribution of such files.
Malcolm McLean writes:
> No. We have to have some knowledge. And what we probably know is that the
> input is a file stored on someone's personal computer. And someone has
> published on the statistical distribution of such files

That's not the case that matters (to me at least). If the input is a
file, we have a much better way of "guessing" the size than guessing and
growing -- just ask for the size. Sure, we might need to make
adjustments if the file is changing, but there is always a better
measure than any statistical analysis.
To some extent this seems like a solution in search of a problem.
Growing the buffer exponentially is simple and effective.
On 19/06/2024 17:36, Ben Bacarisse wrote:
> Growing the buffer exponentially is simple and effective.

Yes, that's the general way to handle buffers when you don't know what size
they should be.
A better solution for this sort of program is usually, as you say, asking
the OS for the file size (there is no standard library function for getting
the file size, but it's not hard to do for any realistic target OS). And
then for big files, prefer mmap to reading the file into a buffer.
It's only really for unsized "files" such as piped input that you have no
way of getting the size, and then exponential growth is the way to go.
Personally, I'd start with a big size (perhaps 10 MB) that is bigger than
you are likely to need in practice, but small enough that it is negligible
on even vaguely modern computers. Then the realloc code is unlikely to be
used (but it can still be there for completeness).
We have to have some knowledge. And what we probably know
is that the input is a file stored on someone's personal
computer. And someone has published on the statistical
distribution of such files. And they have a log-normal
distribution with a mean and a median which he gives. So
with that information, we can work out, given that a file
is at least N characters, what is the probability that an
allocation of any size will contain the whole file, and
how many bytes, on average, will be wasted.
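
For what it's worth, the calculation described here can be written out. This is a sketch that assumes the file size S really is log-normal with known median m and mean M (M > m); the symbols m, M, n and c are mine, not taken from the cited publication. Given a file already known to be at least n bytes, the probability that a capacity c >= n holds it, and the expected waste when it does, are:

```latex
\begin{align*}
\mu &= \ln m, \qquad \sigma^2 = 2\ln\frac{M}{m}, \qquad
F(x) = \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right) \\[4pt]
P(S \le c \mid S \ge n) &= \frac{F(c) - F(n)}{1 - F(n)} \\[4pt]
E[\,c - S \mid n \le S \le c\,] &= c -
  \frac{e^{\mu + \sigma^2/2}\left[
    \Phi\!\left(\frac{\ln c - \mu - \sigma^2}{\sigma}\right) -
    \Phi\!\left(\frac{\ln n - \mu - \sigma^2}{\sigma}\right)\right]}
  {F(c) - F(n)}
\end{align*}
```

Here \Phi is the standard normal CDF. The first line uses the log-normal identities median = e^{\mu} and mean = e^{\mu + \sigma^2/2}.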
David Brown writes:
> On 19/06/2024 17:36, Ben Bacarisse wrote:
>> Growing the buffer exponentially is simple and effective.
>
> Yes, that's the general way to handle buffers when you don't know what size
> they should be.
> A better solution for this sort of program is usually, as you say, asking
> the OS for the file size (there is no standard library function for getting
> the file size, but it's not hard to do for any realistic target OS). And
> then for big files, prefer mmap to reading the file into a buffer.
> It's only really for unsized "files" such as piped input that you have no
> way of getting the size, and then exponential growth is the way to go.
> Personally, I'd start with a big size (perhaps 10 MB) that is bigger than
> you are likely to need in practice, but small enough that it is negligible
> on even vaguely modern computers. Then the realloc code is unlikely to be
> used (but it can still be there for completeness).

There are other uses that have nothing to do with files. I have a small
dynamic array library (just a couple of functions) that I use for all
sorts of things. I can read a file or parse tokens or input a line just
by adding characters. Because of its rather general use, I don't start
with a large buffer (though the initial size can be set).
realloc() is just a convenience function. Usually the reallocation
can't happen in-place and a second malloc() followed by a copy and
a free() does the same.
For large data it would be nice if the pages being deallocated later
would be incrementally marked as discardable after copying a portion.
This would result in only a small portion of additional physical
memory being allocated since the newly allocated pages become
associated with physical pages when they're touched first. Windows has
VirtualAlloc() with MEM_RESET for that, Linux has madvise() with
MADV_DONTNEED.
Usually you don't resize the block with a few bytes ...
Something else that occurs to me: If a shrinking realloc() never fails
in practice, then any code you write to handle a failure won't be
tested.
On 18/06/2024 08:09, Tim Rentsch wrote:
Anton Shepelev <anton.txt@g{oogle}mail.com> writes:
> Ben Bacarisse to Malcolm McLean:
> [next is a comment from Malcolm]
>>> Your strategy for avoiding these extremes is exponential
>>> growth.
>> It's odd to call it mine. It's very widely known and used.
>> "The one I mentioned" might be a less confusing description.
>
> I think it is a modern English idiom, which I dislike as
> well. StackOverflow is full of questions starting like:
> "How do you do this?" and "How do I do that?" They are
> informal ways of the more literary "How does one do this?"
> or "What is the way to do that?"

I have a different take here. First the "your" of "your
strategy" reads as a definite pronoun, meaning it refers
specifically to Ben and not to some unknown other party.
(And incidentally is subtly insulting because of that,
whether it was meant that way or not.)
Second the use of "you" to mean an unspecified other person
is not idiom but standard usage. The word "you" is both a
definite pronoun and an indefinite pronoun, depending on
context. The word "they" also has this property. Consider
these two examples:

    The bank downtown was robbed. They haven't been caught yet.

    They say the sheriff isn't going to run for re-election.

In the first example "they" is a definite pronoun, referring
to the people who robbed the bank. In the second example,
"they" is an indefinite pronoun, referring to unspecified
people in general (perhaps but not necessarily everyone).
The word "you" is similar: it can mean specifically the
listener, or it can mean generically anyone in a broader
audience, even those who never hear or read the statement
with "you" in it.
The word "one" used as a pronoun is more formal, and to me
at least often sounds stilted. In US English "one" is most
often an indefinite pronoun, either second person or third
person. But "one" can also be used as a first person
definite pronoun (referring to the speaker), which an online
reference tells me is chiefly British English. (I would
guess that this usage predominates in "the Queen's English"
dialect of English, but I have very little experience in
such things.)
Finally I would normally read "I" as a first person definite
pronoun, and not an indefinite pronoun. So I don't have any
problem with someone saying "how should I ..." when asking
for advice. They aren't asking how someone else should ...
but how they should ..., and what advice I might give could
very well depend on who is doing the asking.
Ben said
Restore snipped Ben upthread
"In practice, the cost is usually moderate and can be very
effectively managed by using an exponential allocation scheme: at
every reallocation multiply the storage space by some factor greater
than 1 (I often use 3/2, but doubling is often used as well)."
So it's open and shut, and no two ways about it. Ben's strategy is
exponential growth. And to be fair I use that strategy myself in
functions like fslutp(). It's only not Ben's strategy if we mean to
imply that Ben was the first person to use exponential growth, or the
first to understand the mathematical implications, and of course
that's not the case. It was all worked out by Euler long before any
of us were born. [...]
You have an annoying habit. Your writing often comes across as
authoritarian and somewhat condescending. Furthermore you tend not
to listen very well.
On 24/06/2024 12:40, David Brown wrote:
> Of course such treatment is not appropriate for all allocations (or
> other functions that could fail). But often I think it is better to
> write clearer and fully testable (and tested!) code which ignores
> hypothetical errors, rather than some of the untestable and untested
> jumbles that are sometimes seen in an attempt to "handle" allocation
> failures.

Baby X has bbx_malloc() which is guaranteed never to return NULL, and
never to return a pointer to an allocation which cannot be indexed by an
int.
Have you ever known a non-pathological malloc() to fail?
Baby X has bbx_malloc() which is guaranteed never to return NULL ...
Lawrence D'Oliveiro writes:
> The usual way I use realloc is to maintain separate counts of the
> number of array elements I have allocated, and the number I am actually
> using. A realloc call is only needed when the latter hits the former.
> Every time I call realloc, I will extend by some minimum number of
> array elements (e.g. 128), roughly comparable to the sort of array size
> I typically end up with.
> And then when the structure is complete, I do a final realloc call to
> shrink it down so the size is actually that used. Is it safe to assume
> such a call will never fail? Hmm ...

It's not safe to assume that a shrinking realloc call will never fail.
It's possible that it will never fail in any existing implementation,
but the standard makes no such guarantee.
...
Having said all that, if realloc fails (indicated by returning a null pointer), you still have the original pointer to the object.
> Test this code with your Linux installation. For my installation
> glibc does all realloc()ations in-place. Really surprising for me.

Try allocating a bunch of little items, and looking at where they are.
Lawrence D'Oliveiro writes:
> On Mon, 24 Jun 2024 02:55:39 -0700, Keith Thompson wrote:
>> Lawrence D'Oliveiro writes:
>>> The usual way I use realloc is to maintain separate counts of the
>>> number of array elements I have allocated, and the number I am actually
>>> using. A realloc call is only needed when the latter hits the former.
>>> Every time I call realloc, I will extend by some minimum number of
>>> array elements (e.g. 128), roughly comparable to the sort of array size
>>> I typically end up with.
>>> And then when the structure is complete, I do a final realloc call to
>>> shrink it down so the size is actually that used. Is it safe to assume
>>> such a call will never fail? Hmm ...
>>
>> It's not safe to assume that a shrinking realloc call will never fail.
>> It's possible that it will never fail in any existing implementation,
>> but the standard makes no such guarantee.
>> ...
>> Having said all that, if realloc fails (indicated by returning a null
>> pointer), you still have the original pointer to the object.
>
> In other words, it’s safe to ignore any error from that last shrinking
> realloc? That’s good enough for me. ;)

What? No, that's not what I said at all.
Suppose you do something like:

    some_type *p = malloc(BIG_VALUE);
    // ...
    p = realloc(p, SMALL_VALUE);

If the realloc() succeeds and doesn't relocate and copy the object,
you're fine. If realloc() succeeds and *does* relocate the object, p
still points to memory that has now been deallocated, and you don't have
a pointer to the newly allocated memory. If realloc() fails, it returns
a null pointer, but the original memory is still valid -- but again, the
assignment clobbers your only pointer to it.
I presume you can write code that handles all three possibilities, but
you can't just ignore any errors.
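Test this code with your Linux installation. For my installation
glibc does all realloc()ations in-place. Really surprising for me.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* 4 GiB request; assumes a 64-bit size_t. */
        void *p = malloc( 0x100000000 );
        printf( "%p\n", p );
        /* Shrink to one byte and see whether the block moves. */
        p = realloc( p, 1 );
        printf( "%p\n", p );
        /* Deliberately leaked: tries to occupy the space freed by the shrink. */
        malloc( 0x100000000 - 0x10000 );
        /* Grow back to 4 GiB and see whether the block moves now. */
        p = realloc( p, 0x100000000 );
        printf( "%p\n", p );
    }
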
On 25.06.2024 at 09:06, Lawrence D'Oliveiro wrote:
> I wrote a memory-hog app for Android once, and found that allocating
> large amounts of memory space had very little impact on the system.
> Then when I added code to actually write data into those allocated
> pages, that’s when it really started to break into a sweat ...

Then Android is also doing overcommit.
The interesting part is that after doing the first realloc()
the memory being freed isn't reused for the next malloc().
Suppose you do something like:
some_type *p = malloc(BIG_VALUE);
// ...
p = realloc(p, SMALL_VALUE);
... If realloc() succeeds and *does* relocate the object, p
still points to memory that has now been deallocated, and you don't have
a pointer to the newly allocated memory.
On Mon, 24 Jun 2024 02:55:39 -0700, Keith Thompson wrote:...
Having said all that, if realloc fails (indicated by returning a null
pointer), you still have the original pointer to the object.
In other words, it’s safe to ignore any error from that last shrinking realloc? That’s good enough for me. ;)
Here are some real stats on file sizes, in case anyone is interested.
Data set / OS     Log-normal median   Log-normal mean   Arithmetic mean   50% occupied by (< mean)
whole data set    9.0 KB              730 KB            1.5 MB            < 5.4 KB
Mac OS            8.0 KB              533 KB            1.4 MB            < 4.9 KB
Windows           11.5 KB             1.0 MB            1.7 MB            < 8.3 KB
GNU/Linux         10.8 KB             1.7 MB            2.2 MB            < 4.8 KB
https://www.researchgate.net/publication/353066615_How_Big_Are_Peoples%27_Computer_Files_File_Size_Distributions_Among_User-managed_Collections
[cross-posted to: ci.stat.math]
On Mon, 17 Jun 2024 18:02:49 +0300, Anton Shepelev <anton.txt@g{oogle}mail.com> wrote:
[cross-posted to: ci.stat.math]
Anton,
The post being responded to was originally to comp.lang.c
which I don't subscribe to.
I have a question that I suppose reflects on my news source,
GigaNews, or else on my reader, Forte Agent.
Was this thread something posted 15 or 20 years ago?
I tried to call up the original post by clicking on the Message
ID when looking at headers; nothing comes up when Agent goes
online to look. The header shows multiple earlier messages;
none of them come up for me.
My clicking on Message ID works elsewhere. The logical and
simple explanation is that this is a thread old enough that
GigaNews does not have it.
I suppose that someone else might be able to tell me, if their
supplier goes back further or if GigaNews is somehow failing
to show me something that is recent.
On 29/06/2024 01:14, Lawrence D'Oliveiro wrote:
On Tue, 18 Jun 2024 11:46:36 +0100, Malcolm McLean wrote:
Here are some real stats on file sizes, in case anyone is interested.
Data set / OS     Log-normal median   Log-normal mean   Arithmetic mean   50% occupied by (< mean)
whole data set    9.0 KB              730 KB            1.5 MB            < 5.4 KB
Mac OS            8.0 KB              533 KB            1.4 MB            < 4.9 KB
Windows           11.5 KB             1.0 MB            1.7 MB            < 8.3 KB
GNU/Linux         10.8 KB             1.7 MB            2.2 MB            < 4.8 KB
https://www.researchgate.net/publication/353066615_How_Big_Are_Peoples%27_Computer_Files_File_Size_Distributions_Among_User-managed_Collections
I don’t see any error bars. Without those, it is hard to attach any
significance to the differences in figures.
You don't need error bars because those figures indicate a
distribution. The files are log-normally distributed with given means
and median. So the spread is part of that data.
Rich Ulrich <[email protected]> writes:
On Mon, 17 Jun 2024 18:02:49 +0300, Anton Shepelev
<anton.txt@g{oogle}mail.com> wrote:
[cross-posted to: ci.stat.math]
Anton,
The post being responded to was originally to comp.lang.c
which I don't subscribe to.
I have a question that I suppose reflects on my news source,
GigaNews, or else on my reader, Forte Agent.
Was this thread something posted 15 or 20 years ago?
I tried to call up the original post by clicking on the Message
ID when looking at headers; nothing comes up when Agent goes
online to look. The header shows multiple earlier messages;
none of them come up for me.
My clicking on Message ID works elsewhere. The logical and
simple explanation is that this is a thread old enough that
GigaNews does not have it.
I suppose that someone else might be able to tell me, if their
supplier goes back further or if GigaNews is somehow failing
to show me something that is recent.
The first article in this thread was posted to comp.lang.c by Janis
Papanagnou on 17 Jun 2024.
There were several followups on the same day. The direct parent of your
article was cross-posted to comp.lang.c and sci.stat.math by Anton
Shepelev (his was the first cross-posted article in the thread).
Forte Agent invites me to click on the MID; asks if it is a
mail or MID; asks if it should search the net. It still works
when I test it on an old message in another group.
On 7/2/2024 12:51 AM, Rich Ulrich wrote:
On Mon, 17 Jun 2024 18:02:49 +0300, Anton Shepelev
<anton.txt@g{oogle}mail.com> wrote:
[cross-posted to: ci.stat.math]
Anton,
The post being responded to was originally to comp.lang.c
which I don't subscribe to.
I have a question that I suppose reflects on my news source,
GigaNews, or else on my reader, Forte Agent.
Was this thread something posted 15 or 20 years ago?
I tried to call up the original post by clicking on the Message
ID when looking at headers; nothing comes up when Agent goes
online to look. The header shows multiple earlier messages;
none of them come up for me.
My clicking on Message ID works elsewhere. The logical and
simple explanation is that this is a thread old enough that
GigaNews does not have it.
I suppose that someone else might be able to tell me, if their
supplier goes back further or if GigaNews is somehow failing
to show me something that is recent.
MID: <v4ojs8$gvji$[email protected]>
http://al.howardknight.net/
That gives this URL, as a copy of the message kicking off the thread.
http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cv4ojs8%24gvji%241%40dont-email.me%3E
Some USENET News clients can work from the MID directly, but Thunderbird does not.
A bare MID does not work for everyone.
On Tue, 02 Jul 2024 11:52:56 -0400, Rich Ulrich
<[email protected]> wrote:
Forte Agent invites me to click on the MID; asks if it is a
mail or MID; asks if it should search the net. It still works
when I test it on an old message in another group.
Now it occurs to me -- it actually does make sense,
economically, if what is searched online is limited to the
groups I subscribe to, or (even) only the group that
is currently active.
Now that Google Groups is no longer connected to USENET,
that's one fewer places with a decent-sized archive. Google closed
their service, after the ThaiSpam incident.
The files are log-normally distributed with given means and median. So the spread is part of that data.
p0 = 1-e^(-L*x0) ,
p1 = 1-e^(-L*x1) ,
x1 = k*x0 (by our strategy), =>
p1 = 1-(1-p0)^k .
which does not depend on the distribution and lets us
generalise this approach for any distribution:
x1 = Q( 1 - ( 1 - CDF(x0) )^k )
where:
x0 : the required size
x1 : the new recommended capacity
Q(p) : the p-Quantile of the given distribution
CDF(x): the CDF of the given distribution
k>1 : balance between speed and space efficiency
Thanks, so it looks like a failure by GigaNews to retrieve
the recent posts.
I did see a bunch of cross-posted followups. I don't know
C, and I thought there could be more context.
Looking at original, absent posts is something I've done
dozens of times over the years. Never a problem, except
for a time or two with posts from the 1990s. I've used
GigaNews since my regular ISP stopped providing Usenet
access, maybe 15 years ago.
Rich Ulrich:
Thanks, so it looks like a failure by GigaNews to retrieve
the recent posts.
I did see a bunch of cross-posted followups. I don't know
C, and I thought there could be more context.
Today, I was looking at All Desks (Agent terminology) to see
what was in Sent, and I noticed there were messages in
Inbox -- those were the messages that I thought Agent had
failed to retrieve. (Nice discussion there.)