Richard Fateman <
[email protected]> wrote:
On
Again, apologies for a delayed response, but thanks for your long note!!!
My comments are interspersed.
rjf
Monday, February 7, 2022 at 6:32:30 PM UTC-8, [email protected] wrote:
Richard Fateman <[email protected]> wrote:
On Sunday, January 2, 2022 at 6:40:15 AM UTC-8, [email protected] wrote:
There is also another possible problem: if code is written as
generic Lisp arithmetic, than for double precision it will be
much slower than specialised one. So you really want several
versions of code: real double precision, complex double precision
and separate arbitrary precision version(s) (you can probably use
single version for real and complex arbitrary precision, but
it may be simpler to have 2 arbitrary precision versions) and
possibly also single precision version (ATM in FriCAS there is
no support for single precision).
This is certainly true, and I think it is a basis for the Julia fans to say they can
make programs run faster by compiling -- on the fly -- specialized code for particular
types. In lisp, there is certainly the option to compile multiple versions of the same
procedure but with different type declarations. So I don't see that as a problem;
more like an opportunity.
Well, the point is that get good performance you need to do more
work than mechanical translation.
Other code is potentially called as foreign function libraries (e.g. Gnu MPFR). I suppose that
any of the packages linked to (say) Sage or Julia could be called from Lisp, since there are
existing interfaces to C and Python. I don't know if anyone has done this, but again to be part
of the Maxima distribution it would have to be platform (hardware, software) agnostic.
So are these "special"? I don't know for sure, but I think there are certainly not dated.
Well, what you mention above are things outside core. If you think
that they are important, than you should really go with "new" systems, AFAICS they have this part much better than Maxima.
I doubt it.
What you mean by "it"? That "new" systems have better non-core
parts?
If all you want to do is write a C or Python program to do bignum arithmetic, and you
are proficient in those languages, that might be better than using Maxima, especially if you
are unwilling to read the documentation, or download the binary code. (Better to spend
a week in the computer lab than an hour in the library.)
I am not sure what binary code changes here.
3. There is a continuing effort by a host of people who provide fixes, enhancements, and applications
in their own library public repositories. There are educational projects, and wholesale adoption of
Maxima in schools and school districts. There is an active Maxima mailing list.
You basically say that there is inertia: in short time for Maxima
folks it is easier to continue to use Maxima than to switch to
something else. True, inerta is powerful force. But if you
look at number of people behind various systems, Maxima would
look better than FriCAS, but worse than several "new" systems.
How many people are using Maxima? Between 11/1/2021 and 2/1/2022 76,209 people downloaded
the Windows version of the binary. It is hard to know how many Unix versions, since they are included
in distributions not directly from sourceforge.
If you look at improvements what matters is people who engage in developement. AFAICS Sympy has more developers than Maxima. Sage defintely has _much_
more developers. Even Julia symbolics, which up to now is now fairly incomplete, seem to have several developers.
4. If there were a set of standard benchmarks for "core" functionalities, there might be a basis for
testing if Maxima's facilities were dated.
In arxiv preprint 1611.02569 Parisse gives an example of polynomial factorization and proposes special method to solve it. ATM FriCAS
does not do very well on this example (2500s on my machine), but
certainly better than systems that can not do it at all. However,
my estimate is that using up-to-date general methods it should take fraction of second (Magma was reported to need 3s, Parisse special
method 2s). So, I would say that possibly all systems have
some work to do.
Parisse states in his conclusion that he hopes that other open-source CAS will implement
this method. It does not look too difficult to do in Maxima.
My adice is to ignore his method. Instead use state-of-the art general
Hensel lifting. This example is very sparse, general Hensel lifting
should work well also on less sparse examples or even completely dense
ones (of couse cost grows with number of terms). AFAICS potentially problematic part in factorization is factor recombination and
leading coefficient problem. In Parisse approach you need to
solve this several times. In standard Hensel lifting one can
use old Kaltofen trick to limit both problems to essentially
single stage (there is some work in each stage, but it gets
much easier after first stage).
Factorization time has not
been raised as a limiting issue in a computation of interest, that I am aware of. The particular
test example is large.
I do not think the example is large. Similar thing may appear from
rather small problem due to unfortunate choice of parametrization
(the point is that _after_ factoring it may be clear how to
change parametrization to make problem smaller, before it may
be not so clear).
Concerring "limiting issue", existing codes in many cases avoided
factorization due to old myth of "expensive factorization". With
faster factorization you can apply it in more cases. As to
more concrete examples, general testing for algebraic dependence
(in presence of nested roots) depends on factorization, and
AFAICS factorization is limiting ability to handle algebraic
dependencies. In similar spirit, computation of Galois groups
needs factorization. In both cases you need algebraic
coefficients which tend to make factorization much more
costly. If you use Trager method for handling algebraic
coefficients you get _much_ larger polynomial as norm.
I am aware of the testing of indefinite integration of
functions of a single variable, comparing Rubi to various other systems. I have some doubts about
measurements of Maxima, since they are done through the partly-blinded eyes of Sage. I have run
some of the "failed" Maxima tests through Maxima and found they succeed, and indeed find answers
that are simpler and smaller than some of the competition. So I would not judge from this.
Rubi testsute has flaws, but I think that it actually overestimates
Maxima capabilites. Namely, examples in Rubi testsuite are
arranged to be easily matchable to patterns.
I don't understand this comment.
1) Rubi examples are essentially all written in factored form.
Have you (or somebody from Maxima team) tried to expand Rubi
examples before integration? I did not do any systematic
testing of this sort, but in few cases I noted that expanding
integrand turned something that Maxima can do into case that
it can not do.
2) In general there are difficulties due to constants. In Rubi
testuite almost all constants are symbolic parameters. This
means that pattern matcher can easily handle them without need
for additional computations.
3) Difficult (and IMO interesting) case in symbolic integration
involve sums with partial cancellations between terms. Such
sums are especially difficult for pattern machers. Almost
all cases in Rubi testsuite match to single rule or single
chain of rules, so no such difficulty in Rubi testsuite.
That eliminates
work that would be otherwise needed to discover true structure
of integrand. AFAICS Maxima basically assume that integrand
can be handled by simple-minded heuristics.
This describes the first of 3 stages in the integration program. There are two more, described in J. Moses' paper in Comm. ACM. The third stage
is a (partial) implementation of the Risch algorithm.
1) AFAICS "Risch algorithm" in Maxima is so incomplete that really is
just another heuristic. IIUC Maxima completely misses work of Rothstein
which is start of modern "Risch algorithm".
2) Risch algorithm can be extened to special functions. Completeness
of extended algortihm is not clear (polylogs and elliptic integrals
are tricky there), but IMO it should have more coverage than
pattern matchers.
It is possible
that the decomposition of the integrand in a differential field would be
more difficult starting from one form or another, but the integration
program has simplifications such as radcan() at its disposal.
Doing simplifications in the middle of integration really is asking
for trouble. Integrator needs full control of from of integrand,
any transformation should be part of integration process. Doing
otherwise plants bugs.
This is true
for Rubi testsuite, but fails for more general cases.
Already random exp-log examples seem to cause trouble for Maxima.
These probably should be reported as bugs, then.
You have example in thread that I started. Full run was done by
Nassir.
Let me add that working on integration I also test integration
on various systems. My examples are probably harder than
average.
Average? For what population of integration tasks? Examples taken from homework problems in Freshman calculus?
Integration tests that I have seen.
Anyway, it is pretty easy to came with examples
that Maxima or Reduce can not do. Maple is much better
there. And Mathematica is still better.
Finding problems where Mathematica fails is not difficult. Improving Maxima to have
a larger coverage of indefinite integrals is probably not hard either. I have been hoping
that wholesale adoption of Rubi would make this unnecessary!
Well, IMO Rubi (and pattern match integrators in general) is dead end.
More precisely, pattern matchers are good if you have somewhat isolated example(s) that you really want to handle. Axiom used to have
7 patterns for popular special functions. One of the patterns matched "exp(-x^2)" and produced error function. Actually, patterns was
a bit more general and was hooked as "last chance" step to Risch
algorithms so it covered much more forms. Still, it was easy to
fool. AFAICS main reason for pattern was that many folks heard
that "exp(-x^2)" was "impossible" and would try it. System that
would return unevaluated result in such case would make bad
impression compared to other systems. In FriCAS this is replaced
by algorithm. Current implementation is still incomplete, but
AFAICS covers all that pattern could hope to cover and many
things which are tricky for pattern matchers. In current
developement version of FriCAS pattern matching is no longer
used for indefinite integrals.
My point is that with pattern matcher you can not hope for wide
coverage. Rubi currently has about 5000 rules which take more
than 20000 lines. This is almost twice as big as FriCAS integrator.
Yet Rubi to get "finite task" needs to drastically limit possible
froms of integrands.
If some of the newer CAS have "better" core algorithms like -- polynomial multiplication,
polynomial GCD, expansion in Taylor series, it would be interesting to take note, and if so
with substantial likelihood they can be inserted into Maxima, or added in via libraries.
Hmm, you should know about Monagan and Pearce work on polynomial operations. It is now included in Maple and gives significant
speedup on large polynomials.
I am aware of their results and corresponded with them. The point you make is correct...
"speedup on large polynomials". In my careful implementation of their ideas,
the results were a slowdown on all problems that were NOT exceedingly large. And
so not of great interest.
Well, I am not sure how "careful" was your implementation. IMO to
get results comparable to Monagan and Pearce would require substantial
effort, with lot of attention to detail. BTW: AFAICS their data
structures are somewhat hostile to Lisp. I do not see how to get
comparable speed from Lisp implementation. And in (unlikely) case
that you coded in C, there are performance traps in C, so
a lot of care is needed for good performance.
I agree that for CAS smaller polymials are quite important and most
my effort is on small and moderately large polynomial. But I would
not dismiss polynomials as "exceedingly large". In equation
solving large polynomials routinely appear at intermediate stages.
Figuring comparisons of Maple to other systems is
hindered by the miscellaneous programming and data-structure decisions in Maple. Thus programs written in the underlying language (Margay? is that still in use?)
Hmm, all Maple literature that I saw claimed "kernel" written in C.
and in the Maple "user" language run at quite different rates.
Maple user language is interpreted, so clearly slower than C.
Probably comparable in speed to Python...
Thus (only) in Maple
did it make sense to map certain polynomial problems into packed
integer problems. The integer computations were in the core, the polynomial problems were not. I don't know if this is still a prevalent feature of
the current Maple.
IIUC for long time they have univariate polynomials in core. OTOH
you wrote paper about using GMP for polynomial operations and some
other systems use this method.
There is long cycle of works of
Monagan and collaborators on polynomial GCD-s and factorisation
giving significant speedups compared to earlier methods.
I have expressed my concern above; a heuristic GCD that maps polynomials
into (big) integers is the kind of "optimization" that might work in Maple and
nowhere else. Or maybe it does work ...
See this 1995 paper
https://dl.acm.org/doi/abs/10.1145/220346.220376
I have read this paper long time ago. Concerning heuristic GCD:
FriCAS uses it for univariate polynomials with integer coefficients.
I have compared it with alternatives and is was clearly superior
on my testcases. One reason is that there is widely available
well-optimized implementation of bignum arithmetic, namely GMP.
Compared to that polynomial libraries have simpler and much
less optimized code. But there is also somewhat deeper reason,
namely current computer architecture. At theoretical level
bignum operations and operations on univariate polynomials
modulo small prime can use essentially equivalent methods.
Textbook may prefer to discuss polynomials, as there are less
troubles due to carry. However, carries merely complicate
code with small performace impact. Other operations in
integer version can be quite efficient. OTOH division needed
in modular approach is slow on modern hardware. Modular algorithms
frequently play tricks to reduce cost of division, but it is
still significant compared to costs in bigint arithmetic.
For multivariate cases IMO variation of Zippel is likely to
be fastest.
IIUC
several of them are incuded in Maple. Some variations are
implemented in FriCAS. As an illustartion let me say that
for GCD-s with algebraic coefficients FriCAS previously used
subresultant GCD (claimed to be very good implementation
of subresultant GCD). Switching to modular methods in
style of Monagan and van Hoej gave large speedup. My
estimates suggest that using different modular method
should be much faster. Monagan shows example where
factorization in algebraic extention using older method
is prohibitivly expensive, but newer method works quite
well. In fact, in all papers he gives examples and IMO
it is clear that what he and collaborators implemented
gives significant progress.
I would not endorse an algorithm for general use if it has only been implemented in Maple. It has to be tested on a more neutral
platform.
Normally Monagan gives all data needed to judge proposed
method regardless of your opinion about Maple. In case
of GCD with algebraic coefficient I can say more: implementation
of Monagan and van Hoej method in FriCAS works essentially
as advertised in the paper. In case of factorization with
coefficients in algebraic extention example from the paper
currently is prohibitively expensive in FriCAS, while they
gave quite reasonable runtime for new method (and my estimates
indicate that their method would be significant improvement
for FriCAS).
Some authors give only relative times, that is system specific
and claimed improvement may be due to bad implementation of
old method. But Monagan gives _much_ more info.
I looked a bit at Maxima multivariate factorization code.
AFAICS this is early Wang method trying to use zero as
evaluation point. This is resonably fast if you are lucky.
But it may be quite slow otherwise. In modern time there
seem to be agreement that zero evals should be avoided,
there is too high risk of degraded performance if you try
them.
If there is a better way to get around unlucky evaluation points than
is currently programmed in Maxima, perhaps someone will
implement it.
AFAICS amoing free system FriCAS and Giac already implement
better method (there is much to be improved in FriCAS
factorizer, but this part was done long ago).
Anyway, already in classic case of integer coefficients
Maxima lacks improvement concerning handling harder cases.
For algebraic coefficients IIUC there is problem of
correctness (IIUC some Maxima methods silently produce
wrong results) and no (correct) support for modular
methods.
If you are aware of bugs, it should be easy to report them.
My info is based on public messages from Maxima developers.
IIUC bugs are in Maxima bugtracker.
State of the art approach would use Hensel
lifting directly for algebraic coefficients. And
there is well developed methodology for exploiting
sparsity with non-zero evaluation point. There are
heuristics for early failure and restart (to avoid
hude loss of time due to bad eval). There are new
approaches to leading coefficient problem.
Again, reporting of bugs should be easy.
Well, I reported bugs when I was using Maxima. Concerning
newer methods, it is for Maxima developers to do the
work. IME telling folks "you should implement algortihm X"
is almost useless, either somebody is willing and able and
will do this anyway or thing will be postponed indefinitely.
Of course, one could try to add this to existing routines.
But IMO needed changes are so large that it makes more
sense to start from scratch.
This hardly makes sense to me. Why should you have to write
a new computer algebra system to write an improved
polynomial factoring program?
Well, that is not only factoring but also gcd and other polynomial
routines. And you need support code like linear equations
solver. In all probably between 5-20 kloc. You could
replace existing Maxima polynomial code by new one. My
point was that existing Maxima factorizer (and surrounding
routines) is unlikely to help writing new factorizer.
And my impression is that doing such replacement in Maxima is
harder than is some other systems.
Concerning power series expansions: AFAICS FriCAS
routines are much better than Maxima routines in this
area. Part of this is that FriCAS uses lazy approach.
Part is that FriCAS has special handling for many
cases (so more code, but better performance).
Speed for truncated Taylor series (if that is what you mean)
has not usually be an issue. Maxima has other routines to
compute the general term of a power series (infinite series).
I don't know if FriCAS does that.
1) FriCAS series are lazy, which in particular means that all
terms that are delivered are correct. This is unlike
Maxima style truncation where cancelation may mean that
delivered result is has lower than requested accuracy.
Note: I wrote the above mainly to explain why I am
reluctant to use word "truncated" describing FriCAS
series.
2) In the past I tried expansion of relatively simple
function. IIRC function fit in single line and desired
part of expansion was less than a screen. Maxima needed
5 min to deliver few terms and for me it was clear that
Maxima is unable to deliver desired result in reasonable
time. This was several years ago, so now this part of
Maxima may be faster. But for me it made much more sense
to improve thing that was already good (namely FriCAS) than
try to fix extensive problems with Maxima implementation...
As another example let me mention that recently on Maxima
list there was a guy with system of equations which Maxima
could not solve (IIRC Maxima run out of memory). I tried
this system in FriCAS. After little modification (which
matches stated intent) the system was easily solvable
(few seconds) in FriCAS. Orginal system was harder
(minutes or maybe tens of minutes) but worked too.
On Maxima list advice was: "you do not need to solve
it, answer would be too complicated to see anything".
Well, FriCAS answer was largish and gave me no
enlightment. But sometimes answers are small despite
large computations needed to obtain them.
Yes, sometimes. A system that may be many times faster again than FriCAS
is Fermat. I think that it would be worth testing, if you want to see
how fast is fast.
https://home.bway.net/lewis/
I know about Fermat. Its supposedly best version was available
only as binary without any explantation how it works internally
(later there was source for older version). But I looked
at provided benchmark results and they were not very impressive.
In some cases FriCAS has comparable timings, in some cases
Fermat time was signifiantly better than FriCAS, but worse
than some other system. My conclusion was that there is
nothing interesting to be learned from Fermat. Maybe in the
past Fermat was really faster than other systems, but not
in recent times.
BTW: Core algoritms in FriCAS typically are much faster than
in Axiom era. AFAICS old Axiom algoritms in many cases were
faster than Maple or Mathematica from comparable time. But
Maple and Mathematica (and other systems) are implementing
better methods all the time so any comparison should
really state which versions are compared (and result may
be different for later versions).
For instance, improved algebraic system solving, limits (e.g. D. Gruntz), manipulation of
special functions. The common notion that "Lisp is slow" and "C is fast" and that therefore
doing nothing other than writing in C is a step forward, I think is wrong.
Unfortunately Lisp implementations are slow. IME fastest is SBCL
where normally I can get about half of speed of comparable C code.
So why are people writing in Python, which is (apparently) quite slow?
Well, slow may be better than nothing. In case of Python, IIUC normal
practice is to have speed-critical parts in C++. As I already
wrote, vast majority of Sage code is in external libraries,
mostly in C or C++.
But at Lisp level this code has complete type declarations,
uses specialized array and machine compatible types. SBCL gives
slower code than C simply because SBCL code generator can not
optimize so well as optimizers in C compilers.
I have looked at SBCL compiler generated code, which is easily viewed
by the lisp disassemble function. Complaints about SBCL code
can be sent to the SBCL mailing list.
I am not sure if complaints would serve any useful purpose.
My impression is that SBCL developers have quite good idea
what could be improved. There is simply question of allocating
developement effort. Since they are doing the work, it is
up to them to decide what has priority.
In principle
on such code ECL or GCL should do better because code could be
easily 1-1 translated to C. But somewhat, both ECL and GCL
insert extra operations which kill performance.
Lisp compilers (some of them, anyway) generate assembly language.
Free compilers that I know normally either emit (binary) machine
code (SBCL, Closure CL, Poplog), C (ECL and GCL), Java bytecodes
(ABCL) or bytecode for their own interpreter (Clisp, interpreted
modes of other compilers). Of the above Poplog has code to
emit assembler, but assembly output normally is used for Pop11
and it is not clear if it would work correctly for Common Lisp.
I have never used commercial Lisp compiler, they may have
features not present in free Lisp-s.
Anyway, may point was that ECL and GCL lost advantage that
code generator in gcc should give them, and on FriCAS code
are slower than SBCL.
However, that
was somewhat optimistic, when I really need performance in
C I could tweak code to use say SSE instructions, doing 2
or 4 times more work per instruction. Also, in gcc I have
easy access to high part of product of 64-bit numbers, which
in several problems roughly doubles performance compared to
64-bit product. So, using C in many cases I could get 4-8
times faster code than going via SBCL.
If you wish to write code in assembler, I assume that it can be
linked to Lisp as well.
That is problematic. In gcc I can have say 2-3 critcal lines
in assembler and rest in C. To use SSE instructions it
is enough to tweak few declarations and say to compiler that
target processor supports relevant instructions. Rest is
automatic, gcc code generator knows how to generate SSE
from normal C source. Also, gcc has built-in functions
that at C level look like a function call, but generate
single assembler instruction. So in gcc one can use low
level features with minimal effort.
In SBCL AFAIK there is no documented way to insert assember
instructions in middle of Lisp routine (SBCL developers can
do this, but there is no explantiation what are the rules).
In Closure CL there is reasonably well documented way to
write Lisp-callable assembler routines, but there are
severe restrictions to satisfy garbage collector.
One could write whole routine in assembler (or C) and
call it like external C routine. But this has overhead
of inter-language call (in Closure CL forces copying of
paramenters and may force system call to adjust environment).
The point is that one can not limit low-level aspect to
tiny part, but one is forced to move larger part of code
to low level, partially because low level routine must
do enough work to amortize cost of call.
Another point is that using C one can get low-level benefits
working at much higher level.
Recently I needed
to work with 2-dimensional arrays. To my dismay I discovered
that my code was much slower than I hoped, about 6 or 8
times slower than comparable C. It turned out that SBCL
(at least version that I used) was unable to optimize
indexing for 2-dimensional arrays and dully generated code
repeatedly fetching dimension from array object and
performing multiplication implied by indexing...
Another thing is that Lisp encourages writing "generic" code,
which works for several types. That is nice for non-critical
code, but there is rather large slowdown compared to well-typed
code with enough type info.
There is an advantage to generic code that works. Then you add
declarations to make it faster. There is an argument that type
declarations make it easier to debug. Maybe. I think that in some
cases the type declarations ARE the bug.
Well, I write generic code when appropriate. But when there is
performance critical part you want it fast. And fast usually
requires specialized code.
Concerning type declarations, Lisp type declarations are error
prone. Since compiler does not check them program may have
latent bug which only surfaces later. In SBCL if you declare
variable type you need to initialize the variable with value
of correct type. This is problematic, as default initalization
is nil. For example, several polynomial routines in Mock-MMA
failed in this way.
In FriCAS you can write generic code using appropriate generic
types. In such case FriCAS will generate Lisp with no type
declarations. Or you can use specialized type, like SingleInteger
(Lisp fixnum) or DoubleFloat and then FriCAS will generate
specialized operations and Lisp type declarations. There
are places in FriCAS with silly-looking code like
if T is DoubleFloat then
X
else
X
where X is code using type T. Meaning of such snippet is that
when type T happens to be DoubleFloat FriCAS will use
specialized version of code otherwise FriCAS will will
use equivalent generic code.
Concerning types and debugging, I find FriCAS types quite
useful during developement, type errors in early phase of
developement in some cases prevented design bugs that otherwise
would waste much more effort. And type checking prevents
some classes of errors, so I can concentrate testing on
things not expressible via types. Also, type help with
readablility, they explain role/format of arguments reducing
need for comments. Lisp types, since they are not checked
by compiler can not really serve similar purpose.
ATM I am staying with SBCL, but you can see that when it comes
to speed of low level code that Lisp is a problem.
I am not convinced that speed is such a problem. If it were, we would
all be using FERMAT.
As I wrote, I do not think Fermat is that fast. And surely
slow is better than not at all. But I think that CAS should
be serious about speed. And I think that leading systems
take speed seriously.
There are areas where interested programmers could add to
a computer algebra system, and they might consider adding to Maxima; a few I've suggested
include an improved interval arithmetic system, and a way of using an inverse symbolic
calculator (see Stoutemyer's interesting paper https://arxiv.org/abs/2103.16720 )
I am more struck by the fact that "new" CAS have rarely improved on those core
capabilities, rarely moving in interesting directions.
I am not sure what you mean by "new" CAS above.
Any system that is newer than Macsyma circa 1970, or Reduce of that time.
In particular, Mathematica, Maple are new. Even though they are now rather old
compared to (say) Sympy.
Well, I would say that Mathematica and Maple added interesting functionalty compared to Maxima. They have their quirks and additions may be not
the ones you would like most, but they have several IMO interesting
additions. For me functinality in specialized systems like GAP or
Pari is also interesting, part of it found way to more general
systems.
FriCAS has interval arthmetic inherited from Axiom, Sage has something
too. I understand your critique of Mathematica arithmetic, but it
is not clear for me if you want for Maxima something different than
FriCAS interval arthmetic and if yes why it would be better.
Concerning recognizing floats, it seems that methods that work
are based on things like LLL or PSQL. Several systems now have
LLL, so this is definte progress compared to say 1980.
There is bunch of
systems that essentially take view that core functionality does not
matter much or can be delegated to other systems and instead concentrate
on packaging and presentation.
Yes, all the world is a web app.
Sadly, while not new Maxima seem
to exibit eqivalent attitude ("all important core work was dane in seventies").
I don't quite agree. I would say "If your system is a re-implementation of
[continued in next message]
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)