Non-LLM example where we do not in practice use original training data
From Sam Hartman@21:1/5 to All on Mon May 5 22:30:01 2025
I think many of us modify machine learning models on a regular basis.
And when we make those modifications, we do not go back to the
original training data; instead, we modify the model weights.
I suspect I am not the only one who uses rspamd with both its
Bayesian classifier and its neural network classifier, both of which
are machine learning models.
My point here is that there is a common case where the preferred form
of modification for a model is definitely not the original training
data.
Some people on the list probably do retain all the messages they submit
for learning.
I know I do not.
(I retain a significant subset and probably could reproduce something if
I had to.)
If I wanted to package up my classifier state and distribute it under a
free software license, I think it should be DFSG free.
I think that to satisfy the DFSG I would need to include all the
training data I still had and any scripts I used.
But I think in that circumstance the model weights would be a reasonable preferred form of modification.
If the way I responded to bug reports was to manually run messages
through rspamc, updating the existing classifier state, I think that
package ought to be DFSG free based on decisions we have made in
similar circumstances in the past.
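To make the workflow concrete, here is a toy sketch in Python. It assumes a hypothetical token-count naive Bayes classifier; the class `ToyBayes`, its methods, and its JSON layout are illustrative inventions, not rspamd's implementation or storage format. What it shows is the shape of the argument: the distributed artifact is only the learned counts, and a later "bug fix" is made by learning one more message into those counts, with no access to the original messages.

```python
import json
import math
import re
from collections import Counter


class ToyBayes:
    """Hypothetical toy classifier; NOT how rspamd's Bayes module works."""

    def __init__(self, spam=None, ham=None, n_spam=0, n_ham=0):
        self.spam = Counter(spam or {})  # token -> number of spam messages containing it
        self.ham = Counter(ham or {})    # token -> number of ham messages containing it
        self.n_spam = n_spam             # spam messages learned so far
        self.n_ham = n_ham               # ham messages learned so far

    @staticmethod
    def tokens(text):
        # Set of lowercase word tokens in one message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, is_spam):
        # Update the stored counts in place; the message itself is discarded.
        toks = self.tokens(text)
        if is_spam:
            self.spam.update(toks)
            self.n_spam += 1
        else:
            self.ham.update(toks)
            self.n_ham += 1

    def spam_probability(self, text):
        # Naive Bayes over the tokens present, add-one smoothing, log space.
        total = self.n_spam + self.n_ham + 2
        log_spam = math.log((self.n_spam + 1) / total)
        log_ham = math.log((self.n_ham + 1) / total)
        for t in self.tokens(text):
            log_spam += math.log((self.spam[t] + 1) / (self.n_spam + 2))
            log_ham += math.log((self.ham[t] + 1) / (self.n_ham + 2))
        diff = log_spam - log_ham
        # Numerically stable logistic of the log-odds.
        if diff >= 0:
            return 1 / (1 + math.exp(-diff))
        z = math.exp(diff)
        return z / (1 + z)

    def dumps(self):
        # The "model weights": counts only, no training messages.
        return json.dumps({"spam": self.spam, "ham": self.ham,
                           "n_spam": self.n_spam, "n_ham": self.n_ham})

    @classmethod
    def loads(cls, data):
        d = json.loads(data)
        return cls(d["spam"], d["ham"], d["n_spam"], d["n_ham"])


# The publisher trains, then ships only the serialized state.
model = ToyBayes()
model.learn("cheap pills buy now", is_spam=True)
model.learn("meeting notes for monday", is_spam=False)
shipped = model.dumps()

# A downstream user responds to a "bug report" without any original messages.
fixed = ToyBayes.loads(shipped)
fixed.learn("cheap watches buy today", is_spam=True)
```

The only thing the downstream user needs is `shipped`; everything they can do to the model (learn, unlearn by adjusting counts, inspect) operates on that state, which is the sense in which the weights serve as the form one actually modifies.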
I appreciate that coming up with a classifier state generic enough to
be valuable to package in Debian would be difficult. However, I think
this serves as an example we can all get our heads around, showing
that in practice, real users often use model weights as the preferred
form of modification.