Forum: >>> Magnum BBS <<<

Re: Proposal Alternative: A Model Can Be a Preferred form of Modificati

From Mo Zhou@21:1/5 to Sam Hartman on Mon May 5 21:30:01 2025

Hi Sam,

On 5/5/25 15:12, Sam Hartman wrote:

***Proposal Text***

Choice 2: Software incorporating AI Models Released under DFSG Licenses
free Must provide for
Practical Modification to Comply with DFSG

The project asks those charged with interpreting the DFSG to require
that software incorporating AI models have a preferred form of
modification for the models and that we provide our users the ability to
modify these models in order to be included in the main section of the
archive. Examples of such a preferred form of modification can include
the original training data for the model. Alternatively, a base model
(especially when the base model can be replaced and multiple options are
available) along with training data for any fine tuning that has been
performed is acceptable. In some cases a model along with necessary
tools to perform incremental fine tuning may be acceptable if doing
additional incremental training is actually the approach that the
upstream project uses to modify the model. As with other interpretations
of the DFSG, something cannot be the preferred form of modification if
the upstream of the software under consideration has a more preferred
form of modification that is not public.

Thanks! While I disagree with the proposal -- it grants the user only
a "partial freedom" instead of "full freedom" -- the proposal has made a
clear point.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bill Allombert@21:1/5 to All on Mon May 5 22:30:01 2025

On Mon, May 05, 2025 at 01:12:13PM -0600, Sam Hartman wrote:

I'm not sure if this is too late. The mail to debian-devel-announce was
kind of late, and I hope there is still some discussion time left.

It is late enough that I am immediately seeking seconds for the
following proposal.
I am also open to wordsmithing if we have time.

If we decide to take more time to think about this issue and build
project consensus, I would be delighted if we did not vote now.

Rationale:

TL;DR: If in practice we are able to modify the software we have, and
the license is DFSG free, then I think we meet DFSG 2 and the software
should be DFSG free.

This proposal expands on the comments I made in https://lists.debian.org/[email protected]

It's been my experience that given the costs of AI training, often the
model itself is the preferred form of modification. I find this
particularly true in the case of LLMs based on my experience over the
last year. I particularly disagree with Russ that doing a full
parameter fine tuning of a model is anything like calling into a
library; to me it seems a lot more like modifying a Smalltalk world or
changing a LambdaMOO world and dumping a new core. Even LoRA-style
retraining looks a lot like the sort of patch files permitted by DFSG 4.
I disagree with those who claim that if we had the original training
data we would choose to start there when we want to modify a model.

Without the original training data, we have no way to know what is
"inside" the model. The model could generate backdoors, non-free
copyrighted material, or even more harmful content.

Cheers
--
Bill. <[email protected]>

Imagine a large red swirl here.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Sam Hartman@21:1/5 to All on Mon May 5 22:50:01 2025

"Bill" == Bill Allombert <[email protected]> writes:

Bill> Without the original training data, we have no way to know
Bill> what is "inside" the model. The model could generate
Bill> backdoors and non-free copyrighted material or even more
Bill> harmful content.

And yet we have accepted x86 machine code as the preferred form of
modification. Inspectability (as opposed to preferred form of
modification) has never been at the core of the DFSG. Typically,
modifiability has come with some degree of inspectability.

Machine learning models are a case where those two properties split.
And there is sufficient history, in my mind, that we do not require
inspectability the same way we prefer modifiability.

I also think we will start to develop black box inspection tools for
machine learning models, and so the level of inspectability we get with
model weights will improve over time.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Sam Hartman@21:1/5 to All on Tue May 6 00:40:01 2025

"Aigars" == Aigars Mahinovs <[email protected]> writes:

Aigars> Another, simpler, alternative would be to vote on the Debian
Aigars> project endorsing
Aigars> https://opensource.org/ai/open-source-ai-definition

Aigars> It basically translates the four freedoms into AI freedoms
Aigars> and introduces "Data Information" as a substitute for
Aigars> (potentially unredistributable) original training data - a
Aigars> description of what data was used for training and how it
Aigars> was acquired and processed. With the key that a sufficiently
Aigars> skilled person should be able to reproduce the data and then
Aigars> the model using this information.

I'd rank that below FD, so I would not propose it, but I would rank it
above Choice 1.

Here are my concerns with that definition for Debian:

* The four freedoms do not have any formal role in the DFSG. Pulling
them in here seems like an odd place for Debian to sign onto them.

* The definition refers to OSD rather than DFSG in terms of licenses.

* Data information makes sense to me when talking about base models. But
data information does not guarantee that I can modify a model as part
of a software system in practice. Perhaps in practice it does, but I
would need to think through it more than I have already.

But if you propose that option I will second.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Aigars Mahinovs@21:1/5 to Sam Hartman on Tue May 6 00:20:01 2025

On Mon, 5 May 2025 at 21:13, Sam Hartman <[email protected]> wrote:

***Proposal Text***

Choice 2: Software incorporating AI Models Released under DFSG Licenses
free Must provide for
Practical Modification to Comply with DFSG

The project asks those charged with interpreting the DFSG to require
that software incorporating AI models have a preferred form of
modification for the models and that we provide our users the ability to
modify these models in order to be included in the main section of the
archive. Examples of such a preferred form of modification can include
the original training data for the model. Alternatively, a base model
(especially when the base model can be replaced and multiple options are
available) along with training data for any fine tuning that has been
performed is acceptable. In some cases a model along with necessary
tools to perform incremental fine tuning may be acceptable if doing
additional incremental training is actually the approach that the
upstream project uses to modify the model. As with other interpretations
of the DFSG, something cannot be the preferred form of modification if
the upstream of the software under consideration has a more preferred
form of modification that is not public.

Another, simpler, alternative would be to vote on the Debian project
endorsing https://opensource.org/ai/open-source-ai-definition

It basically translates the four freedoms into AI freedoms and introduces
"Data Information" as a substitute for (potentially unredistributable)
original training data - a description of what data was used for training
and how it was acquired and processed. With the key that a sufficiently
skilled person should be able to reproduce the data and then the model
using this information.

--
Best regards,
Aigars Mahinovs mailto:[email protected]
#--------------------------------------------------------------#
| .''`. Debian GNU/Linux (http://www.debian.org) |
| : :' : |
| `. `' Software Engineer, BMW |
| `- |
#--------------------------------------------------------------#

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Aigars Mahinovs@21:1/5 to Sam Hartman on Tue May 6 11:00:01 2025

Creating two separate (but equal) definitions in the community is IMHO
detrimental to the clarity of software freedom in the long term.

That is why I would rather see the Debian project first express agreement
with the OSI position (as a strategic decision) and later additionally
provide guidance on how to interpret the DFSG in that context, as
technical/supporting documentation that implements this strategic
decision. Specifically, this guidance would need to state that the
training data of an AI model is *not* considered to be the "source code"
of the model, but rather an intermediate build artifact from the real
source, which is the "training data information". It would also need to
note that the trained model must be in a form that is suitable for
further refinement in order to satisfy the requirement to technically
allow further modifications and derived works.

IMHO voting on the strategic decision itself could be done immediately,
but the technical part would need more work and discussion.

On Tue, 6 May 2025 at 00:30, Sam Hartman <[email protected]> wrote:

"Aigars" == Aigars Mahinovs <[email protected]> writes:

Aigars> Another, simpler, alternative would be to vote on the Debian
Aigars> project endorsing
Aigars> https://opensource.org/ai/open-source-ai-definition

Aigars> It basically translates the four freedoms into AI freedoms
Aigars> and introduces "Data Information" as a substitute for
Aigars> (potentially unredistributable) original training data - a
Aigars> description of what data was used for training and how it
Aigars> was acquired and processed. With the key that a sufficiently
Aigars> skilled person should be able to reproduce the data and then
Aigars> the model using this information.

I'd rank that below FD, so I would not propose it, but I would rank it
above Choice 1.

Here are my concerns with that definition for Debian:

* The four freedoms do not have any formal role in the DFSG. Pulling
them in here seems like an odd place for Debian to sign onto them.

* The definition refers to OSD rather than DFSG in terms of licenses.

* Data information makes sense to me when talking about base models. But
data information does not guarantee that I can modify a model as part
of a software system in practice. Perhaps in practice it does, but I
would need to think through it more than I have already.

But if you propose that option I will second.

--
Best regards,
Aigars Mahinovs mailto:[email protected]
#--------------------------------------------------------------#
| .''`. Debian GNU/Linux (http://www.debian.org) |
| : :' : |
| `. `' Software Engineer, BMW |
| `- |
#--------------------------------------------------------------#

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefano Zacchiroli@21:1/5 to Sam Hartman on Tue May 6 13:10:01 2025

Hello Sam,

On Mon, May 05, 2025 at 01:12:13PM -0600, Sam Hartman wrote:

***Proposal Text***

Choice 2: Software incorporating AI Models Released under DFSG Licenses
free Must provide for
Practical Modification to Comply with DFSG

The project asks those charged with interpreting the DFSG to require
that software incorporating AI models have a preferred form of
modification for the models and that we provide our users the ability to
modify these models in order to be included in the main section of the
archive. Examples of such a preferred form of modification can include
the original training data for the model. Alternatively, a base model
(especially when the base model can be replaced and multiple options are
available) along with training data for any fine tuning that has been
performed is acceptable. In some cases a model along with necessary
tools to perform incremental fine tuning may be acceptable if doing
additional incremental training is actually the approach that the
upstream project uses to modify the model. As with other interpretations
of the DFSG, something cannot be the preferred form of modification if
the upstream of the software under consideration has a more preferred
form of modification that is not public.

I don't know yet how I would rank this option w.r.t. the main one, but I
think it's important to have an alternative option, along the lines of
yours above, available on the ballot. I hence second your text above
(assuming it's final already; if not, I'll be happy to do so when it
is).

Cheers
--
Stefano Zacchiroli . [email protected] . https://upsilon.cc/zack
Full professor of Computer Science
Télécom Paris, Polytechnic Institute of Paris
Co-founder & CSO Software Heritage
Mastodon: https://mastodon.xyz/@zacchiro

From Otto Kekäläinen@21:1/5 to All on Tue May 6 20:00:01 2025

Hi,

***Proposal Text***

Choice 2: Software incorporating AI Models Released under DFSG Licenses
free Must provide for
Practical Modification to Comply with DFSG

The project asks those charged with interpreting the DFSG to require
that software incorporating AI models have a preferred form of
modification for the models and that we provide our users the ability to
modify these models in order to be included in the main section of the
archive. Examples of such a preferred form of modification can include
the original training data for the model. Alternatively, a base model
(especially when the base model can be replaced and multiple options are
available) along with training data for any fine tuning that has been
performed is acceptable. In some cases a model along with necessary
tools to perform incremental fine tuning may be acceptable if doing
additional incremental training is actually the approach that the
upstream project uses to modify the model. As with other interpretations
of the DFSG, something cannot be the preferred form of modification if
the upstream of the software under consideration has a more preferred
form of modification that is not public.

Another, simpler, alternative would be to vote on the Debian project
endorsing https://opensource.org/ai/open-source-ai-definition

It basically translates the four freedoms into AI freedoms and introduces
"Data Information" as a substitute for (potentially unredistributable)
original training data - a description of what data was used for training
and how it was acquired and processed. With the key that a sufficiently
skilled person should be able to reproduce the data and then the model
using this information.

The OSI definition of open source was originally derived from the
Debian DFSG, but when the OSI published the Open Source AI Definition,
some people who objected to it created https://opensourcedefinition.org/
and https://openweight.org/ to emphasize that weights are not open
source without the training data. For background see
https://www.einpresswire.com/article/779177703/open-weight-definition-owd-delivering-clarity-while-protecting-the-integrity-of-open-source-ai

Currently the top voted model at
https://huggingface.co/models?sort=likes is DeepSeek-R1, which is
under MIT but of course no training data is available. While
Hugging Face has a good UI, there does not seem to be any way to search
for models that have both an open license AND training data available.
It would be reassuring to see a list of such models and to be able to
assess whether they are likely to grow and evolve, so that Debian does
not adopt an overly strict stance and end up void of even small
spelling and grammar checking models that could offer great value to
end users.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
