• Re: Proposal Alternative: A Model Can Be a Preferred form of Modificati

    From Mo Zhou@21:1/5 to Sam Hartman on Mon May 5 21:30:01 2025
    Hi Sam,

    On 5/5/25 15:12, Sam Hartman wrote:
    ***Proposal Text***

    Choice 2: Software incorporating AI Models Released under DFSG Licenses
    free Must provide for
    Practical Modification to Comply with DFSG

    The project asks those charged with interpreting the DFSG to require
    that software incorporating AI models have a preferred form of
    modification for the models and that we provide our users the ability to
    modify these models in order to be included in the main section of the
    archive. Examples of such a preferred form of modification can include
    the original training data for the model. Alternatively, a base model
    (especially when the base model can be replaced and multiple options are
    available) along with training data for any fine tuning that has been
    performed is acceptable. In some cases a model along with necessary
    tools to perform incremental fine tuning may be acceptable if doing
    additional incremental training is actually the approach that the
    upstream project uses to modify the model. As with other interpretations
    of the DFSG, something cannot be the preferred form of modification if
    the upstream of the software under consideration has a more preferred
    form of modification that is not public.

    Thanks! While I disagree with the proposal (it grants the user only
    "partial freedom" instead of "full freedom"), the proposal has made a
    clear point.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bill Allombert@21:1/5 to All on Mon May 5 22:30:01 2025
    On Mon, May 05, 2025 at 01:12:13PM -0600, Sam Hartman wrote:

    I'm not sure if this is too late. The mail to debian-devel-announce was
    kind of late, and I hope there is still some discussion time left.

    It is late enough that I am immediately seeking seconds for the
    following proposal.
    I am also open to wordsmithing if we have time.

    If we decide to take more time to think about this issue and build
    project consensus, I would be delighted if we did not vote now.

    Rationale:

    TL;DR: If in practice we are able to modify the software we have, and
    the license is DFSG free, then I think we meet DFSG 2 and the software
    should be DFSG free.

    This proposal extends on the comments I made in https://lists.debian.org/[email protected]


    It's been my experience that given the costs of AI training, often the
    model itself is the preferred form of modification. I find this
    particularly true in the case of LLMs based on my experience over the
    last year. I particularly disagree with Russ that doing a full
    parameter fine tuning of a model is anything like calling into a
    library; to me it seems a lot more like modifying a Smalltalk world or
    changing a LambdaMOO world and dumping a new core. Even LoRA style
    retraining looks a lot like the sort of patch files permitted by DFSG 4.
    I disagree with those who claim that if we had the original training
    data we would choose to start there when we want to modify a model.

    Without the original training data, we have no way to know what is
    "inside" the model. The model could generate backdoors and non-free
    copyrighted material or even more harmful content.

    Cheers
    --
    Bill. <[email protected]>

    Imagine a large red swirl here.

  • From Sam Hartman@21:1/5 to All on Mon May 5 22:50:01 2025
    "Bill" == Bill Allombert <[email protected]> writes:


    Bill> Without the original training data, we have no way to know
    Bill> what is "inside" the model. The model could generate
    Bill> backdoors and non-free copyrighted material or even more
    Bill> harmful content.

    And yet we have accepted x86 machine code as the preferred form of
    modification.
    Inspectability (as opposed to preferred form of modification) has never
    been at the core of DFSG.
    Typically, modifiability has come with some degree of inspectability.

    Machine learning models are a case where those two properties split.
    And there is sufficient history in my mind that we do not require
    inspectability the same way we prefer modifiability.



    I also think we will start to develop black box inspection tools for
    machine learning models, and so the level of inspectability we get with
    model weights will improve over time.

  • From Sam Hartman@21:1/5 to All on Tue May 6 00:40:01 2025
    "Aigars" == Aigars Mahinovs <[email protected]> writes:

    Aigars> Another, simpler, alternative would be to vote on the Debian
    Aigars> project endorsing
    Aigars> https://opensource.org/ai/open-source-ai-definition

    Aigars> It basically translates the four freedoms into AI freedoms
    Aigars> and introduces "Data Information" as a substitute for
    Aigars> (potentially unredistributable) original training data - a
    Aigars> description of what data was used for training and how it
    Aigars> was acquired and processed. With the key that a sufficiently
    Aigars> skilled person should be able to reproduce the data and then
    Aigars> the model using this information.

    I'd rank that below FD, so I would not propose it, but I would rank it
    above Choice 1.

    Here are my concerns with that definition for Debian:

    * The four freedoms do not have any formal role in the DFSG. Pulling
    them in here seems like an odd place for Debian to sign onto them.

    * The definition refers to OSD rather than DFSG in terms of licenses.

    * Data information makes sense to me when talking about base models. But
    data information does not guarantee that I can modify a model as part
    of a software system in practice. Perhaps in practice it does, but I
    would need to think through it more than I have already.

    But if you propose that option I will second.

  • From Aigars Mahinovs@21:1/5 to Sam Hartman on Tue May 6 00:20:01 2025
    On Mon, 5 May 2025 at 21:13, Sam Hartman <[email protected]> wrote:


    ***Proposal Text***

    Choice 2: Software incorporating AI Models Released under DFSG Licenses
    free Must provide for
    Practical Modification to Comply with DFSG

    The project asks those charged with interpreting the DFSG to require
    that software incorporating AI models have a preferred form of
    modification for the models and that we provide our users the ability to
    modify these models in order to be included in the main section of the
    archive. Examples of such a preferred form of modification can include
    the original training data for the model. Alternatively, a base model
    (especially when the base model can be replaced and multiple options are
    available) along with training data for any fine tuning that has been
    performed is acceptable. In some cases a model along with necessary
    tools to perform incremental fine tuning may be acceptable if doing
    additional incremental training is actually the approach that the
    upstream project uses to modify the model. As with other interpretations
    of the DFSG, something cannot be the preferred form of modification if
    the upstream of the software under consideration has a more preferred
    form of modification that is not public.


    Another, simpler, alternative would be to vote on the Debian project
    endorsing https://opensource.org/ai/open-source-ai-definition

    It basically translates the four freedoms into AI freedoms and introduces
    "Data Information" as a substitute for (potentially unredistributable)
    original training data - a description of what data was used for training
    and how it was acquired and processed. With the key that a sufficiently
    skilled person should be able to reproduce the data and then the model
    using this information.

    --
    Best regards,
    Aigars Mahinovs mailto:[email protected]
    #--------------------------------------------------------------#
    | .''`. Debian GNU/Linux (http://www.debian.org) |
    | : :' : |
    | `. `' Software Engineer, BMW |
    | `- |
    #--------------------------------------------------------------#

  • From Aigars Mahinovs@21:1/5 to Sam Hartman on Tue May 6 11:00:01 2025
    Creating two separate (but equal) definitions in the community is IMHO
    detrimental in the long term to the clarity of software freedom.

    That is why I would rather see the Debian project first express agreement
    with the OSI position (as a strategic decision) and later additionally
    provide guidance on how to interpret the DFSG in that context, as
    technical/supporting documentation that implements this strategic
    decision. Specifically, this guidance would need to state that the
    training data of an AI model is *not* considered to be the "source code"
    of the model, but rather an intermediate build artifact from the real
    source that is the "training data information". It would also need to
    note that having the trained model in a form that is suitable for further
    refinement is required to satisfy the requirement to technically allow
    further modifications and derived works.

    IMHO voting on the strategic decision itself could be done immediately,
    but the technical part would need more work and discussion.

    On Tue, 6 May 2025 at 00:30, Sam Hartman <[email protected]> wrote:

    "Aigars" == Aigars Mahinovs <[email protected]> writes:

    Aigars> Another, simpler, alternative would be to vote on the Debian
    Aigars> project endorsing
    Aigars> https://opensource.org/ai/open-source-ai-definition

    Aigars> It basically translates the four freedoms into AI freedoms
    Aigars> and introduces "Data Information" as a substitute for
    Aigars> (potentially unredistributable) original training data - a
    Aigars> description of what data was used for training and how it
    Aigars> was acquired and processed. With the key that a sufficiently
    Aigars> skilled person should be able to reproduce the data and then
    Aigars> the model using this information.

    I'd rank that below FD, so I would not propose it, but I would rank it
    above Choice 1.

    Here are my concerns with that definition for Debian:

    * The four freedoms do not have any formal role in the DFSG. Pulling
    them in here seems like an odd place for Debian to sign onto them.

    * The definition refers to OSD rather than DFSG in terms of licenses.

    * Data information makes sense to me when talking about base models. But
    data information does not guarantee that I can modify a model as part
    of a software system in practice. Perhaps in practice it does, but I
    would need to think through it more than I have already.

    But if you propose that option I will second.



    --
    Best regards,
    Aigars Mahinovs mailto:[email protected]
    #--------------------------------------------------------------#
    | .''`. Debian GNU/Linux (http://www.debian.org) |
    | : :' : |
    | `. `' Software Engineer, BMW |
    | `- |
    #--------------------------------------------------------------#

  • From Stefano Zacchiroli@21:1/5 to Sam Hartman on Tue May 6 13:10:01 2025
    Hello Sam,

    On Mon, May 05, 2025 at 01:12:13PM -0600, Sam Hartman wrote:
    ***Proposal Text***

    Choice 2: Software incorporating AI Models Released under DFSG Licenses
    free Must provide for
    Practical Modification to Comply with DFSG

    The project asks those charged with interpreting the DFSG to require
    that software incorporating AI models have a preferred form of
    modification for the models and that we provide our users the ability to
    modify these models in order to be included in the main section of the
    archive. Examples of such a preferred form of modification can include
    the original training data for the model. Alternatively, a base model
    (especially when the base model can be replaced and multiple options are
    available) along with training data for any fine tuning that has been
    performed is acceptable. In some cases a model along with necessary
    tools to perform incremental fine tuning may be acceptable if doing
    additional incremental training is actually the approach that the
    upstream project uses to modify the model. As with other interpretations
    of the DFSG, something cannot be the preferred form of modification if
    the upstream of the software under consideration has a more preferred
    form of modification that is not public.

    I don't know yet how I would rank this option w.r.t. the main one, but I
    think it's important to have an alternative option, along the lines of
    yours above, available on the ballot. I hence second your text above
    (assuming it's final already; if not, I'll be happy to do so when it
    is).

    Cheers
    --
    Stefano Zacchiroli . [email protected] . https://upsilon.cc/zack  _. ^ ._
    Full professor of Computer Science              o     o   o     \/|V|\/
    Télécom Paris, Polytechnic Institute of Paris     o     o o    </>   <\>
    Co-founder & CSO Software Heritage            o o o     o       /\|^|/\
    Mastodon: https://mastodon.xyz/@zacchiro                        '" V "'

  • From Otto Kekäläinen@21:1/5 to All on Tue May 6 20:00:01 2025
    Hi,

    ***Proposal Text***

    Choice 2: Software incorporating AI Models Released under DFSG Licenses
    free Must provide for
    Practical Modification to Comply with DFSG

    The project asks those charged with interpreting the DFSG to require
    that software incorporating AI models have a preferred form of
    modification for the models and that we provide our users the ability to
    modify these models in order to be included in the main section of the
    archive. Examples of such a preferred form of modification can include
    the original training data for the model. Alternatively, a base model
    (especially when the base model can be replaced and multiple options are
    available) along with training data for any fine tuning that has been
    performed is acceptable. In some cases a model along with necessary
    tools to perform incremental fine tuning may be acceptable if doing
    additional incremental training is actually the approach that the
    upstream project uses to modify the model. As with other interpretations
    of the DFSG, something cannot be the preferred form of modification if
    the upstream of the software under consideration has a more preferred
    form of modification that is not public.


    Another, simpler, alternative would be to vote on the Debian project
    endorsing https://opensource.org/ai/open-source-ai-definition

    It basically translates the four freedoms into AI freedoms and introduces
    "Data Information" as a substitute for (potentially unredistributable)
    original training data - a description of what data was used for training
    and how it was acquired and processed. With the key that a sufficiently
    skilled person should be able to reproduce the data and then the model
    using this information.

    The OSI definition of open source was originally derived from the
    Debian DFSG, but when OSI published its Open Source AI Definition, some
    people who objected to it created https://opensourcedefinition.org/ and
    https://openweight.org/ to emphasize that weights are not open source
    without the training data. For background see
    https://www.einpresswire.com/article/779177703/open-weight-definition-owd-delivering-clarity-while-protecting-the-integrity-of-open-source-ai

    Currently the top voted model at
    https://huggingface.co/models?sort=likes is DeepSeek-R1, which is
    under the MIT license, but of course no training data is available.
    While Hugging Face has a good UI, there does not seem to be any way to
    search for models that have an open license AND training data
    available. It would be reassuring to see a list of those models and be
    able to assess whether they are likely to grow and evolve. That would
    help make sure that Debian does not adopt an overly strict stance and
    end up in a situation where Debian is void of even small spelling and
    grammar checking models that could offer great value to end users.
