• [RFC] Counter-Proposal -- Interpretation of DFSG on Artificial Intellig

    From Thorsten Glaser@21:1/5 to All on Wed Apr 23 23:50:01 2025
    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA384

    Cover letter
    ============

    (Please do keep me in Cc, I’m not subscribed to the list.)

    Hi! I had not realised it’s going to GR with this, so I’ve drafted
    a counter proposal, based on the thread on debian-private around
    <[email protected]> and earlier thoughts I’ve collected regarding this
    topic, such as on https://evolvis.org/~tg/cc.htm and the interpretation
    guidelines on https://mbsd.evolvis.org/MirOS-Licence.htm (this is a
    mirror on a more capable VM).

    I’m not sure how quickly I’ll need seconds, but I would also welcome
    input on this proposal (including from the l10n-en team as I’m not a
    native English speaker).

    I’m PGP-signing this with my DD key because, for the avoidance of doubt,
    should time indeed be short, I’m submitting this as a choice. If time
    isn’t short, I’m submitting it tentatively, with the option of working
    in feedback and updating it first.


    Counter-Proposal -- Interpretation of DFSG on (AI) Models
    =========================================================

    Please see the original proposal for background on this.

    The counter-proposal is as follows:

    The Debian project requires the same level of freedom for AI models
    as it does for other works entering the archive.

    Notably:

    1. A model must be trained only from legally obtained and used works,
    honour all licences of the works used in training, and be licenced
    under a suitable licence itself that allows distribution, or it is
    not even acceptable for non-free. This includes an understanding
    that “generative AI” outputs are derivative works of their inputs
    (including training data and the prompt), insofar as these pass
    the threshold of originality; that is, generative AI acts similarly
    to a lossy compression followed by decompression, or to a compiler.

    Any work resulting from generative use of a model can at most be
    as free as the model itself; e.g. programming with a model from
    contrib/non-free assisting prevents the result from entering main.

    The "/usr/share/doc/PACKAGE/copyright" file must include copyright
    notices from all training inputs as required by Policy for “any
    files which are compiled into the object code shipped in the binary
    package”, except for inputs already separately packaged (such as
    the training software, libraries, or inputs already available from
    packages such as word lists also used for spellchecking).

    Regarding availability of sources used for training, the normal
    rules of the non-free archive apply.

    2. Models are not suitable for the non-free-firmware archive.

    3. For a model to enter the contrib archive, it may at runtime require
    components from outside of Debian main, but the model itself must
    still comply with the DFSG, i.e. follow the requirements below for
    models entering main. If a model requires a component outside of
    main at build or training time, it is only admissible to non-free.

    4. For a model to enter the main archive, all works used in training
    must additionally be available, auditable, and under DFSG-compliant
    licencing. All software used to do the training must be available
    in Debian main.

    If the training happens during package build, the sources must be
    present in Debian packages or in the model’s source packages; if
    not, they must still be available in the same way.

    This is the same rule as is used for other precompiled works in
    Debian packages that are not regenerated during build: it must be
    possible to regenerate them using only Debian tools. Waiving the
    requirement to actually do the regenerating during package build
    is a nod to realistic build time and resource usage.

    5. For a model to enter the main archive, the model training itself
    must *either* happen during package build (which, for models of
    a certain size, may need special infrastructure; the handling of
    this is outside of the scope of this resolution), *or* the model
    resulting from training must build in a sufficiently reproducible
    way that a separate rebuilding effort from the same source will
    result in the same trained model. (This includes using reproducible
    seeds for PRNGs used, etc.)

    For realistic achievability of this goal, the reproducibility
    requirement is relaxed to not require bitwise equality, as long
    as the resulting model is effectively identical. (As a comparison,
    for C programs this would be equivalent to allowing different
    linking order of the object files in the binary or embedded
    timestamps to differ, or a different encoding of the same opcodes
    (like 31 C0 vs. 33 C0 for i386 “xor eax,eax”), but no functional
    changes as determined by experts in the field.)

    6. For handling of any large packages resulting from this, the normal
    processes are followed (such as discussing in advance with the
    relevant teams, ensuring mirrors are not over-burdened, etc).

    The Debian project asks that training sources are not obtained
    unethically, and that the ecological impact of training and using
    AI models be considered.

    [End of proposal.]
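
One possible reading of items 1 to 5 can be sketched as a small decision function. This is only an illustration: every field name below is hypothetical, each boolean paraphrases a condition from the proposal, and the real decision would rest with the people reviewing the package, not with a script.

```python
from dataclasses import dataclass
from enum import Enum


class Area(Enum):
    MAIN = "main"
    CONTRIB = "contrib"
    NON_FREE = "non-free"
    REJECTED = "not acceptable, even for non-free"


@dataclass
class Model:
    # Hypothetical field names paraphrasing the proposal's conditions.
    inputs_legally_obtained: bool              # item 1
    input_licences_honoured: bool              # item 1
    licence_allows_distribution: bool          # item 1
    inputs_available_and_dfsg_free: bool       # item 4
    reproducible_or_trained_at_build: bool     # item 5
    needs_non_main_at_build_or_training: bool  # items 3 and 4
    needs_non_main_at_runtime: bool            # item 3


def classify(m: Model) -> Area:
    """Map a model to an archive area under one reading of the proposal."""
    # Item 1: the baseline applies even to non-free.
    if not (m.inputs_legally_obtained
            and m.input_licences_honoured
            and m.licence_allows_distribution):
        return Area.REJECTED
    # Item 3: build-time or training-time needs outside main mean non-free.
    if m.needs_non_main_at_build_or_training:
        return Area.NON_FREE
    # Items 4 and 5: requirements shared by main and contrib.
    if not (m.inputs_available_and_dfsg_free
            and m.reproducible_or_trained_at_build):
        return Area.NON_FREE
    # Item 3: only runtime dependencies may live outside main for contrib.
    return Area.CONTRIB if m.needs_non_main_at_runtime else Area.MAIN
```

Under this reading, contrib differs from main only in its runtime dependencies, which matches the intent stated later in the thread.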

    -----BEGIN PGP SIGNATURE-----

    iQIcBAEBCQAGBQJoCV23AAoJEHa1NLLpkAfgfQcP/jDN+p+rY0fPhQUZ/HpJadkJ BawiUYp+TMjsXowrXXy9Mp7FyrlWrj+zROfA1tup2+TkdlQSY8A62aWYS62y5z9y x5TxqwS3+xH6UmtchmX7alxy7u9vUrcsdUM9NKt1DZQANyqq8+pVTpMKauNNsXr+ L8zq/37ludyjCf+c9pnJ066CUaLBBMQGWmfPO8c1mjYWNnACXgYuUH1cw8Sgzr5u vQrdURGfebrmTCQBbmCO5FOzQ3Q/uLjl5CocC8HWF0TBh7vcVtnYCkrvalECJpO5 PlCMUZ0MApuEJ1UTUcj+5lDxdH02dcMdFd7v+OB7+E5Jr+MHDR0wWoVaScm9MYno Eip0sxbzVRqozeAH5bKKSaIQN+4KL/pVB2bYxwR4N5/W/9cxDsJmF/uoB1lZNtL8 DOvLar3RmHNVbaXin/E3afhw5L3O7JeppTSCby9Unyow8hmRjfjhz//ApEbOrWfv CNH7sdM2mkEe0SXoxLyX7wfmZuWQ2SUZ4nwbj3vmHvM6jrVragCJxibQyVEIzuSQ 1FB0MsFa1TrYN4tnR7/q9AiskcHKiTwcdJh0LFCiLZ2F2d2sd4ne60qQTCpmjzzG WkhgeTOeLPCDgkHmC+oUEzGpQruKI/surQ9NSGWbFDyEPTGf9rVzMNlVRp0jJSob 2PclqIcmvlO8Krw+9klA
    =U1FJ
    -----END PGP SIGNATURE-----

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
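
The reproducibility requirement in item 5 of the proposal (seeded PRNGs, "effectively identical" rather than bitwise-equal results) can be illustrated with a toy example. This is a minimal sketch under loud assumptions: the `train` function is a stand-in linear fit, not a real training pipeline, and the tolerance in `effectively_identical` is an arbitrary placeholder for what the proposal leaves to "experts in the field".

```python
import math
import random


def train(data, seed, epochs=50, lr=0.05):
    """Toy 'training': fit y = w*x + b by SGD, driven by a seeded PRNG."""
    rng = random.Random(seed)  # private generator, explicitly seeded
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # sample order depends only on the seed
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return [w, b]


def effectively_identical(a, b, rel_tol=1e-9):
    """Relaxed comparison: equal length and every parameter within tolerance."""
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=rel_tol) for p, q in zip(a, b)
    )


DATA = [(x / 10, 2 * x / 10 + 1) for x in range(10)]
first = train(DATA, seed=42)
rebuild = train(DATA, seed=42)  # independent rebuild from the same source
print(effectively_identical(first, rebuild))
```

On a single machine the two runs agree exactly. The relaxed comparison matters once rebuilds happen on different hardware or library versions, where floating-point results may differ in the last bits.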
  • From Carsten Leonhardt@21:1/5 to Thorsten Glaser on Thu Apr 24 02:00:01 2025
    Thorsten Glaser <[email protected]> writes:

    Hi! I had not realised it’s going to GR with this, so I’ve drafted
    a counter proposal [...]

    Could you give a brief summary of the main differences you see between
    your and Mo's proposal? You call it "counter-proposal", so I expected
    something radically different, but so far I fail to see it.

    Regards

    Carsten

  • From Thorsten Glaser@21:1/5 to Carsten Leonhardt on Thu Apr 24 02:20:01 2025
    On Thu, 24 Apr 2025, Carsten Leonhardt wrote:

    Thorsten Glaser <[email protected]> writes:

    Hi! I had not realised it’s going to GR with this, so I’ve drafted
    a counter proposal [...]

    Could you give a brief summary of the main differences you see between
    your and Mo's proposal? You call it "counter-proposal", so I expected
    something radically different, but so far I fail to see it.

    Mostly, a hard anti-AI stance (with select exceptions).
    No discussing of its benefits, no calling it inevitable.
    Requiring full sources, full attribution etc. following
    our normal processes. No adopting OSAID terminology.

    I admit Mo’s proposal is… hard to follow, and I’m not
    entirely sure I understood all of it. Here’s what I can say:

    | This proposal focuses on one interpretation of the DFSG on a particular type of

    Mine states clear rules for all models.

    | in the future. If necessary, I can work with the Debian Policy Team to
    | incorporate the GR result into appropriate sections of the Debian Policy (e.g.,

    I don’t think mine requires a Policy change. It just
    affirms existing policy (even refers to it) and says
    no exceptions for “AI”.

    | Debian archive. This proposal does not specify whether the "non-free" section
    | of Debian archive can include those files.

    Mine does address under which circumstances redistributable
    but not free models can enter non-free, although it defers
    to the usual non-free rules for requirements on source
    availability.

    (It does result in all those TESCREAL models which didn’t
    legally acquire sources and don’t honour their licences
    being inadmissible for even non-free.)


    Basically, I tried to think of what a model and an AI thing
    (analytic or generative, which are different uses) would have
    to be/do for me to consider it acceptable (the d-private
    thread has more on that), added a main/non-free distinction
    for them, made clear non-free-firmware is not a place for them,
    and pondered how the contrib area would fit, all in line with
    (what I think is) how Debian currently handles things.

    Does this help?

    bye,
    //mirabilos
    --
    "Using Lynx is like wearing a really good pair of shades: cuts out
    the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL."
    -- Henry Nelson, March 1999

  • From Marco d'Itri@21:1/5 to [email protected] on Thu Apr 24 12:50:01 2025
    [email protected] wrote:

    Could you give a brief summary of the main differences you see between
    your and Mo's proposal? You call it "counter-proposal", so I expected
    something radically different, but so far I fail to see it.

    His goal is to make it impractical enough to be usually impossible to have
    LLMs and neural networks in Debian.
    I recommend thinking hard about the possible ramifications and
    consequences of this.

    --
    ciao,
    Marco

  • From [email protected]@21:1/5 to All on Thu Apr 24 14:20:01 2025
    On Apr 24, 2025 18:40, Marco d'Itri <[email protected]> wrote:

    > [email protected] wrote:
    >
    > >Could you give a brief summary of the main differences you see between
    > >your and Mo's proposal? You call it "counter-proposal", so I expected
    > >something radically different, but so far I fail to see it.
    > His goal is to make impractical enough to be usually impossible to have
    > LLMs and neural networks in Debian.
    > I recommend to think hard about the possible ramifications and
    > consequences of this.
    >
    > --
    > ciao,
    > Marco

    The only part I do not like in his proposal is about non-free. To me we
    should be more liberal there. Otherwise, it matches my definition of free
    software. The fact that it may be hard to find LLM matching this definition
    is, I agree, not a good thing. But should we give up on our free software
    definition because of that? IMO we should not.

    Thomas Goirand (zigo)

  • From Russ Allbery@21:1/5 to Thorsten Glaser on Thu Apr 24 18:00:01 2025
    Thorsten Glaser <[email protected]> writes:

    2 Models are not suitable for the non-free-firmware archive.

    Just a small note on this individual point: I'm not sure how this could
    work in practice. For non-free anything, by definition all we may have is
    a binary blob without any source code. We therefore don't know what that
    binary blob is. How would we know whether it contained an AI model so that
    we know whether to apply this rule?

    Obviously things that are not firmware should not go into the
    non-free-firmware archive section. But I don't see how we have enough
    information to rule out AI models embedded in firmware. I wouldn't expect
    those to be common, but it seems likely that at least one will show up at
    some point.

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

  • From Gunnar Wolf@21:1/5 to All on Thu Apr 24 20:40:02 2025
    [email protected] wrote [Thu, Apr 24, 2025 at 08:07:37PM +0800]:
    I am away from my key (travelling...), though as soon as I can, I'll
    second this option. I agree it is much clearer than Mo's proposal.

    Please note that Mo's proposal is much shorter than his original mail,
    which includes background information and links to appendixes. Refer to
    Message-ID: <[email protected]>

    (and I must say, I'm happy said mail confirmed I quoted right when
    seconding the proposal 😉)

  • From Stephan Verbücheln@21:1/5 to All on Wed Apr 30 07:30:01 2025
    Has there been an open discussion (e.g. on debian-devel or debian-
    legal) of all the pros and cons and corner cases before specific
    proposals have been put up for vote?

    Regards

    -----BEGIN PGP SIGNATURE-----

    iHUEABYKAB0WIQRB1rjSpCJd8a7h6mNgNUJZCjx8YgUCaBG0wgAKCRBgNUJZCjx8 YrbAAQC5Sby9xz/lvy52Cdg3VXb3y1J+48o2RWNcYSM0t7EGlgD+OKoWLYbpJXkL vLWQNn9VsFqN3CW5edB051sE+uDwZQ4=
    =QssA
    -----END PGP SIGNATURE-----

  • From Étienne Mollier@21:1/5 to All on Wed Apr 30 20:30:01 2025
    Hi Stephan,

    Has there been an open discussion (e.g. on debian-devel or debian-
    legal) of all the pros and cons and corner cases before specific
    proposals have been put up for vote?

    There has been a stem of discussion on debian-devel[1], but not
    that many contributions to the thread compared to the present one
    on debian-vote. The topic has also shown up several times over the
    past few years on debian-ai, although this mailing list may be a
    little more confidential than the ones you mention.

    [1]: https://lists.debian.org/debian-devel/2025/02/msg00015.html

    In hope this clarifies things,
    --
    .''`. Étienne Mollier <[email protected]>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from /dev/pts/3, please excuse my verbosity
    `- on air: Yes - The Revealing Science Of God

  • From Mo Zhou@21:1/5 to All on Thu May 1 04:00:01 2025
    On 2025-04-30 14:21, Étienne Mollier wrote:
    There has been a stem of discussion on debian-devel[1], but not
    that many contribution to the thread compared to the present one
    on debian-vote. The topic also shown up several times over the
    past few years on debian-ai, although this mailing list may be a
    little more confidential than the ones you mention.

    To clarify, [email protected] is a public archived mailing
    list which has had nothing to do with the word "confidential" from the
    very first minute it was created in September 2020.
    https://lists.debian.org/debian-ai/2020/09/threads.html

    [1]: https://lists.debian.org/debian-devel/2025/02/msg00015.html

    We have indeed had some large-scale discussions, going back at least
    seven years (July 2018):
    https://lwn.net/Articles/760142/

    Those previous efforts are mentioned in Appendix C if people did
    not notice its existence: https://salsa.debian.org/lumin/gr-ai-dfsg/-/blob/main/AppendixC.txt

    We have had relevant discussions in the past, scattered across
    four lists (debian-devel, debian-project, debian-ai, debian-science)
    over the past 7 years.

  • From Étienne Mollier@21:1/5 to All on Thu May 1 09:10:01 2025
    Hi Mo Zhou,

    On 2025-04-30 14:21, Étienne Mollier wrote:
    There has been a stem of discussion on debian-devel[1], but not
    that many contribution to the thread compared to the present one
    on debian-vote. The topic also shown up several times over the
    past few years on debian-ai, although this mailing list may be a
    little more confidential than the ones you mention.

    To clarify, [email protected] is a public archived mailing
    list which has nothing todo with the "confidential" word from the
    very first minute it was created since 2020 Sept. https://lists.debian.org/debian-ai/2020/09/threads.html

    Thank you for the clarification! I'm sorry, by "confidential",
    I meant that it might not be followed as closely as debian-devel
    or debian-vote. I did not want to imply that it is a private
    mailing list. Short answer is yes, the matter has been publicly
    discussed for several years already.

    Have a nice day, :)
    --
    .''`. Étienne Mollier <[email protected]>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from /dev/pts/1, please excuse my verbosity
    `-

  • From Simon Josefsson@21:1/5 to Thorsten Glaser on Wed May 7 14:40:01 2025
    Thorsten Glaser <[email protected]> writes:

    Counter-Proposal -- Interpretation of DFSG on (AI) Models

    I'll second this option. I think it describes an internally consistent
    policy and gives clear practical recommendations.

    2 Models are not suitable for the non-free-firmware archive.

    I think this will be difficult to enforce. We have no idea what's in
    the non-free-firmware blobs, nor is there any reliable way for us to
    ever gain that, and it seems likely that LLM models will end up inside
    non-free firmware blobs if they haven't already. How about removing
    that paragraph and replacing 'non-free' with 'non-free(-firmware)' in the
    rest of your proposal?

    /Simon

    -----BEGIN PGP SIGNATURE-----

    iQNoBAEWCAMQFiEEo8ychwudMQq61M8vUXIrCP5HRaIFAmgbUuUUHHNpbW9uQGpv c2Vmc3Nvbi5vcmfCHCYAmDMEXJLOtBYJKwYBBAHaRw8BAQdACIcrZIvhrxDBkK9f V+QlTmXxo2naObDuGtw58YaxlOu0JVNpbW9uIEpvc2Vmc3NvbiA8c2ltb25Aam9z ZWZzc29uLm9yZz6IlgQTFggAPgIbAwULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgBYh BLHSvRN1vst4TPT4xNc89jjFPAa+BQJn0XQkBQkNZGbwAAoJENc89jjFPAa+BtIA /iR73CfBurG9y8pASh3cbGOMHpDZfMAtosu6jbpO69GHAP4p7l57d+iVty2VQMsx +3TCSAvZkpr4P/FuTzZ8JZe8BrgzBFySz4EWCSsGAQQB2kcPAQEHQOxTCIOaeXAx I2hIX4HK9bQTpNVei708oNr1Klm8qCGKiPUEGBYIACYCGwIWIQSx0r0Tdb7LeEz0 +MTXPPY4xTwGvgUCZ9F0SgUJDWRmSQCBdiAEGRYIAB0WIQSjzJyHC50xCrrUzy9R cisI/kdFogUCXJLPgQAKCRBRcisI/kdFoqdMAQCgH45aseZgIrwKOvUOA9QfsmeE 8GZHYNuFHmM9FEQS6AD6A4x5aYvoY6lo98pgtw2HPDhmcCXFItjXCrV4A0GmJA4J ENc89jjFPAa+wUUBAO64fbZek6FPlRK0DrlWsrjCXuLi6PUxyzCAY6lG2nhUAQC6 qobB9mkZlZ0qihy1x4JRtflqFcqqT9n7iUZkCDIiDbg4BFySz2oSCisGAQQBl1UB BQEBB0AxlRumDW6nZY7A+VCfek9VpEx6PJmdJyYPt3lNHMd6HAMBCAeIfgQYFggA JgIbDBYhBLHSvRN1vst4TPT4xNc89jjFPAa+BQJn0XTSBQkNZGboAAoJENc89jjF PAa+0M0BAPPRq73kLnHYNDMniVBOzUdi2XeF32idjEWWfjvyIJUOAP4wZ+ALxIeh is3Uw2BzGZE6ttXQ2Q+DeCJO3TPpIqaXDAAKCRBRcisI/kdFosV7AP4qJ+7XaGfd CYRxf5th1tQ2Dm+ni68J++g/RzUspwUUggEAtQne7Pl7uonWi+QgmZc1xawY5Uvl eknXg3/5kJq0owU=
    =KeG2
    -----END PGP SIGNATURE-----

  • From Thorsten Glaser@21:1/5 to Simon Josefsson on Thu May 8 01:00:02 2025
    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA384

    On Wed, 7 May 2025, Simon Josefsson wrote:

    I'll second this option. I think it describes an internally consistent
    policy and give clear practical recommendations.

    Thank you.

    I’ve been made aware-ish of a few questions regarding this via LWN.

    The most important one is that this should of course only apply after
    the release; we don’t want to change things while preparing
    a release.

    We could even give already-existing packages that predate the current
    craze a long grace period (like backgammon).

    And geofft wrote:

    | (if I'm reading right) Thorsten's proposal allows a DFSG-free AI with
    | redistributable but DFSG-incompatible training data to go into contrib
    | with its training data packaged up in non-free

    That was not my intention, and please help me clean up the wording if
    others read it as that as well. In my eyes, there is no such thing
    as “a DFSG-free AI with DFSG-incompatible training data”, as the model
    includes (lossily compressed) the training data. Not sure how geofft
    came up with this interpretation (especially as I specifically wrote
    that I think models in contrib should only be allowed non-free runtime
    dependencies, such as special GPU drivers, but not compile dependencies;
    though contrib is probably the least interesting area for this?).

    2 Models are not suitable for the non-free-firmware archive.

    I think this will be difficult to enforce. We have no idea what's in
    the non-free-firmware blobs, nor is there any reliable way for us to
    ever gain that, and it seems likely that LLM models will end up inside
    non-free firmware blobs if they haven't already.

    … oh! That’s a way to look at this I haven’t considered yet.

    How about removing that paragraph and replace 'non-free' with
    'non-free(-firmware)' in the rest of your proposal?

    Not sure this is the right fix. What I wanted to say is: models are,
    while data, run on the main system (CPU) or as main workload (this part
    is important) on adjacent chips (GPU, specific ASIC/FPGA); models are
    not a firmware in the sense of enabling the use of such hardware and
    therefore not for non-free-firmware. So, in that direction: models we’d
    have packaged as models, used by ordinary packages on the system, driven
    by the user (or an automatism on their behalf).

    The other direction, if there’s non-free-firmware that *also* includes
    models: yes, we cannot control that, and I’d be fine with that as long
    as these models are ⓐ a part of the firmware in question (and not
    packaged separately) as well as ⓑ not available/used by packages in main
    (or contrib) for purposes beyond how the firmware itself, running on its
    separate chip, uses them.

    Is this clear? Is this a good idea? Can someone suggest wording?

    I’m unfamiliar with the formal procedures needed in the GR process,
    do we need to do anything other than to post and have seconded the
    updated wording, and in which timeframe?

    Thanks,
    //mirabilos
    - --
    15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)
    -----BEGIN PGP SIGNATURE-----

    iQIcBAEBCQAGBQJoG+NfAAoJEHa1NLLpkAfgLMAQAIpc/1mPHov4VNpCHJfja/yo 8Nk4mjCjiphihLUp6cgx/+RiNrGfQpbUEzZZbGkt2A/aH7hv6mWNO7J+vT6G4nZh Kp+BPBGln2tWXLJ/CAifGBpvDxmOQiz9DRjRUvh653FL/20Oc9nQuS/WpGLJuEBR iwjlnYH/1AWBAio4hhT99RzaEU7LbF6+xx11L0I6mMnSYcy+G9m4cHA4Jjwbc0yO q+ZZh3/CUPj7uPKybibwitUS1sElBNBuKF0S5UXECwjV3k08FMam93ZRiUJI9yfj bHtUex2QUpnfNPu1z3iwvxeTQPNzO72cIIstBKWfaVtAuCBckibyQ43mDbV0wUKX hD7S1BohQ+1TuQrsHarwxq0PsnctjNYCiAY8U9dJZegsTnilkzAJ50CwIDUfhSny GII4sMt+DFViw+7j8nFVJLSX06in5DukngShyY+ExxcfAmny0oOqiTPaC+1UV41f 03BvX7c5yH40Zg7CERA52FDRL8w4oxU46orlciYx26nj4/tKvCzTr2q8cULamkFM DPaPanMwxHzJbNKeR77w96ntmSBQdXm/4wRQsT+qqeWK75vfwZwX849zLuGSr+jg KVb/bbmJrCZw1h68iX/OHd3sDZs/sZIFdilB3CiTT1mHAfmKbuwtAAbTy2N1oNdk Xdtzmwz90gH3T2WfO8/m
    =97mT
    -----END PGP SIGNATURE-----

  • From Sam Hartman@21:1/5 to All on Thu May 8 03:50:01 2025
    "Simon" == Simon Josefsson <[email protected]> writes:

    firmware blob
    Simon> for a future SoC CPU that includes camera functionality, it
    Simon> seems possible that would make use of some LLM model to have
    Simon> better face recognition for example.

    Do you perhaps mean model or machine learning model rather than LLM?
    I think that LLMs are large enough (even the "small ones") that we'd be
    aware if they were in non-free-firmware today, and at least the tasks
    you are talking about sound like they would be better approached by
    machine learning rather than something specifically directed at natural
    language.

    --Sam

  • From Simon Josefsson@21:1/5 to Thorsten Glaser on Thu May 8 03:30:01 2025
    Thorsten Glaser <[email protected]> writes:

    How about removing that paragraph and replace 'non-free' with
    'non-free(-firmware)' in the rest of your proposal?

    Not sure this is the right fix. What I wanted to say is: models are,
    while data, run on the main system (CPU) or as main workload (this part
    is important) on adjacent chips (GPU, specific ASIC/FPGA); models are
    not a firmware in the sense of enabling the use of such hardware and
    therefore not for non-free-firmware. So, in that direction: models we’d
    have packaged as models, used by ordinary packages on the system, driven
    by the user (or an automatism on their behalf).

    I don't understand your point here, can you explain it differently?

    I don't think it is possible to separate firmware into things that are
    just for enabling of hardware compared to what is running on the main
    CPU. Consider a non-free firmware blob for a future SoC CPU that
    includes camera functionality, it seems possible that would make use of
    some LLM model to have better face recognition for example. Without
    disassembly (which may be illegal) we can't really know if this is part
    of the blob or not.

    The other direction, if there’s non-free-firmware that *also* includes
    models: yes, we cannot control that, and I’d be fine with that as long
    as these models are ⓐ a part of the firmware in question (and not
    packaged separately) as well as ⓑ not available/used by packages in main
    (or contrib) for purposes beyond how the firmware itself, running on its
    separate chip, uses them.

    I'm not sure I understand fully here, but I wonder if what you are
    trying to express really works. Non-free firmware is loaded into the
    main CPU on many Intel and AMD systems, and I think it won't be long
    until CPUs won't even start unless the non-free blob is provided to
    them (so the blob is eventually relied on by packages in main), and
    that blob can include a small LLM model. Another use-case for small
    LLM models could be fingerprint readers.

    /Simon


    -----BEGIN PGP SIGNATURE-----

    iQNoBAEWCAMQFiEEo8ychwudMQq61M8vUXIrCP5HRaIFAmgcB1IUHHNpbW9uQGpv c2Vmc3Nvbi5vcmfCHCYAmDMEXJLOtBYJKwYBBAHaRw8BAQdACIcrZIvhrxDBkK9f V+QlTmXxo2naObDuGtw58YaxlOu0JVNpbW9uIEpvc2Vmc3NvbiA8c2ltb25Aam9z ZWZzc29uLm9yZz6IlgQTFggAPgIbAwULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgBYh BLHSvRN1vst4TPT4xNc89jjFPAa+BQJn0XQkBQkNZGbwAAoJENc89jjFPAa+BtIA /iR73CfBurG9y8pASh3cbGOMHpDZfMAtosu6jbpO69GHAP4p7l57d+iVty2VQMsx +3TCSAvZkpr4P/FuTzZ8JZe8BrgzBFySz4EWCSsGAQQB2kcPAQEHQOxTCIOaeXAx I2hIX4HK9bQTpNVei708oNr1Klm8qCGKiPUEGBYIACYCGwIWIQSx0r0Tdb7LeEz0 +MTXPPY4xTwGvgUCZ9F0SgUJDWRmSQCBdiAEGRYIAB0WIQSjzJyHC50xCrrUzy9R cisI/kdFogUCXJLPgQAKCRBRcisI/kdFoqdMAQCgH45aseZgIrwKOvUOA9QfsmeE 8GZHYNuFHmM9FEQS6AD6A4x5aYvoY6lo98pgtw2HPDhmcCXFItjXCrV4A0GmJA4J ENc89jjFPAa+wUUBAO64fbZek6FPlRK0DrlWsrjCXuLi6PUxyzCAY6lG2nhUAQC6 qobB9mkZlZ0qihy1x4JRtflqFcqqT9n7iUZkCDIiDbg4BFySz2oSCisGAQQBl1UB BQEBB0AxlRumDW6nZY7A+VCfek9VpEx6PJmdJyYPt3lNHMd6HAMBCAeIfgQYFggA JgIbDBYhBLHSvRN1vst4TPT4xNc89jjFPAa+BQJn0XTSBQkNZGboAAoJENc89jjF PAa+0M0BAPPRq73kLnHYNDMniVBOzUdi2XeF32idjEWWfjvyIJUOAP4wZ+ALxIeh is3Uw2BzGZE6ttXQ2Q+DeCJO3TPpIqaXDAAKCRBRcisI/kdFopR9AP463f905Cdg qabLk60sYwS/DU21+7Dy8Wh6SCPQsP6ztQEA6sEGHRqISES0ep8Pr9YHMxqZBjRP ZoHhkAw8O0ZvTAs=eQTq
    -----END PGP SIGNATURE-----

  • From Thorsten Glaser@21:1/5 to Simon Josefsson on Thu May 8 04:50:01 2025
    On Thu, 8 May 2025, Simon Josefsson wrote:

    I don't think it is possible to separate firmware into things that are
    just for enabling of hardware compared to what is running on the main
    CPU. Consider a non-free firmware blob for a future SoC CPU that
    includes camera functionality, it seems possible that would make use of
    some LLM model to have better face recognition for example. Without
    disassembly (which may be illegal) we can't really know if this is part
    of the blob or not.

    I wanted to express that: if the LLM is part of the firmware uploaded
    to the SoC for camera functionality, then it is packaged as firmware
    as a whole and only accessed through normal camera functions when the
    user accesses the camera. It’s not packaged separately (just the model)
    or available to other software on the system (removed from the camera
    functionality).

    But I agree that making this distinction probably isn’t worth the extra
    headache (my headache’s growing again, too…), so if nobody has got an
    idea of how to express this well, I’d say let’s just remove all
    mentions of non-free-firmware from the proposal; it can go with its
    usual rules.

    (In fact, my proposal comes from our normal rules, with a specific
    understanding of these things in mind, anyway.)

    bye,
    //mirabilos
    --
    [16:04:33] bkix: "veni vidi violini"
    [16:04:45] bkix: "I came, I saw, and I botched it"...

  • From Simon Josefsson@21:1/5 to Thorsten Glaser on Thu May 8 10:40:01 2025
    Thorsten Glaser <[email protected]> writes:

    On Thu, 8 May 2025, Simon Josefsson wrote:

    I don't think it is possible to separate firmware into things that are
    just for enabling of hardware compared to what is running on the main
    CPU. Consider a non-free firmware blob for a future SoC CPU that
    includes camera functionality, it seems possible that would make use of
    some LLM model to have better face recognition for example. Without
    disassembly (which may be illegal) we can't really know if this is part
    of the blob or not.

    I wanted to express that: if the LLM is part of the firmware uploaded
    to the SoC for camera functionality, then it is packaged as firmware
    as a whole and only accessed through normal camera functions when the
    user accesses the camera. It’s not packaged separately (just the model)
    or available to other software on the system (removed from the camera
    functionality).

    Okay, I see what you are trying to get at, but I'm not certain things
    can be separated that easily. Can the camera software in the above
    scenario be in main, or is it tainted indirectly by the non-free
    firmware? Does it make a difference if the camera software would only
    support one proprietary camera that happens to require the
    non-free-firmware blob? I think this situation is comparable to what we
    already have today: non-free-firmware blobs enable certain CPU behaviour
    that packages in 'main' depend on for functionality.

    But I agree that making this distinction probably isn’t worth the extra
    headache (my headache’s growing again, too…), so if nobody has got an
    idea of how to express this well, I’d say let’s just remove all
    mentions of non-free-firmware from the proposal; it can go with its
    usual rules.

    Yes, I can't seem to find any policy matters that affect only the
    intersection of non-free-firmware and AI models. It may indeed be
    simpler to treat non-free-firmware as part of the !main context.

    /Simon


  • From Simon Josefsson@21:1/5 to Sam Hartman on Thu May 8 10:30:01 2025
    Sam Hartman <[email protected]> writes:

    "Simon" == Simon Josefsson <[email protected]> writes:

    firmware blob
    Simon> for a future SoC CPU that includes camera functionality, it
    Simon> seems possible that would make use of some LLM model to have
    Simon> better face recognition for example.

    Do you perhaps mean model or machine learning model rather than LLM?
    I think that LLMs are large enough (even the "small ones") that we'd be
    aware if they were in non-free-firmware today, and at least the tasks
    you are talking about sound like they would be better approached by
    machine learning rather than something specifically directed at natural
    language.

    Yes, sorry, my terminology is sloppy and I tend to conceptually merge
    all these. From my point of view, I don't see them as necessarily any
    different from an include-in-Debian-or-not perspective. Is there a
    significant difference between any of these terms for this discussion?

    Size of the model is the only one I can guess, but I also believe that
    there are LLMs smaller than some of the bigger machine learning models,
    so I don't think it is that relevant. Having a small LLM in a
    non-free firmware blob to bootstrap a text-to-speech or speech-to-text
    input method in a laptop doesn't seem far-fetched to me.

    /Simon


  • From Thorsten Glaser@21:1/5 to All on Thu May 8 23:20:01 2025
    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA384

    Hi Simon,

    Okay, I see what you are trying to get at, but I'm not certain things
    can be separated that easily.

    Hm, true. We probably also cannot know whether firmware blobs have
    this inside or not. Perhaps it’s best to just leave non-free-firmware
    out of this, for now, at least in explicit mentions.

    So, with all the updates, maybe something like this?

    Counter-Proposal -- Interpretation of DFSG on (AI) Models (v2)
    =========================================================

    Please see the original proposal for background on this.

    The counter-proposal is as follows:

    The Debian project requires the same level of freedom for AI models
    as it does for other works entering the archive.

    Notably:

    1. A model must be trained only from legally obtained and used works,
    honour all licences of the works used in training, and be licenced
    under a suitable licence itself that allows distribution, or it is
    not even acceptable for non-free. This includes an understanding
    that “generative AI” outputs are derivative works of their inputs
    (including training data and the prompt), insofar as these pass the
    threshold of originality; that is, generative AI acts similarly to
    lossy compression followed by decompression, or to a compiler.

    Any work resulting from generative use of a model can at most be
    as free as the model itself; e.g. programming with a model from
    contrib/non-free assisting prevents the result from entering main.

    The "/usr/share/doc/PACKAGE/copyright" file must include copyright
    notices from all training inputs as required by Policy for “any
    files which are compiled into the object code shipped in the binary
    package”, except for inputs already separately packaged (such as
    the training software, libraries, or inputs already available from
    packages such as word lists also used for spellchecking).

    Regarding availability of sources used for training, the normal
    rules of the non-free archive apply.

    2. For a model to enter the contrib archive, it may at runtime require
    components from outside of Debian main (such as drivers for specific
    hardware it is designed to run on), but the model itself (including
    any training input that ends up in the model) must still comply with
    the DFSG, i.e. follow the requirements below for models entering main.
    If a model requires a component outside of main at build or training
    time that changes the model itself (e.g. training data, or training
    software part of which ends up in the trained model), it is only
    admissible to non-free.

    3. For a model to enter the main archive, all works used in training
    must additionally be available, auditable, and under DFSG-compliant
    licencing. All software used to do the training must be available
    in Debian main.

    If the training happens during package build, the sources must be
    present in Debian packages or in the model’s source packages; if
    not, they must still be available in the same way.

    This is the same rule as is used for other precompiled works in
    Debian packages that are not regenerated during build: they must
    be able to be regenerated using only Debian tools; waiving the
    requirement to actually do the regenerating during package build
    is a nod to realistic build time and resource usage.

    4. For a model to enter the main archive, the model training itself
    must *either* happen during package build (which, for models of
    a certain size, may need special infrastructure; the handling of
    this is outside of the scope of this resolution), *or* the model
    resulting from training must build in a sufficiently reproducible
    way that a separate rebuilding effort from the same source will
    result in the same trained model. (This includes using reproducible
    seeds for PRNGs used, etc.)

    For realistic achievability of this goal, the reproducibility
    requirement is relaxed to not require bitwise equality, as long
    as the resulting model is effectively identical. (As a comparison,
    for C programs this would be equivalent to allowing different
    linking order of the object files in the binary or embedded
    timestamps to differ, or a different encoding of the same opcodes
    (like 31 C0 vs. 33 C0 for i386 “xor eax,eax”), but no functional
    changes as determined by experts in the field.)

    5. For handling of any large packages resulting from this, the normal
    processes are followed (such as discussing in advance with the
    relevant teams, ensuring mirrors are not over-burdened, etc).

    The Debian project asks that training sources are not obtained
    unethically, and that the ecological impact of training and using
    AI models be considered.

    Transitional provisions:

    ⅰ. Any bugs resulting from this GR shall not be release-critical
    before Debian trixie has been released as stable.

    ⅱ. Any existing package with a “model” inside that already had the
    very same model before 2020-01-01 has an extra four years time
    before bugs regarding these models may become release-critical.

    [End of proposal.]
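
    Read as a decision procedure, items 1 to 4 above might be sketched
    roughly as follows. This is only one simplified reading and not part
    of the proposal text: the `Model` fields and the `archive_area`
    function are hypothetical names chosen for illustration, and the
    transitional provisions and the ordinary non-free archive rules are
    not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    # All field names are hypothetical; each maps to one item of the proposal.
    legally_trained: bool         # item 1: inputs legally obtained/used, licences honoured
    distributable: bool           # item 1: the model's own licence allows distribution
    inputs_dfsg_free: bool        # item 3: training works available, auditable, DFSG-licensed
    training_tools_in_main: bool  # item 3: all training software is in Debian main
    built_or_reproducible: bool   # item 4: trained at build time, or reproducibly rebuildable
    runtime_needs_non_main: bool  # item 2: needs e.g. non-main drivers at runtime only


def archive_area(m: Model) -> Optional[str]:
    """Return 'main', 'contrib', 'non-free', or None if not acceptable at all."""
    if not (m.legally_trained and m.distributable):
        return None  # item 1: not even acceptable for non-free
    # Assumption: contrib needs the same model-level freedoms as main (item 2).
    meets_main = (m.inputs_dfsg_free
                  and m.training_tools_in_main
                  and m.built_or_reproducible)
    if not meets_main:
        return "non-free"
    return "contrib" if m.runtime_needs_non_main else "main"


free_model = Model(True, True, True, True, True, False)
print(archive_area(free_model))
```

    The only difference between main and contrib in this reading is the
    runtime dependency; anything that taints the model itself at build or
    training time pushes it to non-free.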

    Thanks,
    //mirabilos
    - --
    21:41⎜«Tonnerre:#nosec» Do at least one thing every day which makes
    ⎜ inspirational quotes lovers sad
    -----BEGIN PGP SIGNATURE-----

    iQIcBAEBCQAGBQJoHR8cAAoJEHa1NLLpkAfg12UP/0+8R57hvjUPkQVJ6hRxAkH3 EgMAYKoxsQQ11d8cz/MaCFxeuM8sitGb68oP3JTzYKkcoqtXi0pQMMNQ/xX1YeMX YTBKkq2jWIYD3z1hLTqyQcW/2G8a9yygEXsqt7Jm53b3vdkCo6UoyiaizvJ71pok 9cB7U27UhaLq2Ay32EB5bfFRQ8qOHapnRMWpHf0gDYSB2rG4PjjOkH8xijkvnbOB mdUAMtQvAGa0goYmdFiamYNxS/7J6NQJCz0myg9eYuh8egajNIrO8DoSOeoDiC4y Oc0i+vQ3du7ynExQ6EHOwVKl8cTHJcxtfwbC100ktZGUuMXTvjmDQg4exWxbfYpD 3IQS3ueAgrbc6K1+UA7xlJD3zNIAg/hsjpVeW2cbz4VJ8cioPOb+F6OrXk+GP8Fk KLM8Nhc2IvCAOmOwl6ZbkrrrSZrpk6STHlpcyz3NsdowrfwUxm6ZXqI7ELUiwIaJ cm+5Z0wrG+tJZr+Ia3BAKjPlxIbp3wNDts555NdeY+BxVKCXPyYZmnBGSpAJug6Q Thu3upZjGL+hCx8U4UyQC8ypM8pEUKSwNDxI8SE5sopf7fgrbEa3N3h7OZwL4Qhz OmT51eJ+rLifSBztFcHVNe0ztQsoagVZzb/bE7WV5nJwvR4GFSbO17LzCi5ouBYx 1rRGgalT5MTCFDJnewPJ
    =xeEf
    -----END PGP SIGNATURE-----

  • From Simon Josefsson@21:1/5 to Thorsten Glaser on Fri May 9 10:00:01 2025
    Thorsten Glaser <[email protected]> writes:

    So, with all the updates, maybe something like this?

    I read this now, and think it is an improvement so I'll second this
    version too.

    I realized that I have one additional generic concern: You claim that
    models are a derivative work of their training input.

    I don't think this is universally agreed on, or tested in court, and
    there are people who heavily push another agenda. It is somewhat of a
    provocative statement.

    However, I don't think you actually need to make an argument that this is
    true for your proposal. You don't need to take a stance on this
    provocative question. People who disagree with this aspect could still
    find themselves in agreement with your proposal if it was tweaked a bit.

    It is sufficient to claim that

    A) models MAY be considered derivative works of their training inputs.
    We can realize that Debian is not the best organization to decide
    if this is true or not, and it will likely take many years until
    there is any general consensus in society about this aspect.
    However, what we can claim is that it seems realistic that this MAY
    be the general opinion.

    and

    B) a conservative approach is thus to respect the licensing of all
    training inputs, until society has a clear take on A). This
    allows Debian to continue to work and take what appears to be less
    legal risk, and to be more aligned with its history of supporting
    libre content.

    Below is a small diff to achieve this:

    OLD:
    1. A model must be trained only from legally obtained and used works,
    honour all licences of the works used in training, and be licenced
    under a suitable licence itself that allows distribution, or it is
    not even acceptable for non-free. This includes an understanding
    that “generative AI” outputs are derivative works of their inputs
    (including training data and the prompt), insofar as these pass the
    threshold of originality; that is, generative AI acts similarly to
    lossy compression followed by decompression, or to a compiler.

    NEW:
    1. A model must be trained only from legally obtained and used works,
    honour all licences of the works used in training, and be licenced
    under a suitable licence itself that allows distribution, or it is
    not even acceptable for non-free.

    This assumes an understanding that “generative AI” outputs may be
    considered derivative works of their inputs (including training
    data and the prompt), insofar as these pass the threshold of
    originality. That is, generative AI acts similarly to lossy
    compression followed by decompression, or to a compiler.

    OLD:
    Any work resulting from generative use of a model can at most be
    as free as the model itself; e.g. programming with a model from
    contrib/non-free assisting prevents the result from entering main.

    NEW:
    Assuming a model's output is a derivative work of its training input,
    and works derived from that model are also derivative works, any work
    resulting from the model can at most be as free as the model itself;
    e.g. programming with a model from contrib/non-free assisting
    prevents the result from entering main.

    ADD:
    We resolve that Debian wants to make conservative licensing choices
    and not put ourselves into unnecessary legal risk; therefore we
    propose to behave and act as if that were the case, so that works
    derived from training inputs have to consider the license on their
    inputs. This aligns with our preference for free software and
    DFSG-compatible licensing.

    I'm short on time, so this may not be the best choice of words; feel
    free to rewrite it if you agree with my principle.

    A small comment:

    ⅱ. Any existing package with a “model” inside that already had the
    very same model before 2020-01-01 has an extra four years time
    before bugs regarding these models may become release-critical.

    Why 2020-01-01? Couldn't we be generous here and say that if something
    was in the initial bookworm release then it is eligible for this
    exception?

    /Simon


  • From Thorsten Glaser@21:1/5 to Simon Josefsson on Sat May 10 01:20:01 2025
    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA384

    On Fri, 9 May 2025, Simon Josefsson wrote:

    So, with all the updates, maybe something like this?

    I read this now, and think it is an improvement so I'll second this
    version too.

    OK, thanks.

    I realized that I have one additional generic concern: You claim that
    models are a derivative work of their training input.

    Yes. This is easily shown, for example by looking at how they work
    (https://explainextended.com/2023/12/31/happy-new-year-15/ explains
    this well) and from papers like “Extracting Training Data from ChatGPT”.
    It is a sort of lossy compression that has been shown to be sufficiently
    lossless (urgs, forgive my lack of English) that recognisable
    “training data” can be recalled, and the operators’ “fix” was to add
    filters to the prompts, not to make recall impossible, because they
    cannot.

    A small comment:

    ⅱ. Any existing package with a “model” inside that already had the
    very same model before 2020-01-01 has an extra four years time
    before bugs regarding these models may become release-critical.

    Why 2020-01-01? Couldn't we be generous here and say that if something
    was in the initial bookworm release then it is eligible for this
    exception?

    At least DALL-E predates bookworm, from a quick search. I’m not
    entirely sure when this craze began. If you’d rather have it
    expressed in terms of releases I’d choose bullseye here instead,
    to be sure.

    But, okay, let’s be generous here to cut discussion a bit.


    Counter-Proposal -- Interpretation of DFSG on (AI) Models (v3)
    =========================================================

    Please see the original proposal for background on this.

    The counter-proposal is as follows:

    The Debian project requires the same level of freedom for AI models
    as it does for other works entering the archive.

    Notably:

    1. A model must be trained only from legally obtained and used works,
    honour all licences of the works used in training, and be licenced
    under a suitable licence itself that allows distribution, or it is
    not even acceptable for non-free. This includes an understanding
    that “generative AI” outputs are derivative works of their inputs
    (including training data and the prompt), insofar as these pass the
    threshold of originality; that is, generative AI acts similarly to
    lossy compression followed by decompression, or to a compiler.

    Any work resulting from generative use of a model can at most be
    as free as the model itself; e.g. programming with a model from
    contrib/non-free assisting prevents the result from entering main.

    The "/usr/share/doc/PACKAGE/copyright" file must include copyright
    notices from all training inputs as required by Policy for “any
    files which are compiled into the object code shipped in the binary
    package”, except for inputs already separately packaged (such as
    the training software, libraries, or inputs already available from
    packages such as word lists also used for spellchecking).

    Regarding availability of sources used for training, the normal
    rules of the non-free archive apply.

    2. For a model to enter the contrib archive, it may at runtime require
    components from outside of Debian main (such as drivers for specific
    hardware it is designed to run on), but the model itself (including
    any training input that ends up in the model) must still comply with
    the DFSG, i.e. follow the requirements below for models entering main.
    If a model requires a component outside of main at build or training
    time that changes the model itself (e.g. training data, or training
    software part of which ends up in the trained model), it is only
    admissible to non-free.

    3. For a model to enter the main archive, all works used in training
    must additionally be available, auditable, and under DFSG-compliant
    licencing. All software used to do the training must be available
    in Debian main.

    If the training happens during package build, the sources must be
    present in Debian packages or in the model’s source packages; if
    not, they must still be available in the same way.

    This is the same rule as is used for other precompiled works in
    Debian packages that are not regenerated during build: they must
    be able to be regenerated using only Debian tools; waiving the
    requirement to actually do the regenerating during package build
    is a nod to realistic build time and resource usage.

    4. For a model to enter the main archive, the model training itself
    must *either* happen during package build (which, for models of
    a certain size, may need special infrastructure; the handling of
    this is outside of the scope of this resolution), *or* the model
    resulting from training must build in a sufficiently reproducible
    way that a separate rebuilding effort from the same source will
    result in the same trained model. (This includes using reproducible
    seeds for PRNGs used, etc.)

    For realistic achievability of this goal, the reproducibility
    requirement is relaxed to not require bitwise equality, as long
    as the resulting model is effectively identical. (As a comparison,
    for C programs this would be equivalent to allowing different
    linking order of the object files in the binary or embedded
    timestamps to differ, or a different encoding of the same opcodes
    (like 31 C0 vs. 33 C0 for i386 “xor eax,eax”), but no functional
    changes as determined by experts in the field.)

    5. For handling of any large packages resulting from this, the normal
    processes are followed (such as discussing in advance with the
    relevant teams, ensuring mirrors are not over-burdened, etc).

    The Debian project asks that training sources are not obtained
    unethically, and that the ecological impact of training and using
    AI models be considered.

    Transitional provisions:

    ⅰ. Any bugs resulting from this GR shall not be release-critical
    before Debian trixie has been released as stable.

    ⅱ. Any existing package with a “model” inside that already had the very
    same model in the initial bookworm release has an extra four years
    time before bugs regarding these models may become release-critical.

    [End of proposal.]
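
    The reproducibility requirement of item 4 (fixed PRNG seeds, and
    "effectively identical" rather than bitwise-equal results) could be
    checked along the lines of the toy sketch below. Everything in it is
    an assumption for illustration: `train` is a stand-in for a real
    training run, `effectively_identical` and its tolerance of 1e-6 are
    hypothetical, and what counts as "no functional changes" is left by
    the proposal to experts in the field.

```python
import math
import random


def train(seed, steps=200):
    """Toy stand-in for training: fit w in y = w * x by stochastic gradient descent."""
    rng = random.Random(seed)  # fixed seed, so the sample sequence repeats exactly
    w = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x                  # synthetic "training data"
        w -= 0.1 * (w * x - y) * x   # one gradient step on the squared error
    return [w]


def effectively_identical(a, b, rel_tol=1e-6):
    """Compare two weight lists, allowing tiny numerical differences (assumed tolerance)."""
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=rel_tol) for p, q in zip(a, b)
    )


first = train(seed=42)
rebuild = train(seed=42)                  # independent rebuild from the same source and seed
noisy = [w * (1 + 1e-9) for w in first]   # simulated rounding noise, e.g. other hardware
changed = [w + 0.5 for w in first]        # a functional change to the model

print(first == rebuild)                       # same seed: bitwise-equal here
print(effectively_identical(first, noisy))    # not bitwise-equal, but within tolerance
print(effectively_identical(first, changed))  # outside tolerance
```

    The noisy case plays the role of the "31 C0 vs. 33 C0" comparison: the
    bytes differ, the behaviour does not.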


    Thanks for the discussion,
    //mirabilos
    - --
    <cnuke> filing down the AGP connector so that it fits the slot on the 440BX board…
    or power supplies that also had the monitor connected to them and then caused
    an electrically charged case […] good for a laugh at every
    LAN party │ <nvb> back when pizza dough still "rose" on top of the monitor
    -----BEGIN PGP SIGNATURE-----

    iQIcBAEBCQAGBQJoHozPAAoJEHa1NLLpkAfgr+kQAMHhGac5ieY+8h0yYGUW5dpR 0B6e5d0JsQyaE9wmqBVej+dnGkts7Jtz5T42e2t0AEiXpgNYfLvWUFX6nAjwpDJW reuvZRzynd2IYVxnadP0J/gX35R8ldqD8VXZFIs0McNsl5pmqxJRioYkB3lRXDjh McDZwc4LqR3ey6cW6ay7a7NG+ak8N5QAGmSF3y4fYDLVDKZxW73gJqrq81HOBJpp I76CL+JEpipEQ/AZHB/gdD/ldnc2EdtiHIOn7IpuFLKcgN6LJW9mpIDJ/IcWX2jE ZZ4lLcGdzhdZb4MovEisSmomkO6VMb+Qs22R34KkMlaOTA9ne5cmUPG3eJDHR7oP bTTb06AV7C0Mqtn7X0Am/x8R2suFqOu437RXmI+VA4NqMVQmWhOMvY4cQoTMr3h1 VWY/JtxwzcIhZE0WyC9Y6htX4AyGX23aNgCVVuo93w/Kq80e/57fjtuVAlUIesCN EwyQjou4RYUS/R6mVVeU8FopRR/BlYnu8kb6nBzm6o8oQBOQVZ0eLNIe7Yq2IEv7 Mm1fFpRH8oQfYvbVjx6DCllIshMegCorYd6dBClYIeN+ItbxTWSWln7STHuqfSHi mhVL05whqvBHdNiAiHtz5Mlc72gp65R6Fi0aH9jxErQ8iNIO9e33mCABnxPvONC7 kDHW9ujNAmPqLlJJIrv1
    =RYP4
    -----END PGP SIGNATURE-----
