• Re: [RFCv3] Counter-Proposal -- Interpretation of DFSG on Artificial In

    From Thorsten Glaser@21:1/5 to Aigars Mahinovs on Sun May 11 01:00:01 2025
    On Sat, 10 May 2025, Aigars Mahinovs wrote:

    An algorithm that only stores and produces an *average* value across a
    wide set of inputs can not be any kind of compression.

    It’s not “just” an average: as has been shown, substantial amounts of substantially unmodified “training data” can be extracted.

    It is data mining.

    The copyright exception for text and data mining is only valid for
    uses that extract trends and things like that, not for generative
    use (and not for content with explicit opt-out, which those scrapers
    ignored).

    then go up. If I run "wc" on a copyrighted work, the number of words
    in the document is *not* a derived work from the original document.

    If you JPEG-compress a photo of the original document then uncompress
    it, it *is*.

    And, again, this has been shown to be possible to a substantial degree
    for these models, therefore we need to act as if the output
    of such generation is derived from its inputs in the general case.
    There will always be outputs which aren’t, and inputs which don’t
    influence a subset of particular outputs, but the sum of its outputs
    is mechanically derived from (most of) the sum of its inputs.

    bye,
    //mirabilos
    --
    /⁀\ The UTF-8 Ribbon
    ╲ ╱ Campaign against
     ╳  HTML eMail! Also,
    ╱ ╲ header encryption!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thorsten Glaser@21:1/5 to Aigars Mahinovs on Sun May 11 22:20:01 2025
    On Sun, 11 May 2025, Aigars Mahinovs wrote:

    If you JPEG-compress a photo of the original document then uncompress
    it, it *is*.

    Please, restore a document from the output of "wc". Or from the output

    Can you even read?

    EOT,
    //mirabilos
    --
    22:20⎜<asarch> The crazy that persists in his craziness becomes a master
    22:21⎜<asarch> And the distance between the craziness and geniality is
    only measured by the success
    18:35⎜<asarch> "Psychotics are consistently inconsistent. The essence of
    sanity is to be inconsistently inconsistent

  • From Thorsten Glaser@21:1/5 to Aigars Mahinovs on Mon May 12 00:10:01 2025
    On Sun, 11 May 2025, Aigars Mahinovs wrote:

    Just because *one* software process produces a lossy compression,
    does not mean that ALL software processes are just lossy compression.

    Just because *one* doesn’t, doesn’t mean none does either *sigh…*

    Please, just, go away. I don’t have the spoons to “discuss” with
    a Google and AI fanboy.

    Thanks,
    //mirabilos
    --
    15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

  • From Matthias Urlichs@21:1/5 to All on Mon May 12 08:40:01 2025
    On 12.05.25 00:08, Thorsten Glaser wrote:
    > Please, just, go away. I don’t have the spoons to “discuss” with
    > a Google and AI fanboy.

    Sorry Thorsten, but right now it's you who is not able to discuss things
    with people who happen to hold a different opinion.

    I very much doubt that the above statement is in accordance with our CoC.

    --
    -- regards
    --
    -- Matthias Urlichs


  • From Simon Josefsson@21:1/5 to Thorsten Glaser on Mon May 12 09:00:01 2025
    Thorsten Glaser writes:

    Counter-Proposal -- Interpretation of DFSG on (AI) Models (v3)
    =========================================================

    I don't know if further seconds are needed for each new version of a
    proposal, but for clarity I'll second this version too.

    I realized that I have one additional generic concern: You claim that
    models are a derivative work of their training input.

    Yes. This is easily shown, for example by looking at how they work
    (https://explainextended.com/2023/12/31/happy-new-year-15/ explained
    this well), and in papers like “Extracting Training Data from ChatGPT”.
    It is a sort of lossy compression that has been shown to be sufficiently
    un-lossy (urgs, forgive my lack of English) that recognisable
    “training data” can be recalled, and the operators’ “fix” was to add
    filters to the prompts, not to make it impossible, because they cannot.

    I don't think this question is legally established or socially agreed
    on, and I think it will be an area of conflict for many years. I also
    don't mind this text in your proposal because I happen to agree with it.
    But I think it would be possible to disagree on that (and many people
    do) and still agree with the rest of your proposal about what Debian
    should do in this situation.

    /Simon


  • From Soren Stoutner@21:1/5 to All on Mon May 12 10:04:29 2025
    On Saturday, May 10, 2025 11:03:31 AM Mountain Standard Time Aigars Mahinovs wrote:
    If your entire proposal is based on this assumption about how
    copyright and copyright law works, I would expect something more
    substantial, like court decisions supporting this radical new
    interpretation. And overturning things like Article 4 of EU Directive 2019/790 granting near complete copyright exception to text and data
    mining. This was explicitly referred to in EU AI Directive in context
    of the use of training data. And overturning of a *ton* of already
    decided "fair use" cases in USA.

    I know this isn’t a ton of already decided fair use cases, but I saw the following link this morning.

    https://www.yahoo.com/news/us-copyright-office-thoughts-ai-235936541.html

    Two quotes from the US Copyright Office.

    “Although it is not possible to prejudge the result in any particular case, precedent supports the following general observations,” the office said. “Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs — all of which can affect the market.”

    “When a model is deployed for purposes such as analysis or research — the types of uses that are critical to international competitiveness — the outputs are unlikely to substitute for expressive works used in training,” the office said. “But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.”


    As has already been said a number of times in these threads, it isn’t a settled issue whether copyrighted works can be used to train an AI model without the permission of the copyright holder. The answer to that question probably depends on whether such training ends up being considered fair use. The above quotes indicate that, according to the US Copyright Office, whether it is fair use could depend on how the AI is used *after it is trained*.

    --
    Soren Stoutner
