On 2022-10-25, Mikko <[email protected]> wrote:
On 2022-10-23 14:01:00 +0000, Doc O'Leary said:
For your reference, records indicate that
Mikko <[email protected]> wrote:
Unmoderated groups are spammed so much that many have become unusable
and unused.
If you’re talking about Usenet itself, I would dispute that premise. There
are plenty of online forums that are still used despite being full of spam;
I could even argue that the sum total of social media exists *to* be a
channel for spam, and that’s where the bulk of Usenet traffic has gone.
Network effects are a better explanation for why nobody goes where nobody
goes.
Is it already (or in near future) possible to construct an AI that
could moderate a discussion group so that the amount of off-topic
messages stays acceptable but acceptable messages are not rejected
too often?
It has been possible to stop spam for decades, and no AI is required to do
it. It doesn’t even require natural language processing of message content!
Spam (and other forms of abuse) have a source, and using that metadata to
block bad actors is all that is required to stop the abuse. The problem is
that, if you do said analysis, you’ll quickly discover that the source of
abuse turns out to be the same “too big to fail” companies that exploit
network effects for their own benefits. For Usenet, that means Google
Groups; if you have the courage to acknowledge Google is a hostile actor,
cut them off and you’ll eliminate 90% of the spam on Usenet.
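To make the "use the source metadata" idea concrete, here is a minimal sketch of a source-based filter. It is only an illustration, not anyone's actual server configuration: the function name `is_blocked`, the choice of header fields to inspect, and the sample header values are all assumptions made for the example.

```python
def is_blocked(headers, blocked_sources):
    """Return True if a source-identifying header mentions a blocked source.

    `headers` is a plain dict of article headers. Only metadata is examined;
    the message body is never read. The fields checked are an assumption.
    """
    source_fields = ("Injection-Info", "Path", "Message-ID")
    for field in source_fields:
        value = headers.get(field, "").lower()
        if any(source in value for source in blocked_sources):
            return True
    return False


# Hypothetical block list and made-up sample articles.
blocked = {"googlegroups.com"}

article_from_blocked_source = {
    "Message-ID": "<abc123@googlegroups.com>",
    "Subject": "Buy now",
}
article_from_other_source = {
    "Message-ID": "<xyz789@news.example.org>",
    "Path": "news.example.org!not-for-mail",
    "Subject": "Re: moderation",
}

print(is_blocked(article_from_blocked_source, blocked))
print(is_blocked(article_from_other_source, blocked))
```

Note that the decision depends only on where the article came from, which is why no language processing of the content is needed.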
That approach depends on identification of spam and spam sources. But my
question about whether on-topic messages can be identified is still
unanswered.
Mikko
Is it possible to do better than random?
Absolutely.
Is it impossible to do perfectly?
Absolutely.
So even to begin with, you have to say what quality is acceptable.
Then after that, to even measure quality you need a definition of
spam. Then you'll find that humans will disagree far more often than
you would expect. In the well-established field of experimental,
test-collection-based information retrieval, where the goal is to find
documents relevant to a user's query, the relevant sets (A, B) chosen by
two human professionals will typically overlap by only about 60%
(|A intersection B| / |A union B| is about 0.6).
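For anyone who wants to try the agreement measure on their own judgments, here is a small sketch of the intersection-over-union ratio (usually called the Jaccard index). The assessor sets are invented so that the result lands on the 0.6 figure; they are not real data, and treating two empty sets as full agreement is just a convention chosen for the example.

```python
def jaccard(a, b):
    """Return |A intersection B| / |A union B| for two collections of IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets agree fully.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical judgments: the message IDs each assessor marked as relevant.
assessor_a = {1, 2, 3, 4, 5, 6, 7}
assessor_b = {2, 3, 4, 5, 6, 7, 8, 9, 10}

# 6 shared IDs out of 10 distinct IDs overall.
print(f"{jaccard(assessor_a, assessor_b):.2f}")  # prints 0.60
```

The same function works for spam judgments: give it the sets of messages two moderators would reject and see how far below 1.0 the result falls.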
Then after that, the biggest problem is that you are in an adversarial
relationship with the spammers. Once you start interfering with the
spammers, they will change their approach. Retrospectively, given a decent
learning set, current machine learning approaches will do a decent job
of identifying spam in these past sets. But as the spammers learn what is
acceptable and what is not, relying on past spam will become less and
less useful. In the 2000s, some 30% of Google's search effort was spent
on this cat-and-mouse game with the spammers.
All that's in theory. In practice, any barrier at all to spam on
Usenet will reduce spam since the return from the spam is so small -
there are better places for the spammers. What would doom an effort
such as you suggest would be the complaints from the
borderline-legitimate posters about posts improperly identified as
spam. Usenet is dying fast enough as it is; it can't afford to
send these posters packing!
Chris