Forum: >>> Magnum BBS <<<

Unique Characters related: Isogram Coding Puzzle

From yeti@21:1/5 to All on Sun Oct 1 15:14:12 2023

WEEKEND PROGRAMMING CHALLENGE ISSUE #4 https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/

That was a nice and fun one. \o/

Try it.

--
R || 0 ... Resistance is futile.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Janis Papanagnou@21:1/5 to yeti on Sun Oct 1 22:36:58 2023

On 01.10.2023 17:14, yeti wrote:

WEEKEND PROGRAMMING CHALLENGE ISSUE #4 https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/

Under the link you find:

"Isogram words are these with all letters different (no letters
duplicated). For instance “Hydropneumatics” is Isogram word.
Your challenge this weekend is to make program which scans text
and displays the longest Isogram word found in the scanned text."

And an (obviously broken) data link to alice_in_wonderland.html

That was a nice and fun one. \o/

Try it.

The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code

grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
awk '{print length($0),$0}' | sort -n | tail

produces - sensible folks DO NOT READ FURTHER (strong language) !!!

13 clergywoman's
13 demographic's
13 documentary's
13 expurgation's
13 motherfucking
13 thunderclap's
13 tragicomedy's
13 valedictory's
14 ambidextrously
14 lexicography's

and so I'd throw in "ambidextrously" as a possible good word.

As homework do that in GNU Awk - I think it is not difficult. :-)

Janis

From Janis Papanagnou@21:1/5 to Janis Papanagnou on Sun Oct 1 22:51:06 2023

On 01.10.2023 22:36, Janis Papanagnou wrote:

On 01.10.2023 17:14, yeti wrote:

WEEKEND PROGRAMMING CHALLENGE ISSUE #4

grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
awk '{print length($0),$0}' | sort -n | tail

grep -Ev '(.).*\1'

is of course a sufficient grep pattern.
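
To see why the shorter pattern is enough: grep looks for a match anywhere in the line, so '(.).*\1' already finds any character that reappears later, and -v keeps only the lines where that fails. A small sketch with a made-up word list (the variable names are only for illustration; back-references with -E are accepted by GNU grep and most other greps, but POSIX does not require them):

```shell
# Sample words are invented for illustration; they are not from the thread.
words='hydropneumatics
ambidextrously
letter
banana
isogram'

# Lines matching '(.).*\1' contain some character twice; -v keeps the rest.
isograms=$(printf '%s\n' "$words" | grep -Ev '(.).*\1')

# The longer pattern from the earlier post selects exactly the same lines.
isograms_long=$(printf '%s\n' "$words" | grep -Ev '.*(.).*\1.*')

printf '%s\n' "$isograms"
```

Here "letter" and "banana" are dropped by both patterns, since each repeats a letter.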

Janis

From Janis Papanagnou@21:1/5 to Ben Bacarisse on Sun Oct 1 23:54:39 2023

On 01.10.2023 23:40, Ben Bacarisse wrote:

Janis Papanagnou <[email protected]> writes:

The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code

grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |

That's a neat trick! The initial and final .* are, however, redundant
and removing them makes the search noticeably faster (though it hardly matters).

Yes, I posted a follow-up where I already noted that.

awk '{print length($0),$0}' | sort -n | tail

I generally use 'sort -rn | head' for this sort of thing, but that's
just a preference for the output order.

Yes.

Comments on the exercise suggest that case should be ignored so maybe a
'tr A-Z a-z' in the pipe is needed.

Partly solved simply by a 'grep -Evi', but only for the first part.
So, yes, you're right.

Personally, I'd also exclude apostrophes:

Indeed. (I've just taken a Linux standard dictionary as test data,
since the proposed text was unavailable. For a more complex text
there's certainly a lot more cleanup to be done beforehand.)
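
As a rough sketch of that cleanup for running prose, assuming words are just runs of ASCII letters: turn everything else into newlines, fold case, then reuse the same pipeline. The helper name words_from_text and the sample sentence are invented here. Apostrophes simply split a word in two ("Alice's" becomes "alice" and "s"), which may or may not be what you want.

```shell
# Hypothetical helper: one lowercase word per line from arbitrary text on stdin.
words_from_text() {
    # -c complements the set, -s squeezes runs of separators into one newline.
    tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z'
}

# Made-up sample sentence, not taken from the challenge data.
sample="Alice's Adventures: the flamingoes were CROQUETING, curtseying and scrambling."

# Keep isograms only, prefix each with its length, show the three longest.
top=$(printf '%s\n' "$sample" | words_from_text |
    grep -Ev '(.).*\1' |
    awk '{ print length($0), $0 }' | sort -rn | head -n 3)

printf '%s\n' "$top"
```

A real text would likely need more care (hyphenated words, accented letters, markup), so treat this as a starting point only.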

[snip]

Janis

From Ben Bacarisse@21:1/5 to Janis Papanagnou on Sun Oct 1 22:40:43 2023

Janis Papanagnou <[email protected]> writes:

On 01.10.2023 17:14, yeti wrote:

WEEKEND PROGRAMMING CHALLENGE ISSUE #4
https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/

Under the link you find:

"Isogram words are these with all letters different (no letters
duplicated). For instance “Hydropneumatics” is Isogram word.
Your challenge this weekend is to make program which scans text
and displays the longest Isogram word found in the scanned text."

And an (obviously broken) data link to alice_in_wonderland.html

That was a nice and fun one. \o/

Try it.

The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code

grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |

That's a neat trick! The initial and final .* are, however, redundant
and removing them makes the search noticeably faster (though it hardly
matters).

awk '{print length($0),$0}' | sort -n | tail

I generally use 'sort -rn | head' for this sort of thing, but that's
just a preference for the output order.

Comments on the exercise suggest that case should be ignored so maybe a
'tr A-Z a-z' in the pipe is needed. Personally, I'd also exclude
apostrophes:

</usr/share/dict/american-english tr A-Z a-z | \
grep -Ev "(.).*\1|'" | awk '{print length($0),$0}' | sort -rn | head

As homework do that in GNU Awk - I think it is not difficult. :-)

GNU AWK does not permit numbered back references in REs so it's going to
be more fiddly, though probably faster. Something like:

function is_isogram(s,    letters, unique, i) {
    split(tolower(s), letters, //)
    for (i in letters) unique[letters[i]] = 1
    return length(letters) == length(unique)
}

!/'/ && length($0) > max && is_isogram($0) {
    max = length($0)
    max_isogram = $0
}

END { print max_isogram }

--
Ben.

From Janis Papanagnou@21:1/5 to Ben Bacarisse on Mon Oct 2 00:03:08 2023

On 01.10.2023 23:40, Ben Bacarisse wrote:

Janis Papanagnou <[email protected]> writes:

As homework do that in GNU Awk - I think it is not difficult. :-)

GNU AWK does not permit numbered back references in REs so it's going to
be more fiddly, though probably faster.

Here I was not so much focused on the back-reference but on the
code that had already been posted in that other thread and that
could simply be used, e.g. like

# already existing function

function uniqueChars (t,    s, n, i, c, o, seen)
{
    delete seen
    n = split (t, s, "")
    for (i=1; i<=n; i++)
        if (!seen[c = s[i]]++)
            o = o c

    return o
}

# new code below

$0 == uniqueChars($0) && length($0) > maxlen {
    maxlen = length($0)
    word = $0
}

END { print maxlen, word }

Of course there are also other ways to implement the function,
like yours...

Something like:

function is_isogram(s,    letters, unique, i) {
    split(tolower(s), letters, //)
    for (i in letters) unique[letters[i]] = 1
    return length(letters) == length(unique)
}

!/'/ && length($0) > max && is_isogram($0) {
    max = length($0)
    max_isogram = $0
}

END { print max_isogram }

Janis

From yeti@21:1/5 to Janis Papanagnou on Mon Oct 2 00:25:00 2023

Janis Papanagnou <[email protected]> writes:

Indeed. (I've just taken a Linux standard dictionary as test data,
since the proposed text was unavailable. For a more complex text
there's certainly a lot more cleanup to be done beforehand.)

<https://www.gutenberg.org/cache/epub/11/pg11.txt>

--
This stealth signature intentionally left blank.

From Mike Sanders@21:1/5 to Janis Papanagnou on Mon Oct 2 06:01:59 2023

Janis Papanagnou <[email protected]> wrote:

# new code below

$0 == uniqueChars($0) && length($0) > maxlen {
    maxlen = length($0)
    word = $0
}

END { print maxlen, word }

Now that's really nice. I like the thinking here.

--
:wq
Mike Sanders

From Janis Papanagnou@21:1/5 to yeti on Mon Oct 2 10:23:37 2023

On 02.10.2023 02:25, yeti wrote:

Janis Papanagnou <[email protected]> writes:

Indeed. (I've just taken a Linux standard dictionary as test data,
since the proposed text was unavailable. For a more complex text
there's certainly a lot more cleanup to be done beforehand.)

<https://www.gutenberg.org/cache/epub/11/pg11.txt>

In this text I could only find seven isogram words of max.
length 10 (complained, croqueting, curtseying, educations,
flamingoes, flamingoes, scrambling). - Is that expected?
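
One way such a list of ties could be produced (a sketch only, not necessarily how the list above was obtained): keep every isogram of the current maximum length and restart the list whenever a longer one turns up. The function name longest_isograms is invented here, and the input is assumed to be one lowercase word per line. Duplicates are deliberately not removed, so a word that occurs twice is listed twice.

```shell
# Hypothetical helper: print every isogram of maximal length, one per line.
longest_isograms() {
    awk '
    # True when no character of s occurs twice (plain substr loop).
    function is_isogram(s,    seen, i, c) {
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c in seen) return 0
            seen[c] = 1
        }
        return 1
    }
    !length($0) || !is_isogram($0) { next }        # skip empty lines and non-isograms
    length($0) > max  { max = length($0); n = 0 }  # longer word: restart the list
    length($0) == max { best[++n] = $0 }           # tie: append, duplicates included
    END { for (i = 1; i <= n; i++) print best[i] }
    ' "$@"
}

# Made-up input for illustration.
ties=$(printf '%s\n' complained croqueting alice flamingoes flamingoes adventures |
    longest_isograms)
printf '%s\n' "$ties"
```

Piping the output through 'sort -u' would collapse the repeated entries if only distinct words are wanted.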

Janis

From Janis Papanagnou@21:1/5 to yeti on Mon Oct 2 15:33:53 2023

On 02.10.2023 15:20, yeti wrote:

Weekend Programming Challenge ISSUE #4 – Solutions <https://olimex.wordpress.com/2013/04/15/weekend-programming-challenge-issue-4-solutions/>

...says "This Weekend Programming Challenge have record submissions,
either the problem was very easy [...]" - I suppose it was.

...and: "I count total 30 solutions, some of them very elegant, some of
them very short [...]" - But where can we find the code to all these
solutions contributed? (I can't see anything on that page.)

...specifically: "I still bang my head to understand what this one line
AWK shell script solution does" - Certainly interesting for c.l.awk

Janis

From yeti@21:1/5 to All on Mon Oct 2 13:20:00 2023

Weekend Programming Challenge ISSUE #4 – Solutions <https://olimex.wordpress.com/2013/04/15/weekend-programming-challenge-issue-4-solutions/>

... confirms ‘curtseying’ as solution.

--
Fake signature.

From yeti@21:1/5 to All on Mon Oct 2 13:59:17 2023

<https://github.com/OLIMEX/WPC/tree/master/ISSUE-4/SOLUTION-24>

--
Recursive signature
|--
|Recursive signature

From Janis Papanagnou@21:1/5 to Janis Papanagnou on Mon Oct 2 18:33:24 2023

On 02.10.2023 15:33, Janis Papanagnou wrote:

...specifically: "I still bang my head to understand what this one line
AWK shell script solution does" - Certainly interesting for c.l.awk

https://github.com/OLIMEX/WPC/tree/master/ISSUE-4/SOLUTION-24/readme.txt

awk 'BEGIN { RS="[^A-Za-z]" } $0 { word=tolower($0) ; if(word in WordSeen) next ; WordSeen[word]=1 ; split(word,Letters,"") ; delete CharSeen ; for(char in Letters) if(++CharSeen[Letters[char]]>1) next ; len=length(word) ; if(len>maxlen) { maxword=word ; maxlen=len } } END { print maxword}'

Not something I'd call a one-liner. (It's just a complete program in one
line, just omitting newlines.)
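
For anyone still banging their head, here is the same program with the newlines put back and comments added. The logic is unchanged; only the wrapper function name longest_isogram and the sample sentence are invented here. It assumes an awk that treats a multi-character RS as a regular expression and splits on "" into single characters, as gawk and mawk do.

```shell
# Hypothetical wrapper around the SOLUTION-24 program, reformatted for reading.
longest_isogram() {
    awk '
    BEGIN { RS = "[^A-Za-z]" }      # any non-letter ends a record, so a record is one word or empty
    $0 {                            # skip the empty records between adjacent separators
        word = tolower($0)
        if (word in WordSeen) next  # each distinct word is checked only once
        WordSeen[word] = 1
        split(word, Letters, "")    # one array element per character
        delete CharSeen
        for (char in Letters)
            if (++CharSeen[Letters[char]] > 1) next   # repeated letter: not an isogram
        len = length(word)
        if (len > maxlen) { maxword = word; maxlen = len }
    }
    END { print maxword }
    ' "$@"
}

# Made-up sample sentence for illustration.
result=$(printf 'The Quick brown fox, jumped over lazy dogs!\n' | longest_isogram)
printf '%s\n' "$result"
```

Because the comparison is a strict 'len > maxlen', the first word of maximal length wins when several tie.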

Janis
