Janis Papanagnou <[email protected]> writes:
On 01.10.2023 17:14, yeti wrote:
WEEKEND PROGRAMMING CHALLENGE ISSUE #4
https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/
Under the link you find:
"Isogram words are these with all letters different (no letters
duplicated). For instance “Hydropneumatics” is Isogram word.
Your challenge this weekend is to make program which scans text
and displays the longest Isogram word found in the scanned text."
And an (obviously broken) data link to alice_in_wonderland.html
That was a nice and fun one. \o/
Try it.
The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code
grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
That's a neat trick! The initial and final .* are, however, redundant
and removing them makes the search noticeably faster (though it hardly
matters).
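A quick way to convince yourself of the equivalence is to run both
patterns over a small list. This is only a sketch: the sample words are
made up, and it assumes a grep that accepts back references with -E
(GNU grep does).

```shell
# Made-up sample words, not taken from the dictionary file.
sample=$(printf '%s\n' isogram letter banana alice noon)

# Original pattern, with the leading and trailing .*
with_stars=$(printf '%s\n' "$sample" | grep -Ev '.*(.).*\1.*')

# Shortened pattern: an unanchored regex can already match anywhere in the line.
without_stars=$(printf '%s\n' "$sample" | grep -Ev '(.).*\1')

# Print the surviving words only if both patterns agree.
[ "$with_stars" = "$without_stars" ] && printf '%s\n' "$without_stars"
```

Both variants should keep the same words, because neither pattern is
anchored to the start or end of the line.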
awk '{print length($0),$0}' | sort -n | tail
I generally use 'sort -rn | head' for this sort of thing, but that's
just a preference for the output order.
Comments on the exercise suggest that case should be ignored so maybe a
'tr A-Z a-z' in the pipe is needed. Personally, I'd also exclude
apostrophes:
</usr/share/dict/american-english tr A-Z a-z | \
grep -Ev "(.).*\1|'" | awk '{print length($0),$0}' | sort -rn | head
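To see what the case folding and the apostrophe filter each contribute,
here is the same pipeline fed from a tiny made-up list instead of the
dictionary (the words and the variable name are my own, purely for
illustration):

```shell
# "Bob" repeats a letter only once case is ignored; "don't" has an apostrophe.
ranked=$(printf '%s\n' Hydropneumatics Bob "don't" isogram |
    tr A-Z a-z |
    grep -Ev "(.).*\1|'" |
    awk '{print length($0), $0}' |
    sort -rn |
    head)

printf '%s\n' "$ranked"
```

Without the tr step "Bob" would slip through, since "B" and "b" are
different characters as far as the back reference is concerned.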
As homework do that in GNU Awk - I think it is not difficult. :-)
GNU AWK does not permit numbered back references in REs so it's going to
be more fiddly, though probably faster. Something like:
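function is_isogram(s,    letters, unique, i) {
    # Split the lower-cased word into one character per array element.
    split(tolower(s), letters, //)
    # Use each character as an array key; repeats collapse into one key.
    for (i in letters) unique[letters[i]] = 1
    return length(letters) == length(unique)
}
# Skip words with apostrophes; only test words longer than the current best.
!/'/ && length($0) > max && is_isogram($0) {
    max = length($0)
    max_isogram = $0
}
END { print max_isogram }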
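For an awk that lacks the empty-separator split or length() on arrays,
the same idea can probably be written with a substr loop. This is a
sketch of a variant, not the program above: the shell wrapper name
longest_isogram and the variable names are made up, and the apostrophe
is written as the octal escape \047 so that the program can sit inside
single quotes.

```shell
# Hypothetical wrapper: reads words on stdin, prints the longest isogram.
longest_isogram() {
    awk '
    # seen, i and c are locals; seen starts empty on every call.
    function is_isogram(s,    seen, i, c) {
        s = tolower(s)
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c in seen) return 0   # repeated character: not an isogram
            seen[c] = 1
        }
        return 1
    }
    # Skip words with an apostrophe; keep the first word of maximal length.
    index($0, "\047") == 0 && length($0) > max && is_isogram($0) {
        max = length($0)
        best = $0
    }
    END { print best }
    '
}

# Usage with a made-up word list.
printf '%s\n' letter Hydropneumatics "don't" isogram | longest_isogram
```

Unlike the split version it stops at the first repeated character, which
may help a little on long words.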
--
Ben.