• Distinguishing list elements: strings and sublists

    From Arjen Markus@21:1/5 to All on Mon Apr 17 02:00:37 2023
    I have worked on an algorithm to plot a dendrogram where the data are stored as a nested list. The final elements are strings (labels, if you want). But I encounter the problem that a string with two or more words cannot be distinguished from a list with several elements. Is there a way to achieve this?

    Here is a small example:

    set a {"A B" C}
    lindex $a 0 ==> A B
    llength [lindex $a 0] ==> 2

    set b {{"A B"} C}
    lindex $b 0 ==> "A B"
    llength [lindex $b 0] ==> 1

    I can of course require labels of more than one word to be protected like in the second example. But is there a better solution?

    Regards,

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Schelte@21:1/5 to Arjen Markus on Mon Apr 17 11:53:58 2023
    Arjen,

    I quite liked the way ticklecharts solved that problem: It provides
    TclOO classes that can be used to create what amounts to typed
    variables in Tcl. Then you make a list of objects. By checking the class
    of the object, you can distinguish lists, dicts, and strings. I don't
    know how much overhead this creates or whether that is an objection for
    your use case.
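
    A minimal sketch of the object-based idea, assuming Tcl 8.6 or later
    with TclOO. The class names Leaf and Branch and the proc leafLabels are
    hypothetical, chosen for illustration; they are not the classes that
    ticklecharts provides:

```tcl
package require Tcl 8.6-

# Hypothetical wrapper classes for illustration only; these are not
# the classes that ticklecharts itself provides.
oo::class create Leaf {
    variable text
    constructor {label} { set text $label }
    method label {} { return $text }
}
oo::class create Branch {
    variable kids
    constructor {args} { set kids $args }
    method children {} { return $kids }
}

# Collect the leaf labels of a tree, depth first, by checking the
# class of each object instead of guessing from its string form.
proc leafLabels {node} {
    if {[info object isa typeof $node Leaf]} {
        return [list [$node label]]
    }
    set labels {}
    foreach child [$node children] {
        lappend labels {*}[leafLabels $child]
    }
    return $labels
}

set tree [Branch new [Leaf new "A B"] [Branch new [Leaf new C] [Leaf new "D E"]]]
puts [leafLabels $tree]   ;# {A B} C {D E}
```

    Because the node kind comes from the object's class, a label such as
    "A B" is never mistaken for a two-element list. The cost is one object
    per node, which may or may not matter for large trees.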


    Schelte.


  • From Christian Gollwitzer@21:1/5 to All on Mon Apr 17 13:03:55 2023
    On 17.04.23 at 11:00, Arjen Markus wrote:
    I have worked on an algorithm to plot a dendrogram where the data are stored as a nested list. The final elements are strings (labels, if you want). But I encounter the problem that a string with two or more words cannot be distinguished from a list with several elements. Is there a way to achieve this?


    This is a drawback of the EIAS ("everything is a string") model and can
    be inconvenient from time to time. One standard solution is to prefix
    the data with a type specifier, i.e. every entry is always a list of
    the form either

    BRANCH $a $b
    or
    LEAF $a
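
    A minimal sketch of the tagged-list scheme above, assuming binary
    branches as in the BRANCH $a $b form. The proc names leaf, branch and
    countLeaves are hypothetical:

```tcl
# Constructors for the two node kinds; the tag is always element 0.
proc leaf {label} { return [list LEAF $label] }
proc branch {a b} { return [list BRANCH $a $b] }

# Count the leaves below a node by dispatching on the tag,
# never on the shape of the label itself.
proc countLeaves {node} {
    switch -- [lindex $node 0] {
        LEAF {
            return 1
        }
        BRANCH {
            set left  [countLeaves [lindex $node 1]]
            set right [countLeaves [lindex $node 2]]
            return [expr {$left + $right}]
        }
        default {
            error "unknown node tag: [lindex $node 0]"
        }
    }
}

set tree [branch [leaf "A B"] [branch [leaf C] [leaf "D E"]]]
puts [countLeaves $tree]   ;# 3
```

    The label "A B" is stored as the second element of a LEAF entry, so its
    embedded space no longer affects how the tree is traversed.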


    Christian

  • From Arjen Markus@21:1/5 to All on Mon Apr 17 05:04:31 2023
    Thanks Christian, Schelte,

    either idea could prove useful. For the moment I can simply ignore the problem, as I have to put it all into a more general form than I have now, but this was a problem that I anticipated.

    Regards,

    Arjen

  • From Harald Oehlmann@21:1/5 to All on Mon Apr 17 14:25:15 2023
    On 17.04.2023 at 11:00, Arjen Markus wrote:
    I have worked on an algorithm to plot a dendrogram where the data are stored as a nested list. The final elements are strings (labels, if you want). But I encounter the problem that a string with two or more words cannot be distinguished from a list with several elements. Is there a way to achieve this?

    It might be overkill, but tDOM is a tool for handling tree data. You
    gain easy import/export to XML/JSON. And performance for mass data is
    great, I suppose.

    Harald

  • From saitology9@21:1/5 to Arjen Markus on Mon Apr 17 10:21:19 2023
    On 4/17/2023 5:00 AM, Arjen Markus wrote:
    I have worked on an algorithm to plot a dendrogram where the data are stored as a nested list. The final elements are strings (labels, if you want). But I encounter the problem that a string with two or more words cannot be distinguished from a list
    with several elements. Is there a way to achieve this?

    Here is a small example:

    set a {"A B" C}
    ...
    set b {{"A B"} C}


    What does the full structure look like for a complete example?

  • From Arjen Markus@21:1/5 to All on Mon Apr 17 11:12:54 2023
    On Monday, April 17, 2023 at 4:21:25 PM UTC+2, saitology9 wrote:
    What does the full structure look like for a complete example?

    My test cases so far are:

    set data {
    "Label1" "Label2" {"Level2a" "Level2b"} {{A B} Cxxxx}
    }
    set data2 {
    "Label1" "Label2" {"Level2a" "Level2b"}
    }
    set data3 {
    "Label1" "Label2" {{A B} Cxxxx}
    }
    set data4 {
    "Label1" "Label2" {{A B} {C D}}
    }
    set data5 {
    "Label1" "Label2" {{A B} {C {D E}}}
    }
    set data6 {
    "Label1" "Label2" {F {A B} {C {D E}}}
    }
    set data7 {
    {"Label1" "Label2"} {F {A B} {C {D E}}}
    }

    Nothing spectacular, but the sort of presentation I have in mind is often seen to represent the result of clustering. Something like https://en.wikipedia.org/wiki/Dendrogram#/media/File:Global-Diversity-of-Sponges-(Porifera)-pone.0035105.s008.tif


    Regards,

    Arjen

  • From saitology9@21:1/5 to Arjen Markus on Mon Apr 17 15:34:37 2023



    Thanks for that, though I am still not clear :-) As already suggested,
    you may need to tag your data. I would offer the following more generic
    structure:


    % set my_dglist [list "root-label" $item1 $item2 $item3 ...]

    where each item is one of:

    [list "leaf" "Label1" more-details...] ;# fixed number of details
    [list "combo" $item4 $item5 $item6 ...] ;# any number of items


    In general, dendrograms are created at the end of some sort of frequency
    analysis and branches aren't necessarily binary. It sounds like you are
    after the visual end result of it. So if you are creating the data
    programmatically, it should be easy :-) to come up with a few procs to
    combine elements one by one into groups, and to display the result at
    the end with canvas drawing commands.
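
    A minimal sketch of such helper procs, using the "leaf" and "combo" tags
    from the structure above. The proc names leaf, combo and outline are
    hypothetical, and a text outline stands in here for the canvas drawing
    step, which is not shown:

```tcl
# Hypothetical builders: a leaf carries a label plus optional details,
# a combo groups any number of items.
proc leaf {label args} { return [list leaf $label {*}$args] }
proc combo {args}      { return [list combo {*}$args] }

# Return the tree as a list of indented lines, one per node.
# A "+" line marks a combo; its items follow one level deeper.
proc outline {item {depth 0}} {
    set pad [string repeat "  " $depth]
    set tag [lindex $item 0]
    if {$tag eq "leaf"} {
        return [list "${pad}[lindex $item 1]"]
    }
    set lines [list "${pad}+"]
    foreach child [lrange $item 1 end] {
        lappend lines {*}[outline $child [expr {$depth + 1}]]
    }
    return $lines
}

set tree [combo [leaf "Label1"] [leaf "Label2"] [combo [leaf "A B"] [leaf Cxxxx]]]
puts [join [outline $tree] \n]
```

    The same recursive walk could compute coordinates for canvas items
    instead of text lines; the tag in element 0 decides how each item is
    handled, so branches with more than two children need no special case.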

  • From saitology9@21:1/5 to Rich on Mon Apr 17 17:12:40 2023
    On 4/17/2023 4:42 PM, Rich wrote:

    You are slowly reinventing Tcllib's huddle: https://wiki.tcl-lang.org/page/huddle


    It is funny you should say that :-) Just last week, I worked on parsing
    some YAML files - and by extension, the huddle system. I didn't want to
    distract from the original post's subject by mentioning it.

    But you are right: the structure is quite common and is easy to use.
    The node deciders could be generic like in my post or they could be data
    types like in huddle.

  • From Rich@21:1/5 to [email protected] on Mon Apr 17 20:42:04 2023



    You are slowly reinventing Tcllib's huddle: https://wiki.tcl-lang.org/page/huddle

  • From Rich@21:1/5 to [email protected] on Mon Apr 17 22:16:01 2023
    saitology9 <[email protected]> wrote:
    But you are right: the structure is quite common and is easy to use.
    The node deciders could be generic like in my post or they could be data types like in huddle.

    Yes, and for generic strings it is not possible to tell, from the
    characters themselves, whether the string represents a string, a list,
    or a dict. Extra "out of band" data has to exist somewhere (in code or
    in the data) to indicate whether:

    the quick brown fox jumped over

    is a simple string, a list of six elements, or a dict with three keys.
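
    A short illustration of that point: the same value is accepted by
    string, list and dict commands alike, and nothing in the value says
    which reading was intended. The variable name s is arbitrary:

```tcl
set s "the quick brown fox jumped over"

puts [string length $s]   ;# 31 characters
puts [llength $s]         ;# 6 list elements
puts [dict size $s]       ;# 3 key/value pairs
puts [dict get $s the]    ;# quick
```

    Each command simply interprets the string in its own way, which is why
    the type information has to be carried separately.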

  • From Arjen Markus@21:1/5 to Rich on Mon Apr 17 23:26:08 2023
    On Tuesday, April 18, 2023 at 12:16:05 AM UTC+2, Rich wrote:
    the quick brown fox jumped over

    is a simple string, a list of six elements, or a dict with three keys.

    It could also be interpreted as an incomplete sentence ;).

    Regards,

    Arjen
