• slow fileutil::foreachLine

    From Mark Summerfield@21:1/5 to All on Mon Jun 17 07:02:59 2024
    I have this function:

    proc ws::get_words {wordfile} {
        set in [open $wordfile r]
        try {
            while {[gets $in line] >= 0} {
                if {[regexp {^[a-z]+$} $line matched]} {
                    lappend ::ws::Words [string tolower $matched]
                }
            }
        } finally {
            close $in
        }
    }

    It reads about 100,000 lines from /usr/share/dict/words and ends up
    keeping about 65,000 of them.

    I tried replacing it with:

    proc ws::get_words {wordfile} {
        ::fileutil::foreachLine line $wordfile {
            if {[regexp {^[a-z]+$} $line matched]} {
                lappend ::ws::Words [string tolower $matched]
            }
        }
    }

    The first version loads "instantly", but the second version (with
    foreachLine) takes seconds.

    I'm using Tcl/Tk 9.0b2

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich@21:1/5 to Mark Summerfield on Mon Jun 17 15:40:29 2024
    If you check the implementation of fileutil::foreachLine, you find:

    set code [catch {uplevel 1 $cmd} result options]

    where "$cmd" is a variable holding, as a string, the script (the loop
    body) passed to foreachLine.
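
    To make the mechanism concrete, here is a minimal sketch of a command
    in the style of foreachLine. It is not the tcllib source:
    my_foreach_line and its argument names are hypothetical, and the
    handling of return codes is deliberately simplified. It only shows
    how a loop body passed as a string ends up being evaluated with
    uplevel once per line.

```tcl
# Hypothetical stand-in for fileutil::foreachLine (NOT the tcllib code).
proc my_foreach_line {varName filename script} {
    # Link the local "line" to the loop variable in the caller's scope.
    upvar 1 $varName line
    set in [open $filename r]
    try {
        while {[gets $in line] >= 0} {
            # The body arrives as a plain string and is evaluated in the
            # caller's scope, once for every line read.
            set code [catch {uplevel 1 $script} result options]
            if {$code == 3} {
                # The body executed "break": stop reading.
                break
            } elseif {$code != 0 && $code != 4} {
                # Neither ok (0) nor continue (4): pass the outcome on.
                return -options $options $result
            }
        }
    } finally {
        close $in
    }
    return
}
```

    It would be called the same way as in the question, for example
    my_foreach_line line $wordfile { puts $line }.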

    Your original version is all in a single procedure, so it is bytecode
    compiled, and every execution after the first runs that compiled
    bytecode.

    The foreachLine version, since "cmd" is a string, receives little to
    no bytecode compiling; the difference in time is the overhead of not
    being able to bytecode compile the "command" string passed to
    foreachLine.
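
    One way to check this on a given Tcl build is to time both styles
    side by side. The sketch below is only an assumption-laden test
    harness: each_line, load_inline and load_callback are hypothetical
    names, each_line is a simplified stand-in for foreachLine, and the
    word list is generated rather than read from /usr/share/dict/words.
    The numbers will differ between Tcl versions and machines.

```tcl
# Hypothetical helper in the style of foreachLine: evaluates $script in
# the caller's scope for every line of $path.
proc each_line {varName path script} {
    upvar 1 $varName line
    set in [open $path r]
    try {
        while {[gets $in line] >= 0} {
            uplevel 1 $script
        }
    } finally {
        close $in
    }
}

# Variant 1: the loop body is part of the procedure itself.
proc load_inline {path} {
    set words {}
    set in [open $path r]
    try {
        while {[gets $in line] >= 0} {
            if {[regexp {^[a-z]+$} $line]} {
                lappend words $line
            }
        }
    } finally {
        close $in
    }
    return $words
}

# Variant 2: the loop body is handed to a helper as a script argument.
proc load_callback {path} {
    set words {}
    each_line line $path {
        if {[regexp {^[a-z]+$} $line]} {
            lappend words $line
        }
    }
    return $words
}

# Generate 100000 sample lines; two thirds of them pass the regexp.
set fd [file tempfile path]
for {set i 0} {$i < 100000} {incr i} {
    puts $fd [expr {$i % 3 ? "word" : "Word's"}]
}
close $fd

# Report the average time per call over five runs of each variant.
puts "inline:   [time {load_inline $path} 5]"
puts "callback: [time {load_callback $path} 5]"
file delete $path
```

    If the two timings are close, the slowdown seen in the question
    probably has another cause and is worth profiling separately.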
