• Replace all non-printable/control characters with plain space except ke

    From [email protected]@21:1/5 to All on Mon Mar 20 08:00:23 2023
    I want to replace all the non-printable/control characters with plain space except keeping the `\n' as they are in the following string:

    ```
    str:="""#g2 % point group to the space group of group
    3 % generator
    0 -1 0
    0 0 -1
    -1 0 0
    3 /8 % generator
    -30 58 -30
    -33 55 -25
    -25 55 -33
    % order of the group unknown""";
    ```

    Is there a convenient way to do this?

    Regards,
    Zhao

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to [email protected] on Mon Mar 20 17:33:00 2023
    On 20.03.2023 16:00, [email protected] wrote:
    I want to replace all the non-printable/control characters with plain
    space except keeping the `\n' as they are in the following string:

    I don't see any control characters in your data below.


    ```
    str:="""#g2 % point group to the space group of group
    3 % generator
    0 -1 0
    0 0 -1
    -1 0 0
    3 /8 % generator
    -30 58 -30
    -33 55 -25
    -25 55 -33
    % order of the group unknown""";
    ```

    Is there a convenient way to do this?

    Of course. Use variable substitution with patterns containing
    the respective character classes; for example

    str="..." # any string
    printf "%s" "${str//[^[:print:]$'\n']/ }"

    to replace all occurrences of non-printable and also not '\n'.
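
    A minimal sketch of the substitution above, for illustration only. The sample string is hypothetical (tabs stand in for the control characters), `LC_ALL=C` is an assumption made so that `[:print:]` means printable ASCII, and the newline is passed in through a helper variable `nl`, which should behave the same as writing `$'\n'` inside the bracket expression:

    ```bash
    LC_ALL=C                      # assumption: "printable" means printable ASCII
    nl=$'\n'                      # helper variable holding a literal newline
    str=$'3\t% generator\n3\t/8\t\t% generator'   # hypothetical sample data

    # Replace every character that is neither printable nor a newline with a space.
    cleaned=${str//[^[:print:]$nl]/ }
    printf '%s\n' "$cleaned"
    ```

    Each tab becomes exactly one space, so a run of two tabs yields two spaces; the newline is left alone.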

    Janis


    Regards,
    Zhao


  • From [email protected]@21:1/5 to Janis Papanagnou on Mon Mar 20 16:59:09 2023
    On Tuesday, March 21, 2023 at 12:33:06 AM UTC+8, Janis Papanagnou wrote:
    On 20.03.2023 16:00, [email protected] wrote:
    I want to replace all the non-printable/control characters with plain space except keeping the `\n' as they are in the following string:
    I don't see any control characters in your data below.

    If I first store the string here into a file named as `strfile' and then check it as follows, you will see them:

    werner@X10DAi:~$ cat -A strfile
    #g2 % point group to the space group of group$
    3^I% generator$
    0 -1 0$
    0 0 -1$
    -1 0 0$
    3^I/8^I^I% generator$
    -30 58 -30$
    -33 55 -25$
    -25 55 -33$
    % order of the group unknown$
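
    The same kind of check can be made on a string directly, without a file. A small sketch, assuming a hypothetical one-line sample; it uses the POSIX `sed` command `l`, which prints a tab as `\t` and marks the end of the line with `$` (GNU `cat -A` shows the tab as `^I` instead):

    ```bash
    str=$'3\t% generator'                       # hypothetical sample with one tab
    visible=$(printf '%s\n' "$str" | sed -n l)  # make control characters visible
    printf '%s\n' "$visible"
    ```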



    ```
    str:="""#g2 % point group to the space group of group
    3 % generator
    0 -1 0
    0 0 -1
    -1 0 0
    3 /8 % generator
    -30 58 -30
    -33 55 -25
    -25 55 -33
    % order of the group unknown""";
    ```

    Is there a convenient way to do this?
    Of course. Use variable substitution with patterns containing
    the respective character classes; for example

    str="..." # any string
    printf "%s" "${str//[^[:print:]$'\n']/ }"

    to replace all occurrences of non-printable and also not '\n'.

    But your description above is inconsistent with the answer given by ChatGPT:

    ```
    User:
    str="..." # any string
    printf "%s" "${str//[^[:print:]$'\n']/ }"

    ChatGPT:
    This code replaces any non-printable characters in the string "str" with a space. The "printf" command is used to print the modified string.

    The "${str//[^[:print:]$'\n']/ }" part of the code is a parameter expansion that uses pattern substitution. The pattern "[^[:print:]$'\n']" matches any character that is not a printable character or a newline character. The substitution "/ " replaces
    each matched character with a space.

    Overall, this code is useful for cleaning up strings and removing any non-printable characters that may cause issues in further processing or display.

    User:
    explain it in more detail

    ChatGPT:
    Sure, I'd be happy to explain this code in more detail!

    The code you provided is a Bash script that uses the `printf` command to print a modified version of a string. The string is stored in the variable `str`, which is defined as a placeholder in the code you provided.

    The `${str//pattern/replacement}` syntax is a parameter expansion that performs pattern substitution on the `str` variable. In this case, the pattern is `[^[:print:]$'\n']`, which matches any character that is not a printable character or a newline
    character. The `[:print:]` character class matches any printable character, which includes letters, numbers, punctuation, and whitespace. The `$'\n'` syntax is a Bash escape sequence that represents a newline character.

    The replacement in this pattern substitution is a space character, denoted by the single space between the forward slashes. This means that any non-printable characters or newline characters in the `str` variable will be replaced with a space character.

    The `printf` command then prints the modified string using the `%s` format specifier. This format specifier tells `printf` to print the argument as a string. The modified string is passed as an argument to `printf`, using the `${str//pattern/replacement}`
    syntax to substitute the modified string for the original `str` variable.

    Overall, this code is a useful way to clean up strings and remove any non-printable characters or newline characters that may cause issues in further processing or display.
    ```

    Finally, I only want to replace all occurrences of non-printable with one space and also keep '\n' as they are. The following is the desired result when applied on the file whose content is the string discussed here:

    werner@X10DAi:~$ sed -e 's/[^[:print:]]/ /g' strfile | cat -A
    #g2 % point group to the space group of group$
    3 % generator$
    0 -1 0$
    0 0 -1$
    -1 0 0$
    3 /8 % generator$
    -30 58 -30$
    -33 55 -25$
    -25 55 -33$
    % order of the group unknown$

    But my concern is that if the whole file is represented as a string, I should perform the above operations on that string instead of on a file.

    Janis


    Regards,
    Zhao


  • From Janis Papanagnou@21:1/5 to [email protected] on Tue Mar 21 02:28:32 2023
    On 21.03.2023 00:59, [email protected] wrote:
    On Tuesday, March 21, 2023 at 12:33:06 AM UTC+8, Janis Papanagnou
    wrote:
    On 20.03.2023 16:00, [email protected] wrote:
    I want to replace all the non-printable/control characters with
    plain space except keeping the `\n' as they are in the following
    string:
    I don't see any control characters in your data below.

    If I first store the string here into a file named as `strfile' and
    then check it as follows, you will see them:

    No. If _you_ do that *you* will see them. _I_ just see spaces tabs
    and newlines as the only control characters.


    werner@X10DAi:~$ cat -A strfile
    #g2 % point group to the space group of group$
    3^I% generator$
    0 -1 0$
    0 0 -1$
    -1 0 0$
    3^I/8^I^I% generator$
    -30 58 -30$
    -33 55 -25$
    -25 55 -33$
    % order of the group unknown$

    ```
    str:="""#g2 % point group to the space group of group
    3 % generator
    0 -1 0
    0 0 -1
    -1 0 0
    3 /8 % generator
    -30 58 -30
    -33 55 -25
    -25 55 -33
    % order of the group unknown""";
    ```

    Is there a convenient way to do this?
    Of course. Use variable substitution with patterns containing the
    respective character classes; for example

    str="..." # any string
    printf "%s" "${str//[^[:print:]$'\n']/ }"

    to replace all occurrences of non-printable and also not '\n'.

    But your description above is inconsistent with the answer given by
    ChatGPT:

    I suggest to discuss that with ChatGPT then, if you think there's
    more expertise, and if you prefer chatting with that tool instead
    of just trying the suggestion on your data.

    [ big snip of chat protocol spam ]


    Overall, this code is a useful way to clean up strings and remove any
    non-printable characters or newline characters that may cause issues
    in further processing or display.

    Are you saying that or your chat tool?

    The character set [^[:print:]$'\n'] specifies a pattern defined
    by the negated ('^') sets comprising printables and newlines.

    That is what you said you need. No?


    Finally, I only want to replace all occurrences of non-printable with
    one space and also keep '\n' as they are.

    Newlines are not touched with the code I presented.

    You didn't say in your original post that you want multiple occurrences "compressed" to a single character replacement.

    To replace a run of _multiple_ consecutive control characters with a
    _single_ character, adjust your regexp. Depending on the tool (shell
    type, sed, whatever) you want to use, it's one of
    [^[:print:]$'\n']+
    [^[:print:]$'\n'][^[:print:]$'\n']*
    +([^[:print:]$'\n'])
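
    For reference, the first form is extended regular expression syntax (for example with `sed -E` or `grep -E`), the second also works in basic regular expressions, and the third is a ksh-style extended pattern, which bash accepts once `extglob` is enabled. A hedged sketch with hypothetical sample data, collapsing each run of control characters into a single space (the helper variable `nl` is an assumption standing in for `$'\n'`):

    ```bash
    shopt -s extglob                      # enables the +(...) pattern operator in bash
    nl=$'\n'                              # helper variable holding a literal newline
    str=$'3\t/8\t\t% generator\n0 -1 0'   # hypothetical sample data

    # Shell pattern: one or more characters that are neither printable nor newline.
    squeezed=${str//+([^[:print:]$nl])/ }

    # sed works line by line, so the newline never reaches the bracket expression.
    via_sed=$(printf '%s\n' "$str" | sed -E 's/[^[:print:]]+/ /g')

    printf '%s\n' "$squeezed"
    ```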

    The following is the
    desired result when applied on the file whose content is the string
    discussed here:

    werner@X10DAi:~$ sed -e 's/[^[:print:]]/ /g' strfile | cat -A
    #g2 % point group to the space group of group$
    3 % generator$
    0 -1 0$
    0 0 -1$
    -1 0 0$
    3 /8 % generator$
    -30 58 -30$
    -33 55 -25$
    -25 55 -33$
    % order of the group unknown$

    But my concern is that if the whole file is represented as a
    string, I should perform the above operations on that string
    instead of on a file.

    That's what the shell's string substitution ${str//.../...} is for.
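
    Both routes can be applied to a string held in a variable. A sketch assuming bash and a hypothetical sample string: the here-string `<<<` feeds the variable to `sed` without a temporary file, while the parameter expansion avoids the external process altogether (`nl` is a helper variable introduced here for the newline):

    ```bash
    nl=$'\n'                              # helper variable holding a literal newline
    str=$'0 -1 0\n3\t% generator'         # hypothetical sample data

    # Route 1: hand the string to sed through a here-string.
    via_sed=$(sed 's/[^[:print:]]/ /g' <<< "$str")

    # Route 2: pure shell substitution, with the newline excluded explicitly.
    via_expansion=${str//[^[:print:]$nl]/ }

    printf '%s\n' "$via_expansion"
    ```

    Note that `$(...)` drops trailing newlines, so the two results can differ if the string ends in one or more newlines.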

    In other words, just apply the solution, or go chatting with ChatGPT.


    Janis


    Regards, Zhao


  • From [email protected]@21:1/5 to Janis Papanagnou on Mon Mar 20 23:35:40 2023
    On Tuesday, March 21, 2023 at 9:28:39 AM UTC+8, Janis Papanagnou wrote:
    On 21.03.2023 00:59, [email protected] wrote:
    On Tuesday, March 21, 2023 at 12:33:06 AM UTC+8, Janis Papanagnou
    wrote:
    On 20.03.2023 16:00, [email protected] wrote:
    I want to replace all the non-printable/control characters with
    plain space except keeping the `\n' as they are in the following
    string:
    I don't see any control characters in your data below.

    If I first store the string here into a file named as `strfile' and
    then check it as follows, you will see them:
    No. If _you_ do that *you* will see them. _I_ just see spaces tabs
    and newlines as the only control characters.

    werner@X10DAi:~$ cat -A strfile
    #g2 % point group to the space group of group$
    3^I% generator$
    0 -1 0$
    0 0 -1$
    -1 0 0$
    3^I/8^I^I% generator$
    -30 58 -30$
    -33 55 -25$
    -25 55 -33$
    % order of the group unknown$

    ```
    str:="""#g2 % point group to the space group of group
    3 % generator
    0 -1 0
    0 0 -1
    -1 0 0
    3 /8 % generator
    -30 58 -30
    -33 55 -25
    -25 55 -33
    % order of the group unknown""";
    ```

    Is there a convenient way to do this?
    Of course. Use variable substitution with patterns containing the
    respective character classes; for example

    str="..." # any string
    printf "%s" "${str//[^[:print:]$'\n']/ }"

    to replace all occurrences of non-printable and also not '\n'.

    But your description above is inconsistent with the answer given by ChatGPT:
    I suggest to discuss that with ChatGPT then, if you think there's
    more expertise, and if you prefer chatting with that tool instead
    of just trying the suggestion on your data.

    In fact, in the analysis of your regex, ChatGPT is indeed correct. However, its final summary is wrong:

    The replacement in this pattern substitution is a space character, denoted by the single space between the forward slashes. This means that any non-printable characters or newline characters in the `str` variable will be replaced with a space character.

    "any non-printable characters or newline characters" should be "any non-printable characters other than newline characters".

    [ big snip of chat protocol spam ]

    Overall, this code is a useful way to clean up strings and remove any
    non-printable characters or newline characters that may cause issues
    in further processing or display.
    Are you saying that or your chat tool?

    The character set [^[:print:]$'\n'] specifies a pattern defined
    by the negated ('^') sets comprising printables and newlines.

    That is what you said you need. No?

    Yes.


    Finally, I only want to replace all occurrences of non-printable with
    one space and also keep '\n' as they are.
    Newlines are not touched with the code I presented.

    You didn't say in your original post that you want multiple occurrences "compressed" to a single character replacement.

    To replace a run of _multiple_ consecutive control characters with a
    _single_ character, adjust your regexp. Depending on the tool (shell
    type, sed, whatever) you want to use, it's one of
    [^[:print:]$'\n']+
    [^[:print:]$'\n'][^[:print:]$'\n']*
    +([^[:print:]$'\n'])

    They all work as follows, with `grep -E':

    werner@X10DAi:~$ grep -E '[^[:print:]$'\n']+' strfile | cat -A
    3^I% generator$
    3^I/8^I^I% generator$
    werner@X10DAi:~$ grep -E '[^[:print:]$'\n'][^[:print:]$'\n']*' strfile | cat -A
    3^I% generator$
    3^I/8^I^I% generator$
    werner@X10DAi:~$ grep -E '[^[:print:]$'\n']+' strfile | cat -A
    3^I% generator$
    3^I/8^I^I% generator$
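
    A note on the quoting in the commands above: the `$'\n'` sits inside an argument that is already single-quoted, so the shell closes the quote, reads an unquoted `\n` (which becomes a plain `n`), and reopens the quote. `grep` therefore receives `[^[:print:]$n]+` rather than a pattern containing a newline. Since `grep` matches line by line, the newline does not need to be excluded at all. A small sketch with a hypothetical temporary file and made-up content, counting the lines that contain a control character:

    ```bash
    tmp=$(mktemp)                                   # hypothetical scratch file
    printf '0 -1 0\n3\t%% generator\n3\t/8\t\t%% generator\n' > "$tmp"

    # Count the lines containing at least one non-printable character.
    matches=$(grep -cE '[^[:print:]]+' "$tmp")
    printf '%s\n' "$matches"

    rm -f "$tmp"
    ```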

    The following is the
    desired result when applied on the file whose content is the string discussed here:

    werner@X10DAi:~$ sed -e 's/[^[:print:]]/ /g' strfile | cat -A
    #g2 % point group to the space group of group$
    3 % generator$
    0 -1 0$
    0 0 -1$
    -1 0 0$
    3 /8 % generator$
    -30 58 -30$
    -33 55 -25$
    -25 55 -33$
    % order of the group unknown$

    But my concern is that if the whole file is represented as a
    string, I should perform the above operations on that string
    instead of on a file.
    That's what the shell's string substitution ${str//.../...} is for.

    In my example, the more portable usage should be as follows:

    werner@X10DAi:~$ sed -Ee 's/[^[:print:]]/ /g' strfile | cat -A
    #g2 % point group to the space group of group$
    3 % generator$
    0 -1 0$
    0 0 -1$
    -1 0 0$
    3 /8 % generator$
    -30 58 -30$
    -33 55 -25$
    -25 55 -33$
    % order of the group unknown$

    In other words, just apply the solution, or go chatting with ChatGPT.

    Agreed. But isn't it better to combine the advantages of both to a certain extent?

    Zhao

    Janis


    Regards, Zhao


  • From [email protected]@21:1/5 to [email protected] on Tue Mar 21 06:46:39 2023
    Another question:

    [[:^print:]] and [^[:print:]], can they both be used here?


  • From Janis Papanagnou@21:1/5 to [email protected] on Wed Mar 22 02:20:46 2023
    On 21.03.2023 07:35, [email protected] wrote:
    On Tuesday, March 21, 2023 at 9:28:39 AM UTC+8, Janis Papanagnou wrote:
    On 21.03.2023 00:59, [email protected] wrote:

    In other words, just apply the solution, or go chatting with ChatGPT.

    Agreed. But isn't it better to combine the advantages of both to a certain extent?

    You should be aware what ChatGPT is, how it basically works. It is not
    an all-knowing deity or somesuch technical counterpart. And this group
    is not about discussions whether some "conclusion" of some chat tool
    is ("by accident") correct or not; for that speak with the developers
    and promoters of that tool.

    Generally, on your posting habits here there's certainly a lot to say.
    In case you appreciate the support you get here I suggest to at least
    invest in the quality of your questions and your responses. If you get
    hints, e.g., to consult the man page don't post that man page content
    to show us that you've read it; that's noise - folks here either know
    what there's in the man pages or they know how they can call them. The
    chat protocol logs are in that respect also noise (see above). Usually
    there's rarely one who cares what their output is if they know how that information is actually generated. And any bugs in the database of any
    tool (whether chatGPT or else) or any buggy generation process is just
    OT here.

    If you understood that I can try to address your question above...
    Make use of their respective advantages. But don't mix responsibilities
    (if you now understand what I mean). Use information from these sources
    and make your mind. Compare sources. Experiment. Judge. For yourself, primarily. If you think you have something precious or worthwhile to
    contribute to this group then do it. But don't expect that the insights
    you've personally got are of any interest, even if they don't fail the
    on-topic test.

    Just a few suggestions and things to ponder about. Feel free to ignore
    them.

    Janis

  • From Janis Papanagnou@21:1/5 to [email protected] on Wed Mar 22 02:35:34 2023
    [ please snip the 140 lines of previous context if all you have is a
    simple question ]

    On 21.03.2023 14:46, [email protected] wrote:

    Another question:

    [[:^print:]] and [^[:print:]], can they both be used here?

    If you mean whether they are interchangeable, then No.
    (Where did you get _that idea_ from, from ChatGPT?)
    But this you could also have easily tested yourself.

    Here's some basics for your convenience...

    [...] defines a character set ...
    [^...] defines the complement of the character set ...
    [:...:] defines a (predefined) character class ...

    The latter can take the position of any character in a character set,
    say, [:lower:], [:upper:], or [:digit:] may be in a character set as
    in (for example)

    [[:lower:][:digit:]]

    (to match any lowercase character or character representing a digit)
    and as

    [^a-z0-9]

    matches the complement also

    [^[:lower:][:digit:]]

    matches that complement.
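
    A quick check of the equivalence described above, assuming bash, the C locale (so that `a-z` and `[:lower:]` both mean the ASCII lowercase letters), and a made-up test string:

    ```bash
    LC_ALL=C                       # assumption: ASCII-only character classes and ranges
    s='ab12-XY_z9'                 # hypothetical test string

    # Replace everything outside "lowercase or digit" with a dot, spelled two ways.
    with_classes=${s//[^[:lower:][:digit:]]/.}
    with_ranges=${s//[^a-z0-9]/.}

    printf '%s\n%s\n' "$with_classes" "$with_ranges"
    ```

    The `[:^print:]` spelling, by contrast, is a Perl-style extension and is not defined for POSIX bracket expressions, so the negation belongs on the enclosing set, as in `[^[:print:]]`.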

    This is very basic and you should inspect some contemporary source
    describing the Unix'y form of regular expressions and their syntax.

    Janis

  • From [email protected]@21:1/5 to Janis Papanagnou on Tue Mar 21 19:49:04 2023
    On Wednesday, March 22, 2023 at 9:35:40 AM UTC+8, Janis Papanagnou wrote:
    [ please snip the 140 lines of previous context if all you have is a
    simple question ]
    On 21.03.2023 14:46, [email protected] wrote:

    Another question:

    [[:^print:]] and [^[:print:]], can they both be used here?
    If you mean whether they are interchangeable, then No.
    (Where did you get _that idea_ from, from ChatGPT?)

    It did come from ChatGPT, but ChatGPT subsequently acknowledged that this is a non-existent pattern it had fabricated, and retracted it.

    But this you could also have easily tested yourself.

    I've also checked it.

    This is very basic and you should inspect some contemporary source describing the Unix'y form of regular expressions and their syntax.

    Thank you for telling me the correct route and method in tackling problems.

    Janis

    Zhao

  • From [email protected]@21:1/5 to Janis Papanagnou on Tue Mar 21 19:42:43 2023
    On Wednesday, March 22, 2023 at 9:20:52 AM UTC+8, Janis Papanagnou wrote:
    [...]
    Just a few suggestions and things to ponder about. Feel free to ignore
    them.

    Thank you very much for your helpful comments and suggestions!

    Janis

    Zhao
