Forum: >>> Magnum BBS <<<

Vanilla regex

From Tuxedo@21:1/5 to All on Sun Jul 2 15:24:40 2023

Can anyone assist with a regex using fairly standard and cross compatible methods?

It's for files containing wiki markup segments as follows:

[[File:Some File Name 0123.jpg|800px]]

Or maybe:

[[File:Some other file.jpg|250px]]

Or maybe:

[[File:Another file.jpg |600px|thumb]]

etc.

The only certainty for identifying the relevant parts is that they start with "[[File:", followed by characters and/or numbers making up a file name (no UTF-8) and ending in some suffix, such as .jpg, .JPEG, .Jpeg, .PNG, .gif etc., followed by a "|" pipe or closing "]]" brackets.

The regex needs to grab the filename portion, eg. "Another file.jpg", keep
it in a variable and replace any spaces with underscore(s) so the new
variable becomes "Another_file.jpg"

Thereafter, within the existing markup, for example:

[[File:Another file.jpg |600px|thumb]]

Add the following markup after the first pipe:

link=https://example.com/display.pl?Another_file.jpg|

So the final markup becomes:

[[File:Another file.jpg |link=https://example.com/display.pl?Another_file.jpg|600px|thumb]]

The spaces in the original "File: ..." name parts can remain as it's valid
but the underscores need to exist in link=... strings.

There may be instances where "|link=" already exists between an opening "[[File:" and its closing "]]" brackets. The regex should skip any such instances so the procedure can be run repeatedly without conflicting with past replacements.

Many thanks for any example code snippets and ideas.

Tuxedo

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ben Bacarisse@21:1/5 to Tuxedo on Sun Jul 2 21:42:51 2023

Tuxedo writes:

> Can anyone assist with a regex using fairly standard and cross compatible methods?

What you want can't be done with a regex. You need a tool that uses
regexes to drive substitutions like sed, AWK, Perl, Python, PHP, ruby...
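
To illustrate the split between the regex and the tool around it, here is a minimal sketch in Python, which is one of the tools listed above (the choice is an assumption; the thread does not settle on a tool). The pattern only locates the file name; the surrounding code does the rewriting. The names FILE_RE and underscored_names are made up for this example.

```python
import re

# Assumed pattern: "[[File:" followed by anything up to the first "|" or "]".
FILE_RE = re.compile(r"\[\[File:([^|\]]+)")

def underscored_names(text):
    """Return each file name found in text with spaces replaced by underscores."""
    names = []
    for match in FILE_RE.finditer(text):
        name = match.group(1).strip()          # the regex only finds the name
        names.append(name.replace(" ", "_"))   # the host language rewrites it
    return names

print(underscored_names("[[File:Another file.jpg |600px|thumb]]"))
# prints ['Another_file.jpg']
```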

> It's for files containing wiki markup segments as follows:
>
> [[File:Some File Name 0123.jpg|800px]]
>
> Or maybe:
>
> [[File:Some other file.jpg|250px]]
>
> Or maybe:
>
> [[File:Another file.jpg |600px|thumb]]
>
> etc.
>
> The only certainty to identify the relevant parts are the start of "[[File:" followed by characters and/or numbers making up a file names (No UTF-8) and ending in some suffix, such as .jpg JPEG, .Jpeg etc. .PNG, .gif, followed by a "|" pipe or closing "]]" brackets

Is that really the only certainty? If so, it's a hard problem. Can the
file name contain | or ]] or newlines? I suspect not as "characters
and/or numbers" is an odd thing to say. I think you mean [a-zA-Z0-9 ].
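
If the guess above is right, the file name can be pinned down fairly tightly. The following Python sketch assumes that names consist only of [a-zA-Z0-9 ], that the suffix is one of a handful of image extensions in any letter case, and that the name is followed by "|" or "]]". The extension list and the names NAME_RE and find_names are illustrative assumptions, not something stated in the thread.

```python
import re

# Assumptions: ASCII letters, digits and spaces only; a short, made-up list of
# image suffixes; optional trailing spaces; then "|" or "]]" (not consumed).
# Note that re.IGNORECASE also makes the "File:" prefix case-insensitive.
NAME_RE = re.compile(
    r"\[\[File:([a-zA-Z0-9 ]+\.(?:jpe?g|png|gif)) *(?=\||\]\])",
    re.IGNORECASE,
)

def find_names(text):
    """Return the file names matched under the assumptions above."""
    return NAME_RE.findall(text)

print(find_names("[[File:Some File Name 0123.jpg|800px]]"))
# prints ['Some File Name 0123.jpg']
```

Names containing underscores, hyphens or other punctuation would be skipped by this pattern, so the character class would need widening if such names occur.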

> The regex needs to grab the filename portion, eg. "Another file.jpg", keep
> it in a variable and replace any spaces with underscore(s) so the new variable becomes "Another_file.jpg"

Regexes can't do that, but lots of tools that use them can. Do you care
what tool is used?

> Thereafter, within the existing markup, for example:
>
> [[File:Another file.jpg |600px|thumb]]
>
> Add the following markup after the first pipe:
>
> link=https://example.com/display.pl?Another_file.jpg|
>
> So the final markup becomes:
> [[File:Another file.jpg | link=https://example.com/display.pl?Another_file.jpg|600px|thumb]]
>
> The spaces in the original "File: ..." name parts can remain as it's valid but the underscores need to exist in link=... strings.
>
> There may be instances where "|link=" occurrences already exits within the opening of a "[[File:" and before its closing "]]" brackets. The regex
> should avoid operating on any such instances so the procedure can be run without conflict of past replacements.

FYI: you want the program to be "idempotent".
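
Idempotent here means that running the program on its own output changes nothing. A rough Python sketch of one way to get that property follows: each whole "[[File:...]]" element is matched, and the replacement function returns it unchanged when a "link=" parameter is already present. The base URL is the one from the example in the question; the names TAG_RE, add_link and rewrite are hypothetical, and the handling of elements with no pipe at all is an assumption, since the question only describes inserting after the first pipe.

```python
import re

BASE_URL = "https://example.com/display.pl?"  # from the example in the question

# group 1: file name (up to the first "|" or "]")
# group 2: the remaining "|..." parameters, possibly empty
TAG_RE = re.compile(r"\[\[File:([^|\]]+)((?:\|[^\]]*)?)\]\]")

def add_link(match):
    name, rest = match.group(1), match.group(2)
    if re.search(r"\|\s*link=", rest):
        return match.group(0)            # already processed: leave untouched
    target = name.strip().replace(" ", "_")
    # Assumption: if there are no parameters, the link becomes the only one.
    return "[[File:" + name + "|link=" + BASE_URL + target + rest + "]]"

def rewrite(text):
    return TAG_RE.sub(add_link, text)

once = rewrite("[[File:Another file.jpg |600px|thumb]]")
print(once)
print(rewrite(once) == once)  # prints True
```

The sketch does not try to cope with "]" characters inside the parameters (for example a caption that contains another wiki link); whether that matters depends on the actual files.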

> Many thanks for any example code snippets and ideas.

It's not hard, but then it's not very much fun either, so you may have
to pay someone or learn how to do it yourself.

--
Ben.


From Tuxedo@21:1/5 to Ben Bacarisse on Mon Jul 3 11:40:06 2023

Ben Bacarisse wrote:

> Tuxedo writes:
>
>> Can anyone assist with a regex using fairly standard and cross compatible
>> methods?
>
> What you want can't be done with a regex. You need a tool that uses
> regexes to drive substitutions like sed, AWK, Perl, Python, PHP, ruby...
>
>> It's for files containing wiki markup segments as follows:
>>
>> [[File:Some File Name 0123.jpg|800px]]
>>
>> Or maybe:
>>
>> [[File:Some other file.jpg|250px]]
>>
>> Or maybe:
>>
>> [[File:Another file.jpg |600px|thumb]]
>>
>> etc.
>>
>> The only certainty to identify the relevant parts are the start of
>> "[[File:" followed by characters and/or numbers making up a file names
>> (No UTF-8) and ending in some suffix, such as .jpg JPEG, .Jpeg etc. .PNG,
>> .gif, followed by a "|" pipe or closing "]]" brackets
>
> Is that really the only certainty? If so, it's a hard problem. Can the
> file name contain | or ]] or newlines? I suspect not as "characters
> and/or numbers" is an odd thing to say. I think you mean [a-zA-Z0-9 ].

The filename itself never contains | or ]] in this case. The odd new line
could be part of the complete string although it's unlikely and never in the filename part.

>> The regex needs to grab the filename portion, eg. "Another file.jpg",
>> keep it in a variable and replace any spaces with underscore(s) so the
>> new variable becomes "Another_file.jpg"
>
> Regexes can't do that, but lots of tools that use them can. Do you care
> what tool is used?

Yes, I care which tool is used in the sense that it works.

>> Thereafter, within the existing markup, for example:
>>
>> [[File:Another file.jpg |600px|thumb]]
>>
>> Add the following markup after the first pipe:
>>
>> link=https://example.com/display.pl?Another_file.jpg|
>>
>> So the final markup becomes:
>> [[File:Another file.jpg |
>> link=https://example.com/display.pl?Another_file.jpg|600px|thumb]]
>>
>> The spaces in the original "File: ..." name parts can remain as it's
>> valid but the underscores need to exist in link=... strings.
>>
>> There may be instances where "|link=" occurrences already exits within
>> the opening of a "[[File:" and before its closing "]]" brackets. The
>> regex should avoid operating on any such instances so the procedure can
>> be run without conflict of past replacements.
>
> FYI: you want the program to be "idempotent".

Thank you for that word :-)

>> Many thanks for any example code snippets and ideas.
>
> It's not hard, but then it's not very much fun either, so you may have
> to pay someone or learn how to do it yourself.

And for the advice.

Tuxedo


From Ben Bacarisse@21:1/5 to Tuxedo on Mon Jul 3 13:56:04 2023

Tuxedo writes:

> Ben Bacarisse wrote:
>
>> Tuxedo writes:
>>
>>> Can anyone assist with a regex using fairly standard and cross compatible
>>> methods?
>>
>> What you want can't be done with a regex. You need a tool that uses
>> regexes to drive substitutions like sed, AWK, Perl, Python, PHP, ruby...
>>
>>> It's for files containing wiki markup segments as follows:
>>>
>>> [[File:Some File Name 0123.jpg|800px]]
>>>
>>> Or maybe:
>>>
>>> [[File:Some other file.jpg|250px]]
>>>
>>> Or maybe:
>>>
>>> [[File:Another file.jpg |600px|thumb]]
>>>
>>> etc.
>>>
>>> The only certainty to identify the relevant parts are the start of
>>> "[[File:" followed by characters and/or numbers making up a file names
>>> (No UTF-8) and ending in some suffix, such as .jpg JPEG, .Jpeg etc. .PNG,
>>> .gif, followed by a "|" pipe or closing "]]" brackets
>>
>> Is that really the only certainty? If so, it's a hard problem. Can the
>> file name contain | or ]] or newlines? I suspect not as "characters
>> and/or numbers" is an odd thing to say. I think you mean [a-zA-Z0-9 ].
>
> The filename itself never contains | or ]] in this case. The odd new line could be part of the complete string although it's unlikely and never in the filename part.

That's significant as some tools (AWK and sed for example) are oriented
towards processing lines, though AWK really processes records and it has
ways to re-define what a record is so as to help in situations like
this. Even so, using AWK for multi-line data like this can get fiddly.
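
The line-orientation issue is easy to demonstrate. The Python sketch below (Python is used only as an example tool; the pattern and function names are made up) counts "[[File:...]]" elements twice: once by applying the pattern to each line separately, as a line-oriented tool would by default, and once by applying it to the whole text. A negated character class such as [^\]] also matches newlines, so the whole-text version finds an element that is split across lines.

```python
import re

# Any "[[File:" ... "]]" element with no "]" in between.
TAG_RE = re.compile(r"\[\[File:[^\]]*\]\]")

def count_by_line(text):
    """Apply the pattern to each line on its own."""
    return sum(len(TAG_RE.findall(line)) for line in text.splitlines())

def count_whole(text):
    """Apply the pattern to the text as a single string."""
    return len(TAG_RE.findall(text))

sample = (
    "[[File:Some other file.jpg|250px]]\n"
    "[[File:Another file.jpg |600px\n"
    "|thumb]]\n"
)
print(count_by_line(sample), count_whole(sample))  # prints 1 2
```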

--
Ben.

