Can anyone assist with a regex using fairly standard, cross-compatible methods?
It's for files containing wiki markup segments as follows:
[[File:Some File Name 0123.jpg|800px]]
Or maybe:
[[File:Some other file.jpg|250px]]
Or maybe:
[[File:Another file.jpg |600px|thumb]]
etc.
The only certainty for identifying the relevant parts is the opening "[[File:", followed by characters and/or numbers making up a file name (no UTF-8) and ending in some suffix, such as .jpg, .JPEG, .Jpeg etc., .PNG or .gif, followed by a "|" pipe or the closing "]]" brackets.
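For reference, here is a minimal Python sketch of a pattern for those markers. It assumes the name uses only ASCII letters, digits, spaces, '.', '_' and '-', and that the only extensions are .jpg/.jpeg/.png/.gif in any letter case. The name `FILE_RE` is just illustrative.

```python
import re

# Assumption: file names use only ASCII letters, digits, spaces, '.', '_' and '-'.
# Group 1 captures the name including its extension; spaces before the
# following '|' or ']]' are allowed but not captured.
FILE_RE = re.compile(
    r"\[\[File:([A-Za-z0-9 ._-]+?\.(?:jpe?g|png|gif))\s*(?=\||\]\])",
    re.IGNORECASE,  # accepts .jpg, .JPEG, .Jpeg, .PNG, .gif ...
)

samples = [
    "[[File:Some File Name 0123.jpg|800px]]",
    "[[File:Some other file.jpg|250px]]",
    "[[File:Another file.jpg |600px|thumb]]",
]
for s in samples:
    m = FILE_RE.search(s)
    print(m.group(1) if m else None)
```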
The regex needs to grab the filename portion, e.g. "Another file.jpg", keep it in a variable and replace any spaces with underscores, so the new variable becomes "Another_file.jpg".
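The "keep it in a variable and swap spaces for underscores" step happens outside the regex, in whatever language drives it. The sketch below is one possible Python version. The helper name `link_markup` is hypothetical, and it assumes no URL escaping beyond the underscores is wanted.

```python
BASE_URL = "https://example.com/display.pl?"  # taken from the example in the question


def link_markup(file_name):
    # One underscore per space: "Another file.jpg" -> "Another_file.jpg".
    # Assumption: names are plain ASCII, so no other URL escaping is done.
    return "link=" + BASE_URL + file_name.replace(" ", "_") + "|"


name = "Another file.jpg"  # e.g. the filename captured by a regex
print(link_markup(name))
# link=https://example.com/display.pl?Another_file.jpg|
```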
Thereafter, within the existing markup, for example:
[[File:Another file.jpg |600px|thumb]]
Add the following markup after the first pipe:
link=https://example.com/display.pl?Another_file.jpg|
So the final markup becomes:
[[File:Another file.jpg |link=https://example.com/display.pl?Another_file.jpg|600px|thumb]]
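Inserting text directly after the first pipe of a single [[File:...]] segment doesn't need a regex at all. A sketch using Python's `str.partition`, which splits at the first occurrence only, is below. The helper name is hypothetical. Leaving a segment that has no pipe unchanged is an assumption, since the question doesn't cover that case.

```python
def insert_after_first_pipe(markup, addition):
    # partition() splits at the first '|' only. If there is no '|',
    # sep is empty and the markup is returned unchanged (an assumption).
    head, sep, tail = markup.partition("|")
    if not sep:
        return markup
    return head + sep + addition + tail


print(insert_after_first_pipe(
    "[[File:Another file.jpg |600px|thumb]]",
    "link=https://example.com/display.pl?Another_file.jpg|",
))
# [[File:Another file.jpg |link=https://example.com/display.pl?Another_file.jpg|600px|thumb]]
```

This works on one segment at a time; finding the segments in a whole file still needs a pattern.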
The spaces in the original "File: ..." name parts can remain, as they are valid, but the underscores must appear in the link=... strings.
There may be instances where a "|link=" already exists between the opening "[[File:" and its closing "]]" brackets. The regex should skip any such instances, so the procedure can be rerun without conflicting with past replacements.
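One way to express the skip rule is to find each whole [[File:...]] segment first and then test it for an existing link parameter. A Python sketch follows. The names are hypothetical, and allowing spaces between '|' and "link=" is an assumption.

```python
import re

# Each [[File:...]] segment, non-greedy up to the first "]]" (DOTALL: may span a newline).
SEGMENT_RE = re.compile(r"\[\[File:.*?\]\]", re.DOTALL)
# An existing "|link=" inside a segment; the optional spaces are an assumption.
LINK_RE = re.compile(r"\|\s*link=")


def segments_needing_link(text):
    """Return the [[File:...]] segments that do not yet carry a link= parameter."""
    return [s for s in SEGMENT_RE.findall(text) if not LINK_RE.search(s)]


page = (
    "[[File:Some other file.jpg|250px]] and "
    "[[File:Another file.jpg |link=https://example.com/display.pl?Another_file.jpg|600px|thumb]]"
)
print(segments_needing_link(page))  # ['[[File:Some other file.jpg|250px]]']
```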
Many thanks for any example code snippets and ideas.
Tuxedo writes:
> Can anyone assist with a regex using fairly standard, cross-compatible
> methods?
What you want can't be done with a regex. You need a tool that uses
regexes to drive substitutions, such as sed, AWK, Perl, Python, PHP, Ruby...
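Here is roughly what that looks like in Python, one of the tools listed. This is a sketch, not a tested production script. The regex only locates each [[File:...]] segment, and a callback passed to `re.sub` builds the underscored name and inserts the link after the first pipe.

All names (`add_links`, `SEGMENT_RE`, `NAME_RE`, `LINK_RE`) are illustrative. The allowed name characters and extensions are assumptions. Segments that have no '|' after the name, or that already contain "|link=", are left untouched.

```python
import re
import sys

BASE_URL = "https://example.com/display.pl?"  # from the question

# A whole [[File:...]] segment, non-greedy up to the first "]]".
# DOTALL lets an occasional newline inside the segment still match.
SEGMENT_RE = re.compile(r"\[\[File:(.*?)\]\]", re.DOTALL)

# At the start of the segment: name + extension, optional spaces, first '|'.
# Assumption: names use only ASCII letters, digits, space, '.', '_', '-'.
NAME_RE = re.compile(r"([A-Za-z0-9 ._-]+?\.(?:jpe?g|png|gif))\s*\|", re.IGNORECASE)

# A link parameter that is already present (optional spaces are an assumption).
LINK_RE = re.compile(r"\|\s*link=")


def _add_link(m):
    """re.sub callback: rewrite one [[File:...]] segment, or return it unchanged."""
    inner = m.group(1)
    if LINK_RE.search(inner):   # already has |link=, so leave it alone
        return m.group(0)
    name = NAME_RE.match(inner)
    if name is None:            # no '|' after the name, or an unexpected name
        return m.group(0)
    target = name.group(1).replace(" ", "_")
    pos = name.end()            # index just after the first '|'
    return "[[File:" + inner[:pos] + "link=" + BASE_URL + target + "|" + inner[pos:] + "]]"


def add_links(text):
    return SEGMENT_RE.sub(_add_link, text)


if __name__ == "__main__":
    # Print the converted text of each file named on the command line.
    # surrogateescape passes through any bytes that aren't valid UTF-8;
    # newline="" keeps the original line endings.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="surrogateescape", newline="") as f:
            text = f.read()
        sys.stdout.buffer.write(add_links(text).encode("utf-8", "surrogateescape"))
```

Saved as, say, add_links.py, it could be run as `python3 add_links.py page.txt > page.new`. Because segments that already contain "|link=" are skipped, running it over its own output changes nothing.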
> It's for files containing wiki markup segments as follows:
> [[File:Some File Name 0123.jpg|800px]]
> Or maybe:
> [[File:Some other file.jpg|250px]]
> Or maybe:
> [[File:Another file.jpg |600px|thumb]]
> etc.
> The only certainty for identifying the relevant parts is the opening
> "[[File:", followed by characters and/or numbers making up a file name
> (no UTF-8) and ending in some suffix, such as .jpg, .JPEG, .Jpeg etc.,
> .PNG or .gif, followed by a "|" pipe or the closing "]]" brackets.
Is that really the only certainty? If so, it's a hard problem. Can the
file name contain | or ]] or newlines? I suspect not, as "characters
and/or numbers" is an odd thing to say. I think you mean [a-zA-Z0-9 ].
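If that guess is right, a candidate name can be checked with a full match. In this Python sketch, `[a-zA-Z0-9 ]` is Ben's suggested class, while the extension list and the helper name are assumptions. Since '|', ']' and newline are all outside the class, names containing them are rejected outright.

```python
import re

# Ben's suggested character set for the base name, plus an assumed
# case-insensitive extension list.
PLAIN_NAME_RE = re.compile(r"[a-zA-Z0-9 ]+\.(?:jpe?g|png|gif)", re.IGNORECASE)


def is_plain_name(name):
    # fullmatch: the whole string must fit the pattern, not just a prefix.
    return PLAIN_NAME_RE.fullmatch(name) is not None


for n in ["Another file.jpg", "Some File Name 0123.JPEG", "bad|name.jpg", "two\nlines.gif"]:
    print(repr(n), is_plain_name(n))
```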
> The regex needs to grab the filename portion, e.g. "Another file.jpg",
> keep it in a variable and replace any spaces with underscores, so the
> new variable becomes "Another_file.jpg".
Regexes can't do that, but lots of tools that use them can. Do you care
what tool is used?
> Thereafter, within the existing markup, for example:
> [[File:Another file.jpg |600px|thumb]]
> Add the following markup after the first pipe:
> link=https://example.com/display.pl?Another_file.jpg|
> So the final markup becomes:
> [[File:Another file.jpg |link=https://example.com/display.pl?Another_file.jpg|600px|thumb]]
> The spaces in the original "File: ..." name parts can remain, as they
> are valid, but the underscores must appear in the link=... strings.
> There may be instances where a "|link=" already exists between the
> opening "[[File:" and its closing "]]" brackets. The regex should skip
> any such instances, so the procedure can be rerun without conflicting
> with past replacements.
FYI: you want the program to be "idempotent".
> Many thanks for any example code snippets and ideas.
It's not hard, but then it's not very much fun either, so you may have
to pay someone or learn how to do it yourself.
Ben Bacarisse wrote:
> Tuxedo writes:
>> Can anyone assist with a regex using fairly standard, cross-compatible
>> methods?
> What you want can't be done with a regex. You need a tool that uses
> regexes to drive substitutions, such as sed, AWK, Perl, Python, PHP, Ruby...
>> It's for files containing wiki markup segments as follows:
>> [[File:Some File Name 0123.jpg|800px]]
>> Or maybe:
>> [[File:Some other file.jpg|250px]]
>> Or maybe:
>> [[File:Another file.jpg |600px|thumb]]
>> etc.
>> The only certainty for identifying the relevant parts is the opening
>> "[[File:", followed by characters and/or numbers making up a file name
>> (no UTF-8) and ending in some suffix, such as .jpg, .JPEG, .Jpeg etc.,
>> .PNG or .gif, followed by a "|" pipe or the closing "]]" brackets.
> Is that really the only certainty? If so, it's a hard problem. Can the
> file name contain | or ]] or newlines? I suspect not, as "characters
> and/or numbers" is an odd thing to say. I think you mean [a-zA-Z0-9 ].
The filename itself never contains | or ]] in this case. The odd newline could be part of the complete [[File:...]] string, although that's unlikely, and it never occurs in the filename part.
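That matters for the segment pattern. In Python's `re` (as in most regex engines), '.' does not match a newline unless the DOTALL flag is set. A pattern using `.*?` to reach the closing "]]" would therefore silently miss a segment that wraps. A short illustration follows, using a made-up sample string. The filename pattern itself can stay newline-free.

```python
import re

# Made-up sample: a segment with a newline after the second pipe.
text = "[[File:Another file.jpg |600px|\nthumb]]"

# Without DOTALL, '.' stops at the newline, so the segment is not found.
print(re.findall(r"\[\[File:.*?\]\]", text))             # []
# With DOTALL, '.' also matches '\n' and the whole segment is found.
print(re.findall(r"\[\[File:.*?\]\]", text, re.DOTALL))  # ['[[File:Another file.jpg |600px|\nthumb]]']
```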