• Re: (Linux) Convert text to Excel (and back) - what are people's prefer

    From Kenny McCormack@21:1/5 to [email protected] on Thu May 19 22:02:15 2022
    In article <t66cbd$f1o$[email protected]>,
    Lew Pitcher <[email protected]> wrote:
    ...
    I can't speak to your step 3; I don't know of any way to preserve the >metadata (column width, highlighting, etc) when converting an XLS file to >CSV.

    Actually, step 3 is about converting from tab-delimited to XLS, but I get
    the your meanining. To do this in full generality, you'd need to preserve
    it when you went down to text so that you could put it back when you went
    back up to XLS. But what I had in mind was some sort of option collection (either on the command line or in a config file) that you'd feed to the tab-delimited-to-XLS program to cause it to insert that meta-data into the output (XLS) file.

    However, I can suggest a simplification of your steps 1 and 2: use
    (Actually, only step 1. You still need to do the actual processing of the
    data in step 2)

    LibreOffice/OpenOffice Calc to convert the XLS file to CSV. It's a
    simple, one-line commandline invocation:
    scalc --invisible --convert-to csv --outdir target/directory input.XLS

    Take a look at
    scalc --help
    for these and other options

    This is not bad. I've actually done this before - and you're right, using
    LO on both the input and output sides might not be such a bad idea.

    But it does seem like overkill. I had hoped for something more
    lightweight, such as the packages that I alluded to in the OP - which seem
    to be present for languages such as Perl and Tcl.

    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/ItsTough

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lew Pitcher@21:1/5 to Kenny McCormack on Thu May 19 21:20:45 2022
    On Thu, 19 May 2022 20:15:36 +0000, Kenny McCormack wrote:

    First of all, I know the standard way - the way I've been doing it for decades - which is to:
    1) From Excel, write it out as a text file (tab delimited).
    2) Work on it in Linux, using, e.g., AWK, creating a new
    tab-delimited file.
    3) Load the tab-delimited file back into Excel

    This, of course, works fine, but is tedious and not really
    automate-able; you still have lots of manual steps.

    What I'm looking for is something that can actually read/write Excel's
    native format - i.e., an XLS file. Note that, for the purposes of this discussion, we are only talking about XLS (the old, classic Excel
    format), not XLSX.

    I believe there are packages available for the more popular scripting languages, such as Perl, Tcl, maybe Python, but not for AWK, that do
    this. But I've never used any of them. Even though I'm doing my main processing in AWK, I would not have a problem with using something in
    one of these other languages, if someone can/would "spoon feed" me on
    how to do it.

    Finally, note that I've done this sort of thing in the past using Excel macros and/or COM automation, to do it (i.e., automate it) in Excel
    itself. This is a possibility, but is kind of messy. I'd prefer a
    straight command line way.

    Also, one of the issues that I'm trying to get around is that when you
    do step 3 in the above list, you then have to mess around a fair amount
    in Excel to re-size your columns, and restore other bits of
    meta-information that got lost in step 1. This again, would be nice to
    have be automated by the command line tool that I am seeking. As noted,
    I had previously done this using Excel macros.

    I can't speak to your step 3; I don't know of any way to preserve the
    metadata (column width, highlighting, etc) when converting an XLS file to
    CSV.

    However, I can suggest a simplification of your steps 1 and 2: use LibreOffice/OpenOffice Calc to convert the XLS file to CSV. It's a
    simple, one-line commandline invocation:
    scalc --invisible --convert-to csv --outdir target/directory input.XLS

    Take a look at
    scalc --help
    for these and other options

    HTH
    --
    Lew Pitcher
    "In Skills, We Trust"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli the Bearded@21:1/5 to Kenny McCormack on Fri May 20 00:11:59 2022
    In comp.unix.shell, Kenny McCormack <[email protected]> wrote:
    First of all, I know the standard way - the way I've been doing it for decades - which is to:
    1) From Excel, write it out as a text file (tab delimited).
    2) Work on it in Linux, using, e.g., AWK, creating a new tab-delimited file.
    3) Load the tab-delimited file back into Excel
    ...
    I believe there are packages available for the more popular scripting languages, such as Perl, Tcl, maybe Python, but not for AWK, that do
    this. But I've never used any of them. Even though I'm doing my main processing in AWK, I would not have a problem with using something in
    one of these other languages, if someone can/would "spoon feed" me on
    how to do it.

    The case of read/write one sheet in one xls file is covered in the
    sample code in the documentation for the Perl module. Here's my version
    of that sample code:

    #!/usr/bin/perl -w
    # Parse and dump xls files, from the Spreadsheet::ParseExcel documentation.
    # This version puts all the sheets in a single file.
    # 1 June 2001
    use strict;
    use Spreadsheet::ParseExcel;

    my $oExcel = new Spreadsheet::ParseExcel;
    my $xlsfile = shift;

    die "$0: usage 'parsexls foo.xls > foo.out'\n"
    unless (defined($xlsfile) and (-f $xlsfile));

    #1.1 Normal Excel97
    my $oBook = $oExcel->Parse($xlsfile);
    my($iR, $iC, $oWkS, $oWkC);

    print "FILE :", $oBook->{File} , "\n";
    print "COUNT :", $oBook->{SheetCount} , "\n";
    print "AUTHOR:", $oBook->{Author} , "\n";

    my $nl;

    for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) {
    $oWkS = $oBook->{Worksheet}[$iSheet];
    print "--------- SHEET:", $oWkS->{Name}, "\n";

    for(my $iR = $oWkS->{MinRow} ;
    defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {

    $nl = '';
    for(my $iC = $oWkS->{MinCol} ;
    defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++) {

    $oWkC = $oWkS->{Cells}[$iR][$iC];
    if ($oWkC) {
    print "\t" if $nl;
    print $oWkC->Value;
    $nl = "\n";
    } else {
    print "\t" if $nl;
    }

    } # for column in row
    print $nl;
    } # for row in sheet
    } # for sheet in book

    exit
    __END__


    And here's my tsvtoxls reverse tool:

    #!/usr/bin/perl -w
    # Turn one or more TSV files into an XLS file.
    # Each input file will be a sheet.
    #
    # Usage:
    # tsvtoxls TSVfile [ TSVfile ] output.xls
    # March 2011

    use strict;
    use Spreadsheet::WriteExcel;

    # WriteExcel objects
    use vars qw(
    $WB $WS
    );

    # Other vars
    use vars qw(
    @sheetfiles $outfile $thisfile $line @values $value
    );

    for $thisfile (@ARGV) {
    if( $thisfile =~ /[.]xls$/i) {
    if(!defined($outfile)) {
    $outfile = $thisfile;
    } else {
    die "$0: cannot have two output files: $outfile and $thisfile\n";
    }

    next;
    }

    if(! -f $thisfile) {
    die "$0: this file $thisfile is not a file\n";
    }

    push(@sheetfiles, $thisfile);
    }

    if($#sheetfiles < 0) {
    die "$0: no input files\n";
    }
    if(!defined($outfile)) {
    die "$0: no output file\n";
    }

    $WB = Spreadsheet::WriteExcel->new($outfile);

    for $thisfile (@sheetfiles) {
    if(!open(SHEET, "< $thisfile")) {
    warn "$0: cannot open $thisfile: $!, skipping\n";
    next;
    }
    $thisfile =~ s:.*/::; # drop directory
    $thisfile =~ s/[.]tsv$//; # drop suffix
    $thisfile =~ tr/_/ /; # underscore to space

    $WS = $WB->addworksheet($thisfile);

    while(defined($line = <SHEET>)) {
    chomp $line;
    @values = split(/\t/, $line);

    # row = line from file, start in column 0
    $WS->write_row($., 0, \@values);
    }
    close SHEET;
    }
    $WB->close;
    __END__

    Also, one of the issues that I'm trying to get around is that when you
    do step 3 in the above list, you then have to mess around a fair
    amount in Excel to re-size your columns, and restore other bits of meta-information that got lost in step 1. This again, would be nice
    to have be automated by the command line tool that I am seeking. As
    noted, I had previously done this using Excel macros.

    Those spreadsheet modules could probably read all the column, row, and
    cell formatting from one shhet and apply it to another. I haven't tried.
    I think I set column widths once, but I can't find my code for that now.

    Also note, there's an a whole separate module for xlxs, but interface
    is pretty similar as I recall.

    Elijah
    ------
    comp.lang.perl.misc will be good for further Perl questions

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From vgersh99@21:1/5 to Kenny McCormack on Mon Jun 27 07:54:26 2022
    On Thursday, May 19, 2022 at 4:15:42 PM UTC-4, Kenny McCormack wrote:
    First of all, I know the standard way - the way I've been doing it for decades - which is to:
    1) From Excel, write it out as a text file (tab delimited).
    2) Work on it in Linux, using, e.g., AWK, creating a new tab-delimited file. 3) Load the tab-delimited file back into Excel

    ...
    Maybe Ed Morton's SO post could help somewhat: https://stackoverflow.com/questions/38805123/how-do-i-use-awk-under-cygwin-to-print-fields-from-an-excel-spreadsheet

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)