develooper Front page | perl.module-authors | Postings from March 2023

Spreadsheet::Table::Extract

From:
Bob Rogers
Date:
March 7, 2023 03:58
Subject:
Spreadsheet::Table::Extract
Message ID:
25606.46705.40508.900834@orion.rgrjr.com
   I am currently finishing up a hack to extract objects from a
spreadsheet, tentatively called Spreadsheet::Table::Extract.  Here's the
summary:

    =head1 SYNOPSIS

	# say we have a checkbook with five columns that we want to
	# turn into a set of Checkbook::Entry objects:
	my $slot_from_heading = {
		# column      => object slot name
		'number'      => 'check_no',
		'date'        => 'date_days',
		'description' => 'description',
		'descr.'	  => 'description',	# synonym
		'payment'     => 'payment_value',
		'deposit'     => 'deposit_value'
	};

	use Spreadsheet::Table::Extract;
	use Spreadsheet::Read;
	use Checkbook::Entry;	# for example

	my $book = ReadData('some-spreadsheet.xlsx');
	my $sheet1 = $book->[1];
	my @tables = Spreadsheet::Table::Extract->find_tables
	    ($sheet, $slot_from_heading,
	     validate_object => Checkbook::Entry->can('validate'));
	# assume we have only one table; if there are multiple heading
	# rows, there could be several.
	my @entries = $checkbook_table->extract($tables[0]);
	say "have ", scalar(@entries), " checkbook transactions.";

    =head1 DESCRIPTION

    C<Spreadsheet::Table::Extract> identifies subsets of spreadsheets
    and/or CSV files read with C<Spreadsheet::Read> that are conceptually
    equivalent to database tables.  Each such table is found by looking
    for its heading row(s) in the spreadsheet, and results in a
    C<Spreadsheet::Table::Extract> instance.  The instance can then
    extract the table into an arrayref of objects of a designated object
    class.  Multiple tables of different types can be combined side by
    side or above one another on the same sheet.

    The full process goes like this:

    =over

    =item 1.

    Call L</find_tables> to extract table definitions from a sheet,
    passing it a hashref that tells it how to identify the heading row.
    Do this once for each combination of sheet and table type.

    =item 2.

    Call L</make_disjoint> on all tables to adjust their boundaries for
    potential overlap.

    =item 3.

    Call L</extract> on each table to produce a list of objects of the
    desired application class.

    =back

(If you want to see it all, it's at
https://git.sr.ht/~rgrjr/Spreadsheet-Table-Extract .)

   The thing is, I'm not completely happy with the name.  It extracts
tables from spreadsheets, sure enough, but "extract" is more of a verb
than a noun, and the key method makes instances of itself that represent
table-like subsets of the spreadsheet.  Which are in fact extracts, but
it might be more accurate to make those Spreadsheet::Table instances.
But that seems overly grandiose for such a small module (it's 550 lines
all told).  Spreadsheet::ExtractedTable?  Suggestions?

					-- Bob Rogers
					   http://www.rgrjr.com/



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About