Front page | perl.module-authors |
Postings from March 2023
Spreadsheet::Table::Extract
From:
Bob Rogers
Date:
March 7, 2023 03:58
Subject:
Spreadsheet::Table::Extract
Message ID:
25606.46705.40508.900834@orion.rgrjr.com
I am currently finishing up a hack to extract objects from a
spreadsheet, tentatively called Spreadsheet::Table::Extract. Here's the
summary:
=head1 SYNOPSIS
# say we have a checkbook with five columns that we want to
# turn into a set of Checkbook::Entry objects:
my $slot_from_heading = {
# column => object slot name
'number' => 'check_no',
'date' => 'date_days',
'description' => 'description',
'descr.' => 'description', # synonym
'payment' => 'payment_value',
'deposit' => 'deposit_value'
};
use Spreadsheet::Table::Extract;
use Spreadsheet::Read;
use Checkbook::Entry; # for example
my $book = ReadData('some-spreadsheet.xlsx');
my $sheet1 = $book->[1];
my @tables = Spreadsheet::Table::Extract->find_tables
($sheet, $slot_from_heading,
validate_object => Checkbook::Entry->can('validate'));
# assume we have only one table; if there are multiple heading
# rows, there could be several.
my @entries = $checkbook_table->extract($tables[0]);
say "have ", scalar(@entries), " checkbook transactions.";
=head1 DESCRIPTION
C<Spreadsheet::Table::Extract> identifies subsets of spreadsheets
and/or CSV files read with C<Spreadsheet::Read> that are conceptually
equivalent to database tables. Each such table is found by looking
for its heading row(s) in the spreadsheet, and results in a
C<Spreadsheet::Table::Extract> instance. The instance can then
extract the table into an arrayref of objects of a designated object
class. Multiple tables of different types can be combined side by
side or above one another on the same sheet.
The full process goes like this:
=over
=item 1.
Call L</find_tables> to extract table definitions from a sheet,
passing it a hashref that tells it how to identify the heading row.
Do this once for each combination of sheet and table type.
=item 2.
Call L</make_disjoint> on all tables to adjust their boundaries for
potential overlap.
=item 3.
Call L</extract> on each table to produce a list of objects of the
desired application class.
=back
(If you want to see it all, it's at
https://git.sr.ht/~rgrjr/Spreadsheet-Table-Extract .)
The thing is, I'm not completely happy with the name. It extracts
tables from spreadsheets, sure enough, but "extract" is more of a verb
than a noun, and the key method makes instances of itself that represent
table-like subsets of the spreadsheet. Which are in fact extracts, but
it might be more accurate to make those Spreadsheet::Table instances.
But that seems overly grandiose for such a small module (it's 550 lines
all told). Spreadsheet::ExtractedTable? Suggestions?
-- Bob Rogers
http://www.rgrjr.com/
-
Spreadsheet::Table::Extract
by Bob Rogers