Re: RFC 231 (v1) Data: Multi-dimensional arrays/hashes and slices

perl.perl6.language.data | Postings from September 2000

From: Jeremy Howard
Date: September 16, 2000 17:07
Subject: Re: RFC 231 (v1) Data: Multi-dimensional arrays/hashes and slices
Message ID: 000a01c0203b$3afc81c0$0100a8c0@jeremy

Ilya Zakharevich wrote:
> On Sat, Sep 16, 2000 at 07:15:34PM +1100, Jeremy Howard wrote:
> > Why is it important for overloaded objects to be used as array indices?
>
> Overloaded objects should behave the same way as non-objects.
>
> > Why does RFC 204 rule that out? RFC 204 simply specifies that a list
> > reference as an index provides multidimensional access:
> >
> >   $a[ [1,1] ] == $a[1][1];
>
> I repeat: what does
>
>     $a[ $ind ]
>
> do if $ind is a (blessed) reference to array (1,1), but behaves as
> if it were 11 (due to overloading)?
>
How $ind is implemented (i.e. the actual structure that is blessed) does not
matter. What matters is what interface its class provides. If it overloads
operators such that dereferencing it does not provide an array, then it
shouldn't be expected to work as a multidimensional array index. If it
provides operators that give it the same interface as a list ref, then it
should work everywhere a list ref does.
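A minimal sketch of this interface-based rule, written in Python only because the proposed Perl 6 syntax cannot be run. `md_get` and `Point` are hypothetical names, and "behaves like a list ref" is assumed to mean "implements Python's sequence protocol". An index that behaves like a list ref is followed as a path of subscripts, so `md_get(a, [1, 1])` equals `a[1][1]`; any other index must behave like a plain integer.

```python
import operator
from collections.abc import Sequence


def md_get(array, index):
    """Fetch an element using either a scalar index or a list-like path.

    The decision depends only on the interface the index object offers,
    not on how that object is implemented internally.
    """
    if isinstance(index, Sequence):
        # List-ref-like index: [1, 1] means array[1][1].
        value = array
        for i in index:
            value = value[i]
        return value
    # Otherwise the index must act like an integer (operator.index enforces this).
    return array[operator.index(index)]


class Point(Sequence):
    """Hypothetical user class: stores two coordinates privately but
    exposes the same interface as a list, so it works as a path index."""

    def __init__(self, row, col):
        self._coords = (row, col)

    def __getitem__(self, i):
        return self._coords[i]

    def __len__(self):
        return 2


grid = [[3, 4, 5], [6, 7, 8]]
print(md_get(grid, Point(1, 1)))  # prints 7
```

Because the sequence check runs first, an object offering both interfaces (Ilya's reference to (1,1) that numifies to 11) would be treated as a path here. That ordering is an assumption of the sketch, not something RFC 204 specifies.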

> > RFC 81 expands on the existing operator '..' in a list context to allow
> > more generic list generation. It is particularly useful to generate lists
> > to act as array slices:
> >
> >   @a[ 1..5 : 3] == @a[1,3,5];
> >
> > This would seem to conflict with the meaning of '..' outlined in RFC 231.
>
> Sorry, I see no conflict.  (Assuming that ternary '..' is allowed, the
> token tie::multi::range() would be followed by 3 numbers, not 2.)
>
> These calls will result in
>
>   tied(@a)->FETCH_RANGE(tie::multi::range(), 1, 5, 3)
>   tied(@a)->FETCH_RANGE(1, 3, 5)
>
> If FETCH_RANGE uses tie::multi::inline() to preprocess the keys, this
> *by definition* will result in the same array of keys.  If not, it
> is the responsibility of FETCH_RANGE to ensure the equivalence.
>
> And $a[ 1..5e6 ] would not need to create 5e6 Perl objects the only
> purpose of which is to inform the range extractor that it needs to
> create an object representing the slice.
>
From RFC 81:

<quote>
When a lazy list is passed to a function it is not evaluated. The function
can then access only the elements it needs, which are calculated as
required. Furthermore, the arguments that generated the list are available
as attributes of the list, and can therefore be used directly without
actually accessing the list.
</quote>

It is not necessary to create 5e6 objects.
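The RFC 81 behaviour quoted above can be sketched in Python as a class that stores only its generating arguments and computes each element when it is requested. `LazyRange` is a hypothetical name; the sketch assumes inclusive bounds like Perl's `..`, a positive step, and that the third argument is a step size.

```python
class LazyRange:
    """A range whose elements are computed on demand.

    start, stop and step remain available as attributes, so a consumer
    (for example a slicing routine) can use them directly without
    walking the list. Bounds are inclusive, like Perl's '..'.
    """

    def __init__(self, start, stop, step=1):
        if step <= 0:
            raise ValueError("this sketch only supports a positive step")
        self.start, self.stop, self.step = start, stop, step

    def __len__(self):
        if self.stop < self.start:
            return 0
        return (self.stop - self.start) // self.step + 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Nothing is stored per element: the value is computed here.
        return self.start + i * self.step


big = LazyRange(1, 5_000_000)   # models 1..5e6; holds three integers
odds = LazyRange(1, 9, 2)       # models a stepped range 1, 3, 5, 7, 9
```

However long the range, `big` holds only `start`, `stop` and `step`, and a slicing routine can read those attributes instead of iterating.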

Furthermore, RFC 81 proposes syntax beyond just ($start..$stop: $step).
Implementing it using tie::multi::range() followed by 3 numbers would not be
enough. Anyway, we're defining a language interface here, not an
implementation, so we don't really need to nail this down immediately.

> > When we first discussed ';' on the list, we looked at making it special
> > in an index only. But the more generic approach of making it a cartesian
> > product operator seems cleaner--it avoids 'special' meanings in favour
> > of providing a generic operator.
>
> No, it is not a generic operator.  Its behavior depends on whether it
> is used *inside parens*, or not.  Additionally, the behaviour of
> cartesian product makes very little sense: if you did not want it 3
> times, you should not insert it into the language.
>
I'm not wedded to allowing ';' outside of a list index. However, it does
lead to both consistency and convenience with how list slicing is done in
Perl 5:

  # Perl 5 behaviour
  @indices = (1,3);
  @list = (3,4,5,6);
  @list[@indices] = (1,2);   # (3,1,5,2)

  # Multidim extension
  @2d_indices = ([0,0],[1,1]);
  @2d_arr = ([3,4,5],[6,7,8]);
  @2d_arr[@2d_indices] = (1,2);   # ([1,4,5],[6,2,8])

  # Slice syntax extension
  @2d_slice = (0..1 ; 0..1);       # ([0,0],[0,1],[1,0],[1,1])
  @2d_arr = ([3,4,5],[6,7,8]);
  @2d_arr[@2d_slice] = ([0,1],[0,1]);   # ([0,1,5],[0,1,8])

When ';' is used to build a list index that is consumed immediately and then
thrown away, the implementation clearly should not create an actual list of
lists, for efficiency reasons. I don't see why this case can't be dealt with
appropriately.
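The two multidimensional examples above can be sketched in Python as neutral notation. Here `itertools.product` plays the role of the proposed `;` operator: it yields index tuples one at a time, so an index that is consumed immediately never exists as a list of lists. `assign_slice` is a hypothetical helper. Pairing the right-hand side `([0,1],[0,1])` with positions by flattening it row by row into `0, 1, 0, 1` is also an assumption.

```python
from itertools import product


def assign_slice(array, indices, values):
    """Assign values, in order, to the nested positions named by indices.

    indices may be any iterable of index tuples, including a lazy one
    such as the iterator returned by itertools.product.
    """
    for path, value in zip(indices, values):
        target = array
        for i in path[:-1]:          # walk down to the innermost row
            target = target[i]
        target[path[-1]] = value


# Multidim extension: assign (1,2) to positions [0,0] and [1,1]
arr = [[3, 4, 5], [6, 7, 8]]
assign_slice(arr, [(0, 0), (1, 1)], [1, 2])
# arr is now [[1, 4, 5], [6, 2, 8]]

# Slice syntax extension: (0..1 ; 0..1) modelled lazily with product();
# range(0, 2) stands for the inclusive Perl range 0..1.
arr = [[3, 4, 5], [6, 7, 8]]
assign_slice(arr, product(range(0, 2), range(0, 2)), [0, 1, 0, 1])
# arr is now [[0, 1, 5], [0, 1, 8]]
```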

> Maybe.  But it is not defined in the corresponding RFC nevertheless.
> At least: all I could deduce was that the following constructs are
> made synonymous:
>
>   @a = ($a .. $b);
>   tie @a, Array::Range, $a, $b;
>
> No other usage of .. is covered.
>
RFC 81 defines 4 uses of C<..>. It does not propose a specific
implementation in terms of C<tie>, or anything else--it simply defines a
language interface.

> *You do not want to create new values unnecessarily*.  This is too
> slow.  Quick operations should reuse already available values
> instead.  See how scratchpads work...
>
Agreed. RFC 81 proposes that generated lists be memoized, and that new
values are only created when required.
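The memoization described here can be sketched as a list that runs its generator at most once per position. `MemoList` and the generator passed to it are hypothetical. This sketch keeps every computed value and never evicts anything.

```python
class MemoList:
    """A lazily generated list whose elements are computed at most once.

    gen(i) produces element i; results are cached so repeated accesses
    reuse the existing value instead of creating a new one.
    """

    def __init__(self, gen):
        self._gen = gen
        self._cache = {}
        self.calls = 0  # how many times gen has actually run

    def __getitem__(self, i):
        if i not in self._cache:
            self.calls += 1
            self._cache[i] = self._gen(i)
        return self._cache[i]


squares = MemoList(lambda i: i * i)
squares[3]
squares[3]   # served from the cache; gen does not run again
squares[4]
```

After these three accesses the generator has run only twice, once for index 3 and once for index 4.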

> Even if it is creation of a "streamlined" array, creation will still
> take much more time than operation dispatch - which is in turn
> painfully slow.
>
We should optimise special cases once we know which ones are causing problems.
Perl 5 may or may not provide useful experience here--the operation dispatch
approach in Perl 6 may be quite different, given how the -internals
discussions are progressing.

> > RFC 204: Isn't it fairly intuitive that:
> >
> >   $a[ [1,1] ] == $a[1][1];
>
> It may be - for people who do not understand overloading.
>
I find it intuitive. And I understand overloading. For people who assume the
limitations of Perl 5 overloading, the syntax may seem fundamentally
ambiguous. Overloading allows the interface to a class to be defined in a
flexible way. This interface is independent of the implementation of the
class (for instance, what structure is blessed). An object that provides the
same interface as a list ref should be usable anywhere a list ref is. An int
and a list ref have different interfaces, and are therefore not ambiguous in
a list index.

> > The index in $a[1..100;1..100] should be generated lazily.
>
> This is *exactly* what my proposal is doing.  The difference is that
> it defines what "lazily" means.
>
Except that your proposal changes the language interface. In particular, it
doesn't allow the creation of contiguous slices, AFAICS. @a[1..100;1..100]
should refer to the whole box bounded by (1,1) and (100,100).
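The "whole box" reading of `@a[1..100;1..100]` can be sketched as a Python view that maps box coordinates back to the underlying nested list only when an element is read, so the 100 x 100 index set is never built. `BoxView` is a hypothetical name; Python `range` objects stand in for the inclusive Perl ranges.

```python
class BoxView:
    """Read-only view of the rectangle rows x cols of a nested list.

    view[i, j] is array[rows[i]][cols[j]]; nothing is copied until
    to_lists() is called.
    """

    def __init__(self, array, rows, cols):
        self.array, self.rows, self.cols = array, rows, cols

    @property
    def shape(self):
        return (len(self.rows), len(self.cols))

    def __getitem__(self, ij):
        i, j = ij
        return self.array[self.rows[i]][self.cols[j]]

    def to_lists(self):
        return [[self.array[r][c] for c in self.cols] for r in self.rows]


# 101 x 101 grid so that indices 1..100 are valid; element (r, c) is 1000*r + c.
grid = [[1000 * r + c for c in range(101)] for r in range(101)]
box = BoxView(grid, range(1, 101), range(1, 101))   # models @a[1..100;1..100]
```

A pairwise reading, `zip(range(1, 101), range(1, 101))`, would select only the 100 diagonal elements; the box view covers all 10,000 positions between (1,1) and (100,100).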

> But keep in mind that my proposal *does not contradict* your
> definition of ';'.  It just provides the *same* semantic inside
> hash/array indices, and shows how to implement it without any
> extravaganza.
>
If we can really get the same interface, with a simpler proposal, I'd be
thrilled. I don't think we're there yet.

> > I think the level of consensus achieved with the syntax proposed in RFCs
> > 81, 204, and 205 speaks volumes.
>
> I do not understand what "level of consensus" has to do with language
> design...
>
It's very important. It shows that a particular syntax is intuitive enough
that it is understood by people with a wide range of backgrounds. Intuitive
syntax is an important language design goal.

> As I said: Using array references to get multidimensional access *at
> least needs some work*.  Having "lazy lists" undefined does not help
> either.  What my RFC does: it is a simple, completely-defined
> alternative which covers practically the same range of problems,
> without requiring anything fancy.
>
It certainly needs some work. We're getting as close as we can in the
limited time available. The definition of lazy lists is still sloppy,
although RFC 81 defines them a little more rigorously than you suggest. I
think we can firm this up after Oct 15--their behaviour is well enough
understood from other languages that defining their interface is (hopefully)
an adequate first step.

RFC 231 does not (yet) effectively cover the same range of problems that the
array RFCs do. We need multidimensional slicing (not just multiple
indexing), flexible list generation, multiple levels of indirection, and
fast and compact reshaping. Multidimensional syntax should also be a direct
extension of everything that can be achieved with 1-d syntax.
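One common way to get fast, compact reshaping (it is, for instance, how NumPy handles contiguous arrays) is to keep the elements in a single flat buffer and describe the shape with strides, so that a reshape only recomputes the strides. A minimal Python sketch follows; `FlatArray` is a hypothetical name and the sketch does no bounds checking.

```python
class FlatArray:
    """Row-major n-d array stored in one flat list.

    reshape() returns a new FlatArray that shares the same storage list,
    so it costs O(number of dimensions), not O(number of elements).
    """

    def __init__(self, data, shape):
        size = 1
        for n in shape:
            size *= n
        if size != len(data):
            raise ValueError("shape does not match number of elements")
        self.data, self.shape = data, tuple(shape)
        # Row-major strides: the last axis varies fastest.
        strides, step = [], 1
        for n in reversed(self.shape):
            strides.append(step)
            step *= n
        self.strides = tuple(reversed(strides))

    def __getitem__(self, index):
        # No bounds checking in this sketch.
        offset = sum(i * s for i, s in zip(index, self.strides))
        return self.data[offset]

    def reshape(self, *shape):
        return FlatArray(self.data, shape)  # shares self.data; no copy


a = FlatArray(list(range(6)), (2, 3))   # [[0,1,2],[3,4,5]]
b = a.reshape(3, 2)                     # [[0,1],[2,3],[4,5]], same storage
```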
