Re: Proposal for \v and \V, the small- and large- cut regex operators.

From: simon
Date: August 8, 2000 19:05
Subject: Re: Proposal for \v and \V, the small- and large- cut regex operators.
Message ID: slrn8p1f32.beq.simon@justanother.perlhacker.org

Jeffrey Friedl (lists.p5p):
>I've been reading a lot of the p5p archives lately (March is particularly
>interesting :-), but have not yet seen any pro-/anti- "backtracking"
>discussion. Is there one that I can read and learn from? In particular, if
>the UPV ("users point of view") can be changed to not include backtracking,
>I would have expected PP3 to not talk about backtracking so explicitly...

I *think* I understand Ilya's point, and I'll *try* to get it across in
other terms. Ilya, my apologies if I misrepresent you - please treat
this as purely my opinion.

Perl's "regular" expressions are growing less and less regular. Yes,
it still makes sense to consider them in terms of NFAs and backtracking,
but you have to wonder whether this is the most appropriate model.

From a user's point of view, I find the "Combining Pieces Model" a much,
much, much more straightforward explanation. Let's take one of the
examples in perlre:

           $_ =  "The food is under the bar in the barn.";
           if ( /foo(.*)bar/ ) {
               print "got <$1>\n";
           }

Now, the backtracking explanation says we find "foo", then grab the rest
of the string, then back up steadily until we reach a "bar". The
combining pieces model says: we have three parts. We match "foo", then
as much of anything as possible, provided it is followed by "bar".
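
To make the two readings concrete, here is a minimal sketch that runs the
perlre example. The variable names ($text, $most, $least) are mine, and the
non-greedy variant is added only for contrast; it is not part of the perlre
example. Both models predict the same captures: the first pattern takes as
much as possible that is still followed by "bar", the second as little as
possible.

```perl
use strict;
use warnings;

my $text = "The food is under the bar in the barn.";

# Piece 1: "foo". Piece 2: as much as possible. Piece 3: "bar".
# In list context a match returns its captures, so $most gets $1.
my ($most) = $text =~ /foo(.*)bar/;

# Same three pieces, but piece 2 is now "as little as possible".
my ($least) = $text =~ /foo(.*?)bar/;

print "most:  <$most>\n";   # the "bar" used here is the one inside "barn"
print "least: <$least>\n";  # the "bar" used here is the standalone word
```

Either way, the writer only had to state the three pieces and say how
hungry the middle one is.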

You'll note that while the backtracking explanation gives us a clear
picture of how the regular expression is executed by Perl, the combining
pieces model is far more useful when it comes to actually *writing*
regular expressions. When you're thinking about how *you*, as a human
being, would look for the match in the string, you don't think about
going all the way to the end of the string and then backing up. You look
for three distinct chunks - "foo", lots of stuff, "bar" - and so when
you're programming, you state three distinct chunks. Easy. 

So a backtracking explanation doesn't really help you construct a
regular expression yourself, in the same way that the combining pieces
model doesn't tell you how the regular expression is executed.

Now comes the kicker: the user doesn't necessarily care how the
expression is executed. The user isn't writing regular expression
engines, but *should* be writing regular expressions. So it's better for
us to explain everything in terms of the model which makes it easier to
construct regular expressions. The backtracking explanation is simply
not from a user's point of view at all - the user doesn't have to care
how it works, so long as it does work. With the combining pieces model,
however, the user just states the chunks that she wants Perl to match,
one after the other.

Hence, if we can't explain \v and \V in terms of a "chunk" in the match,
a piece in the "combining pieces" model - that's a great stumbling block
for the user. And if the user has to understand *how* Perl's regular
expression implementation goes about its business, then we're in deep
trouble. That would be like having to understand the argument stack
before using built-in functions.

So see if you can explain \v and \V in terms of a chunk or an assertion
as part of the combining pieces model. If it only makes sense in the
context of backtracking and NFAs, it's too low level. 

I'm a little surprised and confused that PP3 talks about backtracking as
much as you say - I haven't seen it yet, but is that in the context of
how to construct REs, or how Perl evaluates them?

-- 
"Even had to open up the case and gaze upon the hallowed peace that 
graced the helpdesk that day." -- Megahal (trained on asr), 1998-11-06
