perl.beginners | Postings from September 2021

Regex to detect natural language fragment

From: Julius Hamilton
Date: September 13, 2021 15:32
Subject: Regex to detect natural language fragment
Message ID: CAEsMKX176PuH9ZLbr3b1h747x4vgt8KCc-KfiZzpf+TWtPKcEA@mail.gmail.com

Hey,

I'm not sure if this is possible, and if it's not, I'll explore a better
way to do this.

I would like to write a script that determines whether a line of text is
(likely) a broken natural-language sentence, i.e., probably part of a
sentence whose start or end is missing, rather than a "complete" linguistic
unit. An example of the latter is a section header: it has no period at the
end and is not really a sentence, yet it is complete and unbroken.
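As a rough illustration of how far surface patterns alone can go, here is a minimal sketch that only inspects the edges of a line. Everything in it is an assumption made for illustration, not a tested model: the function name classify_line, the pattern names, the short list of "continuation" words and the 8-word cutoff for headers are all hypothetical choices.

```python
import re

# Hypothetical heuristics: the word list and MAX_HEADER_WORDS are assumptions.
# A line that starts with a lowercase letter or closing punctuation probably
# lost its beginning.
STARTS_MID_SENTENCE = re.compile(r"^\s*[a-z,;)\]]")
# Sentence-final punctuation, optionally followed by closing quotes/brackets.
ENDS_WITH_TERMINATOR = re.compile(r"[.!?][\"')\]]*\s*$")
# A trailing comma/semicolon/hyphen or function word suggests the line was cut off.
ENDS_WITH_CONTINUATION = re.compile(
    r"(?:[,;\-]|\b(?:a|an|the|and|or|but|of|to|in|with|that|which))\s*$",
    re.IGNORECASE,
)
MAX_HEADER_WORDS = 8  # assumed upper bound on header length


def classify_line(line: str) -> str:
    """Return 'empty', 'fragment', 'header', or 'sentence' for one line."""
    text = line.strip()
    if not text:
        return "empty"
    if STARTS_MID_SENTENCE.search(text) or ENDS_WITH_CONTINUATION.search(text):
        return "fragment"
    if ENDS_WITH_TERMINATOR.search(text):
        return "sentence"
    # No terminal punctuation: short lines look like headers,
    # long ones look like sentences cut off at the end.
    if len(text.split()) <= MAX_HEADER_WORDS:
        return "header"
    return "fragment"


for sample in [
    "Regex to detect natural language fragment",
    "which does not have a period at the end and is not really a sentence, yet",
    "I'm pretty sure in principle this will require some kind of syntax parsing.",
]:
    print(classify_line(sample), "|", sample)
```

Because it never looks inside the line, it will misjudge cases such as a header that ends in "of" or a sentence that begins with a lowercase brand name; that gap is exactly where some form of syntactic parsing would be needed.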

I'm pretty sure that, in principle, this will require some kind of syntactic
parsing. I think I read somewhere that regular expressions, for some
mathematical reason, cannot parse tree or nested structures such as HTML.
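The mathematical reason is that regular expressions in the formal sense have only finite memory, so they cannot track arbitrarily deep nesting. The sketch below, written against Python's standard `re` module (which has no recursion), shows the practical consequence: a pattern for balanced parentheses must spell out every nesting level by hand, while a simple counter handles any depth. The names DEPTH_2 and balanced are hypothetical. Note that Perl 5.10 and later support recursive patterns such as `(?R)`, which go beyond regular expressions in the formal sense.

```python
import re

# Balanced parentheses up to depth 2 only; each extra level of nesting
# would require hand-writing another layer of the pattern.
DEPTH_2 = re.compile(r"(?:[^()]|\((?:[^()]|\([^()]*\))*\))*")


def balanced(text: str) -> bool:
    """Check parentheses at any depth with a counter (a one-symbol stack)."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing paren with no matching opener
                return False
    return depth == 0


print(DEPTH_2.fullmatch("a(b(c)d)e") is not None)    # depth 2: matched
print(DEPTH_2.fullmatch("a(b(c(d))e)") is not None)  # depth 3: pattern fails
print(balanced("a(b(c(d))e)"))                       # counter handles depth 3
```

The counter is the simplest case of the stack that a real parser keeps while reading nested structure.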

Does anyone know what the next most widely used, standard tool is for
analyzing nested linguistic structures? Would that be an XML parser?

Thanks very much,
Julius



nntp.perl.org: Perl Programming lists via nntp and http.