Jan 7, 2014

Regex like operators for DCG

Today I was trying to create a simple parser to count syllables in latin words with Prolog. I usually use DCGs in Prolog for parsing. Their semantic is very similar to BNF . I love DCGs, but sometimes the verbosity in some cases annoys me. Take the following example:

consonant -->
    "b"; "c"; "d"; "f"; "g"; "h"; "l"; "j"; "k"; "m";
    "n"; "p"; "q"; "r"; "s"; "t"; "v"; "x"; "z".
consonants -->
    [].
consonants -->
    consonant, consonants.

vowel -->
    "a"; "e"; "i"; "o"; "u".
vowels -->
    vowel.
vowels -->
    vowel, vowels.

syllable -->
    vowels.
syllable -->
    consonants, vowels.

syllables(0) -->
    [].
syllables(N) -->
    syllable, syllables(N_1),
    { N is N_1 + 1 }.

The vowels and consonant rules were created merely as helpers for the syllable predicate. That could be reduced if I had regex operators like +, * or ?. Although there are modules for using regex in Prolog ( swi-regex ), it is not suitable when using within in DCGs. So I wrote these regex like operators, with meta DCG predicates, for DCG (like EBNF operators):

% op statements let me use them without parenthesis
:- op(100, xf, *).
:- op(100, xf, +).
:- op(100, xf, ?).

*(_) -->
    [].
*(EXPR) -->
    EXPR, *(EXPR).

+(EXPR) -->
    EXPR.
+(EXPR) -->
    EXPR, +(EXPR).

?(EXPR) -->
    [].
?(EXPR) -->
    EXPR.

They allow me to modify the times a given rule will be matched. So, I can replace this:

consonants -->
    [].
consonants -->
    consonant, consonants.

vowels -->
    vowel.
vowels -->
    vowel, vowels.

syllable -->
    vowels.
syllable -->
    consonants, vowels.

with a simpler version without intermediate rules (using the operators definition through a library):

syllable -->
    *consonant, +vowel.

No comments:

Post a Comment