Module srackham.pcre2

Version: 0.2.1
License: MIT

Overview

NOTE: This is an alpha release; the API is likely to change before the 1.0 release.

A V library module for processing Perl Compatible Regular Expressions (PCRE) using the PCRE2 library.

  • The pcre2 module is a wrapper for the PCRE2 8-bit runtime library.
  • Regex find_* methods search a subject string for regular expression matches.
  • Regex replace_* methods return a string in which matches in the subject string are replaced by a replacement string or the result of a replacement function.
  • Regex *_all_* methods process all matches; *_one_* methods process the first match.
  • The Regex replace_*_extended methods support the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).
  • Currently there are no extraction methods for named subpatterns.
  • The pcre module (which uses the older PCRE library) was the inspiration and starting point for this project; the Go regexp package also influenced the project.

Documentation

Examples

import srackham.pcre2

fn main() {
    // Match words starting with `d` or `n`.
    r := pcre2.must_compile(r'\b([dn].*?)\b')

    subject := 'Lorem nisi dis diam a cras placerat natoque'

    // Extract array of all matched strings.
    a := r.find_all(subject)
    println(a) // ['nisi', 'dis', 'diam', 'natoque']

    // Quote matched words.
    s1 := r.replace_all(subject, '"$1"')
    println(s1) // 'Lorem "nisi" "dis" "diam" a cras placerat "natoque"'

    // Replace all matched strings with upper case.
    s2 := r.replace_all_fn(subject, fn (m string) string {
        return m.to_upper()
    })
    println(s2) // 'Lorem NISI DIS DIAM a cras placerat NATOQUE'

    // Replace all matched strings with upper case (PCRE2 extended replacement syntax).
    s3 := r.replace_all_extended(subject, r'\U$1')
    println(s3) // 'Lorem NISI DIS DIAM a cras placerat NATOQUE'
}

For more examples, see the examples directory and the module tests.

Dependencies

Install the PCRE2 library:

Arch Linux and Manjaro: pacman -S pcre2

Debian and Ubuntu: apt install libpcre2-dev

Fedora: yum install pcre2-devel

macOS: brew install pcre2

Windows †: pacman.exe -S mingw-w64-x86_64-pcre2

† Uses the MSYS2 package management tools.

Installation

v install srackham.pcre2

Test the installation by running:

v test $HOME/.vmodules/srackham/pcre2

Example installation and test workflows for Ubuntu, macOS and Windows can be found in the GitHub Actions workflow file.

Performance

Complex patterns can cause PCRE2 resource exhaustion. The find_* library functions respond to such errors by raising a panic; the solution is to simplify the offending pattern. Unlike, for example, the Go regexp package, PCRE2 does not guarantee linear-time matching, so pathological patterns can be slow even when they do not trigger a panic. See the PCRE2 pcre2perform man page.

Functions

#fn compile

fn compile(pattern string) !Regex

compile parses a regular expression pattern and returns the corresponding Regex struct.

If the pattern fails to parse, an error is returned.
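The error path can be handled with an `or` block. The sketch below is illustrative only: `try_compile` is a hypothetical helper, not part of the module, and the text of the error message depends on PCRE2.

```v
import srackham.pcre2

// try_compile is a hypothetical helper that reports whether a pattern compiles.
fn try_compile(pattern string) string {
	r := pcre2.compile(pattern) or { return 'invalid: ${err.msg()}' }
	return 'ok, ${r.subpattern_count} capturing subpattern(s)'
}

fn main() {
	// A valid pattern with two capturing subpatterns.
	println(try_compile(r'(\d+)-(\d+)'))
	// An unbalanced parenthesis fails to parse, so the or block runs.
	println(try_compile(r'(\d+'))
}
```

Use must_compile instead when an invalid pattern is a programming error rather than user input.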

#fn escape_meta

fn escape_meta(s string) string

escape_meta returns a string that escapes all regular expression metacharacters inside the argument text. The returned string is a regular expression matching the literal text.

Example:

assert escape_meta(r'\.+*?()|[]{}^$') == r'\\\.\+\*\?\(\)\|\[\]\{\}\^\$'
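A typical use is searching for user-supplied text verbatim. The following is a minimal sketch; `contains_literal` is a hypothetical helper introduced for illustration.

```v
import srackham.pcre2

// contains_literal is a hypothetical helper that searches for text verbatim
// by escaping its metacharacters before compiling it.
fn contains_literal(subject string, text string) bool {
	r := pcre2.must_compile(pcre2.escape_meta(text))
	return r.is_match(subject)
}

fn main() {
	// Unescaped, `1+1=2` means "one or more 1s, then 1=2" and misses the text.
	println(pcre2.must_compile('1+1=2').is_match('is 1+1=2?')) // false
	// Escaped, the `+` is matched literally.
	println(contains_literal('is 1+1=2?', '1+1=2')) // true
}
```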

#fn must_compile

fn must_compile(pattern string) Regex

must_compile is like compile but panics if the regex pattern cannot be parsed.

Structs

#struct Regex

pub struct Regex {
pub:
	pattern          string
	subpattern_count int
mut:
	re &C.pcre2_code = unsafe { nil }
}

Regex contains the compiled regular expression.

  • pattern is the regular expression pattern.

  • subpattern_count is the number of capturing subpatterns.

  • re is a pointer to the compiled PCRE2 regular expression.

#fn (Regex) operator==

fn (r1 Regex) == (r2 Regex) bool

#fn (Regex) str

fn (r Regex) str() string

str returns a human-readable representation of a Regex.

#fn (Regex) is_nil

fn (r Regex) is_nil() bool

is_nil returns true if r has not been initialized with a compiled PCRE2 regular expression.

#fn (&Regex) free

fn (r &Regex) free()

free releases the memory allocated to the compiled PCRE2 regex.

If V's -autofree option is enabled, V's autofree engine calls free automatically when it disposes of the Regex struct.
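When the program is built without -autofree, free can be called explicitly once the Regex is no longer needed. This is a sketch under that assumption; `count_digits` is a hypothetical helper.

```v
import srackham.pcre2

// count_digits is a hypothetical helper that releases its Regex explicitly.
// Assumption: the program is built without -autofree, so free runs only once.
fn count_digits(s string) int {
	r := pcre2.must_compile(r'\d')
	n := r.find_all(s).len
	// The Regex is not used after this point.
	r.free()
	return n
}

fn main() {
	println(count_digits('a1b22')) // 3
}
```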

#fn (&Regex) is_match

fn (r &Regex) is_match(subject string) bool

is_match returns true if the subject string contains a match for the regular expression; otherwise it returns false.

#fn (&Regex) find_match

fn (r &Regex) find_match(subject string, pos int) ?MatchData

find_match searches the subject string starting at index pos and returns a MatchData struct.

  • If no match is found none is returned.

  • If an unexpected PCRE2 error occurs a panic is raised.

#fn (&Regex) find_n_matchdata

fn (r &Regex) find_n_matchdata(subject string, n int) []MatchData

find_n_matchdata returns an array of MatchData values from the first n matches in the subject string.

  • If n >= 0, then at most n MatchData values are returned; otherwise, one is returned for every match.

NOTE: This function does not propagate find_match errors because an unmatched subject occasionally triggers a match limit exceeded error; in these situations the subject is assumed to be unmatched.

#fn (&Regex) find_n

fn (r &Regex) find_n(subject string, n int) []string

find_n returns an array containing matched strings from the subject string.

  • If n >= 0, then at most n matches are returned; otherwise, all matches are returned.

Example:

assert must_compile(r'\d').find_n('1 abc 9 de 5 g', -1) == ['1', '9', '5']

#fn (&Regex) find_all

fn (r &Regex) find_all(subject string) []string

find_all returns an array containing all matched strings from the subject string.

Example:

assert must_compile(r'\d').find_all('1 abc 9 de 5 g') == ['1', '9', '5']

#fn (&Regex) find_one

fn (r &Regex) find_one(subject string) ?string

find_one returns the first matched string from the subject string.

If a match is not found none is returned.

Example:

assert must_compile(r'\d').find_one('1 abc 9 de 5 g') == '1'
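Because find_one returns an option, callers usually supply a fallback or branch on the result. A minimal sketch, with `first_number` as a hypothetical helper:

```v
import srackham.pcre2

// first_number is a hypothetical helper returning the first run of digits,
// or the fallback text when the subject contains none.
fn first_number(subject string, fallback string) string {
	r := pcre2.must_compile(r'\d+')
	return r.find_one(subject) or { fallback }
}

fn main() {
	println(first_number('abc 42 de 7', 'n/a')) // 42
	println(first_number('abc', 'n/a')) // n/a
}
```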

#fn (&Regex) find_n_index

fn (r &Regex) find_n_index(subject string, n int) [][]int

find_n_index returns an array of MatchData.ovector values from the first n matches in the subject string.

  • If n >= 0, then at most n matched indexes are returned; otherwise, all matched indexes are returned.

#fn (&Regex) find_all_index

fn (r &Regex) find_all_index(subject string) [][]int

find_all_index searches subject for all matches and returns an array; each element of the array is an array of byte indexes identifying the match and submatches within the subject (see find_one_index for details).

#fn (&Regex) find_one_index

fn (r &Regex) find_one_index(subject string) ?[]int

find_one_index searches subject for the first match and returns an array of subject byte indexes identifying the match and submatches.

  • result[0]..result[1] is the entire match.

  • result[2*N..2*N+2] is the Nth submatch (N = 1...).

  • If a subpattern did not participate in the match its indexes will be -1.

  • If no match is found none is returned.
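The index layout described above can be used to slice the subject directly. This sketch assumes each end index is exclusive (as with PCRE2 offsets); `split_range` is a hypothetical helper.

```v
import srackham.pcre2

// split_range is a hypothetical helper that returns the two sides of a range
// such as `12-345` by slicing the subject with find_one_index indexes.
fn split_range(subject string) []string {
	r := pcre2.must_compile(r'(\d+)-(\d+)')
	idx := r.find_one_index(subject) or { return []string{} }
	// idx[0]..idx[1] spans the whole match; idx[2]..idx[3] and idx[4]..idx[5]
	// span the first and second submatches.
	return [subject[idx[2]..idx[3]], subject[idx[4]..idx[5]]]
}

fn main() {
	println(split_range('ab 12-345 cd')) // ['12', '345']
	println(split_range('no range here')) // []
}
```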

#fn (&Regex) find_n_submatch

fn (r &Regex) find_n_submatch(subject string, n int) [][]string

find_n_submatch searches the subject string for regular expression matches and returns an array containing match and submatch text.

  • Each match contributes an element to the result array.

  • Each result array element is an array containing the matched text (at index 0) plus any submatches (at indexes 1..).

  • If a subpattern did not participate in the match the corresponding element is set to ''.

  • If n >= 0, then at most n matches are returned; otherwise, all matches are returned.

#fn (&Regex) find_all_submatch

fn (r &Regex) find_all_submatch(subject string) [][]string

find_all_submatch searches the subject string for all regular expression matches and returns an array containing match and submatch text.

  • Each match contributes an element to the result array.

  • Each result array element is an array containing the matched text (at index 0) plus any submatches (at indexes 1..).

  • If a subpattern did not participate in the match the corresponding element is set to ''.
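As an illustration of the result layout, the sketch below collects `key=value` pairs into a map. `parse_pairs` is a hypothetical helper, not part of the module.

```v
import srackham.pcre2

// parse_pairs is a hypothetical helper that collects `key=value` pairs.
fn parse_pairs(subject string) map[string]string {
	r := pcre2.must_compile(r'(\w+)=(\w+)')
	mut pairs := map[string]string{}
	for m in r.find_all_submatch(subject) {
		// m[0] is the whole match, m[1] the key and m[2] the value.
		pairs[m[1]] = m[2]
	}
	return pairs
}

fn main() {
	pairs := parse_pairs('host=example port=8080')
	println(pairs['host']) // example
	println(pairs['port']) // 8080
}
```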

#fn (&Regex) find_one_submatch

fn (r &Regex) find_one_submatch(subject string) ?[]string

find_one_submatch searches the subject string for the first regular expression match and returns an array containing match and submatch text.

  • The first element (at index 0) contains the entire matched text.

  • Subsequent elements (indexes 1..) contain the corresponding matched subpatterns.

  • If a subpattern did not participate in the match the corresponding array element is set to ''.

  • If a match is not found none is returned.

#fn (&Regex) replace_n

fn (r &Regex) replace_n(subject string, repl string, n int) string

replace_n returns a copy of the subject string in which matches of the regular expression are replaced by the repl string.

  • $0...$99 in the repl string are replaced by matching text; the number zero refers to the entire matched substring; higher numbers refer to substrings captured by parenthesized subpatterns e.g. $1 refers to the first submatch.

  • References to undefined subpatterns are not replaced.

  • Subpatterns that did not participate in the match are replaced with ''.

  • To insert a literal $ in the output, use $$.

  • If n >= 0, then at most n matches are replaced; otherwise, all matches are replaced.

#fn (&Regex) replace_all

fn (r &Regex) replace_all(subject string, repl string) string

replace_all returns a copy of the subject string with all matches of the regular expression replaced by the repl string.

  • $0...$99 in the repl string are replaced by matching text; the number zero refers to the entire matched substring; higher numbers refer to substrings captured by parenthesized subpatterns e.g. $1 refers to the first submatch.

  • References to undefined subpatterns are not replaced.

  • Subpatterns that did not participate in the match are replaced with ''.

  • To insert a literal $ in the output, use $$.
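The replacement rules shared by replace_n and replace_all can be sketched as follows. `swap_pairs` is a hypothetical helper; raw strings (r'...') are used so that V does not interpret the `$` characters itself.

```v
import srackham.pcre2

// swap_pairs is a hypothetical helper that rewrites `key=value` as `value=key`.
// A negative n swaps every pair; n >= 0 swaps at most n pairs.
fn swap_pairs(subject string, n int) string {
	r := pcre2.must_compile(r'(\w+)=(\w+)')
	return r.replace_n(subject, r'$2=$1', n)
}

fn main() {
	println(swap_pairs('a=1 b=2', -1)) // 1=a 2=b
	println(swap_pairs('a=1 b=2', 1)) // 1=a b=2
	// `$0` is the whole match and `$$` inserts a literal dollar sign.
	println(pcre2.must_compile(r'\d+').replace_all('total 5', r'$0 $$')) // total 5 $
}
```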

#fn (&Regex) replace_one

fn (r &Regex) replace_one(subject string, repl string) string

replace_one returns a copy of the subject string with the first match of the regular expression replaced by the repl string.

In all other respects it behaves like the replace_all method.

#fn (&Regex) replace_n_fn

fn (r &Regex) replace_n_fn(subject string, repl fn (string) string, n int) string

replace_n_fn returns a copy of the subject string with regular expression matches replaced by the return value of the repl callback function.

  • The repl function is passed a string containing the matched text.

  • If n >= 0, then at most n matches are replaced; otherwise, all matches are replaced.
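A callback is useful when the replacement must be computed from the match. In this minimal sketch, `double_numbers` is a hypothetical helper that doubles the numbers it finds.

```v
import srackham.pcre2

// double_numbers is a hypothetical helper that doubles the first n numbers
// found in the subject (all of them when n is negative).
fn double_numbers(subject string, n int) string {
	r := pcre2.must_compile(r'\d+')
	return r.replace_n_fn(subject, fn (m string) string {
		// m holds the matched digits; convert, double and convert back.
		return (m.int() * 2).str()
	}, n)
}

fn main() {
	println(double_numbers('3 apples 4 pears', 1)) // 6 apples 4 pears
	println(double_numbers('3 apples 4 pears', -1)) // 6 apples 8 pears
}
```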

#fn (&Regex) replace_all_fn

fn (r &Regex) replace_all_fn(subject string, repl fn (string) string) string

replace_all_fn returns a copy of the subject string with all regular expression matches replaced by the return value of the repl callback function.

  • The repl function is passed a string containing the matched text.

#fn (&Regex) replace_one_fn

fn (r &Regex) replace_one_fn(subject string, repl fn (string) string) string

replace_one_fn returns a copy of the subject string with the first regular expression match replaced by the return value of the repl callback function.

  • The repl function is passed a string containing the matched text.

#fn (&Regex) replace_n_matchdata_fn

fn (r &Regex) replace_n_matchdata_fn(subject string, repl fn (MatchData) string, n int) string

replace_n_matchdata_fn returns a copy of the subject string with regular expression matches replaced by the return value of the repl callback function.

  • The repl function is passed the MatchData struct resulting from the match.

  • If n >= 0, then at most n matches are replaced; otherwise, all matches are replaced.

#fn (&Regex) replace_n_submatch_fn

fn (r &Regex) replace_n_submatch_fn(subject string, repl fn ([]string) string, n int) string

replace_n_submatch_fn returns a copy of the subject string with regular expression matches replaced by the return value of the repl callback function.

  • The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).

  • If a subpattern did not participate in the match the corresponding matches element is set to ''.

  • If n >= 0, then at most n matches are replaced; otherwise, all matches are replaced.

#fn (&Regex) replace_all_submatch_fn

fn (r &Regex) replace_all_submatch_fn(subject string, repl fn ([]string) string) string

replace_all_submatch_fn returns a copy of the subject string with all regular expression matches replaced by the return value of the repl callback function.

  • The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).

  • If a subpattern did not participate in the match the corresponding matches element is set to ''.

#fn (&Regex) replace_one_submatch_fn

fn (r &Regex) replace_one_submatch_fn(subject string, repl fn ([]string) string) string

replace_one_submatch_fn returns a copy of the subject string with the first regular expression match replaced by the return value of the repl callback function.

  • The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).

  • If a subpattern did not participate in the match the corresponding matches element is set to ''.
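The submatch callbacks give the replacement function access to each captured subpattern. A short sketch using replace_all_submatch_fn, with `upcase_keys` as a hypothetical helper:

```v
import srackham.pcre2

// upcase_keys is a hypothetical helper that upper-cases the key of every
// `key=value` pair and leaves the value unchanged.
fn upcase_keys(subject string) string {
	r := pcre2.must_compile(r'(\w+)=(\w+)')
	return r.replace_all_submatch_fn(subject, fn (m []string) string {
		// m[0] is the whole match, m[1] the key and m[2] the value.
		return m[1].to_upper() + '=' + m[2]
	})
}

fn main() {
	println(upcase_keys('host=example port=8080')) // HOST=example PORT=8080
}
```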

#fn (&Regex) substitute

fn (r &Regex) substitute(subject string, pos int, repl string, options int) !string

substitute is a wrapper for the PCRE2 pcre2_substitute API.

It returns a copy of the subject string in which matches of the regular expression after index pos are replaced by the repl string. options is passed to the PCRE2 pcre2_substitute API.

  • By default only the first match is replaced; use the C.PCRE2_SUBSTITUTE_GLOBAL option to replace all matches.

  • If no matches are found the unmodified subject is returned.

  • An error is returned if a PCRE2 error occurs.
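The default behaviour (no options, search from the start of the subject) might be used like this. `mask_first_number` is a hypothetical helper; passing 0 for options is assumed to mean "no PCRE2 options set".

```v
import srackham.pcre2

// mask_first_number is a hypothetical helper that replaces the first run of
// digits with `#`. pos is 0 (start of subject) and options is 0 (none set),
// so only the first match is replaced.
fn mask_first_number(subject string) string {
	r := pcre2.must_compile(r'\d+')
	return r.substitute(subject, 0, '#', 0) or { panic(err) }
}

fn main() {
	println(mask_first_number('a 1 b 22')) // a # b 22
	// With no match the subject comes back unmodified.
	println(mask_first_number('no digits')) // no digits
}
```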

#fn (&Regex) replace_all_extended

fn (r &Regex) replace_all_extended(subject string, repl string) string

replace_all_extended returns a copy of the subject string with all matches of the regular expression replaced by the repl string.

The repl string supports the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).

#fn (&Regex) replace_one_extended

fn (r &Regex) replace_one_extended(subject string, repl string) string

replace_one_extended returns a copy of the subject string with the first match of the regular expression replaced by the repl string.

The repl string supports the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).
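For comparison with the replace_all_extended call in the Examples section, the sketch below applies the same `\U$1` replacement to the first match only. `shout_first` is a hypothetical helper.

```v
import srackham.pcre2

// shout_first is a hypothetical helper that upper-cases only the first word
// starting with `d` or `n`, using the PCRE2 extended `\U` case-forcing escape.
fn shout_first(subject string) string {
	r := pcre2.must_compile(r'\b([dn].*?)\b')
	return r.replace_one_extended(subject, r'\U$1')
}

fn main() {
	println(shout_first('Lorem nisi dis diam')) // Lorem NISI dis diam
}
```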

#fn (&Regex) split_n

fn (r &Regex) split_n(subject string, n int) []string

split_n splits the subject string at regular expression match boundaries and returns an array of the split strings.

  • If n >= 0, then at most n matches are processed; otherwise, all matches are processed.

#fn (&Regex) split_all

fn (r &Regex) split_all(subject string) []string

split_all splits the subject string at regular expression match boundaries and returns an array of the split strings.

If no matches are found, a single-element array containing the subject string is returned.

#fn (&Regex) split_one

fn (r &Regex) split_one(subject string) ?[]string

split_one splits the subject string at the first regular expression match boundary and returns an array of the two split strings.

If no match is found none is returned.
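A rough sketch of the split methods follows. `fields` and `head_tail` are hypothetical helpers. The separator-bearing calls are shown without expected output because this reference does not state whether the matched separator text is kept or dropped; only the documented no-match behaviour is annotated.

```v
import srackham.pcre2

// fields is a hypothetical helper that splits a line at commas with
// optional surrounding whitespace.
fn fields(line string) []string {
	return pcre2.must_compile(r'\s*,\s*').split_all(line)
}

// head_tail is a hypothetical helper that splits at the first separator only
// and falls back to a single-element array when there is no separator.
fn head_tail(line string) []string {
	r := pcre2.must_compile(r'\s*,\s*')
	return r.split_one(line) or { [line] }
}

fn main() {
	println(fields('a, b ,c'))
	println(head_tail('a, b ,c'))
	// No separator: split_all returns the subject as the only element.
	println(fields('abc')) // ['abc']
	// No separator: split_one returns none, so the fallback is used.
	println(head_tail('abc')) // ['abc']
}
```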
