EasyPattern Regular Expressions

EasyPatterns 2.8

Conventions: Actual EasyPatterns are highlighted with [...].

 

EasyPattern matching is an essential search/replace ingredient. It is available when searching and replacing text/HTML/XML files, Microsoft Word document hyperlinks, Microsoft Excel spreadsheet links, and Microsoft PowerPoint presentations and slides, and when searching and replacing inside databases.

Specifying literal text/static text

The only character that is 'special' is the left square bracket or [. The simplest pattern is just literal text, with no left square brackets. Whenever we need EasyPattern keywords we just put them inside [...].

EasyPattern Description Matches this text...
hello there No [ ... ] expression has been used, this is just literal text hello there
hello there [longest 1 or more letters] The special part is [longest 1 or more letters] hello there Fred, hello there Cornelia, etc
I am [1 or more digits] years old The special part is [1 or more digits] I am 2 years old, I am 302 years old, etc
This is a left square bracket [ '[' ] This shows how to insert a left square bracket in literal text. This is a left square bracket [

To use multiple keywords, you can either

Put them next to each other [...][...] e.g. [letter][digit] matches "a1", "b1", etc.
Put commas between them [..., ...] e.g. [letter, digit] instead of [letter][digit]
Put spaces between them [... ...] e.g. [letter digit] instead of [letter][digit]

You can also put literal text anywhere inside [...] using single quotes or double quotes.

['literal'] e.g. "['abc']" instead of "abc"
[... 'literal'] e.g. "[digit, 'abc']" instead of "[digit]abc"
['literal' ...] e.g. "['abc', digit]" instead of "abc[digit]"
[... 'literal' ...] e.g. "[digit, 'abc', digit]" instead of "[digit]abc[digit]"
  • There is usually no difference in meaning between including literal text within a bracketed expression (in single quotes) and leaving literal text outside the brackets. The choice is a matter of individual preference. One exception: when using [not], only the single-quoted literal will work, e.g. [not '-'].

Common character classes such as letters and digits

The most important keywords represent character classes or sets, that is, a set of related characters.

Any character, letters, digits, etc.
[character], [char], [chars], [characters] All 256 chars (every character including NULL). EasyPattern's [character] or [char] will match any character including return. If you want any character except a return (or formfeed), use [paragraphChar]; that is, any character that could appear in a paragraph. Details below.
[letter], [letters] Letters, including the accented letters common in certain European languages.
[digit], [digits] Decimal digits 0-9
[number], [numbers], [numeric] A number with an optional leading sign, digits, optional decimal point and trailing digits
[Integer] A number with an optional leading sign, followed by digits
[Float] A number with an optional leading sign, digits, optional decimal point and trailing digits, optionally followed by 'e', a sign, and 1 or more digits
[EBCDICletter] An EBCDIC letter
[EBCDICupper] An EBCDIC uppercase letter
[EBCDIClower] An EBCDIC lowercase letter
[EBCDICdigit] An EBCDIC digit, hex F0-F9
[punctuation] Printing characters, excluding letters and digits, includes !?.,:; " ' ' / - () {} -
Note that ? and ? are considered punctuation.
[symbol], [symbols] ~@#$%^&*
EasyPattern distinguishes punctuation from symbols; the sets do not overlap. For broader combinations, see [printableChar] and [typewriterChar]. For narrower focus, see [sentencePunctuation], [anyQuote], [anyBracket] and [anyDash].
Special letters
[upper], [uppercase], [uppercaseLetter] Uppercase letters. Note: In TextPipe you will also need to enable the Match Case option for this to make any difference.
[lower], [lowercase], [lowercaseLetter] Lowercase letters. Note: In TextPipe you will also need to enable the Match Case option for this to make any difference.
Reserved punctuation
[leftBracket] [
[rightBracket] ]
[leftParen], [leftParenthesis] (
[rightParen], [rightParenthesis] )
[leftAngle], [lessThan] <
[rightAngle], [greaterThan] >
[comma] ,
[singleQuote] '
[doubleQuote], [quote] "   (i.e. standard ASCII "straight" quotation mark)
[backwardSingleQuote] `
ASCII function()
asc(code, ...), ascii(code, ...) The ASCII() or ASC() function embeds arbitrary control characters by entering the control code in decimal or hex (precede the hex digits with '$', e.g. $ff). You can add one or more control characters by separating each with a space or comma, e.g. ASC( 65, 66 ) outputs 'AB' into the pattern.
EBCDIC function()
ebcdic( literal ) The EBCDIC() function embeds Mainframe EBCDIC characters translated from a string literal you provide e.g. EBCDIC( '0' ) outputs \xF0 (an EBCDIC '0')
  • Square brackets are "reserved" by EasyPattern. The other punctuation marks listed here are literal when they appear outside of brackets.
  • EasyPattern gives special meaning to certain punctuation marks; these keywords can be used to represent the literal character.
  • A literal comma and left & right brackets & parentheses may appear inside single quotes. The keywords are provided to make patterns easier to read.
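
As a rough illustration of what ASC() and EBCDIC() produce, here is a minimal Python sketch. The helper names `asc` and `ebcdic` are hypothetical stand-ins, not EasyPattern functions, and the use of code page 037 for EBCDIC is an assumption; the source does not say which EBCDIC code page is used.

```python
def asc(*codes):
    """Build a string from character codes (ints, or '$'-prefixed hex strings)."""
    chars = []
    for code in codes:
        if isinstance(code, str) and code.startswith("$"):
            code = int(code[1:], 16)  # '$ff' -> 255, mirroring the '$' hex prefix
        chars.append(chr(int(code)))
    return "".join(chars)


def ebcdic(literal):
    """Translate a text literal to EBCDIC bytes (cp037 is an assumed code page)."""
    return literal.encode("cp037")


print(asc(65, 66))   # AB
print(ebcdic("0"))   # b'\xf0'
```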

Filename patterns

[Drive]

A drive letter followed by a colon (:), e.g. d:\special\folder\filename.doc, feeding the letter into @Drive@

[Folder]

A path fragment between \ ... \, e.g. d:\special\folder\filename.doc, feeding into @Folder@

[Path]

A path with optional drive, e.g. d:\special\folder\filename.doc, feeding into @Drive@ and @Path@

[UNCpath]

A UNC path consisting of server, share and path - these feed into @Server@, @Share@ and @Path@

[Filename]

A filename, starting from \ and not ending with \, e.g. d:\special\folder\filename.doc

Combining character classes and creating your own character classes

There are many ways to create your own character sets to match exactly the characters you require.

You can combine existing character sets using "or":

[... or ...] e.g. [letter or digit], ['a' or 'b']
  • Most EasyPattern keywords refer to a set of characters from which one will match. The first use of "or" is to make a larger set -- though again, only one of the larger set will match. (Any quantity can be specified using repetition keywords, but it is still applying a quantity to a single character not to multiple characters)
  • When sets are combined with "or", parentheses are optional. (In technical terms, the character set use of "or" has very high precedence; see below)
     [letter, letter or digit, letter] -- matches "aaa", "xyz", "h4q", "b7f" etc.

It doesn't hurt to add parentheses even though they are not required.
 [letter (letter or digit) letter] -- same as above
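
For readers who know conventional regex, the set form of "or" corresponds roughly to a single bracketed character class. The Python sketch below uses ASCII-only classes, which is an assumption: EasyPattern's [letter] also covers accented letters.

```python
import re

# Rough equivalent of [letter, letter or digit, letter]:
# the middle position is ONE character drawn from the combined set.
set_or = re.compile(r"[A-Za-z][A-Za-z0-9][A-Za-z]")

samples = ["aaa", "xyz", "h4q", "b7f", "a--", "12a"]
accepted = [s for s in samples if set_or.fullmatch(s)]
print(accepted)  # ['aaa', 'xyz', 'h4q', 'b7f']
```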

Negation - Match anything except a given set

Instead of specifying all the characters that could occur in a match, it is often convenient to specify characters that could not occur.

[not ...], [non ...], [anyExcept ...] e.g. [oneOrMore non letter]
  • EasyPattern has keywords for [quotedString] and [HTMLTag], but if it didn't, they would be easy to define:
     ['<', oneOrMore not '>', '>'] -- same as [HTMLTag]
     [quote, oneOrMore not quote, quote] -- a simple definition for [quotedString]
  • Negation can only be applied to a single character, or a character set from which one will match. For example:
     [not letter] -- fine
     [not letter or digit] -- fine: [letter or digit] is a set from which one will match
     [not word] -- ERROR: "word" matches multiple characters
     [not 'a'] -- fine (a single character)
     [not 'whatever'] -- ERROR
     
    [not lineChar or letter] -- CAREFUL! [lineChar] is defined as [not linefeed OR verticalTab OR formfeed OR return]. You cannot combine negated and non-negated character sets, so this pattern is equivalent to [ (not lineChar) or letter ], instead of [ not (lineChar or letter) ]
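
In conventional regex terms, negating a set corresponds to a negated character class. The following Python sketch shows approximate equivalents of two of the patterns above; the ASCII-only classes are an assumption.

```python
import re

# ['<', oneOrMore not '>', '>']  ~  an HTML-style tag
tag = re.compile(r"<[^>]+>")

# [not letter or digit]  ~  one character outside the combined set
not_alnum = re.compile(r"[^A-Za-z0-9]")

print(tag.findall("<b>bold</b> text"))   # ['<b>', '</b>']
print(not_alnum.findall("a1-b2!"))       # ['-', '!']
```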

Custom Sets

Keywords such as [letter] and [digit] are character sets defined internally to EasyPattern; the angle bracket notation lets you define your own character sets. In both cases, EasyPattern matches any single character in that set.

[<...>] e.g. [<aeiou>], [<135>], [<!@#$%^&*>]
  • For single characters, [or] and a set are interchangeable, e.g. [<aei>] and ['a' or 'e' or 'i'] have the same meaning.
  • User defined sets, single character literals and EasyPattern keywords can be combined with [or]:
     [<aeiou> or <123> or '7' or symbol]

Handling Alternatives - A or B

alternative patterns with "or"
[... or ...] e.g. ['Player' or 'EasyPattern']
  • When [or] is used to specify alternatives as part of a larger pattern, grouping parentheses are required, e.g.
     [space, 'Player' or 'EasyPattern', space] -- may not mean what you think!
     [space]Player[or]EasyPattern[space] -- may not mean what you think!
     [(space, 'Player') or ('EasyPattern', space)] -- that's what they mean
     [space ('Player' or 'EasyPattern') space] -- this might be what you wanted

Remember: as noted in the section on expressions, commas are allowed between items to make patterns easier to read; they do not affect what the pattern means.

  • If you leave out the parentheses, EasyPattern will treat everything to the left of the [or] as one implicit group and everything to the right of the [or] as a separate group. Note that visual grouping with brackets or commas is not enough; you must use parentheses. For example, all of the following will be interpreted as [(digit, 'this') or ('that')]:
     [digit, 'this' or 'that'] -- careful; the commas may mislead
     [digit]['this' or 'that'] -- careful; the brackets may mislead
     [digit 'this' or 'that'] -- the grouping isn't clear; parentheses would help
     [digit]this[or]that -- the grouping isn't clear; parentheses would help

As noted in the previous section, parentheses are not required when [or] is used to combine character sets.
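
The same grouping pitfall exists in conventional regex, where alternation also has the lowest precedence. This Python sketch contrasts the ungrouped and grouped forms; the regexes are approximate stand-ins for the EasyPatterns named in the comments.

```python
import re

# [digit, 'this' or 'that'] is read as [(digit, 'this') or ('that')]
ungrouped = re.compile(r"\dthis|that")

# [digit ('this' or 'that')] applies the alternative after the digit
grouped = re.compile(r"\d(?:this|that)")

print(bool(ungrouped.fullmatch("that")))    # True: the right-hand branch alone
print(bool(grouped.fullmatch("that")))      # False: a digit is required first
print(bool(grouped.fullmatch("7that")))     # True
print(bool(ungrouped.fullmatch("7that")))   # False
```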

"or" as set vs. "or" as alternative

In many cases, you don't have to worry that there are two different uses for "or"; both generally make sense in context. However, there are 2 reasons for learning the differences:

  • or as set doesn't require parentheses; the grouping is implied
  • or as set can be part of a "not" expression since it still represents one character

How many repeats? Quantity, repetition and optional pieces

Notation: "..." is any appropriate keyword or expression, # is a number (one or more digits; the maximum varies with context).

repetition examples will match...
[optional ...], [zeroOrOne ...] [digit, optional letter], [digit, zeroOrOne letter] 2, 2a
[0+ ...], [zeroOrMore ...] [digit, zeroOrMore letters] 2, 2a, 2aa, 2aaa, 2aaaa...
[1+ ...], [oneOrMore ...] [digit, oneOrMore letters] 2a, 2aa, 2aaa, 2aaaa...
[2+ ...], [many ...], [twoOrMore ...] [digit, many letters], [digit, twoOrMore letters] 2aa, 2aaa, 2aaaa...
[#+ ...] [digit, 5+ letters] 2aaaaa, 2aaaaaa...
  • A space is not allowed, e.g. [one or more] will not be recognized
  • The words are all special cases, e.g. [threeOrMore ...] will not work (use [3+ ...])
specific quantity, quantity range (where # is a number) will match...
[# ...] [5 letters] aaaaa, bbbbb
[# to # ...] [3 to 5 letters] aaa, aaaa, aaaaa
[# - # ...] [3-5 letters] aaa, aaaa, aaaaa
[# .. # ...] [3..5 letters] aaa, aaaa, aaaaa

Quantities can now be entered in Hex form by preceding them with '$' e.g $ff.
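
For comparison, conventional regex writes the same quantities as suffixes. A small Python sketch, assuming ASCII letters only:

```python
import re

# [3 to 5 letters]  ~  {3,5} applied to a letter class
three_to_five = re.compile(r"[A-Za-z]{3,5}")

# [digit, 5+ letters]  ~  a digit followed by {5,} letters
digit_five_plus = re.compile(r"\d[A-Za-z]{5,}")

for text in ["aa", "aaa", "aaaaa", "aaaaaa"]:
    print(text, bool(three_to_five.fullmatch(text)))
```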

Greediness, or shortest match vs longest match, atomic matching

When the repetition or count includes a range of values to match, EasyPattern has the choice of matching the "shortest" sequence of characters that fits the pattern, or the "longest" that fits the pattern. For example:

[shortest zeroOrOne ...] 0 or 1 will try to match zero occurrences
[shortest zeroOrMore ...] 0+ will try to match zero occurrences
[shortest oneOrMore ...] 1+ will try to match one occurrence
[shortest twoOrMore ...] 2+ will try to match two occurrences

EasyPattern defaults to the SHORTEST match so the "shortest" keyword is optional.

[shortest ... ...] match the lowest possible number of repetitions (default)
[longest ... ...] match the highest possible number of repetitions
  • In these cases, EasyPattern will only match more than the minimum if required to complete additional parts of the pattern, e.g. given "abc123" and the pattern [shortest oneOrMore letter, digit], EasyPattern will match "abc1", i.e. all 3 letters. However, given the same string and the pattern [shortest oneOrMore letter], EasyPattern will just match "a" the first letter. Given the same string and [longest oneOrMore letter], EasyPattern will match "abc". Note that EasyPattern always starts with the first character that matches that pattern, e.g. despite "c1" being shorter than "abc1", EasyPattern matches the latter.
  • Shortest/longest can be confusing.
  • Shortest can be quite slow; use "not" instead if possible
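
The "abc123" example above can be reproduced with lazy and greedy quantifiers in conventional regex. This Python sketch is only an analogy: note that Python's default is greedy (longest), the opposite of EasyPattern's default, and the ASCII letter class is an assumption.

```python
import re

text = "abc123"

# [shortest oneOrMore letter, digit]: the lazy match must grow to reach a digit
shortest_then_digit = re.search(r"[A-Za-z]+?\d", text).group()

# [shortest oneOrMore letter]: nothing forces it past the first letter
shortest_only = re.search(r"[A-Za-z]+?", text).group()

# [longest oneOrMore letter]
longest_only = re.search(r"[A-Za-z]+", text).group()

print(shortest_then_digit, shortest_only, longest_only)  # abc1 a abc
```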

Pattern matching can become very time consuming if the number of repeats is not known. Take for example:

    Pattern: a[ 1+ digits ]b             Matching text: a22222222z

The pattern matcher first matches a, and 8 '2's, then it finds that 'z' does not match 'b'. So it backtracks, trying with 7 '2's, failing again, then with 6 '2's, all the way back to 1 '2', before finally giving up, and starting to test for 'a' again. If we know that backtracking into a repeated match will still result in failure, we can tell EasyPatterns to not bother, by using the atomic keyword.

    Pattern: a[ atomic(1+ digits) ]b     Matching text: a22222222z

This time, the pattern matcher first matches a, and 8 '2's, then it finds that 'z' does not match 'b'. So it backtracks all the way back to starting to test for 'a' again.
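
The effect of atomic matching can be sketched in Python. To stay compatible with older Python versions, this example emulates an atomic group with the lookahead-plus-backreference idiom `(?=(\d+))\1` rather than a native atomic group; that substitution is a choice made for this sketch, not something EasyPattern does. The second pair of patterns shows the behavioral difference: once the digits are taken atomically, none are given back.

```python
import re

text = "a22222222z"

# a[ 1+ digits ]b : fails, but only after retrying shorter digit runs
plain = re.compile(r"a\d+b")

# a[ atomic(1+ digits) ]b : fails without backtracking into the digits
atomic_like = re.compile(r"a(?=(\d+))\1b")

print(plain.search(text), atomic_like.search(text))  # None None

# Behavioral difference: the trailing \d needs one digit handed back.
gives_back = re.compile(r"a\d+\d")
keeps_all = re.compile(r"a(?=(\d+))\1\d")
print(gives_back.search("a123").group())  # a123
print(keeps_all.search("a123"))           # None
```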

Literals, groups

All of the repetition & quantity keywords can be applied to literals and groups as well as to individual keywords, e.g.
 [oneOrMore 'ab'] -- matches "ab", "abab", "ababab" etc.
 [oneOrMore letter or digit] -- matches "aaa", "456", "a45bbb" etc.
 [oneOrMore not letter or digit] -- matches punctuation, symbols, whitespace etc.
 [oneOrMore ('alpha' or 'omega')] -- matches "alphaalpha", "alphaomega" etc.
 [oneOrMore (letter, digit)] -- matches "r2", "r2d2", "r2d2f7b2c4" etc.

Grouping text and capturing text for use in a replace string

[(...)] A non-capturing group
[capture(...)], [capture(...) as 'varname'] Assigns the contents of the group to a variable which can be referred to later in both the search pattern ([group#], e.g. [group6]; # can range from 1 to 26) and in the replacement string ($#, e.g. $6; # can range from 1-9, a-z; $0 represents the entire matched string). If 'varname' is specified, the text is also stored in the global variable @varname in addition to the positional variables $1, $2 etc.
[group#] Matches the same text that a previously captured group found.

[capture(letter), group1] -- matches "ee", "bb", "cc" etc
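
In conventional regex the same idea is a capturing group plus a backreference. A minimal Python sketch follows; Python writes the reference as `\1` in both the pattern and the replacement (where the text above uses [group1] and $1), and the ASCII letter class is an assumption.

```python
import re

# [capture(letter), group1]  ~  a letter followed by the same letter
doubled = re.compile(r"([A-Za-z])\1")

found = [m.group() for m in doubled.finditer("see bbq access")]
print(found)  # ['ee', 'bb', 'cc', 'ss']

# Using the captured text in a replacement: collapse each doubled letter.
collapsed = doubled.sub(r"\1", "bookkeeper")
print(collapsed)  # bokeper
```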

[mustBeginWith(...) ...], [mustNotBeginWith(...) ...] When a match is found, it must be/must not be preceded by what is in the brackets. The bracket contents are NOT included in the actual match. The bracket contents are limited to fixed length strings - so no '3+' etc are allowed. This must be the first part of your pattern.

[mustBeginWith( 'hello' or 'goodbye' ) 'fred']

[... mustEndWith(...)], [... mustNotEndWith(...)] When a match is found, it must be/must not be followed by what is in the brackets. The bracket contents are NOT included in the actual match. The bracket contents are limited to fixed length strings - so no '3+' etc are allowed. This must be the last part of your pattern.

['fred' mustEndWith( 'erick' or 'dy' ) ]

  • Parentheses must match, i.e. ")" always ends the most recent "(", independent of number.
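
The mustBeginWith / mustEndWith keywords behave much like lookbehind and lookahead assertions in conventional regex. The Python sketch below mirrors the two examples above. Because Python's `re` requires each lookbehind to have a fixed width, the two alternatives are written as separate lookbehinds; that is a detail of this sketch, not of EasyPattern.

```python
import re

# [mustBeginWith( 'hello' or 'goodbye' ) 'fred']
begins = re.compile(r"(?:(?<=hello)|(?<=goodbye))fred")

# ['fred' mustEndWith( 'erick' or 'dy' ) ]
ends = re.compile(r"fred(?=erick|dy)")

m = begins.search("goodbyefred")
print(m.group(), m.start())              # fred 7  (the prefix is not part of the match)
print(begins.search("heyfred"))          # None
print(ends.search("frederick").group())  # fred
print(ends.search("fredo"))              # None
```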

Commenting your patterns for readability

EasyPattern allows comments to be included in multi-line patterns using the character ';' or '#' to mark the start of a comment, extending until the end of the line e.g.

[ 3 space ;look for 3 spaces
  'hello'    #then the keyword we want
]
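
Conventional regex engines offer a comparable facility. As an analogy only, Python's `re.VERBOSE` flag allows '#' comments and free-form whitespace inside a pattern; the pattern below is a rough counterpart of the commented EasyPattern above.

```python
import re

commented = re.compile(
    r"""
    [ ]{3}   # look for 3 spaces
    hello    # then the keyword we want
    """,
    re.VERBOSE,
)

print(bool(commented.search("   hello")))  # True
print(bool(commented.search("  hello")))   # False: only 2 spaces
```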

Patterns for Whitespace

[space], [spaces] ASCII 32
[nonbreakingSpace] ASCII 202
[whitespace] [space OR tab OR cr OR lf OR verticalTab OR nonbreakingSpace]
[tab] ASCII 9, \t
[return], [cr] ASCII 13, \r
[linefeed], [lf] ASCII 10, \n
[verticalTab] ASCII 11
[formfeed] ASCII 12, \f
[null] ASCII 0
[CRLF] [return, linefeed]
[newline] [(return, linefeed) or return or linefeed]
[DOSNewline] [return, linefeed]
[UNIXNewline] [linefeed]
[MacNewline] [return]
  • [not] cannot be applied to [CRLF], [newline] or [DOSNewline] since they either are or may be a character sequence rather than just a single character.
  • A space character can usually be typed directly into a pattern ([ ' ' ]) but using the keyword may make the pattern easier to understand (and modify later)

Whitespace combinations

[horizontalWhitespace], [hSpace] [space or nonbreakingSpace or tab]
[verticalWhitespace], [vSpace] [return or linefeed or formfeed or verticalTab]
words, columns, lines & paragraphs
[wordDelimiter] [space OR tab OR linefeed OR verticalTab OR formfeed OR return]
[wordChar] [not wordDelimiter]
[word] [1+ wordChar]
   
[columnDelimiter] [tab OR linefeed OR formfeed OR return]
[columnChar] [not columnDelimiter]
[column] [1+ columnChar]    Note: Use [0+ columnChar] instead if the column could be blank
   
[lineDelimiter] [linefeed OR verticalTab OR formfeed OR return]
[lineChar] [not lineDelimiter]
[line] [1+ lineChar]    Note: Use [0+ lineChar] instead if the line could be blank
   
[paragraphDelimiter] [formfeed OR return]
[paragraphChar] [not paragraphDelimiter]
[paragraph] [1+ paragraphChar]
  • The above delimiters are characters not positions; they will "consume" the character that they match. In contrast, [TextStart] and [TextEnd] (below) are positions.
  • The above objects (word, column, line, paragraph) do not include delimiters. So, to match multiple objects, you need to include the delimiters, e.g.
     [2+ word] -- won't match anything
     [2+ (word, optional wordDelimiter)] -- correct
  • The definition for word is based strictly on whitespace so it will include punctuation, matching text such as "$27.52" and "fancy+name". Although in many cases it would be nice to exclude trailing punctuation, that pattern would fail for text like "S.M.U.". When EP's definition of a word isn't appropriate for your text, simply use the custom pattern that fits. For example, [1+ wordChar, letter or digit or symbol] would ensure that the last char is not punctuation.
  • Word, column, line & paragraph require one or more characters. If a line might be empty, use: [0+ lineChar] instead of [line].
  • Because the definitions for word, column, line & paragraph look for anything except the appropriate delimiter (rather than the leading delimiter, a series of anything else, and the trailing delimiter), they can be used to get the rest of a word, column, line & paragraph when the starting point is already in the middle.
  • These definitions allow control characters (except the specific whitespace used as delimiters) to appear in words, columns, lines & paragraphs.
  • A column may contain the verticalTab character (it's used by FileMaker to indicate line breaks within a field)
  • Word, column, line & paragraph consist of multiple characters so patterns like [not word] don't make sense.
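
The delimiter-based definitions above translate directly into negated character classes. The Python sketch below spells out approximate equivalents of [word] and [line] using the delimiter lists given in the table; the constant names are hypothetical.

```python
import re

# [word] = [1+ wordChar], where wordChar is anything except
# space, tab, linefeed, verticalTab, formfeed or return
WORD = re.compile(r"[^ \t\n\v\f\r]+")

# [line] = [1+ lineChar], where lineChar is anything except
# linefeed, verticalTab, formfeed or return
LINE = re.compile(r"[^\n\v\f\r]+")

# Words are defined by whitespace only, so punctuation stays attached.
print(WORD.findall("costs $27.52 at S.M.U."))
# ['costs', '$27.52', 'at', 'S.M.U.']

# Starting in the middle of a line yields the rest of that line.
print(LINE.match("first line\nsecond", 6).group())  # line
```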

Positions

[textStart] matches at start of entire text
[textEnd] matches at end of the entire text or before newline at end
[lineStart] matches the start of a line (*)
[lineEnd] matches the end of a line (*)
[wordBoundary] or [wordBreak] matches at a word boundary
[notWordBoundary] matches when not at a word boundary

(*) [lineStart] and [lineEnd] will work fine if the file you're editing has Unix end of line characters, because the core EasyPattern engine assumes this. For DOS or Windows files,  you should use
  [ cr lf or textEnd ]
or
  [ mustEndWith(cr lf or textEnd) ]
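
The DOS/Windows workaround amounts to "followed by CR LF, or by the end of the text". A Python sketch of the mustEndWith form, written as a lookahead so that the line ending is not consumed:

```python
import re

# line content, then mustEndWith(cr lf or textEnd)
dos_line = re.compile(r"[^\r\n]+(?=\r\n|\Z)")

print(dos_line.findall("one\r\ntwo\r\nthree"))  # ['one', 'two', 'three']
```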

More Keywords

Combinations

[controlChar] characters 0-31, 127 (careful: includes most whitespace)
[gremlin] characters 0-31. The definition for [gremlin] is more cautious than in some products.
[printableChar] [letter or digit or punctuation or symbol] (anything that prints ink on paper)
[typewriterChar] [printableChar or space or tab or return] (excludes linefeed, vertical tab & formfeed)

Punctuation subsets (these items are included in [punctuation])

[sentencePunctuation] .,;:!???
[anyBracket], [anyBrackets] left/right paren/bracket/brace (i.e. "bracket" in the broad sense of the term)
[anyQuote] [doubleQuote OR singleQuote OR backwardSingleQuote]
[dash], [hyphen] -    Used interchangeably; we have adopted the common notion that these terms refer to the same character
[period] .
[caret] ^
[pound], [hash] #
[slash] /
[backslash] \
[colon] :
[percent] %
[star], [asterisk] *
[ampersand] &
[pipe] |

Real-world patterns

[HTMLTag] <[1+ not '>']>
[HTMLStartTag] <[not '/', 0+ not '>']> (i.e. any tag except an end tag)
[HTMLEndTag] </[1+ not '>']>
[QuotedString] [quote, 1+ ((backslash, quote) or not quote), quote]
[SocialSecurityNumber] [3 digits, dash, 2 digits, dash, 4 digits]
[PhoneNumber] Matches a US-style (xxx) xxx-xxxx number with a variety of punctuation marks. The matching text is captured into 3 successive $variables
[EmailAddress] Matches email addresses. The name and domain parts are captured into 2 successive $variables
[IPAddress] Matches numeric IP addresses. The matching text is captured into 4 successive $variables
[CreditCard] Matches credit card numbers with a variety of punctuation marks. The matching text is captured into 4 successive $variables
[Hyperlink] Matches an ftp, http, https, telnet, gopher or nntp internet URL. The matching text is captured into 3 successive $variables
[DuplicateWord] Matches a repeated word. The matching text is captured into 2 successive $variables
[PageNumber] Matches a page number of the following forms:
Page dd
Page No dd
Page No. dd
Page Num. dd
Pg Num dd
Page Number dd.

The matching text is captured into 3 successive $variables (Page, Number, #)
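
Two of the real-world patterns are defined explicitly above, so they can be approximated in conventional regex. The Python sketch below follows the stated definitions of [SocialSecurityNumber] and [QuotedString]; the other keywords (phone numbers, email addresses and so on) are not spelled out in this document, so no equivalents are attempted for them.

```python
import re

# [3 digits, dash, 2 digits, dash, 4 digits]
ssn = re.compile(r"\d{3}-\d{2}-\d{4}")

# [quote, 1+ ((backslash, quote) or not quote), quote]
quoted = re.compile(r'"(?:\\"|[^"])+"')

print(ssn.search("SSN 123-45-6789 on file").group())   # 123-45-6789
print(quoted.search(r'say "a \"b\" c" now').group())   # "a \"b\" c"
```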

Data processing patterns (in TextPipe 6.8.2 and later)

[CSVfield] A Comma-Separated-Value field. If fields are delimited by single or double quotes, embedded newlines are allowed, as are doubled-up quotes. The quotes are returned as part of the match.
[TABfield] A Tab-delimited field. To process multiple tab fields e.g.
  [ 3 or more ( TABfield Tab) TABfield ]
[PipeField] A Pipe-delimited field. To process multiple pipe fields e.g.
  [ 3 or more ( PipeField '|' ) PipeField ]

Date and time patterns

[Date] Matches a date format DD-MM-YY or DD-MMM-YY e.g. 01-Jan-02, 29-03-98
[AMPM] The AM/PM part of a time
[Month] A MonthName or a MonthNumber
[MonthNumber] 1-12, with an optional leading zero e.g. 03, 12, 4, 7
[MonthName], [MonthNameShort], [MonthNameLong] January-December and Jan-Dec
[MonthNameLocal] Full month names and 3 letter abbreviations for the current locale
[Day] 1-31, with an optional leading zero e.g. 1, 13, 08, 28
[DayNumber] 01-31 (the leading zero is required) e.g. 01, 08, 13
[DayName], [DayNameShort], [DayNameLong] Sunday-Saturday and Sun-Sat
[DayNameLocal] Weekday names and 3 letter abbreviations for the current locale
[DayOfYear] 1..366
[Year], [YearShort], [YearLong] A 2 or 4 digit year (between 1800 and 2199)
[Hour] A 12 or 24-hour hour, with optional leading zero
[Minute] A 2 digit minute with leading zero
[Second] A 2 digit second with leading zero

Using the real world patterns above, you can easily construct the following EasyPatterns:

HMS [ Hour <:.-> Minute <:.-> Second ]
DMY [ Day <-/ > Month <-/ > Year ]
MDY [ Month <-/ > Day <-/ > Year ]
YMD [ Year <-/ > Month <-/ > Day ]
Julian [ Year DayOfYear ]
MY [ Month <-/ > Year ]
MD [ Month <-/ > Day ]
DM [ Day <-/ > Month ]
HM [ Hour <:. > Minute ]
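
Building larger patterns from named pieces works the same way in conventional regex. The Python sketch below composes an HMS-style pattern; the regexes chosen for the hour, minute and second pieces are assumptions based on the descriptions in the table above, not EasyPattern's actual definitions.

```python
import re

HOUR = r"(?:[01]?\d|2[0-3])"   # 12 or 24-hour hour, optional leading zero
MINUTE = r"[0-5]\d"            # 2 digit minute with leading zero
SECOND = r"[0-5]\d"            # 2 digit second with leading zero
SEP = r"[:.\-]"                # the custom set <:.->

# HMS [ Hour <:.-> Minute <:.-> Second ]
hms = re.compile(rf"{HOUR}{SEP}{MINUTE}{SEP}{SECOND}")

for text in ["9:05:30", "23.59.59", "24:00:00", "7:5:30"]:
    print(text, bool(hms.fullmatch(text)))
```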

Advanced - Operator precedence, Order of operations

A complete pattern may include many individual keywords and many expressions. How do you know which keywords go together and where one expression stops and another begins? If in doubt, just enclose every expression in parentheses. But EasyPattern has rules for combining keywords into expressions, so parentheses aren't always required. The traditional way of expressing these rules is to list the "precedence" of various operators or terms, from highest to lowest:

  • (...), including numbered groups
  • [or] for character sets and single-character literals
  • [not]
  • quantity specifiers ( oneormore, 2+, 3..7, etc)
  • character set keywords (e.g. letter, digit) and single-character literals
  • multi-character literals
  • [or] as alternative, for groups and multi-character literals

Items with high precedence don't need parentheses; they group together automatically. For example, let's build a pattern step-by-step using the "high precedence" operators:
 [letter or digit] -- "or" for character set keywords
 [letter or digit or '.'] -- and single-character literal
 [letter or digit or '.' or <!?>] -- and arbitrary set
 [not letter or digit or '.' or <!?>] -- reverse the meaning with not
 [1+ not letter or digit or '.' or <!?>] -- add a quantity specifier
 [1+ (not (letter or digit or '.' or <!?>))] -- if you like parentheses, though the meaning is the same

Adding lower precedence terms before, after or both doesn't change the grouping, though the expression is long enough that you may find a pair of commas, brackets, or parentheses helpful. As long as you understand how EasyPattern is doing the grouping, it doesn't matter whether you choose commas, brackets or parentheses. If the parentheses are added around something that is already a group, they don't change the meaning.
 [punctuation 1+ not letter or digit or '.' or <!?> symbol]
 [punctuation, 1+ not letter or digit or '.' or <!?>, symbol] -- same meaning but easier to read
 [punctuation][1+ not letter or digit or '.' or <!?>][symbol] -- same meaning
 [punctuation (1+ not letter or digit or '.' or <!?>) symbol] -- same meaning

Remember, commas and brackets don't change the meaning, only the look. If you put them in the middle of high precedence terms, you might confuse yourself:
 [punctuation 1+ not letter][or][digit or '.' or <!?> symbol] -- same meaning but HARDER to read
 [punctuation 1+ not letter, or, digit or '.' or <!?> symbol] -- same meaning but HARDER to read

Only parentheses change the meaning:
 [(punctuation 1+ not letter) or (digit or '.' or <!?> symbol)] -- different meaning

Note that [or] for character sets and [or] as alternative have opposite precedence. See Character Sets and Alternatives (above) for details & examples.
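
The step-by-step build-up of high precedence terms collapses, in conventional regex, into one negated character class with a quantifier. A Python sketch, assuming ASCII letters and digits:

```python
import re

# [1+ not letter or digit or '.' or <!?>]
# "or" builds the set, "not" negates the whole set, "1+" repeats it.
run = re.compile(r"[^A-Za-z0-9.!?]+")

print(run.findall("ab, cd -- ef!"))  # [', ', ' -- ']
```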

EasyPattern vs. perl regex or grep

At its core, EasyPattern uses "regular expression" technology that is similar to the "regex" or "grep" tools that originated on UNIX. EasyPattern's primary benefit is that the patterns are much easier to read and write.

For those who have some experience with regex, here are a few specific differences:

  • Quantity is specified as a prefix rather than a suffix. We believe prefix notation is much more natural.
     e.g. [1+ digit] rather than "[0-9]+"
  • Parentheses groups are not automatically numbered. Drawback (to some): you have to include a number if you want to refer to that matched portion. Benefits: the parentheses that are there just for logical grouping don't get numbered.
  • No backslashes are required to "escape" special characters (instead, EasyPattern provides keywords such as [rightBracket]). Benefit: Other pattern languages already use backslash as an escape character so extra backslashes make patterns even more difficult to read.
  • EasyPattern includes keywords for many character sets that require a custom bracketed set in regex, e.g. punctuation, whitespace, paragraph, column, etc.
  • EasyPattern keywords generally include Macintosh-specific characters, e.g. [letter] includes letters with umlauts and other diacritical marks
  • EasyPattern can combine character sets with [or] (as well as use [or] for alternatives).
  • EP's [character] or [char] will match any character; the "equivalent" in some products will match anything except carriage return. If you want any character except a return (or formfeed), use [paragraphChar]; that is, any character that could appear in a paragraph. Of course, [not return] works too.
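
The last difference can be seen in Python's `re`, where the dot excludes the linefeed by default and needs the DOTALL flag to match every character. The sketch below also shows an approximate equivalent of [paragraphChar], based on the [paragraphDelimiter] definition given earlier (formfeed or return).

```python
import re

# [character] matches anything, like "." with DOTALL
any_char = re.compile(r".", re.DOTALL)

# [paragraphChar] = [not (formfeed or return)]
paragraph_char = re.compile(r"[^\f\r]")

print(len(any_char.findall("a\rb\nc")))   # 5
print(paragraph_char.findall("a\rb"))     # ['a', 'b']
print(re.findall(r".", "a\nb"))           # ['a', 'b']  (default dot skips \n)
```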