Beginning Regular Expressions
Format: PDF / Kindle (mobi) / ePub
This book introduces the various parts of the construction of a regular expression pattern, explains what they mean, and walks you through working examples showing how they work and why they do what they do. By working through the examples, you will build your understanding of how to make regular expressions do what you want them to do and avoid creating regular expressions that don’t meet your intentions.
Beginning chapters introduce regular expressions and show you a method you can use to break down a text manipulation problem into component parts so that you can make an intelligent choice about constructing a regular expression pattern that matches what you want it to match and avoids matching unwanted text.
To solve more complex problems, you should set out a problem definition and progressively refine it to express it in English in a way that corresponds to a regular expression pattern that does what you want it to do.
The second part of the book devotes a chapter to each of several technologies available on the Windows platform. You are shown how to use each tool or language with regular expressions (for example, how to do a lookahead in Perl or create a named variable in C#).
Regular expressions can be useful in applications such as Microsoft Word, OpenOffice.org Writer, Microsoft Excel, and Microsoft Access. A chapter is devoted to each.
In addition, tools such as the little-known Windows findstr utility and the commercial PowerGrep tool each have a chapter showing how they can be used to solve text manipulation tasks that span multiple files.
The use of regular expressions in the MySQL and Microsoft SQL Server databases is also demonstrated.
XML is used increasingly to store textual data. The W3C XML Schema definition language can use regular expressions to automatically validate data in an XML document. A chapter on W3C XML Schema demonstrates how regular expressions can be used with the xs:pattern element.
Chapters 1 through 10 describe the component parts of regular expression patterns and show you what they do and how they can be used with a variety of text manipulation tools and languages. You should work through these chapters in order and build up your understanding of regular expressions.
The book then devotes a chapter to each of several text manipulation tools and programming languages. These chapters assume knowledge from Chapters 1 through 10, but you can dip into the tool-specific and language-specific chapters in any order you want.
Characters Change Meaning in Different Contexts

Another reason why people find regular expressions confusing is that individual characters, or metacharacters, can have significantly different meanings depending on where you use them. For example, the ^ metacharacter can signify the beginning of a line in regular expressions in some languages. However, in those same languages the same ^ metacharacter can, when used inside a character class, signify negation. So the regular expression pattern ^and
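The two meanings of ^ described above can be seen side by side in a short sketch. Python is used here only for illustration; the word list and variable names are hypothetical and do not come from the book.

```python
import re

# Hypothetical test words, chosen only to illustrate the two meanings of ^.
words = ["and", "android", "end", "bend"]

# Outside a character class, ^ anchors the match to the start of the string,
# so ^and matches only words that begin with "and".
starts_with_and = [w for w in words if re.search(r"^and", w)]

# Inside a character class, a leading ^ negates the class:
# [^a]nd matches any one character other than "a", followed by "nd".
not_a_then_nd = [w for w in words if re.search(r"[^a]nd", w)]

print(starts_with_and)
print(not_a_then_nd)
```

The first list holds the words that begin with "and"; the second holds the words in which "nd" is preceded by some character other than "a".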
www.canadapost.com.

T3Z 3N7
D8R 8C4
RR4 88D
P9C 3Q4
V2X 3RU
V5R8S4
M8N 7LK
J1M6U4
S1B 2R9
88B U2L
D7R 7L2
F9Z6G4

A careful look at the sample data indicates that some lines have sequences of three alphanumeric characters, followed by a space character, followed by three more alphanumeric characters. Other lines have no space character. So if you are to detect all valid character sequences, you must allow for the optional nature of the space character. First, let’s design a pattern that will
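As a rough sketch of the refinement described above, the following Python code first matches three alphanumeric characters, an optional space, and three more alphanumeric characters, and then tightens the pattern. The stricter letter-digit-letter, digit-letter-digit shape is an assumption about the general structure of Canadian postal codes and is not necessarily the pattern the book goes on to build. The names SAMPLE, LOOSE, and STRICT are hypothetical.

```python
import re

# The sample data from the text, one entry per line.
SAMPLE = [
    "T3Z 3N7", "D8R 8C4", "RR4 88D", "P9C 3Q4", "V2X 3RU", "V5R8S4",
    "M8N 7LK", "J1M6U4", "S1B 2R9", "88B U2L", "D7R 7L2", "F9Z6G4",
]

# First attempt: three alphanumerics, an optional space, three alphanumerics.
# The "?" after the space makes the space character optional.
LOOSE = re.compile(r"[A-Z0-9]{3} ?[A-Z0-9]{3}")

# Refinement (assumption): letter digit letter, optional space,
# digit letter digit. This checks structure only, not whether a code exists.
STRICT = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")

# fullmatch requires the whole entry to fit the pattern.
loose = [s for s in SAMPLE if LOOSE.fullmatch(s)]
strict = [s for s in SAMPLE if STRICT.fullmatch(s)]

print(len(loose), "entries fit the loose pattern")
print(strict)
```

The loose pattern accepts every entry in the sample, including ones such as RR4 88D, which is why the problem definition needs further refinement before the pattern is useful for validation.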
possible to match the same digits using other, less succinct regular expression patterns. The techniques to do this involve alternation, which is described in Chapter 7, or character classes, which are described in Chapter 5. You saw an example of using a character class a little earlier in this chapter. However, a couple of simple examples using alternation are shown here so you can see how to handle the matching of digits or nondigits in implementations that do not support the \d and \D
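The equivalence described above can be sketched in Python as follows. The test string and variable names are hypothetical. The re.ASCII flag is used so that \d and \D are limited to the ASCII digits 0 through 9, which makes them directly comparable to the longer patterns.

```python
import re

# Hypothetical test string.
text = "Room 42, floor 7"

# Shorthand metacharacters, restricted to ASCII digits for a fair comparison.
shorthand_digits = re.findall(r"\d", text, flags=re.ASCII)
shorthand_nondigits = re.findall(r"\D", text, flags=re.ASCII)

# Alternation: list every digit as an option. (?:...) groups the options
# without capturing, so findall returns the whole match.
alternation_digits = re.findall(r"(?:0|1|2|3|4|5|6|7|8|9)", text)

# Character classes: a range for digits, a negated range for nondigits.
class_digits = re.findall(r"[0-9]", text)
class_nondigits = re.findall(r"[^0-9]", text)

print(shorthand_digits, alternation_digits, class_digits)
```

All three digit patterns find the same characters in this string; the alternation form is simply the least succinct way to say it.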
as you can see in Figure 6-4.

Figure 6-4

When multiline mode is used, the position after a Unicode newline character is treated in the same way as the position that comes at the beginning of the test file. A Unicode newline character matches any of the characters or character combinations that can be used to express the notion of a newline. Not all programming languages support multiline mode. How individual programming languages treat this issue is discussed and, where
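A minimal Python sketch of the effect of multiline mode follows. The test text and variable names are hypothetical. Note that implementations differ in what they count as a newline: Python's re.MULTILINE flag treats only the "\n" character as a line boundary, which is narrower than the Unicode newline behavior described above.

```python
import re

# Hypothetical three-line test text.
text = "first line\nsecond line\nthird line"

# Without multiline mode, ^ matches only at the very start of the text.
default_mode = re.findall(r"^\w+", text)

# With multiline mode, ^ also matches at the position after each "\n".
multiline_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# In multiline mode, $ likewise matches just before each "\n"
# as well as at the end of the text.
multiline_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(default_mode)
print(multiline_starts)
print(multiline_ends)
```

The same pattern returns one match in the default mode and one match per line in multiline mode, which is the difference the passage describes.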
processing. For example, in a Web form you will want to check that a credit card number is correctly structured or that a postal code is correctly formed. In a lengthy document, you might want to find a hazily recalled URL for an important source of information. You might want to convert HTML code so that it conforms to the rules of Extensible Markup Language (XML) syntax and complies with company policy to use XHTML code. You might want to check that user input into a Windows application