Wrox Programmer Forums
BOOK: Beginning Regular Expressions
This is the forum to discuss the Wrox book Beginning Regular Expressions by Andrew Watt; ISBN: 9780764574894
You are currently viewing the BOOK: Beginning Regular Expressions section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
May 28th, 2006, 05:48 AM
ufo
Authorized User
Join Date: Apr 2006
Location: , , .
Posts: 24
Thanks: 0
Thanked 0 Times in 0 Posts


Ok, if I’m making mistakes, bash me. I will change the post if I do, but I think this book is a disaster. It’s not just a disaster, it’s disgracefully bad. There are several major areas in which I feel betrayed after buying this book (in no specific order of importance):
  • Structure
  • Stretching 20 pages of content over 740 pages
  • Errors
  • The fact that this book has been out for a year and no errata have been submitted

Structure is probably a matter of taste, and when you are reading for relaxation it doesn’t really matter. But for learning regular expressions (or anything else, for that matter) and then mastering them, I think it is imperative to get a structured overview of the syntax. In this book, for example, you will not find a table showing all the regex metacharacters and their meaning, not even in an appendix. That’s a disaster, I think. The theory is spread out over about 200 pages with tons of screenshots and ‘try it out’ sections, leaving me completely confused in the end. There is no way of summarizing all that in your head and remembering it, because there is almost no structure in the text.

Stretched content
To be honest, for me most of this book was page filling. To start with, the 448 screenshots are completely redundant, and they also make the text less organised. I think they take about 1/3 to ½ of a page on average, so you get about 150 pages of screenshots to learn patterns that do text recognition… What is the added value of screenshots for a text-based technology? What is the added value of screenshots of console applications?
Furthermore, I think the ‘how does it work’ sections should have been combined into a decent chapter about the internals of a regex engine, because their value quickly decreases.
I think the book is also far from complete. For the most part regular expressions are not difficult at all; they just look like Chinese if you are not acquainted with them…
There are however some things about regex that are less obvious, and those are the things one would buy a book for. None of these things are properly covered in this book. Examples are: greedy and ungreedy quantifiers and being able to predict their behaviour when matching a pattern multiple times; performance issues and good practices, such as character classes being faster than alternatives between parentheses; performance optimisation of nested patterns with quantifiers; recursive patterns; lookahead and lookbehind, which are not clearly explained; conditional groups… There are undoubtedly more issues that one needs to read about before understanding, all absent from this book… Overall, I would say that I learned more from the brief intro to regex in the PHP manual than from this 740-page book. Now that’s disappointing, because it wastes my money and time, and I am still no wiser in the end. Needless to say, after reading the errors below, you will understand that the author doesn’t know any more about this than any beginner.

Now, all of the above can be taken as subjective, dependent on taste, but the errors jump off the page in this book. That’s not subjective; there is no way around it. The author has not mastered regular expressions any more than a beginner has. I will give a list of errors, mostly from between pages 180 and 250, simply because in the beginning of the book I was mainly flipping pages because it wasn’t really interesting, and the second part of the book (after page 250) I’m just not gonna bother reading, because it concerns specific software, and no doubt the respective documentation of that software is way better than this book.

Page 58:
To look for lines that contain matches for colour and color using the findstr utility, enter the following
at the command line:
findstr /N colo*r Colors.txt

The expression does not match colour; it actually matches ‘col’ followed by zero or more occurrences of the character ‘o’, followed by an ‘r’.
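To see the difference in practice, here is a minimal sketch using Python's re module instead of findstr (an assumption on my part; the book uses findstr, but `*` has the same "zero or more of the previous character" meaning in both). The word list and the alternative pattern `colou?r` are illustrative and not taken from the book.

```python
import re

# Illustrative test words (not the contents of the book's Colors.txt)
words = ["color", "colour", "colr", "colooor"]

# colo*r: "col", then zero or more "o", then "r"
star = [w for w in words if re.search(r"colo*r", w)]

# colou?r: "colo", then an optional "u", then "r"
optional_u = [w for w in words if re.search(r"colou?r", w)]

print(star)        # ['color', 'colr', 'colooor']
print(optional_u)  # ['color', 'colour']
```

So `colo*r` finds "color" only by accident and never finds "colour", while an optional `u` covers both spellings.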

Page 75:
Here we get the definition of the ‘.’ metacharacter. This is really basic; it is a definition. But the regex beginner here ends up learning a wrong definition:

The period is one of the most broadly scoped metacharacters. It can match any alphabetic character,
whether lowercase or uppercase, as well as any numeric digit.

This is just completely wrong. The period matches any character except newline. In some languages, however, you can set a modifier to make it match newline too. Don’t try to tell me the author didn’t know what alphabetic means, because only a few pages further (p. 82) he explains the ‘\w’ metacharacter as:

The pattern \w indicates that an ASCII alphabetic character (upper- or lowercase A through Z or a
through z), a numeric digit, or an underscore is to be matched.
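The two definitions are easy to tell apart with a quick test. This is a small sketch in Python's re module (my choice of engine, not one of the tools used in that chapter), with a handful of made-up sample characters; `re.DOTALL` is Python's name for the "dot matches newline" modifier.

```python
import re

samples = ["a", "Z", "7", "_", "#", " ", "\n"]

# "." on its own: every sample except the newline
dot_default = [c for c in samples if re.fullmatch(r".", c)]

# "." with the DOTALL modifier: the newline is matched too
dot_dotall = [c for c in samples if re.fullmatch(r".", c, flags=re.DOTALL)]

# "\w": only letters, digits and the underscore
word_char = [c for c in samples if re.fullmatch(r"\w", c)]

print([repr(c) for c in dot_default])
print([repr(c) for c in dot_dotall])
print([repr(c) for c in word_char])
```

The period accepts "#" and the space, which the book's definition of it leaves out; that narrower description is much closer to `\w`.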

Page 180:
On page 182, the author explains to us that the expression ‘(a|ab)’ will never match ‘ab’, because as soon as the ‘a’ gets matched, ‘ab’ won’t be checked anymore. This is correct, but unfortunately the author only discovered this when he got to p. 182, because a couple of pages earlier he had already fallen into this trap himself:

Match the sequence of characters D, o, c, t, o, and r OR match the sequence of characters D and r
OR match the sequence of characters D, r, and . (a period).
The following pattern will satisfy the requirements specified in the problem definition:

This will never match ‘Dr.’ for this reason.
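The effect is easy to reproduce in a backtracking engine. The sketch below uses Python's re module; the `Doctor|Dr|Dr\.` pattern is only my reading of the problem definition quoted above (alternatives in the order Doctor, Dr, Dr.), not necessarily the book's exact pattern, and the sample text is made up.

```python
import re

# Alternatives are tried left to right: "a" succeeds, so "ab" is never tried.
short = re.match(r"(a|ab)", "ab").group(0)

text = "Dr. Smith"  # hypothetical sample text

# Assumed pattern, alternatives in the order given by the problem definition
as_defined = re.match(r"(Doctor|Dr|Dr\.)", text).group(0)

# Putting the longer alternative first lets the period be consumed
reordered = re.match(r"(Doctor|Dr\.|Dr)", text).group(0)

print(short)       # a
print(as_defined)  # Dr
print(reordered)   # Dr.
```

Reordering the alternatives, or writing `Dr\.?`, is enough to pick up the period.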

Page 204:
Try It Out Lookahead in the Same Document
1. Open RegexBuddy, and select the Match tab in the upper pane.
2. Enter the regular expression pattern SQL Server(?:.*MySQL) in the Match tab, and select the
Test tab in the middle pane.
3. Click the Open File option, navigate to C:\BRegExp\Ch08, and open the Databases.txt file.
Adjust the navigation if you installed the code downloads elsewhere.
4. Click the Find First button, and inspect the highlighted text in the lower pane.
Figure 8-5 shows the results. Notice that RegexBuddy highlights the whole text from the match
of SQL Server to the first appearance of the sequence of characters MySQL. Strictly speaking,
only the string SQL Server is matched by the regular expression. The highlighted area, in a
convention that is also followed by PowerGrep, also highlights the text specified by the lookahead

This is actually a really disgraceful one. Here the author claims that two different software packages are wrong, when it is blatant that he is not even doing a lookahead. The pattern matches ‘SQL Server’ followed by zero or more characters on the same line (unless you set a modifier to make ‘.’ match newline too), followed by ‘MySQL’. So the pattern doesn’t look ahead: the ‘(?:’ actually opens a non-capturing group. Also, it doesn’t try to match the rest of the document, only the rest of the line. This one, Mr. Watt, earns you a proper spanking at the hands of the authors of RegexBuddy and PowerGrep…
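For comparison, here is a minimal sketch of the non-capturing group next to a real lookahead, using Python's re module rather than RegexBuddy (an assumption; the `(?:...)` and `(?=...)` syntax is the same). The sample line is made up and is not the book's Databases.txt.

```python
import re

line = "SQL Server and Oracle, then MySQL and more"  # hypothetical sample

# (?:...) is a non-capturing group: what it matches is part of the match
non_capturing = re.search(r"SQL Server(?:.*MySQL)", line).group(0)

# (?=...) is a lookahead: it only checks, and consumes nothing
lookahead = re.search(r"SQL Server(?=.*MySQL)", line).group(0)

# Without a dot-matches-newline modifier, ".*" stops at the end of the line
across_lines = re.search(r"SQL Server(?:.*MySQL)", "SQL Server\nMySQL")

print(non_capturing)  # SQL Server and Oracle, then MySQL
print(lookahead)      # SQL Server
print(across_lines)   # None
```

The highlighting described in the book is exactly what the non-capturing version should produce, so the tools were doing their job.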

Page 227:
Here the author gives a pattern to match a valid email address:


This one actually does work, surprise, on the examples he provides in his txt file. It follows the principle of ‘why do it the easy way if you can also do it the hard way’. The first part, ‘\w*(?<=\w)’, actually means zero or more word characters, of which the last one has to be a word character. That doesn’t even make sense in plain English. Of course one would write this as ‘\w+’, or one or more word characters.
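As a sanity check, the two fragments can be compared as whole-string tests. This is only a sketch in Python's re module with a few made-up inputs; it looks at this first fragment in isolation, not at the full email pattern.

```python
import re

samples = ["", "a", "abc", "ab_1", "ab-", "-"]

# The book's fragment: zero or more word characters, then a lookbehind
# asserting that the previous character is a word character
book_style = [bool(re.fullmatch(r"\w*(?<=\w)", s)) for s in samples]

# The plain version: one or more word characters
simpler = [bool(re.fullmatch(r"\w+", s)) for s in samples]

print(book_style)
print(simpler)
print(book_style == simpler)  # True
```

On these inputs both accept exactly the non-empty strings made up of word characters.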

Page 232:
Similarly, lookahead can reduce sensitivity. For example, suppose that you want to match all occurrences
of the character sequence John. The following pattern would match a word boundary, then the desired
character sequence John, and then check if the following character is a space character:

\bJohn(?= )

However, if the test text is as follows, the lookahead is too specific and causes what is likely to be a
desired match to fail:

I went with John, and Mary on a trip.

Modifying the lookahead to (?=\b) or (?=\W) would prevent the problem caused by the occurrence of
an unanticipated comma.

Note that here ‘(?=\b)’ doesn’t make sense, since it is the same as ‘\b’: a word boundary is not a character match but a position match, so wrapping it in a lookahead adds nothing.
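A short check on the book's own test sentence, sketched with Python's re module (my choice of engine), shows that the lookahead wrapper around `\b` changes nothing:

```python
import re

text = "I went with John, and Mary on a trip."

# Lookahead for a space: fails, because a comma follows "John"
space_only = re.search(r"\bJohn(?= )", text)

# \b wrapped in a lookahead versus a plain \b
wrapped = re.search(r"\bJohn(?=\b)", text)
plain = re.search(r"\bJohn\b", text)

print(space_only)                      # None
print(wrapped.group(0), plain.group(0))
print(wrapped.span() == plain.span())  # True
```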

Page 233:
For example, suppose you have a collection of HTML documents that include IP addresses, and your
task is to amend the style that the IP addresses are displayed in. Suppose that initially, IP addresses are
nested inside the start and end tags for HTML b elements, as in the following:

This should be: ‘<b>(\d{1,3}(?:\.\d{1,3}){3})</b>’
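A minimal sketch of how this corrected pattern could be used for the restyling task, in Python's re module. The HTML fragment and the replacement markup (`<i>` instead of `<b>`) are made up for illustration; the book's own replacement is not quoted here.

```python
import re

html = "<p>Server at <b>192.168.0.1</b> is down.</p>"  # hypothetical input

# Group 1 captures the whole address; (?:...) repeats ".digits" three times
pattern = r"<b>(\d{1,3}(?:\.\d{1,3}){3})</b>"

address = re.search(pattern, html).group(1)
restyled = re.sub(pattern, r"<i>\1</i>", html)

print(address)   # 192.168.0.1
print(restyled)  # <p>Server at <i>192.168.0.1</i> is down.</p>
```

Note that `\d{1,3}` still accepts values such as 999, so this finds things shaped like an IP address without validating them.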

Page 234:
Similar issues arise when handling data that includes information about qualifications. For example, if
a Doctor of Philosophy degree is of interest, it will often be written as PhD (no space character or
period character), Ph.D. (two period characters), or Ph. D. (one space character, two period characters).
To match the options just mentioned, a pattern such as the following would be satisfactory:
Ph\. ?D\.?

It includes the \. metacharacter twice with a ? quantifier, which matches each of the optional period
character(s) that can occur in some of the options. Depending on where the degree was obtained, the
form D.Phil. (two period characters) with option DPhil (no period characters) can also occur. To allow
for these additional forms, a pattern such as the following would be needed:

The first one doesn’t match ‘PhD’ and the second one does not match ‘Ph. D.’
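The first claim can be checked directly. The sketch below uses Python's re module; the relaxed pattern `Ph\.? ?D\.?` is my own suggestion for covering the three forms the book lists, not something from the book. The second pattern is not reproduced above, so it is not tested here.

```python
import re

forms = ["PhD", "Ph.D.", "Ph. D."]

# The book's first pattern: the first period is mandatory
book = [f for f in forms if re.fullmatch(r"Ph\. ?D\.?", f)]

# Suggested fix (an assumption): make the first period optional as well
relaxed = [f for f in forms if re.fullmatch(r"Ph\.? ?D\.?", f)]

print(book)     # ['Ph.D.', 'Ph. D.']
print(relaxed)  # ['PhD', 'Ph.D.', 'Ph. D.']
```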

Page 236:
Some European surnames have variant spellings too. For example, the surnames Van Nistelrooy
(with an intermediate space character) can also be spelled Van Nistelrooij or VanNistelrooy (with
no intermediate space character). So a pattern such as the following would be needed to match these
three spelling variants:
Van *Nistelroo(ij|y)

Of course, because some such surnames may sometimes be spelled with a lowercase v in van, the following
pattern might be more sensitive in some situations:
[vV]an *Nistelroo(ij|y)

The first thing to note is that the author allows more than one space character between ‘van’ and the name, which is not legal in Europe. Secondly, he says he is only gonna match the form without a space when the name ends in ‘y’. The correct pattern for this would be:
[vV]an( )?Nistelroo(?(1)(y|ij)|y)
Basically this makes sure the name can only end in ‘ij’ when there was a space.
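Conditional groups are not supported by every engine, but Python's re module does accept the `(?(1)yes|no)` syntax, so the pattern above can be tried out there. A minimal sketch with a few made-up spellings:

```python
import re

# Group 1 is the optional space; (?(1)...|...) branches on whether it matched
pattern = re.compile(r"[vV]an( )?Nistelroo(?(1)(y|ij)|y)")

names = [
    "Van Nistelrooy",
    "Van Nistelrooij",
    "VanNistelrooy",
    "VanNistelrooij",   # no space, so only "y" is allowed
    "Van  Nistelrooy",  # two spaces
    "van Nistelrooij",
]

accepted = [n for n in names if pattern.fullmatch(n)]
for name in accepted:
    print(name)
```

With the space present both endings are accepted; without it only the ‘y’ ending is, and the double space is rejected.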

Page 344:
When I had a quick look at the chapter on PowerGrep, to see if the author mentions the strange convention again, I stumbled upon another error:

Match a < character followed by the character sequence hr (either case), followed by optional whitespace
characters, followed by zero or more characters, followed by optional whitespace characters,
followed by an optional forward slash, followed by a > character.
A pattern corresponding to the preceding problem definition is shown here:

The above expression is actually the same as ‘’. Fortunately for Mr. Watt, case insensitivity is the default in PowerGrep. On the next page, however, there is another error:

Page 345:
If you simply want to find all correctly structured hr elements, this pattern should be close to 100 percent
sensitive. If the element is spread over several lines:

the pattern could be usefully modified to the following:

The \s metacharacter ensure that tab characters or newline characters are also matched.

I don’t own a copy of PowerGrep at the moment, but fortunately there is a screenshot on page 345 from which the reader can see that there is actually a checkbox saying ‘Dot matches newline’, which is without doubt highly preferable to the above expression. Sorry for ranting about the screenshots before, as it now occurs to me that they are there to compensate for the errors and gaps in the text.

Ok, undoubtedly there are many more errors in this book, but I’m tired. I’m going to close it until I need to light my stove.

Don’t buy it.

This worries me most:
About the Author
Andrew Watt is an independent consultant and experienced author with an interest and expertise in
XML and Web technologies. He has written and coauthored more than 10 books on Web development
and XML, including XPath Essentials and XML Schema Essentials. He has been programming since 1984,
moving to Web development technologies in 1994. He’s a well-known voice in several influential online
technical communities and is a frequent contributor to many Web development specifications.

Mister Watt, I don’t want to hurt your feelings, but it occurs to me that only two things could have happened. Either you wrote this book between your soup and potatoes, in which case you deserve a slapping, or you are just not capable of programming. The latter is nothing to be ashamed of. Everybody has different talents, and someone at Wrox should have told you that maybe it just wasn’t a good idea. In any case, please, please don’t contribute to web specifications. They are very important, and if they are not up to standard, they cost us hours and weeks of lost time to get things working. It is imperative that only the brightest people write these things. Neither you nor I qualify for this… And please understand that I don’t write this to insult you…

As for the people at Wrox: please withdraw this book from sale, because everyone who buys it is being ripped off… and the following people should also reconsider their positions:
The development editor, Marcia Ellett, was great to work with and did a lot to tidy up my prose to make
a better read for all readers of this book. In addition, her eagle eyes spotted some minor slips that had
slipped through the authorial net. Thanks, Marcia.

Doug Steele, a fellow Microsoft MVP, was technical editor and carried out a tactful and painstaking job
and picked up many little things that the smoke from the author’s midnight oil seemed somehow to
obscure. Thanks, Doug.

Darren Niemke, another MVP, helped with technical editing of a number of chapters. Thanks, Darren.

All people whose eagle eyes must have been glazed from smoking too much pot, because they didn’t notice the major slips even when these hit them in the forehead.

Now, I don’t want my money back for this one, but please use the money to pay a reviewer who is actually capable of doing their job, and harsh enough to tell the truth.
June 16th, 2006, 10:29 AM
ufo


Does anyone at Wrox care enough to at least correct the errata page, or respond to the bug reports?
June 16th, 2006, 08:01 PM
jminatel
Wrox Staff
Points: 17,906, Level: 58
Activity: 0%
Join Date: May 2003
Location: Indianapolis, IN, USA.
Posts: 1,906
Thanks: 62
Thanked 139 Times in 101 Posts

The author is aware of your post and errata list, as is the editor (me). He is working on getting to the list and potentially on a response to some of your broader issues about the book. But he's finishing up another contract for someone else at the moment, and his obligations on that unfortunately have to be his first priority now.

As for the observations about some of the other folks involved with the list, two things: 1. Authors usually absolve the editors of any responsibility for errors remaining in the book, unless of course the author thinks we introduced the errors. When a TE or DE reports a problem, we usually give the author the final say in how to verify and fix it. 2. Personal attacks aren't likely to produce many positive results, just flame wars, which we highly discourage and douse here. We appreciate passionate feedback, but please stay focused on the issues, not on unsubstantiated accusations, even if they are partially in jest. Thanks.

Jim Minatel
Senior Acquisitions Editor
Wiley Technology Publishing
WROX Press
Blog: http://wroxblog.typepad.com/
Jim's Book of the week: No book this week - Donate to the Red Cross!
June 19th, 2006, 05:54 AM
ufo

Dear Mr Minatel;

I understand your concern, but maybe you can understand my point of view: I have spent 40 euros on a book that looks pretty good at first sight but turns out to be absolutely useless. If I then have to read that the author is so arrogant as to blame other people's software for being buggy, which isn't the case, and also writes in a style like the "eagle eyes that might have missed some minor slip", then my tea water starts to boil over. Where it concerns Mister Watt, I was genuinely concerned that this kind of person writes our web specifications, although I have done a search and can't find his name on any web specifications except for some feedback emails like my post here, so it's probably a storm in a teacup...
Thanks for replying at least, though I think that when something is as buggy as this, it would be a token of respect to process the bug report, which in the end only takes about an hour to verify. Even in the busiest of times, no doubt anyone could free up an hour in a month if they really cared. Apart from that, I don't want to set your agenda...

June 19th, 2006, 07:45 AM
jminatel
Wrox Staff

I understand completely and I assure you that I at least press our authors to make resolving errors a high priority. I apologize that this hasn't happened as quickly as it should have.

Jim Minatel
February 12th, 2007, 06:04 AM
ufo


I see about 9 months have passed now. I was wondering how Mr. Watt's other contracts are going. Undoubtedly a man like him must be very busy. Personally I stopped buying Wrox books. I have been reading Knuth's The Art of Computer Programming, and actually I was amazed by the quality of this work...
On top of that, there are some lessons in that book for Mr. Watt. Well, there are without doubt a lot of lessons there, but take the foreword for example. Mr. Knuth offers $2.56 to everyone who finds an error in his book. At that rate I would have almost earned my money back on this book. Anyways, I hope your sales are good here at Wrox, and for the readers I hope they aren't.



Copyright (c) 2020 John Wiley & Sons, Inc.