Wednesday, February 24, 2010

I’m using my degree … finally: A book review.

Note: This review may contain spoilers. Where possible I will point them out in advance.

A few weeks ago, this guy I work with, we’ll call him “Sam” came to me and said, “Here’s a book I want you to look at. It’s got some stuff in it about flexible pattern matching in strings.”

I looked down at a bright yellow book, called “Flexible Pattern Matching in Strings”.

“Ok, Sam.” I argued.

Sam continued, “Once upon a time I implemented the “Set Horspool” algor …”

You know what? Let’s call him “Ted”. “Sam” is just not working for me.

“… Algorithm, but lost the source code. I want you to read this book and find the best way known to man or beast to search for any of a list of strings within a target string,“ Ted went on to explain.

At first, the idea of reading what looked like a textbook didn’t appeal to me. But Ted sweetened the deal by telling me there were algorithms described inside the book. I love algorithms. Ted knows that. “What the heck, give me that book Sam. I mean Ted!”

Several white board drawings and unrelated personal anectodes later, Ted left me alone with the little yellow book.

A voice inside my head said, “This is your chance Freddie. The opportunity you’ve worked for. Don’t blow it.”

I swiveled abruptly in my office (cubicle) chair. I hadn’t immediately realized the voice was internal. “What do you mean, “opportunity”?

Voice: You know as well as I do what I mean.

Me: Then why don’t you fill us both in?

Voice: Seriously?

Me: Please?

Voice: No.

I may never know what the voice meant. But I knew that this was a chance to use my formal training in computer science. Taking a closer look at the book, I notice it’s not bright yellow, but more pale. Hmm. Must be the lighting. I carefully open the book. Ted is pretty anal about his stuff so I don’t want to get spaghetti sauce on it or anything. As I begin to read, I realize what a profoundly wonderful book this is. Well, after about 1 and a half chapters. I had to kind of skim over chapter one, the elementary crap about bit-parallelism and bit operations and the labeled rooted tree and trie crap (yawn) and get right to the good stuff in the middle of chapter 2. This is where the author struts his stuff. Showcasing his talent, he masterfully paints the tale of flexible pattern matching history. From its humble beginnings in a sleepy midwestern village where the controversial Knuth-Morris-Pratt idea came to prominence all the way up to jaw dropping discoveries like Boyer-Moore, Horspool etc. From start (1 and a half chapters in) to finish (about chapter 5 or so) You learn the truth about algorithms you've heard about your whole life but never believed actually existed.

SPOILER ALERT!!!: Turns out, Horspool is an improvement over the original Boyer-Moore idea. I know, right?

Let me tell you, if you’ve ever had a need to match patterns flexibly, or even if you just consider yourself a weekend flexible pattern in strings matcher, here’s your book. I’ll warn you, though. If you do get this book, keep your eye on it. People will be “borrowing” it from your cube on a regular basis. Yeah, It’s that good.

Whatever you do, don’t skip the section on the Backward Nondeterministic Dawg Matching Algorithm. I won’t spoil it for you, but I will ask that you thank me later for the heads up.

BIG HUGE SPOILER ALERT, AND THE REASON FOR TED’s VISIT IN THE FIRST PLACE: Though the Horspool Algorithm is great for finding one particular substring, its multiple string version, “Set Horspool” sucks ass. Thankfully, there’s an answer. In the late 80’s, early 90’s a couple of guys by the unfortunate names of Udi Manber and Sun Wu describe what turns out to be one of the most efficient ways to find any of a set of substrings within a certain string. It is named after its inventors. By now, it should be obvious I’m talking about the “Wu-Manber Algorithm!” Ok, be honest. Who thought it was “Manber-Wu?” Silly reader!

So I read the book. Got the info I needed and wrote a program that reads in a list of words and looks for their occurrence in some text. And it does it really really fast. Thanks Little Yellow book!

By the way. The reason Ted wanted this thing? Well, here at the company, we have lots of information. We also have a list of potentially offensive words. We like to run the information through looking for these 400 or so words. I ran this blog post through it. Results below:

---
ix = 0, match found: CRAP
ixTemp = crap
---
---
ix = 0, match found: CRAP
ixTemp = crap
---
---
ix = 1, match found: HATE
ixTemp = hatever
---
---
ix = 0, match found: SUCK
ixTemp = sucks
---
---
ix = 0, match found: SUCKS
ixTemp = sucks
---
---
ix = 0, match found: ASS
ixTemp = ass
---

5 comments:

Lorraine Mutschler said...

You're kidding,. right? Tell me you're kidding . . . who talks like this?

Flintstone R Cube said...

I would like to simultaneously thank you and apologize to you for reading this. The story is loosely based on actual events. Most of it is my weak attempt at satire. I was conflicted in college about whether to be an English or computer science major. I followed the money. If I took more time with these posts, my intentions might be clearer. But that sounds like work.

I really do like a good algorithm. "Ted" really does know it. The algorithm names (which are true) are so unimaginative that I wanted to comment. I'll never be as smart as Wu or Manber, but I bet I could come up with a better name for their algorithm.

Note: If you follow the amazon link in the post and read the (2) reviews of the book, you'll see who talks like this. But yeah, I was just having a laugh at everyone else's expense. Again, thanks and sorry.

brady said...

OK, I've got this one. Speaking for the entire breadth of teh internets, that bit on the Backward Nondeterministic Dawg Matching Algorithm is totally awesome!

Thank you very much. Indeed, after a hearty cry with a productive expectoration of sputum, my spirit soared from this touching story. Bravo!

brady said...

I double dog dare you to run that comment through your Flexible Pattern Matching algorithm.

Flintstone R Cube said...

Never one to back down from a double dog dare, I ran it, but It may not be working right. Or maybe I typed in the slang/english dictionary accidentally.

---
ix = 0, match found: HOCK
ixTemp = expectoration
---
---
ix = 0, match found: LOOGIE
ixTemp = sputum
---