Regular Expression in C++

This article summarizes the regular expression syntax and usage in C++. Most of the content of this article is from Bo Qian’s modern C++ tutorial series: https://www.youtube.com/playlist?list=PL5jc9xFGsL8FWtnZBeTqZBbniyw0uHyaH

If you are new to C++, I highly recommend following his channel. His tutorials are well organized and easy to follow. If you only want a quick overview of the content, just scroll down. Let’s start.

Regular expression: a sequence of characters that define a search pattern. Usually, such patterns are used by string searching algorithms for “find” or “find and replace” operations on strings, or for input validation. (Source: Wikipedia)

Since C++11, the standard library supports six regular expression grammars:

– ECMAScript (default)
– basic
– extended
– awk
– grep
– egrep

ECMAScript is the default grammar, and it is the one this article focuses on.

1. Regular Expression Syntax

To use regular expressions in C++, first include the regex header:

#include <regex>

Then you can define a regular expression with the following syntax:

std::regex e("abc");

If you want to use a different grammar, pass a flag when you construct the regular expression:

std::regex e("abc", std::regex_constants::grep); // Switch the grammar from ECMAScript to grep.

The following code provides an overview of how to use regular expressions in C++. You can paste it into your IDE, try the patterns one at a time, and check the results yourself.


2. Sub-match in Regular Expression

We can store the match result in an smatch object and then read the details of the match, including the substring captured by each group defined in the regular expression. The general syntax and meaning of smatch are summarized below:

std::match_results<> – stores the details of a match
smatch – std::match_results specialized for matches in a std::string

smatch m;
m[0].str() – The entire match (same as m.str() and m.str(0))
m[1].str() – The substring that matches the first group (same as m.str(1))
m[2].str() – The substring that matches the second group
m.prefix() – Everything before the first matched character
m.suffix() – Everything after the last matched character

The following code provides an overview of how to use smatch:

[Screenshot: RE_Results, the result of running the above code]


3. Iterators in Regular Expression

If we have defined a string like the following:

std::string str = "zhangxm01@gmail.com; zhan@163.com; zhan_j@yahoo.com";

and we want to extract every e-mail user name and domain name in this string, what should we do? Unfortunately, the code from section two won’t work, because a single search stops at the first match and only extracts the first e-mail and domain name.

In this case, we need a regular expression iterator (std::sregex_iterator) or a token iterator (std::sregex_token_iterator). The following code shows how to use the regular expression iterator:

Try it yourself!

[Screenshot: RE_Results_02, a sample of the result]

An alternative is to use a token iterator:

The difference between the two is that a regular expression iterator points to a complete match result, so it exposes one member per sub-match (pos->str(1), pos->str(2), and so on). A token iterator points to a single sub-match only, which is why pos->str() takes no argument.


4. Regular Expression Replace

We can use regex_replace to replace every match of a regular expression with a new string. The replacement string can refer to the captured groups as $1, $2, and so on.

Try the following code yourself:


Thank you for reading. Have a nice day!
