The Basics of Regular Expressions (Regex)

Thiyagu Arunachalam
Oct 11, 2023



Regular Expressions, commonly known as Regex, are a powerful tool for working with text.

They allow you to search for and manipulate text based on patterns. Whether you're a budding programmer, a data enthusiast, or simply curious about the magic of text processing, learning the basics of Regex can be a valuable skill.

In this article, we'll break down the fundamentals of Regex in a beginner-friendly manner to help students understand and use this tool effectively.

What Is Regex?

At its core, Regex is a sequence of characters that defines a search pattern. This pattern is used to match strings in text, providing a versatile way to find, validate, and extract information.

Think of it as a secret code for finding specific pieces of text within a larger body of text.
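
To make "find, validate, and extract" concrete, here is a minimal sketch. It assumes Python and its standard-library `re` module (the article itself is not tied to any language), and the sample sentence and the `\d+` pattern are made up for illustration.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Order 66 shipped on Monday; order 7 ships on Friday."

# Find: locate the first run of digits anywhere in the text.
first = re.search(r"\d+", text)
print(first.group())  # 66

# Validate: check that an entire string consists only of digits.
print(bool(re.fullmatch(r"\d+", "2023")))   # True
print(bool(re.fullmatch(r"\d+", "20x23")))  # False

# Extract: collect every run of digits in the text.
print(re.findall(r"\d+", text))  # ['66', '7']
```

The same pattern serves all three jobs; only the function that applies it changes.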

The Basic Building Blocks

1. Literals: In Regex, ordinary characters such as letters and digits are treated as literals and match themselves. For instance, the regex "hello" matches the text "hello" wherever it appears, including inside a longer word such as "othello".

2. Metacharacters: These are special characters with unique meanings in Regex. For example:
- `.` (period): Matches any character except a line terminator.
- `*` (asterisk): Matches zero or more of the preceding element.
- `+` (plus): Matches one or more of the preceding element.
- `?` (question mark): Matches zero or one of the preceding element.
- `|` (pipe): Acts like a logical OR, allowing you to match one of two or more patterns.

3. Character Classes: A character class is used to match any one of a set of characters. For example:
- `\d`: Matches any digit (0-9; some engines also accept other Unicode digits).
- `\w`: Matches any word character (letters, digits, or underscore).
- `\s`: Matches any whitespace character (spaces, tabs, line breaks).

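4. Anchors: Anchors match a position rather than a character. For instance, `^` matches the start of a line, and `$` matches the end of a line. In many engines they match only the start and end of the whole text unless multiline mode is turned on.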

5. Groups: Parentheses `()` are used to group characters or expressions together. This is useful for capturing parts of a matched string.
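
The five building blocks above can be tried out in a few lines. This is a minimal sketch, assuming Python's standard-library `re` module; every sample string is invented for illustration, and other regex engines may differ in small details.

```python
import re

# 1. Literals: ordinary characters match themselves.
print(re.findall(r"hello", "hello world, hello regex"))  # ['hello', 'hello']

# 2. Metacharacters.
print(re.findall(r"c.t", "cat cot cut ct"))      # ['cat', 'cot', 'cut']
print(re.findall(r"ab*", "a ab abbb"))           # ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))           # ['ab', 'abbb']
print(re.findall(r"colou?r", "color colour"))    # ['color', 'colour']
print(re.findall(r"cat|dog", "cat, dog, bird"))  # ['cat', 'dog']

# 3. Character classes.
print(re.findall(r"\d", "Room 42"))               # ['4', '2']
print(re.findall(r"\w+", "snake_case, 2 words"))  # ['snake_case', '2', 'words']
print(re.findall(r"\s", "a b\tc"))                # [' ', '\t']

# 4. Anchors: in Python, ^ and $ work per line only with re.MULTILINE.
lines = "first line\nsecond line"
print(re.findall(r"^\w+", lines))                      # ['first']
print(re.findall(r"^\w+", lines, flags=re.MULTILINE))  # ['first', 'second']
print(re.findall(r"\w+$", lines, flags=re.MULTILINE))  # ['line', 'line']

# 5. Groups: parentheses capture parts of the match.
m = re.search(r"(\d+)-(\d+)", "pages 10-25")
print(m.group(0))  # 10-25
print(m.groups())  # ('10', '25')
```

Patterns are written as raw strings (`r"..."`) so that Python passes backslashes such as `\d` through to the regex engine unchanged.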

Practical Examples

Let's explore a few practical examples to demonstrate how Regex works:

1. Matching Email Addresses:
- Regex Pattern: `^\w+@\w+\.\w+$`
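- Explanation: This pattern matches simple email addresses by requiring one or more word characters, followed by an at symbol, another run of word characters, a period, and one or more word characters, with nothing before or after. Because `\w` does not include dots or hyphens, it rejects valid addresses such as "first.last@example.com", so treat it as a teaching example rather than a complete validator.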

2. Finding Dates:
- Regex Pattern: `\d{1,2}/\d{1,2}/\d{4}`
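- Explanation: This pattern finds dates written like "mm/dd/yyyy". It looks for one or two digits, a forward slash, one or two digits, another slash, and four digits for the year. It checks only the shape of the text, so it also accepts "dd/mm/yyyy" dates and impossible values such as "99/99/9999".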

3. Extracting Phone Numbers:
- Regex Pattern: `\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}`
- Explanation: This pattern extracts 10-digit phone numbers that can have optional parentheses around the area code, as well as an optional separator (a dash, period, or whitespace character) between the digit groups.
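
The three patterns above can be run as written. The sketch below assumes Python's standard-library `re` module; the names `EMAIL`, `DATE`, and `PHONE` and all sample inputs are hypothetical, chosen only to show what each pattern accepts and where it falls short.

```python
import re

# The three patterns from the examples above, compiled once for reuse.
EMAIL = re.compile(r"^\w+@\w+\.\w+$")
DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")
PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

# 1. Email: accepts simple addresses, rejects ones with a dot before the @.
print(bool(EMAIL.match("alice@example.com")))       # True
print(bool(EMAIL.match("first.last@example.com")))  # False
print(bool(EMAIL.match("not an email")))            # False

# 2. Dates: finds every date-shaped substring, without checking the values.
print(DATE.findall("Due 3/14/2024, renewed 12/01/2025."))
# ['3/14/2024', '12/01/2025']
print(DATE.findall("Nonsense date: 99/99/9999"))
# ['99/99/9999']

# 3. Phone numbers: parentheses and separators are all optional.
print(PHONE.findall("Call (555) 123-4567 or 555.987.6543."))
# ['(555) 123-4567', '555.987.6543']
```

The second date example shows why a pattern like this is usually paired with a separate check (for instance, parsing the match with a date library) when the values themselves matter.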

In Summary

Regex is a versatile tool that can help students work with text data more effectively. While we’ve covered the basics in this article, it’s just the tip of the iceberg.

As you delve deeper into Regex, you’ll discover how much more it can do for text processing and manipulation.

So, roll up your sleeves, practice, and unlock the power of regular expressions in your projects, from data analysis to web development.

