Match patterns like a pro

Regular expressions (regex) are a way to describe and match patterns in text; you can think of them as a small language of their own. They matter on Linux because many programs, such as grep, accept regular expressions, so learning them is a big step toward becoming an advanced user.

Metacharacters

Metacharacters are characters with a special meaning in a regular expression; they are what lets a pattern match more than a fixed string.
Some of these metacharacters are: ^, $, ., [ ], { }, -, ?, +, *, ( ), |, \ etc.
We'll go through these metacharacters one by one.

The caret (`^`) sign

It is used to match the start of a line.
For example, if we want to match lines that start with "Because", we can do this:
- ^Because

The dollar (`$`) sign

It is used to match the end of the line.
For example, if we want to match lines that end with "bye", we can do this:
- bye$
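
To try both anchors outside of a Linux tool, here is a minimal sketch using Python's `re` module, whose syntax is close to the patterns on this page. The sample text and variable names are made up for illustration. The `re.MULTILINE` flag makes `^` and `$` apply to every line, which is how line-oriented tools behave.

```python
import re

# Hypothetical sample text, one sentence per line.
text = "Because I said so\nNot because of that\nSee you later, bye\nbye for now"

# ^Because only matches where "Because" is the first thing on a line.
starts_with_because = re.findall(r"^Because.*$", text, flags=re.MULTILINE)

# bye$ only matches where "bye" is the last thing on a line.
ends_with_bye = re.findall(r"^.*bye$", text, flags=re.MULTILINE)

print(starts_with_because)
print(ends_with_bye)
```

The second and fourth lines are skipped: one has "because" in the middle of the line, the other has "bye" at the start rather than the end.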

Match anything using (`.`)

The . character matches any single character (in most tools, anything except a newline).
For example, to match 3-letter words starting with "ma", like "man", "map", "mam", "may" or "mac", we can do this:
- ma.
On its own this also matches inside longer words such as "many" or "small", so a better way to search for 3-letter words starting with "ma" is:
- \bma.\b
\b matches a word boundary, that is, the beginning or end of a word.
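
The difference between the two patterns is easiest to see side by side. This is a small sketch in Python's `re` module; the sentence is an invented example.

```python
import re

sentence = "The man drew a map of many small islands in May"

# Without word boundaries, "ma." also matches inside longer words.
loose = re.findall(r"ma.", sentence)

# With \b on both sides, only whole three-letter words qualify.
strict = re.findall(r"\bma.\b", sentence)

print(loose)   # hits inside "many" and "small" as well
print(strict)  # only "man" and "map"
```

"May" is not matched by either pattern because matching is case-sensitive by default.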

Create character classes (`[ ]`) and specify ranges (`-`)

A character class matches exactly one character out of the list given between the brackets.
For example:
- [bch]atman can match "batman", "catman" or "hatman"
The class stands for a single character, so the above regular expression does not match "bcfatman", as beginners sometimes assume.
You can also specify a range of characters inside a class using a hyphen, for example [a-z] or [0-9].
Pre-built character classes:
- \d - Matches any digit [0-9].
- \D - Matches any character that is not a digit [^0-9].
- \w - Matches any letter, digit or underscore [a-zA-Z0-9_].
- \W - Matches any character that is not a letter, digit or underscore [^a-zA-Z0-9_].
- \s - Matches a space, tab, carriage return, line feed or form feed [ \t\r\n\f].
- \S - Matches any character that is not a space, tab, carriage return, line feed or form feed [^ \t\r\n\f].
Examples of ranges:
- [a-zA-Z0-9] - Matches any letter or digit
- [a-z0-9] - Matches any lowercase letter or digit
- [A-H]atman - Matches any uppercase letter from A to H followed by "atman", such as "Batman"
- [a-z\-] - Matches lowercase letters and the hyphen (-)
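
The following sketch checks a few of these classes with Python's `re` module. The word lists are invented, and `re.fullmatch` is used so that the whole word has to fit the pattern.

```python
import re

words = ["batman", "catman", "hatman", "bcfatman", "ratman"]

# [bch] stands for exactly one character out of b, c or h.
bch = [w for w in words if re.fullmatch(r"[bch]atman", w)]

# [A-H] is a range: any one uppercase letter from A to H.
candidates = ["Batman", "Hatman", "Zatman", "batman"]
upper_range = [w for w in candidates if re.fullmatch(r"[A-H]atman", w)]

# \d is shorthand for [0-9].
digits = re.findall(r"\d", "Room 42, floor 7")

# [a-z\-] allows lowercase letters and a literal hyphen
# (the trailing + repeats the class; it is covered further down).
slug_ok = re.fullmatch(r"[a-z\-]+", "regex-for-linux") is not None

print(bch, upper_range, digits, slug_ok)
```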

Match a number of characters at a time (`{ }`)

You can match a pattern a minimum, maximum or fixed number of times using braces: {n} means exactly n times, {n,} at least n times, and {n,m} between n and m times.
Let's create a regex for an IP address.
An IP address is of the format:
- someNumberBetween0-255.someNumberBetween0-255.someNumberBetween0-255.someNumberBetween0-255
So we can say in regex it is:
- [0-9] (minimum 1 to maximum 3 times). Yes, this also accepts numbers outside the range 0-255, but we'll get to that later.
- So we can write the above expression as \d{1,3}
- After the number we have a full stop, but we need to escape it since the full stop is a metacharacter in regex.
  - So it becomes \d{1,3}\.
- Now we want to repeat this whole pattern 3 times, so it becomes:
  - (\d{1,3}\.){3} Here we grouped the whole previous pattern using (); we'll learn about grouping later.
- Finally, there is one more number, which can be a minimum of 1 digit to a maximum of 3 digits.
  - (\d{1,3}\.){3}\d{1,3}
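Now if we want to be strict with the 0-255 range, we can split each number into three cases (250-255, 200-249 and 0-199):
- ((25[0-5]|2[0-4]\d|1?\d{1,2})\.){3}(25[0-5]|2[0-4]\d|1?\d{1,2})
To keep it from matching part of a longer number, anchor the pattern with ^ and $ or surround it with \b.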
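
To see how the loose and strict patterns differ, here is a minimal sketch in Python's `re` module. The names `LOOSE_IP`, `OCTET`, `STRICT_IP` and `is_ip` are made up for this example, and the strict octet shown is one possible way to cover 0-255, not the only one.

```python
import re

# Loose pattern from the walkthrough: four groups of 1-3 digits.
LOOSE_IP = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# One way to restrict each number to 0-255 (an assumption of this sketch):
#   25[0-5]    covers 250-255
#   2[0-4]\d   covers 200-249
#   1?\d{1,2}  covers 0-199
OCTET = r"(25[0-5]|2[0-4]\d|1?\d{1,2})"
STRICT_IP = re.compile(rf"({OCTET}\.){{3}}{OCTET}")


def is_ip(candidate: str, strict: bool = True) -> bool:
    """Return True if the whole string looks like a dotted-quad address."""
    pattern = STRICT_IP if strict else LOOSE_IP
    # fullmatch acts like wrapping the pattern in ^ and $.
    return pattern.fullmatch(candidate) is not None


print(is_ip("192.168.1.1"))
print(is_ip("256.1.1.1"))
print(is_ip("256.1.1.1", strict=False))
```

The loose pattern accepts 256.1.1.1 because it only counts digits, while the strict one rejects it.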

Match zero or one times (`?`)

You can match a character zero or one time using ?. It is helpful when a character may or may not be present in the string.
For example, a phone number may or may not have a + symbol at the beginning:
- \+?91-\d{10}
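
A quick check of the optional plus sign, sketched with Python's `re` module. The phone numbers are invented sample data.

```python
import re

# \+? makes the leading plus sign optional; \d{10} requires exactly ten digits.
PHONE = re.compile(r"\+?91-\d{10}")

samples = ["+91-9876543210", "91-9876543210", "91-98765"]
results = {s: PHONE.fullmatch(s) is not None for s in samples}

for number, ok in results.items():
    print(number, ok)
```

The first two samples match with and without the plus sign; the last one fails because it has too few digits.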

Match one or more times (`+`)

You can match a character one or more times using +. It is helpful when you want the character to be matched at least once.
So if we want to match "Baby shark do do do do...", we can do this:
- Baby shark (do\s?)+
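
The sketch below, using Python's `re` module, shows that + insists on at least one repetition. The two input lines are made up.

```python
import re

LYRIC = re.compile(r"Baby shark (do\s?)+")

# + needs at least one "do"; each one may be followed by optional whitespace.
match = LYRIC.search("Baby shark do do do do do do")
matched_text = match.group(0)

# With no "do" after "Baby shark " there is nothing for + to repeat.
no_do = LYRIC.search("Baby shark swims away")

print(matched_text)
print(no_do)
```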

Match zero or more times (`*`)

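You can match a character zero or more times using *.
For example, if we want to match a tag in HTML, we can do this:
- <.*>
Note that * is greedy and matches as much as it can, so on a line with several tags this pattern matches everything from the first < to the last >. To match one tag at a time, use a negated character class (see Negation) instead:
- <[^>]*>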
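
How much `.*` swallows is easy to underestimate, so here is a small comparison in Python's `re` module. The HTML snippet is an invented example, and `<[^>]*>` is shown as one common way to stop at the first closing bracket.

```python
import re

html = "<p>Hello <b>world</b></p>"

# .* is greedy: it runs to the last ">" on the line.
greedy = re.findall(r"<.*>", html)

# [^>]* stops before the first ">", so each tag is matched separately.
per_tag = re.findall(r"<[^>]*>", html)

print(greedy)
print(per_tag)
```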

Grouping (`( )`)

We can group parts of our regex to specify the number of times they should be matched, or to output the groups separately (possible in many programming languages).
For instance, in the IP address example we looked at before, we grouped parts of the pattern to match them a specific number of times using braces { }.
Examples:
- ((25[0-5]|2[0-4]\d|1?\d{1,2})\.){3}(25[0-5]|2[0-4]\d|1?\d{1,2})
- Baby shark (do\s?)+
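
As an illustration of outputting groups separately, here is a minimal sketch in Python's `re` module. The address and the variable names are made up; other languages expose groups in a similar way.

```python
import re

# Each pair of parentheses is a numbered group, counted by its opening bracket.
m = re.fullmatch(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})", "192.168.0.1")
octets = m.groups()

# A group that is repeated keeps only its last repetition.
lyric = re.fullmatch(r"Baby shark (do\s?)+", "Baby shark do do do")
last_do = lyric.group(1)

print(octets)
print(repr(last_do))
```

If every repetition is needed, it is usually simpler to capture the whole repeated stretch and split it afterwards.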

Or operator (`|`)

This is used when any one of several patterns or strings may be present and we don't know which one.
Example:
- bat(man|woman) - This will match "batman" or "batwoman"
The above example can also be done like this:
- bat(wo)?man
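
Both spellings accept the same two words, which the following sketch in Python's `re` module confirms on an invented word list.

```python
import re

names = ["batman", "batwoman", "batgirl", "batwo"]

# Alternation: either "man" or "woman" after "bat".
with_or = [n for n in names if re.fullmatch(r"bat(man|woman)", n)]

# Optional group: "wo" may or may not appear before "man".
with_optional = [n for n in names if re.fullmatch(r"bat(wo)?man", n)]

print(with_or)
print(with_optional)
```

The | form is the more general one: it also works when the alternatives share no common part.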

Escape character (`\`)

Many times we want to use characters like +, *, ?, $ or ^ as normal characters in our regex.
To do that we can use \ to escape those metacharacters.
Example:
- \+1 \(\d{3}\) \d{3}\-\d{4} - Matches US phone numbers
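
The sketch below uses Python's `re` module and assumes the intended phone format is +1 (555) 010-1234, with literal brackets around the area code. The sample strings are invented. It also shows `re.escape`, which adds the backslashes for a literal string.

```python
import re

# Assumed target format for this sketch: +1 (555) 010-1234
US_PHONE = re.compile(r"\+1 \(\d{3}\) \d{3}-\d{4}")
ok = US_PHONE.fullmatch("+1 (555) 010-1234") is not None

text = "Is 1+1=2? Yes."

# Unescaped, + and ? act as quantifiers, so the literal text is not found.
unescaped = re.search(r"1+1=2?", text)

# re.escape backslash-escapes the metacharacters in a literal string.
escaped = re.search(re.escape("1+1=2?"), text)

print(ok, unescaped, escaped.group(0))
```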

Negation (`^`)

When the caret is the first character inside a character class, it negates that class: the class then matches any single character that is not listed.
Example:
- [^A-Z] - Matches any character that is not an uppercase letter.
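
Because the caret has two roles, a short sketch in Python's `re` module may help tell them apart. The sample strings are made up.

```python
import re

# Inside brackets, a leading ^ negates the class: anything but A-Z.
not_upper = re.findall(r"[^A-Z]", "HeLLo")

text = "Hello World 2024"

# Outside brackets, ^ is still the start-of-line anchor.
starts_upper = re.search(r"^[A-Z]", text) is not None

# Both roles combined: the line must start with a non-uppercase character.
starts_not_upper = re.search(r"^[^A-Z]", text) is not None

print(not_upper, starts_upper, starts_not_upper)
```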

grep - global regular expression print

grep is a program that uses regular expressions to find matching lines in files.
Usage: grep [OPTIONS] <pattern> [FILE...]
Use the -i or --ignore-case flag to do a case-insensitive match.
Use the -v or --invert-match flag to print the lines that do not match the pattern.
Use the -l flag to output only the names of files that contain a match.
Use the -c flag to print the number of matching lines instead of the lines themselves.
Use the -n flag to prefix each match with its line number.
Use the -P flag to use Perl-style regular expressions, which is what enables shorthands such as \d.
Use the -r flag to search recursively inside a directory.
Use the -o flag to output only the part of each line that matched.
Examples:
- grep -Pn "Baby shark (do\s?)+" babyShark.lrc
- grep -ron "John Doe" ./dir/
- grep -Pic "Bat(wo)?man" batmanChronicles.script
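
To make the effect of a few flags concrete, here is a rough imitation in Python. It is only a sketch of the observable behaviour of -n, -i, -v and -c, not how grep is implemented. The function `mini_grep` and the sample lines are hypothetical.

```python
import re


def mini_grep(pattern, lines, ignore_case=False, invert=False):
    """Return (line_number, line) pairs, roughly like grep -n with optional -i / -v."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    for number, line in enumerate(lines, start=1):
        matched = regex.search(line) is not None
        # With invert=True, keep the lines that did NOT match (like -v).
        if matched != invert:
            hits.append((number, line))
    return hits


script = ["Batman enters.", "batwoman follows.", "Robin waits.", "BATMAN leaves."]

numbered = mini_grep(r"Bat(wo)?man", script, ignore_case=True)
count = len(numbered)  # what -c reports: the number of matching lines
inverted = mini_grep(r"Bat(wo)?man", script, ignore_case=True, invert=True)

print(numbered)
print(count)
print(inverted)
```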
