geoff-young

📝📞📹💻
Using regex with JavaScript
Created
Sep 20, 2021 05:35 PM

Regex was easier to learn and more useful than I thought

I had a string that included some HTML. I wanted the URL of the first '<img>' element, if one existed. I hadn't really worked with strings before and thought it'd be faster to just use some JavaScript functions instead of learning regex. The code below is an example string.
const htmlString = '<h1>Regex and Javascript</h1><h2>It feels good to progress</h2><img src="https://talktree.me/favicon512.png" alt="highliners"/>'
I used the JS function slice() to extract the string I wanted and indexOf() to know where to extract.
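const firstImageBeginning = htmlString.indexOf('<img')
let firstImage = null
if (firstImageBeginning > -1) {
  // '<img src="' is 10 characters, so the URL starts 10 characters in
  const urlStart = firstImageBeginning + 10
  // the URL ends at the next double quote
  const urlEnd = htmlString.indexOf('"', urlStart)
  firstImage = htmlString.slice(urlStart, urlEnd)
}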
Getting the URL from a string as shown in the code above was very problematic. I had to add an if statement because indexOf() returns -1 when the string is not found. It also wouldn't work if the 'alt' attribute showed up in front of 'src', since I used a fixed starting position: '<img src="' is 10 characters, hence the '+ 10'.
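To make that fragility concrete, here is a minimal sketch. The helper name extractWithFixedOffset and the two test strings are made up for illustration; the helper only mirrors the fixed-offset idea described above.

```javascript
// Hypothetical helper that mirrors the fixed-offset slice()/indexOf() idea.
function extractWithFixedOffset(html) {
  const start = html.indexOf('<img')
  if (start === -1) return null
  const urlStart = start + 10 // assumes the tag begins with '<img src="'
  const urlEnd = html.indexOf('"', urlStart)
  return html.slice(urlStart, urlEnd)
}

// Illustrative strings: same tag, attributes in a different order.
const srcFirst = '<img src="https://talktree.me/favicon512.png" alt="highliners"/>'
const altFirst = '<img alt="highliners" src="https://talktree.me/favicon512.png"/>'

console.log(extractWithFixedOffset(srcFirst)) // https://talktree.me/favicon512.png
console.log(extractWithFixedOffset(altFirst)) // highliners
```

With 'alt' first, the same offset lands inside the alt text, so the function quietly returns the wrong value instead of failing.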
I looked into regex and did some lessons on RegexOne and used RegExr to play around. It took me less than an hour.
Now I only need to use the match() function as shown in the code below. It returns either null if no match is found, or an array with the matched string.
const wholeImgTag = htmlString.match(/<img.+?src=".+?"/)
Here is the regex deconstructed:
  • '/' starts and ends the expression
  • '<img' is interpreted literally; it matches exactly those characters
  • '.' is a special character that matches any single character (except a line break)
  • '+' is a quantifier: it matches one or more of the preceding token, so '.+' is one or more of any character
  • '?' placed after a quantifier like '+' makes it lazy, so '.+?' matches as few characters as possible
  • 'src="' and '"' are interpreted literally
So it's looking for '<img', followed by anything up to the first 'src="', followed by anything up to the next '"'.
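A quick way to see what the '?' after '+' changes is to run the same pattern with and without it. This is just a sketch; the tag string and the variable names lazy and greedy are made up for the comparison.

```javascript
// Illustrative string: one <img> tag with two quoted attributes.
const tag = '<img src="https://talktree.me/favicon512.png" alt="highliners"/>'

// Lazy: '.+?' stops at the first closing quote it can reach.
const lazy = tag.match(/src=".+?"/)[0]

// Greedy: '.+' keeps going and ends at the last quote in the string.
const greedy = tag.match(/src=".+"/)[0]

console.log(lazy)   // src="https://talktree.me/favicon512.png"
console.log(greedy) // src="https://talktree.me/favicon512.png" alt="highliners"
```

Without the '?', the match swallows the alt attribute too, which is why the pattern in this post uses '.+?'.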
If we console.log(wholeImgTag), we get an array like the following (this output comes from a version of the string where 'alt' appears before 'src'):
[
  0: "<img alt="highliners" src="https://talktree.me/favicon512.png"",
  groups: undefined,
  index: 63,
  input: "<h1>Regex and Javascript</h1><h2>It feels good to …ners" src="https://talktree.me/favicon512.png" />",
]
The value we normally want is [0], the full match. But in this case, I want the URL and not the whole <img> tag, so I'm going to create a capture group.
const wholeImgTag = htmlString.match(/<img.+?src="(.+?)"/)
This is done by adding parentheses. I added a pair around the second '.+?'. Now if we console.log(wholeImgTag) again:
[
  0: "<img alt="highliners" src="https://talktree.me/favicon512.png"",
  1: "https://talktree.me/favicon512.png",
  groups: undefined,
  index: 63,
  input: "<h1>Regex and Javascript</h1><h2>It feels good to …ners" src="https://talktree.me/favicon512.png" />",
]
I can access the exact string I want with 'wholeImgTag[1]'.
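Since match() returns null when nothing matches, reading [1] directly would throw on a string with no image. One way to wrap both cases is sketched below. The function name getFirstImageUrl is hypothetical, and the 'No images here' string is just a test input.

```javascript
// Hypothetical helper: returns the first <img> src URL, or null if there is none.
function getFirstImageUrl(html) {
  const match = html.match(/<img.+?src="(.+?)"/)
  // match is null when no <img ... src="..."> is found
  return match ? match[1] : null
}

const htmlString = '<h1>Regex and Javascript</h1><h2>It feels good to progress</h2><img src="https://talktree.me/favicon512.png" alt="highliners"/>'

console.log(getFirstImageUrl(htmlString))              // https://talktree.me/favicon512.png
console.log(getFirstImageUrl('<p>No images here</p>')) // null
```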
Note: if you add the global flag, 'g', to the regex (as in /<img.+?src="(.+?)"/g), match() returns an array of every full match and leaves out the capture groups as well as the groups, index, and input properties.
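Here is a small sketch of that difference, using a made-up string with two images. It also shows matchAll(), a standard string method that is not used elsewhere in this post, as one option if you want the capture groups for every match.

```javascript
// Illustrative string with two images (file names are made up).
const twoImages = '<img src="a.png"/><img src="b.png"/>'

// With the 'g' flag, match() returns only the full matches.
const fullMatches = twoImages.match(/<img.+?src="(.+?)"/g)
console.log(fullMatches) // [ '<img src="a.png"', '<img src="b.png"' ]

// matchAll() (which requires the 'g' flag) keeps the capture groups for each match.
const urls = [...twoImages.matchAll(/<img.+?src="(.+?)"/g)].map((m) => m[1])
console.log(urls) // [ 'a.png', 'b.png' ]
```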

Thoughts

Regex initially seemed complicated and intimidating. But once I learned the basic special characters, it became incredibly easy to use. Now I don't shy away from working with strings and know I can use regex to get exactly what I want.