Given the following HTML document (as a string), we want to extract all of the opening tags:
<div>
<p class="content">
Consider the SUT safe.
<a href="https://www.google.com">
Google
</a>
is your friend. <br />
Another line here.
<hr class="divider" />
Final line
</p>
</div>
Our first attempt comes close:
html_string = <<-HTML
<div>
<p class="content">
Consider the SUT safe.
<a href="https://www.google.com">
Google
</a>
is your friend. <br />
Another line here.
<hr class="divider" />
Final line
</p>
</div>
HTML
pattern = /<([a-z]+) *[^\/]*?>/
html_string.scan(pattern)
#=> [["div"], ["p"]]
but we seem to be missing the <a> tag in the list. What can we do to fix this?
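One likely cause is the `[^\/]*?` part of the pattern: it refuses any `/` between the tag name and the closing `>`, and the `href` value of the link contains slashes, so that tag can never match. Below is a minimal sketch of one possible fix, not a definitive answer. It assumes self-closing tags such as `<br />` should stay out of the result, and the variable names (`opening_tag_pattern`, `opening_tags`, `with_self_closing`) are made up for illustration. The HTML is a trimmed copy of the document above.

```ruby
# Trimmed copy of the document from the question.
html_string = <<~HTML
  <div>
  <p class="content">
  <a href="https://www.google.com">Google</a>
  is your friend. <br />
  <hr class="divider" />
  </p>
  </div>
HTML

# Hypothetical replacement pattern (an assumption, not from the original post):
#   [a-z][a-z0-9]*  the tag name, captured
#   [^>]*           anything up to the closing ">", slashes included
#   (?<!\/)         reject a "/" directly before ">", which skips <br /> and <hr />
opening_tag_pattern = /<([a-z][a-z0-9]*)\b[^>]*(?<!\/)>/

# scan returns one array per match because of the capture group; flatten it.
opening_tags = html_string.scan(opening_tag_pattern).flatten
p opening_tags

# Variant without the lookbehind, in case self-closing tags are wanted too.
with_self_closing = html_string.scan(/<([a-z][a-z0-9]*)\b[^>]*>/).flatten
p with_self_closing
```

Both patterns still break on an attribute value that contains a literal `>`. If the input is not under your control, an HTML parser such as Nokogiri is probably a safer choice than a regex.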