On Sun, 16 Jan 2005 18:36:09 +0500 "Sara" <sara_samsara@hotpop.com> wrote:
> I am trying to extract links along with HTML tags <a href=blah> from a
> list, but it's not working on my XP machine with Active State Perl
> 5.0.6 Kindly help.
Chris Devers 17 January 2005 15:45:43
On Mon, 17 Jan 2005, Alexander Blüm wrote:
> this is also possible _without_ any modules, except maybe "strict".
>
> # this will replace the contents of each match in @get
> foreach(@array){
>     my @get = $_ =~ /<a href="(.*?)">/g;
> }
What happens if the URL contains a double quote followed by an angle bracket?
It's not likely, but it can happen, and browsers can still cope with it.
But if such a URL turns up, this regex will break: the non-greedy match stops at the first "> it sees and truncates the URL there.
What happens if the URL isn't wrapped in quotes at all?
This is much more likely, and again will work fine in browsers.
But this time the regex won't find the link at all.
This kind of problem is why HTML (and XML) is really best processed using pre-written parser modules, such as HTML::SimpleLinkExtor. A parser has a much better shot at getting a proper view of the document than a simple regex pattern match.
Yes, you can approach such problems using simple regular expressions, such as what we have here, and in many cases they'll work, and maybe even work faster than the parser version would. On the other hand, this approach is much less generally robust: minor changes that don't break the HTML may break the regex, so you end up having to constantly adjust it to handle all the special cases that come up over time.
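To make the fragility concrete, here is a rough sketch using only core Perl. The regex is the one quoted above; the helper name regex_links and the three sample strings are made up for illustration and are not from the original thread.

```perl
use strict;
use warnings;

# The pattern from the quoted message, wrapped in a hypothetical helper.
sub regex_links {
    my ($html) = @_;
    my @found = $html =~ /<a href="(.*?)">/g;
    return @found;
}

# Hypothetical inputs: all three are links a browser would follow.
my %samples = (
    quoted   => '<a href="http://example.com/a">one</a>',
    unquoted => '<a href=http://example.com/b>two</a>',
    extra    => '<a href="http://example.com/c" class="nav">three</a>',
);

for my $name (sort keys %samples) {
    my @links = regex_links($samples{$name});
    # Print how many matches the regex produced and what it captured.
    printf "%-8s %d match(es): %s\n", $name, scalar(@links), join(', ', @links);
}
```

Only the first sample comes back clean. The unquoted link is missed entirely, and the link with a trailing attribute is captured together with the text up to the next "> in the tag, so the result is no longer a usable URL.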
If you just parse it at the outset, such as with HTML::SimpleLinkExtor, then the code should be simple, robust, and useful for a long time.
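For comparison, a minimal sketch of the parser approach. It assumes HTML::SimpleLinkExtor is installed from CPAN (it is not part of core Perl); the helper name parser_links and the sample HTML are invented for illustration.

```perl
use strict;
use warnings;
use HTML::SimpleLinkExtor;    # CPAN module, assumed to be installed

# Hypothetical helper: return the href values of all <a> tags in a string.
sub parser_links {
    my ($html) = @_;
    my $extor = HTML::SimpleLinkExtor->new;
    $extor->parse($html);
    my @links = $extor->a;    # links found in <a> tags only
    return @links;
}

# Made-up sample with a quoted, an unquoted, and a multi-attribute link.
my $html = <<'HTML';
<a href="http://example.com/a">one</a>
<a href=http://example.com/b>two</a>
<a href="http://example.com/c" class="nav">three</a>
HTML

print "$_\n" for parser_links($html);
```

The parser deals with the quoting and the extra attributes itself, so the calling code does not have to change when the markup does. If you have an array of HTML chunks, as in the original question, you could call the helper once per element.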