Re: Extracting links. - without modules
QAIX > Perl web-programming > Re: Extracting links. - without modules 17 January 2005 15:45:43

Alexander Blüm 17 January 2005 12:26:11
On Sun, 16 Jan 2005 18:36:09 +0500
"Sara" <sara_samsara@hotpop.com> wrote:
> I am trying to extract links along with HTML tags <a href=blah> from a
> list, but it's not working on my XP machine with Active State Perl
> 5.0.6 Kindly help.
>
> ################# CODE START ####################
>
> my @array = qq|
> <body><a href="http://www.mydomain.com"><img alt="Free Hosting,
> Freebies" border=0
> src="http://www.mydomain.com/images/logo2.gif"></a>|;
>
> #extract LINKS (no image links) only <a href="http://www.mydomain.com">
>
> my @get = grep {/<a .*?>/} @array;
> print "@get\n"
>
> ################### CODE END ###################
>
> Thanks,
> Sara.

this is also possible _without_ any modules, except maybe "strict".

# this will replace the contents of @get with the matches from each element
my @get;    # declared outside the loop so it is still visible afterwards
foreach (@array) {
    @get = $_ =~ /<a href="(.*?)">/g;
}

or:

# this will add each match to @get
my @get = ();
foreach(@array){
push @get, $_ =~ /<a href="(.*?)">/g;
}


--
Cheers,
Alex
Chris Devers 17 January 2005 15:45:43
On Mon, 17 Jan 2005, Alexander Blüm wrote:
> this is also possible _without_ any modules, except maybe "strict".
>
> # this will replace the contents of each match in @get
> foreach(@array){
> my @get = $_ =~ /<a href="(.*?)">/g;
> }

What happens if the URL contains a double quote followed by an angle bracket?

It's not likely, but it can happen, and browsers can still handle it.

If such a URL turns up, this regex will break.

What happens if the URL isn't wrapped in quotes at all?

This is much more likely, and again it will work fine in browsers.

But again, this regex won't find it at all.
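
The unquoted case, and a few close relatives, are easy to reproduce. This is only a rough sketch: the sample anchors and the example.com URLs are made up for illustration, and the pattern is the one from the quoted message.

```perl
use strict;
use warnings;

# Hypothetical sample anchors (not from the original message). All four are
# forms a browser accepts; only the first fits the pattern used below.
my @array = (
    '<a href="http://www.example.com/">double-quoted</a>',
    '<a href=http://www.example.com/bare>unquoted</a>',
    q{<a href='http://www.example.com/single'>single-quoted</a>},
    '<a class="nav" href="http://www.example.com/nav">attribute before href</a>',
);

# Same pattern as in the quoted message.
my @get;
foreach my $html (@array) {
    push @get, $html =~ /<a href="(.*?)">/g;
}

printf "found %d of %d links\n", scalar @get, scalar @array;
print "$_\n" for @get;
```

Each missed form could be patched into the pattern, but every patch is one more special case to maintain.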

This kind of problem is why HTML (and XML) is really best processed
using pre-written parser modules, such as HTML::SimpleLinkExtor. A
parser has a much better shot at getting a proper view of the document
than a simple regex pattern match.

Yes, you can approach such problems using simple regular expressions,
such as what we have here, and in many cases they'll work, and maybe
even work faster than the parser version would. On the other hand, this
approach is much less generally robust: minor changes that don't break
the HTML may break the regex, so you end up having to constantly adjust
it to handle all the special cases that come up over time.

If you just parse it at the outset, such as with HTML::SimpleLinkExtor,
then the code should be simple, robust, and useful for a long time.
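
A minimal sketch of the parser approach, assuming HTML::SimpleLinkExtor has been installed from CPAN (it is not part of core Perl). The sample markup and the example.com URLs are invented for illustration; parse() and a() are used as shown in the module's synopsis.

```perl
use strict;
use warnings;
use HTML::SimpleLinkExtor;    # CPAN module, assumed to be installed

# Hypothetical sample markup: a quoted link wrapping an image, an
# unquoted link, and an upper-case, single-quoted link.
my $html = <<'END_HTML';
<body>
<a href="http://www.example.com/"><img alt="logo" border=0
src="http://www.example.com/images/logo.gif"></a>
<a href=http://www.example.com/bare>unquoted</a>
<A HREF='http://www.example.com/single'>single-quoted</A>
</body>
END_HTML

my $extor = HTML::SimpleLinkExtor->new;
$extor->parse($html);

# a() returns the links found in <a> tags only, so the <img src> is left out.
my @get = $extor->a;
print "$_\n" for @get;
```

If the image links are wanted as well, the module also provides img() and links().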




--
Chris Devers