07-13-2008, 04:01 AM
sujith.it
Parse html with preg_match_all..howto

For most PHP scripters who use preg_match or preg_replace frequently, preg_match_all is only a small step further, but for everyone else it may be hard to understand. The biggest difference between preg_match_all and the regular preg_match is that all matched values are stored in a multi-dimensional array, so it can hold an unlimited number of matches. The following example shows how to collect the image paths from a web page:

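<?php
// Download the page whose image paths we want to collect.
$data = file_get_contents("http://www.finalwebsites.com");
// Single-quoted so the double quotes in the pattern need no escaping;
// only the single quote inside the character classes is escaped.
$pattern = '/src=["\']?([^"\']?.*(png|jpg|gif))["\']?/i';
// All matches end up in the multi-dimensional array $images.
preg_match_all($pattern, $data, $images);
?>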
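To make the difference from preg_match concrete, here is a minimal sketch that runs both functions with the same pattern. The $html sample string is made up for illustration and is not from the page above; it keeps one image tag per line, because .* is greedy and would otherwise run across several tags on the same line.

```php
<?php
// Hypothetical sample markup; one <img> tag per line because ".*" is greedy.
$html = <<<'HTML'
<img src="/images/english.gif">
<img src='/images/logo.png'>
HTML;

$pattern = '/src=["\']?([^"\']?.*(png|jpg|gif))["\']?/i';

// preg_match stops after the first match and fills a flat array.
$found = preg_match($pattern, $html, $first);

// preg_match_all keeps going and fills one sub-array per capture group.
$count = preg_match_all($pattern, $html, $all);

echo $first[1], "\n";               // path from the first match only
echo implode(", ", $all[1]), "\n";  // paths from every match
```

Both functions return the number of matches they found, so preg_match returns at most 1 while preg_match_all returns the total.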
Let's take a closer look at the pattern:
/src=["']?([^"']?.*(png|jpg|gif))["']?/i

The first part (src=["']?) and the last part (["']?) match everything that starts with src= and ends with an optional single or double quote. On its own this could match a very long string, because the outer rule is so general. Next, look at the rule that starts at the first bracket:
([^"']?.*(png|jpg|gif))

Inside the long string from the outer rule we now look for a string that starts with an optional character that is not a single or double quote, followed by any characters. Because .* is greedy and does not cross line breaks, it runs to the last file extension on the same line. The last part inside the inner brackets is the magic:
(png|jpg|gif)

Finally we require the string to end with one of the file extensions, and with that match we get all the image paths from the HTML file. We need all of these rules together to isolate the image paths from the rest of the HTML. The result looks like this (access the array $images with these indexes, or just use print_r($images)):
$images[0][0] -> src="/images/english.gif"
$images[1][0] -> /images/english.gif
$images[2][0] -> gif
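
The index layout above is the default ordering of preg_match_all (PREG_PATTERN_ORDER): the first index is the capture group, the second is the match number. If you would rather loop over one match at a time, the PREG_SET_ORDER flag swaps the two. The sketch below shows both layouts; the $html sample string is an assumption for illustration, not data from the page above.

```php
<?php
// Hypothetical sample markup, one <img> tag per line.
$html = <<<'HTML'
<img src="/images/english.gif">
<img src="/images/german.jpg">
HTML;

$pattern = '/src=["\']?([^"\']?.*(png|jpg|gif))["\']?/i';

// Default layout (PREG_PATTERN_ORDER): $images[group][match].
preg_match_all($pattern, $html, $images);
$paths = $images[1];

// PREG_SET_ORDER layout: $sets[match][group].
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    // $set[0] is the full match, $set[1] the path, $set[2] the extension.
    echo $set[1], " (", $set[2], ")\n";
}
```

Which layout is more convenient depends on the use: $images[1] gives a ready-made list of paths, while PREG_SET_ORDER keeps each path together with its extension.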