Qindex Programming Tips
55 [Tips] Regular Expression
written by Qindex at 2006-10-28 01:12

<script type='text/javascript'>
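// Append '?' when the URI has no query string yet.
var re = new RegExp("\\?");
if(!re.test(strng)) strng += '?';
// Keep the first <object ...>...</object> block found in the string as src.
re = new RegExp("<object (.+)</object>",'gmi');
var arry = strng.match(re);
if(arry) src = arry[0];
// Drop the autoplay=1 parameter, with its leading & or &amp; if present.
document.getElementById('i_uri').value = strng.replace(/(&|&amp;)?autoplay=1/i,'');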
</script>
* When using the constructor function, the normal string escape rules apply: a backslash in the pattern must itself be preceded by \ inside the string:
re = /\w+/ <=> re = new RegExp("\\w+")
* When using the literal format, a forward slash in the pattern must be escaped:
re = new RegExp("a/b") <=> re = /a\/b/
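The two equivalences above can be checked with a short sketch. The variable names are illustrative only; wordBroken shows what happens when the backslash is not doubled in the constructor string.

```javascript
// Literal and constructor forms of the same pattern.
const wordLiteral = /\w+/;
const wordCtor = new RegExp("\\w+");

// Without the doubled backslash the string "\w+" collapses to "w+",
// so the pattern only matches the letter w.
const wordBroken = new RegExp("\w+");

// A forward slash needs escaping in a literal but not in a constructor string.
const slashLiteral = /a\/b/;
const slashCtor = new RegExp("a/b");

console.log(wordLiteral.test("hello"), wordCtor.test("hello"), wordBroken.test("hello"));
console.log(slashLiteral.test("a/b"), slashCtor.test("a/b"));
```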
<?php
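// Capture the doc[uri] value that precedes &doc[title]= in the query string.
if(preg_match("/doc\[uri\]=(.+)&doc\[title\]=/i",$arry['value'],$mtchs)) {
    $uri = urldecode($mtchs[1]);
    // Escape the decoded URI before embedding it in HTML.
    $keyword = "<a href='".htmlspecialchars($uri,ENT_QUOTES)."' target='_blank'>".htmlspecialchars(substr($uri,0,35),ENT_QUOTES)."</a>";
}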
?>
 
Extract file names with relative paths from the results of EditPlus 'Find in Files'
^"ftp\([0-9]+\):([^"]+)"\([0-9]+,[0-9]+\)
^"[^:]+:([^"]+)"\([0-9]+,[0-9]+\)
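A minimal JavaScript sketch of the two patterns above. The sample result lines are hypothetical, shaped only to fit what the patterns expect (a quoted location followed by a line and column pair in parentheses); actual EditPlus output may differ.

```javascript
// Hypothetical 'Find in Files' result lines, written to fit the patterns.
const ftpLine = '"ftp(1):/public_html/index.php"(3,14): sample text';
const localLine = '"C:\\work\\site\\inc\\header.php"(12,5): sample text';

// First pattern: remote files opened over FTP.
const reFtp = /^"ftp\([0-9]+\):([^"]+)"\([0-9]+,[0-9]+\)/;
// Second pattern: anything up to the first colon (for example a drive letter).
const reLocal = /^"[^:]+:([^"]+)"\([0-9]+,[0-9]+\)/;

// Return the captured path, or null when the line does not match.
function extractPath(re, line) {
  const m = re.exec(line);
  return m ? m[1] : null;
}

const ftpPath = extractPath(reFtp, ftpLine);
const localPath = extractPath(reLocal, localLine);
console.log(ftpPath, localPath);
```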
 
Extract URL, title and description from a site
  <h2><a href="/p/a/search-video-web-20-applications-sites-yuscano/1603">yuscano</a></h2>
  <h3>Scan youtube and find the videos you want to see</h3>
  <div class="appimage"><a href="http://www.yuscano.com/"><img src="/images/thumbnail/www.yuscano.com.jpg"
">([^<]+)</a></h2>\n\s+<h3>([^<]+)</h3>\n\s+<div class="appimage"><a href="([^"]+)"><img
 
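As a sketch, the pattern for extracting the title, description and URL can be applied to the sample markup in JavaScript as follows. The sample is joined with newlines because the pattern expects \n between the three lines; the variable names are illustrative.

```javascript
// The three sample lines, joined with newlines as the pattern requires.
const html = [
  '  <h2><a href="/p/a/search-video-web-20-applications-sites-yuscano/1603">yuscano</a></h2>',
  '  <h3>Scan youtube and find the videos you want to see</h3>',
  '  <div class="appimage"><a href="http://www.yuscano.com/"><img src="/images/thumbnail/www.yuscano.com.jpg"',
].join("\n");

// Group 1 = title, group 2 = description, group 3 = site URL.
const reSite = /">([^<]+)<\/a><\/h2>\n\s+<h3>([^<]+)<\/h3>\n\s+<div class="appimage"><a href="([^"]+)"><img/;

const m = html.match(reSite);
const site = m ? { title: m[1], description: m[2], url: m[3] } : null;
console.log(site);
```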
Extract the host name from a referrer
http://forums.devshed.com/website-critiques-89/hierachical-quick-index-512897.html
http://210.117.121.40:8080/xs_btn/uxcn.asp?n=2&u=56741&i=0&y=0&m=0&k=0&t=Y&b=702&s=3600&x=www.qindex.info
^(?:http|https)://([0-9a-zA-Z.:-]+)
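A short JavaScript sketch of the host pattern applied to the two sample referrers. The function name hostOf is illustrative. The character class here uses A-Z; a range written A-z would also admit [ \ ] ^ _ and the backtick. Because the class contains a colon, a port number is captured together with the host.

```javascript
// Capture the host (and port, if any) that follows http:// or https://.
const reHost = /^(?:http|https):\/\/([0-9a-zA-Z.:-]+)/;

// Return the host part of a referrer, or null when it does not match.
function hostOf(referrer) {
  const m = reHost.exec(referrer);
  return m ? m[1] : null;
}

const host1 = hostOf("http://forums.devshed.com/website-critiques-89/hierachical-quick-index-512897.html");
const host2 = hostOf("http://210.117.121.40:8080/xs_btn/uxcn.asp?n=2&u=56741");
console.log(host1, host2);
```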
 
Parsing DTD string
<!DOCTYPE ([a-z:]+)(?:\s+((?:PUBLIC)|(?:SYSTEM))(?:\s+"(-//[^"]+)")?(?:\s+"(http://www.w3.org/[^"]+)")?)?>
          1              2                             3                   4
if($hndl = fopen($arry['uri'],"rb")) {
    $cntnt = '';
    while (!feof($hndl)) $cntnt .= fread($hndl, 8192);
    fclose($hndl);
    $pttrn = "/<!DOCTYPE ([a-z:]+)(?:\s+((?:PUBLIC)|(?:SYSTEM))(?:\s+\"(-\/\/[^\"]+)\")?(?:\s+\"(http:\/\/www.w3.org\/[^\"]+)\")?)?>/";
    preg_match($pttrn,$cntnt,$mtchs);
}
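For illustration, the same DOCTYPE pattern can be tried in JavaScript against the XHTML 1.0 Strict declaration. This is a sketch only; the sample strings and variable names are not part of the original code.

```javascript
// Same pattern as above; groups: 1 = root element, 2 = PUBLIC or SYSTEM,
// 3 = public identifier, 4 = system identifier (a w3.org URL).
const reDoctype = /<!DOCTYPE ([a-z:]+)(?:\s+((?:PUBLIC)|(?:SYSTEM))(?:\s+"(-\/\/[^"]+)")?(?:\s+"(http:\/\/www.w3.org\/[^"]+)")?)?>/;

const xhtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">';
const full = xhtml.match(reDoctype);

// A bare declaration matches too, leaving groups 2 to 4 undefined.
const bare = "<!DOCTYPE html>".match(reDoctype);

console.log(full.slice(1), bare.slice(1));
```

Since the pattern has no case-insensitive flag, a declaration whose root element is written in upper case (HTML) will not match unless the i flag is added.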
 
 
File name match
<%
Dim RE
Set RE = New RegExp
RE.Pattern = "test\.asp"
RE.IgnoreCase = True
RE.Global = True
Response.Write RE.Test(Request.ServerVariables("SCRIPT_NAME"))
%>
 
 
Parsing a URI Reference with a Regular Expression (RFC 3986)

   As the "first-match-wins" algorithm is identical to the "greedy"
   disambiguation method used by POSIX regular expressions, it is
   natural and commonplace to use a regular expression for parsing the
   potential five components of a URI reference.
   The following line is the regular expression for breaking-down a
   well-formed URI reference into its components.
 
      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9
 
   The numbers in the second line above are only to assist readability;
   they indicate the reference points for each subexpression (i.e., each
   paired parenthesis).  We refer to the value matched for subexpression
   <n> as $<n>.  For example, matching the above expression to
      http://www.ics.uci.edu/pub/ietf/uri/#Related
   results in the following subexpression matches:
 
      $1 = http:
      $2 = http
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related
 
   where <undefined> indicates that the component is not present, as is
   the case for the query component in the above example.  Therefore, we
   can determine the value of the five components as
 
      scheme    = $2
      authority = $4
      path      = $5
      query     = $7
      fragment  = $9
 
   Going in the opposite direction, we can recreate a URI reference from
   its components by using the algorithm of Section 5.3.
 
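var re_URL = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/i;
// Rfrrr is assumed to hold the referrer string.
var rfrrr_s = Rfrrr.match(re_URL);
// $4 is the authority: the host name plus the port, if one is given.
var Host = rfrrr_s[4];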
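The mapping from subexpressions to the five components can be sketched as a small JavaScript function. The name splitUri and the shape of the returned object are illustrative choices, not part of RFC 3986.

```javascript
// The regular expression from RFC 3986, Appendix B.
const RE_URI = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split a URI reference into its five components.
// Every group is optional, so the expression matches any string;
// components that are absent come back as undefined.
function splitUri(ref) {
  const m = RE_URI.exec(ref);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

const parts = splitUri("http://www.ics.uci.edu/pub/ietf/uri/#Related");
console.log(parts);
```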
 
 
Extracting the source, width and height from embedded object code
 
<object style='width:425px; height:355px;'><param name="movie" value="http://www.youtube.com/v/hoytrHE821o&hl=en"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/hoytrHE821o&hl=en" type="application/x-shockwave-flash" wmode="transparent" width="425" height="355"></embed></object>
 
<object (.+)</object>
<param\s+name\s*=\s*(?:'|")?movie(?:'|")?\s+value\s*=\s*(?:'|")?([^'"]+)(?:'|")?\s*/?>
\s+width\s*=\s*(?:'|")?([0-9]+)(?:'|")?
\s+height\s*=\s*(?:'|")?([0-9]+)(?:'|")?
 
width:([0-9]+)px
height:([0-9]+)px
 
<embed (.+)</embed>
\s+src\s*=\s*(?:'|")?([^'"]+)(?:'|")?
\s+width\s*=\s*(?:'|")?([0-9]+)(?:'|")?
\s+height\s*=\s*(?:'|")?([0-9]+)(?:'|")?
 
width:([0-9]+)px
height:([0-9]+)px
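The patterns above can be combined as in the following JavaScript sketch, run against the sample object code. The helper name first and the result object are illustrative. The attribute patterns begin with \s+, so they are applied to the whole matched <embed ...> tag rather than to its first capture group, which starts directly at src.

```javascript
const code = `<object style='width:425px; height:355px;'><param name="movie" value="http://www.youtube.com/v/hoytrHE821o&hl=en"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/hoytrHE821o&hl=en" type="application/x-shockwave-flash" wmode="transparent" width="425" height="355"></embed></object>`;

// Return the first capture group of re in text, or null when there is no match.
function first(re, text) {
  const m = re.exec(text);
  return m ? m[1] : null;
}

// Isolate the <embed ...></embed> part before reading its attributes.
const embedMatch = /<embed (.+)<\/embed>/i.exec(code);
const embedTag = embedMatch ? embedMatch[0] : "";

const result = {
  movie: first(/<param\s+name\s*=\s*(?:'|")?movie(?:'|")?\s+value\s*=\s*(?:'|")?([^'"]+)(?:'|")?\s*\/?>/i, code),
  src: first(/\s+src\s*=\s*(?:'|")?([^'"]+)(?:'|")?/i, embedTag),
  width: first(/\s+width\s*=\s*(?:'|")?([0-9]+)(?:'|")?/i, embedTag),
  height: first(/\s+height\s*=\s*(?:'|")?([0-9]+)(?:'|")?/i, embedTag),
  styleWidth: first(/width:([0-9]+)px/i, code),
  styleHeight: first(/height:([0-9]+)px/i, code),
};
console.log(result);
```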

http://blog\.naver\.com/([0-9a-zA-Z_]+)
 
function Q_URI($strng,$mode='match') {
    $rglr_exprssn =
          ""
        . "((?:http)|(?:https)|(?:ftp)|(?:mms)):\/\/" //PROTOCOL
        . "((?:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(?:[-_.0-9a-zA-Z]+))" //HOST
        . "(?::([0-9]+))?" //PORT
        . "((?:\/[-_.!~*'()0-9a-zA-Z%]*)+)?" //PATH
        . "((?:(?:\\?|&)[-_.!~*'()0-9a-zA-Z%]+(?:=[-+_.!~*'()0-9a-zA-Z%]*)?)+)?" //QUERY_STRING
        . "(#[-_.!~*'()0-9a-zA-Z%]+)?" //HASH
        . "";
    if($mode=='match') {
        if(preg_match("/^".$rglr_exprssn."\$/",$strng,$tmp)) {
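            // preg_match omits trailing groups that did not match, so pad $tmp to avoid undefined offsets.
            $tmp += array_fill(0,7,'');
            $arry = array();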
            $arry['PROTOCOL']     = $tmp[1];
            $arry['HOST']         = $tmp[2];
            $arry['PORT']         = $tmp[3];
            $arry['PATH']         = $tmp[4];
            $arry['QUERY_STRING'] = $tmp[5];
            $arry['HASH']         = $tmp[6];
            return $arry;
        } else return false;
    } elseif($mode=='replace') {
        return preg_replace("/(".$rglr_exprssn.")/","<a href='\${1}' target='_blank'>\${1}</a>",$strng);
    }
}
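A rough JavaScript sketch of the 'match' mode of Q_URI, for comparison. The function name qUri is illustrative, and example.com is a placeholder host. In the query value class the hyphen is listed first so that it is a literal; written as +-_ it would form a range running from + to _, which admits characters such as < and >.

```javascript
// Pattern pieces mirroring Q_URI. A constructor string is used,
// so forward slashes need no escaping but backslashes are doubled.
const Q_URI_PATTERN =
  "((?:http)|(?:https)|(?:ftp)|(?:mms))://" +                                      // PROTOCOL
  "((?:[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})|(?:[-_.0-9a-zA-Z]+))" + // HOST
  "(?::([0-9]+))?" +                                                                // PORT
  "((?:/[-_.!~*'()0-9a-zA-Z%]*)+)?" +                                               // PATH
  "((?:(?:\\?|&)[-_.!~*'()0-9a-zA-Z%]+(?:=[-+_.!~*'()0-9a-zA-Z%]*)?)+)?" +          // QUERY_STRING
  "(#[-_.!~*'()0-9a-zA-Z%]+)?";                                                     // HASH

// Return the URI parts, or false when the whole string is not a URI.
function qUri(str) {
  const m = new RegExp("^" + Q_URI_PATTERN + "$").exec(str);
  if (!m) return false;
  return {
    PROTOCOL: m[1],
    HOST: m[2],
    PORT: m[3] || "",
    PATH: m[4] || "",
    QUERY_STRING: m[5] || "",
    HASH: m[6] || "",
  };
}

const parsed = qUri("https://example.com:8080/a/b?x=1&y=a+b#top");
const rejected = qUri("http://example.com/?q=<b>");
console.log(parsed, rejected);
```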

----------------------------------------------------------------------------------
<a href='http://qindex.info/Q_drctry/sample/-_.!~%20()' '>
address(http://qindex.info/Q_drctry/sample/-_.!~'%20() )


