Jenda.Rex - Regular expressions via OLE
Version 0.14.05
    Dim re As Object
    Dim arr As Variant
    Dim info As Collection
    Set re = CreateObject("Jenda.Rex")
    Set info = New Collection

    If re.Test(strMail, "\w+@\w+\.\w+") Then
        MsgBox "Looks like a mail address", vbInformation
    End If

    arr = re.Match(strINILine, "^([\w\d]+)=(.*)$")
    If IsEmpty(arr) Then
        MsgBox "Malformed line! : " & strINILine, vbCritical
    Else
        info.Add arr(1), arr(0)
    End If

    re.TieCollection info, "info"

    strTemplate = "%id%: %programtitle% by %author% (version %version%)"
    strToShow = re.Replace(strTemplate, "%(.*?)%", "$info{$1}", "g")
The Microsoft VBScript Regular Expressions (5.5) object is funny at best: it is hard to use, leads to lengthy code, and really looks like something a VB programmer made.
I was so frustrated by the object that I sat down and wrote my own.
It is written in Perl so it provides full Perl regular expressions. It even allows you to access VB arrays, collections, recordsets and other similar objects from the replacement string.
The parts of the string captured by groups in the regexp may be accessed using the TestMatched() or Matched() method.
    If RE.Test( stringToTest, regularExpression, options) Then
        result = RE.TestMatched(0) & RE.TestMatched(1)
    End If

is equivalent to

    if ($stringToTest =~ /regularExpression/options) {
        $result = $1 . $2;
    }
Please note that the matches are indexed from 0, not 1, in TestMatched()!!!
You may use the common regular expressions defined in Regexp::Common http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common.pm:
If RE.Test( stringToTest, "^$RE{num}{real}") Then
The Jenda.Rex object and each Jenda.Rex.Prepared object has its own TestMatched() buffer.
    arr = Match( stringToMatch, regularExpression, options)

is equivalent to

    @arr = ($stringToMatch =~ /regularExpression/options)
1) If the options DO NOT include ``g'' and there are no () groups you'll get a one element array containing (1) if the regexp matches, Empty otherwise.
2) If the options DO NOT include ``g'' and there are () groups you'll get an array containing the strings matched by the subregexps in the () groups if the regexp matches, Empty otherwise.
3) If the options DO include ``g'' and there are no () groups you'll get an array containing all the strings matched by the whole regexp if it matched at least once, Empty otherwise.
4) If the options DO include ``g'' and there are some () groups you'll get an array containing all the strings matched by all the () groups from all matches of the whole regexp in case of success, Empty otherwise.
This means that if there are three () groups then arr(0) contains the string matched by the first () group the first time, arr(1) the second group, arr(2) the third group, arr(3) the second match for the first () group and so on. If a () group matched an empty string or was ``skipped'' due to ``|'' then arr(x) will contain an empty string.
Note that if you do not specify any () groups and the options do NOT include ``g'' you only get Array(1) if the regexp matches! If you want to get the first match you have to enclose the regexp in parens!
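The four cases above can be modelled outside VB. The following is only a sketch in Python, used here because it runs without the COM object. Python's re module is not the Perl engine behind Jenda.Rex, the function name match_like_rex is hypothetical, and None stands in for VB's Empty.

```python
import re

def match_like_rex(text, pattern, global_match=False):
    """Model the four Match() result layouts; None plays the role of Empty."""
    rx = re.compile(pattern)
    if not global_match:
        m = rx.search(text)
        if m is None:
            return None
        if rx.groups == 0:
            return [1]                          # case 1: success flag only
        return [g or "" for g in m.groups()]    # case 2: one entry per () group
    result = []
    for m in rx.finditer(text):
        if rx.groups == 0:
            result.append(m.group(0))           # case 3: every whole match
        else:
            # case 4: groups of all matches, flattened into one list
            result.extend(g or "" for g in m.groups())
    return result or None

flat = match_like_rex("a=1;b=2", r"(\w)(=)(\d)", global_match=True)
print(flat)      # three groups per match, so flat[3] starts the second match
```

With three () groups the element at index 3 is the first group of the second match, which is the layout described above. A group skipped by ``|'' shows up as an empty string in this model as well.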
Replaces the matched substring(s) by the replacementString.
    strResult = Replace( stringToProcess, regularExpression, replacementString, options)

is equivalent to

    # make a copy and do a replace on it
    ($strResult = $stringToProcess) =~ s/regularExpression/replacementString/options;
The replacement string may contain $1, $2, ... variables denoting the strings matched by () groups in the regular expressions. It may even contain subscriptions to Tied arrays ``$array[$1]'' or collections ``$col{$1}''.
You can create as many Jenda.Rex.Prepared objects as you like.
If you use the same regular expression many times it's recommended to ``Prepare'' it.
Example:

    arr = Array( "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" )
    re.TieArray arr, "day"
    strNamed = re.Replace( strByNum, "Weekday:(\d+)", "Weekday:$day[$1]")
Keep in mind that TieArray copies the array, so any changes you make to the array between TieArray and Replace are not visible!
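The copy behaviour of TieArray can be pictured with a small Python sketch. This is an illustration only, not how Jenda.Rex is implemented: the names tied_day and name_weekdays are hypothetical, and the lookup is done by a replacement function rather than by Perl variable interpolation.

```python
import re

days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Snapshot taken at "tie" time, standing in for TieArray copying the array.
tied_day = list(days)

def name_weekdays(text):
    # Replace "Weekday:<n>" with the name stored at index n of the snapshot.
    return re.sub(r"Weekday:(\d+)",
                  lambda m: "Weekday:" + tied_day[int(m.group(1))],
                  text)

days[2] = "Dienstag"                 # change made after the tie
result = name_weekdays("Weekday:2")  # still uses the old value
print(result)
```

The later change to days is not seen by the replacement because only the snapshot is consulted.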
Default subscriptionFunctionName is ``Item''. If you do not set propertyName we use the default property.
Example:

    Dim col As Collection
    Set col = New Collection
    col.Add "value", "key"
    ...
    re.TieCollection col, "col"
    strResult = re.Replace(strWithVars, "%(.*?)%", "$col{$1}", "g")
    Dim rst As ADODB.Recordset
    ...
    re.TieCollection rst, "data", "Fields", "Value"
    While Not rst.EOF
        strResult = re.Replace(strWithVars, "%(.*?)%", "$data{$1}", "g")
        MsgBox strResult, vbInformation
        rst.MoveNext
    Wend
As you can see, TieCollection, unlike TieArray, only references the object, so any changes you make to the object are visible in the Jenda.Rex object.
Keep in mind that if you Tie a collection to a Jenda.Rex object it will not be destroyed until you either Untie it or you destroy the Jenda.Rex object!
Also keep in mind that the tiedNames are CASE SENSITIVE !!!
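By contrast with a copied array, a tied collection is looked up live. A rough Python model of that behaviour follows, with a dict standing in for the collection. The function name fill is hypothetical, and what Jenda.Rex does for a key that is missing from the collection is not stated here, so the sketch does not try to model it.

```python
import re

info = {"id": "42", "author": "Jenda"}

def fill(template, tied):
    # Each %name% is replaced by a live lookup in the tied object.
    return re.sub(r"%(.*?)%", lambda m: str(tied[m.group(1)]), template)

first = fill("%id% by %author%", info)
info["author"] = "someone else"        # change made after the "tie"
second = fill("%id% by %author%", info)
print(first)
print(second)
```

Because only a reference is held, the second call sees the changed value. Keys are case sensitive in this model too: "author" and "Author" are different entries.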
Example:

    ' if it contains the separator or the quote or any end of line character
    Set NeedQuoting = RE.Prepare(RE.Quote(QuoteCharacter) & "|" & RE.Quote(FieldSeparator) & "|\x0D|\x0A")
If you did not quote the variables and the FieldSeparator happened to be ``|'' you'd end up with regexp ``'|||\x0D|\x0A'' and it would of course match anything.
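The effect of leaving out the quoting can be reproduced in any regexp engine. Below is a sketch in Python, where re.escape plays the role that RE.Quote plays above; that mapping is an assumption made for illustration, and the variable names are hypothetical.

```python
import re

quote_char = "'"
field_sep = "|"
tail = r"|\x0D|\x0A"

# Without quoting, the separator becomes an alternation with empty branches,
# and an empty branch matches at every position.
unquoted = re.compile(quote_char + "|" + field_sep + tail)

# With quoting, the separator is matched literally.
quoted = re.compile(re.escape(quote_char) + "|" + re.escape(field_sep) + tail)

print(bool(unquoted.search("plain")))   # matches although nothing needs quoting
print(bool(quoted.search("plain")))
print(bool(quoted.search("a|b")))
```

Only the quoted pattern distinguishes values that contain the separator, the quote character or a line end from values that do not.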
html = "<td><b>" & RE.HTMLescape(value) & "</b></td>"
html = "<input type=text name=Foo value=""" & RE.TAGescape(value) & """>"
html = "<a href=""JavaScript:doSomething( '" & RE.JSescape(value) & "');"">Click here</a>"
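The three calls above differ in the context the value is placed into. As a rough illustration of the first two contexts, here is a Python sketch that uses the standard html.escape function. The exact set of characters each Jenda.Rex method escapes is not stated in this text, so the equivalence is an assumption.

```python
import html

value = '<b>Holds:</b> 0 < 1 & "quoted"'

# Element content: markup characters must be neutralised.
as_text = html.escape(value, quote=False)

# Attribute value: quote characters must be neutralised as well.
as_attr = html.escape(value, quote=True)

cell = "<td><b>" + as_text + "</b></td>"
field = '<input type=text name=Foo value="' + as_attr + '">'
print(cell)
print(field)
```

A value placed inside a JavaScript string literal needs yet another set of rules, which is why a separate method exists for that case.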
<b>Holds:</b> 0 < 1
then the result will be
<b>Holds:</b> 0 &lt; 1
<b>Holds:</b> 0 < 1
then the result will be
<b>Holds:</b> 0 &lt; 1
and adds some paragraph and <BR> tags if the text doesn't already contain them.
To be used to polish HTML coding you got from users :-)
Str = RE.DeWordify( Str )
Str = RE.DeWordifyHTML( Str )
E.g. replacing ...<SPAN some attributes>whatever</SPAN><SPAN the same attributes>... by ...<SPAN some attributes>whatever..., removing <SPAN> and <FONT> tags with no attributes, removing <SPAN>, <FONT>, <B>, ... tags with only whitespace content, etc.
Str = RE.DeMoronizeHTML( Str )
Str = RE.ImproveHTML( Str )
Str = RE.DeUTF8( Str )
Str = RE.EnUTF8( Str )
RE.EnUTF8File( Filename )
RE.EnUTF8File( Filename, OtherFilename )
    tag1 tag2 tag3 ...
    tag4 tag5 : foo bar baz ... # comment
    tag6 # comment
    tag7 ; comment
    tag8 ' comment
    ...
The created object supports these two methods:
    strResult = objFilter.doSTRING( strSource )
    objFilter.doFILE( strSourcePath, strResultPath )
After filtering the text will only contain the allowed tags and allowed parameters. All other HTML will be stripped.
It's recommended to polish the HTML with FUZZYescape() or PolishHTML() beforehand.
Example:

    strFilter = "B" & vbCRLF & "I" & vbCRLF & "A: HREF NAME" & vbCRLF & "BR"
    Set objFilter = re.HTMLfilter( strFilter )
    str = re.FUZZYescape( str )
    str = objFilter.doSTRING( str )
    SeparatorChar - the separator, by default ","
    QuoteChar - the quote character, by default a doublequote
    EscapeChar - the character to use to escape the quote character, by default a doublequote
    EndOfLine - the character that denotes the end of line in the file, by default CRLF (vbCrLf, "\r\n", "\x0D\x0A")
    AlwaysQuote - boolean, controls whether even the items that do not contain the quote or separator characters are to be quoted, by default False
    Binary - boolean, specifies whether the included texts may contain characters outside the ASCII printable range, by default True
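To see what the default settings mean for an output line, here is a sketch that uses Python's standard csv module rather than the CSVParser object itself. The mapping of SeparatorChar, QuoteChar, EscapeChar, EndOfLine and AlwaysQuote onto the csv module's parameters is an assumption made for illustration, and the function name combine is hypothetical.

```python
import csv
import io

def combine(items, always_quote=False):
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=",",          # SeparatorChar
        quotechar='"',          # QuoteChar
        doublequote=True,       # EscapeChar equal to the quote: "" inside quotes
        lineterminator="\r\n",  # EndOfLine
        quoting=csv.QUOTE_ALL if always_quote else csv.QUOTE_MINIMAL,  # AlwaysQuote
    )
    writer.writerow(items)
    return buf.getvalue()

row = ["Jenda", 'say "hi"', "a,b"]
print(repr(combine(row)))
print(repr(combine(row, always_quote=True)))
```

With the defaults only the items that contain the quote or the separator are quoted, and an embedded quote is doubled. With always_quote set, every item is quoted.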
    Dim RE, CSV
    Set RE = Server.CreateObject("Jenda.Rex")
    Set CSV = RE.CSVParser()

    Dim Arr
    CSV.ParseFile( fileName)
    Arr = CSV.Parse( "" )
    Do While True
        If IsEmpty(Arr) Then Exit Do
        name = Arr(0)
        email = Arr(1)
        pwd = Arr(2)
        ...
    Loop
    CSV.CloseFile
See Jenda.Rex.CSVParser methods
The methods of prepared regular expressions are almost the same as for the general object, except that there is no Prepare method, there are none of the additional functions, and Test, Match and Replace do not take the regularExpression and options parameters.
Please keep in mind that arrays are ZERO based.
Pushed items and returns the resulting string including the newline character.
Pushed items.
Push and returns the resulting string. Clears the list kept by CSVParser.
Push and returns the resulting string. Doesn't change the list kept by CSVParser.
Push and returns the resulting string. Removes the last Count items from the list kept by CSVParser.
Push and returns the resulting string. Removes all except first Count items from the list kept by CSVParser.
    line = CSV.Combine( name, email, pwd, salary)
    otherLine = CSV.Combine( name, email, arrayOfSomething)

    CSV.Push name, email
    If includeThis Then CSV.Push this
    If includeThat Then CSV.Push that
    CSV.Push theLastThing
    yetOtherLine = CSV.Flush
Example:

    re.AllowXHTML = True
    MsgBox re.FUZZYescape("Hello <BR/>World")
    ' prints "Hello <BR/>World"

    re.AllowXHTML = False
    MsgBox re.FUZZYescape("Hello <BR/>World")
    ' prints "Hello &lt;BR/&gt;World"

    re.AllowXHTML = True
    Set filter = re.HTMLfilter( "B I BR" )
    MsgBox filter.doSTRING("<b>Hello</b> <BR/><foo>World</foo>")
    ' prints "<b>Hello</b> <BR/>World"

    re.AllowXHTML = False
    Set filter = re.HTMLfilter( "B I BR" )
    MsgBox filter.doSTRING("<b>Hello</b> <BR/><foo>World</foo>")
    ' prints "<b>Hello</b> World"
The HTMLfilter object remembers the state of re.AllowXHTML and is NOT affected by later changes!
Default is True.
Regular expressions documentation
I will not describe Perl regular expressions here. You may find the related docs here:
http://aspn.activestate.com/ASPN/Reference/Products/ActivePerl/lib/Pod/perlre.html and http://aspn.activestate.com//ASPN/Reference/Products/ActivePerl/lib/Pod/perlop.html
The included JendaRexTest.frm contains examples of pretty much the whole functionality.
The COM object was written by Jenda@Krynicky.cz ( http://jenda.krynicky.cz )
To name all authors of Perl would be toooooo lengthy.
It was converted to an ActiveX DLL by PerlCtrl from Perl Dev Kit 4.1 by ActiveState.
The DLL itself depends only on a few system DLLs that should be available everywhere.
There's one possible problem though. Jenda.Rex needs to be able to extract some files to %TEMP%\pdk directory. (The filenames will look like this: a4da113933612fb90ce798dd83702bf2.dll)
If it can't create the pdk directory or the files it will not work! You may need to review the permissions to the %TEMP%\pdk directory.
(c) 2001-2007 Jenda Krynicky
You may distribute it under the terms of either the GNU General Public License or the Artistic License (both are easy to find on the Net), or include it in your application in compiled form.
In the last case:
- You do not have to mention Jenda.Rex in the docs or license conditions of your program.
- You don't have to install it separately using Jenda.Rex's own installer; you may install it within your own installation procedure (all you need to do is copy JendaRex.dll to your program's directory and run regsvr32 /s JendaRex.dll).
- The only thing I would like you to do is to include JendaRex.html and install it into the same directory as JendaRex.dll. (No need to link to the file from your docs or wherever. Just place it in the same dir.)
Version 0.14.05 10 Nov 2006