Regex Musings
-
First off, let me say I'm a bit over my
head here. Not with the regex part, but with the host language of the regex engine.
Many moons ago I posted a blog article
stating why you could not write a regex that validated an e-mail
address 100%. Well, this is still true. However, in that
post I also stated that the pattern was so massive that it wasn't
worth using. This is also still true, but I was made aware of a
flavor-specific (recursive) syntax that reduces the regex from massive to very
large.
This regex is for the PCRE engine.
http://www.myregextester.com/?r=337
Though from what I've read this will work in PHP too. Now, I don't know Perl or PHP, or what
minimum version of PCRE supports this syntax. That being the case, I
also don't know how well it performs. I wrote the original version using
the .Net syntax, and not only was the regex massive, which is one reason I never posted it, but the
performance was terrible. Given that most people want to use this
type of regex to validate a data entry field, the pattern was overkill.
In fact I recommend that you don't use this, except to learn from.
The PCRE version may perform better, but I don't have the means or
time to test it, so use it at your own risk. For simple field validation
even this is still overkill, and on a large text file performance
may suffer horribly. Most likely you aren't going to want to use
this pattern: it is too large for simple field validation and performs poorly
on large text.
When I see people asking for an e-mail
regex, I point out that perfect validation is not possible. And when
I see so-called e-mail validating regexes that are only about 50
characters long, it makes me chuckle. This pattern is probably the
most compact version of an RFC 2822 address regex you'll find, and it
is still huge. Ports to other regex engines that don't support the
recursive syntax will easily be 4x as large as my .Net version was.
The above pattern covers the RFC spec up to
the addr-spec, which is pretty much what people are thinking about
when they say e-mail address.
It's not too hard to take it up a few
more levels in the spec using this syntax.
RFC 2822 mailbox:
http://www.myregextester.com/?r=338
But, like I said, it likely won't perform
well enough to be useful. I've wrapped both of the patterns I've linked to in anchors, so they only match against the whole
string. Searching for an address within a larger body of text without anchors will probably
degrade performance very quickly. But if any of you PHP or Perl gurus want to stress-test this beast, have fun. Maybe it's not as bad as I think it is.
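Nesting is where the recursive syntax earns its keep: an RFC 2822 comment can contain other comments, and a regex without recursion has to spell out every level of nesting separately, which is a big part of why non-recursive ports balloon. .NET has no recursion construct, but its balancing groups can match arbitrary nesting. The sketch below only illustrates that idea and is not the linked PCRE pattern; the class name NestedComment is hypothetical, and it assumes a simplified comment grammar (no quoted-pairs or folding whitespace).

```csharp
using System.Text.RegularExpressions;

public static class NestedComment
{
    // Whole-string match of one parenthesized comment with arbitrary nesting.
    // Simplification (assumption): quoted-pairs and folding whitespace from
    // RFC 2822 are ignored; anything except '(' or ')' counts as comment text.
    private static readonly Regex Comment = new Regex(
        @"\A\(" +                  // opening parenthesis
        @"(?>" +
        @"[^()]+" +                // a run of ordinary comment text
        @"|\((?<depth>)" +         // nested '(' pushes a capture onto 'depth'
        @"|\)(?<-depth>)" +        // nested ')' pops one; fails if none is open
        @")*" +
        @"(?(depth)(?!))" +        // fail if a nested '(' is still unclosed
        @"\)\z");                  // closing parenthesis, then end of string

    public static bool IsComment(string text)
    {
        return Comment.IsMatch(text);
    }
}
```

Like the linked patterns, this one is anchored (\A and \z), so it checks a whole string rather than searching inside a larger body of text.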
This is a C# 2.0 enhancement of a C# port of YUI Compressor's CSS minification code
I got a little carried away with ideas
for this; they were all regex-based, which is really what motivated me
to work on it. However, after I thought I was done, I learned that not
everything worked. It did what I wanted it to do, but what I wanted
wasn't the correct thing. I really should have just stopped with my
original ideas.
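The last of my original ideas was to take two or more
individual sub-properties and write them in the shorthand notation of
the main property they belong to. Well, I got that working.
But upon testing I learned something about CSS that I didn't
know: a shorthand resets every sub-property it doesn't list to its initial value, so what I was doing could alter the
presentation. For example, combining only background-color and background-image into background also resets background-repeat, background-attachment and background-position, which can override values those properties get from other rules. That was disappointing because I put a lot of energy
into getting the results I was after.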
So it looked as if all of that code was
going to go to waste. But there was one scenario in which what I was
trying to do was alright, so the code wasn't completely wasted. That
one scenario is when all the sub-properties are declared: then
combining them is fine. I didn't bother changing the regexes I wrote
for this, but I cleaned up some of the code. Though it would have
worked as is, some of the things being checked were now unnecessary.
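The safe case can be sketched without regexes, one rule body at a time. This is a hypothetical helper (MarginShorthand.Collapse), not the regex-based code in the listing below; it only emits margin when all four sides are declared, bails out if the body already has a margin shorthand, and assumes a simple "prop:value;" body with no !important flags or comments. It sticks to C# 2.0 syntax to match the rest of the post.

```csharp
using System;
using System.Collections.Generic;

public static class MarginShorthand
{
    private static readonly string[] Sides = { "top", "right", "bottom", "left" };

    // Collapses margin-top/right/bottom/left in one rule body ("prop:value;prop:value;")
    // into a single "margin" declaration, but only when all four sides are declared.
    public static string Collapse(string body)
    {
        Dictionary<string, string> sideValues = new Dictionary<string, string>();
        List<string> others = new List<string>();

        foreach (string decl in body.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = decl.IndexOf(':');
            if (colon > 0)
            {
                string prop = decl.Substring(0, colon).Trim().ToLowerInvariant();
                if (prop == "margin")
                {
                    return body; // an existing shorthand interacts with the longhands; leave it alone
                }
                if (prop.StartsWith("margin-", StringComparison.Ordinal) && Array.IndexOf(Sides, prop.Substring(7)) >= 0)
                {
                    // A later declaration of the same side wins, as in the cascade
                    sideValues[prop.Substring(7)] = decl.Substring(colon + 1).Trim();
                    continue;
                }
            }
            others.Add(decl);
        }

        // With a side missing, the shorthand would reset that side to its
        // initial value and could change the layout, so leave the body alone.
        if (sideValues.Count < Sides.Length)
        {
            return body;
        }

        string[] values = new string[Sides.Length];
        for (int i = 0; i < Sides.Length; i++)
        {
            values[i] = sideValues[Sides[i]];
        }
        others.Insert(0, "margin:" + string.Join(" ", values));
        return string.Join(";", others.ToArray()) + ";";
    }
}
```

For example, "color:red;margin-top:1px;margin-right:2px;margin-bottom:3px;margin-left:4px;" becomes "margin:1px 2px 3px 4px;color:red;", while a body with only margin-top and margin-left comes back unchanged.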
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
namespace CSSMinify
{
class CSSMinify
{
public static Hashtable shortColorNames = new Hashtable();
public static Hashtable shortHexColors = new Hashtable();
public static string Minify(string css)
{
return Minify(css, 0);
}
public static string Minify(string css, int columnWidth)
{
// BSD License http://developer.yahoo.net/yui/license.txt
// New css tests and regexes by Michael Ash
createHashTable();
MatchEvaluator rgbDelegate = new MatchEvaluator(RGBMatchHandler);
MatchEvaluator shortColorNameDelegate = new MatchEvaluator(ShortColorNameMatchHandler);
MatchEvaluator shortColorHexDelegate = new MatchEvaluator(ShortColorHexMatchHandler);
css = RemoveCommentBlocks(css);
css = Regex.Replace(css, @"\s+", " "); //Normalize whitespace
css = Regex.Replace(css, @"\x22\x5C\x22}\x5C\x22\x22", "___PSEUDOCLASSBMH___"); //hide Box model hack ("\"}\"") so the following replaces leave it alone
/* Remove the spaces before the things that should not have spaces before them.
But, be careful not to turn "p :link {...}" into "p:link{...}"
*/
css = Regex.Replace(css, @"(?#no preceding space needed)\s+((?:[!{};>+()\],])|(?<={[^{}]*):(?=[^}]*}))", "$1");
css = Regex.Replace(css, @"([!{}:;>+([,])\s+", "$1"); // Remove the spaces after the things that should not have spaces after them.
css = Regex.Replace(css, @"([^;}])}", "$1;}"); // Add the semicolon where it's missing.
css = Regex.Replace(css, @"(\d+)\.0+(p(?:[xct])|(?:[cem])m|%|in|ex)\b", "$1$2"); // Remove .0 from size units x.0em becomes xem
css = Regex.Replace(css, @"([\s:])(0)(px|em|%|in|cm|mm|pc|pt|ex)\b", "$1$2"); // Remove unit from zero
//New test
//Font weights
css = Regex.Replace(css, @"(?<=font-weight:)normal\b", "400");
css = Regex.Replace(css, @"(?<=font-weight:)bold\b", "700");
//Thought this was a good idea, but sub-properties of a set that aren't declared get reset to their defaults by the shorthand. css = ShortHandProperty(css);
css = ShortHandAllProperties(css);
//css = Regex.Replace(css, @":(\s*0){2,4}\s*;", ":0;"); // if all parameters zero just use 1 parameter
// if all 4 parameters the same unit make 1 parameter
css = Regex.Replace(css, @"(?<!background-position\s*):\s*(inherit|auto|0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))(\s+\1){1,3};", ":$1;", RegexOptions.IgnoreCase);
// if has 4 parameters and top unit = bottom unit and right unit = left unit make 2 parameters
css = Regex.Replace(css, @":\s*((inherit|auto|0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))\s+(inherit|auto|0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex)))))\s+\2\s+\3;", ":$1;", RegexOptions.IgnoreCase);
// if has 4 parameters and top unit != bottom unit and right unit = left unit make 3 parameters
css = Regex.Replace(css, @":\s*((?:(?:inherit|auto|0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))\s+)?(inherit|auto|0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))\s+(?:0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex)))))\s+\2;", ":$1;", RegexOptions.IgnoreCase);
//// if has 3 parameters and top unit = bottom unit make 2 parameters
//css = Regex.Replace(css, @":\s*((0|(?:(?:\d?\.?\d(?:p(?:[xct])|(?:[cem])m|%|in|ex))))\s+(?:0|(?:(?:\d?\.?\d(?:p(?:[xct])|(?:[cem])m|%|in|ex)))))\s+\2;", ":$1;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, "background-position:0;", "background-position:0 0;");
css = Regex.Replace(css, @"(:|\s)0+\.(\d+)", "$1.$2");
// Outline-styles and Border-styles parameter reduction
css = Regex.Replace(css, @"(outline|border)-style\s*:\s*(none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)(?:\s+\2){1,3};", "$1-style:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-style\s*:\s*((none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)\s+(none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset))(?:\s+\3)(?:\s+\4);", "$1-style:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-style\s*:\s*((?:(?:none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)\s+)?(none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)\s+(?:none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset))(?:\s+\3);", "$1-style:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-style\s*:\s*((none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)\s+(?:none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset))(?:\s+\3);", "$1-style:$2;", RegexOptions.IgnoreCase);
// Outline-color and Border-color parameter reduction
css = Regex.Replace(css, @"(outline|border)-color\s*:\s*((?:\#(?:[0-9A-F]{3}){1,2})|\S+)(?:\s+\2){1,3};", "$1-color:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-color\s*:\s*(((?:\#(?:[0-9A-F]{3}){1,2})|\S+)\s+((?:\#(?:[0-9A-F]{3}){1,2})|\S+))(?:\s+\3)(?:\s+\4);", "$1-color:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-color\s*:\s*((?:(?:(?:\#(?:[0-9A-F]{3}){1,2})|\S+)\s+)?((?:\#(?:[0-9A-F]{3}){1,2})|\S+)\s+(?:(?:\#(?:[0-9A-F]{3}){1,2})|\S+))(?:\s+\3);", "$1-color:$2;", RegexOptions.IgnoreCase);
// Shorten colors from rgb(51,102,153) to #336699
// This makes it more likely that it'll get further compressed in the next step.
css = Regex.Replace(css, @"rgb\s*\x28((?:25[0-5])|(?:2[0-4]\d)|(?:[01]?\d?\d))\s*,\s*((?:25[0-5])|(?:2[0-4]\d)|(?:[01]?\d?\d))\s*,\s*((?:25[0-5])|(?:2[0-4]\d)|(?:[01]?\d?\d))\s*\x29", rgbDelegate);
css = Regex.Replace(css, @"(?<![\x22\x27=]\s*)\#(?:([0-9A-F])\1)(?:([0-9A-F])\2)(?:([0-9A-F])\3)", "#$1$2$3", RegexOptions.IgnoreCase);
// Replace hex color code with named value if the name is shorter
// [^;{}]* keeps the lookbehind inside the current declaration (whitespace is already normalized onto one line)
css = Regex.Replace(css, @"(?<=color\s*:[^;{}]*)\#(?<hex>f00)\b", "red", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(?<=color\s*:[^;{}]*)\#(?<hex>[0-9a-f]{6})\b", shortColorNameDelegate, RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(?<=color\s*:\s*)\b(Black|Fuchsia|LightSlateGr[ae]y|Magenta|White|Yellow)\b", shortColorHexDelegate, RegexOptions.IgnoreCase);
// Remove empty rules.
css = Regex.Replace(css, @"[^}]+{;}", "");
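//Remove semicolon of last property
css = Regex.Replace(css, ";(})", "$1");
// Put back the Box model hack hidden near the top of this method
css = css.Replace("___PSEUDOCLASSBMH___", "\"\\\"}\\\"\"");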
if (columnWidth > 0)
{
css = BreakLines(css, columnWidth);
}
return css;
}
private static string RemoveCommentBlocks(string input)
{
int startIndex = 0;
int endIndex = 0;
bool iemac = false;
startIndex = input.IndexOf(@"/*", startIndex);
while (startIndex >= 0)
{
endIndex = input.IndexOf(@"*/", startIndex + 2);
if (endIndex >= startIndex + 2)
{
if (input[endIndex - 1] == '\\')
{
startIndex = endIndex + 2;
iemac = true;
}
else if (iemac)
{
startIndex = endIndex + 2;
iemac = false;
}
else
{
input = input.Remove(startIndex, endIndex + 2 - startIndex);
}
}
startIndex = input.IndexOf(@"/*", startIndex);
}
return input;
}
private static String RGBMatchHandler(Match m)
{
int val = 0;
StringBuilder hexcolor = new StringBuilder("#");
for (int index = 1; index <= 3; index += 1)
{
val = Int32.Parse(m.Groups[index].Value);
hexcolor.Append(val.ToString("x2"));
}
return hexcolor.ToString();
}
private static string BreakLines(string css, int columnWidth)
{
int i = 0;
int start = 0;
StringBuilder sb = new StringBuilder(css);
while (i < sb.Length)
{
char c = sb[i++];
if (c == '}' && i - start > columnWidth)
{
sb.Insert(i, '\n');
start = i;
}
}
return sb.ToString();
}
private static string ReplaceNonEmpty(string inputText, string replacementText)
{
if (replacementText.Trim() != string.Empty)
{
inputText = string.Format(" {0}", replacementText);
}
return inputText;
}
private static string ShortColorNameMatchHandler(Match m)
{
// This function replaces hex color values with named colors if the name is shorter than the hex code
string returnValue = m.Value;
// The regex is case-insensitive, but the table keys are lower case
string hex = m.Groups["hex"].Value.ToLowerInvariant();
if (shortColorNames.ContainsKey(hex))
{
returnValue = shortColorNames[hex].ToString();
}
return returnValue;
}
private static string ShortColorHexMatchHandler(Match m)
{
//This function replaces named values with their shorter hex equivalent
return shortHexColors[m.Value.ToLowerInvariant()].ToString();
}
private static void createHashTable()
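{
// Start from empty tables: Minify() calls this on every run, and Hashtable.Add throws on a duplicate key
shortColorNames.Clear();
shortHexColors.Clear();
//Color names shorter than hex notation. Except for red.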
shortColorNames.Add("F0FFFF".ToLower(), "Azure".ToLower());
shortColorNames.Add("F5F5DC".ToLower(), "Beige".ToLower());
shortColorNames.Add("FFE4C4".ToLower(), "Bisque".ToLower());
shortColorNames.Add("A52A2A".ToLower(), "Brown".ToLower());
shortColorNames.Add("FF7F50".ToLower(), "Coral".ToLower());
shortColorNames.Add("FFD700".ToLower(), "Gold".ToLower());
shortColorNames.Add("808080".ToLower(), "Grey".ToLower());
shortColorNames.Add("008000".ToLower(), "Green".ToLower());
shortColorNames.Add("4B0082".ToLower(), "Indigo".ToLower());
shortColorNames.Add("FFFFF0".ToLower(), "Ivory".ToLower());
shortColorNames.Add("F0E68C".ToLower(), "Khaki".ToLower());
shortColorNames.Add("FAF0E6".ToLower(), "Linen".ToLower());
shortColorNames.Add("800000".ToLower(), "Maroon".ToLower());
shortColorNames.Add("000080".ToLower(), "Navy".ToLower());
shortColorNames.Add("808000".ToLower(), "Olive".ToLower());
shortColorNames.Add("FFA500".ToLower(), "Orange".ToLower());
shortColorNames.Add("DA70D6".ToLower(), "Orchid".ToLower());
shortColorNames.Add("CD853F".ToLower(), "Peru".ToLower());
shortColorNames.Add("FFC0CB".ToLower(), "Pink".ToLower());
shortColorNames.Add("DDA0DD".ToLower(), "Plum".ToLower());
shortColorNames.Add("800080".ToLower(), "Purple".ToLower());
shortColorNames.Add("FA8072".ToLower(), "Salmon".ToLower());
shortColorNames.Add("A0522D".ToLower(), "Sienna".ToLower());
shortColorNames.Add("C0C0C0".ToLower(), "Silver".ToLower());
shortColorNames.Add("FFFAFA".ToLower(), "Snow".ToLower());
shortColorNames.Add("D2B48C".ToLower(), "Tan".ToLower());
shortColorNames.Add("008080".ToLower(), "Teal".ToLower());
shortColorNames.Add("FF6347".ToLower(), "Tomato".ToLower());
shortColorNames.Add("EE82EE".ToLower(), "Violet".ToLower());
shortColorNames.Add("F5DEB3".ToLower(), "Wheat".ToLower());
// Hex notation shorter than named value
shortHexColors.Add("black", "#000");
shortHexColors.Add("fuchsia", "#f0f");
shortHexColors.Add("lightslategray", "#789");
shortHexColors.Add("lightslategrey", "#789");
shortHexColors.Add("magenta", "#f0f");
shortHexColors.Add("white", "#fff");
shortHexColors.Add("yellow", "#ff0");
}
private static string ShortHandAllProperties(string css)
{
/*
* This function searches for rules that specify all the individual sub-properties of a property type
* and reduces them to a single property using shorthand notation
*/
Regex reCSSBlock = new Regex("{[^{}]*}");
Regex reTRBL1 = new Regex(@"(?<fullProperty>(?:(?<property>padding)-(?<position>top|right|bottom|left)))\s*:\s*(?<unit>[\w.]+);?", RegexOptions.IgnoreCase);
Regex reTRBL2 = new Regex(@"(?<fullProperty>(?:(?<property>margin)-(?<position>top|right|bottom|left)))\s*:\s*(?<unit>[\w.]+);?", RegexOptions.IgnoreCase);
Regex reTRBL3 = new Regex(@"(?<fullProperty>(?<property>border)-(?<position>top|right|bottom|left)(?<property2>-(?:color)))\s*:\s*(?<unit>[#\w.]+);?", RegexOptions.IgnoreCase);
Regex reTRBL4 = new Regex(@"(?<fullProperty>(?<property>border)-(?<position>top|right|bottom|left)(?<property2>-(?:style)))\s*:\s*(?<unit>none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset);?", RegexOptions.IgnoreCase);
Regex reTRBL5 = new Regex(@"(?<fullProperty>(?<property>border)-(?<position>top|right|bottom|left)(?<property2>-(?:width)))\s*:\s*(?<unit>[\w.]+);?", RegexOptions.IgnoreCase);
Regex reListStyle = new Regex(@"list-style-(?<style>type|image|position)\s*:\s*(?<unit>[^};]+);?", RegexOptions.IgnoreCase);
Regex reFont = new Regex(@"font-(?:(?:(?<fontProperty>family\b)\s*:\s*(?<fontPropertyValue>(?:\b[a-zA-Z]+(-[a-zA-Z]+)?\b|\x22[^\x22]+\x22)(?:\s*,\s*(?:\b[a-zA-Z]+(-[a-zA-Z]+)?\b|\x22[^\x22]+\x22))*)\b)|
(?:(?<fontProperty>style\b)\s*:\s*(?<fontPropertyValue>normal|italic|oblique|inherit))|
(?:(?<fontProperty>variant\b)\s*:\s*(?<fontPropertyValue>normal|small-caps|inherit))|
(?:(?<fontProperty>weight\b)\s*:\s*(?<fontPropertyValue>normal|bold|(?:bold|light)er|[1-9]00|inherit))|
(?:(?<fontProperty>size\b)\s*:\s*(?<fontPropertyValue>(?:(?:xx?-)?(?:small|large))|medium|(?:\d*\.?\d+(?:%|(p(?:[xct])|(?:[cem])m|in|ex))\b)|inherit|\b0\b)))\s*;?", (RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace));
Regex reBackGround = new Regex(@"background-(?:
(?:(?<property>color)\s*:\s*(?<unit>transparent|inherit|(?:(?:\#(?:[0-9A-F]{3}){1,2})|[^\s;}]+)))|
(?:(?<property>image)\s*:\s*(?<unit>none|inherit|(?:url\s*\([^()]+\))))|
(?:(?<property>repeat)\s*:\s*(?<unit>no-repeat|inherit|repeat(?:-[xy])?))|
(?:(?<property>attachment)\s*:\s*(?<unit>scroll|inherit|fixed))|
(?:(?<property>position)\s*:\s*(?<unit>((?<horizontal>left | center | right|(?:0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex)))))\s+(?<vertical>top | center | bottom |(?:0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))))|
((?<vertical>top | center | bottom )\s+(?<horizontal>left | center | right ))|
((?<horizontal>left | center | right )|(?<vertical>top | center | bottom ))))
);?", (RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture));
MatchCollection mcBlocks = reCSSBlock.Matches(css);
foreach (Match mBlock in mcBlocks)
{
string strBlock = mBlock.Value;
HasAllPositions(reTRBL1, ref strBlock);
HasAllPositions(reTRBL2, ref strBlock);
HasAllPositions(reTRBL3, ref strBlock);
HasAllPositions(reTRBL4, ref strBlock);
HasAllPositions(reTRBL5, ref strBlock);
HasAllListStyle(reListStyle, ref strBlock);
HasAllFontProperties(reFont, ref strBlock);
HasAllBackGroundProperties(reBackGround, ref strBlock);
css = css.Replace(mBlock.Value, strBlock);
}
return css;
}
private static void HasAllBackGroundProperties(Regex re, ref string CSSText)
{
{
MatchCollection mcProperySet = re.Matches(CSSText);
int z = 5;
if (mcProperySet.Count == z)
{
int y = 0;
for (int x = 0; x < z; x = x + 1)
{
switch (mcProperySet[x].Groups["property"].Value)
{
case "color":
y = y + 1;
break;
case "image":
y = y + 2;
break;
case "repeat":
y = y + 4;
break;
case "attachment":
y = y + 8;
break;
case "position":
y = y + 16;
break;
}
}
if (y == 31)
{
CSSText = ShortHandBackGroundReplaceV2(mcProperySet, re, CSSText);
}
}
}
}
private static void HasAllFontProperties(Regex re, ref string CSSText)
{
{
MatchCollection mcProperySet = re.Matches(CSSText);
int z = 5;
if (mcProperySet.Count == z)
{
int y = 0;
for (int x = 0; x < z; x = x + 1)
{
switch (mcProperySet[x].Groups["fontProperty"].Value)
{
case "style":
y = y + 1;
break;
case "variant":
y = y + 2;
break;
case "weight":
y = y + 4;
break;
case "size":
y = y + 8;
break;
case "family":
y = y + 16;
break;
}
}
if (y == 31)
{
CSSText = ShortHandFontReplaceV2(mcProperySet, re, CSSText);
}
}
}
}
private static void HasAllListStyle(Regex re, ref string CSSText)
{
{
int z = 3;
MatchCollection mcProperySet = re.Matches(CSSText);
if (mcProperySet.Count == z)
{
int y = 0;
for (int x = 0; x < z; x = x + 1)
{
switch (mcProperySet[x].Groups["style"].Value)
{
case "type":
y = y + 1;
break;
case "image":
y = y + 2;
break;
case "position":
y = y + 4;
break;
}
}
if (y == 7)
{
CSSText = ShortHandListReplaceV2(mcProperySet, re, CSSText);
}
}
}
}
private static void HasAllPositions(Regex re, ref string CSSText)
{
{
MatchCollection mcProperySet = re.Matches(CSSText);
if (mcProperySet.Count == 4)
{
int y = 0;
for (int x = 0; x < 4; x = x + 1)
{
switch (mcProperySet[x].Groups["position"].Value)
{
case "top":
y = y + 1;
break;
case "right":
y = y + 2;
break;
case "bottom":
y = y + 4;
break;
case "left":
y = y + 8;
break;
}
}
if (y == 15)
{
CSSText = ShortHandReplaceV2(mcProperySet, re, CSSText);
}
}
}
}
private static string ShortHandFontReplaceV2(MatchCollection mcProperySet, Regex re, string InputText)
{
/*
* This Function replaces the individual font properties with a single entry
* */
string strFamily, strStyle, strVariant, strWeight, strSize;
Regex reLineHeight = new Regex(@"line-height\s*:\s*((?:\d*\.?\d+(?:%|(p(?:[xct])|(?:[cem])m|in|ex)\b)?)|normal|inherit);?", RegexOptions.IgnoreCase);
strFamily = string.Empty;
strStyle = string.Empty;
strVariant = string.Empty;
strWeight = string.Empty;
strSize = string.Empty;
string strStyle_Variant_Weight = string.Empty;
foreach (Match mProperty in mcProperySet)
{
switch (mProperty.Groups["fontProperty"].Value)
{
case "family":
strFamily = string.Format(" {0}", mProperty.Groups["fontPropertyValue"].Value);
break;
case "size":
if (reLineHeight.IsMatch(InputText))
{
Match m = reLineHeight.Match(InputText);
if (m.Groups[1].Value != "normal")
{
strSize = String.Format("/{0}", m.Groups[1].Value);
}
InputText = reLineHeight.Replace(InputText, string.Empty);
}
strSize = string.Format(" {0}{1}", mProperty.Groups["fontPropertyValue"].Value, strSize);
// font-size is mandatory in the font shorthand, so it is kept even when it is "medium"
break;
case "style":
case "variant":
case "weight":
if (mProperty.Groups["fontPropertyValue"].Value != "normal")
{
strStyle_Variant_Weight += string.Format(" {0}", mProperty.Groups["fontPropertyValue"].Value);
} break;
}
}
string strShortcut;
string strProperties = string.Format("{0}{1}{2}{3}{4};", strStyle_Variant_Weight, strVariant, strWeight, strSize, strFamily);
strShortcut = string.Format("font:{0}", strProperties.Trim());
string strNewBlock = re.Replace(InputText, "");
strNewBlock = strNewBlock.Insert(1, strShortcut);
return strNewBlock;
}
private static string ShortHandBackGroundReplaceV2(MatchCollection mcProperySet, Regex re, string InputText)
{
/*
* This Function replaces the individual background properties with a single entry
* */
string strColor, strImage, strRepeat, strAttachment, strPosition;
strColor = string.Empty;
strImage = string.Empty;
strRepeat = string.Empty;
strAttachment = string.Empty;
strPosition = string.Empty;
foreach (Match mProperty in mcProperySet)
{
switch (mProperty.Groups["property"].Value)
{
case "color":
if (mProperty.Groups["unit"].Value != "transparent")
{
strColor = string.Format(" {0}", mProperty.Groups["unit"].Value);
}
break;
case "image":
if (mProperty.Groups["unit"].Value != "none")
{
strImage = string.Format(" {0}", mProperty.Groups["unit"].Value);
}
break;
case "repeat":
if (mProperty.Groups["unit"].Value != "repeat")
{
strRepeat = string.Format(" {0}", mProperty.Groups["unit"].Value);
} break;
case "attachment":
if (mProperty.Groups["unit"].Value != "scroll")
{
strAttachment = string.Format(" {0}", mProperty.Groups["unit"].Value);
}
break;
case "position":
if (mProperty.Groups["unit"].Value != "0% 0%")
{
strPosition = string.Format(" {0}", mProperty.Groups["unit"].Value);
}
break;
}
}
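string strProperties = string.Format("{0}{1}{2}{3}{4}", strColor, strImage, strRepeat, strAttachment, strPosition).Trim();
// If every value was a default, strProperties is empty and "background:;" would be invalid;
// "background:none" resets all the sub-properties to those same initial values
if (strProperties.Length == 0)
{
strProperties = "none";
}
string strShortcut = string.Format("background:{0};", strProperties);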
string strNewBlock = re.Replace(InputText, "");
strNewBlock = strNewBlock.Insert(1, strShortcut);
return strNewBlock;
}
private static string ShortHandReplaceV2(MatchCollection mcProperySet, Regex reTRBL1, string InputText)
{
// Replace method for regexes used in ShortHand property method for properties with top, right, bottom and left sub properties.
string strTop, strRight, strBottom, strLeft;
strTop = string.Empty;
strRight = string.Empty;
strBottom = string.Empty;
strLeft = string.Empty;
string strProperty;
strProperty = string.Format("{0}{1}", mcProperySet[0].Groups["property"].Value, mcProperySet[0].Groups["property2"].Value);
foreach (Match mProperty in mcProperySet)
{
switch (mProperty.Groups["position"].Value)
{
case "top":
strTop = mProperty.Groups["unit"].Value;
break;
case "right":
strRight = mProperty.Groups["unit"].Value;
break;
case "bottom":
strBottom = mProperty.Groups["unit"].Value;
break;
case "left":
strLeft = mProperty.Groups["unit"].Value;
break;
}
}
string strShortcut = string.Format("{0}:{1} {2} {3} {4};", strProperty, strTop, strRight, strBottom, strLeft);
string strNewBlock = reTRBL1.Replace(InputText, "");
strNewBlock = strNewBlock.Insert(1, strShortcut);
return strNewBlock;
}
private static string ShortHandListReplaceV2(MatchCollection mcProperySet, Regex re, string InputText)
{
/*
* This Function replaces the individual list properties with a single entry
* */
string strType, strPosition, strImage;
strType = string.Empty;
strPosition = string.Empty;
strImage = string.Empty;
foreach (Match mProperty in mcProperySet)
{
switch (mProperty.Groups["style"].Value)
{
case "type":
if (mProperty.Groups["unit"].Value != "disc")
{
strType = mProperty.Groups["unit"].Value;
}
break;
case "position":
if (mProperty.Groups["unit"].Value != "outside")
{
strPosition = string.Format(" {0}", mProperty.Groups["unit"].Value);
}
break;
case "image":
if (mProperty.Groups["unit"].Value != "none")
{
strImage = string.Format(" {0}", mProperty.Groups["unit"].Value);
}
break;
}
}
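string strProperties = string.Format("{0}{1}{2}", strType, strPosition, strImage).Trim();
// If every value was a default, emit "disc" rather than the invalid "list-style:;"
if (strProperties.Length == 0)
{
strProperties = "disc";
}
string strShortcut = string.Format("list-style:{0};", strProperties);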
string strNewBlock = re.Replace(InputText, "");
strNewBlock = strNewBlock.Insert(1, strShortcut);
return strNewBlock;
}
}
}
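The parameter-reduction regexes in Minify encode the CSS rule for top/right/bottom/left values: with three values the left side copies the right, with two the bottom also copies the top, and with one value all four sides are the same. Stated as plain C# instead of regex, the decision looks roughly like this. BoxShorthand.Reduce is a hypothetical helper, and it assumes the four values are already normalized (same case, no extra whitespace).

```csharp
public static class BoxShorthand
{
    // Shortens a top/right/bottom/left value list using CSS's expansion rules:
    // 3 values mean left = right, 2 values also mean bottom = top,
    // and 1 value means all four sides are equal.
    public static string Reduce(string top, string right, string bottom, string left)
    {
        if (left != right)
        {
            return top + " " + right + " " + bottom + " " + left; // nothing can be dropped
        }
        if (bottom != top)
        {
            return top + " " + right + " " + bottom;              // left is implied by right
        }
        if (right != top)
        {
            return top + " " + right;                             // bottom and left are implied
        }
        return top;                                               // all four sides are equal
    }
}
```

The regex version applies the same checks directly to the minified text, which is also why its single-value rule skips background-position: those values are horizontal/vertical, not top/right/bottom/left.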
OK, these regexes were discussed in the previous post; this is mostly just their application.
This is a C# 2.0 enhancement of a C# port of YUI Compressor's CSS minification code
Since I was doing this in C#, I took full advantage of its regex engine, namely using lookbehinds and delegates for some replaces.
Almost all the regexes after the "New test" comment are new or modified versions of the regexes in the ported code. There is also one new and two modified expressions before that comment. One of those modifications is just a change in writing style; the other replaces some code with a regex replace without (hopefully) changing functionality. The new regex replacements are, of course, the new compression enhancements.
There are also a couple of new regexes not mentioned in the previous post that match some of the color values and replace them with an equivalent but more concisely written value. The replacement that produces the color "red" is a straight replace, but the other colors require some code evaluation and use delegates.
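A trimmed-down, standalone version of the delegate approach, covering just the rgb() to hex to three-digit hex chain, might look like the following. ColorShortener is a hypothetical name, and unlike the listing it adds a (?![0-9a-f]) lookahead so longer runs of hex digits are left alone; treat it as a sketch, not a drop-in replacement.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class ColorShortener
{
    private static readonly Regex Rgb = new Regex(
        @"rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)", RegexOptions.IgnoreCase);

    // #aabbcc where each channel repeats its digit; the lookahead rejects longer hex runs
    private static readonly Regex DoubledHex = new Regex(
        @"#([0-9a-f])\1([0-9a-f])\2([0-9a-f])\3(?![0-9a-f])", RegexOptions.IgnoreCase);

    // MatchEvaluator: turns rgb(r,g,b) into #rrggbb; leaves out-of-range values untouched
    private static string RgbToHex(Match m)
    {
        StringBuilder hex = new StringBuilder("#");
        for (int i = 1; i <= 3; i++)
        {
            int channel = int.Parse(m.Groups[i].Value, CultureInfo.InvariantCulture);
            if (channel > 255)
            {
                return m.Value;
            }
            hex.Append(channel.ToString("x2"));
        }
        return hex.ToString();
    }

    public static string Shorten(string css)
    {
        css = Rgb.Replace(css, new MatchEvaluator(RgbToHex));
        // #336699 becomes #369 when each channel's two hex digits match
        return DoubledHex.Replace(css, "#$1$2$3");
    }
}
```

Running the rgb() step first is what makes the second step pay off: rgb(51,102,153) becomes #336699, which then shortens to #369.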
I've done some very limited testing, but as I mentioned in the previous post, most of the CSS I've written doesn't have some of the new things I was searching for. I could add them for a test (which I did), but that won't catch any problems they may cause in an actual CSS application, since I wasn't really using the test values. So the source code is now available for beta testing. Test early and often before committing to use it. I'm willing to fix any minor bugs for things I may have overlooked, but if a particular replace is problematic, it's easy enough to comment out the offender and use the rest.
And as was mentioned in the comments on the previous post, any generated content that looks like CSS may get stepped on, so be aware of that.
Also, all licenses for previous versions still apply.
UPDATE 2008-04-27
After a little more testing I discovered that one of the replaces I was doing can alter how the CSS is processed. So I have crossed out the functions and the function call. I've come up with a safer replacement, though it is less likely to apply.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
namespace CSSMinify
{
class CSSMinify
{
public static Hashtable shortColorNames = new Hashtable();
public static Hashtable shortHexColors = new Hashtable();
public static string Minify(string css)
{
return Minify(css, 0);
}
public static string Minify(string css, int columnWidth)
{
// BSD License http://developer.yahoo.net/yui/license.txt
// New css tests and regexes by Michael Ash
createHashTable();
MatchEvaluator rgbDelegate = new MatchEvaluator(RGBMatchHandler);
MatchEvaluator shortColorNameDelegate = new MatchEvaluator(ShortColorNameMatchHandler);
MatchEvaluator shortColorHexDelegate = new MatchEvaluator(ShortColorHexMatchHandler);
css = RemoveCommentBlocks(css);
css = Regex.Replace(css, @"\s+", " "); //Normalize whitespace
css = Regex.Replace(css, @"\x22\x5C\x22}\x5C\x22\x22", "___PSEUDOCLASSBMH___"); //hide Box model hack
/* Remove the spaces before the things that should not have spaces before them.
But, be careful not to turn "p :link {...}" into "p:link{...}"
*/
css = Regex.Replace(css, @"(?#no preceding space needed)\s+((?:[!{};>+()\],])|(?<={[^{}]*):(?=[^}]*}))", "$1");
css = Regex.Replace(css, @"([!{}:;>+([,])\s+", "$1"); // Remove the spaces after the things that should not have spaces after them.
css = Regex.Replace(css, @"([^;}])}", "$1;}"); // Add the semicolon where it's missing.
css = Regex.Replace(css, @"(\d+)\.0+(p(?:[xct])|(?:[cem])m|%|in|ex)\b", "$1$2"); // Remove .0 from size units x.0em becomes xem
css = Regex.Replace(css, @"([\s:])(0)(px|em|%|in|cm|mm|pc|pt|ex)\b", "$1$2"); // Remove unit from zero
//New test
css = ShortHandProperty(css);
//css = Regex.Replace(css, @":(\s*0){2,4}\s*;", ":0;"); // if all parameters zero just use 1 parameter
// if all 4 parameters the same unit make 1 parameter
css = Regex.Replace(css, @":\s*(0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))(\s+\1){1,3};", ":$1;", RegexOptions.IgnoreCase);
// if has 4 parameters and top unit = bottom unit and right unit = left unit make 2 parameters
css = Regex.Replace(css, @":\s*((0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))\s+(0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex)))))\s+\2\s+\3;", ":$1;", RegexOptions.IgnoreCase);
// if has 4 parameters and top unit != bottom unit and right unit = left unit make 3 parameters
css = Regex.Replace(css, @":\s*((?:(?:0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))\s+)?(0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex))))\s+(?:0|(?:(?:\d*\.?\d+(?:p(?:[xct])|(?:[cem])m|%|in|ex)))))\s+\2;", ":$1;", RegexOptions.IgnoreCase);
//// if has 3 parameters and top unit = bottom unit make 2 parameters
//css = Regex.Replace(css, @":\s*((0|(?:(?:\d?\.?\d(?:p(?:[xct])|(?:[cem])m|%|in|ex))))\s+(?:0|(?:(?:\d?\.?\d(?:p(?:[xct])|(?:[cem])m|%|in|ex)))))\s+\2;", ":$1;", RegexOptions.IgnoreCase);
css = Regex.Replace(css,"background-position:0;", "background-position:0 0;");
css = Regex.Replace(css,@"(:|\s)0+\.(\d+)", "$1.$2");
// Outline-styles and Border-styles parameter reduction
css = Regex.Replace(css, @"(outline|border)-style\s*:\s*(none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)(?:\s+\2){1,3};", "$1-style:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-style\s*:\s*((none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)\s+(none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset))(?:\s+\3)(?:\s+\4);", "$1-style:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-style\s*:\s*((?:(?:none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)\s+)?(none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)\s+(?:none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset))(?:\s+\3);", "$1-style:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-style\s*:\s*((none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset)\s+(?:none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset))(?:\s+\3);", "$1-style:$2;", RegexOptions.IgnoreCase);
// Outline-color and Border-color parameter reduction
css = Regex.Replace(css, @"(outline|border)-color\s*:\s*((?:\#(?:[0-9A-F]{3}){1,2})|\S+)(?:\s+\2){1,3};", "$1-color:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-color\s*:\s*(((?:\#(?:[0-9A-F]{3}){1,2})|\S+)\s+((?:\#(?:[0-9A-F]{3}){1,2})|\S+))(?:\s+\3)(?:\s+\4);", "$1-color:$2;", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(outline|border)-color\s*:\s*((?:(?:(?:\#(?:[0-9A-F]{3}){1,2})|\S+)\s+)?((?:\#(?:[0-9A-F]{3}){1,2})|\S+)\s+(?:(?:\#(?:[0-9A-F]{3}){1,2})|\S+))(?:\s+\3);", "$1-color:$2;", RegexOptions.IgnoreCase);
// Shorten colors from rgb(51,102,153) to #336699
// This makes it more likely that it'll get further compressed in the next step.
css = Regex.Replace(css,@"rgb\s*\x28((?:25[0-5])|(?:2[0-4]\d)|(?:[01]?\d?\d))\s*,\s*((?:25[0-5])|(?:2[0-4]\d)|(?:[01]?\d?\d))\s*,\s*((?:25[0-5])|(?:2[0-4]\d)|(?:[01]?\d?\d))\s*\x29", rgbDelegate);
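// #aabbcc -> #abc; (?![\w-]) leaves longer hex runs such as #aabbccdd and identifiers such as #aabbcc-menu untouched.
css = Regex.Replace(css, @"(?<![\x22\x27=]\s*)\#(?:([0-9A-F])\1)(?:([0-9A-F])\2)(?:([0-9A-F])\3)(?![\w-])", "#$1$2$3", RegexOptions.IgnoreCase);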
// Replace a hex color code with its named value when the name is shorter
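// [^;}]* keeps the lookbehind inside the current declaration instead of reaching back across earlier rules.
css = Regex.Replace(css, @"(?<=color\s*:[^;}]*)\#(?<hex>f00)\b", "red", RegexOptions.IgnoreCase);
css = Regex.Replace(css, @"(?<=color\s*:[^;}]*)\#(?<hex>[0-9a-f]{6})\b", shortColorNameDelegate, RegexOptions.IgnoreCase);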
css = Regex.Replace(css, @"(?<=color\s*:\s*)\b(Black|Fuchsia|LightSlateGr[ae]y|Magenta|White|Yellow)\b", shortColorHexDelegate,RegexOptions.IgnoreCase);
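// Remove empty rules, with or without a stray semicolon. [^{}]+ stops at the enclosing brace,
// so an @media block that contains the empty rule is not removed along with it.
css = Regex.Replace(css, @"[^{}]+\{;?\}", "");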
//Remove semicolon of last property
css = Regex.Replace(css, ";(})", "$1");
if (columnWidth > 0)
{
css = BreakLines(css, columnWidth);
}
return css;
}
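private static string RemoveCommentBlocks(string input)
{
// Strips /* ... */ comments, but keeps the IE5/Mac backslash hack: a comment ending in \*/
// is kept together with the next comment, because IE5/Mac ignores the rules between them.
int startIndex = 0;
int endIndex = 0;
bool iemac = false;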
startIndex = input.IndexOf(@"/*", startIndex);
while (startIndex >= 0)
{
endIndex = input.IndexOf(@"*/", startIndex + 2);
if (endIndex >= startIndex + 2)
{
if (input[endIndex - 1] == '\\')
{
startIndex = endIndex + 2;
iemac = true;
}
else if (iemac)
{
startIndex = endIndex + 2;
iemac = false;
}
else
{
input = input.Remove(startIndex, endIndex + 2 - startIndex);
}
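}
else
{
// Unterminated comment: without this break, IndexOf below finds the same
// "/*" again and the loop never ends.
break;
}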
startIndex = input.IndexOf(@"/*", startIndex);
}
return input;
}
private static String RGBMatchHandler(Match m)
{
// Builds a #rrggbb string from the three rgb() components captured by the regex.
int val = 0;
StringBuilder hexcolor = new StringBuilder("#");
for(int index=1; index <= 3; index += 1)
{
val = Int32.Parse(m.Groups[index].Value);
hexcolor.Append(val.ToString("x2"));
}
return hexcolor.ToString();
}
private static string BreakLines(string css, int columnWidth)
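{
// Inserts a line break after a closing brace once the current line is longer than columnWidth characters.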
int i = 0;
int start = 0;
StringBuilder sb = new StringBuilder(css);
while (i < sb.Length)
{
char c = sb[i++];
if (c == '}' && i - start > columnWidth)
{
sb.Insert(i, '\n');
start = i;
}
}
return sb.ToString();
}
private static string ShortHandProperty(string css)
{
/*
* This function searches for properties specifying at least two of the top, right, bottom or left
* box-model positions and tries to reduce them to a single property using shorthand notation.
*/
Regex reCSSBlock = new Regex(@"\{[^{}]*\}");
// [\w.%-]+ captures whole padding and margin values such as 10% or -5px.
Regex reTRBL1 = new Regex(@"(?<fullProperty>(?:(?<property>padding)-(?<position>top|right|bottom|left)))\s*:\s*(?<unit>[\w.%-]+);?", RegexOptions.IgnoreCase);
Regex reTRBL2 = new Regex(@"(?<fullProperty>(?:(?<property>margin)-(?<position>top|right|bottom|left)))\s*:\s*(?<unit>[\w.%-]+);?", RegexOptions.IgnoreCase);
Regex reTRBL3 = new Regex(@"(?<fullProperty>(?<property>border)-(?<position>top|right|bottom|left)(?<property2>-(?:color)))\s*:\s*(?<unit>[#\w.]+);?", RegexOptions.IgnoreCase);
Regex reTRBL4 = new Regex(@"(?<fullProperty>(?<property>border)-(?<position>top|right|bottom|left)(?<property2>-(?:style)))\s*:\s*(?<unit>none|hidden|d(?:otted|ashed|ouble)|solid|groove|ridge|inset|outset);?", RegexOptions.IgnoreCase);
Regex reTRBL5 = new Regex(@"(?<fullProperty>(?<property>border)-(?<position>top|right|bottom|left)(?<property2>-(?:width)))\s*:\s*(?<unit>[\w.]+);?", RegexOptions.IgnoreCase);
// Padding, margin, border-color, border-style and border-width are tried in turn on each block.
Regex[] reShortHands = { reTRBL1, reTRBL2, reTRBL3, reTRBL4, reTRBL5 };
MatchCollection mcBlocks = reCSSBlock.Matches(css);
foreach (Match mBlock in mcBlocks)
{
string strBlock = mBlock.Value;
foreach (Regex reTRBL in reShortHands)
{
MatchCollection mcPropertySet = reTRBL.Matches(strBlock);
if (mcPropertySet.Count > 1)
{
strBlock = ShortHandReplace(mcPropertySet, reTRBL, strBlock);
}
}
css = css.Replace(mBlock.Value, strBlock);
}
return css;
}
private static string ShortHandReplace(MatchCollection mcPropertySet, Regex reTRBL, string inputText)
{
// Replace method for the regexes used in the ShortHandProperty method.
// The block is only collapsed when top, right, bottom and left each appear exactly once.
// Filling a missing side with a default (0, none, inherit) would override values coming
// from other rules, and a repeated side would be lost when all matches are removed below.
string strProperty = (mcPropertySet[0].Groups["property"].Value + mcPropertySet[0].Groups["property2"].Value).ToLower();
string strTop = null, strRight = null, strBottom = null, strLeft = null;
foreach (Match mProperty in mcPropertySet)
{
string strUnit = mProperty.Groups["unit"].Value;
switch (mProperty.Groups["position"].Value.ToLower())
{
case "top":
if (strTop != null) return inputText;
strTop = strUnit;
break;
case "right":
if (strRight != null) return inputText;
strRight = strUnit;
break;
case "bottom":
if (strBottom != null) return inputText;
strBottom = strUnit;
break;
case "left":
if (strLeft != null) return inputText;
strLeft = strUnit;
break;
}
}
if (strTop == null || strRight == null || strBottom == null || strLeft == null)
{
return inputText;
}
// Shorthand order is top right bottom left.
string strShortcut = string.Format("{0}:{1} {2} {3} {4};", strProperty, strTop, strRight, strBottom, strLeft);
// Remove the longhand properties, then insert the shorthand just after the block's opening '{'.
string strNewBlock = reTRBL.Replace(inputText, "");
return strNewBlock.Insert(1, strShortcut);
}
private static string ShortColorNameMatchHandler(Match m)
{
// Replaces a hex color value with the color name when the name is shorter than the hex code.
// The regex is case-insensitive but the table keys are lowercase, so normalize before the lookup.
string hex = m.Groups["hex"].Value.ToLower();
string returnValue = m.Value;
if (shortColorNames.ContainsKey(hex))
{
returnValue = shortColorNames[hex].ToString();
}
return returnValue;
}
private static string ShortColorHexMatchHandler(Match m)
{
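// Table keys are lowercase; fall back to the original text if a name is missing from the table.
string name = m.Value.ToLower();
return shortHexColors.ContainsKey(name) ? shortHexColors[name].ToString() : m.Value;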
}
private static void createHashTable()
{
// Color names shorter than their hex notation. Red is not listed: #ff0000 is first shortened to #f00 and handled by its own regex.
shortColorNames.Add("F0FFFF".ToLower(), "Azure".ToLower());
shortColorNames.Add("F5F5DC".ToLower(), "Beige".ToLower());
shortColorNames.Add("FFE4C4".ToLower(), "Bisque".ToLower());
shortColorNames.Add("A52A2A".ToLower(), "Brown".ToLower());
shortColorNames.Add("FF7F50".ToLower(), "Coral".ToLower());
shortColorNames.Add("FFD700".ToLower(), "Gold".ToLower());
shortColorNames.Add("808080".ToLower(), "Grey".ToLower());
shortColorNames.Add("008000".ToLower(), "Green".ToLower());
shortColorNames.Add("4B0082".ToLower(), "Indigo".ToLower());
shortColorNames.Add("FFFFF0".ToLower(), "Ivory".ToLower());
shortColorNames.Add("F0E68C".ToLower(), "Khaki".ToLower());
shortColorNames.Add("FAF0E6".ToLower(), "Linen".ToLower());
shortColorNames.Add("800000".ToLower(), "Maroon".ToLower());
shortColorNames.Add("000080".ToLower(), "Navy".ToLower());
shortColorNames.Add("808000".ToLower(), "Olive".ToLower());
shortColorNames.Add("FFA500".ToLower(), "Orange".ToLower());
shortColorNames.Add("DA70D6".ToLower(), "Orchid".ToLower());
shortColorNames.Add("CD853F".ToLower(), "Peru".ToLower());
shortColorNames.Add("FFC0CB".ToLower(), "Pink".ToLower());
shortColorNames.Add("DDA0DD".ToLower(), "Plum".ToLower());
shortColorNames.Add("800080".ToLower(), "Purple".ToLower());
shortColorNames.Add("FA8072".ToLower(), "Salmon".ToLower());
shortColorNames.Add("A0522D".ToLower(), "Sienna".ToLower());
shortColorNames.Add("C0C0C0".ToLower(), "Silver".ToLower());
shortColorNames.Add("FFFAFA".ToLower(), "Snow".ToLower());
shortColorNames.Add("D2B48C".ToLower(), "Tan".ToLower());
shortColorNames.Add("008080".ToLower(), "Teal".ToLower());
shortColorNames.Add("FF6347".ToLower(), "Tomato".ToLower());
shortColorNames.Add("EE82EE".ToLower(), "Violet".ToLower());
shortColorNames.Add("F5DEB3".ToLower(), "Wheat".ToLower());
// Hex notation shorter than named value
shortHexColors.Add("black", "#000");
shortHexColors.Add("fuchsia", "#f0f");
shortHexColors.Add("lightslategray", "#789");
shortHexColors.Add("lightslategrey", "#789");
shortHexColors.Add("magenta", "#f0f");
shortHexColors.Add("white", "#fff");
shortHexColors.Add("yellow", "#ff0");
}
}
}
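The two color steps in the compressor (rgb() to #rrggbb via `RGBMatchHandler`, then #aabbcc to #abc) are easier to see in isolation. Here's a minimal standalone sketch that reuses the same 0-255 alternation. `ColorSketch`, `RgbToHex` and `ShortenHex` are hypothetical names, not part of the compressor, and the sketch leaves out the lookbehind that skips quoted attribute values.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: a standalone sketch of the compressor's two color
// shortening steps, not part of the compressor itself.
public static class ColorSketch
{
    // One 0-255 component, same alternation as the compressor's rgb() pattern.
    private const string Component = @"(25[0-5]|2[0-4]\d|[01]?\d?\d)";

    private static readonly Regex RgbRegex = new Regex(
        @"rgb\s*\(\s*" + Component + @"\s*,\s*" + Component + @"\s*,\s*" + Component + @"\s*\)",
        RegexOptions.IgnoreCase);

    // Each channel must be a doubled digit; (?![\w-]) rejects longer hex runs.
    private static readonly Regex DoubledHexRegex = new Regex(
        @"#([0-9a-f])\1([0-9a-f])\2([0-9a-f])\3(?![\w-])",
        RegexOptions.IgnoreCase);

    // rgb(51,102,153) -> #336699
    public static string RgbToHex(string css)
    {
        return RgbRegex.Replace(css, m =>
        {
            var hex = new StringBuilder("#");
            for (int i = 1; i <= 3; i++)
            {
                // "x2" formats the value as two lowercase hex digits.
                hex.Append(int.Parse(m.Groups[i].Value).ToString("x2"));
            }
            return hex.ToString();
        });
    }

    // #336699 -> #369; values such as #123456 are left unchanged.
    public static string ShortenHex(string css)
    {
        return DoubledHexRegex.Replace(css, "#$1$2$3");
    }
}
```

Calling `RgbToHex` before `ShortenHex` follows the same order as the compressor, so `rgb(51,102,153)` becomes `#336699` and then `#369`.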
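Together, the outline/border-style and -color regexes in the compressor implement the CSS rule for shortening a top right bottom left list: drop left if it equals right; once left is gone, drop bottom if it equals top; once bottom is gone, drop right if it equals top. Written as a plain function on the tokens, the rule is easier to check by eye. This is a sketch, not the compressor's code: `BoxShorthand.Collapse` is a hypothetical name, and it assumes the property value has already been isolated and is whitespace-separated.

```csharp
using System;

// Hypothetical helper: collapses a top/right/bottom/left value list to the
// shortest equivalent CSS shorthand, using the same rule as the regex chain.
public static class BoxShorthand
{
    public static string Collapse(string value)
    {
        // Passing null as the separator array splits on any whitespace.
        string[] v = value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (v.Length == 0 || v.Length > 4)
        {
            return value; // not a 1-4 value list, leave it alone
        }

        // Expand to the full top right bottom left form.
        string top = v[0];
        string right = v.Length > 1 ? v[1] : top;
        string bottom = v.Length > 2 ? v[2] : top;
        string left = v.Length > 3 ? v[3] : right;

        // Drop values from the end while CSS would infer them anyway.
        if (!Same(left, right)) return string.Join(" ", top, right, bottom, left);
        if (!Same(bottom, top)) return string.Join(" ", top, right, bottom);
        if (!Same(right, top)) return string.Join(" ", top, right);
        return top;
    }

    // Case-insensitive, like the IgnoreCase backreferences in the regexes.
    private static bool Same(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, `Collapse("dotted solid dotted solid")` returns `dotted solid`, the same result the second border-style regex produces for that input.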