In my article about Getting Your Site to Play Nice with Search Engines and Social Networks, I discussed the importance of having canonical URLs for the pages on your site. When the same page is reachable at multiple URLs, its reputation with search engines is divided across those versions. While you can provide a <link rel="canonical" …> tag to declare the canonical URL, it’s also good practice to configure your site to redirect the other versions of a URL to the canonical version.
If you are using IIS as your web server, you can implement rules to redirect to canonical URLs with the IIS URL Rewrite Module 2.0. To use this module, download and install it on your web server. Once installed, you will see a URL Rewrite option in Internet Information Services (IIS) Manager when viewing the properties of your site.
The user interface for adding and modifying rules is very straightforward. When creating a new rule, you are presented with a variety of templates for common rewrite cases to start from. When a request is received, the rules are evaluated against the requested URL in the order in which they are defined, and every rule that matches is applied. You can adjust the order of the rules. You can also set a property on some rule types to indicate that processing should stop when the rule matches, so that the remaining rules are skipped.
Microsoft has documentation on Using URL Rewrite Module 2.0 and the invaluable URL Rewrite Module v2.0 Configuration Reference.
The user interface in IIS Manager saves the settings to the web.config file. I will now go through each of the rewrite rules implemented on the Highway North site, along with its settings as they appear in IIS Manager. At the end I will include the full code for the settings from the web.config file.
Matching patterns in rules can be specified either as regular expressions (ECMAScript syntax, as used by JavaScript) or as wildcards.
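To make the ordering, stop-processing and pattern-syntax settings more concrete, here is a minimal sketch of where rules sit in a web.config file. The two rules, their names, and the health, old and new paths are hypothetical and are not part of our site’s configuration.

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Rules are evaluated from top to bottom. -->
        <!-- Hypothetical rule: leave /health alone and skip every later rule. -->
        <rule name="Example Skip Rule" stopProcessing="true">
          <match url="^health$" />
          <action type="None" />
        </rule>
        <!-- Hypothetical rule that uses wildcard syntax instead of the
             default regular expression syntax. {R:1} is the text matched by *. -->
        <rule name="Example Wildcard Rule" patternSyntax="Wildcard">
          <match url="old/*" />
          <action type="Redirect" url="/new/{R:1}" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Note that the match url pattern is tested against the path without its leading slash, which is why the patterns in this sketch do not start with one.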
Ignored Paths Rule
There are a few paths on our site that we don’t want to redirect from. For example, our existing Android applications expect certain URLs to be available and cannot handle being redirected from HTTP to HTTPS. The URL Rewrite Module makes available a {PATH_INFO} server variable, which contains the part of the URL that follows the protocol prefix and domain name, including the leading forward slash. It’s easy to match the ignored paths against the {PATH_INFO} server variable, set an action type of None (i.e. leave the URL alone), and then stop processing more rules.
Name | Ignored Paths Rule
Match URL | Matches the regular expression: (.*)
Conditions | {PATH_INFO} matches any of these regular expressions: /AppInfo/.* and /Mountie/Token
Action Type | None
Stop processing of subsequent rules | Yes
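The patterns above are not anchored, so a pattern such as /AppInfo/.* would also match a path like /Other/AppInfo/x. If that is not what you want, one possible variation is to anchor each pattern. This is only a sketch: it uses the two patterns listed above, and treating /Mountie/Token as an exact match is an assumption rather than something our site requires.

```xml
<rule name="Ignored Paths Rule" stopProcessing="true">
  <match url="(.*)" />
  <conditions logicalGrouping="MatchAny" trackAllCaptures="false">
    <!-- ^ anchors the pattern to the start of {PATH_INFO}. -->
    <add input="{PATH_INFO}" pattern="^/AppInfo/" />
    <!-- Assumption: only the exact path /Mountie/Token should be ignored. -->
    <add input="{PATH_INFO}" pattern="^/Mountie/Token$" />
  </conditions>
  <action type="None" />
</rule>
```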
HTTP to HTTPS Redirect Rule
We want to force users of our site to connect over HTTP Secure (HTTPS). The URL Rewrite Module makes available an {HTTPS} server variable, set to either ON or OFF, that tells us whether the request was made over HTTPS.
Name | HTTP to HTTPS Redirect Rule
Match URL | Matches the regular expression: (.*)
Conditions | {HTTPS} matches the regular expression: ^OFF$
Action Type | Redirect
Redirect URL | https://{HTTP_HOST}{PATH_INFO}
Append query string | Yes
Redirect Type | Permanent (301)
Stop processing of subsequent rules | No
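For reference, these settings map onto web.config roughly as follows. This sketch writes out redirectType and appendQueryString explicitly so that every row of the table has a visible counterpart. As far as I know, IIS Manager leaves out attributes that are still at their default values, so the file it generates may be shorter than this.

```xml
<rule name="HTTP to HTTPS Redirect Rule">
  <match url="(.*)" />
  <conditions>
    <!-- Only act on requests that did not arrive over HTTPS. -->
    <add input="{HTTPS}" pattern="^OFF$" />
  </conditions>
  <!-- redirectType and appendQueryString are spelled out here for clarity. -->
  <action type="Redirect" url="https://{HTTP_HOST}{PATH_INFO}"
          redirectType="Permanent" appendQueryString="true" />
</rule>
```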
Canonical Host Name Rule
Our site is accessible both via its bare domain name and via a ‘www’ name, i.e. https://highwaynorth.com and https://www.highwaynorth.com. We chose the ‘www’ version as our preferred version and redirect to it. We can look at the {HTTP_HOST} server variable to determine whether our canonical host name was used.
Name | Canonical Host Name Rule
Match URL | Matches the regular expression: (.*)
Conditions | {HTTP_HOST} does not match the regular expression: ^www\.highwaynorth\.com$
Action Type | Redirect
Redirect URL | https://www.highwaynorth.com{PATH_INFO}
Append query string | Yes
Redirect Type | Permanent (301)
Stop processing of subsequent rules | No
Remove Trailing Slash Rule
Whether or not your URLs end with a trailing slash is largely a matter of preference. However, search engines treat a URL with a trailing slash and the same URL without one as two separate URLs, so you should choose the form you prefer and make it canonical. You can add a rule to either add or remove a trailing slash. For our site, our preference is to remove the trailing slash. This can be achieved by matching URLs that end with a trailing slash and redirecting to the captured part before the slash ({R:1}). The conditions make the rule skip requests that map to an existing directory or file.
Name | Remove Trailing Slash Rule
Match URL | Matches the regular expression: (.*)/$
Conditions | {REQUEST_FILENAME} is not a directory; {REQUEST_FILENAME} is not a file
Action Type | Redirect
Redirect URL | {R:1}
Append query string | Yes
Redirect Type | Permanent (301)
Stop processing of subsequent rules | No
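If you prefer the opposite convention, a rule that adds the trailing slash could look something like the sketch below. It mirrors the rule above and is not something we use on our site, and the rule name is hypothetical.

```xml
<rule name="Add Trailing Slash Rule">
  <!-- Match URLs whose last character is not a slash. -->
  <match url="(.*[^/])$" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
    <!-- Skip requests that map to a real directory or file. -->
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
  </conditions>
  <!-- Redirect to the same URL with a slash appended. -->
  <action type="Redirect" url="{R:1}/" />
</rule>
```

You would use either this rule or the remove rule, never both, since together they would redirect back and forth.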
Lower Case Rule
Many sites redirect to an all lower case URL, which matters because search engines treat URLs that differ only in casing as different URLs. For our site, however, we had too many existing links with mixed case, and a lower case rule would have caused a lot of redirects. We also prefer the readability and look of mixed case URLs when users navigate our site or share links. We expect links shared with different casing to be rare enough not to worry about, so we decided to accept that risk and didn’t implement a rule to redirect to all lower case.
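For readers who do want lower case URLs, a rule along these lines is one way to do it. To be clear, this is a sketch of a rule we did not implement, so treat it as a starting point rather than tested configuration.

```xml
<rule name="Lower Case Rule">
  <!-- ignoreCase="false" makes the match case sensitive, so the rule
       only fires when the URL contains at least one upper case letter. -->
  <match url="[A-Z]" ignoreCase="false" />
  <!-- ToLower is a built-in URL Rewrite string function. -->
  <action type="Redirect" url="{ToLower:{URL}}" />
</rule>
```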
Source Code of Rules
Here is the full source code of the above rules as they appear in the web.config file:
<rewrite xdt:Transform="Insert">
  <rules>
    <clear />
    <!-- Leave these paths untouched and skip all remaining rules. -->
    <rule name="Ignored Paths Rule" stopProcessing="true">
      <match url="(.*)" />
      <conditions logicalGrouping="MatchAny" trackAllCaptures="false">
        <add input="{PATH_INFO}" pattern="/AppInfo/.*" />
        <add input="{PATH_INFO}" pattern="/Mountie/Token" />
        <add input="{PATH_INFO}" pattern="/Mountie/api/.*" />
        <add input="{PATH_INFO}" pattern="/Sasquatch/api/.*" />
        <add input="{PATH_INFO}" pattern="/Tundra/api/.*" />
      </conditions>
      <action type="None" />
    </rule>
    <!-- Redirect requests that did not arrive over HTTPS. -->
    <rule name="HTTP to HTTPS Redirect Rule">
      <match url="(.*)" negate="false" />
      <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
        <add input="{HTTPS}" pattern="^OFF$" />
      </conditions>
      <action type="Redirect" url="https://{HTTP_HOST}{PATH_INFO}" />
    </rule>
    <!-- Redirect any host name other than the canonical 'www' name. -->
    <rule name="Canonical Host Name Rule">
      <match url="(.*)" />
      <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
        <add input="{HTTP_HOST}" pattern="^www\.highwaynorth\.com$" negate="true" />
      </conditions>
      <action type="Redirect" url="https://www.highwaynorth.com{PATH_INFO}" />
    </rule>
    <!-- Strip a trailing slash unless the request maps to a real directory or file. -->
    <rule name="Remove Trailing Slash Rule">
      <match url="(.*)/$" />
      <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
        <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
        <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
      </conditions>
      <action type="Redirect" url="{R:1}" />
    </rule>
  </rules>
</rewrite>
You don’t have to use the IIS Manager interface to configure your rules; you can also edit the web.config file directly. Personally, I use the UI to configure the rules against a local instance of IIS, then copy and paste the code from the generated web.config into our site’s web.config.