BUG: Special Characters Are Getting Converted Inside String (212704)



The information in this article applies to:

  • Microsoft Internet Explorer (Programming) 3.0
  • Microsoft Internet Explorer (Programming) 3.01
  • Microsoft Internet Explorer (Programming) 3.02
  • Microsoft Internet Explorer (Programming) 4.0
  • Microsoft Internet Explorer (Programming) 4.01
  • Microsoft Internet Explorer (Programming) 4.01 SP1
  • Microsoft Internet Explorer (Programming) 4.01 SP2

This article was previously published under Q212704

SYMPTOMS

Internet Explorer converts all instances of named entities in an HTML document, even where common convention dictates that it should not, such as inside quoted attribute values, and even when the entity is not terminated by a semicolon.

For example, Internet Explorer treats the following opening anchor tag as if its URL contained a less-than symbol (<) in the middle:
<a href="test.asp?param1=value1&ltname=value2">

RESOLUTION

Change every ampersand in the HTML document that is meant as a literal character, including ampersands inside quoted attribute values such as URLs, to the following:

"&amp;"

If an ampersand sequence is meant to be converted as a named entity, make sure the entity is terminated by a semicolon. For URLs that are not generated by form submissions, you can also rename query string parameters so that their names do not begin with the name of a named entity (for example, "lt" or "curren").
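If the page is generated by a server-side program, escaping each URL just before writing it into an attribute avoids the problem. The following is a minimal Python sketch, assuming a Python-generated page; the helper name `href_attribute` is hypothetical. It relies on the standard library's `html.escape()`, which rewrites `&` as `&amp;` (and, with `quote=True`, also escapes `<`, `>`, `"` and `'`).

```python
import html
from urllib.parse import urlencode

def href_attribute(url: str) -> str:
    """Hypothetical helper: build an href attribute whose value is HTML-escaped.

    html.escape() rewrites "&" as "&amp;", so the browser sees "&amp;ltname"
    instead of text that starts with the "lt" entity.
    """
    return 'href="' + html.escape(url, quote=True) + '"'

# urlencode() joins the parameters with a bare "&", which is correct for the
# URL itself but must be escaped once the URL is placed inside HTML.
query = urlencode({"param1": "value1", "ltname": "value2"})
url = "test.asp?" + query
print("<a " + href_attribute(url) + ">")
# prints: <a href="test.asp?param1=value1&amp;ltname=value2">
```

The browser decodes `&amp;` back to a single ampersand, so the server still receives `param1=value1&ltname=value2`.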

STATUS

Microsoft has confirmed that this is a bug in the Microsoft products that are listed at the beginning of this article. This problem was corrected in Internet Explorer 5.

MORE INFORMATION

Steps to Reproduce Behavior

The following HTML page demonstrates this bug:
<HTML><HEAD><TITLE>Entity Parsing Demonstration</TITLE></HEAD>
<BODY>
Right-click on the links to see the URL in Properties<p>

<a href="test.asp?param1=value1&current=value2">curren problem</a><BR>
<a href="test.asp?param1=value1&ltname=value2">lt problem</a><BR>

<input type=button value="Hello&gt"><BR>
</BODY>
</HTML>

When you view this page in Internet Explorer 4, the strings inside the HTML tags are parsed as if they contained entities, despite the common convention of converting only entities that are terminated by semicolons. As a result, the ampersand-"curren" in the middle of the first URL is converted to the currency symbol (¤), the ampersand-"lt" in the middle of the second URL is converted to a less-than symbol (<), and the ampersand-"gt" at the end of the button value is converted to a greater-than symbol (>).
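The Internet Explorer 4 behavior described above can be approximated in a few lines. The following Python sketch assumes that, after an ampersand, the longest prefix matching a named entity is replaced whether or not a semicolon follows. It is an illustration under that assumption, not Internet Explorer's actual code; the function name `decode_like_ie4` is hypothetical, and the entity table is Python's `html.entities.name2codepoint` (the HTML 4 named entities).

```python
from html.entities import name2codepoint

# Length of the longest entity name, used to bound the prefix search.
MAX_NAME = max(len(name) for name in name2codepoint)

def decode_like_ie4(text: str) -> str:
    """Hypothetical approximation of the Internet Explorer 4 behavior.

    After each "&", replace the longest following prefix that is a named
    entity, even when no semicolon terminates it.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        # Try the longest candidate name first, then shorter ones.
        for length in range(MAX_NAME, 0, -1):
            name = text[i + 1 : i + 1 + length]
            if len(name) == length and name in name2codepoint:
                out.append(chr(name2codepoint[name]))
                i += 1 + length
                if text.startswith(";", i):  # consume an optional ";"
                    i += 1
                break
        else:
            # No entity name follows: keep the ampersand as is.
            out.append("&")
            i += 1
    return "".join(out)

for s in ("test.asp?param1=value1&current=value2",
          "test.asp?param1=value1&ltname=value2",
          "Hello&gt"):
    print(decode_like_ie4(s))
```

Run against the three strings from the sample page, this sketch reproduces the symptoms: the first URL gains a "¤", the second a "<", and the button value ends in ">".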

To see the incorrectly parsed URLs, point to a hyperlink and look at the Internet Explorer status bar, or right-click the hyperlink and click Properties.

NOTE: In Internet Explorer 5, numeric entities may still be converted in inconvenient situations, as in the following example:
<A HREF="javascript:dostuff('http://somesite.asp?queryvalue1=%3fstuff%3f');">

NOTE: Internet Explorer 5 fixes the named-entity problem: it converts a named entity only if the name is an exact match and is followed by a non-alphanumeric character or falls at the end of the string.

For example, this corrects the parsing of both hyperlinks in the sample HTML above. The button still reads "Hello>", however, because the ampersand-"gt" entity falls at the end of the string.
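The Internet Explorer 5 rule for named entities can be sketched as a single regular expression: read the full run of letters and digits after the ampersand and convert it only if the whole run is an entity name. This is a hedged Python illustration of the rule as stated, not Internet Explorer's implementation; the function name `decode_like_ie5` is hypothetical, the entity table is Python's `html.entities.name2codepoint`, and numeric entities are deliberately left out.

```python
import re
from html.entities import name2codepoint

# "&", then a greedy run of ASCII letters and digits, then an optional ";".
# Because the run is greedy, the name always ends at a non-alphanumeric
# character or at the end of the string.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")

def decode_like_ie5(text: str) -> str:
    """Hypothetical sketch of the Internet Explorer 5 named-entity rule."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in name2codepoint:  # exact match only
            return chr(name2codepoint[name])
        return match.group(0)  # e.g. "&ltname" is left unchanged
    return ENTITY.sub(replace, text)

for s in ("test.asp?param1=value1&current=value2",
          "test.asp?param1=value1&ltname=value2",
          "Hello&gt"):
    print(decode_like_ie5(s))
```

Both URLs pass through unchanged because "current" and "ltname" are not entity names, while "Hello&gt" still becomes "Hello>" because "gt" is an exact match at the end of the string.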

Modification Type: Minor    Last Reviewed: 1/23/2004
Keywords: kbBug kbhtml kbpending KB212704