How to Modify the Search Thesaurus in SharePoint Portal Server (289652)



The information in this article applies to:

  • Microsoft SharePoint Portal Server 2001

This article was previously published under Q289652

SUMMARY

SharePoint Portal Server contains thesaurus files that you can configure for each language that the Search feature supports. When you configure these thesaurus files, you can specify synonyms for words so that the Search feature automatically replaces words in a query with other words that you specify. This article describes how to modify the thesaurus files to customize your searches.

MORE INFORMATION

The Search feature thesaurus files that are installed with SharePoint Portal Server are Unicode Extensible Markup Language (XML) files that you can modify by using a text editor. By default, these files are located in the following folder, where drive is the drive on which you installed SharePoint Portal Server:

drive:\Program Files\SharePoint Portal Server\Data\FTData\SharePointPortalServer\Config

The thesaurus file names use the format TSxxx.xml, where xxx is the three-letter language code. For example, the English (United States) version of the thesaurus is TSenu.xml.
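
The naming rule above can be sketched as a small helper that builds the default path for a given language code. This is an illustration only: the function name thesaurus_path and the choice to lowercase the language code are assumptions, and the folder is the default location, which may differ on your server.

```python
from pathlib import PureWindowsPath

# Default folder from this article, relative to the installation drive.
CONFIG_SUBPATH = (
    r"Program Files\SharePoint Portal Server\Data\FTData"
    r"\SharePointPortalServer\Config"
)


def thesaurus_path(drive: str, language_code: str) -> PureWindowsPath:
    """Return the default thesaurus file path for a three-letter language code.

    Hypothetical helper: it only assembles the TSxxx.xml name described in
    the article and does not check that the file exists.
    """
    if len(language_code) != 3 or not language_code.isalpha():
        raise ValueError("language code must be three letters, for example 'enu'")
    file_name = f"TS{language_code.lower()}.xml"
    return PureWindowsPath(f"{drive}:\\") / CONFIG_SUBPATH / file_name


print(thesaurus_path("C", "enu"))
```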

Thesaurus File Entry Types

You can make two types of entries in the thesaurus files: replacement sets and expansion sets.

Replacement Sets

In a replacement set, you specify one or more patterns that you want the search to replace with one or more substitutions. For example, you can have the search replace "Win2K" with "Windows 2000" or "Windows NT." In this case, if you query for "Win2K," the search looks for the substitutions instead, and the documents in the return set contain the words "Windows 2000" or "Windows NT."

Enclose replacement sets in <replacement> tags. Enclose each pattern in <pat> tags, and enclose substitutions in <sub> tags.

For example:
<replacement> 
    <pat>Win2K</pat>
    <sub>Windows 2000</sub>
    <sub>Windows NT</sub> 
</replacement>
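
If you prefer to generate entries rather than type them, the following sketch builds the same replacement set with the Python standard library. The helper name build_replacement is hypothetical, and the sketch produces only the entry itself; where the entry belongs inside the installed thesaurus file is not shown here, so follow the structure that is already present in that file.

```python
import xml.etree.ElementTree as ET


def build_replacement(patterns, substitutions):
    """Build a <replacement> element with <pat> and <sub> children."""
    entry = ET.Element("replacement")
    for pattern in patterns:
        ET.SubElement(entry, "pat").text = pattern
    for substitution in substitutions:
        ET.SubElement(entry, "sub").text = substitution
    return entry


entry = build_replacement(["Win2K"], ["Windows 2000", "Windows NT"])

# Serialize to a string so that the entry can be pasted into the file.
print(ET.tostring(entry, encoding="unicode"))
```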
				

Expansion Sets

In expansion sets, you can specify substitutions that are synonyms of each other. Enclose expansion sets in <expansion> tags. Enclose each synonym in <sub> tags; do not use <pat> tags. When you query for one word that is in the set, the search expands the query to include all the synonyms that you specify.

For example:
<expansion>
    <sub>developer</sub>
    <sub>code writer</sub>
    <sub>programmer</sub> 
</expansion>
				
In this example, if you query for "developer," "code writer," or "programmer," the return set includes documents that contain any of the three terms.
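
The effect of an expansion set on a query term can be modeled as follows. This is a simplified illustration of the behavior described above, not the Search feature's implementation; the function name expand_term is hypothetical, and exact, case-sensitive matching is an assumption made to keep the sketch short.

```python
def expand_term(term, expansion_sets):
    """Return every term to search for when the query contains `term`.

    If the term belongs to an expansion set, all synonyms in that set are
    returned (including the term itself). Otherwise the term is unchanged.
    """
    for synonyms in expansion_sets:
        if term in synonyms:  # assumption: exact, case-sensitive match
            return list(synonyms)
    return [term]


expansion_sets = [["developer", "code writer", "programmer"]]

print(expand_term("programmer", expansion_sets))
print(expand_term("tester", expansion_sets))
```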

Substitution Weights

You can also apply a weight to each substitution entry by adding a weight attribute to its <sub> tag. Use a value between 0 and 1 to give higher weight to certain words in your substitution set relative to other words in that set.

For example:
<expansion>
    <sub weight="0.5">Internet Explorer</sub>
    <sub weight="0.1">IE</sub>
    <sub weight="0.8">IE5</sub> 
</expansion> 
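
Before you save a modified file, you may want to check that every weight falls in the expected range. The following sketch does this for a single entry. The function name invalid_weights is hypothetical, and treating 0 and 1 themselves as valid is an assumption, because the article says only "between 0 and 1."

```python
import xml.etree.ElementTree as ET

# The weighted expansion set from the example in this article.
FRAGMENT = """<expansion>
    <sub weight="0.5">Internet Explorer</sub>
    <sub weight="0.1">IE</sub>
    <sub weight="0.8">IE5</sub>
</expansion>"""


def invalid_weights(entry):
    """Return (text, weight) pairs for <sub> tags whose weight is not in [0, 1]."""
    bad = []
    for sub in entry.iter("sub"):
        raw = sub.get("weight")
        if raw is None:
            continue  # entries without a weight attribute are not checked
        try:
            value = float(raw)
        except ValueError:
            bad.append((sub.text, raw))
            continue
        if not 0.0 <= value <= 1.0:  # assumption: the bounds are inclusive
            bad.append((sub.text, raw))
    return bad


print(invalid_weights(ET.fromstring(FRAGMENT)))
```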
				

Stemming Entries

You can specify stemming in pattern and substitution entries by adding two asterisks (**) to the end of the string. When you do this, the Search feature also matches the stemmed variants of the string and returns all the documents that contain any of the variants.

For example:
<expansion> 
    <sub weight="0.5">run**</sub>
    <sub weight="0.5">jog**</sub> 
</expansion>
				
In this example, if you query for documents that contain the word "run," the return set includes documents that contain "run," "runs," "running," "ran," "jog," "jogs," "jogging," or other stemmed variants of either word.

Modification Type: Major
Last Reviewed: 1/3/2003
Keywords:kbinfo KB289652