Word-Breaker DLLs and Noise Words

This topic describes how document text and properties returned by filters are broken up into words and how common words are excluded.

Word-Breaker DLLs

A word-breaker DLL parses the text and textual properties returned by the filter DLL into words. The word-breaker DLL is language dependent. For a list of languages supported by Index Server, see the Index Server Web page.
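As a rough illustration of what word breaking means, the following Python sketch splits text into words with a regular expression. It is not how Index Server's word-breaker DLLs work internally; the function name break_words is hypothetical, and the rule used (runs of letters, digits, and underscores form a word) is a naive assumption that suits simple English text only. Real word breakers apply language-specific rules.

```python
import re

def break_words(text):
    """Naively split text into words.

    Assumption: a word is any run of letters, digits, or underscores.
    A real word breaker is language dependent and far more careful.
    """
    return re.findall(r"\w+", text)

if __name__ == "__main__":
    print(break_words("Filters return text, and text becomes words."))
```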

Noise Words

Words that are not significant for searching are called noise words or stop words. Noise words are stored in the %systemroot%\system32 directory in noise word files (Noise.dat, by default). The noise word files are language dependent. The noise word file for a particular language is specified in the registry under the key:

HKEY_LOCAL_MACHINE
\SYSTEM
 \CurrentControlSet
  \Control
   \ContentIndex
    \Language
     \<language>
      \NoiseFile

For example, the noise word file for English_US is listed as the registry key:

HKEY_LOCAL_MACHINE
\SYSTEM
 \CurrentControlSet
  \Control
   \ContentIndex
    \Language
     \English_US
      \NoiseFile
       \noise.dat
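The lookup described above could be scripted roughly as follows, using Python's standard winreg module. This is a sketch under assumptions: it treats NoiseFile as a named value stored under the language key, which this topic does not state explicitly, and the helper names noise_file_key_path and read_noise_file_name are hypothetical. The registry read works only on Windows, on a machine where the key exists.

```python
def noise_file_key_path(language):
    """Build the registry path (relative to HKEY_LOCAL_MACHINE) for a language."""
    base = r"SYSTEM\CurrentControlSet\Control\ContentIndex\Language"
    return base + "\\" + language

def read_noise_file_name(language):
    """Return the noise word file name configured for a language.

    Assumption: NoiseFile is a named value under the language key.
    winreg is imported here because it is available only on Windows.
    """
    import winreg
    path = noise_file_key_path(language)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "NoiseFile")
    return value

if __name__ == "__main__":
    print(noise_file_key_path("English_US"))
```

On a Windows machine with Index Server installed, calling read_noise_file_name("English_US") would be expected to return the file name shown in the example above, provided the assumption about the registry layout holds.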

The noise word files can be edited with a text editor to add new words or to remove words that are not considered “noise” at a particular installation. Because noise words are excluded from the index, querying for a noise word will not yield any hits.

Caution    Removing all noise words from the noise word files can significantly increase the size of indexes.
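To show the effect of a noise word list on indexing and querying, here is a minimal Python sketch. It assumes a noise word file holds one word per line and that matching ignores case; neither detail is stated in this topic. The function names and the sample word list are hypothetical and are not the contents of any shipped noise word file.

```python
def load_noise_words(file_text):
    """Parse noise word file content, assuming one word per line."""
    return {line.strip().lower() for line in file_text.splitlines() if line.strip()}

def drop_noise_words(words, noise_words):
    """Return only the words that are not noise words (case-insensitive)."""
    return [word for word in words if word.lower() not in noise_words]

if __name__ == "__main__":
    # Hypothetical sample content standing in for a noise word file.
    sample_file_text = "a\nthe\nof\nand\n"
    noise = load_noise_words(sample_file_text)
    query = ["the", "history", "of", "Index", "Server"]
    print(drop_noise_words(query, noise))
```

A query made up only of noise words is left with nothing to search for, which is why it yields no hits. An empty noise word list means every word is kept, which is why indexes grow.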


© 1997 by Microsoft Corporation. All rights reserved.

