CIS 170F: Windows 7 Administration

Week 9

User Productivity Tools
Search
Search Engine Terminology

The following terminology describes search and indexing as it has been implemented in Windows 7 and Windows Vista:

  • Catalog The index with the property cache.

  • Crawl scopes (inclusions and exclusions) Included and excluded paths within a search root. For example, to index the D drive but exclude D:\Temp, a user would add a crawl scope (inclusion) for "D:\*" and a crawl scope (exclusion) for "D:\Temp\*". The Crawl Scope Manager would also add a start address for "D:\".

  • Gathering The process of discovering and accessing items from a data store using protocol handlers and IFilters.

  • IFilter A feature of the Windows Search engine that is used to extract text from documents so that it can be added to the index. (IFilters can also be used to extract format-specific properties, such as Subject or Author; however, in Windows Vista and Windows 7, property handlers are the preferred mechanism for extracting these properties.) Microsoft provides IFilters for many common document formats by default, while third-party vendors such as Adobe provide their own IFilters for indexing other types of content.

  • Property handler A feature of Windows that is used to extract format-dependent properties. It is used by the Windows Search engine to read and index property values, and by Windows Explorer to read and write property values directly in the file. Microsoft provides property handlers for many common formats by default.

  • Indexing The process of building the system index and property cache, which together form the catalog.

  • Master index A single index formed by combining shadow indexes in a process called the master merge. It is a content index that conceptually maps words to documents or other items.

  • Master merge The process of combining index fragments (shadow indexes) into a single content index called the master index.

  • Property cache The persistent cache of properties (metadata) for indexed items. Basic file properties (such as the file size or date last modified) are added to the property cache for each indexed item; additional properties are added for items with format-specific properties collected by a property handler or IFilter. Indexing item properties allows users to search quickly through this information and create rich pivoted views based on available metadata.

  • Property store Another name for the property cache.

  • Protocol handler A feature of the Windows Search engine that is used to communicate with and enumerate the contents of stores such as the file system, the Messaging Application Programming Interface (MAPI) e-mail database, and the client-side caching (CSC), or offline files, database. Like IFilters, protocol handlers are extensible.

  • Start address A Uniform Resource Locator (URL) that points to the starting location for indexed content. When indexing is performed, each configured start address is enumerated by a protocol handler to find the content to be indexed.

  • Search root The base namespace of a given protocol handler.

  • Search defaults The default crawl scope(s) for a given search root.

  • Shadow indexes Temporary indexes that are created during the indexing process and then combined into a single index called the master index.

  • Shadow merge The process of combining index fragments (shadow indexes) into the next level of index. The resulting index file is still a shadow index, but merging indexes into bigger entities improves query performance.

  • System index The entire index on the system, including the master index, shadow indexes, and various configuration files, log files, and temporary files.
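
The crawl scope example in the list above (include "D:\*", exclude "D:\Temp\*") can be illustrated with a small Python sketch. This is not the Crawl Scope Manager's actual logic, which these notes do not describe. The rule table, the function name is_in_scope, and the "longest matching pattern wins" precedence are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table mirroring the D:\ example: (pattern, is_inclusion).
CRAWL_SCOPES = [
    ("D:\\*", True),         # crawl scope (inclusion)
    ("D:\\Temp\\*", False),  # crawl scope (exclusion)
]

def is_in_scope(path, scopes=CRAWL_SCOPES):
    """Return True if the most specific matching rule is an inclusion.

    Assumption: the longest matching pattern takes precedence.
    Paths are compared case-insensitively, as Windows paths usually are.
    """
    candidate = path.lower()
    best = None
    for pattern, included in scopes:
        if fnmatchcase(candidate, pattern.lower()):
            if best is None or len(pattern) > len(best[0]):
                best = (pattern, included)
    # A path that matches no rule at all is treated as out of scope.
    return best is not None and best[1]
```

Under these assumptions, a file such as D:\Docs\report.docx falls inside the crawl scope, a file under D:\Temp does not, and anything on another drive is ignored because no rule matches it.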
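
The relationship between shadow indexes, the shadow merge, and the master merge can be sketched in Python as well. This is a conceptual toy model only: each index is represented as a dictionary that maps words to sets of item names, which matches the "maps words to documents" description above but says nothing about the on-disk format Windows Search actually uses. The function merge_indexes and the sample data are hypothetical.

```python
def merge_indexes(*indexes):
    """Combine several word -> items indexes into one index."""
    merged = {}
    for index in indexes:
        for word, items in index.items():
            # Union the item sets so no posting is lost during the merge.
            merged.setdefault(word, set()).update(items)
    return merged

# Three small shadow indexes, as if produced during separate indexing passes.
shadow_a = {"budget": {"report.docx"}, "draft": {"notes.txt"}}
shadow_b = {"budget": {"q3.xlsx"}}
shadow_c = {"draft": {"memo.docx"}}

# Shadow merge: two fragments become one larger fragment (still a shadow index).
bigger_shadow = merge_indexes(shadow_a, shadow_b)

# Master merge: all remaining fragments are combined into the master index.
master_index = merge_indexes(bigger_shadow, shadow_c)
```

In this model a query has to consult every index that still exists, so fewer and larger indexes mean fewer lookups per query, which is one way to picture why merging improves query performance.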