Simple Search Help

This page describes how Panoptic processes a query consisting of a list of words. In general, Panoptic tries to find documents which contain all of the query words (implicit AND). However, if there are not enough of these, it will present tiers of results corresponding to gradual relaxation of this constraint. No AND operator is provided (or needed).
This page discusses:
  - What is a word?
  - How are simple queries processed?
  - Examples of Simple Queries
 
What is a word?
 
  1. Panoptic has a very liberal definition of what constitutes a word. Any unbroken sequence of letters and/or numbers such as Clinton, a, 1999, or ENGN3410, will do.
  2. By default, Panoptic does not distinguish between upper and lower case. Except for rare examples such as IT, case insensitivity produces the best results. Your Panoptic administrator can configure the system to build case-sensitive indexes if required. Even if an index includes case information, queries are processed case-insensitively by default. The default behaviour can be overridden with the CASE CGI parameter.
  3. Out of the box, Panoptic does not stem words, either in the query or in the index. You can request stemming by appending a cross-hatch ('#') to each query word you wish to stem. For example, the query "economic# policy#" will match economic policy, economics policy, economic policies, etc. Note that a Panoptic administrator can configure Panoptic to apply stem matching by default.
  4. Compound words such as non-conformist, O'Toole, www.anu.edu.au, or reports/1999 also act like single words when they appear in a query although internally they are processed as phrases.
  5. Stopwords such as the, a, and of occur very frequently in documents and usually slow down query processing while contributing nothing to the quality of results. However, in some cases, e.g. to be or not to be, they can be important. Accordingly, Panoptic indexes stopwords in documents but ignores them in queries, unless the query contains fewer than three non-stopwords or the stopwords occur within a phrase. Panoptic's stopword list is currently hard-coded, but future versions may allow flexibility. As of Panoptic v5.0, the stopword list includes French as well as English words.
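
The word and stopword rules above can be illustrated with a minimal Python sketch. It is not Panoptic's implementation: the stopword set is a small hypothetical sample, the function names split_words and effective_terms are invented for illustration, and compound words, phrases and the '#' stemming marker are not modelled.

```python
import re

# Hypothetical, abbreviated stopword sample. Panoptic's real list is
# hard-coded in the product and is not reproduced here.
STOPWORDS = {"the", "a", "of", "to", "be", "or", "not", "and", "in"}


def split_words(query):
    """Return each unbroken run of letters and/or digits, lower-cased.

    Lower-casing mirrors the default case-insensitive behaviour.
    """
    return [w.lower() for w in re.findall(r"[^\W_]+", query)]


def effective_terms(query, min_content_words=3):
    """Drop stopwords unless the query has fewer than three non-stopwords."""
    words = split_words(query)
    content = [w for w in words if w not in STOPWORDS]
    if len(content) < min_content_words:
        # Too few other words: keep the stopwords, as in "the union".
        return words
    return content
```

Under these assumptions, effective_terms("the ISR annual report 1999") drops "the" because four non-stopwords remain, while effective_terms("the union") keeps both words.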
 
How are simple queries processed?
 
Short queries consist of six or fewer non-stopwords. If any documents contain all of the query words, they will be presented in the top tier of results. Documents which partially match will be presented in subsequent tiers. Within a tier, documents are ranked by their relevance score, which takes into account the relative rarity of the query terms, the frequency with which they occur in the document and the length of the document. In addition, the scores of documents containing the query words in their title or in the URL of the page are boosted.
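
As a rough illustration of tiered results, the Python sketch below groups documents by how many query words they contain and sorts each tier by score. Defining a tier by the number of matched words is an assumption made for this example, and the score argument is a placeholder supplied by the caller; Panoptic's actual relevance formula is not shown on this page beyond the factors listed above.

```python
def rank_in_tiers(query_terms, docs, score):
    """Group documents into tiers, fullest matches first.

    docs  : mapping of document id -> set of words in that document
    score : callable taking a document id and returning a relevance score
            (a stand-in for the real scoring function)
    """
    wanted = set(query_terms)
    tiers = {}
    for doc_id, words in docs.items():
        matched = len(wanted & words)
        if matched:  # documents matching no query word are left out
            tiers.setdefault(matched, []).append(doc_id)
    # Highest match count first; within a tier, highest score first.
    return [
        sorted(tiers[count], key=score, reverse=True)
        for count in sorted(tiers, reverse=True)
    ]
```

With this sketch, a document containing every query word always appears above one that matches only some of them, whatever their scores.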
Long queries consist of seven or more non-stopwords. Top-tier answers need not contain occurrences of all query words. Instead, documents are ranked by their relevance score.
If queries are very long, Panoptic will process the words in order of decreasing rarity and may ignore the most common words.
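
The short/long distinction and the rarity ordering can be sketched in Python as follows. The name plan_query, the doc_freq table (number of documents containing each word) and the max_terms cutoff are all hypothetical; this page does not say how Panoptic measures rarity or how many words it keeps.

```python
def plan_query(terms, doc_freq, long_threshold=7, max_terms=None):
    """Classify a query as short or long and order its words rarest first.

    terms     : non-stopword query words
    doc_freq  : mapping of word -> number of documents containing it
                (assumed measure of rarity; lower means rarer)
    max_terms : optional hypothetical cap; the most common words beyond
                the cap are ignored
    """
    is_long = len(terms) >= long_threshold
    # Rarest words first; words missing from doc_freq count as rarest.
    ordered = sorted(terms, key=lambda t: doc_freq.get(t, 0))
    if max_terms is not None:
        ordered = ordered[:max_terms]
    return is_long, ordered
```

Because the list is sorted before it is truncated, any words dropped by the cap are the most common ones.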
 
Examples of Simple Queries
 
Example Query | Explanation | Illustrative results for this search
ITS | Even though Panoptic is by default case insensitive, meaning that the acronym in this query is considered as a stopword, the results generated are quite good. | show results
Life Matters | In this example, the related sites are generated by a customisable thesaurus. The top tier of results contains more than 3000 documents which include both query words. They are ranked by relevance score. | show results
the ISR annual report 1999 | This illustrates the use of numbers as words and the removal of a stop word. (See the "Query:" display under the search box.) | show results
the union | In this case the stopword is not eliminated because there are too few other words. | show results
non-conformist film | As can be seen from the "query:" display, the compound word is treated as a single entity by converting it into a phrase before processing. This example is a good illustration of results tiers because few documents within the collection contain both terms (words). | show results
Tell me about any research which might be going on on genetic manipulation or genetic modification of plants. I'm really interested in possible risks and hazards. | This is processed as a long query. The "query:" display shows that stopwords have been eliminated, and that the query words have been reordered and enclosed in square brackets. The square brackets avoid the need for top-tier documents to contain all of the query words. Ranking is purely on the basis of relevance score. | show results
 
The example result pages linked to from the above table were saved on 26 July 2001 from a variety of Panoptic search services. The pages have been edited to remove extraneous material such as headers, footers and logos.


Panoptic Search Engine

© Copyright CSIRO Australia, 1997-2004.