|
This page describes how Panoptic processes a query consisting of a list of
words. In general, Panoptic tries to find documents which contain all of the
query words (implicit AND). However, if there are not enough of these, it
will present tiers of results corresponding to gradual relaxation of this
constraint. No AND operator is provided (or needed).
|
|
| | |
|
| |
- Panoptic has a very liberal definition of what constitutes a word. Any unbroken sequence of letters and/or numbers such as Clinton, a, 1999, or ENGN3410, will do.
- By default, Panoptic does not distinguish between upper and lower case. Except for rare examples such as IT, case insensitivity produces the best results. Your Panoptic administrator can configure the system to
build case-sensitive indexes if required. Even if an index includes case information, queries are
processed case-insensitively by default. The default behaviour can be overridden
with the CASE CGI parameter.
- Out of the box, Panoptic does not stem words, either in the query or
in the index. You can request stemming by appending a
cross-hatch ('#') to each query word you wish to stem. For example,
the query "economic# policy#" will match economic policy,
economics policy,
economic policies, etc. Note that a Panoptic administrator can configure
Panoptic to apply stem matching by default.
- Compound words such as non-conformist, O'Toole, www.anu.edu.au, or reports/1999 also act like single words when they appear in a query, although internally they are processed as phrases.
- Stopwords such as the, a, and
of occur very frequently in documents and usually slow
down query processing while contributing nothing to the quality of
results. However, in some cases, e.g. to be or not to be,
they can be important. Accordingly, Panoptic indexes stopwords
in documents but ignores them in queries, unless the query
contains fewer than three non-stopwords or the stopwords occur
within a phrase. Panoptic's stopword list is currently hard-coded, but
future versions may make it configurable. As of Panoptic v5.0, the stopword
list includes French as well as English words.
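The word and stopword rules above can be illustrated with a minimal Python sketch. It is not Panoptic's implementation: the function names and the small stopword sample are assumptions made for illustration, and the sketch ignores compound words, phrases and the '#' stemming marker.

```python
import re

# Hypothetical sample only; Panoptic's real stopword list is hard-coded
# inside the product and is not reproduced here.
STOPWORDS = {"the", "a", "and", "of", "to", "be", "or", "not"}


def split_words(query):
    """Return unbroken runs of letters and/or digits, folded to lower case."""
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", query)]


def drop_stopwords(words, min_content_words=3):
    """Remove stopwords unless fewer than three non-stopwords would remain."""
    content = [word for word in words if word not in STOPWORDS]
    if len(content) < min_content_words:
        # Too few other words: keep the query as typed.
        return words
    return content


print(drop_stopwords(split_words("the ISR annual report 1999")))
print(drop_stopwords(split_words("the union")))
```

Under these assumptions the first query loses its stopword, while the second keeps it because only one non-stopword is present.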
| | |
|
| |
|
Short queries consist of six or fewer
non-stopwords. If any documents contain all of the query words, they will be
presented in the top tier of results. Documents which partially
match will be presented in subsequent tiers. Within a tier,
documents are ranked by their relevance score, which takes into
account the relative rarity of the query terms, the frequency with
which they occur in the document and the length of the document.
In addition, the scores of documents containing the query words in
their title or in the URL of the page are boosted.
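As a rough illustration of result tiers, the sketch below groups documents by how many query words they contain and then ranks each group by score. It assumes one tier per count of matched words and takes the relevance scores as given; the page does not state exactly how Panoptic forms its tiers or computes scores, and the function name and data layout are hypothetical.

```python
def tiered_results(query_words, docs):
    """Group documents into tiers by number of distinct query words matched.

    docs maps a document id to a pair (words in the document, relevance score).
    Returns a list of tiers, best tier first, each a list of document ids.
    """
    wanted = set(query_words)
    tiers = {}
    for doc_id, (doc_words, score) in docs.items():
        matched = len(wanted & set(doc_words))
        if matched > 0:
            tiers.setdefault(matched, []).append((doc_id, score))

    ordered = []
    # Documents matching the most query words come first.
    for matched in sorted(tiers, reverse=True):
        # Within a tier, the highest relevance score comes first.
        ranked = sorted(tiers[matched], key=lambda pair: pair[1], reverse=True)
        ordered.append([doc_id for doc_id, _ in ranked])
    return ordered


example_docs = {
    "d1": ({"conformist", "film"}, 0.2),
    "d2": ({"film", "review"}, 0.9),
    "d3": ({"film"}, 0.5),
    "d4": ({"weather"}, 0.7),
}
print(tiered_results(["conformist", "film"], example_docs))
```

Here d1 is alone in the top tier despite its low score, because it is the only document containing both words; d2 and d3 follow in a second tier ordered by score, and d4 is not returned.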
|
|
Long queries consist of seven or more
non-stopwords. Top-tier answers need not necessarily contain
occurrences of all query words. Instead, documents are ranked by
their relevance score.
|
|
If queries are very long, Panoptic will process
the words in order of decreasing rarity and may ignore the most common
words.
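The distinction between short and long queries, and the rarest-first ordering, might be sketched as follows. The document frequencies, the `max_words` cut-off and the function names are assumptions for illustration; the page does not say how many words Panoptic keeps or how it measures rarity.

```python
SHORT_QUERY_LIMIT = 6  # six or fewer non-stopwords counts as a short query


def is_long_query(content_words):
    """Return True when the query has seven or more non-stopwords."""
    return len(content_words) > SHORT_QUERY_LIMIT


def order_by_rarity(content_words, doc_freq, max_words=None):
    """Order words rarest first; optionally keep only the rarest max_words.

    doc_freq maps a word to the (hypothetical) number of documents that
    contain it. Words missing from doc_freq are treated as the rarest.
    """
    ordered = sorted(content_words, key=lambda word: doc_freq.get(word, 0))
    if max_words is None:
        return ordered
    return ordered[:max_words]


frequencies = {"genetic": 50, "plants": 200, "risks": 120}
print(order_by_rarity(["plants", "genetic", "risks"], frequencies))
print(order_by_rarity(["plants", "genetic", "risks"], frequencies, max_words=2))
```

With the cut-off set to two, the most common word (plants) is the one that gets ignored.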
|
| |
|
| Example Query | Explanation | Illustrative results for this search |
| --- | --- | --- |
| ITS | Panoptic is case insensitive by default, so the acronym in this query is treated as a stopword. Even so, the results generated are quite good. | show results |
| Life Matters | In this example, the related sites are generated by a customisable thesaurus. The top tier of results contains more than 3000 documents which include both query words. They are ranked by relevance score. | show results |
| the ISR annual report 1999 | This illustrates the use of numbers as words and the removal of a stopword. (See the "Query:" display under the search box.) | show results |
| the union | In this case the stopword is not eliminated because there are too few other words. | show results |
| non-conformist film | As can be seen from the "Query:" display, the compound word is treated as a single entity by converting it into a phrase before processing. This example is a good illustration of result tiers because few documents within the collection contain both terms (words). | show results |
| Tell me about any research which might be going on on genetic manipulation or genetic modification of plants. I'm really interested in possible risks and hazards. | This is processed as a long query. The "Query:" display shows that stopwords have been eliminated, and that the query words have been reordered and enclosed in square brackets. The square brackets remove the requirement that top-tier documents contain all of the query words. Ranking is purely on the basis of relevance score. | show results |
|
The example result pages linked to from the above table were
saved on 26 July 2001 from a variety of Panoptic search services.
The pages have been edited to remove
extraneous material such as headers, footers and logos.
|
|
|
|