debian-python-whoosh/docs/source/querylang.rst

192 lines
4.8 KiB
ReStructuredText

==========================
The default query language
==========================
.. highlight:: none
Overview
========
A query consists of *terms* and *operators*. There are two types of terms: single
terms and *phrases*. Multiple terms can be combined with operators such as
*AND* and *OR*.
Whoosh supports indexing text in different *fields*. You must specify the
*default field* when you create the :class:`whoosh.qparser.QueryParser` object.
This is the field in which any terms the user does not explicitly specify a field
for will be searched.
Whoosh's query parser is capable of parsing different and/or additional syntax
through the use of plug-ins. See :doc:`parsing`.
Individual terms and phrases
============================
Find documents containing the term ``render``::
render
Find documents containing the phrase ``all was well``::
"all was well"
Note that a field must store Position information for phrase searching to work in
that field.
Normally when you specify a phrase, the maximum difference in position between
each word in the phrase is 1 (that is, the words must be right next to each
other in the document). For example, the following matches if a document has
``library`` within 5 words after ``whoosh``::
"whoosh library"~5
Boolean operators
=================
Find documents containing ``render`` *and* ``shading``::
render AND shading
Note that AND is the default relation between terms, so this is the same as::
render shading
Find documents containing ``render``, *and* also either ``shading`` *or*
``modeling``::
render AND shading OR modeling
Find documents containing ``render`` but *not* modeling::
render NOT modeling
Find documents containing ``alpha`` but not either ``beta`` or ``gamma``::
alpha NOT (beta OR gamma)
Note that when no boolean operator is specified between terms, the parser will
insert one, by default AND. So this query::
render shading modeling
is equivalent (by default) to::
render AND shading AND modeling
See :doc:`customizing the default parser <parsing>` for information on how to
change the default operator to OR.
Group operators together with parentheses. For example to find documents that
contain both ``render`` and ``shading``, or contain ``modeling``::
(render AND shading) OR modeling
Fields
======
Find the term ``ivan`` in the ``name`` field::
name:ivan
The ``field:`` prefix only sets the field for the term it directly precedes, so
the query::
title:open sesame
Will search for ``open`` in the ``title`` field and ``sesame`` in the *default*
field.
To apply a field prefix to multiple terms, group them with parentheses::
title:(open sesame)
This is the same as::
title:open title:sesame
Of course you can specify a field for phrases too::
title:"open sesame"
Inexact terms
=============
Use "globs" (wildcard expressions using ``?`` to represent a single character
and ``*`` to represent any number of characters) to match terms::
te?t test* *b?g*
Note that a wildcard starting with ``?`` or ``*`` is very slow. Note also that
these wildcards only match *individual terms*. For example, the query::
my*life
will **not** match an indexed phrase like::
my so called life
because those are four separate terms.
Ranges
======
You can match a range of terms. For example, the following query will match
documents containing terms in the lexical range from ``apple`` to ``bear``
*inclusive*. For example, it will match documents containing ``azores`` and
``be`` but not ``blur``::
[apple TO bear]
This is very useful when you've stored, for example, dates in a lexically sorted
format (i.e. YYYYMMDD)::
date:[20050101 TO 20090715]
The range is normally *inclusive* (that is, the range will match all terms
between the start and end term, *as well as* the start and end terms
themselves). You can specify that one or both ends of the range are *exclusive*
by using the ``{`` and/or ``}`` characters::
[0000 TO 0025}
{prefix TO suffix}
You can also specify *open-ended* ranges by leaving out the start or end term::
[0025 TO]
{TO suffix}
Boosting query elements
=======================
You can specify that certain parts of a query are more important for calculating
the score of a matched document than others. For example, to specify that
``ninja`` is twice as important as other words, and ``bear`` is half as
important::
ninja^2 cowboy bear^0.5
You can apply a boost to several terms using grouping parentheses::
(open sesame)^2.5 roc
Making a term from literal text
===============================
If you need to include characters in a term that are normally treated specially
by the parser, such as spaces, colons, or brackets, you can enclose the term
in single quotes::
path:'MacHD:My Documents'
'term with spaces'
title:'function()'