.. _ref-faceting:

========
Faceting
========

What Is Faceting?
-----------------

Faceting is a way to provide users with feedback about the number of documents
which match terms they may be interested in. At its simplest, it gives
document counts based on words in the corpus, date ranges, numeric ranges or
even advanced queries.
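
To make the idea concrete, here is a toy, pure-Python sketch of those kinds of counts. It does not use Haystack or a search engine: the in-memory ``docs`` list and its ``author``, ``pub_date`` and ``words`` fields are hypothetical stand-ins for indexed documents.

```python
from collections import Counter
from datetime import date

# A hypothetical in-memory corpus standing in for the search index.
docs = [
    {"author": "john", "pub_date": date(2023, 5, 1), "words": 120},
    {"author": "john", "pub_date": date(2024, 1, 9), "words": 800},
    {"author": "daniel", "pub_date": date(2024, 3, 2), "words": 450},
    {"author": "sally", "pub_date": date(2024, 7, 15), "words": 90},
]

# Field facet: how many documents have each author value.
field_counts = Counter(doc["author"] for doc in docs)

# Date facet: bucket documents by publication year.
date_counts = Counter(doc["pub_date"].year for doc in docs)

# Query facet: count documents matching an arbitrary condition.
query_counts = {"long (> 300 words)": sum(doc["words"] > 300 for doc in docs)}

print(field_counts.most_common())   # [('john', 2), ('daniel', 1), ('sally', 1)]
print(sorted(date_counts.items()))  # [(2023, 1), (2024, 3)]
print(query_counts)                 # {'long (> 300 words)': 2}
```

A real backend computes the same kind of numbers over the documents matching the current query, but does so inside the index rather than by scanning Python objects.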

Faceting is particularly useful when trying to provide users with drill-down
capabilities. The general workflow in this regard is:

#. You choose what you want to facet on.
#. The search engine returns the counts it sees for each value of that facet.
#. You display those counts to the user and provide a link for each one.
#. When the user chooses a link, you narrow the search query to only include
   those conditions and display the results, potentially with further facets.
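
The four steps can be modelled with a toy, pure-Python sketch. The ``facet`` and ``narrow`` helpers and the ``docs`` corpus below are hypothetical; in Haystack the counting and narrowing are done by the search engine through ``SearchQuerySet``.

```python
from collections import Counter

# Hypothetical corpus; with Haystack this data lives in the search index.
docs = [
    {"author": "john", "category": "news"},
    {"author": "john", "category": "blog"},
    {"author": "daniel", "category": "blog"},
    {"author": "sally", "category": "news"},
]


def facet(results, field):
    """Steps 1 and 2: count how many results carry each value of ``field``."""
    return Counter(doc[field] for doc in results)


def narrow(results, field, value):
    """Step 4: keep only the results whose ``field`` equals the chosen value."""
    return [doc for doc in results if doc[field] == value]


author_counts = facet(docs, "author")
# Step 3 would render author_counts as links; assume the user clicks "john".
drilled = narrow(docs, "author", "john")
category_counts = facet(drilled, "category")  # further facets on the subset
print(dict(author_counts))    # {'john': 2, 'daniel': 1, 'sally': 1}
print(dict(category_counts))  # {'news': 1, 'blog': 1}
```

Each drill-down shrinks the result set, and the next round of counts is taken over that smaller set.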

.. note::

    Faceting can be difficult, especially in providing the user with the right
    number of options and/or the right areas to be able to drill into. The
    right choices are unique to every situation and depend on what real users
    need.

    You may want to consider logging queries and looking at popular terms to
    help you narrow down how you can help your users.

Haystack provides functionality so that all of the above steps are possible.
From the ground up, let's build a faceted search setup. This assumes that you
have worked through the :doc:`tutorial` and have a working Haystack
installation. The same setup from the :doc:`tutorial` applies here.

1. Determine Facets And ``SearchQuerySet``
------------------------------------------

Determining what you want to facet on isn't always easy. For our purposes,
we'll facet on the ``author`` field.

In order to facet effectively, the search engine should store both a standard
representation of your data as well as an exact version to facet on. This is
generally accomplished by duplicating the field and storing it via two
different types. Duplication is suggested so that those fields are still
searchable in the standard ways.

To inform Haystack of this, you simply pass along a ``faceted=True`` parameter
on the field(s) you wish to facet on. So to modify our existing example::

    from haystack import indexes


    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user', faceted=True)
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        # get_model() and any other methods stay as in the tutorial.

Haystack quietly handles all of the backend details for you, creating a
companion field with ``_exact`` appended to its name that holds the exact
version of the value. Our example would now have both an ``author`` and an
``author_exact`` field, though this is largely an implementation detail.
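
Why keep an exact copy at all? The toy sketch below, plain Python with no search engine, contrasts counting the tokens of an analyzed text field with counting whole values. The lowercase-and-split ``tokenize`` function is a simplifying assumption (real backends apply their own analyzers), and the author names are made up.

```python
from collections import Counter

# Made-up documents with multi-word author values.
docs = [
    {"author": "Ann Lee"},
    {"author": "Ann Moore"},
    {"author": "Bob Lee"},
]


def tokenize(value):
    """Naive stand-in for an analyzer: lowercase, then split on whitespace."""
    return value.lower().split()


def facet_tokenized(docs, field):
    """Count documents per token, as faceting an analyzed text field would."""
    counts = Counter()
    for doc in docs:
        # set() so a token repeated within one value counts that doc once.
        counts.update(set(tokenize(doc[field])))
    return counts


def facet_exact(docs, field):
    """Count documents per whole value, as faceting the exact copy would."""
    return Counter(doc[field] for doc in docs)


print(sorted(facet_tokenized(docs, "author").items()))
# [('ann', 2), ('bob', 1), ('lee', 2), ('moore', 1)]
print(sorted(facet_exact(docs, "author").items()))
# [('Ann Lee', 1), ('Ann Moore', 1), ('Bob Lee', 1)]
```

Faceting on the tokenized values would report two documents for ``lee`` even though no author is called just "lee", which is the problem the separate exact field avoids.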

To pull faceting information out of the index, we'll use the
``SearchQuerySet.facet`` method to set up the facet and the
``SearchQuerySet.facet_counts`` method to retrieve the counts seen.

Experimenting in a shell (``./manage.py shell``) is a good way to get a feel
for what various facets might look like::

    >>> from haystack.query import SearchQuerySet
    >>> sqs = SearchQuerySet().facet('author')
    >>> sqs.facet_counts()
    {
        'dates': {},
        'fields': {
            'author': [
                ('john', 4),
                ('daniel', 2),
                ('sally', 1),
                ('terry', 1),
            ],
        },
        'queries': {}
    }

.. note::

    Despite the duplication of fields, you should provide the regular name of
    the field when faceting. Haystack will intelligently handle the underlying
    details and mapping.

As you can see, we get back a dictionary which provides access to the three
types of facets available: ``fields``, ``dates`` and ``queries``. Since we only
faceted on the ``author`` field (which actually facets on the ``author_exact``
field managed by Haystack), only the ``fields`` key has any data
associated with it. In this case, we have a corpus of eight documents with four
unique authors.
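
Because ``facet_counts()`` returns plain dictionaries and lists of ``(value, count)`` tuples, ordinary Python is enough to work with the result. The sketch below copies the dictionary from the shell session above into a literal so it runs without a search backend; ``field_facet`` is a hypothetical helper, not part of Haystack.

```python
# The dictionary returned by ``facet_counts()`` in the shell session above,
# copied as a literal so this sketch runs without a search backend.
counts = {
    'dates': {},
    'fields': {
        'author': [('john', 4), ('daniel', 2), ('sally', 1), ('terry', 1)],
    },
    'queries': {},
}


def field_facet(counts, field):
    """Return the (value, count) pairs for ``field``; [] if it was not faceted."""
    return counts.get('fields', {}).get(field, [])


for value, count in field_facet(counts, 'author'):
    print(f'{value}: {count}')

# Each document has exactly one author, so the counts add up to the corpus size.
total = sum(count for _, count in field_facet(counts, 'author'))
print(total)                            # 8
print(field_facet(counts, 'pub_date'))  # []
```

Using ``.get()`` with defaults keeps display code from failing when a field was not faceted for a particular query.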

.. note::

    Facets are chainable, like most ``SearchQuerySet`` methods. However, unlike
    most ``SearchQuerySet`` methods, they are *NOT* affected by ``filter`` or
    similar methods. The only method that has any effect on facets is the
    ``narrow`` method (which is how you provide drill-down).

Configuring facet behaviour
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can configure the behaviour of each facet by passing options as keyword
arguments to ``SearchQuerySet.facet``. These options can be backend specific.

**limit**

*tested on Solr*

The ``limit`` parameter limits the number of facet values returned for each
faceted field. On Solr, the default `facet.limit`_ is 100 and a negative number
removes the limit.

.. _facet.limit: https://wiki.apache.org/solr/SimpleFacetParameters#facet.limit

Example usage::

    >>> from haystack.query import SearchQuerySet
    >>> sqs = SearchQuerySet().facet('author', limit=-1)
    >>> sqs.facet_counts()
    {
        'dates': {},
        'fields': {
            'author': [
                ('abraham', 1),
                ('benny', 2),
                ('cindy', 1),
                ('diana', 5),
            ],
        },
        'queries': {}
    }

    >>> sqs = SearchQuerySet().facet('author', limit=2)
    >>> sqs.facet_counts()
    {
        'dates': {},
        'fields': {
            'author': [
                ('abraham', 1),
                ('benny', 2),
            ],
        },
        'queries': {}
    }
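
The ``limit`` semantics can be mimicked in a few lines of Python. The ``apply_limit`` helper below is hypothetical and only models the behaviour described above on an already-ordered list of ``(value, count)`` pairs; Solr applies the limit itself, after sorting.

```python
def apply_limit(pairs, limit=100):
    """Model Solr's facet.limit on (value, count) pairs that are already sorted.

    A negative limit returns every pair; otherwise only the first ``limit``.
    The default of 100 mirrors Solr's documented default.
    """
    pairs = list(pairs)
    if limit < 0:
        return pairs
    return pairs[:limit]


authors = [('abraham', 1), ('benny', 2), ('cindy', 1), ('diana', 5)]
print(apply_limit(authors, -1))  # all four pairs
print(apply_limit(authors, 2))   # [('abraham', 1), ('benny', 2)]
```

Which pairs survive a positive limit depends entirely on the ordering, which is what the ``sort`` option controls.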

**sort**

*tested on Solr*

The ``sort`` parameter controls the order of the facet values returned for each
faceted field. Setting it to ``index`` sorts the facet values alphabetically
(in index order), while ``count`` sorts them by the number of results for each
facet value, highest first. Solr's default `facet.sort`_ is ``count`` when
``facet.limit`` is greater than 0 and ``index`` otherwise.

.. _facet.sort: https://wiki.apache.org/solr/SimpleFacetParameters#facet.sort

Example usage::

    >>> from haystack.query import SearchQuerySet
    >>> sqs = SearchQuerySet().facet('author', sort='index')
    >>> sqs.facet_counts()
    {
        'dates': {},
        'fields': {
            'author': [
                ('abraham', 1),
                ('benny', 2),
                ('cindy', 1),
                ('diana', 5),
            ],
        },
        'queries': {}
    }

    >>> sqs = SearchQuerySet().facet('author', sort='count')
    >>> sqs.facet_counts()
    {
        'dates': {},
        'fields': {
            'author': [
                ('diana', 5),
                ('benny', 2),
                ('abraham', 1),
                ('cindy', 1),
            ],
        },
        'queries': {}
    }
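
The two ``sort`` options can likewise be modelled in plain Python. ``sort_facets`` is a hypothetical helper; breaking ties in ``count`` order by value is an assumption of this sketch, chosen because it reproduces the example output above.

```python
def sort_facets(pairs, sort):
    """Model Solr's facet.sort options on a list of (value, count) pairs.

    'index' orders by value; 'count' orders by descending count and, as an
    assumption of this sketch, breaks ties by value.
    """
    if sort == 'index':
        return sorted(pairs)
    if sort == 'count':
        return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))
    raise ValueError(f'unknown sort option: {sort!r}')


authors = [('cindy', 1), ('diana', 5), ('abraham', 1), ('benny', 2)]
print(sort_facets(authors, 'index'))
# [('abraham', 1), ('benny', 2), ('cindy', 1), ('diana', 5)]
print(sort_facets(authors, 'count'))
# [('diana', 5), ('benny', 2), ('abraham', 1), ('cindy', 1)]
```

Sorting by ``count`` first and then applying a small ``limit`` is the usual way to show only the most popular values.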

Now that we have the facet we want, it's time to implement it.

2. Switch to the ``FacetedSearchView`` and ``FacetedSearchForm``
----------------------------------------------------------------

There are three things that we'll need to do to expose facets to our frontend.
The first is to construct the ``SearchQuerySet`` we want to use. We should have
that from the previous step. The second is to switch to the
``FacetedSearchView``. This view is useful because it prepares the facet counts
and provides them in the context as ``facets``.

Optionally, the third step is to switch to the ``FacetedSearchForm``. As it
currently stands, this is only useful if you want to provide drill-down, though
it may provide more functionality in the future. We'll do it for the sake of
having it in place, but it is not required.

In your URLconf, you'll need to switch to the ``FacetedSearchView``. Your
URLconf should resemble::

    from django.urls import path

    from haystack.forms import FacetedSearchForm
    from haystack.query import SearchQuerySet
    from haystack.views import FacetedSearchView, search_view_factory

    # Facet on ``author`` for every search served by this view.
    sqs = SearchQuerySet().facet('author')

    urlpatterns = [
        # search_view_factory creates a new view instance for every request,
        # because the legacy view classes keep per-request state on ``self``.
        path('', search_view_factory(
            view_class=FacetedSearchView,
            form_class=FacetedSearchForm,
            searchqueryset=sqs,
        ), name='haystack_search'),
    ]

The ``FacetedSearchView`` will now instantiate the ``FacetedSearchForm`` and use
the ``SearchQuerySet`` we provided. Now, a ``facets`` variable will be present
in the context. This is added in an overridden ``extra_context`` method.

3. Display The Facets In The Template
-------------------------------------

Templating facets involves simply adding an extra bit of processing to display
the facets (and optionally to link to provide drill-down). An example template
might look like this::

    <form method="get" action=".">
        <table>
            <tbody>
                {{ form.as_table }}
                <tr>
                    <td>&nbsp;</td>
                    <td><input type="submit" value="Search"></td>
                </tr>
            </tbody>
        </table>
    </form>

    {% if query %}
        <!-- Begin faceting. -->
        <h2>By Author</h2>

        <div>
            <dl>
                {% if facets.fields.author %}
                    <dt>Author</dt>
                    {# Provide only the top 5 authors #}
                    {% for author in facets.fields.author|slice:":5" %}
                        <dd><a href="{{ request.get_full_path }}&selected_facets=author_exact:{{ author.0|urlencode }}">{{ author.0 }}</a> ({{ author.1 }})</dd>
                    {% endfor %}
                {% else %}
                    <p>No author facets.</p>
                {% endif %}
            </dl>
        </div>
        <!-- End faceting -->

        <!-- Display results... -->
        {% for result in page.object_list %}
            <div class="search_result">
                <h3><a href="{{ result.object.get_absolute_url }}">{{ result.object.title }}</a></h3>

                <p>{{ result.object.body|truncatewords:80 }}</p>
            </div>
        {% empty %}
            <p>Sorry, no results found.</p>
        {% endfor %}
    {% endif %}

Displaying the facets is a matter of looping through the facets you want and
providing the UI to suit. Each facet is a ``(value, count)`` tuple, so
``author.0`` is the facet text from the backend and ``author.1`` is the facet
count.

4. Narrowing The Search
-----------------------

We've also set ourselves up for the last bit, the drill-down aspect. By
appending ``selected_facets`` to the URLs, we're informing the
``FacetedSearchForm`` that we want to narrow our results to only those
containing the author we provided.
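
Conceptually, each ``selected_facets`` value is a ``field:value`` pair that becomes a narrowing condition. The toy model below splits on the first colon and filters an in-memory list; ``parse_selected_facet``, ``apply_selected_facets`` and the ``docs`` list are hypothetical and only illustrate the idea, not the form's actual implementation.

```python
def parse_selected_facet(raw):
    """Split 'author_exact:john' into ('author_exact', 'john').

    Only the first colon separates field from value, so values may contain colons.
    """
    field, sep, value = raw.partition(':')
    if not sep or not field:
        raise ValueError(f"expected 'field:value', got {raw!r}")
    return field, value


def apply_selected_facets(docs, selected_facets):
    """Keep only the docs matching every selected facet (in-memory model)."""
    for raw in selected_facets:
        field, value = parse_selected_facet(raw)
        docs = [doc for doc in docs if doc.get(field) == value]
    return docs


docs = [
    {'title': 'Note one', 'author_exact': 'john'},
    {'title': 'Note two', 'author_exact': 'daniel'},
    {'title': 'Note three', 'author_exact': 'john'},
]
narrowed = apply_selected_facets(docs, ['author_exact:john'])
print([doc['title'] for doc in narrowed])  # ['Note one', 'Note three']
```

Passing several ``selected_facets`` values narrows on all of them, which is how repeated drill-down accumulates conditions.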

For a concrete example, if the facets on author come back as::

    {
        'dates': {},
        'fields': {
            'author': [
                ('john', 4),
                ('daniel', 2),
                ('sally', 1),
                ('terry', 1),
            ],
        },
        'queries': {}
    }

You should present a list similar to::

    <ul>
        <li><a href="/search/?q=Haystack&selected_facets=author_exact:john">john</a> (4)</li>
        <li><a href="/search/?q=Haystack&selected_facets=author_exact:daniel">daniel</a> (2)</li>
        <li><a href="/search/?q=Haystack&selected_facets=author_exact:sally">sally</a> (1)</li>
        <li><a href="/search/?q=Haystack&selected_facets=author_exact:terry">terry</a> (1)</li>
    </ul>
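
If you build these links outside a template, the standard library's ``urllib.parse.urlencode`` takes care of escaping values. The ``facet_links`` helper below is a hypothetical sketch that reproduces the list above from the ``facet_counts()`` structure; it assumes the ``_exact`` naming convention described earlier and a ``/search/`` URL.

```python
import html
from urllib.parse import urlencode

# The facet_counts() structure from the concrete example above.
facet_counts = {
    'dates': {},
    'fields': {
        'author': [('john', 4), ('daniel', 2), ('sally', 1), ('terry', 1)],
    },
    'queries': {},
}


def facet_links(base_path, query, field, pairs):
    """Return (href, label, count) for each facet value of ``field``.

    The drill-down parameter targets the ``<field>_exact`` name, and
    ``safe=':'`` leaves the colon unescaped to match the links shown above.
    """
    links = []
    for value, count in pairs:
        params = urlencode(
            [('q', query), ('selected_facets', f'{field}_exact:{value}')],
            safe=':',
        )
        links.append((f'{base_path}?{params}', value, count))
    return links


print('<ul>')
for href, label, count in facet_links('/search/', 'Haystack', 'author',
                                      facet_counts['fields']['author']):
    # html.escape() turns "&" into "&amp;", the strictly valid form in HTML.
    print(f'    <li><a href="{html.escape(href)}">{html.escape(label)}</a> ({count})</li>')
print('</ul>')
```

Values containing spaces or other special characters are encoded by ``urlencode`` (a space becomes ``+``), so the same helper works for multi-word author names.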

.. warning::

    Haystack can automatically handle most details around faceting. However,
    since ``selected_facets`` is passed directly to narrow, it must use the
    duplicated field name (``author_exact`` rather than ``author``).
    Improvements to this are planned but incomplete.

This is simply the default behavior, but it is possible to override it or
provide your own form which does additional processing. You could also write
your own faceted ``SearchView``, which could provide additional or different
facets based on the facets chosen. There is a wide range of possibilities
available to help the user navigate your content.