django-haystack (2.4.0-2) unstable; urgency=low

  * Switch buildsystem to pybuild.
  * Add Python3 support through a separate package.
  * Add lintian override for missing upstream changelog.

# imported from the archive

This commit is contained in:
commit 48a63b7514
@ -0,0 +1,106 @@
Primary Authors:

* Daniel Lindsley
* Matt Croydon (some documentation, sanity checks and the sweet name)
* Travis Cline (the original SQ implementation, improvements to ModelSearchIndex)
* David Sauve (notanumber) for the Xapian backend, the simple backend and various patches.
* Jannis Leidel (jezdez)
* Chris Adams (acdha)
* Justin Caratzas (bigjust)
* Andrew Schoen (andrewschoen)
* Dan Watson (dcwatson)
* Matt Woodward (mpwoodward)
* Alex Vidal (avidal)
* Zach Smith (zmsmith)
* Stefan Wehrmeyer (stefanw)
* George Hickman (ghickman)
* Ben Spaulding (benspaulding)


Thanks to

* Jacob Kaplan-Moss & Joseph Kocherhans for the original implementation of
  djangosearch, of which portions were used, as well as basic API feedback.
* Christian Metts for designing the logo and building a better site.
* Nathan Borror for testing and advanced form usage.
* Malcolm Tredinnick for API feedback.
* Mediaphormedia for funding the development on More Like This and faceting.
* Travis Cline for API feedback, Git help and improvements to the reindex command.
* Brian Rosner for various patches.
* Richard Boulton for feedback and suggestions.
* Cyberdelia for feedback and patches.
* Ask Solem for patching the setup.py.
* Ben Spaulding for feedback and documentation patches.
* smulloni for various patches.
* JoeGermuska for various patches.
* SmileyChris for various patches.
* sk1p for various patches.
* Ryszard Szopa (ryszard) for various patches.
* Patryk Zawadzki (patrys) for various patches and feedback.
* Frank Wiles for documentation patches.
* Chris Adams (acdha) for various patches.
* Kyle MacFarlane for various patches.
* Alex Gaynor (alex) for help with handling deferred models with More Like This.
* RobertGawron for a patch to the Highlighter.
* Simon Willison (simonw) for various proposals and patches.
* Ben Firshman (bfirsh) for faceting improvements and suggestions.
* Peter Bengtsson for a patch regarding passing a customized site.
* Sam Bull (osirius) for a patch regarding initial data on SearchForms.
* slai for a patch regarding Whoosh and fetching all documents of a certain model type.
* alanwj for a patch regarding Whoosh and empty MultiValueFields.
* alanzoppa for a patch regarding highlighting.
* piquadrat for a patch regarding the more_like_this template tag.
* dedsm for a patch regarding the pickling of SearchResult objects.
* EmilStenstrom for a patch to the Highlighter.
* symroe for a patch regarding the more_like_this template tag.
* ghostrocket for a patch regarding the simple backend.
* Rob Hudson (robhudson) for improvements to the admin search.
* apollo13 for simplifying ``SearchForm.__init__``.
* Carl Meyer (carljm) for a patch regarding character primary keys.
* oyiptong for a patch regarding pickling.
* alfredo for a patch to generate epub docs.
* Luke Hatcher (lukeman) for documentation patches.
* Trey Hunner (treyhunner) for a Whoosh field boosting patch.
* Kent Gormat of Retail Catalyst for funding the development of multiple index support.
* Gidsy for funding the initial geospatial implementation.
* CMGdigital for funding the development on:

    * a multiprocessing-enabled version of ``update_index``.
    * the addition of ``--start/--end`` options in ``update_index``.
    * the ability to specify both apps & models to ``update_index``.
    * a significant portion of the geospatial feature.
    * a significant portion of the input types feature.

* Aram Dulyan (Aramgutang) for fixing the included admin class to be Django 1.4 compatible.
* Honza Kral (HonzaKral) for various Elasticsearch tweaks & testing.
* Alex Vidal (avidal) for a patch allowing developers to override the queryset used for update operations.
* Igor Támara (ikks) for a patch related to Unicode ``verbose_name_plural``.
* Dan Helfman (witten) for a patch related to highlighting.
* Matt DeBoard for a refactor of the ``SolrSearchBackend.search`` method to allow simpler extension of the class.
* Rodrigo Guzman (rz) for a fix to query handling in the ``simple`` backend.
* Martin J. Laubach (mjl) for fixing the logic used when combining querysets.
* Eric Holscher (ericholscher) for a docs fix.
* Erik Rose (erikrose) for a quick pyelasticsearch-compatibility patch.
* Stefan Wehrmeyer (stefanw) for a simple search filter fix.
* Dan Watson (dcwatson) for various patches.
* Andrew Schoen (andrewschoen) for the addition of ``HAYSTACK_IDENTIFIER_METHOD``.
* Pablo SEMINARIO (pabluk) for a docs fix and a fix in the ElasticSearch backend.
* Eric Thurgood (ethurgood) for an import fix in the Elasticsearch backend.
* Revolution Systems & The Python Software Foundation for funding a significant portion of the port to Python 3!
* Artem Kostiuk (postatum) for a patch allowing searches for the slash character in ElasticSearch since Lucene 4.0.
* Luis Barrueco (luisbarrueco) for a simple fix regarding updating indexes using multiple backends.
* Szymon Teżewski (jasisz) for an update to the bounding-box calculation for spatial queries.
* Chris Wilson (qris) and Orlando Fiol (overflow) for an update allowing the use of multiple order_by()
  fields with Whoosh, as long as they share a consistent sort direction.
* Steven Skoczen (@skoczen) for an ElasticSearch bug fix.
* @Xaroth for updating the app loader to be compatible with Django 1.7.
* Jaroslav Gorjatsev (jarig) for a bugfix with index_fieldname.
* Dirk Eschler (@deschler) for app loader Django 1.7 compatibility fixes.
* Wictor (wicol) for a patch improving the error message given when model_attr references a non-existent
  field.
* Pierre Dulac (dulaccc) for a patch updating distance filters for ElasticSearch 1.x.
* Andrei Fokau (andreif) for adding support for ``SQ`` in ``SearchQuerySet.narrow()``.
* Phill Tornroth (phill-tornroth) for several patches improving UnifiedIndex and ElasticSearch support.
* Philippe Luickx (philippeluickx) for documenting how to provide backend-specific facet options.
* Felipe Prenholato (@chronossc) for a patch making it easy to exclude documents from indexing using custom logic.
* Alfredo Armanini (@phingage) for a patch fixing compatibility with database API changes in Django 1.8.
* Ben Spaulding (@benspaulding) for many updates for Django 1.8 support.
* Troy Grosfield (@troygrosfield) for fixing the test runner for Django 1.8.
* Ilan Steemers (@Koed00) for fixing Django 1.9 deprecation warnings.
@ -0,0 +1,31 @@
Copyright (c) 2009-2013, Daniel Lindsley.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

3. Neither the name of Haystack nor the names of its contributors may be used
   to endorse or promote products derived from this software without
   specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

---

Prior to April 17, 2009, this software was released under the MIT license.
@ -0,0 +1,5 @@
recursive-include docs *
recursive-include haystack/templates *.xml *.html
include AUTHORS
include LICENSE
include README.rst
@ -0,0 +1,79 @@
Metadata-Version: 1.1
Name: django-haystack
Version: 2.4.0
Summary: Pluggable search for Django.
Home-page: http://haystacksearch.org/
Author: Daniel Lindsley
Author-email: daniel@toastdriven.com
License: UNKNOWN
Description: ========
        Haystack
        ========

        :author: Daniel Lindsley
        :date: 2013/07/28

        Haystack provides modular search for Django. It features a unified, familiar
        API that allows you to plug in different search backends (such as Solr_,
        Elasticsearch_, Whoosh_, Xapian_, etc.) without having to modify your code.

        .. _Solr: http://lucene.apache.org/solr/
        .. _Elasticsearch: http://elasticsearch.org/
        .. _Whoosh: https://bitbucket.org/mchaput/whoosh/
        .. _Xapian: http://xapian.org/

        Haystack is BSD licensed, plays nicely with third-party apps without needing to
        modify the source, and supports advanced features like faceting, More Like This,
        highlighting, spatial search and spelling suggestions.

        You can find more information at http://haystacksearch.org/.

        Getting Help
        ============

        There is a mailing list (http://groups.google.com/group/django-haystack/)
        available for general discussion and an IRC channel (#haystack on
        irc.freenode.net).

        Documentation
        =============

        * Development version: http://docs.haystacksearch.org/
        * v2.3.X: http://django-haystack.readthedocs.org/en/v2.3.0/
        * v2.2.X: http://django-haystack.readthedocs.org/en/v2.2.0/
        * v2.1.X: http://django-haystack.readthedocs.org/en/v2.1.0/
        * v2.0.X: http://django-haystack.readthedocs.org/en/v2.0.0/
        * v1.2.X: http://django-haystack.readthedocs.org/en/v1.2.7/
        * v1.1.X: http://django-haystack.readthedocs.org/en/v1.1/

        Build Status
        ============

        .. image:: https://travis-ci.org/django-haystack/django-haystack.svg?branch=master
           :target: https://travis-ci.org/django-haystack/django-haystack

        Requirements
        ============

        Haystack has a relatively easily-met set of requirements.

        * Python 2.7+ or Python 3.3+
        * Django 1.6+

        Additionally, each backend has its own requirements. You should refer to
        http://django-haystack.readthedocs.org/en/latest/installing_search_engines.html for more
        details.

Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Web Environment
Classifier: Framework :: Django
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Utilities
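The PKG-INFO metadata above is stored as RFC 822-style key/value headers, so it can be read with the standard library's email parser. A minimal illustrative sketch (the sample string below is abridged from the metadata above, not the full file):

```python
from email.parser import HeaderParser

# Abridged copy of the PKG-INFO headers above; PKG-INFO uses
# RFC 822-style "Key: value" lines, which the email parser handles,
# including repeated keys such as Classifier.
PKG_INFO = """\
Metadata-Version: 1.1
Name: django-haystack
Version: 2.4.0
Summary: Pluggable search for Django.
Classifier: Framework :: Django
Classifier: Topic :: Utilities
"""

msg = HeaderParser().parsestr(PKG_INFO)
print(msg["Name"], msg["Version"])  # -> django-haystack 2.4.0
print(msg.get_all("Classifier"))    # -> ['Framework :: Django', 'Topic :: Utilities']
```

``get_all`` is needed for multi-valued fields like ``Classifier``; plain indexing returns only the first occurrence.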
@ -0,0 +1,59 @@
========
Haystack
========

:author: Daniel Lindsley
:date: 2013/07/28

Haystack provides modular search for Django. It features a unified, familiar
API that allows you to plug in different search backends (such as Solr_,
Elasticsearch_, Whoosh_, Xapian_, etc.) without having to modify your code.

.. _Solr: http://lucene.apache.org/solr/
.. _Elasticsearch: http://elasticsearch.org/
.. _Whoosh: https://bitbucket.org/mchaput/whoosh/
.. _Xapian: http://xapian.org/

Haystack is BSD licensed, plays nicely with third-party apps without needing to
modify the source, and supports advanced features like faceting, More Like This,
highlighting, spatial search and spelling suggestions.

You can find more information at http://haystacksearch.org/.

Getting Help
============

There is a mailing list (http://groups.google.com/group/django-haystack/)
available for general discussion and an IRC channel (#haystack on
irc.freenode.net).

Documentation
=============

* Development version: http://docs.haystacksearch.org/
* v2.3.X: http://django-haystack.readthedocs.org/en/v2.3.0/
* v2.2.X: http://django-haystack.readthedocs.org/en/v2.2.0/
* v2.1.X: http://django-haystack.readthedocs.org/en/v2.1.0/
* v2.0.X: http://django-haystack.readthedocs.org/en/v2.0.0/
* v1.2.X: http://django-haystack.readthedocs.org/en/v1.2.7/
* v1.1.X: http://django-haystack.readthedocs.org/en/v1.1/

Build Status
============

.. image:: https://travis-ci.org/django-haystack/django-haystack.svg?branch=master
   :target: https://travis-ci.org/django-haystack/django-haystack

Requirements
============

Haystack has a relatively easily-met set of requirements.

* Python 2.7+ or Python 3.3+
* Django 1.6+

Additionally, each backend has its own requirements. You should refer to
http://django-haystack.readthedocs.org/en/latest/installing_search_engines.html for more
details.
@ -0,0 +1,44 @@
django-haystack (2.4.0-2) unstable; urgency=low

  * Switch buildsystem to pybuild.
  * Add Python3 support through a separate package.
  * Add lintian override for missing upstream changelog.

 -- Michael Fladischer <fladi@debian.org>  Tue, 07 Jul 2015 22:12:20 +0200

django-haystack (2.4.0-1) unstable; urgency=low

  * New upstream release.
  * Remove files from d/copyright which are no longer shipped by
    upstream.
  * Use pypi.debian.net service for uscan.
  * Change my email address to fladi@debian.org.

 -- Michael Fladischer <fladi@debian.org>  Tue, 07 Jul 2015 16:18:03 +0200

django-haystack (2.3.1-1) unstable; urgency=medium

  * New upstream release (Closes: #755599).
  * Bump Standards-Version to 3.9.6.
  * Disable tests as they require a live SOLR and elasticsearch server.
  * Change file names for solr configuration files in d/copyright.
  * Make pysolr require at least version 3.2.0.
  * Add python-elasticsearch to Suggests.
  * Drop packages required by tests from Build-Depends:
    + python-django
    + python-httplib2
    + python-mock
    + python-pysolr
    + python-whoosh
  * Drop python-xapian from Suggests as the xapian backend is not
    included.
  * Add django_haystack.egg-info/requires.txt to d/clean.
  * Remove empty lines at EOF for d/clean and d/rules.

 -- Michael Fladischer <FladischerMichael@fladi.at>  Mon, 20 Oct 2014 14:18:24 +0200

django-haystack (2.1.0-1) unstable; urgency=low

  * Initial release (Closes: #563311).

 -- Michael Fladischer <FladischerMichael@fladi.at>  Thu, 13 Mar 2014 19:11:15 +0100
@ -0,0 +1,6 @@
django_haystack.egg-info/PKG-INFO
django_haystack.egg-info/SOURCES.txt
django_haystack.egg-info/dependency_links.txt
django_haystack.egg-info/not-zip-safe
django_haystack.egg-info/top_level.txt
django_haystack.egg-info/requires.txt
@ -0,0 +1 @@
9
@ -0,0 +1,71 @@
Source: django-haystack
Section: python
Priority: optional
Maintainer: Debian Python Modules Team <python-modules-team@lists.alioth.debian.org>
Uploaders: Michael Fladischer <fladi@debian.org>
Build-Depends: debhelper (>= 9),
               dh-python,
               python-all,
               python-setuptools,
               python-sphinx (>= 1.0.7+dfsg),
               python3-all,
               python3-setuptools
Standards-Version: 3.9.6
X-Python-Version: >= 2.6
X-Python3-Version: >= 3.3
Homepage: https://github.com/toastdriven/django-haystack
Vcs-Svn: svn://anonscm.debian.org/python-modules/packages/django-haystack/trunk/
Vcs-Browser: http://anonscm.debian.org/viewvc/python-modules/packages/django-haystack/trunk/

Package: python-django-haystack
Architecture: all
Depends: python-django (>= 1.5),
         ${misc:Depends},
         ${python:Depends}
Suggests: python-elasticsearch,
          python-httplib2,
          python-pysolr (>= 3.2.0),
          python-whoosh
Description: modular search for Django
 Haystack provides modular search for Django. It features a unified, familiar
 API that allows you to plug in different search backends (such as Solr,
 Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code.
 .
 It plays nicely with third-party apps without needing to modify the source and
 supports advanced features like faceting, More Like This, highlighting, spatial
 search and spelling suggestions.

Package: python3-django-haystack
Architecture: all
Depends: python3-django,
         ${misc:Depends},
         ${python3:Depends}
Suggests: python3-elasticsearch,
          python3-httplib2,
          python3-whoosh
Description: modular search for Django (Python3 version)
 Haystack provides modular search for Django. It features a unified, familiar
 API that allows you to plug in different search backends (such as Solr,
 Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code.
 .
 It plays nicely with third-party apps without needing to modify the source and
 supports advanced features like faceting, More Like This, highlighting, spatial
 search and spelling suggestions.
 .
 This package contains the Python 3 version of the library.

Package: python-django-haystack-doc
Section: doc
Architecture: all
Depends: ${misc:Depends},
         ${sphinxdoc:Depends}
Description: modular search for Django (Documentation)
 Haystack provides modular search for Django. It features a unified, familiar
 API that allows you to plug in different search backends (such as Solr,
 Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code.
 .
 It plays nicely with third-party apps without needing to modify the source and
 supports advanced features like faceting, More Like This, highlighting, spatial
 search and spelling suggestions.
 .
 This package contains the documentation.
@ -0,0 +1,55 @@
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: django-haystack
Upstream-Contact: Daniel Lindsley <daniel@toastdriven.com>
Source: https://github.com/toastdriven/django-haystack

Files: *
Copyright: 2009-2013, Daniel Lindsley <daniel@toastdriven.com>
License: BSD-3-clause

Files: haystack/templates/search_configuration/solr.xml
Copyright: Apache Software Foundation
License: Apache

Files: debian/*
Copyright: 2013, Fladischer Michael <fladi@debian.org>
License: BSD-3-clause

License: BSD-3-clause
 Redistribution and use in source and binary forms, with or without modification,
 are permitted provided that the following conditions are met:
 .
 1. Redistributions of source code must retain the above copyright notice,
    this list of conditions and the following disclaimer.
 .
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
 .
 3. Neither the name of Haystack nor the names of its contributors may be used
    to endorse or promote products derived from this software without
    specific prior written permission.
 .
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
 ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
 ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

License: Apache
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
 .
 http://www.apache.org/licenses/LICENSE-2.0
 .
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
@ -0,0 +1,8 @@
Document: python-django-haystack-doc
Title: Django Haystack Documentation
Author: Daniel Lindsley <daniel@toastdriven.com>
Section: Programming/Python

Format: HTML
Index: /usr/share/doc/python-django-haystack-doc/html/index.html
Files: /usr/share/doc/python-django-haystack-doc/html/*.html
@ -0,0 +1 @@
docs/_build/html
@ -0,0 +1,2 @@
# Upstream does not provide a changelog.
python-django-haystack-doc: no-upstream-changelog
@ -0,0 +1 @@
README.rst
@ -0,0 +1,2 @@
# Upstream does not provide a changelog.
python-django-haystack: no-upstream-changelog
@ -0,0 +1 @@
README.rst
@ -0,0 +1,2 @@
# Upstream does not provide a changelog.
python3-django-haystack: no-upstream-changelog
@ -0,0 +1,15 @@
#!/usr/bin/make -f

export PYBUILD_NAME=django-haystack
export PYBUILD_DISABLE=test

%:
	dh $@ --with python2,python3,sphinxdoc --buildsystem=pybuild

override_dh_auto_build:
	PYTHONPATH=. sphinx-build -b html -d docs/_build/.doctrees -N docs docs/_build/html
	dh_auto_build

override_dh_clean:
	rm -rf docs/_build
	dh_clean
@ -0,0 +1 @@
3.0 (quilt)
@ -0,0 +1,2 @@
# Upstream does not provide PGP signatures for their release tarballs.
django-haystack source: debian-watch-may-check-gpg-signature
@ -0,0 +1,3 @@
version=3
opts=uversionmangle=s/(rc|a|b|c)/~$1/ \
http://pypi.debian.net/django-haystack/django-haystack-(.+)\.(?:zip|tgz|tbz|txz|(?:tar\.(?:gz|bz2|xz)))
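The watch file's ``uversionmangle`` rule is a Perl-style substitution that prefixes the first pre-release tag with ``~``, so dpkg sorts e.g. ``2.4.0~rc1`` before ``2.4.0``, and its final line is the regex uscan uses to match upstream tarball names. A minimal Python sketch of both rules (illustrative only; the real mangling and matching is done by uscan in Perl):

```python
import re

def uversionmangle(version):
    """Mimic the watch file's s/(rc|a|b|c)/~$1/: prefix the first
    pre-release marker with '~' so the Debian version sorts before
    the final release. count=1 matches Perl's non-global s///."""
    return re.sub(r"(rc|a|b|c)", r"~\1", version, count=1)

# The download pattern from the watch file; group 1 captures the
# upstream version from the tarball filename.
TARBALL = re.compile(
    r"django-haystack-(.+)\.(?:zip|tgz|tbz|txz|(?:tar\.(?:gz|bz2|xz)))$"
)

print(uversionmangle("2.4.0rc1"))  # -> 2.4.0~rc1
m = TARBALL.match("django-haystack-2.4.0.tar.gz")
print(m.group(1))                  # -> 2.4.0
```

In Debian version ordering, ``~`` sorts before the empty string, which is why mangling ``2.4.0rc1`` into ``2.4.0~rc1`` makes the release candidate older than ``2.4.0``.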
@ -0,0 +1,79 @@
Metadata-Version: 1.1
Name: django-haystack
Version: 2.4.0
Summary: Pluggable search for Django.
Home-page: http://haystacksearch.org/
Author: Daniel Lindsley
Author-email: daniel@toastdriven.com
License: UNKNOWN
Description: ========
        Haystack
        ========

        :author: Daniel Lindsley
        :date: 2013/07/28

        Haystack provides modular search for Django. It features a unified, familiar
        API that allows you to plug in different search backends (such as Solr_,
        Elasticsearch_, Whoosh_, Xapian_, etc.) without having to modify your code.

        .. _Solr: http://lucene.apache.org/solr/
        .. _Elasticsearch: http://elasticsearch.org/
        .. _Whoosh: https://bitbucket.org/mchaput/whoosh/
        .. _Xapian: http://xapian.org/

        Haystack is BSD licensed, plays nicely with third-party apps without needing to
        modify the source, and supports advanced features like faceting, More Like This,
        highlighting, spatial search and spelling suggestions.

        You can find more information at http://haystacksearch.org/.

        Getting Help
        ============

        There is a mailing list (http://groups.google.com/group/django-haystack/)
        available for general discussion and an IRC channel (#haystack on
        irc.freenode.net).

        Documentation
        =============

        * Development version: http://docs.haystacksearch.org/
        * v2.3.X: http://django-haystack.readthedocs.org/en/v2.3.0/
        * v2.2.X: http://django-haystack.readthedocs.org/en/v2.2.0/
        * v2.1.X: http://django-haystack.readthedocs.org/en/v2.1.0/
        * v2.0.X: http://django-haystack.readthedocs.org/en/v2.0.0/
        * v1.2.X: http://django-haystack.readthedocs.org/en/v1.2.7/
        * v1.1.X: http://django-haystack.readthedocs.org/en/v1.1/

        Build Status
        ============

        .. image:: https://travis-ci.org/django-haystack/django-haystack.svg?branch=master
           :target: https://travis-ci.org/django-haystack/django-haystack

        Requirements
        ============

        Haystack has a relatively easily-met set of requirements.

        * Python 2.7+ or Python 3.3+
        * Django 1.6+

        Additionally, each backend has its own requirements. You should refer to
        http://django-haystack.readthedocs.org/en/latest/installing_search_engines.html for more
        details.

Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Web Environment
Classifier: Framework :: Django
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Utilities
@ -0,0 +1,99 @@
AUTHORS
LICENSE
MANIFEST.in
README.rst
setup.cfg
setup.py
django_haystack.egg-info/PKG-INFO
django_haystack.egg-info/SOURCES.txt
django_haystack.egg-info/dependency_links.txt
django_haystack.egg-info/not-zip-safe
django_haystack.egg-info/pbr.json
django_haystack.egg-info/requires.txt
django_haystack.egg-info/top_level.txt
docs/Makefile
docs/admin.rst
docs/architecture_overview.rst
docs/autocomplete.rst
docs/backend_support.rst
docs/best_practices.rst
docs/boost.rst
docs/conf.py
docs/contributing.rst
docs/creating_new_backends.rst
docs/debugging.rst
docs/faceting.rst
docs/faq.rst
docs/glossary.rst
docs/highlighting.rst
docs/index.rst
docs/inputtypes.rst
docs/installing_search_engines.rst
docs/management_commands.rst
docs/migration_from_1_to_2.rst
docs/multiple_index.rst
docs/other_apps.rst
docs/python3.rst
docs/rich_content_extraction.rst
docs/running_tests.rst
docs/searchbackend_api.rst
docs/searchfield_api.rst
docs/searchindex_api.rst
docs/searchquery_api.rst
docs/searchqueryset_api.rst
docs/searchresult_api.rst
docs/settings.rst
docs/signal_processors.rst
docs/spatial.rst
docs/templatetags.rst
docs/toc.rst
docs/tutorial.rst
docs/utils.rst
docs/views_and_forms.rst
docs/who_uses.rst
docs/_build/.gitignore
docs/_static/.gitignore
docs/_templates/.gitignore
docs/haystack_theme/layout.html
docs/haystack_theme/theme.conf
docs/haystack_theme/static/documentation.css
haystack/__init__.py
haystack/admin.py
haystack/constants.py
haystack/exceptions.py
haystack/fields.py
haystack/forms.py
haystack/generic_views.py
haystack/indexes.py
haystack/inputs.py
haystack/manager.py
haystack/models.py
haystack/panels.py
haystack/query.py
haystack/routers.py
haystack/signals.py
haystack/urls.py
haystack/views.py
haystack/backends/__init__.py
haystack/backends/elasticsearch_backend.py
haystack/backends/simple_backend.py
haystack/backends/solr_backend.py
haystack/backends/whoosh_backend.py
haystack/management/__init__.py
haystack/management/commands/__init__.py
haystack/management/commands/build_solr_schema.py
haystack/management/commands/clear_index.py
haystack/management/commands/haystack_info.py
haystack/management/commands/rebuild_index.py
haystack/management/commands/update_index.py
haystack/templates/panels/haystack.html
haystack/templates/search_configuration/solr.xml
haystack/templatetags/__init__.py
haystack/templatetags/highlight.py
haystack/templatetags/more_like_this.py
haystack/utils/__init__.py
haystack/utils/app_loading.py
haystack/utils/geo.py
haystack/utils/highlighting.py
haystack/utils/loading.py
haystack/utils/log.py
@ -0,0 +1 @@
@ -0,0 +1 @@
@ -0,0 +1 @@
{"is_release": false, "git_version": "ebf1a5c"}
@ -0,0 +1 @@
Django
@ -0,0 +1 @@
haystack
@ -0,0 +1,80 @@
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
PAPER         =

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d _build/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html web pickle htmlhelp latex changes linkcheck

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html      to make standalone HTML files"
	@echo "  pickle    to make pickle files"
	@echo "  json      to make JSON files"
	@echo "  htmlhelp  to make HTML files and a HTML help project"
	@echo "  latex     to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  changes   to make an overview over all changed/added/deprecated items"
	@echo "  linkcheck to check all external links for integrity"

clean:
	-rm -rf _build/*

html:
	mkdir -p _build/html _build/doctrees
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) _build/html
	@echo
	@echo "Build finished. The HTML pages are in _build/html."

pickle:
	mkdir -p _build/pickle _build/doctrees
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) _build/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

web: pickle

json:
	mkdir -p _build/json _build/doctrees
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) _build/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	mkdir -p _build/htmlhelp _build/doctrees
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) _build/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in _build/htmlhelp."

latex:
	mkdir -p _build/latex _build/doctrees
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) _build/latex
	@echo
	@echo "Build finished; the LaTeX files are in _build/latex."
	@echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \
	      "run these through (pdf)latex."

changes:
	mkdir -p _build/changes _build/doctrees
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) _build/changes
	@echo
	@echo "The overview file is in _build/changes."

linkcheck:
	mkdir -p _build/linkcheck _build/doctrees
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) _build/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in _build/linkcheck/output.txt."

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) _build/epub
	@echo
	@echo "Build finished. The epub file is in _build/epub."
@ -0,0 +1,47 @@
|
|||
.. _ref-admin:
|
||||
|
||||
===================
|
||||
Django Admin Search
|
||||
===================
|
||||
|
||||
Haystack comes with a base class to support searching via Haystack in the
|
||||
Django admin. To use Haystack to search, inherit from ``haystack.admin.SearchModelAdmin``
|
||||
instead of ``django.contrib.admin.ModelAdmin``.
|
||||
|
||||
For example::
|
||||
|
||||
from django.contrib import admin
from haystack.admin import SearchModelAdmin
|
||||
from .models import MockModel
|
||||
|
||||
|
||||
class MockModelAdmin(SearchModelAdmin):
|
||||
haystack_connection = 'solr'
|
||||
date_hierarchy = 'pub_date'
|
||||
list_display = ('author', 'pub_date')
|
||||
|
||||
|
||||
admin.site.register(MockModel, MockModelAdmin)
|
||||
|
||||
You can also specify the Haystack connection used by the search with the
|
||||
``haystack_connection`` property on the model admin class. If not specified,
|
||||
the default connection will be used.
|
||||
|
||||
If you already have a base model admin class you use, there is also a mixin
|
||||
you can use instead::
|
||||
|
||||
from django.contrib import admin
|
||||
from haystack.admin import SearchModelAdminMixin
|
||||
from .models import MockModel
|
||||
|
||||
|
||||
class MyCustomModelAdmin(admin.ModelAdmin):
|
||||
pass
|
||||
|
||||
|
||||
class MockModelAdmin(SearchModelAdminMixin, MyCustomModelAdmin):
|
||||
haystack_connection = 'solr'
|
||||
date_hierarchy = 'pub_date'
|
||||
list_display = ('author', 'pub_date')
|
||||
|
||||
|
||||
admin.site.register(MockModel, MockModelAdmin)
|
|
@ -0,0 +1,66 @@
|
|||
.. _ref-architecture-overview:
|
||||
|
||||
=====================
|
||||
Architecture Overview
|
||||
=====================
|
||||
|
||||
``SearchQuerySet``
|
||||
------------------
|
||||
|
||||
One main implementation.
|
||||
|
||||
* Standard API that loosely follows ``QuerySet``
|
||||
* Handles most queries
|
||||
* Allows for custom "parsing"/building through API
|
||||
* Dispatches to ``SearchQuery`` for actual query
|
||||
* Handles automatically creating a query
|
||||
* Allows for raw queries to be passed straight to backend.
|
||||
|
||||
|
||||
``SearchQuery``
|
||||
---------------
|
||||
|
||||
Implemented per-backend.
|
||||
|
||||
* Method for building the query out of the structured data.
|
||||
* Method for cleaning a string of reserved characters used by the backend.
|
||||
|
||||
Main class provides:
|
||||
|
||||
* Methods to add filters/models/order-by/boost/limits to the search.
|
||||
* Method to perform a raw search.
|
||||
* Method to get the number of hits.
|
||||
* Method to return the results provided by the backend (likely not a full list).
|
||||
|
||||
|
||||
``SearchBackend``
|
||||
-----------------
|
||||
|
||||
Implemented per-backend.
|
||||
|
||||
* Connects to search engine
|
||||
* Method for saving new docs to index
|
||||
* Method for removing docs from index
|
||||
* Method for performing the actual query
|
||||
|
||||
|
||||
``SearchSite``
|
||||
--------------
|
||||
|
||||
One main implementation.
|
||||
|
||||
* Standard API that loosely follows ``django.contrib.admin.sites.AdminSite``
|
||||
* Handles registering/unregistering models to search on a per-site basis.
|
||||
* Provides a means of adding custom indexes to a model, like ``ModelAdmins``.
|
||||
|
||||
|
||||
``SearchIndex``
|
||||
---------------
|
||||
|
||||
Implemented per-model you wish to index.
|
||||
|
||||
* Handles generating the document to be indexed.
|
||||
* Populates additional fields to accompany the document.
|
||||
* Provides a way to limit what types of objects get indexed.
|
||||
* Provides a way to index the document(s).
|
||||
* Provides a way to remove the document(s).
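Taken together, a per-model index amounts to an interface along these lines (a schematic sketch for orientation only, not Haystack's actual class):

```python
class SearchIndexSketch:
    """Schematic of the per-model responsibilities listed above."""

    def get_model(self):
        # Which Django model this index covers.
        raise NotImplementedError

    def index_queryset(self):
        # Limit what types of objects get indexed.
        return []

    def prepare(self, obj):
        # Generate the document plus any additional accompanying fields.
        return {"text": str(obj)}

    def update_object(self, obj):
        # Index (or re-index) the document for one object.
        return self.prepare(obj)

    def remove_object(self, obj):
        # Drop the document for one object from the index.
        pass
```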
|
|
@ -0,0 +1,220 @@
|
|||
.. _ref-autocomplete:
|
||||
|
||||
============
|
||||
Autocomplete
|
||||
============
|
||||
|
||||
Autocomplete is becoming increasingly common as an add-on to search. Haystack
|
||||
makes it relatively simple to implement. There are two steps in the process,
|
||||
one to prepare the data and one to implement the actual search.
|
||||
|
||||
Step 1. Setup The Data
|
||||
======================
|
||||
|
||||
To do autocomplete effectively, the search backend uses n-grams (essentially
|
||||
a small window passed over the string). Because this alters the way your
|
||||
data needs to be stored, the best approach is to add a new field to your
|
||||
``SearchIndex`` that contains the text you want to autocomplete on.
|
||||
|
||||
You have two choices: ``NgramField`` and ``EdgeNgramField``. Though very similar,
|
||||
the choice of field is somewhat important.
|
||||
|
||||
* If you're working with standard text, ``EdgeNgramField`` tokenizes on
|
||||
whitespace. This prevents incorrect matches when parts of two different words
|
||||
are mashed together as one n-gram. **This is what most users should use.**
|
||||
* If you're working with Asian languages or want to be able to autocomplete
|
||||
across word boundaries, ``NgramField`` should be what you use.
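The difference is easy to see with a plain-Python sketch of the two windowing schemes (a simplified illustration only; the real work happens in the search engine's analyzer, and the gram sizes here are just typical defaults):

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Edge n-grams anchor at the start of the token: 'gold' -> go, gol, gold."""
    top = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, top + 1)]

def ngrams(token, min_gram=3, max_gram=15):
    """Plain n-grams slide a window over the whole token."""
    out = []
    for size in range(min_gram, min(len(token), max_gram) + 1):
        for start in range(len(token) - size + 1):
            out.append(token[start:start + size])
    return out

print(edge_ngrams("gold"))               # ['go', 'gol', 'gold']
print("old" in ngrams("goldfish"))       # True: matches mid-word with NgramField
print("old" in edge_ngrams("goldfish"))  # False: edge grams anchor at the word start
```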
|
||||
|
||||
Example (continuing from the tutorial)::
|
||||
|
||||
import datetime
|
||||
from haystack import indexes
|
||||
from myapp.models import Note
|
||||
|
||||
|
||||
class NoteIndex(indexes.SearchIndex, indexes.Indexable):
|
||||
text = indexes.CharField(document=True, use_template=True)
|
||||
author = indexes.CharField(model_attr='user')
|
||||
pub_date = indexes.DateTimeField(model_attr='pub_date')
|
||||
# We add this for autocomplete.
|
||||
content_auto = indexes.EdgeNgramField(model_attr='content')
|
||||
|
||||
def get_model(self):
|
||||
return Note
|
||||
|
||||
def index_queryset(self, using=None):
|
||||
"""Used when the entire index for model is updated."""
|
||||
return Note.objects.filter(pub_date__lte=datetime.datetime.now())
|
||||
|
||||
As with all schema changes, you'll need to rebuild/update your index after
|
||||
making this change.
|
||||
|
||||
|
||||
Step 2. Performing The Query
|
||||
============================
|
||||
|
||||
Haystack ships with a convenience method to perform most autocomplete searches.
|
||||
You simply provide a field and the query you wish to search on to the
|
||||
``SearchQuerySet.autocomplete`` method. Given the previous example, an example
|
||||
search would look like::
|
||||
|
||||
from haystack.query import SearchQuerySet
|
||||
|
||||
SearchQuerySet().autocomplete(content_auto='old')
|
||||
# Results match things like 'goldfish', 'cuckold' and 'older'.
|
||||
|
||||
The results from the ``SearchQuerySet.autocomplete`` method are full search
|
||||
results, just like any regular filter.
|
||||
|
||||
If you need more control over your results, you can use standard
|
||||
``SearchQuerySet.filter`` calls. For instance::
|
||||
|
||||
from haystack.query import SearchQuerySet
|
||||
|
||||
sqs = SearchQuerySet().filter(content_auto=request.GET.get('q', ''))
|
||||
|
||||
This can also be extended to use ``SQ`` for more complex queries (and is what's
|
||||
being done under the hood in the ``SearchQuerySet.autocomplete`` method).
|
||||
|
||||
|
||||
Example Implementation
|
||||
======================
|
||||
|
||||
The above is the low-level backend portion of how you implement autocomplete.
|
||||
To make it work in the browser, you need both a view to run the autocomplete
|
||||
and some JavaScript to fetch the results.
|
||||
|
||||
Since it comes up often, here is an example implementation of those things.
|
||||
|
||||
.. warning::
|
||||
|
||||
This code comes with no warranty. Don't ask for support on it. If you
|
||||
copy-paste it and it burns down your server room, I'm not liable for any
|
||||
of it.
|
||||
|
||||
It worked this one time on my machine in a simulated environment.
|
||||
|
||||
And yeah, semicolon-less + 2 space + comma-first. Deal with it.
|
||||
|
||||
A stripped-down view might look like::
|
||||
|
||||
# views.py
|
||||
import simplejson as json
|
||||
from django.http import HttpResponse
|
||||
from haystack.query import SearchQuerySet
|
||||
|
||||
|
||||
def autocomplete(request):
|
||||
sqs = SearchQuerySet().autocomplete(content_auto=request.GET.get('q', ''))[:5]
|
||||
suggestions = [result.title for result in sqs]
|
||||
# Make sure you return a JSON object, not a bare list.
|
||||
# Otherwise, you could be vulnerable to an XSS attack.
|
||||
the_data = json.dumps({
|
||||
'results': suggestions
|
||||
})
|
||||
return HttpResponse(the_data, content_type='application/json')
|
||||
|
||||
The template might look like::
|
||||
|
||||
<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Autocomplete Example</title>
|
||||
</head>
|
||||
<body>
|
||||
<h1>Autocomplete Example</h1>
|
||||
|
||||
<form method="post" action="/search/" class="autocomplete-me">
|
||||
<input type="text" id="id_q" name="q">
|
||||
<input type="submit" value="Search!">
|
||||
</form>
|
||||
|
||||
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
|
||||
<script type="text/javascript">
|
||||
// In a perfect world, this would be its own library file that got included
|
||||
// on the page and only the ``$(document).ready(...)`` below would be present.
|
||||
// But this is an example.
|
||||
var Autocomplete = function(options) {
|
||||
this.form_selector = options.form_selector
|
||||
this.url = options.url || '/search/autocomplete/'
|
||||
this.delay = parseInt(options.delay || 300)
|
||||
this.minimum_length = parseInt(options.minimum_length || 3)
|
||||
this.form_elem = null
|
||||
this.query_box = null
|
||||
}
|
||||
|
||||
Autocomplete.prototype.setup = function() {
|
||||
var self = this
|
||||
|
||||
this.form_elem = $(this.form_selector)
|
||||
this.query_box = this.form_elem.find('input[name=q]')
|
||||
|
||||
// Watch the input box.
|
||||
this.query_box.on('keyup', function() {
|
||||
var query = self.query_box.val()
|
||||
|
||||
if(query.length < self.minimum_length) {
|
||||
return false
|
||||
}
|
||||
|
||||
self.fetch(query)
|
||||
})
|
||||
|
||||
// On selecting a result, populate the search field.
|
||||
this.form_elem.on('click', '.ac-result', function(ev) {
|
||||
self.query_box.val($(this).text())
|
||||
$('.ac-results').remove()
|
||||
return false
|
||||
})
|
||||
}
|
||||
|
||||
Autocomplete.prototype.fetch = function(query) {
|
||||
var self = this
|
||||
|
||||
$.ajax({
|
||||
url: this.url
|
||||
, data: {
|
||||
'q': query
|
||||
}
|
||||
, success: function(data) {
|
||||
self.show_results(data)
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
Autocomplete.prototype.show_results = function(data) {
|
||||
// Remove any existing results.
|
||||
$('.ac-results').remove()
|
||||
|
||||
var results = data.results || []
|
||||
var results_wrapper = $('<div class="ac-results"></div>')
|
||||
var base_elem = $('<div class="result-wrapper"><a href="#" class="ac-result"></a></div>')
|
||||
|
||||
if(results.length > 0) {
|
||||
for(var res_offset in results) {
|
||||
var elem = base_elem.clone()
|
||||
// Don't use .html(...) here, as you open yourself to XSS.
|
||||
// Really, you should use some form of templating.
|
||||
elem.find('.ac-result').text(results[res_offset])
|
||||
results_wrapper.append(elem)
|
||||
}
|
||||
}
|
||||
else {
|
||||
var elem = base_elem.clone()
|
||||
elem.text("No results found.")
|
||||
results_wrapper.append(elem)
|
||||
}
|
||||
|
||||
this.query_box.after(results_wrapper)
|
||||
}
|
||||
|
||||
$(document).ready(function() {
|
||||
window.autocomplete = new Autocomplete({
|
||||
form_selector: '.autocomplete-me'
|
||||
})
|
||||
window.autocomplete.setup()
|
||||
})
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,127 @@
|
|||
.. _ref-backend-support:
|
||||
|
||||
===============
|
||||
Backend Support
|
||||
===============
|
||||
|
||||
|
||||
Supported Backends
|
||||
==================
|
||||
|
||||
* Solr_
|
||||
* Elasticsearch_
|
||||
* Whoosh_
|
||||
* Xapian_
|
||||
|
||||
.. _Solr: http://lucene.apache.org/solr/
|
||||
.. _Elasticsearch: http://elasticsearch.org/
|
||||
.. _Whoosh: https://bitbucket.org/mchaput/whoosh/
|
||||
.. _Xapian: http://xapian.org/
|
||||
|
||||
|
||||
Backend Capabilities
|
||||
====================
|
||||
|
||||
Solr
|
||||
----
|
||||
|
||||
**Complete & included with Haystack.**
|
||||
|
||||
* Full SearchQuerySet support
|
||||
* Automatic query building
|
||||
* "More Like This" functionality
|
||||
* Term Boosting
|
||||
* Faceting
|
||||
* Stored (non-indexed) fields
|
||||
* Highlighting
|
||||
* Spatial search
|
||||
* Requires: pysolr (2.0.13+) & Solr 3.5+
|
||||
|
||||
Elasticsearch
|
||||
-------------
|
||||
|
||||
**Complete & included with Haystack.**
|
||||
|
||||
* Full SearchQuerySet support
|
||||
* Automatic query building
|
||||
* "More Like This" functionality
|
||||
* Term Boosting
|
||||
* Faceting (up to 100 facets)
|
||||
* Stored (non-indexed) fields
|
||||
* Highlighting
|
||||
* Spatial search
|
||||
* Requires: elasticsearch-py > 1.0 & Elasticsearch 1.0+
|
||||
|
||||
Whoosh
|
||||
------
|
||||
|
||||
**Complete & included with Haystack.**
|
||||
|
||||
* Full SearchQuerySet support
|
||||
* Automatic query building
|
||||
* "More Like This" functionality
|
||||
* Term Boosting
|
||||
* Stored (non-indexed) fields
|
||||
* Highlighting
|
||||
* Requires: whoosh (2.0.0+)
|
||||
|
||||
Xapian
|
||||
------
|
||||
|
||||
**Complete & available as a third-party download.**
|
||||
|
||||
* Full SearchQuerySet support
|
||||
* Automatic query building
|
||||
* "More Like This" functionality
|
||||
* Term Boosting
|
||||
* Faceting
|
||||
* Stored (non-indexed) fields
|
||||
* Highlighting
|
||||
* Requires: Xapian 1.0.5+ & python-xapian 1.0.5+
|
||||
* Backend can be downloaded here: `xapian-haystack <http://github.com/notanumber/xapian-haystack/>`_
|
||||
|
||||
Backend Support Matrix
|
||||
======================
|
||||
|
||||
+----------------+------------------------+---------------------+----------------+------------+----------+---------------+--------------+---------+
|
||||
| Backend | SearchQuerySet Support | Auto Query Building | More Like This | Term Boost | Faceting | Stored Fields | Highlighting | Spatial |
|
||||
+================+========================+=====================+================+============+==========+===============+==============+=========+
|
||||
| Solr | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
|
||||
+----------------+------------------------+---------------------+----------------+------------+----------+---------------+--------------+---------+
|
||||
| Elasticsearch | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
|
||||
+----------------+------------------------+---------------------+----------------+------------+----------+---------------+--------------+---------+
|
||||
| Whoosh | Yes | Yes | Yes | Yes | No | Yes | Yes | No |
|
||||
+----------------+------------------------+---------------------+----------------+------------+----------+---------------+--------------+---------+
|
||||
| Xapian | Yes | Yes | Yes | Yes | Yes | Yes | Yes (plugin) | No |
|
||||
+----------------+------------------------+---------------------+----------------+------------+----------+---------------+--------------+---------+
|
||||
|
||||
|
||||
Wishlist
|
||||
========
|
||||
|
||||
The following are search backends that would be nice to have in Haystack but are
|
||||
licensed in a way that prevents them from being officially bundled. If the
|
||||
community expresses interest in any of these, there may be future development.
|
||||
|
||||
* Riak_
|
||||
* Lupyne_
|
||||
* Sphinx_
|
||||
|
||||
.. _Riak: http://www.basho.com/
|
||||
.. _Lupyne: http://code.google.com/p/lupyne/
|
||||
.. _Sphinx: http://www.sphinxsearch.com/
|
||||
|
||||
|
||||
Sphinx
|
||||
------
|
||||
|
||||
This backend is unlikely to be built. Sphinx is pretty gimpy & doesn't do
|
||||
blended search results across all models the way the other engines can.
|
||||
It also has a very limited featureset.
|
||||
|
||||
* Full SearchQuerySet support
|
||||
* Automatic query building
|
||||
* Term Boosting
|
||||
* Stored (non-indexed) fields
|
||||
* Highlighting
|
||||
* Requires: sphinxapi.py (Comes with Sphinx)
|
|
@ -0,0 +1,263 @@
|
|||
.. _ref-best-practices:
|
||||
|
||||
==============
|
||||
Best Practices
|
||||
==============
|
||||
|
||||
What follows are some general recommendations on how to improve your search.
|
||||
Some tips represent performance benefits, some provide a better search index.
|
||||
You should evaluate these options for yourself and pick the ones that will
|
||||
work best for you. Not all situations are created equal and many of these
|
||||
options could be considered mandatory in some cases and unnecessary premature
|
||||
optimizations in others. Your mileage may vary.
|
||||
|
||||
|
||||
Good Search Needs Good Content
|
||||
==============================
|
||||
|
||||
Most search engines work best when they're given corpuses with predominantly
|
||||
text (as opposed to other data like dates, numbers, etc.) in decent quantities
|
||||
(more than a couple words). This is in stark contrast to the databases most
|
||||
people are used to, which rely heavily on non-text data to create relationships
|
||||
and for ease of querying.
|
||||
|
||||
To this end, if search is important to you, you should take the time to
|
||||
carefully craft your ``SearchIndex`` subclasses to give the search engine the
|
||||
best information you can. This isn't necessarily hard but is worth the
|
||||
investment of time and thought. Assuming you've only ever used the
|
||||
``BasicSearchIndex``, in creating custom ``SearchIndex`` classes, there are
|
||||
some easy improvements that will make your search better:
|
||||
|
||||
* For your ``document=True`` field, use a well-constructed template.
|
||||
* Add fields for data you might want to be able to filter by.
|
||||
* If the model has related data, you can squash good content from those
|
||||
related models into the parent model's ``SearchIndex``.
|
||||
* Similarly, if you have heavily de-normalized models, they may be better
|
||||
represented by a single indexed model rather than many indexed models.
|
||||
|
||||
Well-Constructed Templates
|
||||
--------------------------
|
||||
|
||||
A relatively unique concept in Haystack is the use of templates associated with
|
||||
``SearchIndex`` fields. These are data templates that will never be seen by users
|
||||
and ideally contain no HTML. They are used to collect various data from the
|
||||
model and structure it as a document for the search engine to analyze and index.
|
||||
|
||||
.. note::
|
||||
|
||||
If you read nothing else, this is the single most important thing you can
|
||||
do to make search on your site better for your users. Good templates can
|
||||
make or break your search and providing the search engine with good content
|
||||
to index is critical.
|
||||
|
||||
Good templates structure the data well and incorporate as much pertinent text
|
||||
as possible. This may include additional fields such as titles, author
|
||||
information, metadata, tags/categories. Without being artificial, you want to
|
||||
construct as much context as you can. This doesn't mean you should necessarily
|
||||
include every field, but you should include fields that provide good content
|
||||
or include terms you think your users may frequently search on.
|
||||
|
||||
Unless you have very unique numbers or dates, neither of these types of data
|
||||
are a good fit within templates. They are usually better suited to other
|
||||
fields for filtering within a ``SearchQuerySet``.
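For the tutorial's ``Note`` model, a typical data template is just the model's
salient text, one piece per line, with no HTML::

    {{ object.title }}
    {{ object.user.get_full_name }}
    {{ object.body }}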
|
||||
|
||||
Additional Fields For Filtering
|
||||
-------------------------------
|
||||
|
||||
Documents by themselves are good for generating indexes of content but are
|
||||
generally poor for filtering content, for instance, by date. All search engines
|
||||
supported by Haystack provide a means to associate extra data as
|
||||
attributes/fields on a record. The database analogy would be adding extra
|
||||
columns to the table for filtering.
|
||||
|
||||
Good candidates here are date fields, number fields, de-normalized data from
|
||||
related objects, etc. You can expose these things to users in the form of a
|
||||
calendar range to specify, an author to look up or only data from a certain
|
||||
series of numbers to return.
|
||||
|
||||
You will need to plan ahead and anticipate what you might need to filter on,
|
||||
though with each field you add, you increase storage space usage. It's generally
|
||||
**NOT** recommended to include every field from a model, just ones you are
|
||||
likely to use.
|
||||
|
||||
Related Data
|
||||
------------
|
||||
|
||||
Related data is somewhat problematic to deal with, as most search engines are
|
||||
better with documents than they are with relationships. One way to approach this
|
||||
is to de-normalize a related child object or objects into the parent's document
|
||||
template. The inclusion of a foreign key's relevant data or a simple Django
|
||||
``{% for %}`` templatetag to iterate over the related objects can increase the
|
||||
salient data in your document. Be careful what you include and how you structure
|
||||
it, as this can have consequences on how well a result might rank in your
|
||||
search.
|
||||
|
||||
|
||||
Avoid Hitting The Database
|
||||
==========================
|
||||
|
||||
A very easy but effective thing you can do to drastically reduce hits on the
|
||||
database is to pre-render your search results using stored fields, then disable
|
||||
the ``load_all`` aspect of your ``SearchView``.
|
||||
|
||||
.. warning::
|
||||
|
||||
This technique may cause a substantial increase in the size of your index
|
||||
as you are basically using it as a storage mechanism.
|
||||
|
||||
To do this, you set up one or more stored fields (``indexed=False``) on your
|
||||
``SearchIndex`` classes. You should specify a template for the field, filling it
|
||||
with the data you'd want to display on your search results pages. When the model
|
||||
attached to the ``SearchIndex`` is placed in the index, this template will get
|
||||
rendered and stored in the index alongside the record.
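As a toy model of the trade-off (plain dictionaries standing in for the index record, not Haystack's API): the HTML is produced once at indexing time, and the results page later reads it back without a database query:

```python
# Indexing time: render the display HTML once and store it with the record.
def render_snippet(title, url):
    return '<h3><a href="%s">%s</a></h3>' % (url, title)

record = {
    "text": "full searchable content of the note...",    # indexed document
    "rendered": render_snippet("My Note", "/notes/1/"),  # stored, not indexed
}

# Results page: display the stored HTML; no model lookup required.
print(record["rendered"])  # <h3><a href="/notes/1/">My Note</a></h3>
```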
|
||||
|
||||
.. note::
|
||||
|
||||
The downside of this method is that the HTML for the result will be locked
|
||||
in once it is indexed. To make changes to the structure, you'd have to
|
||||
reindex all of your content. It also limits you to a single display of the
|
||||
content (though you could use multiple fields if that suits your needs).
|
||||
|
||||
The second aspect is customizing your ``SearchView`` and its templates. First,
|
||||
pass the ``load_all=False`` to your ``SearchView``, ideally in your URLconf.
|
||||
This prevents the ``SearchQuerySet`` from loading all models objects for results
|
||||
ahead of time. Then, in your template, simply display the stored content from
|
||||
your ``SearchIndex`` as the HTML result.
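For example, the URLconf part might look something like this (exact syntax
depends on your Django version)::

    from django.conf.urls import url
    from haystack.views import SearchView

    urlpatterns = [
        url(r'^search/$', SearchView(load_all=False)),
    ]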
|
||||
|
||||
.. warning::
|
||||
|
||||
To do this, you must absolutely avoid using ``{{ result.object }}`` or any
|
||||
further accesses beyond that. That call will hit the database, not only
|
||||
nullifying your work on lessening database hits, but actually making it
|
||||
worse as there will now be at least one query for each result, up from a single
|
||||
query for each type of model with ``load_all=True``.
|
||||
|
||||
|
||||
Content-Type Specific Templates
|
||||
===============================
|
||||
|
||||
Frequently, when displaying results, you'll want to customize the HTML output
|
||||
based on what model the result represents.
|
||||
|
||||
In practice, the best way to handle this is through the use of ``include``
|
||||
along with the data on the ``SearchResult``.
|
||||
|
||||
Your existing loop might look something like::
|
||||
|
||||
{% for result in page.object_list %}
|
||||
<p>
|
||||
<a href="{{ result.object.get_absolute_url }}">{{ result.object.title }}</a>
|
||||
</p>
|
||||
{% empty %}
|
||||
<p>No results found.</p>
|
||||
{% endfor %}
|
||||
|
||||
An improved version might look like::
|
||||
|
||||
{% for result in page.object_list %}
|
||||
{% if result.content_type == "blog.post" %}
|
||||
{% include "search/includes/blog/post.html" %}
|
||||
{% endif %}
|
||||
{% if result.content_type == "media.photo" %}
|
||||
{% include "search/includes/media/photo.html" %}
|
||||
{% endif %}
|
||||
{% empty %}
|
||||
<p>No results found.</p>
|
||||
{% endfor %}
|
||||
|
||||
Those include files might look like::
|
||||
|
||||
# search/includes/blog/post.html
|
||||
<div class="post_result">
|
||||
<h3><a href="{{ result.object.get_absolute_url }}">{{ result.object.title }}</a></h3>
|
||||
|
||||
<p>{{ result.object.tease }}</p>
|
||||
</div>
|
||||
|
||||
# search/includes/media/photo.html
|
||||
<div class="photo_result">
|
||||
<a href="{{ result.object.get_absolute_url }}">
|
||||
<img src="http://your.media.example.com/media/{{ result.object.photo.url }}"></a>
|
||||
<p>Taken By {{ result.object.taken_by }}</p>
|
||||
</div>
|
||||
|
||||
You can make this even better by standardizing on an includes layout, then
|
||||
writing a template tag or filter that generates the include filename. Usage
|
||||
might look something like::
|
||||
|
||||
{% for result in page.object_list %}
|
||||
{% with result|search_include as fragment %}
|
||||
{% include fragment %}
|
||||
{% endwith %}
|
||||
{% empty %}
|
||||
<p>No results found.</p>
|
||||
{% endfor %}
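The ``search_include`` filter above is not something Haystack ships; assuming ``content_type`` strings of the form ``app_label.model``, a minimal implementation could be:

```python
def search_include(content_type):
    # Map "blog.post" -> "search/includes/blog/post.html".
    app_label, model = content_type.split(".", 1)
    return "search/includes/%s/%s.html" % (app_label, model)

print(search_include("blog.post"))    # search/includes/blog/post.html
print(search_include("media.photo"))  # search/includes/media/photo.html
```

In a real project you would register this with a ``template.Library()`` in a
templatetags module so the ``{% with result|search_include ... %}`` usage works.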
|
||||
|
||||
|
||||
Real-Time Search
|
||||
================
|
||||
|
||||
If your site sees heavy search traffic and up-to-date information is very
|
||||
important, Haystack provides a way to constantly keep your index up to date.
|
||||
|
||||
You can enable the ``RealtimeSignalProcessor`` within your settings, which
|
||||
will allow Haystack to automatically update the index whenever a model is
|
||||
saved/deleted.
|
||||
|
||||
You can find more information within the :doc:`signal_processors` documentation.
|
||||
|
||||
|
||||
Use Of A Queue For A Better User Experience
|
||||
===========================================
|
||||
|
||||
When using the ``RealtimeSignalProcessor``, every time content is saved Haystack immediately tries to merge
|
||||
it into the search index. If you have a write-heavy site, this could mean your
|
||||
search engine may spend most of its time churning on constant merges. If you can
|
||||
afford a small delay between when a model is saved and when it appears in the
|
||||
search results, queuing these merges is a good idea.
|
||||
|
||||
You gain a snappier interface for users as updates go into a queue (a fast
|
||||
operation) and then typical processing continues. You also get a lower churn
|
||||
rate, as most search engines deal with batches of updates better than many
|
||||
single updates. You can also use this to distribute load, as the queue consumer
|
||||
could live on a completely separate server from your webservers, allowing you
|
||||
to tune more efficiently.
|
||||
|
||||
Implementing this is relatively simple. There are two parts, creating a new
|
||||
``QueuedSignalProcessor`` class and creating a queue processing script to
|
||||
handle the actual updates.
|
||||
|
||||
For the ``QueuedSignalProcessor``, you should inherit from
|
||||
``haystack.signals.BaseSignalProcessor``, then alter the ``setup/teardown``
|
||||
methods to call an enqueuing method instead of directly calling
|
||||
``handle_save/handle_delete``. For example::
|
||||
|
||||
from django.db import models
from haystack import signals
|
||||
|
||||
|
||||
class QueuedSignalProcessor(signals.BaseSignalProcessor):
|
||||
# Override the built-in.
|
||||
def setup(self):
|
||||
models.signals.post_save.connect(self.enqueue_save)
|
||||
models.signals.post_delete.connect(self.enqueue_delete)
|
||||
|
||||
# Override the built-in.
|
||||
def teardown(self):
|
||||
models.signals.post_save.disconnect(self.enqueue_save)
|
||||
models.signals.post_delete.disconnect(self.enqueue_delete)
|
||||
|
||||
# Add on a queuing method.
|
||||
def enqueue_save(self, sender, instance, **kwargs):
|
||||
# Push the save & information onto queue du jour here...
pass
|
||||
|
||||
# Add on a queuing method.
|
||||
def enqueue_delete(self, sender, instance, **kwargs):
|
||||
# Push the delete & information onto queue du jour here...
pass
|
||||
|
||||
For the consumer, this is much more specific to the queue used and your desired
|
||||
setup. At a minimum, you will need to periodically consume the queue, fetch the
|
||||
correct index from the ``SearchSite`` for your application, load the model from
|
||||
the message and pass that model to the ``update_object`` or ``remove_object``
|
||||
methods on the ``SearchIndex``. Proper grouping, batching and intelligent
|
||||
handling are all additional things that could be applied on top to further
|
||||
improve performance.
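A toy consumer over an in-memory queue shows the shape of that loop; the queue, the message format and the index lookup here are all stand-ins for whatever broker and registry your project actually uses:

```python
import collections

# Stand-in queue; in production this would be your broker of choice.
queue = collections.deque([
    ("save", "notes.note", 1),
    ("delete", "notes.note", 2),
])

updated, removed = [], []

def get_index_for(model_label):
    # In real code: fetch the registered SearchIndex for this model.
    return {"update_object": updated.append, "remove_object": removed.append}

while queue:
    action, model_label, pk = queue.popleft()
    index = get_index_for(model_label)
    if action == "save":
        index["update_object"](pk)   # would load the model & merge its document
    else:
        index["remove_object"](pk)   # would drop the document from the index

print(updated, removed)  # [1] [2]
```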
|
|
@ -0,0 +1,123 @@
|
|||
.. _ref-boost:

=====
Boost
=====

Scoring is a critical component of good search. Normal full-text searches
automatically score a document based on how well it matches the query provided.
However, sometimes you want certain documents to score better than they
otherwise would. Boosting is a way to achieve this. There are three types of
boost:

* Term Boost
* Document Boost
* Field Boost

.. note::

    Document & field boost support was added in Haystack 1.1.

Despite all being types of boost, they take place at different times and have
slightly different effects on scoring.

Term boost happens at query time (when the search query is run) and is based
around increasing the score if a certain word/phrase is seen.

On the other hand, document & field boosts take place at indexing time (when
the document is being added to the index). Document boost causes the relevance
of the entire result to go up, while field boost causes only searches within
that field to do better.

.. warning::

    Be warned that boost is very, very sensitive & can hurt overall search
    quality if over-zealously applied. Even very small adjustments can affect
    relevance in a big way.


Term Boost
==========

Term boosting is achieved by using ``SearchQuerySet.boost``. You provide it
the term you want to boost on & a floating point value (based around ``1.0``
as 100% - no boost).

Example::

    # Slight increase in relevance for documents that include "banana".
    sqs = SearchQuerySet().boost('banana', 1.1)

    # Big decrease in relevance for documents that include "blueberry".
    sqs = SearchQuerySet().boost('blueberry', 0.8)

See the :doc:`searchqueryset_api` docs for more details on using this method.


Document Boost
==============

Document boosting is done by adding a ``boost`` field to the prepared data
``SearchIndex`` creates. The best way to do this is to override
``SearchIndex.prepare``::

    from haystack import indexes
    from notes.models import Note


    class NoteSearchIndex(indexes.SearchIndex, indexes.Indexable):
        # Your regular fields here, then...

        def prepare(self, obj):
            data = super(NoteSearchIndex, self).prepare(obj)
            data['boost'] = 1.1
            return data


Another approach might be to add a new field called ``boost``. However, this
can skew your schema and is not encouraged.
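
The boost value need not be a constant. Since ``prepare`` runs per object, you
can compute it from model data; below is a sketch that weights recent documents
higher. The age thresholds and weights are arbitrary examples, not Haystack
defaults:

```python
import datetime


def compute_boost(pub_date, now=None):
    """Return a document boost based on how recently it was published."""
    now = now or datetime.datetime.now()
    age_days = (now - pub_date).days
    if age_days <= 7:
        return 1.2   # fresh content scores noticeably higher
    if age_days <= 30:
        return 1.05  # recent content gets a slight bump
    return 1.0       # everything else is unboosted


# Hooking it into the index from the example above (sketch):
#
#     def prepare(self, obj):
#         data = super(NoteSearchIndex, self).prepare(obj)
#         data['boost'] = compute_boost(obj.pub_date)
#         return data
```

Keep the warning above in mind: even these small multipliers can reorder
results significantly, so tune against real queries.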

Field Boost
===========

Field boosting is enabled by setting the ``boost`` kwarg on the desired field.
An example of this might be increasing the significance of a ``title``::

    from haystack import indexes
    from notes.models import Note


    class NoteSearchIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        title = indexes.CharField(model_attr='title', boost=1.125)

        def get_model(self):
            return Note

.. note::

    Field boosting only has an effect when the ``SearchQuerySet`` filters on
    the field which has been boosted. If you are using a default search view
    or form, you will need to override the search method or otherwise include
    the field in your search query. This example ``CustomSearchForm`` searches
    the automatic ``content`` field and the ``title`` field which has been
    boosted::

        from haystack.forms import SearchForm
        from haystack.inputs import AutoQuery
        from haystack.query import SQ


        class CustomSearchForm(SearchForm):

            def search(self):
                if not self.is_valid():
                    return self.no_query_found()

                if not self.cleaned_data.get('q'):
                    return self.no_query_found()

                q = self.cleaned_data['q']
                sqs = self.searchqueryset.filter(SQ(content=AutoQuery(q)) | SQ(title=AutoQuery(q)))

                if self.load_all:
                    sqs = sqs.load_all()

                return sqs.highlight()
# -*- coding: utf-8 -*-
#
# Haystack documentation build configuration file, created by
# sphinx-quickstart on Wed Apr 15 08:50:46 2009.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

from __future__ import absolute_import, division, print_function, unicode_literals

import os
import sys

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.append(os.path.abspath('.'))

# -- General configuration -----------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = []

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
#source_encoding = 'utf-8'

# The master toctree document.
master_doc = 'toc'

# General information about the project.
project = u'Haystack'
copyright = u'2009-2013, Daniel Lindsley'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '2.1.1'
# The full version, including alpha/beta/rc tags.
release = '2.1.1-dev'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'

# List of documents that shouldn't be included in the build.
#unused_docs = []

# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = ['_build']

# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []


# -- Options for HTML output ---------------------------------------------------

# The theme to use for HTML and HTML Help pages. Major themes that come with
# Sphinx are currently 'default' and 'sphinxdoc'.
# html_theme = 'haystack_theme'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {
#     "rightsidebar": "true",
#     "bodyfont": "'Helvetica Neue', Arial, sans-serif",
#     "sidebarbgcolor": "#303c0c",
#     "sidebartextcolor": "#effbcb",
#     "sidebarlinkcolor": "#eef7ab",
#     "relbarbgcolor": "#caecff",
#     "relbartextcolor": "#262511",
#     "relbarlinkcolor": "#262511",
#     "footerbgcolor": "#262511",
# }

# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = ['.']

# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None

# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None

# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True

# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}

# If false, no module index is generated.
#html_use_modindex = True

# If false, no index is generated.
#html_use_index = True

# If true, the index is split into individual pages for each letter.
#html_split_index = False

# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True

# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''

# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = ''

# Output file base name for HTML help builder.
htmlhelp_basename = 'Haystackdoc'


# -- Options for LaTeX output --------------------------------------------------

# The paper size ('letter' or 'a4').
#latex_paper_size = 'letter'

# The font size ('10pt', '11pt' or '12pt').
#latex_font_size = '10pt'

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
    ('index', 'Haystack.tex', u'Haystack Documentation',
     u'Daniel Lindsley', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None

# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False

# Additional stuff for the LaTeX preamble.
#latex_preamble = ''

# Documents to append as an appendix to all manuals.
#latex_appendices = []

# If false, no module index is generated.
#latex_use_modindex = True
============
Contributing
============

Haystack is open-source and, as such, grows (or shrinks) & improves in part
due to the community. Below are some guidelines on how to help with the project.


Philosophy
==========

* Haystack is BSD-licensed. All contributed code must be either

  * the original work of the author, contributed under the BSD, or...
  * work taken from another project released under a BSD-compatible license.

* GPL'd (or similar) works are not eligible for inclusion.
* Haystack's git master branch should always be stable, production-ready &
  passing all tests.
* Major releases (1.x.x) are commitments to backward-compatibility of the
  public APIs. Any documented API should ideally not change between major
  releases. The exception to this rule is in the event of either a security
  issue or to accommodate changes in Django itself.
* Minor releases (x.3.x) are for the addition of substantial features or major
  bugfixes.
* Patch releases (x.x.4) are for minor features or bugfixes.


Guidelines For Reporting An Issue/Feature
=========================================

So you've found a bug or have a great idea for a feature. Here are the steps
you should take to help get it added/fixed in Haystack:

* First, check to see if there's an existing issue/pull request for the
  bug/feature. All issues are at https://github.com/toastdriven/django-haystack/issues
  and pull reqs are at https://github.com/toastdriven/django-haystack/pulls.
* If there isn't one there, please file an issue. The ideal report includes:

  * A description of the problem/suggestion.
  * How to recreate the bug.
  * If relevant, the versions of your:

    * Python interpreter
    * Django
    * Haystack
    * Search engine used (as well as bindings)
    * Optionally, the other dependencies involved

  * Ideally, a pull request with a (failing) test case demonstrating
    what's wrong. This makes it easy for us to reproduce & fix the problem.
    Instructions for running the tests are at :doc:`index`.

You might also hop into the IRC channel (``#haystack`` on ``irc.freenode.net``)
& raise your question there, as there may be someone who can help you with a
work-around.


Guidelines For Contributing Code
================================

If you're ready to take the plunge & contribute back some code/docs, the
process should look like:

* Fork the project on GitHub into your own account.
* Clone your copy of Haystack.
* Make a new branch in git & commit your changes there.
* Push your new branch up to GitHub.
* Again, ensure there isn't already an issue or pull request out there on it.
  If there is & you feel you have a better fix, please take note of the issue
  number & mention it in your pull request.
* Create a new pull request (based on your branch), including what the
  problem/feature is, versions of your software & referencing any related
  issues/pull requests.

In order to be merged into Haystack, contributions must have the following:

* A solid patch that:

  * is clear.
  * works across all supported versions of Python/Django.
  * follows the existing style of the code base (mostly PEP-8).
  * includes comments as needed.

* A test case that demonstrates the previous flaw that now passes
  with the included patch.
* If it adds/changes a public API, it must also include documentation
  for those changes.
* Must be appropriately licensed (see "Philosophy").
* Adds yourself to the AUTHORS file.

If your contribution lacks any of these things, they will have to be added
by a core contributor before being merged into Haystack proper, which may take
substantial time for the all-volunteer team to get to.


Guidelines For Core Contributors
================================

If you've been granted the commit bit, here's how to shepherd the changes in:

* Any time you go to work on Haystack, please use ``git pull --rebase`` to fetch
  the latest changes.
* Any new features/bug fixes must meet the above guidelines for contributing
  code (solid patch/tests passing/docs included).
* Commits are typically cherry-picked onto a branch off master.

  * This is done so as not to include extraneous commits, as some people submit
    pull reqs based on their git master that has other things applied to it.

* A set of commits should be squashed down to a single commit.

  * ``git merge --squash`` is a good tool for performing this, as is
    ``git rebase -i HEAD~N``.
  * This is done to prevent anyone using the git repo from accidentally pulling
    work-in-progress commits.

* Commit messages should use past tense, describe what changed & thank anyone
  involved. Examples::

    """Added support for the latest version of Whoosh (v2.3.2)."""
    """Fixed a bug in ``solr_backend.py``. Thanks to joeschmoe for the report!"""
    """BACKWARD-INCOMPATIBLE: Altered the arguments passed to ``SearchBackend``.

    Further description appears here if the change warrants an explanation
    as to why it was done."""

* For any patches applied from a contributor, please ensure their name appears
  in the AUTHORS file.
* When closing issues or pull requests, please reference the SHA in the closing
  message (i.e. ``Thanks! Fixed in SHA: 6b93f6``). GitHub will automatically
  link to it.
.. _ref-creating-new-backends:

=====================
Creating New Backends
=====================

The process should be fairly simple.

#. Create a new backend file. The name is important.
#. Place two classes inside it.

   #. ``SearchBackend`` (inherit from ``haystack.backends.BaseSearchBackend``)
   #. ``SearchQuery`` (inherit from ``haystack.backends.BaseSearchQuery``)


SearchBackend
=============

Responsible for the actual connection and low-level details of interacting with
the backend.

* Connects to the search engine
* Method for saving new docs to the index
* Method for removing docs from the index
* Method for performing the actual query


SearchQuery
===========

Responsible for taking structured data about the query and converting it into a
backend-appropriate format.

* Method for creating the backend-specific query - ``build_query``.
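
Put together, a new backend file might start from a skeleton like the one
below. The real classes must inherit from
``haystack.backends.BaseSearchBackend`` and ``BaseSearchQuery``; plain classes
are used here so the sketch stands alone, and the method set shown covers the
responsibilities listed above rather than the full required API:

```python
class MyEngineSearchBackend(object):
    """Low-level connection to the engine (really: BaseSearchBackend)."""

    def __init__(self, connection_alias, **connection_options):
        # Connect to the search engine here.
        self.url = connection_options.get('URL', 'http://localhost:9999/')

    def update(self, index, iterable):
        """Save new documents to the index."""
        raise NotImplementedError

    def remove(self, obj_or_string):
        """Remove a document from the index."""
        raise NotImplementedError

    def search(self, query_string, **kwargs):
        """Perform the actual query, returning hits & counts."""
        raise NotImplementedError


class MyEngineSearchQuery(object):
    """Structured-query-to-engine-syntax converter (really: BaseSearchQuery)."""

    def build_query(self):
        """Convert the structured query data into the backend's format."""
        raise NotImplementedError
```

Studying ``haystack/backends/simple_backend.py`` is a good way to see the
smallest real implementation of this shape.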
.. _ref-debugging:

==================
Debugging Haystack
==================

There are some common problems people run into when using Haystack for the
first time. Some of the common problems and things to try appear below.

.. note::

    As a general suggestion, your best friend when debugging an issue is to
    use the ``pdb`` library included with Python. By dropping a
    ``import pdb; pdb.set_trace()`` in your code before the issue occurs, you
    can step through and examine variables/logic as you progress through. Make
    sure you don't commit those ``pdb`` lines, though.


"No module named haystack."
===========================

This problem usually occurs when first adding Haystack to your project.

* Are you using the ``haystack`` directory within your ``django-haystack``
  checkout/install?
* Is the ``haystack`` directory on your ``PYTHONPATH``? Alternatively, is
  ``haystack`` symlinked into your project?
* Start a Django shell (``./manage.py shell``) and try ``import haystack``.
  You may receive a different, more descriptive error message.
* Double-check to ensure you have no circular imports. (i.e. module A tries
  importing from module B, which is trying to import from module A.)


"No results found." (On the web page)
=====================================

Several issues can cause no results to be found. Most commonly it is either
not running a ``rebuild_index`` to populate your index or having a blank
``document=True`` field, resulting in no content for the engine to search on.

* Do you have a ``search_indexes.py`` located within an installed app?
* Do you have data in your database?
* Have you run a ``./manage.py rebuild_index`` to index all of your content?
* Try running ``./manage.py rebuild_index -v2`` for more verbose output to
  ensure data is being processed/inserted.
* Start a Django shell (``./manage.py shell``) and try::

    >>> from haystack.query import SearchQuerySet
    >>> sqs = SearchQuerySet().all()
    >>> sqs.count()

* You should get back an integer > 0. If not, check the above and reindex.
  Then inspect a result::

    >>> sqs[0]       # Should get back a SearchResult object.
    >>> sqs[0].id    # Should get something back like 'myapp.mymodel.1'.
    >>> sqs[0].text  # ... or whatever your document=True field is.

* If you get back either ``u''`` or ``None``, it means that your data isn't
  making it into the main field that gets searched. You need to check that the
  field either has a template that uses the model data, a ``model_attr`` that
  pulls data directly from the model or a ``prepare/prepare_FOO`` method that
  populates the data at index time.
* Check the template for your search page and ensure it is looping over the
  results properly. Also ensure that it's either accessing valid fields coming
  back from the search engine or that it's trying to access the associated
  model via the ``{{ result.object.foo }}`` lookup.
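
To check the point about populating the document field, it can help to see the
three options side by side. The field and attribute names below are
illustrative, and the ``prepare_text`` function is shown standalone (on a real
``SearchIndex`` it would be a method taking ``self, obj``):

```python
# In search_indexes.py, the document field can be filled three ways:
#
#   text = indexes.CharField(document=True, use_template=True)
#       -> renders search/indexes/myapp/note_text.txt with the model data.
#   text = indexes.CharField(document=True, model_attr='body')
#       -> pulls the value straight off a model attribute.
#   ...or a prepare_FOO method builds the value at index time:

def prepare_text(obj):
    """Combine several model attributes into one searchable blob."""
    return "%s\n%s" % (obj.title, obj.body)


class FakeNote(object):
    """Stand-in for a model instance, just for this sketch."""
    title = "Lorem"
    body = "Ipsum dolor sit amet."


print(prepare_text(FakeNote()))
```

If none of the three is in place, the engine indexes an empty document and
every search comes back empty, which is exactly the symptom described above.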

"LockError: [Errno 17] File exists: '/path/to/whoosh_index/_MAIN_LOCK'"
=======================================================================

This is a Whoosh-specific traceback. It occurs when the Whoosh engine in one
process/thread locks the index files for writing while another process/thread
tries to access them. This is a common error when using ``RealtimeSignalProcessor``
with Whoosh under any kind of load, which is why it's only recommended for
small sites or development.

The only real solution is to set up a cron job that runs
``./manage.py rebuild_index`` (optionally with ``--age=24``) nightly
(or however often you need) to refresh the search indexes, then disable the
use of the ``RealtimeSignalProcessor`` within your settings.

The downside to this is that you lose real-time search. For many people, this
isn't an issue, and it will allow you to scale Whoosh up to much higher
traffic. If this is not acceptable, you should investigate either the Solr or
Xapian backends.


"Failed to add documents to Solr: [Reason: None]"
=================================================

This is a Solr-specific traceback. It generally occurs when there is an error
with your ``HAYSTACK_CONNECTIONS[<alias>]['URL']``. Since Solr acts as a
webservice, you should test the URL in your web browser. If you receive an
error, you may need to change your URL.

This can also be caused when using old versions of pysolr (2.0.9 and before)
with httplib2 and including a trailing slash in your
``HAYSTACK_CONNECTIONS[<alias>]['URL']``. If this applies to you, please
upgrade to the current version of pysolr.


"Got an unexpected keyword argument 'boost'"
============================================

This is a Solr-specific traceback. It is caused by using old versions of
pysolr (2.0.12 and before). Please upgrade your version of pysolr (2.0.13+).
.. _ref-faceting:

========
Faceting
========

What Is Faceting?
-----------------

Faceting is a way to provide users with feedback about the number of documents
which match terms they may be interested in. At its simplest, it gives
document counts based on words in the corpus, date ranges, numeric ranges or
even advanced queries.

Faceting is particularly useful when trying to provide users with drill-down
capabilities. The general workflow in this regard is:

#. You choose what you want to facet on.
#. The search engine returns the counts it sees for that match.
#. You display those counts to the user and provide them with a link.
#. When the user chooses a link, you narrow the search query to only include
   those conditions and display the results, potentially with further facets.

.. note::

    Faceting can be difficult, especially in providing the user with the right
    number of options and/or the right areas to be able to drill into. This
    is unique to every situation and demands following what real users need.

    You may want to consider logging queries and looking at popular terms to
    help you narrow down how you can help your users.

Haystack provides functionality so that all of the above steps are possible.
From the ground up, let's build a faceted search setup. This assumes that you
have worked through the :doc:`tutorial` and have a working Haystack
installation. The same setup from the :doc:`tutorial` applies here.

1. Determine Facets And ``SearchQuerySet``
------------------------------------------

Determining what you want to facet on isn't always easy. For our purposes,
we'll facet on the ``author`` field.

In order to facet effectively, the search engine should store both a standard
representation of your data as well as an exact version to facet on. This is
generally accomplished by duplicating the field and storing it via two
different types. Duplication is suggested so that those fields are still
searchable in the standard ways.

To inform Haystack of this, you simply pass along a ``faceted=True`` parameter
on the field(s) you wish to facet on. So to modify our existing example::

    class NoteIndex(SearchIndex, indexes.Indexable):
        text = CharField(document=True, use_template=True)
        author = CharField(model_attr='user', faceted=True)
        pub_date = DateTimeField(model_attr='pub_date')

Haystack quietly handles all of the backend details for you, creating a similar
field to the type you specified with ``_exact`` appended. Our example would now
have both an ``author`` and an ``author_exact`` field, though this is largely
an implementation detail.

To pull faceting information out of the index, we'll use the
``SearchQuerySet.facet`` method to set up the facet and the
``SearchQuerySet.facet_counts`` method to retrieve the counts seen.

Experimenting in a shell (``./manage.py shell``) is a good way to get a feel
for what various facets might look like::

    >>> from haystack.query import SearchQuerySet
    >>> sqs = SearchQuerySet().facet('author')
    >>> sqs.facet_counts()
    {
        'dates': {},
        'fields': {
            'author': [
                ('john', 4),
                ('daniel', 2),
                ('sally', 1),
                ('terry', 1),
            ],
        },
        'queries': {}
    }

.. note::

    Note that, despite the duplication of fields, you should provide the
    regular name of the field when faceting. Haystack will intelligently
    handle the underlying details and mapping.

As you can see, we get back a dictionary which provides access to the three
types of facets available: ``fields``, ``dates`` and ``queries``. Since we only
faceted on the ``author`` field (which actually facets on the ``author_exact``
field managed by Haystack), only the ``fields`` key has any data
associated with it. In this case, we have a corpus of eight documents with four
unique authors.

.. note::

    Facets are chainable, like most ``SearchQuerySet`` methods. However, unlike
    most ``SearchQuerySet`` methods, they are *NOT* affected by ``filter`` or
    similar methods. The only method that has any effect on facets is the
    ``narrow`` method (which is how you provide drill-down).
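
Drill-down works by feeding the ``selected_facets`` values from the
querystring into ``narrow``. A rough sketch of what ``FacetedSearchForm`` does
under the hood (the stub queryset only records the calls; the real form also
cleans the value for the backend):

```python
class StubSQS:
    """Records narrow() calls instead of hitting a search engine."""
    def __init__(self, narrowed=None):
        self.narrowed = narrowed or []

    def narrow(self, query):
        # Like the real SearchQuerySet, return a new queryset rather
        # than mutating in place.
        return StubSQS(self.narrowed + [query])


def apply_selected_facets(sqs, selected_facets):
    """Turn 'field:value' pairs into narrow() calls, as the form does."""
    for facet in selected_facets:
        if ':' not in facet:
            continue  # malformed pairs are skipped, not errors
        field, value = facet.split(':', 1)
        sqs = sqs.narrow('%s:"%s"' % (field, value))
    return sqs


sqs = apply_selected_facets(StubSQS(), ['author_exact:daniel'])
print(sqs.narrowed)
```

Note that the narrowing targets ``author_exact`` (the shadow field), which is
why the drill-down links in the template below use that name.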
|
||||
|
||||
Configuring facet behaviour
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You can configure the behaviour of your facets by passing options
|
||||
for each facet in your SearchQuerySet. These options can be backend specific.
|
||||
|
||||
**limit**
|
||||
*tested on Solr*
|
||||
|
||||
The ``limit`` parameter limits the results for each query. On Solr, the default `facet.limit`_ is 100 and a
|
||||
negative number removes the limit.
|
||||
|
||||
.. _facet.limit: https://wiki.apache.org/solr/SimpleFacetParameters#facet.limit
|
||||
|
||||
Example usage::
|
||||
|
||||
>>> from haystack.query import SearchQuerySet
|
||||
>>> sqs = SearchQuerySet().facet('author', limit=-1)
|
||||
>>> sqs.facet_counts()
|
||||
{
|
||||
'dates': {},
|
||||
'fields': {
|
||||
'author': [
|
||||
('abraham', 1),
|
||||
('benny', 2),
|
||||
('cindy', 1),
|
||||
('diana', 5),
|
||||
],
|
||||
},
|
||||
'queries': {}
|
||||
}
|
||||
|
||||
>>> sqs = SearchQuerySet().facet('author', limit=2)
|
||||
>>> sqs.facet_counts()
|
||||
{
|
||||
'dates': {},
|
||||
'fields': {
|
||||
'author': [
|
||||
('abraham', 1),
|
||||
('benny', 2),
|
||||
],
|
||||
},
|
||||
'queries': {}
|
||||
}
|
||||
|
||||
**sort**
|
||||
*tested on Solr*
|
||||
|
||||
The ``sort`` parameter will sort the results for each query. Solr's default
|
||||
`facet.sort`_ is ``index``, which will sort the facets alphabetically. Changing
|
||||
the parameter to ``count`` will sort the facets by the number of results for
|
||||
each facet value.
|
||||
|
||||
.. _facet.sort: https://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
|
||||
|
||||
|
||||
Example usage::
|
||||
|
||||
>>> from haystack.query import SearchQuerySet
|
||||
>>> sqs = SearchQuerySet().facet('author', sort='index', )
|
||||
>>> sqs.facet_counts()
|
||||
{
|
||||
'dates': {},
|
||||
'fields': {
|
||||
'author': [
|
||||
('abraham', 1),
|
||||
('benny', 2),
|
||||
('cindy', 1),
|
||||
('diana', 5),
|
||||
],
|
||||
},
|
||||
'queries': {}
|
||||
}
|
||||
|
||||
>>> sqs = SearchQuerySet().facet('author', sort='count', )
|
||||
>>> sqs.facet_counts()
|
||||
{
|
||||
'dates': {},
|
||||
'fields': {
|
||||
'author': [
|
||||
('diana', 5),
|
||||
('benny', 2),
|
||||
('abraham', 1),
|
||||
('cindy', 1),
|
||||
],
|
||||
},
|
||||
'queries': {}
|
||||
}
|
||||
|
||||
|
||||
Now that we have the facet we want, it's time to implement it.


2. Switch to the ``FacetedSearchView`` and ``FacetedSearchForm``
----------------------------------------------------------------

There are three things that we'll need to do to expose facets to our frontend.
The first is to construct the ``SearchQuerySet`` we want to use; we should have
that from the previous step. The second is to switch to the
``FacetedSearchView``. This view is useful because it prepares the facet counts
and provides them in the context as ``facets``.

Optionally, the third step is to switch to the ``FacetedSearchForm``. As it
currently stands, this is only useful if you want to provide drill-down, though
it may gain more functionality in the future. We'll do it for the sake of
having it in place, but know that it's not required.

In your URLconf, you'll need to switch to the ``FacetedSearchView``. Your
URLconf should resemble::

    from django.conf.urls import patterns, url
    from haystack.forms import FacetedSearchForm
    from haystack.query import SearchQuerySet
    from haystack.views import FacetedSearchView


    sqs = SearchQuerySet().facet('author')


    urlpatterns = patterns('haystack.views',
        url(r'^$', FacetedSearchView(form_class=FacetedSearchForm, searchqueryset=sqs), name='haystack_search'),
    )

The ``FacetedSearchView`` will now instantiate the ``FacetedSearchForm`` and use
the ``SearchQuerySet`` we provided. A ``facets`` variable will now be present
in the context, added by an overridden ``extra_context`` method.

3. Display The Facets In The Template
-------------------------------------

Templating facets involves simply adding an extra bit of processing to display
the facets (and optionally linking to provide drill-down). An example template
might look like this::

    <form method="get" action=".">
        <table>
            <tbody>
                {{ form.as_table }}
                <tr>
                    <td>&nbsp;</td>
                    <td><input type="submit" value="Search"></td>
                </tr>
            </tbody>
        </table>
    </form>

    {% if query %}
        <!-- Begin faceting. -->
        <h2>By Author</h2>

        <div>
            <dl>
                {% if facets.fields.author %}
                    <dt>Author</dt>
                    {# Provide only the top 5 authors #}
                    {% for author in facets.fields.author|slice:":5" %}
                        <dd><a href="{{ request.get_full_path }}&amp;selected_facets=author_exact:{{ author.0|urlencode }}">{{ author.0 }}</a> ({{ author.1 }})</dd>
                    {% endfor %}
                {% else %}
                    <p>No author facets.</p>
                {% endif %}
            </dl>
        </div>
        <!-- End faceting -->

        <!-- Display results... -->
        {% for result in page.object_list %}
            <div class="search_result">
                <h3><a href="{{ result.object.get_absolute_url }}">{{ result.object.title }}</a></h3>

                <p>{{ result.object.body|truncatewords:80 }}</p>
            </div>
        {% empty %}
            <p>Sorry, no results found.</p>
        {% endfor %}
    {% endif %}

Displaying the facets is a matter of looping through the facets you want and
providing the UI to suit. ``author.0`` is the facet text from the backend and
``author.1`` is the facet count.

4. Narrowing The Search
-----------------------

We've also set ourselves up for the last bit, the drill-down aspect. By
appending ``selected_facets`` to the URLs, we're informing the
``FacetedSearchForm`` that we want to narrow our results to only those
containing the author we provided.

For a concrete example, if the facets on author come back as::

    {
        'dates': {},
        'fields': {
            'author': [
                ('john', 4),
                ('daniel', 2),
                ('sally', 1),
                ('terry', 1),
            ],
        },
        'queries': {}
    }

You should present a list similar to::

    <ul>
        <li><a href="/search/?q=Haystack&amp;selected_facets=author_exact:john">john</a> (4)</li>
        <li><a href="/search/?q=Haystack&amp;selected_facets=author_exact:daniel">daniel</a> (2)</li>
        <li><a href="/search/?q=Haystack&amp;selected_facets=author_exact:sally">sally</a> (1)</li>
        <li><a href="/search/?q=Haystack&amp;selected_facets=author_exact:terry">terry</a> (1)</li>
    </ul>

.. warning::

    Haystack can automatically handle most details around faceting. However,
    since ``selected_facets`` is passed directly to ``narrow``, it must use
    the duplicated field name. Improvements to this are planned but
    incomplete.

This is simply the default behavior, but it is possible to override it or
provide your own form which does additional processing. You could also write
your own faceted ``SearchView``, which could provide additional or different
facets based on the facets chosen. There is a wide range of possibilities
available to help the user navigate your content.

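To picture what the form does with those links, here is a standalone sketch
(plain Python with a hypothetical helper, not Haystack's actual code; the real
form passes each value to ``SearchQuerySet.narrow``)::

    from urllib.parse import parse_qs

    def narrow_clauses(query_string):
        # Collect every ``selected_facets`` parameter as a field/value clause.
        clauses = []
        for facet in parse_qs(query_string).get('selected_facets', []):
            if ':' not in facet:
                continue
            field, value = facet.split(':', 1)
            clauses.append('%s:"%s"' % (field, value))
        return clauses

    print(narrow_clauses('q=Haystack&selected_facets=author_exact:john'))
    # -> ['author_exact:"john"']
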
.. _ref-frequently-asked-questions:

==============================
(In)Frequently Asked Questions
==============================


What is Haystack?
=================

Haystack is meant to be a portable interface to a search engine of your choice.
Some might call it a search framework, an abstraction layer or what have you.
The idea is that you write your search code once and should be able to freely
switch between backends as your situation necessitates.


Why should I consider using Haystack?
=====================================

Haystack is targeted at the following use cases:

* If you want to feature search on your site and search solutions like Google
  or Yahoo search don't fit your needs.
* If you want to be able to customize your search and search on more than just
  the main content.
* If you want to have features like drill-down (faceting) or "More Like This".
* If you want an interface that is not search-engine specific, allowing you to
  change your mind later without much rewriting.


When should I not be using Haystack?
====================================

* Non-Model-based data. If you just want to index arbitrary data (flat files,
  alternate sources, etc.), Haystack isn't a good solution. Haystack is very
  ``Model``-based and doesn't work well outside of that use case.
* Ultra-high volume. Because of the very nature of Haystack (an abstraction
  layer), there's more overhead involved. This makes it portable, but as with
  all abstraction layers, you lose a little performance. You also can't take
  full advantage of the exact feature set of your search engine. This is the
  price of pluggable backends.


Why was Haystack created when there are so many other search options?
=====================================================================

The proliferation of search options in Django is a relatively recent
development and is actually one of the reasons for Haystack's existence. There
are too many options that are only partial solutions or are too engine
specific.

Further, most use an unfamiliar API and documentation is lacking in most
cases.

Haystack is an attempt to unify these efforts into one solution. That's not to
say there should be no alternatives, but Haystack should provide a good
solution to 80%+ of the search use cases out there.


What's the history behind Haystack?
===================================

Haystack started because of my frustration with the lack of good search
options (before many other apps came out) and as the result of extensive use
of Djangosearch. Djangosearch was a decent solution but had a number of
shortcomings, such as:

* Being tied to ``models.py``, so you'd have to modify the source of
  third-party (or ``django.contrib``) apps in order to effectively use it.
* An all-or-nothing approach to indexes, so all indexes appear on all sites
  and in all places.
* Lack of tests.
* Lack of documentation.
* Uneven backend implementations.

The initial idea was to simply fork Djangosearch and improve on these (and
other) issues. However, after stepping back, I decided to overhaul the entire
API (and most of the underlying code) to be more representative of what I
would want as an end user. The result was starting afresh and reusing concepts
(and some code) from Djangosearch as needed.

As a result of this heritage, you can actually still find some portions of
Djangosearch present in Haystack (especially in the ``SearchIndex`` and
``SearchBackend`` classes) where it made sense. The original authors of
Djangosearch are aware of this and thus far have seemed to be fine with this
reuse.


Why doesn't <search engine X> have a backend included in Haystack?
==================================================================

There are several possibilities:

#. Licensing

   A common problem is that the Python bindings for a specific engine may have
   been released under an incompatible license. The goal is for Haystack to
   remain BSD licensed, and importing bindings with an incompatible license
   can technically convert the entire codebase to that license. This most
   commonly occurs with GPL'ed bindings.

#. Lack of time

   The search engine in question may be on the list of backends to add and we
   simply haven't gotten to it yet. We welcome patches for additional
   backends.

#. Incompatible API

   In order for an engine to work well with Haystack, a certain baseline set
   of features is needed. This is often an issue when the engine doesn't
   support ranged queries or additional attributes associated with a search
   record.

#. We're not aware of the engine

   If you think we may not be aware of the engine you'd like, please tell us
   about it (preferably via the group -
   http://groups.google.com/group/django-haystack/). Be sure to check through
   the backends (in case it wasn't documented) and search the history on the
   group to minimize duplicates.

.. _ref-glossary:

========
Glossary
========

Search is a domain full of its own jargon and definitions. As this may be
unfamiliar territory for many developers, what follows are some commonly used
terms and what they mean.


Engine
  An engine, for the purposes of Haystack, is a third-party search solution.
  It might be a full service (e.g. Solr_) or a library to build an engine
  with (e.g. Whoosh_).

  .. _Solr: http://lucene.apache.org/solr/
  .. _Whoosh: https://bitbucket.org/mchaput/whoosh/

Index
  The datastore used by the engine is called an index. Its structure can vary
  wildly between engines, but commonly they resemble a document store. This is
  the source of all information in Haystack.

Document
  A document is essentially a record within the index. It usually contains at
  least one blob of text that serves as the primary content the engine
  searches, and it may have additional data hung off it.

Corpus
  A term for a collection of documents. When talking about the documents
  stored by the engine (rather than the technical implementation of the
  storage), this term is commonly used.

Field
  Within the index, each document may store extra data alongside the main
  content as a field. Also sometimes called an attribute, this usually
  represents metadata or extra content about the document. Haystack can use
  these fields for filtering and display.

Term
  A term is generally a single word (or word-like) string of characters used
  in a search query.

Stemming
  A means of determining if a word has any root words. This varies by
  language, but in English it generally consists of removing plurals, the
  action form of a word, et cetera. For instance, in English, 'giraffes'
  would stem to 'giraffe'. Similarly, 'exclamation' would stem to 'exclaim'.
  This is useful for finding variants of a word that may appear in other
  documents.
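  A toy illustration of plural stripping (hypothetical code, not part of
  Haystack; real stemmers such as the Porter stemmer handle far more
  cases)::

      def naive_stem(word):
          # Toy rules only: '-ies' plurals, then simple '-s' plurals.
          if word.endswith('ies'):
              return word[:-3] + 'y'
          if word.endswith('s') and not word.endswith('ss'):
              return word[:-1]
          return word

      print(naive_stem('giraffes'))  # -> giraffe
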
Boost
  Boost provides a means to take a term or phrase from a search query and
  alter the relevance of a result based on whether that term is found in the
  result, a form of weighting. For instance, if you wanted to weight results
  containing the word 'zebra' more heavily, you'd specify a boost for that
  term within the query.

More Like This
  Incorporating techniques from information retrieval and artificial
  intelligence, More Like This is a technique for finding other documents
  within the index that closely resemble the document in question. This is
  useful for programmatically generating a list of similar content for a user
  to browse, based on the document they are currently viewing.

Faceting
  Faceting is a way to give the user insight into the contents of your corpus.
  In its simplest form, it is a set of document counts returned with the
  results when performing a query. These counts can be used as feedback,
  allowing the user to pick interesting aspects of their search results and
  "drill down" into them.

  An example might be providing a facet on an ``author`` field, returning a
  list of authors and the number of documents in the index each one wrote.
  Each author could be presented as a link, allowing the user to click and
  narrow their original search to all results by that author.

{% extends "basic/layout.html" %}

{%- block extrahead %}
  <link rel="stylesheet" href="http://haystacksearch.org/css/front.css" media="screen">
  <link rel="stylesheet" href="_static/documentation.css" media="screen">
{% endblock %}

{%- block header %}
  <div id="header">
    <h1>Haystack</h1>
    <p>Modular search for Django</p>

    <ul class="features">
      <li>Term Boost</li>
      <li>More Like This</li>
      <li>Faceting</li>
      <li>Stored (non-indexed) fields</li>
      <li>Highlighting</li>
      <li>Spelling Suggestions</li>
    </ul>
  </div>
{% endblock %}

a, a:link, a:hover { background-color: transparent !important; color: #CAECFF; outline-color: transparent !important; text-decoration: underline; }
dl dt { text-decoration: underline; }
dl.class dt, dl.method dt { background-color: #444444; padding: 5px; text-decoration: none; }
tt.descname { font-weight: normal; }
dl.method dt span.optional { font-weight: normal; }
div#header { margin-bottom: 0px; }
div.document, div.related, div.footer { width: 900px; margin: 0 auto; }
div.document { margin-top: 10px; }
div.related { background-color: #262511; padding-left: 10px; padding-right: 10px; }
div.documentwrapper { width: 640px; float: left; }
div.body h1,
div.body h2,
div.body h3,
div.body h4,
div.body h5,
div.body h6 {
    background-color: #053211;
    font-weight: normal;
    border-bottom: 2px solid #262511;
    margin: 20px -20px 10px -20px;
    padding: 3px 0 3px 10px;
}
div.sphinxsidebar { width: 220px; float: right; }
div.sphinxsidebar ul { padding-left: 10px; }
div.sphinxsidebar ul ul { padding-left: 10px; margin-left: 10px; }
div.bodywrapper { margin: 0px; }
div.highlight-python, div.highlight { background-color: #262511; margin-bottom: 10px; padding: 10px; }
div.footer { background-color: #262511; font-size: 90%; padding: 10px; }
table thead { background-color: #053211; border-bottom: 1px solid #262511; }

[theme]
inherit = basic

.. _ref-highlighting:

============
Highlighting
============

Haystack supports two different methods of highlighting. You can either use
``SearchQuerySet.highlight`` or the built-in ``{% highlight %}`` template tag,
which uses the ``Highlighter`` class. Each approach has advantages and
disadvantages you need to weigh when deciding which to use.

If you want portable, flexible, decently fast code, the ``{% highlight %}``
template tag (or manually using the underlying ``Highlighter`` class) is the
way to go. On the other hand, if you care more about speed and will only ever
be using one backend, ``SearchQuerySet.highlight`` may suit your needs better.

Use of ``SearchQuerySet.highlight`` is documented in the
:doc:`searchqueryset_api` documentation and the ``{% highlight %}`` tag is
covered in the :doc:`templatetags` documentation, so the rest of this material
will cover the ``Highlighter`` implementation.


``Highlighter``
---------------

The ``Highlighter`` class is a pure-Python implementation included with
Haystack that's designed for flexibility. If you use the ``{% highlight %}``
template tag, you'll automatically be using this class. You can also use it
manually in your code. For example::

    >>> from haystack.utils import Highlighter

    >>> my_text = 'This is a sample block that would be more meaningful in real life.'
    >>> my_query = 'block meaningful'

    >>> highlight = Highlighter(my_query)
    >>> highlight.highlight(my_text)
    u'...<span class="highlighted">block</span> that would be more <span class="highlighted">meaningful</span> in real life.'

The default implementation takes three optional kwargs: ``html_tag``,
``css_class`` and ``max_length``. These allow for basic customizations of the
output, like so::

    >>> from haystack.utils import Highlighter

    >>> my_text = 'This is a sample block that would be more meaningful in real life.'
    >>> my_query = 'block meaningful'

    >>> highlight = Highlighter(my_query, html_tag='div', css_class='found', max_length=35)
    >>> highlight.highlight(my_text)
    u'...<div class="found">block</div> that would be more <div class="found">meaningful</div>...'

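Stripped to its essence, the default behavior resembles this standalone sketch
(plain Python with no Haystack dependency; the real class also scores and
trims the best-matching window of text)::

    import re

    def simple_highlight(text, query, html_tag='span', css_class='highlighted'):
        # Wrap each query word found in the text, case-insensitively.
        for word in query.split():
            text = re.sub(
                re.escape(word),
                lambda m: '<%s class="%s">%s</%s>' % (html_tag, css_class, m.group(0), html_tag),
                text,
                flags=re.IGNORECASE,
            )
        return text

    print(simple_highlight('This is a sample block.', 'block'))
    # -> This is a sample <span class="highlighted">block</span>.
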
Further, if this implementation doesn't suit your needs, you can define your
own custom highlighter class. As long as it implements the API you've just
seen, it can highlight however you choose. For example::

    # In ``myapp/utils.py``...
    from haystack.utils import Highlighter


    class BorkHighlighter(Highlighter):
        def render_html(self, highlight_locations=None, start_offset=None, end_offset=None):
            highlighted_chunk = self.text_block[start_offset:end_offset]

            for word in self.query_words:
                highlighted_chunk = highlighted_chunk.replace(word, 'Bork!')

            return highlighted_chunk

Then set the ``HAYSTACK_CUSTOM_HIGHLIGHTER`` setting to
``myapp.utils.BorkHighlighter``. Usage would then look like::

    >>> highlight = BorkHighlighter(my_query)
    >>> highlight.highlight(my_text)
    u'Bork! that would be more Bork! in real life.'

Now the ``{% highlight %}`` template tag will also use this highlighter.

Welcome to Haystack!
====================

Haystack provides modular search for Django. It features a unified, familiar
API that allows you to plug in different search backends (such as Solr_,
Elasticsearch_, Whoosh_, Xapian_, etc.) without having to modify your code.

.. _Solr: http://lucene.apache.org/solr/
.. _Elasticsearch: http://elasticsearch.org/
.. _Whoosh: https://bitbucket.org/mchaput/whoosh/
.. _Xapian: http://xapian.org/


.. note::

    This documentation represents Haystack 2.x. For old versions of the
    documentation: `1.2`_, `1.1`_.

.. _`1.2`: http://django-haystack.readthedocs.org/en/v1.2.6/index.html
.. _`1.1`: http://django-haystack.readthedocs.org/en/v1.1/index.html

Getting Started
---------------

If you're new to Haystack, you may want to start with these documents to get
you up and running:

.. toctree::
   :maxdepth: 2

   tutorial

.. toctree::
   :maxdepth: 1

   views_and_forms
   templatetags
   glossary
   management_commands
   faq
   who_uses
   other_apps
   installing_search_engines
   debugging

   migration_from_1_to_2
   python3
   contributing


Advanced Uses
-------------

Once you've got Haystack working, here are some of the more complex features
you may want to include in your application.

.. toctree::
   :maxdepth: 1

   best_practices
   highlighting
   faceting
   autocomplete
   boost
   signal_processors
   multiple_index
   rich_content_extraction
   spatial
   admin


Reference
---------

If you're an experienced user and are looking for a reference, you may be
looking for API documentation and advanced usage as detailed in:

.. toctree::
   :maxdepth: 2

   searchqueryset_api
   searchindex_api
   inputtypes
   searchfield_api
   searchresult_api
   searchquery_api
   searchbackend_api

   architecture_overview
   backend_support
   settings
   utils


Developing
----------

Finally, if you're looking to help out with the development of Haystack, the
following links should help guide you on running tests and creating
additional backends:

.. toctree::
   :maxdepth: 1

   running_tests
   creating_new_backends


Requirements
------------

Haystack has a relatively easily-met set of requirements.

* Python 2.7+ or Python 3.3+
* Django 1.6+

Additionally, each backend has its own requirements. You should refer to
:doc:`installing_search_engines` for more details.

.. _ref-inputtypes:

===========
Input Types
===========

Input types allow you to specify more advanced query behavior. They serve as a
way to alter the query, often in backend-specific ways, without altering your
Python code, as well as enabling the use of more advanced features.

Input types are currently only useful with the ``filter``/``exclude`` methods
on ``SearchQuerySet``. Expanding this support to other methods is on the
roadmap.


Available Input Types
=====================

Included with Haystack are the following input types:

``Raw``
-------

.. class:: haystack.inputs.Raw

``Raw`` allows you to specify backend-specific query syntax. If Haystack
doesn't provide a way to access special query functionality, you can make use
of this input type to pass it along.

Example::

    # Fielded.
    sqs = SearchQuerySet().filter(author=Raw('daniel OR jones'))

    # Non-fielded.
    # See ``AltParser`` for a better way to construct this.
    sqs = SearchQuerySet().filter(content=Raw('{!dismax qf=author mm=1}haystack'))


``Clean``
---------

.. class:: haystack.inputs.Clean

``Clean`` takes standard user (untrusted) input and sanitizes it. It ensures
that no unintended operators or special characters make it into the query.

This is roughly analogous to Django's ``autoescape`` support.

.. note::

    By default, if you hand a ``SearchQuerySet`` a bare string, it will get
    wrapped in this class.

Example::

    # This becomes "daniel or jones".
    sqs = SearchQuerySet().filter(content=Clean('daniel OR jones'))

    # Things like ``:`` & ``/`` get escaped.
    sqs = SearchQuerySet().filter(url=Clean('http://www.example.com'))

    # Equivalent (automatically wrapped in ``Clean``).
    sqs = SearchQuerySet().filter(url='http://www.example.com')


``Exact``
---------

.. class:: haystack.inputs.Exact

``Exact`` allows for making sure a phrase is exactly matched, unlike the usual
``AND`` lookups, where words may be far apart.

Example::

    sqs = SearchQuerySet().filter(author=Exact('n-gram support'))

    # Equivalent.
    sqs = SearchQuerySet().filter(author__exact='n-gram support')


``Not``
-------

.. class:: haystack.inputs.Not

``Not`` allows negation of the query fragment it wraps. As ``Not`` is a
subclass of ``Clean``, it will also sanitize the query.

This is generally only used internally. Most people prefer to use the
``SearchQuerySet.exclude`` method.

Example::

    sqs = SearchQuerySet().filter(author=Not('daniel'))


``AutoQuery``
-------------

.. class:: haystack.inputs.AutoQuery

``AutoQuery`` takes a more complex user query (one that includes simple,
standard query syntax bits) and forms a proper query out of it. It also
sanitizes that query using ``Clean`` to ensure the query doesn't break.

``AutoQuery`` handles regular words, NOT-ing words and extracting exact
phrases.

Example::

    # Against the main text field with an accidental ":" before "search".
    # Generates a query like ``haystack (NOT whoosh) "fast search"``
    sqs = SearchQuerySet().filter(content=AutoQuery('haystack -whoosh "fast :search"'))

    # Equivalent.
    sqs = SearchQuerySet().auto_query('haystack -whoosh "fast :search"')

    # Fielded.
    sqs = SearchQuerySet().filter(author=AutoQuery('daniel -day -lewis'))

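A rough, standalone approximation of the parsing involved (hypothetical code,
not Haystack's implementation, which also sanitizes via ``Clean``) looks
like::

    import re

    def split_auto_query(query):
        # Pull out quoted phrases first, then NOT-ed and plain words.
        phrases = re.findall(r'"([^"]*)"', query)
        remainder = re.sub(r'"[^"]*"', '', query).split()
        excluded = [word[1:] for word in remainder if word.startswith('-')]
        words = [word for word in remainder if not word.startswith('-')]
        return words, excluded, phrases

    print(split_auto_query('haystack -whoosh "fast search"'))
    # -> (['haystack'], ['whoosh'], ['fast search'])
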
``AltParser``
-------------

.. class:: haystack.inputs.AltParser

``AltParser`` lets you specify that a portion of the query should use a
separate parser in the search engine. This is search-engine specific, so it
may decrease the portability of your app.

Currently only supported under Solr.

Example::

    # DisMax.
    sqs = SearchQuerySet().filter(content=AltParser('dismax', 'haystack', qf='text', mm=1))

    # Prior to the spatial support, you could do...
    sqs = SearchQuerySet().filter(content=AltParser('dismax', 'haystack', qf='author', mm=1))


Creating Your Own Input Types
=============================

Building your own input type is relatively simple. All input types are simple
classes that provide an ``__init__`` and a ``prepare`` method.

``__init__`` may accept any args/kwargs, though typical use usually just
involves a query string.

The ``prepare`` method lets you alter the query the user provided before it
becomes part of the main query. It is lazy, called as late as possible, right
before the final query is built and shipped to the engine.

A full, if somewhat silly, example looks like::

    from haystack.inputs import Clean


    class NoShoutCaps(Clean):
        input_type_name = 'no_shout_caps'
        # This is the default & doesn't need to be specified.
        post_process = True

        def __init__(self, query_string, **kwargs):
            # Stash the original, in case you need it.
            self.original = query_string
            super(NoShoutCaps, self).__init__(query_string, **kwargs)

        def prepare(self, query_obj):
            # We get a reference to the current ``SearchQuery`` object this
            # will run against, in case we need backend-specific code.
            query_string = super(NoShoutCaps, self).prepare(query_obj)

            # Take that, capital letters!
            return query_string.lower()
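Stripped of the Haystack machinery, the lazy-prepare pattern above reduces to
this standalone sketch (hypothetical classes for illustration; the real base
class is ``haystack.inputs.Clean`` and also sanitizes the string)::

    class BaseInput:
        # Minimal stand-in for ``haystack.inputs.Clean``.
        def __init__(self, query_string, **kwargs):
            self.query_string = query_string
            self.kwargs = kwargs

        def prepare(self, query_obj=None):
            # The real class sanitizes here; this sketch just passes through.
            return self.query_string

    class NoShoutCaps(BaseInput):
        def prepare(self, query_obj=None):
            return super(NoShoutCaps, self).prepare(query_obj).lower()

    print(NoShoutCaps('STOP SHOUTING').prepare())  # -> stop shouting
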
.. _ref-installing-search-engines:
|
||||
|
||||
=========================
|
||||
Installing Search Engines
|
||||
=========================
|
||||
|
||||
Solr
|
||||
====
|
||||
|
||||
Official Download Location: http://www.apache.org/dyn/closer.cgi/lucene/solr/
|
||||
|
||||
Solr is Java but comes in a pre-packaged form that requires very little other
|
||||
than the JRE and Jetty. It's very performant and has an advanced featureset.
|
||||
Haystack suggests using Solr 3.5+, though it's possible to get it working on
|
||||
Solr 1.4 with a little effort. Installation is relatively simple::
|
||||
|
||||
curl -LO https://archive.apache.org/dist/lucene/solr/4.10.2/solr-4.10.2.tgz
|
||||
tar xvzf solr-4.10.2.tgz
|
||||
cd solr-4.10.2
|
||||
cd example
|
||||
java -jar start.jar
|
||||
|
||||
You'll need to revise your schema. You can generate this from your application
|
||||
(once Haystack is installed and setup) by running
|
||||
``./manage.py build_solr_schema``. Take the output from that command and place
|
||||
it in ``solr-4.10.2/example/solr/collection1/conf/schema.xml``. Then restart Solr.
|
||||
|
||||
.. note::
|
||||
``build_solr_schema`` uses a template to generate ``schema.xml``. Haystack
|
||||
provides a default template using some sensible defaults. If you would like
|
||||
to provide your own template, you will need to place it in
|
||||
``search_configuration/solr.xml``, inside a directory specified by your app's
|
||||
``TEMPLATE_DIRS`` setting. Examples::
|
||||
|
||||
/myproj/myapp/templates/search_configuration/solr.xml
|
||||
# ...or...
|
||||
/myproj/templates/search_configuration/solr.xml
|
||||
|
||||
You'll also need a Solr binding, ``pysolr``. The official ``pysolr`` package,
|
||||
distributed via PyPI, is the best version to use (2.1.0+). Place ``pysolr.py``
|
||||
somewhere on your ``PYTHONPATH``.
|
||||
|
||||
.. note::
|
||||
|
||||
``pysolr`` has its own dependencies that aren't covered by Haystack. See
|
||||
https://pypi.python.org/pypi/pysolr for the latest documentation.
|
||||
|
||||
More Like This
--------------

To enable the "More Like This" functionality in Haystack, you'll need
to enable the ``MoreLikeThisHandler``. Add the following line to your
``solrconfig.xml`` file within the ``config`` tag::

    <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />


Spelling Suggestions
--------------------

To enable the spelling suggestion functionality in Haystack, you'll need to
enable the ``SpellCheckComponent``.

The first thing to do is create a special field on your ``SearchIndex`` class
that mirrors the ``text`` field, but uses ``FacetCharField``. This disables
the post-processing that Solr does, which can mess up your suggestions.
Something like the following is suggested::

    class MySearchIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        # ... normal fields then...
        suggestions = indexes.FacetCharField()

        def prepare(self, obj):
            prepared_data = super(MySearchIndex, self).prepare(obj)
            prepared_data['suggestions'] = prepared_data['text']
            return prepared_data

Then, you enable it in Solr by adding the following line to your
``solrconfig.xml`` file within the ``config`` tag::

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

        <str name="queryAnalyzerFieldType">textSpell</str>

        <lst name="spellchecker">
            <str name="name">default</str>
            <str name="field">suggestions</str>
            <str name="spellcheckIndexDir">./spellchecker1</str>
            <str name="buildOnCommit">true</str>
        </lst>
    </searchComponent>

Then change your default handler from::

    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />

... to ...::

    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
        <arr name="last-components">
            <str>spellcheck</str>
        </arr>
    </requestHandler>

Be warned that the ``<str name="field">suggestions</str>`` portion will be
specific to your ``SearchIndex`` classes (in this case, assuming the main field
is called ``text``).


Elasticsearch
=============

Official Download Location: http://www.elasticsearch.org/download/

Elasticsearch is Java-based but comes in a pre-packaged form that requires very
little other than the JRE. It's also very performant, scales easily and has
an advanced featureset. Haystack requires at least version 0.90.0+.
Installation is best done using a package manager::

    # On Mac OS X...
    brew install elasticsearch

    # On Ubuntu...
    apt-get install elasticsearch

    # Then start via:
    elasticsearch -f -D es.config=<path to YAML config>

    # Example:
    elasticsearch -f -D es.config=/usr/local/Cellar/elasticsearch/0.90.0/config/elasticsearch.yml

You may have to alter the configuration to run on ``localhost`` when developing
locally. Modifications should be done in a YAML file, the stock one being
``config/elasticsearch.yml``::

    # Unicast Discovery (disable multicast)
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["127.0.0.1"]

    # Name your cluster here to whatever.
    # My machine is called "Venus", so...
    cluster:
      name: venus

    network:
      host: 127.0.0.1

    path:
      logs: /usr/local/var/log
      data: /usr/local/var/data

You'll also need an Elasticsearch binding: elasticsearch-py_ (**NOT**
``pyes``). Place ``elasticsearch`` somewhere on your ``PYTHONPATH``
(usually ``python setup.py install`` or ``pip install elasticsearch``).

.. _elasticsearch-py: http://pypi.python.org/pypi/elasticsearch/

.. note::

    Elasticsearch 1.0 is slightly backwards-incompatible, so you need to make
    sure you have the proper version of ``elasticsearch-py`` installed: releases
    with major version 1 (1.X.Y) are to be used with Elasticsearch 1.0 and
    later, while 0.4 releases are meant to work with Elasticsearch 0.90.X.

.. note::

    ``elasticsearch`` has its own dependencies that aren't covered by
    Haystack. You'll also need ``urllib3``.


Whoosh
======

Official Download Location: http://bitbucket.org/mchaput/whoosh/

Whoosh is pure Python, so it's a great option for getting started quickly and
for development, though it does work for small-scale live deployments. The
current recommended version is 1.3.1+. You can install via PyPI_ using
``sudo easy_install whoosh`` or ``sudo pip install whoosh``.

Note that, while capable otherwise, the Whoosh backend does not currently
support "More Like This" or faceting. Support for these features has recently
been added to Whoosh itself & may be present in a future release.

.. _PyPI: http://pypi.python.org/pypi/Whoosh/


Xapian
======

Official Download Location: http://xapian.org/download

Xapian is written in C++ so it requires compilation (unless your OS has a
package for it). Installation looks like::

    curl -O http://oligarchy.co.uk/xapian/1.2.18/xapian-core-1.2.18.tar.xz
    curl -O http://oligarchy.co.uk/xapian/1.2.18/xapian-bindings-1.2.18.tar.xz

    unxz xapian-core-1.2.18.tar.xz
    unxz xapian-bindings-1.2.18.tar.xz

    tar xvf xapian-core-1.2.18.tar
    tar xvf xapian-bindings-1.2.18.tar

    cd xapian-core-1.2.18
    ./configure
    make
    sudo make install

    cd ..
    cd xapian-bindings-1.2.18
    ./configure
    make
    sudo make install

Xapian is a third-party supported backend. It is not included in Haystack
proper due to licensing. To use it, you need both Haystack itself as well as
``xapian-haystack``. You can download the source from
http://github.com/notanumber/xapian-haystack/tree/master. Installation
instructions can be found on that page as well. The backend, written
by David Sauve (notanumber), fully implements the ``SearchQuerySet`` API and is
an excellent alternative to Solr.

.. _ref-management-commands:

===================
Management Commands
===================

Haystack comes with several management commands to make working with Haystack
easier.


``clear_index``
===============

The ``clear_index`` command wipes out your entire search index. Use with
caution. In addition to the standard management command options, it accepts the
following arguments::

    ``--noinput``:
        If provided, the interactive prompts are skipped and the index is
        unceremoniously wiped out.
    ``--verbosity``:
        Accepted but ignored.
    ``--using``:
        If provided, determines which connection should be used. Default is
        ``default``.
    ``--nocommit``:
        If provided, it will pass ``commit=False`` to the backend. This means
        that the update will not become immediately visible and will depend on
        another explicit commit or the backend's commit strategy to complete
        the update.

By default, this is an **INTERACTIVE** command and assumes that you do **NOT**
wish to delete the entire index.

.. note::

    The ``--nocommit`` argument is only supported by the Solr backend.

.. warning::

    Depending on the backend you're using, this may simply delete the entire
    directory, so be sure your ``HAYSTACK_CONNECTIONS[<alias>]['PATH']``
    setting is correctly pointed at just the index directory.


``update_index``
================

.. note::

    If you use the ``--start/--end`` flags on this command, you'll need to
    install dateutil_ to handle the datetime parsing.

    .. _dateutil: http://pypi.python.org/pypi/python-dateutil/1.5

The ``update_index`` command will freshen all of the content in your index. It
iterates through all indexed models and updates the records in the index. In
addition to the standard management command options, it accepts the following
arguments::

    ``--age``:
        Number of hours back to consider objects new. Useful for nightly
        reindexes (``--age=24``). Requires ``SearchIndexes`` to implement
        the ``get_updated_field`` method. Default is ``None``.
    ``--start``:
        The start date for indexing within. Can be any dateutil-parsable
        string; YYYY-MM-DDTHH:MM:SS is recommended. Requires ``SearchIndexes``
        to implement the ``get_updated_field`` method. Default is ``None``.
    ``--end``:
        The end date for indexing within. Can be any dateutil-parsable
        string; YYYY-MM-DDTHH:MM:SS is recommended. Requires ``SearchIndexes``
        to implement the ``get_updated_field`` method. Default is ``None``.
    ``--batch-size``:
        Number of items to index at once. Default is 1000.
    ``--remove``:
        Remove objects from the index that are no longer present in the
        database.
    ``--workers``:
        Allows for the use of multiple workers to parallelize indexing.
        Requires ``multiprocessing``.
    ``--verbosity``:
        If provided, dumps out more information about what's being done.

        * ``0`` = No output
        * ``1`` = Minimal output describing what models were indexed
          and how many records.
        * ``2`` = Full output, including everything from ``1`` plus output
          on each batch that is indexed, which is useful when debugging.
    ``--using``:
        If provided, determines which connection should be used. Default is
        ``default``.
    ``--nocommit``:
        If provided, it will pass ``commit=False`` to the backend. This means
        that the updates will not become immediately visible and will depend on
        another explicit commit or the backend's commit strategy to complete
        the update.

.. note::

    The ``--nocommit`` argument is only supported by the Solr and Elasticsearch
    backends.

Examples::

    # Update everything.
    ./manage.py update_index --settings=settings.prod

    # Update everything with lots of information about what's going on.
    ./manage.py update_index --settings=settings.prod --verbosity=2

    # Update everything, cleaning up after deleted models.
    ./manage.py update_index --remove --settings=settings.prod

    # Update everything changed in the last 2 hours.
    ./manage.py update_index --age=2 --settings=settings.prod

    # Update everything between Dec. 1, 2011 & Dec. 31, 2011.
    ./manage.py update_index --start='2011-12-01T00:00:00' --end='2011-12-31T23:59:59' --settings=settings.prod

    # Update just a couple apps.
    ./manage.py update_index blog auth comments --settings=settings.prod

    # Update just a single model (in a complex app).
    ./manage.py update_index auth.User --settings=settings.prod

    # Crazy Go-Nuts University
    ./manage.py update_index events.Event media news.Story --start='2011-01-01T00:00:00' --remove --using=hotbackup --workers=12 --verbosity=2 --settings=settings.prod

.. note::

    This command *ONLY* updates records in the index. It does *NOT* handle
    deletions unless the ``--remove`` flag is provided. You might consider
    a queue consumer if the memory requirements for ``--remove`` don't
    fit your needs. Alternatively, you can use the
    ``RealtimeSignalProcessor``, which will automatically handle deletions.

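As a quick illustration of the recommended ``--start``/``--end`` timestamp
shape, the same values can be parsed with the standard library alone (the
command itself uses ``dateutil``, which also accepts looser formats):

```python
from datetime import datetime

# The recommended YYYY-MM-DDTHH:MM:SS shape for --start/--end.
# strptime here only demonstrates the exact format; dateutil is
# more forgiving about the inputs it accepts.
start = datetime.strptime("2011-12-01T00:00:00", "%Y-%m-%dT%H:%M:%S")
end = datetime.strptime("2011-12-31T23:59:59", "%Y-%m-%dT%H:%M:%S")
assert start < end
```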
``rebuild_index``
=================

A shortcut for ``clear_index`` followed by ``update_index``. It accepts any/all
of the following arguments::

    ``--age``:
        Number of hours back to consider objects new. Useful for nightly
        reindexes (``--age=24``). Requires ``SearchIndexes`` to implement
        the ``get_updated_field`` method.
    ``--batch-size``:
        Number of items to index at once. Default is 1000.
    ``--site``:
        The site object to use when reindexing (like ``search_sites.mysite``).
    ``--noinput``:
        If provided, the interactive prompts are skipped and the index is
        unceremoniously wiped out.
    ``--remove``:
        Remove objects from the index that are no longer present in the
        database.
    ``--verbosity``:
        If provided, dumps out more information about what's being done.

        * ``0`` = No output
        * ``1`` = Minimal output describing what models were indexed
          and how many records.
        * ``2`` = Full output, including everything from ``1`` plus output
          on each batch that is indexed, which is useful when debugging.
    ``--using``:
        If provided, determines which connection should be used. Default is
        ``default``.
    ``--nocommit``:
        If provided, it will pass ``commit=False`` to the backend. This means
        that the update will not become immediately visible and will depend on
        another explicit commit or the backend's commit strategy to complete
        the update.

For when you really, really want a completely rebuilt index.


``build_solr_schema``
=====================

Once all of your ``SearchIndex`` classes are in place, this command can be used
to generate the XML schema Solr needs to handle the search data. It accepts the
following arguments::

    ``--filename``:
        If provided, directs output to a file instead of stdout.
    ``--using``:
        If provided, determines which connection should be used. Default is
        ``default``.

.. warning::

    This command does NOT update the ``schema.xml`` file for you. You either
    have to specify a ``--filename`` flag or have to copy-paste (or redirect)
    the output to the correct file. Haystack has no way of knowing where your
    Solr is set up (or if it's even on the same machine), hence the manual
    step.


``haystack_info``
=================

Provides some basic information about how Haystack is set up and what models it
is handling. It accepts no arguments. Useful when debugging or when using
Haystack-enabled third-party apps.

.. _ref-migration_from_1_to_2:

===========================================
Migrating From Haystack 1.X to Haystack 2.X
===========================================

Haystack introduced several backward-incompatible changes in the process of
moving from the 1.X series to the 2.X series. These were done to clean up the
API, to support new features & to clean up problems in 1.X. At a high level,
they consisted of:

* The removal of ``SearchSite`` & ``haystack.site``.
* The removal of ``handle_registrations`` & ``autodiscover``.
* The addition of multiple index support.
* The addition of ``SignalProcessors`` & the removal of ``RealTimeSearchIndex``.
* The removal/renaming of various settings.

This guide will help you make the changes needed to be compatible with Haystack
2.X.


Settings
========

Most prominently, the old way of specifying a backend & its settings has changed
to support the multiple index feature. A complete Haystack 1.X example might
look like::

    HAYSTACK_SEARCH_ENGINE = 'solr'
    HAYSTACK_SOLR_URL = 'http://localhost:9001/solr/default'
    HAYSTACK_SOLR_TIMEOUT = 60 * 5
    HAYSTACK_INCLUDE_SPELLING = True
    HAYSTACK_BATCH_SIZE = 100

    # Or...
    HAYSTACK_SEARCH_ENGINE = 'whoosh'
    HAYSTACK_WHOOSH_PATH = '/home/search/whoosh_index'
    HAYSTACK_WHOOSH_STORAGE = 'file'
    HAYSTACK_WHOOSH_POST_LIMIT = 128 * 1024 * 1024
    HAYSTACK_INCLUDE_SPELLING = True
    HAYSTACK_BATCH_SIZE = 100

    # Or...
    HAYSTACK_SEARCH_ENGINE = 'xapian'
    HAYSTACK_XAPIAN_PATH = '/home/search/xapian_index'
    HAYSTACK_INCLUDE_SPELLING = True
    HAYSTACK_BATCH_SIZE = 100

In Haystack 2.X, you can now supply as many backends as you like, so all of the
above settings can now be active at the same time. A translated set of settings
would look like::

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
            'URL': 'http://localhost:9001/solr/default',
            'TIMEOUT': 60 * 5,
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
        },
        'autocomplete': {
            'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
            'PATH': '/home/search/whoosh_index',
            'STORAGE': 'file',
            'POST_LIMIT': 128 * 1024 * 1024,
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
        },
        'slave': {
            'ENGINE': 'xapian_backend.XapianEngine',
            'PATH': '/home/search/xapian_index',
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
        },
    }

You are required to have at least one connection listed within
``HAYSTACK_CONNECTIONS``; it must be named ``default`` & it must have a valid
``ENGINE`` within it. The bare minimum looks like::

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.simple_backend.SimpleEngine'
        }
    }

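The ``default``/``ENGINE`` requirement above can be sketched as plain Python.
The ``check_connections`` helper is purely illustrative; it is not Haystack's
actual validation code:

```python
def check_connections(connections):
    # Mirrors the rule above: a 'default' alias must exist and every
    # alias needs an 'ENGINE' key. Hypothetical helper, not Haystack API.
    if 'default' not in connections:
        raise ValueError("HAYSTACK_CONNECTIONS must define a 'default' alias")
    for alias, options in connections.items():
        if 'ENGINE' not in options:
            raise ValueError("connection %r is missing 'ENGINE'" % alias)

# The bare-minimum settings shown above pass the check.
check_connections({
    'default': {
        'ENGINE': 'haystack.backends.simple_backend.SimpleEngine'
    }
})
```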
The key for each backend is an identifier you use to describe the backend within
your app. You should refer to the :ref:`ref-multiple_index` documentation for
more information on using the new multiple indexes & routing features.

Also note that the ``ENGINE`` setting has changed from a lowercase "short name"
of the engine to a full path to a new ``Engine`` class within the backend.
Available options are:

* ``haystack.backends.solr_backend.SolrEngine``
* ``haystack.backends.whoosh_backend.WhooshEngine``
* ``haystack.backends.simple_backend.SimpleEngine``

Additionally, the following settings were outright removed & will generate
an exception if found:

* ``HAYSTACK_SITECONF`` - Remove this setting & the file it pointed to.
* ``HAYSTACK_ENABLE_REGISTRATIONS``
* ``HAYSTACK_INCLUDE_SPELLING``


Backends
========

The ``dummy`` backend was outright removed from Haystack, as it served very
little use after the ``simple`` (pure-ORM-powered) backend was introduced.

If you wrote a custom backend, please refer to the "Custom Backends" section
below.


Indexes
=======

The other major changes affect the ``SearchIndex`` class. As the concepts of
``haystack.site`` & ``SearchSite`` are gone, you'll need to modify your indexes.

A Haystack 1.X index might've looked like::

    import datetime
    from haystack.indexes import *
    from haystack import site
    from myapp.models import Note


    class NoteIndex(SearchIndex):
        text = CharField(document=True, use_template=True)
        author = CharField(model_attr='user')
        pub_date = DateTimeField(model_attr='pub_date')

        def get_queryset(self):
            """Used when the entire index for model is updated."""
            return Note.objects.filter(pub_date__lte=datetime.datetime.now())


    site.register(Note, NoteIndex)

A converted Haystack 2.X index should look like::

    import datetime
    from haystack import indexes
    from myapp.models import Note


    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

        def index_queryset(self, using=None):
            """Used when the entire index for model is updated."""
            return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())

Note that the import of ``site`` & the registration statements are gone. Newly
added is the ``NoteIndex.get_model`` method. This is a **required** method &
should simply return the ``Model`` class the index is for.

There's also a new, additional class added to the ``class`` definition. The
``indexes.Indexable`` class is a simple mixin that serves to identify the
classes Haystack should automatically discover & use. If you have a custom
base class (say ``QueuedSearchIndex``) that other indexes inherit from, simply
leave the ``indexes.Indexable`` off that declaration & Haystack won't try to
use it.

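That discovery rule can be sketched in isolation: only classes that mix in
``Indexable`` are picked up. This is a simplified stand-in for Haystack's real
app-by-app discovery, shown just to illustrate the base-class behavior:

```python
class SearchIndex(object):
    pass


class Indexable(object):
    pass


# A shared base class: it inherits SearchIndex but NOT Indexable,
# so discovery skips it.
class QueuedSearchIndex(SearchIndex):
    pass


# A concrete index: it mixes in Indexable, so discovery uses it.
class NoteIndex(QueuedSearchIndex, Indexable):
    pass


candidates = [QueuedSearchIndex, NoteIndex]
discovered = [cls for cls in candidates if issubclass(cls, Indexable)]
# discovered == [NoteIndex]
```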
Additionally, the name of the ``document=True`` field is now enforced to be
``text`` across all indexes. If you need it named something else, you should
set the ``HAYSTACK_DOCUMENT_FIELD`` setting. For example::

    HAYSTACK_DOCUMENT_FIELD = 'pink_polka_dot'

Also, the ``index_queryset`` method should supplant the ``get_queryset``
method. The latter was present in the Haystack 1.2.X series (with a deprecation
warning in 1.2.4+) but has been removed in Haystack v2.

Finally, if you were unregistering other indexes before, you should make use of
the new ``EXCLUDED_INDEXES`` setting available in each backend's settings. It
should be a list of strings that contain the Python import path to the indexes
that should not be loaded & used. For example::

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
            'URL': 'http://localhost:9001/solr/default',
            'EXCLUDED_INDEXES': [
                # Imagine that these indexes exist. They don't.
                'django.contrib.auth.search_indexes.UserIndex',
                'third_party_blog_app.search_indexes.EntryIndex',
            ]
        }
    }

This allows for reliable swapping of the index that handles a model without
relying on correct import order.


Removal of ``RealTimeSearchIndex``
==================================

Use of the ``haystack.indexes.RealTimeSearchIndex`` is no longer valid. It has
been removed in favor of ``RealtimeSignalProcessor``. To migrate, first change
the inheritance of all your ``RealTimeSearchIndex`` subclasses to use
``SearchIndex`` instead::

    # Old.
    class MySearchIndex(indexes.RealTimeSearchIndex, indexes.Indexable):
        # ...


    # New.
    class MySearchIndex(indexes.SearchIndex, indexes.Indexable):
        # ...

Then update your settings to enable use of the ``RealtimeSignalProcessor``::

    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'


Done!
=====

For most basic uses of Haystack, this is all that is necessary to work with
Haystack 2.X. You should rebuild your index if needed & test your new setup.


Advanced Uses
=============

Swapping Backend
----------------

If you were manually swapping the ``SearchQuery`` or ``SearchBackend`` being
used by ``SearchQuerySet`` in the past, it's now preferable to simply set up
another connection & use the ``SearchQuerySet.using`` method to select that
connection instead.

Also, if you were manually instantiating ``SearchBackend`` or ``SearchQuery``,
it's now preferable to rely on the connection's engine to return the right
thing. For example::

    from haystack import connections
    backend = connections['default'].get_backend()
    query = connections['default'].get_query()


Custom Backends
---------------

If you had written a custom ``SearchBackend`` and/or custom ``SearchQuery``,
there's a little more work needed to be Haystack 2.X compatible.

You should, but don't have to, rename your ``SearchBackend`` & ``SearchQuery``
classes to be more descriptive/less collide-y. For example,
``solr_backend.SearchBackend`` became ``solr_backend.SolrSearchBackend``. This
prevents non-namespaced imports from stomping on each other.

You need to add a new class to your backend, subclassing ``BaseEngine``. This
allows specifying what ``backend`` & ``query`` should be used on a connection
with less duplication/naming trickery. It goes at the bottom of the file (so
that the classes are defined above it) and should look like::

    from haystack.backends import BaseEngine
    from haystack.backends.solr_backend import SolrSearchQuery

    # Code then...

    class MyCustomSolrEngine(BaseEngine):
        # Use our custom backend.
        backend = MySolrBackend
        # Use the built-in Solr query.
        query = SolrSearchQuery

Your ``HAYSTACK_CONNECTIONS['default']['ENGINE']`` should then point to the
full Python import path to your new ``BaseEngine`` subclass.

Finally, you will likely have to adjust the ``SearchBackend.__init__`` &
``SearchQuery.__init__``, as they have changed significantly. Please refer to
the commits for those backends.

.. _ref-multiple_index:

================
Multiple Indexes
================

Much like Django's `multiple database support`_, Haystack has "multiple index"
support. This allows you to talk to several different engines at the same time.
It enables things like master-slave setups, multiple language indexing,
separate indexes for general search & autocomplete as well as other options.

.. _`multiple database support`: http://docs.djangoproject.com/en/1.3/topics/db/multi-db/


Specifying Available Connections
================================

You can supply as many backends as you like, each with a descriptive name. A
complete setup that accesses all backends might look like::

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
            'URL': 'http://localhost:9001/solr/default',
            'TIMEOUT': 60 * 5,
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
            'SILENTLY_FAIL': True,
        },
        'autocomplete': {
            'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
            'PATH': '/home/search/whoosh_index',
            'STORAGE': 'file',
            'POST_LIMIT': 128 * 1024 * 1024,
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
            'SILENTLY_FAIL': True,
        },
        'slave': {
            'ENGINE': 'xapian_backend.XapianEngine',
            'PATH': '/home/search/xapian_index',
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
            'SILENTLY_FAIL': True,
        },
        'db': {
            'ENGINE': 'haystack.backends.simple_backend.SimpleEngine',
            'SILENTLY_FAIL': True,
        }
    }

You are required to have at least one connection listed within
``HAYSTACK_CONNECTIONS``; it must be named ``default`` & it must have a valid
``ENGINE`` within it.


Management Commands
===================

All management commands that manipulate data use **ONLY** one connection at a
time. By default, they use the ``default`` index but accept a ``--using`` flag
to specify a different connection. For example::

    ./manage.py rebuild_index --noinput --using=whoosh


Automatic Routing
=================

To make the selection of the correct index easier, Haystack (like Django) has
the concept of "routers". All provided routers are checked whenever a read or
write happens, stopping at the first router that knows how to handle it.

Haystack ships with a ``DefaultRouter`` enabled. It looks like::

    class DefaultRouter(BaseRouter):
        def for_read(self, **hints):
            return DEFAULT_ALIAS

        def for_write(self, **hints):
            return DEFAULT_ALIAS

On a read (when a search query is executed), the ``DefaultRouter.for_read``
method is checked & returns the ``DEFAULT_ALIAS`` (which is ``default``),
telling whatever requested it that it should perform the query against the
``default`` connection. The same process is followed for writes.

If the ``for_read`` or ``for_write`` method returns ``None``, that indicates
that the current router can't handle the data. The next router is then checked.

The ``hints`` passed can be anything that helps the router make a decision. This
data should always be considered optional & be guarded against. Currently,
``for_write`` receives an ``index`` option (pointing to the ``SearchIndex``
calling it) while ``for_read`` may receive ``models`` (being a list of ``Model``
classes the ``SearchQuerySet`` may be looking at).

You may provide as many routers as you like by overriding the
``HAYSTACK_ROUTERS`` setting. For example::

    HAYSTACK_ROUTERS = ['myapp.routers.MasterRouter', 'myapp.routers.SlaveRouter', 'haystack.routers.DefaultRouter']


Master-Slave Example
--------------------

The ``MasterRouter`` & ``SlaveRouter`` might look like::

    from haystack import routers


    class MasterRouter(routers.BaseRouter):
        def for_write(self, **hints):
            return 'master'

        def for_read(self, **hints):
            return None


    class SlaveRouter(routers.BaseRouter):
        def for_write(self, **hints):
            return None

        def for_read(self, **hints):
            return 'slave'

The observant might notice that since the methods don't overlap, this could be
combined into one ``Router`` like so::

    from haystack import routers


    class MasterSlaveRouter(routers.BaseRouter):
        def for_write(self, **hints):
            return 'master'

        def for_read(self, **hints):
            return 'slave'

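The first-non-``None``-wins chain described earlier can be sketched as plain
Python. The ``resolve_alias`` helper is a simplified stand-in for Haystack's
real router machinery, shown only to illustrate the resolution rule:

```python
class MasterRouter(object):
    def for_write(self, **hints):
        return 'master'

    def for_read(self, **hints):
        return None


class SlaveRouter(object):
    def for_write(self, **hints):
        return None

    def for_read(self, **hints):
        return 'slave'


def resolve_alias(routers, action, **hints):
    # Walk the router chain in order; the first non-None answer wins.
    for router in routers:
        alias = getattr(router, 'for_%s' % action)(**hints)
        if alias is not None:
            return alias
    # No router claimed it; fall back to the default connection.
    return 'default'


routers = [MasterRouter(), SlaveRouter()]
assert resolve_alias(routers, 'write') == 'master'
assert resolve_alias(routers, 'read') == 'slave'
```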
Manually Selecting
==================

There may be times when automatic selection of the correct index is undesirable,
such as when fixing erroneous data in an index or when you know exactly where
data should be located.

For this, the ``SearchQuerySet`` class allows for manually selecting the index
via the ``SearchQuerySet.using`` method::

    from haystack.query import SearchQuerySet

    # Uses the routers' opinion.
    sqs = SearchQuerySet().auto_query('banana')

    # Forces the default.
    sqs = SearchQuerySet().using('default').auto_query('banana')

    # Forces the slave connection (presuming it was set up).
    sqs = SearchQuerySet().using('slave').auto_query('banana')

.. warning::

    Note that the models a ``SearchQuerySet`` is trying to pull from must all
    come from the same index. Haystack is not able to combine search queries
    against different indexes.


Custom Index Selection
======================

If a specific backend has been selected, the ``SearchIndex.index_queryset`` and
``SearchIndex.read_queryset`` methods will receive the backend name, giving
indexes the opportunity to customize the returned queryset.

For example, a site which uses separate indexes for recent items and older
content might define ``index_queryset`` to filter the items based on date::

    def index_queryset(self, using=None):
        qs = Note.objects.all()
        archive_limit = datetime.datetime.now() - datetime.timedelta(days=90)

        if using == "archive":
            return qs.filter(pub_date__lte=archive_limit)
        else:
            return qs.filter(pub_date__gte=archive_limit)


Multi-lingual Content
---------------------

Most search engines require you to set the language at the index level. For
example, a multi-lingual site using Solr can use `multiple cores <http://wiki.apache.org/solr/CoreAdmin>`_ and corresponding Haystack
backends using the language name. Under this scenario, queries are simple::

    sqs = SearchQuerySet().using(lang).auto_query(…)

During index updates, the Index's ``index_queryset`` method will need to filter
the items to avoid sending the wrong content to the search engine::

    def index_queryset(self, using=None):
        return Post.objects.filter(language=using)

@ -0,0 +1,98 @@
|
|||
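The per-language backends themselves are declared in ``HAYSTACK_CONNECTIONS``.
A minimal sketch, assuming Solr with one core per language (the URLs and core
names ``myapp-en``/``myapp-fr`` here are hypothetical):

```python
# settings.py (sketch) -- one Haystack connection per language.
# The Solr URLs and core names below are illustrative assumptions.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/myapp-en',
    },
    'en': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/myapp-en',
    },
    'fr': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/myapp-fr',
    },
}
```

With a layout like this, ``SearchQuerySet().using('fr')`` talks directly to the
French core.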
.. _ref-other_apps:

=============================
Haystack-Related Applications
=============================

Sub Apps
========

These are apps that build on top of the infrastructure provided by Haystack,
essentially extending what it can do.

queued_search
-------------

http://github.com/toastdriven/queued_search (2.X compatible)

Provides a queue-based setup as an alternative to ``RealtimeSignalProcessor`` or
constantly running the ``update_index`` command. Useful for high-load, short
update time situations.

celery-haystack
---------------

https://github.com/jezdez/celery-haystack (1.X and 2.X compatible)

Also provides a queue-based setup, this time centered around Celery. Useful for
keeping the index fresh per model instance, or via the included task that calls
the ``update_index`` management command.

haystack-rqueue
---------------

https://github.com/mandx/haystack-rqueue (2.X compatible)

Also provides a queue-based setup, this time centered around RQ. Useful
for keeping the index fresh using ``./manage.py rqworker``.

django-celery-haystack
----------------------

https://github.com/mixcloud/django-celery-haystack-SearchIndex

Another queue-based setup, also centered around Celery. Useful for keeping the
index fresh.

saved_searches
--------------

http://github.com/toastdriven/saved_searches (2.X compatible)

Adds personalization to search. Retains a history of queries run by the various
users on the site (including anonymous users). This can be used to present the
user with their search history and provide the most popular/most recent queries
on the site.

saved-search
------------

https://github.com/DirectEmployers/saved-search

An alternate take on persisting user searches, this has a stronger focus on
locale-based searches as well as further integration.

haystack-static-pages
---------------------

http://github.com/trapeze/haystack-static-pages

Provides a simple way to index flat (non-model-based) content on your site.
The management command that comes with it can crawl all pertinent pages on
your site and add them to search.

django-tumbleweed
-----------------

http://github.com/mcroydon/django-tumbleweed

Provides a tumblelog-like view of any/all Haystack-enabled models on your
site. Useful for presenting date-based views of search data. Attempts to avoid
the database completely where possible.


Haystack-Enabled Apps
=====================

These are reusable apps that ship with ``SearchIndexes``, suitable for quick
integration with Haystack.

* django-faq (freq. asked questions app) - http://github.com/benspaulding/django-faq
* django-essays (blog-like essay app) - http://github.com/bkeating/django-essays
* gtalug (variety of apps) - http://github.com/myles/gtalug
* sciencemuseum (science museum open data) - http://github.com/simonw/sciencemuseum
* vz-wiki (wiki) - http://github.com/jobscry/vz-wiki
* ffmff (events app) - http://github.com/stefreak/ffmff
* Dinette (forums app) - http://github.com/uswaretech/Dinette
* fiftystates_site (site) - http://github.com/sunlightlabs/fiftystates_site
* Open-Knesset (site) - http://github.com/ofri/Open-Knesset

.. _ref-python3:

================
Python 3 Support
================

As of v2.1.0, Haystack supports both Python 2 & Python 3 within the same
codebase. This builds on top of what `six`_ & `Django`_ provide.

No changes are required for anyone running an existing Haystack installation.
The API is completely backward-compatible, so you should be able to run your
existing software without modification.

Virtually all tests pass under both Python 2 & 3, with a small number of
expected failures under Python 3 (typically related to ordering; see below).

.. _`six`: http://pythonhosted.org/six/
.. _`Django`: https://docs.djangoproject.com/en/1.5/topics/python3/#str-and-unicode-methods


Supported Backends
==================

The following backends are fully supported under Python 3. However, you may
need to update these dependencies if you have a pre-existing setup.

* Solr (pysolr>=3.1.0)
* Elasticsearch

Notes
=====

Testing
-------

If you were testing things such as the query generated by a given
``SearchQuerySet`` or how your forms would render, be aware that under Python
3.3.2+ `hash randomization`_ is in effect, which means that the ordering of
dictionaries is no longer consistent, even on the same platform.

Haystack took the approach of abandoning assertions about the entire
structure. Instead, we either simply assert that the new object contains the
right things, or wrap the result in ``sorted(...)`` to ensure a stable order.
It is recommended you take a similar approach.

.. _`hash randomization`: http://docs.python.org/3/whatsnew/3.3.html#builtin-functions-and-types
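For instance, a sketch of an order-insensitive assertion (the field names here
are only illustrative):

```python
# Hash randomization means set/dict iteration order can change between runs,
# so assert on sorted contents rather than on a serialized structure.
indexed_fields = {'text', 'author', 'pub_date'}  # e.g. gathered from an index

assert sorted(indexed_fields) == ['author', 'pub_date', 'text']
```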
.. _ref-rich_content_extraction:

=======================
Rich Content Extraction
=======================

For some projects it is desirable to index text content which is stored in
structured files such as PDFs, Microsoft Office documents, images, etc.
Currently only Solr's `ExtractingRequestHandler`_ is directly supported by
Haystack, but the approach below could be used with any backend which supports
this feature.

.. _`ExtractingRequestHandler`: http://wiki.apache.org/solr/ExtractingRequestHandler

Extracting Content
==================

:meth:`SearchBackend.extract_file_contents` accepts a file or file-like object
and returns a dictionary containing two keys: ``metadata`` and ``contents``. The
``contents`` value will be a string containing all of the text which the backend
managed to extract from the file contents. ``metadata`` will always be a
dictionary but the keys and values will vary based on the underlying extraction
engine and the type of file provided.

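A sketch of the shape of that return value (the metadata keys shown are
hypothetical examples; the actual keys depend on the extraction engine and
file type):

```python
# Illustrative return value of extract_file_contents(); the metadata keys
# below are examples only and vary by extraction engine and file type.
extracted = {
    'contents': 'The quick brown fox jumped over the lazy dog.',
    'metadata': {
        'Content-Type': ['application/pdf'],
        'Author': ['Jane Doe'],
    },
}

assert set(extracted) == {'contents', 'metadata'}
```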
Indexing Extracted Content
==========================

Generally you will want to include the extracted text in your main document
field along with everything else specified in your search template. This example
shows how to override a hypothetical ``FileIndex``'s ``prepare`` method to
include the extracted content along with information retrieved from the
database::

    def prepare(self, obj):
        data = super(FileIndex, self).prepare(obj)

        # This could also be a regular Python open() call, a StringIO instance
        # or the result of opening a URL. Note that due to a library limitation
        # file_obj must have a .name attribute even if you need to set one
        # manually before calling extract_file_contents:
        file_obj = obj.the_file.open()

        extracted_data = self.backend.extract_file_contents(file_obj)

        # Now we'll finally perform the template processing to render the
        # text field with *all* of our metadata visible for templating:
        t = loader.select_template(('search/indexes/myapp/file_text.txt', ))
        data['text'] = t.render(Context({'object': obj,
                                         'extracted': extracted_data}))

        return data

This allows you to insert the extracted text at the appropriate place in your
template, modified or intermixed with database content as appropriate:

.. code-block:: html+django

    {{ object.title }}
    {{ object.owner.name }}

    …

    {% for k, v in extracted.metadata.items %}
        {% for val in v %}
            {{ k }}: {{ val|safe }}
        {% endfor %}
    {% endfor %}

    {{ extracted.contents|striptags|safe }}

.. _ref-running-tests:

=============
Running Tests
=============

Everything
==========

The simplest way to get up and running with Haystack's tests is to run::

    python setup.py test

This installs all of the backend libraries & all dependencies for getting the
tests going and runs the tests. You will still have to set up search servers
(for running the Solr tests, the spatial Solr tests & the Elasticsearch tests).


Cherry-Picked
=============

If you'd rather not run all the tests, run only the backends you need, since
tests for backends that are not running will be skipped.

``Haystack`` is maintained with all tests passing at all times, so if you
receive any errors during testing, please check your setup and file a report if
the errors persist.

To run just a portion of the tests you can use the script ``run_tests.py`` and
just specify the files or directories you wish to run, for example::

    cd test_haystack
    ./run_tests.py whoosh_tests test_loading.py

The ``run_tests.py`` script is just a tiny wrapper around the nose_ library and
any options you pass to it will be passed on, including ``--help`` to get a
list of possible options::

    cd test_haystack
    ./run_tests.py --help

.. _nose: https://nose.readthedocs.org/en/latest/

Configuring Solr
================

Haystack assumes that you have a Solr server running on port ``9001`` which
uses the schema and configuration provided in the
``test_haystack/solr_tests/server/`` directory. For convenience, a script is
provided which will download, configure and start a test Solr server::

    test_haystack/solr_tests/server/start-solr-test-server.sh

If no server is found, all Solr-related tests will be skipped.

Configuring Elasticsearch
=========================

The test suite will try to connect to Elasticsearch on port ``9200``. If no
server is found, all Elasticsearch tests will be skipped. Note that the tests
are destructive: during the teardown phase they will wipe the cluster clean,
so make sure you don't run them against an instance with data you wish to keep.

If you want to run the GeoDjango tests you may need to review the
`GeoDjango GEOS and GDAL settings`_ before running these commands::

    cd test_haystack
    ./run_tests.py elasticsearch_tests

.. _GeoDjango GEOS and GDAL settings: https://docs.djangoproject.com/en/1.7/ref/contrib/gis/install/geolibs/#geos-library-path

.. _ref-searchbackend-api:

=====================
``SearchBackend`` API
=====================

.. class:: SearchBackend(connection_alias, **connection_options)

The ``SearchBackend`` class handles interaction directly with the backend. The
search query it performs is usually fed to it from a ``SearchQuery`` class that
has been built for that backend.

This class must be at least partially implemented on a per-backend basis and
is usually accompanied by a ``SearchQuery`` class within the same module.

Unless you are writing a new backend, it is unlikely you need to directly
access this class.


Method Reference
================

``update``
----------

.. method:: SearchBackend.update(self, index, iterable)

    Updates the backend when given a ``SearchIndex`` and a collection of
    documents.

    This method MUST be implemented by each backend, as it will be highly
    specific to each one.

``remove``
----------

.. method:: SearchBackend.remove(self, obj_or_string)

    Removes a document/object from the backend. Can be either a model
    instance or the identifier (i.e. ``app_name.model_name.id``) in the
    event the object no longer exists.

    This method MUST be implemented by each backend, as it will be highly
    specific to each one.

``clear``
---------

.. method:: SearchBackend.clear(self, models=[])

    Clears the backend of all documents/objects for a collection of models.

    This method MUST be implemented by each backend, as it will be highly
    specific to each one.

``search``
----------

.. method:: SearchBackend.search(self, query_string, sort_by=None, start_offset=0, end_offset=None, fields='', highlight=False, facets=None, date_facets=None, query_facets=None, narrow_queries=None, spelling_query=None, limit_to_registered_models=None, result_class=None, **kwargs)

    Takes a query to search on and returns a dictionary.

    The query should be a string that is appropriate syntax for the backend.

    The returned dictionary should contain the keys ``results`` and ``hits``.
    The ``results`` value should be an iterable of populated ``SearchResult``
    objects. The ``hits`` value should be an integer count of the number of
    matched results the search backend found.

    This method MUST be implemented by each backend, as it will be highly
    specific to each one.

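As a sketch of the expected return shape (the results here are plain dicts
standing in for populated ``SearchResult`` objects):

```python
# Stand-in for SearchBackend.search(): illustrates only the
# {'results', 'hits'} contract; a real backend returns SearchResult instances.
def fake_search(query_string):
    matched = [
        {'pk': 1, 'score': 1.5},
        {'pk': 7, 'score': 0.8},
    ]
    return {
        'results': matched,    # iterable of result objects
        'hits': len(matched),  # integer count of matches
    }

response = fake_search('banana')
assert response['hits'] == len(response['results'])
```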
``extract_file_contents``
-------------------------

.. method:: SearchBackend.extract_file_contents(self, file_obj)

    Performs text extraction on the provided file or file-like object. Returns
    either ``None`` or a dictionary containing the keys ``contents`` and
    ``metadata``. The ``contents`` field will always contain the extracted text
    content returned by the underlying search engine, but ``metadata`` may vary
    considerably based on the backend and the input file.

``prep_value``
--------------

.. method:: SearchBackend.prep_value(self, value)

    Hook to give the backend a chance to prep an attribute value before
    sending it to the search engine.

    By default, it simply forces the value to unicode.

``more_like_this``
------------------

.. method:: SearchBackend.more_like_this(self, model_instance, additional_query_string=None, result_class=None)

    Takes a model object and returns results the backend thinks are similar.

    This method MUST be implemented by each backend, as it will be highly
    specific to each one.

``build_schema``
----------------

.. method:: SearchBackend.build_schema(self, fields)

    Takes a dictionary of fields and returns schema information.

    This method MUST be implemented by each backend, as it will be highly
    specific to each one.

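As an illustration of what "schema information" can look like, here is a toy
version modeled loosely on the Solr backend, which returns the document field's
name plus a list of per-field dictionaries. The dictionary keys below are
illustrative assumptions, not a fixed contract, and the field objects are
represented as plain dicts:

```python
# Toy build_schema(): fields are plain dicts standing in for SearchField
# objects; the returned row keys are illustrative only.
def build_schema(fields):
    content_field_name = ''
    schema = []
    for name, opts in fields.items():
        if opts.get('document'):
            content_field_name = name
        schema.append({
            'field_name': name,
            'type': opts.get('type', 'text'),
            'indexed': opts.get('indexed', True),
            'stored': opts.get('stored', True),
        })
    return content_field_name, schema

name, rows = build_schema({'text': {'document': True},
                           'pub_date': {'type': 'date'}})
assert name == 'text'
assert len(rows) == 2
```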
``build_models_list``
---------------------

.. method:: SearchBackend.build_models_list(self)

    Builds a list of models for searching.

    The ``search`` method should use this and the ``django_ct`` field to
    narrow the results (unless the user indicates not to). This helps ignore
    any results that are not currently handled models and ensures
    consistent caching.

.. _ref-searchfield-api:

===================
``SearchField`` API
===================

.. class:: SearchField

The ``SearchField`` and its subclasses provide a way to declare what data
you're interested in indexing. They are used with ``SearchIndexes``, much like
``forms.*Field`` are used within forms or ``models.*Field`` within models.

They provide both the means for storing data in the index, as well as preparing
the data before it's placed in the index. Haystack uses all fields from all
``SearchIndex`` classes to determine what the engine's index schema ought to
look like.

In practice, you'll likely never actually use the base ``SearchField``, as the
subclasses are much better at handling real data.


Subclasses
==========

Included with Haystack are the following field types:

* ``BooleanField``
* ``CharField``
* ``DateField``
* ``DateTimeField``
* ``DecimalField``
* ``EdgeNgramField``
* ``FloatField``
* ``IntegerField``
* ``LocationField``
* ``MultiValueField``
* ``NgramField``

And equivalent faceted versions:

* ``FacetBooleanField``
* ``FacetCharField``
* ``FacetDateField``
* ``FacetDateTimeField``
* ``FacetDecimalField``
* ``FacetFloatField``
* ``FacetIntegerField``
* ``FacetMultiValueField``

.. note::

    There is no faceted variant of the n-gram fields. Because of how the engine
    generates n-grams, faceting on these field types (``NgramField`` &
    ``EdgeNgramField``) would make very little sense.


Usage
=====

While ``SearchField`` objects can be used on their own, they're generally used
within a ``SearchIndex``. You use them in a declarative manner, just like
fields in ``django.forms.Form`` or ``django.db.models.Model`` objects. For
example::

    from haystack import indexes
    from myapp.models import Note


    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

This will hook up those fields with the index and, when updating a ``Model``
object, pull the relevant data out and prepare it for storage in the index.


Field Options
=============

``default``
-----------

.. attribute:: SearchField.default

    Provides a means for specifying a fallback value in the event that no data
    is found for the field. Can be either a value or a callable.

``document``
------------

.. attribute:: SearchField.document

    A boolean flag that indicates which of the fields in the ``SearchIndex``
    ought to be the primary field for searching within. Default is ``False``.

    .. note::

        Only one field can be marked as the ``document=True`` field, so you
        should standardize this name and the format of the field between all
        of your ``SearchIndex`` classes.

``indexed``
-----------

.. attribute:: SearchField.indexed

    A boolean flag for indicating whether or not the data from this field will
    be searchable within the index. Default is ``True``.

    The companion of this option is ``stored``.

``index_fieldname``
-------------------

.. attribute:: SearchField.index_fieldname

    The ``index_fieldname`` option allows you to force the name of the field
    in the index. This does not change how Haystack refers to the field. This
    is useful when using Solr's dynamic attributes or when integrating with
    other external software.

    Default is the variable name of the field within the ``SearchIndex``.

``model_attr``
--------------

.. attribute:: SearchField.model_attr

    The ``model_attr`` option is a shortcut for preparing data. Rather than
    having to manually fetch data out of a ``Model``, ``model_attr`` allows
    you to specify a string that will automatically pull data out for you. For
    example::

        # Automatically looks within the model and populates the field with
        # the ``last_name`` attribute.
        author = CharField(model_attr='last_name')

    It also handles callables::

        # On a ``User`` object, pulls the full name as pieced together by the
        # ``get_full_name`` method.
        author = CharField(model_attr='get_full_name')

    And can look through relations::

        # Pulls the ``bio`` field from a ``UserProfile`` object that has a
        # ``OneToOneField`` relationship to a ``User`` object.
        biography = CharField(model_attr='user__profile__bio')

``null``
--------

.. attribute:: SearchField.null

    A boolean flag for indicating whether or not it's permissible for the
    field not to contain any data. Default is ``False``.

    .. note::

        Unlike Django's database layer, which injects a ``NULL`` into the
        database when a field is marked nullable, ``null=True`` will actually
        exclude that field from being included with the document. This is more
        efficient for the search engine to deal with.

``stored``
----------

.. attribute:: SearchField.stored

    A boolean flag for indicating whether or not the data from this field will
    be stored within the index. Default is ``True``.

    This is useful for pulling data out of the index along with the search
    result in order to save on hits to the database.

    The companion of this option is ``indexed``.

``template_name``
-----------------

.. attribute:: SearchField.template_name

    Allows you to override the name of the template to use when preparing
    data. By default, the data templates for fields are located within your
    ``TEMPLATE_DIRS`` under a path like
    ``search/indexes/{app_label}/{model_name}_{field_name}.txt``. This option
    lets you override that path (though still within ``TEMPLATE_DIRS``).

    Example::

        bio = CharField(use_template=True, template_name='myapp/data/bio.txt')

    You can also provide a list of templates, as ``loader.select_template`` is
    used under the hood.

    Example::

        bio = CharField(use_template=True, template_name=['myapp/data/bio.txt', 'myapp/bio.txt', 'bio.txt'])

``use_template``
----------------

.. attribute:: SearchField.use_template

    A boolean flag for indicating whether or not a field should prepare its
    data via a data template. Default is ``False``.

    Data templates are extremely useful, as they let you easily tie together
    different parts of the ``Model`` (and potentially related models). This
    leads to better search results with very little effort.


Method Reference
================

``__init__``
------------

.. method:: SearchField.__init__(self, model_attr=None, use_template=False, template_name=None, document=False, indexed=True, stored=True, faceted=False, default=NOT_PROVIDED, null=False, index_fieldname=None, facet_class=None, boost=1.0, weight=None)

    Instantiates a fresh ``SearchField`` instance.

``has_default``
---------------

.. method:: SearchField.has_default(self)

    Returns a boolean of whether this field has a default value.

``prepare``
-----------

.. method:: SearchField.prepare(self, obj)

    Takes data from the provided object and prepares it for storage in the
    index.

``prepare_template``
--------------------

.. method:: SearchField.prepare_template(self, obj)

    Flattens an object for indexing.

    This loads a template
    (``search/indexes/{app_label}/{model_name}_{field_name}.txt``) and
    returns the result of rendering that template. ``object`` will be in
    its context.

``convert``
-----------

.. method:: SearchField.convert(self, value)

    Handles conversion between the data found and the type of the field.

    Extending classes should override this method and provide correct
    data coercion.

.. _ref-searchindex-api:

===================
``SearchIndex`` API
===================

.. class:: SearchIndex()

The ``SearchIndex`` class provides the application developer with a way to
supply data to the backend in a structured format. Developers familiar with
Django's ``Form`` or ``Model`` classes should find the syntax for indexes
familiar.

This class is arguably the most important part of integrating Haystack into
your application, as it has a large impact on the quality of the search
results and how easy it is for users to find what they're looking for. Care
and effort should be put into making your indexes the best they can be.


Quick Start
===========

For the impatient::

    import datetime
    from haystack import indexes
    from myapp.models import Note


    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

        def index_queryset(self, using=None):
            "Used when the entire index for model is updated."
            return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())


Background
==========

Unlike relational databases, most search engines supported by Haystack are
primarily document-based. They focus on a single text blob which they tokenize,
analyze and index. When searching, this field is usually the primary one that
is searched.

Further, the schema used by most engines is the same for all types of data
added, unlike a relational database that has a table schema for each chunk of
data.

It may be helpful to think of your search index as something closer to a
key-value store instead of imagining it in terms of an RDBMS.


Why Create Fields?
------------------

Despite being primarily document-driven, most search engines also support the
ability to associate other relevant data with the indexed document. These
attributes can be mapped through the use of fields within Haystack.

Common uses include storing pertinent data information, categorizations of the
document, author information and related data. By adding fields for these
pieces of data, you provide a means to further narrow/filter search terms.
This can be useful from either a UI perspective (a better advanced search
form) or from a developer standpoint (section-dependent search, off-loading
certain tasks to search, et cetera).

.. warning::

    Haystack reserves the following field names for internal use: ``id``,
    ``django_ct``, ``django_id`` & ``content``. The ``name`` & ``type`` names
    used to be reserved but no longer are.

    You can override these field names using the ``HAYSTACK_ID_FIELD``,
    ``HAYSTACK_DJANGO_CT_FIELD`` & ``HAYSTACK_DJANGO_ID_FIELD`` settings if
    needed.


Significance Of ``document=True``
---------------------------------

Most of the search engines that were candidates for inclusion in Haystack have
a central concept of a document that they index. These documents form a corpus
within which to primarily search. Because this ideal is so central and most of
Haystack is designed to have pluggable backends, it is important to ensure
that all engines have at least a bare minimum of the data they need to
function.

As a result, when creating a ``SearchIndex``, at least one field must be
marked with ``document=True``. This signifies to Haystack that whatever is
placed in this field while indexing is to be the primary text the search
engine indexes. The name of this field can be almost anything, but ``text``
is one of the more common names used.


Stored/Indexed Fields
---------------------

One shortcoming of search is that you rarely have all or the most up-to-date
information about an object in the index. As a result, when retrieving search
results, you will likely have to access the object in the database to provide
better information.

However, this can also hit the database quite heavily (think
``.get(pk=result.id)`` per object). If your search is popular, this can lead
to a big performance hit. There are two ways to prevent this. The first way is
``SearchQuerySet.load_all``, which tries to group all similar objects and pull
them through one query instead of many. This still hits the DB and incurs a
performance penalty.

The other option is to leverage stored fields. By default, all fields in
Haystack are both indexed (searchable by the engine) and stored (retained by
the engine and presented in the results). By using a stored field, you can
store commonly used data in such a way that you don't need to hit the database
when processing the search result to get more information.

For example, one great way to leverage this is to pre-render an object's
search result template DURING indexing. You define an additional field, render
a template with it and it follows the main indexed record into the index.
Then, when that record is pulled because it matches a query, you can simply
display the contents of that field, which avoids the database hit.

Within ``myapp/search_indexes.py``::
|
||||
|
||||
class NoteIndex(SearchIndex, indexes.Indexable):
|
||||
text = CharField(document=True, use_template=True)
|
||||
author = CharField(model_attr='user')
|
||||
pub_date = DateTimeField(model_attr='pub_date')
|
||||
# Define the additional field.
|
||||
rendered = CharField(use_template=True, indexed=False)
|
||||
|
||||
Then, inside a template named ``search/indexes/myapp/note_rendered.txt``::

    <h2>{{ object.title }}</h2>

    <p>{{ object.content }}</p>
And finally, in ``search/search.html``::

    ...

    {% for result in page.object_list %}
        <div class="search_result">
            {{ result.rendered|safe }}
        </div>
    {% endfor %}
Keeping The Index Fresh
=======================

There are several approaches to keeping the search index in sync with your
database. None is more correct than the others; the right choice depends on
the traffic you see, the churn rate of your data and which concerns matter
most to you (CPU load, freshness, et cetera).

The conventional method is to use ``SearchIndex`` in combination with cron
jobs. Running ``./manage.py update_index`` every couple of hours will keep
your data in sync within that timeframe and will handle the updates in a very
efficient batch. Additionally, Whoosh (and to a lesser extent Xapian) behaves
better when using this approach.
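For instance, a crontab entry implementing this approach might look like the
following sketch. The project path, log location and three-hour schedule are
placeholders to adapt for your deployment; ``--age`` limits the update to
recently changed records when your ``SearchIndex`` implements
``get_updated_field``.

```shell
# m h  dom mon dow  command -- refresh the search index every 3 hours.
# /path/to/myproject and the log file are placeholders for your deployment.
0 */3 * * *  cd /path/to/myproject && ./manage.py update_index --age=3 >> /var/log/haystack_update.log 2>&1
```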
Another option is to use ``RealtimeSignalProcessor``, which uses Django's
signals to immediately update the index any time a model is saved/deleted.
This yields a much more current search index at the expense of being fairly
inefficient. Solr & Elasticsearch are the only backends that handle this well
under load, and even then, you should make sure you have the server capacity
to spare.

A third option is to develop a custom ``QueuedSignalProcessor`` that, much
like ``RealtimeSignalProcessor``, uses Django's signals to enqueue messages
for updates/deletes. You then write a management command to consume these
messages in batches, yielding a nice compromise between the previous two
options.

For more information, see :doc:`signal_processors`.
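A framework-free sketch of that queued pattern follows. The names
(``enqueue_save``, ``consume``) and the in-memory ``deque`` are illustrative
stand-ins: a real project would wire the enqueue functions to Django's
``post_save``/``post_delete`` signals and back the queue with Redis, Celery or
similar.

```python
import collections

# Signal handlers only enqueue cheap (model, pk, action) tuples; a separate
# consumer drains the queue in batches and hands the work to the index.
queue = collections.deque()

def enqueue_save(sender, instance_pk):
    # Called from a post_save handler; no index work happens here.
    queue.append((sender, instance_pk, 'update'))

def enqueue_delete(sender, instance_pk):
    # Called from a post_delete handler.
    queue.append((sender, instance_pk, 'delete'))

def consume(batch_size=100):
    # Called from a management command (e.g. via cron). Drains up to
    # batch_size unique items, de-duplicating repeated saves of the
    # same object so it is only reindexed once.
    seen, batch = set(), []
    while queue and len(batch) < batch_size:
        item = queue.popleft()
        if item not in seen:
            seen.add(item)
            batch.append(item)
    return batch  # hand off to SearchIndex.update_object/remove_object
```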
.. note::

    Haystack doesn't ship with a ``QueuedSignalProcessor``, largely because
    there is such a diversity of lightweight queuing options and they tend to
    polarize developers. Queuing is outside of Haystack's goals (providing
    good, powerful search) and, as such, is left to the developer.

    Additionally, the implementation is relatively trivial & there are already
    good third-party add-ons for Haystack to enable this.
Advanced Data Preparation
=========================

In most cases, using the ``model_attr`` parameter on your fields allows you to
easily get data from a Django model into the document in your index, as it
handles both direct attribute access as well as callable functions within your
model.

.. note::

    The ``model_attr`` keyword argument can also look through relations in
    models. So you can do something like ``model_attr='author__first_name'``
    to pull just the first name of the author, similar to some lookups used
    by Django's ORM.

However, sometimes even more control over what gets placed in your index is
needed. To facilitate this, ``SearchIndex`` objects have a 'preparation' stage
that populates data just before it is indexed. You can hook into this phase in
several ways.

This should be very familiar to developers who have used Django's ``forms``
before, as it loosely follows similar concepts, though the emphasis here is
less on cleansing data from user input and more on making the data friendly
to the search backend.
1. ``prepare_FOO(self, object)``
--------------------------------

The most common way to affect a single field's data is to create a
``prepare_FOO`` method (where FOO is the name of the field). As a parameter
to this method, you will receive the instance that is being indexed.

.. note::

    This method is analogous to Django's ``Form.clean_FOO`` methods.

To keep with our existing example, one use case might be altering the name
inside the ``author`` field to be "firstname lastname <email>". In this case,
you might write the following code::
    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

        def prepare_author(self, obj):
            return "%s <%s>" % (obj.user.get_full_name(), obj.user.email)
This method should return a single value (or list/tuple/dict) to populate that
field's data upon indexing. Note that this method takes priority over whatever
data may come from the field itself.

Just like ``Form.clean_FOO``, the field's ``prepare`` runs before the
``prepare_FOO``, allowing you to access ``self.prepared_data``. For example::
    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

        def prepare_author(self, obj):
            # Say we want last name first, the hard way.
            author = u''

            if 'author' in self.prepared_data:
                name_bits = self.prepared_data['author'].split()
                author = "%s, %s" % (name_bits[-1], ' '.join(name_bits[:-1]))

            return author
This method is fully functional with ``model_attr``, so if there's no
convenient way to access the data you want, this is an excellent way to
prepare it::
    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        categories = indexes.MultiValueField()
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

        def prepare_categories(self, obj):
            # Since we're using a M2M relationship with a complex lookup,
            # we can prepare the list here.
            return [category.id for category in obj.category_set.active().order_by('-created')]
2. ``prepare(self, object)``
----------------------------

Each ``SearchIndex`` gets a ``prepare`` method, which handles collecting all
the data. This method should return a dictionary that will be the final data
used by the search backend.

Overriding this method is useful if you need to collect more than one piece
of data or need to incorporate additional data that is not well represented
by a single ``SearchField``. An example might look like::
    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

        def prepare(self, object):
            self.prepared_data = super(NoteIndex, self).prepare(object)

            # Add in tags (assuming there's a M2M relationship to Tag on the model).
            # Note that this would NOT get picked up by the automatic
            # schema tools provided by Haystack.
            self.prepared_data['tags'] = [tag.name for tag in object.tags.all()]

            return self.prepared_data
If you choose to use this method, be careful to call the ``super()`` method
before altering the data. Without doing so, you may have an incomplete set of
data populating your indexes.

This method has the final say in all data, overriding both what the fields
provide as well as any ``prepare_FOO`` methods on the class.

.. note::

    This method is roughly analogous to Django's ``Form.full_clean`` and
    ``Form.clean`` methods. However, unlike those methods, it is not fired
    as the result of trying to access ``self.prepared_data``. It requires
    an explicit call.
3. Overriding ``prepare(self, object)`` On Individual ``SearchField`` Objects
------------------------------------------------------------------------------

The final way to manipulate your data is to implement a custom ``SearchField``
object and write its ``prepare`` method to populate/alter the data any way you
choose. For instance, a (naive) user-created ``GeoPointField`` might look
something like::
    from django.utils import six
    from haystack import indexes

    class GeoPointField(indexes.CharField):
        def __init__(self, **kwargs):
            kwargs['default'] = '0.00-0.00'
            super(GeoPointField, self).__init__(**kwargs)

        def prepare(self, obj):
            return six.text_type("%s-%s" % (obj.latitude, obj.longitude))
The ``prepare`` method simply returns the value to be used for that field.
It's entirely possible to include data that's not directly referenced by the
object here, depending on your needs.

Note that this is NOT a recommended approach to storing geographic data in a
search engine (there is no formal suggestion on this as support is usually
non-existent), merely an example of how to extend existing fields.

.. note::

    This method is analogous to Django's ``Field.clean`` methods.
Adding New Fields
=================

If you have an existing ``SearchIndex`` and you add a new field to it,
Haystack will add this new data on any updates it sees after that point.
However, it will not populate the existing data you already have.

In order for the data to be picked up, you will need to run ``./manage.py
rebuild_index``. This will cause all backends to rebuild the existing data
already present in the quickest and most efficient way.

.. note::

    With the Solr backend, you'll also have to add the new field to the
    appropriate ``schema.xml`` for your configuration before running
    ``rebuild_index``.
``SearchIndex`` Methods
=======================

``get_model``
-------------

.. method:: SearchIndex.get_model(self)

Should return the ``Model`` class (not an instance) that the rest of the
``SearchIndex`` should use.

This method is required & you must override it to return the correct class.
``index_queryset``
------------------

.. method:: SearchIndex.index_queryset(self, using=None)

Gets the default ``QuerySet`` to index when doing a full update.

Subclasses can override this method to avoid indexing certain objects.

``read_queryset``
-----------------

.. method:: SearchIndex.read_queryset(self, using=None)

Gets the default ``QuerySet`` for read actions.

Subclasses can override this method to work with other managers.
Useful when working with default managers that filter out some objects.
``build_queryset``
------------------

.. method:: SearchIndex.build_queryset(self, start_date=None, end_date=None)

Gets the default ``QuerySet`` to index when doing an index update.

Subclasses can override this method to take into account related
model modification times.

The default is to use ``SearchIndex.index_queryset`` and filter
based on ``SearchIndex.get_updated_field``.
``prepare``
-----------

.. method:: SearchIndex.prepare(self, obj)

Fetches and adds/alters data before indexing.

``get_content_field``
---------------------

.. method:: SearchIndex.get_content_field(self)

Returns the field that supplies the primary document to be indexed.
``update``
----------

.. method:: SearchIndex.update(self, using=None)

Updates the entire index.

If ``using`` is provided, it specifies which connection should be used.
By default, the routers decide which backend should be used.

``update_object``
-----------------

.. method:: SearchIndex.update_object(self, instance, using=None, **kwargs)

Updates the index for a single object. Attached to the class's
post-save hook.

If ``using`` is provided, it specifies which connection should be used.
By default, the routers decide which backend should be used.

``remove_object``
-----------------

.. method:: SearchIndex.remove_object(self, instance, using=None, **kwargs)

Removes an object from the index. Attached to the class's
post-delete hook.

If ``using`` is provided, it specifies which connection should be used.
By default, the routers decide which backend should be used.
``clear``
---------

.. method:: SearchIndex.clear(self, using=None)

Clears the entire index.

If ``using`` is provided, it specifies which connection should be used.
By default, the routers decide which backend should be used.

``reindex``
-----------

.. method:: SearchIndex.reindex(self, using=None)

Completely clears the index for this model and rebuilds it.

If ``using`` is provided, it specifies which connection should be used.
By default, the routers decide which backend should be used.
``get_updated_field``
---------------------

.. method:: SearchIndex.get_updated_field(self)

Gets the field name that represents the updated date for the model.

If specified, this is used by the reindex command to filter out results
from the ``QuerySet``, enabling you to reindex only recent records. This
method should either return ``None`` (always reindex everything) or a
string of the ``Model``'s ``DateField``/``DateTimeField`` name.

``should_update``
-----------------

.. method:: SearchIndex.should_update(self, instance, **kwargs)

Determines if an object should be updated in the index.

It's useful to override this when an object may save frequently and
cause excessive reindexing. You should check conditions on the instance
and return ``False`` if it is not to be indexed.

The ``kwargs`` passed along to this method can be the same as the ones passed
by Django when a ``Model`` is saved/deleted, so it's possible to check whether
the object has been created or not. See
``django.db.models.signals.post_save`` for details on what is passed.

By default, returns ``True`` (always reindex).
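As an illustration, an override that skips unpublished objects might look like
the following framework-free sketch. ``NoteIndexSketch``, ``FakeNote`` and the
``is_draft`` flag are hypothetical names invented for this example.

```python
# Sketch of a should_update override: skip index updates for drafts.
# ``is_draft`` is a hypothetical flag on the model being indexed.
class NoteIndexSketch(object):
    def should_update(self, instance, **kwargs):
        # Returning False tells Haystack to leave the index untouched
        # for this save; the object keeps its existing index entry.
        return not instance.is_draft

# Stand-in for a model instance, so the sketch runs on its own.
class FakeNote(object):
    def __init__(self, is_draft):
        self.is_draft = is_draft
```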
``load_all_queryset``
---------------------

.. method:: SearchIndex.load_all_queryset(self)

Provides the ability to override how objects get loaded in conjunction
with ``RelatedSearchQuerySet.load_all``. This is useful for post-processing
the results from the query, enabling things like adding ``select_related``
or filtering certain data.

.. warning::

    Utilizing this functionality can have negative performance implications.
    Please see the section on ``RelatedSearchQuerySet`` within
    :doc:`searchqueryset_api` for further information.

By default, returns ``all()`` on the model's default manager.

Example::
    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

        def load_all_queryset(self):
            # Pull all objects related to the Note in search results.
            return Note.objects.all().select_related()
When searching, ``RelatedSearchQuerySet`` appends a call to ``in_bulk``, so be
sure that the ``QuerySet`` you provide can accommodate this and that the ids
passed to ``in_bulk`` will map to the model in question.

If you need a specific ``QuerySet`` in one place, you can specify this at the
``RelatedSearchQuerySet`` level using the ``load_all_queryset`` method. See
:doc:`searchqueryset_api` for usage.
``ModelSearchIndex``
====================

The ``ModelSearchIndex`` class allows for automatic generation of a
``SearchIndex`` based on the fields of the model assigned to it.

With the exception of the automated introspection, it is a ``SearchIndex``
class, so all of the notes above pertaining to ``SearchIndex`` apply. As with
the ``ModelForm`` class in Django, it employs an inner class called ``Meta``,
which should contain a ``model`` attribute. By default all non-relational
model fields are included as search fields on the index, but fields can be
restricted by way of a ``fields`` whitelist, or excluded with an ``excludes``
list, to prevent certain fields from appearing in the class.

In addition, it adds a ``text`` field that is the ``document=True`` field and
has the ``use_template=True`` option set, just like the ``BasicSearchIndex``.

.. warning::

    Usage of this class might result in inferior ``SearchIndex`` objects,
    which can directly affect your search results. Use this to establish
    basic functionality and move to custom ``SearchIndex`` objects for better
    control.

    At this time, it does not handle related fields.
Quick Start
-----------

For the impatient::

    import datetime
    from haystack import indexes
    from myapp.models import Note

    # All Fields
    class AllNoteIndex(indexes.ModelSearchIndex, indexes.Indexable):
        class Meta:
            model = Note

    # Blacklisted Fields
    class LimitedNoteIndex(indexes.ModelSearchIndex, indexes.Indexable):
        class Meta:
            model = Note
            excludes = ['user']

    # Whitelisted Fields
    class NoteIndex(indexes.ModelSearchIndex, indexes.Indexable):
        class Meta:
            model = Note
            fields = ['user', 'pub_date']

        # Note that regular ``SearchIndex`` methods apply.
        def index_queryset(self, using=None):
            "Used when the entire index for model is updated."
            return Note.objects.filter(pub_date__lte=datetime.datetime.now())
.. _ref-searchquery-api:

===================
``SearchQuery`` API
===================

.. class:: SearchQuery(using=DEFAULT_ALIAS)

The ``SearchQuery`` class acts as an intermediary between ``SearchQuerySet``'s
abstraction and ``SearchBackend``'s actual search. Given the metadata provided
by ``SearchQuerySet``, ``SearchQuery`` builds the actual query and interacts
with the ``SearchBackend`` on ``SearchQuerySet``'s behalf.
This class must be at least partially implemented on a per-backend basis, as
portions are highly specific to the backend. It is usually bundled with the
accompanying ``SearchBackend``.

Most people will **NOT** have to use this class directly. ``SearchQuerySet``
handles all interactions with ``SearchQuery`` objects and provides a nicer
interface to work with.

Should you need advanced/custom behavior, you can supply your own version of
``SearchQuery`` that overrides/extends the class in the manner you see fit.
You can either hook it up in a ``BaseEngine`` subclass or pass it to
``SearchQuerySet`` objects via the ``query`` keyword argument.
``SQ`` Objects
==============

For expressing more complex queries, especially those involving AND/OR/NOT in
different combinations, you should use ``SQ`` objects. Like
``django.db.models.Q`` objects, ``SQ`` objects can be passed to
``SearchQuerySet.filter`` and combined with the familiar operators (``&``,
``|`` and ``~``) to generate complex parts of the query.

.. warning::

    Any data you pass to ``SQ`` objects is passed along **unescaped**. If
    you don't trust the data you're passing along, you should use
    the ``clean`` method on your ``SearchQuery`` to sanitize the data.

Example::
    from haystack.query import SQ

    # We want "title:Foo AND (tags:bar OR tags:moof)".
    sqs = SearchQuerySet().filter(title='Foo').filter(SQ(tags='bar') | SQ(tags='moof'))

    # To clean user-provided data:
    sqs = SearchQuerySet()
    clean_query = sqs.query.clean(user_query)
    sqs = sqs.filter(SQ(title=clean_query) | SQ(tags=clean_query))
Internally, the ``SearchQuery`` object maintains a tree of ``SQ`` objects.
Each ``SQ`` object knows what field it looks up against, what kind of lookup
(i.e. the ``__`` filters) to perform, what value it's looking for, whether
it's an AND/OR/NOT, and tracks any children it may have. The
``SearchQuery.build_query`` method starts with the root of the tree, building
part of the final query at each node until the full final query is ready for
the ``SearchBackend``.
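The idea behind that traversal can be sketched with a toy node class. This is
an illustration of the concept only, not Haystack's actual implementation;
the ``Node`` class and its ``render`` method are invented for the example.

```python
# Toy illustration of collapsing a tree of SQ-like nodes into a query
# string: each node holds a connector (AND/OR), an optional NOT flag and
# children that are either (field, value) leaves or nested nodes.
class Node(object):
    def __init__(self, connector='AND', children=None, negated=False):
        self.connector = connector
        self.children = children or []
        self.negated = negated

    def render(self):
        parts = []
        for child in self.children:
            if isinstance(child, Node):
                # Nested groups get parenthesized so precedence survives.
                parts.append('(%s)' % child.render())
            else:
                field, value = child
                parts.append('%s:%s' % (field, value))
        query = (' %s ' % self.connector).join(parts)
        return 'NOT (%s)' % query if self.negated else query

# Mirrors the SQ example above: title:Foo AND (tags:bar OR tags:moof)
tree = Node('AND', [('title', 'Foo'),
                    Node('OR', [('tags', 'bar'), ('tags', 'moof')])])
```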
Backend-Specific Methods
========================

When implementing a new backend, the following methods will need to be
created:

``build_query_fragment``
~~~~~~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.build_query_fragment(self, field, filter_type, value)

Generates a query fragment from a field, filter type and a value.

Must be implemented in backends as this is highly backend-specific.

Inheritable Methods
===================

The following methods have a complete implementation in the base class and
can largely be used unchanged.
``build_query``
~~~~~~~~~~~~~~~

.. method:: SearchQuery.build_query(self)

Interprets the collected query metadata and builds the final query to
be sent to the backend.

``build_params``
~~~~~~~~~~~~~~~~

.. method:: SearchQuery.build_params(self, spelling_query=None)

Generates a list of params to use when searching.

``clean``
~~~~~~~~~

.. method:: SearchQuery.clean(self, query_fragment)

Provides a mechanism for sanitizing user input before presenting the
value to the backend.

A basic (overridable) implementation is provided.
``run``
~~~~~~~

.. method:: SearchQuery.run(self, spelling_query=None, **kwargs)

Builds and executes the query. Returns a list of search results.

Optionally passes along an alternate query for spelling suggestions.

Optionally passes along more kwargs for controlling the search query.

``run_mlt``
~~~~~~~~~~~

.. method:: SearchQuery.run_mlt(self, **kwargs)

Executes a More Like This query. Returns a list of search results similar
to the provided document (and optionally the query).

``run_raw``
~~~~~~~~~~~

.. method:: SearchQuery.run_raw(self, **kwargs)

Executes a raw query. Returns a list of search results.
``get_count``
~~~~~~~~~~~~~

.. method:: SearchQuery.get_count(self)

Returns the number of results the backend found for the query.

If the query has not been run, this will execute the query and store
the results.

``get_results``
~~~~~~~~~~~~~~~

.. method:: SearchQuery.get_results(self, **kwargs)

Returns the results received from the backend.

If the query has not been run, this will execute the query and store
the results.

``get_facet_counts``
~~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.get_facet_counts(self)

Returns the facet counts received from the backend.

If the query has not been run, this will execute the query and store
the results.
``boost_fragment``
~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.boost_fragment(self, boost_word, boost_value)

Generates a query fragment for boosting a single word/value pair.

``matching_all_fragment``
~~~~~~~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.matching_all_fragment(self)

Generates the query that matches all documents.

``add_filter``
~~~~~~~~~~~~~~

.. method:: SearchQuery.add_filter(self, expression, value, use_not=False, use_or=False)

Narrows the search by requiring certain conditions.

``add_order_by``
~~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_order_by(self, field)

Orders the search results by a field.

``clear_order_by``
~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.clear_order_by(self)

Clears out all ordering that has already been added, reverting the
query to ordering by relevance.

``add_model``
~~~~~~~~~~~~~

.. method:: SearchQuery.add_model(self, model)

Restricts the query by requiring matches in the given model.

This builds upon previous additions, so you can limit to multiple models
by chaining this method several times.
``set_limits``
~~~~~~~~~~~~~~

.. method:: SearchQuery.set_limits(self, low=None, high=None)

Restricts the query by altering the start offset, the end offset or both.

``clear_limits``
~~~~~~~~~~~~~~~~

.. method:: SearchQuery.clear_limits(self)

Clears any existing limits.

``add_boost``
~~~~~~~~~~~~~

.. method:: SearchQuery.add_boost(self, term, boost_value)

Adds a boosted term and the amount to boost it to the query.

``raw_search``
~~~~~~~~~~~~~~

.. method:: SearchQuery.raw_search(self, query_string, **kwargs)

Runs a raw query (no parsing) against the backend.

This method causes the ``SearchQuery`` to ignore the standard
query-generating facilities, running only what was provided instead.

Note that any kwargs passed along will override anything provided
to the rest of the ``SearchQuerySet``.
``more_like_this``
~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.more_like_this(self, model_instance)

Allows backends with support for "More Like This" to return results
similar to the provided instance.

``add_stats_query``
~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_stats_query(self, stats_field, stats_facets)

Adds stats and stats_facets queries for the Solr backend.

``add_highlight``
~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_highlight(self)

Adds highlighting to the search results.

``add_within``
~~~~~~~~~~~~~~

.. method:: SearchQuery.add_within(self, field, point_1, point_2)

Adds bounding box parameters to the search query.

``add_dwithin``
~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_dwithin(self, field, point, distance)

Adds radius-based parameters to the search query.
``add_distance``
~~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_distance(self, field, point)

Denotes that results should include distance measurements from the
point passed in.

``add_field_facet``
~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_field_facet(self, field, **options)

Adds a regular facet on a field.

``add_date_facet``
~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_date_facet(self, field, start_date, end_date, gap_by, gap_amount)

Adds a date-based facet on a field.

``add_query_facet``
~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_query_facet(self, field, query)

Adds a query facet on a field.

``add_narrow_query``
~~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.add_narrow_query(self, query)

Narrows a search to a subset of all documents per the query.

Generally used in conjunction with faceting.
``set_result_class``
~~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuery.set_result_class(self, klass)

Sets the result class to use for results.

Overrides any previous usages. If ``None`` is provided, Haystack will
revert back to the default ``SearchResult`` object.

``using``
~~~~~~~~~

.. method:: SearchQuery.using(self, using=None)

Allows overriding which connection should be used. This
disables the use of routers when performing the query.

If ``None`` is provided, it has no effect on which backend is used.
.. _ref-searchqueryset-api:

======================
``SearchQuerySet`` API
======================

.. class:: SearchQuerySet(using=None, query=None)

The ``SearchQuerySet`` class is designed to make performing a search and
iterating over its results easy and consistent. For those familiar with
Django's ORM ``QuerySet``, much of the ``SearchQuerySet`` API should feel
familiar.

Why Follow ``QuerySet``?
========================

There are a couple of reasons to follow (at least in part) the ``QuerySet``
API:

#. Consistency with Django
#. Most Django programmers have experience with the ORM and can use this
   knowledge with ``SearchQuerySet``.

And from a high-level perspective, ``QuerySet`` and ``SearchQuerySet`` do very
similar things: given certain criteria, provide a set of results. Both are
powered by multiple backends and both are abstractions on top of the way a
query is performed.
Quick Start
===========

For the impatient::

    from haystack.query import SearchQuerySet
    all_results = SearchQuerySet().all()
    hello_results = SearchQuerySet().filter(content='hello')
    hello_world_results = SearchQuerySet().filter(content='hello world')
    unfriendly_results = SearchQuerySet().exclude(content='hello').filter(content='world')
    recent_results = SearchQuerySet().order_by('-pub_date')[:5]

    # Using the new input types...
    from haystack.inputs import AutoQuery, Exact, Clean
    sqs = SearchQuerySet().filter(content=AutoQuery(request.GET['q']), product_type=Exact('ancient book'))

    if request.GET['product_url']:
        sqs = sqs.filter(product_url=Clean(request.GET['product_url']))

For more on the ``AutoQuery``, ``Exact``, ``Clean`` classes & friends, see the
:ref:`ref-inputtypes` documentation.
``SearchQuerySet``
==================

By default, ``SearchQuerySet`` provides the documented functionality. You can
extend it with your own behavior by simply subclassing ``SearchQuerySet``,
adding what you need, then using your subclass in place of ``SearchQuerySet``.

Most methods in ``SearchQuerySet`` "chain" in a similar fashion to ``QuerySet``.
Additionally, like ``QuerySet``, ``SearchQuerySet`` is lazy (meaning it
evaluates the query as late as possible). So the following is valid::

    from haystack.query import SearchQuerySet
    results = SearchQuerySet().exclude(content='hello').filter(content='world').order_by('-pub_date').boost('title', 0.5)[10:20]

The ``content`` Shortcut
========================

Searching your document fields is a very common activity. To help mitigate
possible differences in ``SearchField`` names (and to help the backends deal
with search queries that inspect the main corpus), there is a special field
called ``content``. You may use this in any place that other field names would
work (e.g. ``filter``, ``exclude``, etc.) to indicate you simply want to
search the main documents.

For example::

    from haystack.query import SearchQuerySet

    # This searches whatever fields were marked ``document=True``.
    results = SearchQuerySet().exclude(content='hello')

This special pseudo-field works best with the ``exact`` lookup and may yield
strange or unexpected results with the other lookups.

``SearchQuerySet`` Methods
==========================

The primary interface to search in Haystack is through the ``SearchQuerySet``
object. It provides a clean, programmatic, portable API to the search backend.
Many aspects are also "chainable", meaning you can call methods one after
another, each applying its changes to the previous ``SearchQuerySet`` and
further narrowing the search.

All ``SearchQuerySet`` objects implement a list-like interface, meaning you can
perform actions like getting the length of the results, accessing a result at an
offset or even slicing the result list.

Methods That Return A ``SearchQuerySet``
----------------------------------------

``all``
~~~~~~~

.. method:: SearchQuerySet.all(self)

Returns all results for the query. This is largely a no-op (returns an identical
copy) but useful for denoting exactly what behavior is going on.

``none``
~~~~~~~~

.. method:: SearchQuerySet.none(self)

Returns an ``EmptySearchQuerySet`` that behaves like a ``SearchQuerySet`` but
always yields no results.

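For instance, a helper can return ``none()`` to short-circuit a search the
current user isn't allowed to run (the helper and permission names here are
illustrative)::

    from haystack.query import SearchQuerySet

    def search_for(user, query):
        if not user.has_perm('search.can_search'):
            # Same chainable, list-like interface, but no backend query occurs.
            return SearchQuerySet().none()

        return SearchQuerySet().filter(content=query)
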
``filter``
~~~~~~~~~~

.. method:: SearchQuerySet.filter(self, **kwargs)

Filters the search by looking for (and including) certain attributes.

The lookup parameters (``**kwargs``) should follow the `Field lookups`_ below.
If you specify more than one pair, they will be joined in the query according to
the ``HAYSTACK_DEFAULT_OPERATOR`` setting (defaults to ``AND``).

You can pass either strings or a variety of :ref:`ref-inputtypes` if you
need more advanced query behavior.

.. warning::

    Any data you pass to ``filter`` gets auto-escaped. If you need to send
    non-escaped data, use the ``Raw`` input type (:ref:`ref-inputtypes`).

    Also, if a string with one or more spaces in it is specified as the value,
    the string will get passed along **AS IS**. This means that it will **NOT**
    be treated as a phrase (unlike Haystack 1.X's behavior).

    If you want to match a phrase, you should use either the ``__exact`` filter
    type or the ``Exact`` input type (:ref:`ref-inputtypes`).

Examples::

    sqs = SearchQuerySet().filter(content='foo')

    sqs = SearchQuerySet().filter(content='foo', pub_date__lte=datetime.date(2008, 1, 1))

    # Identical to the previous example.
    sqs = SearchQuerySet().filter(content='foo').filter(pub_date__lte=datetime.date(2008, 1, 1))

    # To send unescaped data:
    from haystack.inputs import Raw
    sqs = SearchQuerySet().filter(title=Raw(trusted_query))

    # To use auto-query behavior on a non-``document=True`` field.
    from haystack.inputs import AutoQuery
    sqs = SearchQuerySet().filter(title=AutoQuery(user_query))

``exclude``
~~~~~~~~~~~

.. method:: SearchQuerySet.exclude(self, **kwargs)

Narrows the search by ensuring certain attributes are not included.

.. warning::

    Any data you pass to ``exclude`` gets auto-escaped. If you need to send
    non-escaped data, use the ``Raw`` input type (:ref:`ref-inputtypes`).

Example::

    sqs = SearchQuerySet().exclude(content='foo')

``filter_and``
~~~~~~~~~~~~~~

.. method:: SearchQuerySet.filter_and(self, **kwargs)

Narrows the search by looking for (and including) certain attributes. Join
behavior in the query is forced to be ``AND``. Used primarily by the ``filter``
method.

``filter_or``
~~~~~~~~~~~~~

.. method:: SearchQuerySet.filter_or(self, **kwargs)

Narrows the search by looking for (and including) certain attributes. Join
behavior in the query is forced to be ``OR``. Used primarily by the ``filter``
method.

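A minimal sketch of the difference (the field names here are illustrative)::

    from haystack.query import SearchQuerySet

    # Joined with OR, regardless of the HAYSTACK_DEFAULT_OPERATOR setting.
    sqs = SearchQuerySet().filter_or(content='hello', title='world')

    # Joined with AND, regardless of the setting.
    sqs = SearchQuerySet().filter_and(content='hello', title='world')
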
``order_by``
~~~~~~~~~~~~

.. method:: SearchQuerySet.order_by(self, *args)

Alters the order in which the results should appear. Arguments should be strings
that map to the attributes/fields within the index. You may specify multiple
fields by comma separating them::

    SearchQuerySet().filter(content='foo').order_by('author', 'pub_date')

Default behavior is ascending order. To specify descending order, prepend the
string with a ``-``::

    SearchQuerySet().filter(content='foo').order_by('-pub_date')

.. note::

    In general, ordering is locale-specific. Haystack makes no effort to try to
    reconcile differences between characters from different languages. This
    means that accented characters will sort closely with the same character
    and **NOT** necessarily close to the unaccented form of the character.

    If you want this kind of behavior, you should override the ``prepare_FOO``
    methods on your ``SearchIndex`` objects to transliterate the characters
    as you see fit.

``highlight``
~~~~~~~~~~~~~

.. method:: SearchQuerySet.highlight(self)

If supported by the backend, the ``SearchResult`` objects returned will include
a highlighted version of the result::

    sqs = SearchQuerySet().filter(content='foo').highlight()
    result = sqs[0]
    result.highlighted['text'][0] # u'Two computer scientists walk into a bar. The bartender says "<em>Foo</em>!".'

``models``
~~~~~~~~~~

.. method:: SearchQuerySet.models(self, *models)

Accepts an arbitrary number of Model classes to include in the search. This will
narrow the search results to only include results from the models specified.

Example::

    SearchQuerySet().filter(content='foo').models(BlogEntry, Comment)

``result_class``
~~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.result_class(self, klass)

Allows specifying a different class to use for results.

Overrides any previous usages. If ``None`` is provided, Haystack will
revert back to the default ``SearchResult`` object.

Example::

    SearchQuerySet().result_class(CustomResult)

``boost``
~~~~~~~~~

.. method:: SearchQuerySet.boost(self, term, boost_value)

Boosts a certain term of the query. You provide the term to be boosted and the
value is the amount to boost it by. Boost amounts may be either an integer or a
float.

Example::

    SearchQuerySet().filter(content='foo').boost('bar', 1.5)

``facet``
~~~~~~~~~

.. method:: SearchQuerySet.facet(self, field, **options)

Adds faceting to a query for the provided field. You provide the field (from one
of the ``SearchIndex`` classes) you'd like to facet on. Any keyword options you
provide will be passed along to the backend for that facet.

Example::

    # For Solr (setting f.author.facet.*; see http://wiki.apache.org/solr/SimpleFacetParameters#Parameters)
    SearchQuerySet().facet('author', mincount=1, limit=10)
    # For ElasticSearch (see http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html)
    SearchQuerySet().facet('author', size=10, order='term')

In the search results you get back, facet counts will be populated in the
``SearchResult`` object. You can access them via the ``facet_counts`` method.

Example::

    # Count document hits for each author within the index.
    SearchQuerySet().filter(content='foo').facet('author')

``date_facet``
~~~~~~~~~~~~~~

.. method:: SearchQuerySet.date_facet(self, field, start_date, end_date, gap_by, gap_amount=1)

Adds faceting to a query for the provided field by date. You provide the field
(from one of the ``SearchIndex`` classes) you'd like to facet on, a ``start_date``
(either ``datetime.datetime`` or ``datetime.date``), an ``end_date`` and the
amount of time between gaps as ``gap_by`` (one of ``'year'``, ``'month'``,
``'day'``, ``'hour'``, ``'minute'`` or ``'second'``).

You can also optionally provide a ``gap_amount`` to specify a different
increment than ``1``. For example, specifying gaps by week (every seven days)
would be ``gap_by='day', gap_amount=7``.

In the search results you get back, facet counts will be populated in the
``SearchResult`` object. You can access them via the ``facet_counts`` method.

Example::

    # Count document hits for each day between 2009-06-07 and 2009-07-07 within the index.
    SearchQuerySet().filter(content='foo').date_facet('pub_date', start_date=datetime.date(2009, 6, 7), end_date=datetime.date(2009, 7, 7), gap_by='day')

``query_facet``
~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.query_facet(self, field, query)

Adds faceting to a query for the provided field with a custom query. You provide
the field (from one of the ``SearchIndex`` classes) you'd like to facet on and the
backend-specific query (as a string) you'd like to execute.

Please note that this is **NOT** portable between backends. The syntax is entirely
dependent on the backend. No validation/cleansing is performed and it is up to
the developer to ensure the query's syntax is correct.

In the search results you get back, facet counts will be populated in the
``SearchResult`` object. You can access them via the ``facet_counts`` method.

Example::

    # Count document hits for authors that start with 'jo' within the index.
    SearchQuerySet().filter(content='foo').query_facet('author', 'jo*')

``within``
~~~~~~~~~~

.. method:: SearchQuerySet.within(self, field, point_1, point_2)

Spatial: Adds a bounding box search to the query.

See the :ref:`ref-spatial` docs for more information.

``dwithin``
~~~~~~~~~~~

.. method:: SearchQuerySet.dwithin(self, field, point, distance)

Spatial: Adds a distance-based search to the query.

See the :ref:`ref-spatial` docs for more information.

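A brief sketch (this assumes an indexed ``location`` field; see the
:ref:`ref-spatial` docs for the full setup)::

    from haystack.query import SearchQuerySet
    from haystack.utils.geo import Point, D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)

    # All results within two miles of the point.
    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2))
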
``stats``
~~~~~~~~~

.. method:: SearchQuerySet.stats(self, field)

Adds stats to a query for the provided field. This is supported on
Solr only. You provide the field (from one of the ``SearchIndex``
classes) you would like stats on.

In the search results you get back, stats will be populated in the
``SearchResult`` object. You can access them via the ``stats_results`` method.

Example::

    # Get stats on the author field.
    SearchQuerySet().filter(content='foo').stats('author')

``stats_facet``
~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.stats_facet(self, field, facet_fields=None)

Adds a stats facet to a query for the provided field; ``facet_fields``
specifies the fields to facet the stats on. This is supported on Solr only.

Example::

    # Get stats on the author field, and stats on the author field
    # faceted by bookstore.
    SearchQuerySet().filter(content='foo').stats_facet('author', 'bookstore')

``distance``
~~~~~~~~~~~~

.. method:: SearchQuerySet.distance(self, field, point)

Spatial: Denotes results must have distance measurements from the
provided point.

See the :ref:`ref-spatial` docs for more information.

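For example, once distance measurements are attached, results can be sorted by
them (again assuming an indexed ``location`` field; see :ref:`ref-spatial`)::

    from haystack.query import SearchQuerySet
    from haystack.utils.geo import Point, D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)

    # Attach distances from the point to each result, then sort by them.
    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2))
    sqs = sqs.distance('location', ninth_and_mass).order_by('distance')
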
``narrow``
~~~~~~~~~~

.. method:: SearchQuerySet.narrow(self, query)

Pulls a subset of documents from the search engine to search within. This is
for advanced usage, especially useful when faceting.

Example::

    # Search, from recipes containing 'blend', for recipes containing 'banana'.
    SearchQuerySet().narrow('blend').filter(content='banana')

    # Using a fielded search where the recipe's title contains 'smoothie', find all recipes published before 2009.
    SearchQuerySet().narrow('title:smoothie').filter(pub_date__lte=datetime.datetime(2009, 1, 1))

By using ``narrow``, you can create drill-down interfaces for faceting by
applying ``narrow`` calls for each facet that gets selected.

This method is different from ``SearchQuerySet.filter()`` in that it does not
affect the query sent to the engine. It pre-limits the document set being
searched. Generally speaking, if you're in doubt of whether to use
``filter`` or ``narrow``, use ``filter``.

.. note::

    This method is, generally speaking, not necessarily portable between
    backends. The syntax is entirely dependent on the backend, though most
    backends have a similar syntax for basic fielded queries. No
    validation/cleansing is performed and it is up to the developer to ensure
    the query's syntax is correct.

``raw_search``
~~~~~~~~~~~~~~

.. method:: SearchQuerySet.raw_search(self, query_string, **kwargs)

Passes a raw query directly to the backend. This is for advanced usage, where
the desired query cannot be expressed via ``SearchQuerySet``.

This method is still supported, however it now uses the much more flexible
``Raw`` input type (:ref:`ref-inputtypes`).

.. warning::

    Unlike in Haystack 1.X, this method no longer causes immediate
    evaluation & now chains appropriately.

Example::

    # In the case of Solr... (this example could be expressed with SearchQuerySet)
    SearchQuerySet().raw_search('django_ct:blog.blogentry "However, it is"')

    # Equivalent.
    from haystack.inputs import Raw
    sqs = SearchQuerySet().filter(content=Raw('django_ct:blog.blogentry "However, it is"'))

Please note that this is **NOT** portable between backends. The syntax is entirely
dependent on the backend. No validation/cleansing is performed and it is up to
the developer to ensure the query's syntax is correct.

Further, the ``**kwargs`` are intentionally left undocumented. If a third-party
backend can implement special features beyond what's present, it should use
those ``**kwargs`` for passing that information. Developers should be careful
to make sure there are no conflicts with the backend's ``search`` method, as
that is called directly.

``load_all``
~~~~~~~~~~~~

.. method:: SearchQuerySet.load_all(self)

Efficiently populates the objects in the search results. Without using this
method, DB lookups are done on a per-object basis, resulting in many individual
trips to the database. If ``load_all`` is used, the ``SearchQuerySet`` will
group similar objects into a single query, resulting in only as many queries as
there are different object types returned.

Example::

    SearchQuerySet().filter(content='foo').load_all()

``auto_query``
~~~~~~~~~~~~~~

.. method:: SearchQuerySet.auto_query(self, query_string, fieldname=None)

Performs a best guess at constructing the search query.

This method is intended for common use directly with a user's query. The
method is still supported, however it now uses the much more flexible
``AutoQuery`` input type (:ref:`ref-inputtypes`).

It handles exact matches (specified with single or double quotes), negation
(using a ``-`` immediately before the term) and joining the remaining terms
with the operator specified in ``HAYSTACK_DEFAULT_OPERATOR``.

Example::

    sqs = SearchQuerySet().auto_query('goldfish "old one eye" -tank')

    # Equivalent.
    from haystack.inputs import AutoQuery
    sqs = SearchQuerySet().filter(content=AutoQuery('goldfish "old one eye" -tank'))

    # Against a different field.
    sqs = SearchQuerySet().filter(title=AutoQuery('goldfish "old one eye" -tank'))

``autocomplete``
~~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.autocomplete(self, **kwargs)

A shortcut method to perform an autocomplete search.

Must be run against fields that are either ``NgramField`` or
``EdgeNgramField``.

Example::

    SearchQuerySet().autocomplete(title_autocomplete='gol')

``more_like_this``
~~~~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.more_like_this(self, model_instance)

Finds similar results to the object passed in.

You should pass in an instance of a model (for example, one fetched via a
``get`` in Django's ORM). This will execute a query on the backend that searches
for similar results. The instance you pass in should be an indexed object.
Previously called methods will have an effect on the provided results.

It will evaluate its own backend-specific query and populate the
``SearchQuerySet`` in the same manner as other methods.

Example::

    entry = Entry.objects.get(slug='haystack-one-oh-released')
    mlt = SearchQuerySet().more_like_this(entry)
    mlt.count() # 5
    mlt[0].object.title # "Haystack Beta 1 Released"

    # ...or...
    mlt = SearchQuerySet().filter(public=True).exclude(pub_date__lte=datetime.date(2009, 7, 21)).more_like_this(entry)
    mlt.count() # 2
    mlt[0].object.title # "Haystack Beta 1 Released"

``using``
~~~~~~~~~

.. method:: SearchQuerySet.using(self, connection_name)

Allows switching which connection the ``SearchQuerySet`` uses to search in.

Example::

    # Let the routers decide which connection to use.
    sqs = SearchQuerySet().all()

    # Specify the 'default'.
    sqs = SearchQuerySet().all().using('default')


Methods That Do Not Return A ``SearchQuerySet``
-----------------------------------------------

``count``
~~~~~~~~~

.. method:: SearchQuerySet.count(self)

Returns the total number of matching results.

This returns an integer count of the total number of results the search backend
found that matched. This method causes the query to evaluate and run the search.

Example::

    SearchQuerySet().filter(content='foo').count()

``best_match``
~~~~~~~~~~~~~~

.. method:: SearchQuerySet.best_match(self)

Returns the best/top search result that matches the query.

This method causes the query to evaluate and run the search. This method returns
a ``SearchResult`` object that is the best match the search backend found::

    foo = SearchQuerySet().filter(content='foo').best_match()
    foo.id # Something like 5.

    # Identical to:
    foo = SearchQuerySet().filter(content='foo')[0]

``latest``
~~~~~~~~~~

.. method:: SearchQuerySet.latest(self, date_field)

Returns the most recent search result that matches the query.

This method causes the query to evaluate and run the search. This method returns
a ``SearchResult`` object that is the most recent match the search backend
found::

    foo = SearchQuerySet().filter(content='foo').latest('pub_date')
    foo.id # Something like 3.

    # Identical to:
    foo = SearchQuerySet().filter(content='foo').order_by('-pub_date')[0]

``facet_counts``
~~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.facet_counts(self)

Returns the facet counts found by the query. This will cause the query to
execute and should generally be used when presenting the data (template-level).

You receive back a dictionary with three keys: ``fields``, ``dates`` and
``queries``. Each contains the facet counts for whatever facets you specified
within your ``SearchQuerySet``.

.. note::

    The resulting dictionary may change before the 1.0 release. It's fairly
    backend-specific at the time of writing. Standardizing is waiting on
    implementing other backends that support faceting and ensuring that the
    results presented will meet their needs as well.

Example::

    # Count document hits for each author.
    sqs = SearchQuerySet().filter(content='foo').facet('author')

    sqs.facet_counts()
    # Gives the following response:
    # {
    #     'dates': {},
    #     'fields': {
    #         'author': [
    #             ('john', 4),
    #             ('daniel', 2),
    #             ('sally', 1),
    #             ('terry', 1),
    #         ],
    #     },
    #     'queries': {}
    # }

``stats_results``
~~~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.stats_results(self)

Returns the stats results found by the query.

This will cause the query to execute and should generally be used when
presenting the data (template-level).

You receive back a dictionary with a ``stats_fields`` key, which contains the
stats for whatever fields you requested stats on within your
``SearchQuerySet``.

.. note::

    The resulting dictionary may change before the 1.0 release. It's fairly
    backend-specific at the time of writing. Standardizing is waiting on
    implementing other backends that support faceting and ensuring that the
    results presented will meet their needs as well.

Example::

    # Get stats on the price field.
    sqs = SearchQuerySet().filter(content='foo').stats('price')

    sqs.stats_results()
    # Gives the following response:
    # {
    #     'stats_fields': {
    #         'price': {
    #             'min': 0.0,
    #             'max': 2199.0,
    #             'sum': 5251.2699999999995,
    #             'count': 15,
    #             'missing': 11,
    #             'sumOfSquares': 6038619.160300001,
    #             'mean': 350.08466666666664,
    #             'stddev': 547.737557906113
    #         }
    #     }
    # }

``spelling_suggestion``
~~~~~~~~~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.spelling_suggestion(self, preferred_query=None)

Returns the spelling suggestion found by the query.

To work, you must set ``INCLUDE_SPELLING`` within your connection's
settings dictionary to ``True``, and you must rebuild your index afterwards.
Otherwise, ``None`` will be returned.

This method causes the query to evaluate and run the search if it hasn't already
run. Search results will be populated as normal but with an additional spelling
suggestion. Note that this does *NOT* run the revised query, only suggests
improvements.

If provided, the optional argument to this method lets you specify an alternate
query for the spelling suggestion to be run on. This is useful for passing along
a raw user-provided query, especially when there are many methods chained on the
``SearchQuerySet``.

Example::

    sqs = SearchQuerySet().auto_query('mor exmples')
    sqs.spelling_suggestion() # u'more examples'

    # ...or...
    suggestion = SearchQuerySet().spelling_suggestion('moar exmples')
    suggestion # u'more examples'

``values``
~~~~~~~~~~

.. method:: SearchQuerySet.values(self, *fields)

Returns a list of dictionaries, each containing the key/value pairs for the
result, exactly like Django's ``ValuesQuerySet``.

This method causes the query to evaluate and run the search if it hasn't already
run.

You must provide a list of one or more fields as arguments. These fields will
be the ones included in the individual results.

Example::

    sqs = SearchQuerySet().auto_query('banana').values('title', 'description')


``values_list``
~~~~~~~~~~~~~~~

.. method:: SearchQuerySet.values_list(self, *fields, **kwargs)

Returns a list of field values as tuples, exactly like Django's
``ValuesListQuerySet``.

This method causes the query to evaluate and run the search if it hasn't already
run.

You must provide a list of one or more fields as arguments. These fields will
be the ones included in the individual results.

You may optionally also provide a ``flat=True`` kwarg, which in the case of a
single field being provided, will return a flat list of that field rather than
a list of tuples.

Example::

    sqs = SearchQuerySet().auto_query('banana').values_list('title', 'description')

    # ...or just the titles as a flat list...
    sqs = SearchQuerySet().auto_query('banana').values_list('title', flat=True)


.. _field-lookups:

Field Lookups
-------------

The following lookup types are supported:

* contains
* exact
* gt
* gte
* lt
* lte
* in
* startswith
* range

These options are similar in function to the way Django's lookup types work.
The actual behavior of these lookups is backend-specific.

.. warning::

    The ``startswith`` filter is strongly affected by the other ways the engine
    parses data, especially with regard to stemming (see :doc:`glossary`). This
    can mean that if the query ends in a vowel or a plural form, it may get
    stemmed before being evaluated.

    This is both backend-specific and yet fairly consistent between engines,
    and may be the cause of sometimes unexpected results.

.. warning::

    The ``contains`` filter became the new default filter as of Haystack v2.X
    (the default in Haystack v1.X was ``exact``). This changed because ``exact``
    caused problems and was unintuitive for new people trying to use Haystack.
    ``contains`` is a much more natural usage.

    If you had an app built on Haystack v1.X & are upgrading, you'll need to
    sanity-check & possibly change any code that was relying on the default.
    The solution is just to add ``__exact`` to any "bare" field in a
    ``.filter(...)`` clause.

Example::

    SearchQuerySet().filter(content='foo')

    # Identical to:
    SearchQuerySet().filter(content__contains='foo')

    # Phrase matching.
    SearchQuerySet().filter(content__exact='hello world')

    # Other usages look like:
    SearchQuerySet().filter(pub_date__gte=datetime.date(2008, 1, 1), pub_date__lt=datetime.date(2009, 1, 1))
    SearchQuerySet().filter(author__in=['daniel', 'john', 'jane'])
    SearchQuerySet().filter(view_count__range=[3, 5])


``EmptySearchQuerySet``
=======================

Also included in Haystack is an ``EmptySearchQuerySet`` class. It behaves just
like ``SearchQuerySet`` but will always return zero results. This is useful for
places where you want no query to occur or no results to be returned.

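For instance, a search view can hand its template an ``EmptySearchQuerySet``
when no query was submitted (the view and parameter names here are
illustrative)::

    from haystack.query import EmptySearchQuerySet, SearchQuerySet

    def search(request):
        query = request.GET.get('q', '')

        if query:
            results = SearchQuerySet().filter(content=query)
        else:
            # Same list-like interface, but the search backend is never hit.
            results = EmptySearchQuerySet()

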
``RelatedSearchQuerySet``
=========================

Sometimes you need to filter results based on relations in the database that are
not present in the search index or are difficult to express that way. To this
end, ``RelatedSearchQuerySet`` allows you to post-process the search results by
calling ``load_all_queryset``.

.. warning::

    ``RelatedSearchQuerySet`` can have negative performance implications.
    Because results are excluded based on the database after the search query
    has been run, you can't guarantee offsets within the cache. Therefore, the
    entire cache that appears before the offset you request must be filled in
    order to produce consistent results. On large result sets and at higher
    slices, this can take time.

    This is the old behavior of ``SearchQuerySet``, so performance is no worse
    than the early days of Haystack.

It supports all other methods that the standard ``SearchQuerySet`` does, with
the addition of the ``load_all_queryset`` method and paying attention to the
``load_all_queryset`` method of ``SearchIndex`` objects when populating the
cache.

``load_all_queryset``
|
||||
---------------------
|
||||
|
||||
.. method:: RelatedSearchQuerySet.load_all_queryset(self, model_class, queryset)
|
||||
|
||||
Allows for specifying a custom ``QuerySet`` that changes how ``load_all`` will
|
||||
fetch records for the provided model. This is useful for post-processing the
|
||||
results from the query, enabling things like adding ``select_related`` or
|
||||
filtering certain data.
|
||||
|
||||
Example::
|
||||
|
||||
sqs = RelatedSearchQuerySet().filter(content='foo').load_all()
|
||||
# For the Entry model, we want to include related models directly associated
|
||||
# with the Entry to save on DB queries.
|
||||
sqs = sqs.load_all_queryset(Entry, Entry.objects.all().select_related(depth=1))
|
||||
|
||||
This method chains indefinitely, so you can specify ``QuerySets`` for as many
|
||||
models as you wish, one per model. The ``SearchQuerySet`` appends on a call to
|
||||
``in_bulk``, so be sure that the ``QuerySet`` you provide can accommodate this
|
||||
and that the ids passed to ``in_bulk`` will map to the model in question.
|
||||
|
||||
If you need to do this frequently and have one ``QuerySet`` you'd like to apply
|
||||
everywhere, you can specify this at the ``SearchIndex`` level using the
|
||||
``load_all_queryset`` method. See :doc:`searchindex_api` for usage.
|
|

.. _ref-searchresult-api:

====================
``SearchResult`` API
====================

.. class:: SearchResult(app_label, model_name, pk, score, **kwargs)

The ``SearchResult`` class provides structure to the results that come back from
the search index. These objects are what a ``SearchQuerySet`` will return when
evaluated.


Attribute Reference
===================

The class exposes the following useful attributes/properties:

* ``app_label`` - The application the model is attached to.
* ``model_name`` - The model's name.
* ``pk`` - The primary key of the model.
* ``score`` - The score provided by the search engine.
* ``object`` - The actual model instance (lazily loaded).
* ``model`` - The model class.
* ``verbose_name`` - A prettier version of the model's class name for display.
* ``verbose_name_plural`` - A prettier version of the model's *plural* class
  name for display.
* ``searchindex`` - Returns the ``SearchIndex`` class associated with this
  result.
* ``distance`` - On geospatial queries, a ``Distance`` object representing the
  distance between the result and the focused point.
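
The ``object`` attribute deserves emphasis: the model instance is not fetched
from the database until you first access it. A rough, framework-free sketch of
how such lazy loading can be structured (``loader`` here is a stand-in for a
database lookup, not Haystack's real internals):

```python
class LazyResult(object):
    """Holds identifying data cheaply; fetches the full record on demand."""

    def __init__(self, app_label, model_name, pk, score, loader):
        self.app_label = app_label
        self.model_name = model_name
        self.pk = pk
        self.score = score
        self._loader = loader   # callable that would hit the database
        self._object = None

    @property
    def object(self):
        # Fetch once, then reuse the cached instance.
        if self._object is None:
            self._object = self._loader(self.pk)
        return self._object


calls = []
result = LazyResult('shops', 'shop', 42, 1.5,
                    lambda pk: calls.append(pk) or {'pk': pk})
result.object  # first access triggers the load...
result.object  # ...later accesses reuse the cached instance.
print(calls)   # the loader ran exactly once
```
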


Method Reference
================

``content_type``
----------------

.. method:: SearchResult.content_type(self)

Returns the content type for the result's model instance.

``get_additional_fields``
-------------------------

.. method:: SearchResult.get_additional_fields(self)

Returns a dictionary of all of the fields from the raw result.

Useful for serializing results. Only returns what was seen from the
search engine, so it may include extra fields Haystack's indexes aren't
aware of.

``get_stored_fields``
---------------------

.. method:: SearchResult.get_stored_fields(self)

Returns a dictionary of all of the stored fields from the ``SearchIndex``.

Useful for serializing results. Only returns the fields Haystack's
indexes are aware of as being 'stored'.
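
Since ``get_stored_fields`` returns a plain dictionary, serializing a page of
results is just a matter of collecting those dictionaries. A sketch, using
stand-in result objects that mimic the method (the class names here are
hypothetical):

```python
import json


def serialize_results(results):
    """Turn result objects into a JSON string via their stored fields."""
    return json.dumps([result.get_stored_fields() for result in results])


# Stand-in result objects exposing the same method:
class FakeResult(object):
    def __init__(self, fields):
        self._fields = fields

    def get_stored_fields(self):
        return self._fields


page = [FakeResult({'title': 'First'}), FakeResult({'title': 'Second'})]
print(serialize_results(page))
```
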

.. _ref-settings:

=================
Haystack Settings
=================

As a way to extend/change the default behavior within Haystack, there are
several settings you can alter within your ``settings.py``. This is a
comprehensive list of the settings Haystack recognizes.


``HAYSTACK_DEFAULT_OPERATOR``
=============================

**Optional**

This setting controls the default behavior for chaining ``SearchQuerySet``
filters together.

Valid options are::

    HAYSTACK_DEFAULT_OPERATOR = 'AND'
    HAYSTACK_DEFAULT_OPERATOR = 'OR'

Defaults to ``AND``.


``HAYSTACK_CONNECTIONS``
========================

**Required**

This setting controls which backends should be available. It should be a
dictionary of dictionaries resembling the following (complete) example::

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
            'URL': 'http://localhost:9001/solr/default',
            'TIMEOUT': 60 * 5,
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
            'EXCLUDED_INDEXES': ['thirdpartyapp.search_indexes.BarIndex'],
        },
        'autocomplete': {
            'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
            'PATH': '/home/search/whoosh_index',
            'STORAGE': 'file',
            'POST_LIMIT': 128 * 1024 * 1024,
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
            'EXCLUDED_INDEXES': ['thirdpartyapp.search_indexes.BarIndex'],
        },
        'slave': {
            'ENGINE': 'xapian_backend.XapianEngine',
            'PATH': '/home/search/xapian_index',
            'INCLUDE_SPELLING': True,
            'BATCH_SIZE': 100,
            'EXCLUDED_INDEXES': ['thirdpartyapp.search_indexes.BarIndex'],
        },
        'db': {
            'ENGINE': 'haystack.backends.simple_backend.SimpleEngine',
            'EXCLUDED_INDEXES': ['thirdpartyapp.search_indexes.BarIndex'],
        }
    }

No default for this setting is provided.

The main keys (``default`` & friends) are identifiers for your application.
You can use them any place the API exposes ``using`` as a method or kwarg.

There must always be at least a ``default`` key within this setting.

The ``ENGINE`` option is required for all backends & should point to the
``BaseEngine`` subclass for the backend.

Additionally, each backend may have options it requires:

* Solr

  * ``URL`` - The URL to the Solr core.

* Whoosh

  * ``PATH`` - The filesystem path to where the index data is located.

* Xapian

  * ``PATH`` - The filesystem path to where the index data is located.

The following options are optional:

* ``INCLUDE_SPELLING`` - Include spelling suggestions. Default is ``False``.
* ``BATCH_SIZE`` - How many records should be updated at once via the
  management commands. Default is ``1000``.
* ``TIMEOUT`` - (Solr and ElasticSearch) How long to wait (in seconds) before
  the connection times out. Default is ``10``.
* ``STORAGE`` - (Whoosh-only) Which storage engine to use. Accepts ``file`` or
  ``ram``. Default is ``file``.
* ``POST_LIMIT`` - (Whoosh-only) How large the file sizes can be. Default is
  ``128 * 1024 * 1024``.
* ``FLAGS`` - (Xapian-only) A list of flags to use when querying the index.
* ``EXCLUDED_INDEXES`` - A list of strings (as Python import paths) to indexes
  you do **NOT** want included. Useful for omitting third-party things you
  don't want indexed or for when you want to replace an index.
* ``KWARGS`` - (Solr and ElasticSearch) Any additional keyword arguments that
  should be passed on to the underlying client library.


``HAYSTACK_ROUTERS``
====================

**Optional**

This setting controls how routing is performed, allowing different backends to
handle updates/deletes/reads.

An example::

    HAYSTACK_ROUTERS = ['search_routers.MasterSlaveRouter', 'haystack.routers.DefaultRouter']

Defaults to ``['haystack.routers.DefaultRouter']``.
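
A router is a plain class exposing ``for_read`` and ``for_write`` methods, each
returning a connection alias from ``HAYSTACK_CONNECTIONS`` (routers are
consulted in list order). A minimal sketch of the ``MasterSlaveRouter`` named
above, under the assumption that writes go to ``default`` and reads to a
``slave`` connection:

```python
class MasterSlaveRouter(object):
    """Send index writes to the master, searches to the slave."""

    def for_read(self, **hints):
        # All queries run against the (read-only) slave connection.
        return 'slave'

    def for_write(self, **hints):
        # All updates/deletes run against the master connection.
        return 'default'


router = MasterSlaveRouter()
print(router.for_read(), router.for_write())
```
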


``HAYSTACK_SIGNAL_PROCESSOR``
=============================

**Optional**

This setting controls which ``SignalProcessor`` class is used to handle
Django's signals & keep the search index up-to-date.

An example::

    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

Defaults to ``'haystack.signals.BaseSignalProcessor'``.


``HAYSTACK_DOCUMENT_FIELD``
===========================

**Optional**

This setting controls what fieldname Haystack relies on as the default field
for searching within.

An example::

    HAYSTACK_DOCUMENT_FIELD = 'wall_o_text'

Defaults to ``text``.


``HAYSTACK_SEARCH_RESULTS_PER_PAGE``
====================================

**Optional**

This setting controls how many results are shown per page when using the
included ``SearchView`` and its subclasses.

An example::

    HAYSTACK_SEARCH_RESULTS_PER_PAGE = 50

Defaults to ``20``.


``HAYSTACK_CUSTOM_HIGHLIGHTER``
===============================

**Optional**

This setting allows you to specify your own custom ``Highlighter``
implementation for use with the ``{% highlight %}`` template tag. It should be
the full path to the class.

An example::

    HAYSTACK_CUSTOM_HIGHLIGHTER = 'myapp.utils.BorkHighlighter'

No default is provided. Haystack automatically falls back to the default
implementation.
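
At its core, a highlighter takes the user's query and a block of text and wraps
the matched terms in markup. The framework-free heart of such a class might
look roughly like this (a sketch of the idea only, not Haystack's actual
``Highlighter`` API):

```python
import re


class BorkHighlighter(object):
    """Wraps each query word in custom markup instead of the default."""

    def __init__(self, query):
        self.query_words = set(query.lower().split())

    def highlight(self, text_block):
        for word in self.query_words:
            # Replace case-insensitively, keeping the original casing.
            text_block = re.sub(
                r'(%s)' % re.escape(word),
                r'<b class="highlighted">\1</b>',
                text_block,
                flags=re.IGNORECASE,
            )
        return text_block


print(BorkHighlighter('coffee').highlight('Fresh coffee daily'))
# → Fresh <b class="highlighted">coffee</b> daily
```
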


``HAYSTACK_ITERATOR_LOAD_PER_QUERY``
====================================

**Optional**

This setting controls the number of results that are pulled at once when
iterating through a ``SearchQuerySet``. If you generally consume large portions
at a time, you can bump this up for better performance.

.. note::

    This is not used in the case of a slice on a ``SearchQuerySet``, which
    already overrides the number of results pulled at once.

An example::

    HAYSTACK_ITERATOR_LOAD_PER_QUERY = 100

The default is 10 results at a time.
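
The tradeoff here is the usual batched-iteration one: each batch costs a round
trip to the engine, so a larger batch size means fewer queries at the price of
more memory per pull. The underlying pattern, in a backend-free sketch
(``fetch`` stands in for one query to the search engine):

```python
def iterate_in_batches(fetch, total, batch_size):
    """Yield results while pulling them batch_size at a time."""
    for start in range(0, total, batch_size):
        # One round trip per batch; the caller just sees a flat stream.
        for result in fetch(start, min(start + batch_size, total)):
            yield result


queries = []

def fake_fetch(start, end):
    queries.append((start, end))
    return range(start, end)

results = list(iterate_in_batches(fake_fetch, total=25, batch_size=10))
print(len(results), len(queries))  # 25 results pulled in 3 queries
```
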


``HAYSTACK_LIMIT_TO_REGISTERED_MODELS``
=======================================

**Optional**

This setting allows you to control whether or not Haystack will limit the
search results seen to just the models registered. It should be a boolean.

If your search index is never used for anything other than the models
registered with Haystack, you can turn this off and get a small to moderate
performance boost.

An example::

    HAYSTACK_LIMIT_TO_REGISTERED_MODELS = False

Default is ``True``.


``HAYSTACK_ID_FIELD``
=====================

**Optional**

This setting allows you to control what the unique field name used internally
by Haystack is called. Rarely needed unless your field names collide with
Haystack's defaults.

An example::

    HAYSTACK_ID_FIELD = 'my_id'

Default is ``id``.


``HAYSTACK_DJANGO_CT_FIELD``
============================

**Optional**

This setting allows you to control what the content type field name used
internally by Haystack is called. Rarely needed unless your field names
collide with Haystack's defaults.

An example::

    HAYSTACK_DJANGO_CT_FIELD = 'my_django_ct'

Default is ``django_ct``.


``HAYSTACK_DJANGO_ID_FIELD``
============================

**Optional**

This setting allows you to control what the primary key field name used
internally by Haystack is called. Rarely needed unless your field names
collide with Haystack's defaults.

An example::

    HAYSTACK_DJANGO_ID_FIELD = 'my_django_id'

Default is ``django_id``.


``HAYSTACK_IDENTIFIER_METHOD``
==============================

**Optional**

This setting allows you to provide a custom method for
``haystack.utils.get_identifier``. Useful when the default identifier
pattern of ``<app_label>.<object_name>.<pk>`` isn't suited to your needs.

An example::

    HAYSTACK_IDENTIFIER_METHOD = 'my_app.module.get_identifier'

Default is ``haystack.utils.default_get_identifier``.
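
A replacement method receives an object (or an already-built identifier string)
and must return a single string that is unique per record. A sketch mirroring
the default layout, exercised with a stand-in for a Django model instance (the
``FakeShop`` class is purely illustrative):

```python
def get_identifier(obj_or_string):
    """Build a unique identifier; strings pass through untouched."""
    if isinstance(obj_or_string, str):
        return obj_or_string
    # Mirrors the default <app_label>.<object_name>.<pk> layout, but any
    # scheme that is unique per record would do.
    return "%s.%s.%s" % (
        obj_or_string._meta.app_label,
        obj_or_string._meta.model_name,
        obj_or_string.pk,
    )


# A stand-in mimicking the attributes a Django model exposes:
class Meta(object):
    app_label = 'shops'
    model_name = 'shop'

class FakeShop(object):
    _meta = Meta()
    pk = 7

print(get_identifier(FakeShop()))  # → shops.shop.7
```
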

.. _ref-signal_processors:

=================
Signal Processors
=================

Keeping data in sync between the (authoritative) database & the
(non-authoritative) search index is one of the more difficult problems when
using Haystack. Even frequently running the ``update_index`` management command
still introduces lag between when the data is stored & when it's available
for searching.

A solution to this is to incorporate Django's signals (specifically
``models.db.signals.post_save`` & ``models.db.signals.post_delete``), which then
trigger *individual* updates to the search index, keeping them in near-perfect
sync.

Older versions of Haystack (pre-v2.0) tied the ``SearchIndex`` directly to the
signals, which caused occasional conflicts of interest with third-party
applications.

To solve this, starting with Haystack v2.0, the concept of a ``SignalProcessor``
has been introduced. In its simplest form, the ``SignalProcessor`` listens
to whatever signals are set up & can be configured to then trigger the updates
without having to change any ``SearchIndex`` code.

.. warning::

    Incorporating Haystack's ``SignalProcessor`` into your setup **will**
    increase the overall load (CPU & perhaps I/O depending on configuration).
    You will need to capacity plan for this & ensure you can make the tradeoff
    of more real-time results for increased load.


Default - ``BaseSignalProcessor``
=================================

The default setup is configured to use the
``haystack.signals.BaseSignalProcessor`` class, which includes all the
underlying code necessary to handle individual updates/deletes, **BUT DOES NOT
HOOK UP THE SIGNALS**.

This means that, by default, **NO ACTION IS TAKEN BY HAYSTACK** when a model is
saved or deleted. The ``BaseSignalProcessor.setup`` &
``BaseSignalProcessor.teardown`` methods are both empty to prevent anything
from being set up at initialization time.

This usage is configured very simply (again, by default) with the
``HAYSTACK_SIGNAL_PROCESSOR`` setting. An example of manually setting this
would look like::

    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.BaseSignalProcessor'

This class forms an excellent base if you'd like to override/extend it for more
advanced behavior. Which leads us to...


Realtime - ``RealtimeSignalProcessor``
======================================

The other included ``SignalProcessor`` is the
``haystack.signals.RealtimeSignalProcessor`` class. It is an extremely thin
extension of the ``BaseSignalProcessor`` class, differing only in that
it implements the ``setup/teardown`` methods, tying **ANY** model
``save/delete`` to the signal processor.

If the model has an associated ``SearchIndex``, the ``RealtimeSignalProcessor``
will then trigger an update/delete of that model instance within the search
index proper.

Configuration looks like::

    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

This causes **all** ``SearchIndex`` classes to work in a realtime fashion.

.. note::

    These updates happen in-process. If a request-response cycle is involved,
    the user with the browser may have to sit & wait for indexing to be
    completed. Since this wait can be undesirable, especially under load,
    you may wish to look into queued search options. See the
    :ref:`ref-other_apps` documentation for existing options.


Custom ``SignalProcessors``
===========================

The ``BaseSignalProcessor`` & ``RealtimeSignalProcessor`` classes are fairly
simple/straightforward to customize or extend. Rather than forking Haystack to
implement your modifications, you should create your own subclass within your
codebase (anywhere that's importable is usually fine, though you should avoid
``models.py`` files).

For instance, if you only wanted ``User`` saves to be realtime, deferring all
other updates to the management commands, you'd implement the following code::

    from django.contrib.auth.models import User
    from django.db import models
    from haystack import signals


    class UserOnlySignalProcessor(signals.BaseSignalProcessor):
        def setup(self):
            # Listen only to the ``User`` model.
            models.signals.post_save.connect(self.handle_save, sender=User)
            models.signals.post_delete.connect(self.handle_delete, sender=User)

        def teardown(self):
            # Disconnect only for the ``User`` model.
            models.signals.post_save.disconnect(self.handle_save, sender=User)
            models.signals.post_delete.disconnect(self.handle_delete, sender=User)

For other customizations (modifying how saves/deletes should work), you'll need
to override/extend the ``handle_save/handle_delete`` methods. The source code
is your best option for referring to how things currently work on your version
of Haystack.

.. _ref-spatial:

==============
Spatial Search
==============

Spatial search (also called geospatial search) allows you to take data that
has a geographic location & enhance the search results by limiting them to a
physical area. Haystack, combined with the latest versions of a couple engines,
can provide this type of search.

In addition, Haystack tries to implement these features in a way that is as
close to GeoDjango_ as possible. There are some differences, which we'll
highlight throughout this guide. Additionally, while the support isn't as
comprehensive as PostGIS (for example), it is still quite useful.

.. _GeoDjango: http://geodjango.org/


Additional Requirements
=======================

The spatial functionality has only one non-included, non-available-in-Django
dependency:

* ``geopy`` - ``pip install geopy``

If you do not ever need distance information, you may be able to skip
installing ``geopy``.


Support
=======

You need the latest & greatest of either Solr or Elasticsearch. None of the
other backends (specifically the engines) support this kind of search.

For Solr_, you'll need at least **v3.5+**. In addition, if you have an existing
install of Haystack & Solr, you'll need to upgrade the schema & reindex your
data. If you're adding geospatial data, you would have to reindex anyhow.

For Elasticsearch, you'll need at least v0.17.7, preferably v0.18.6 or better.
If you're adding geospatial data, you'll have to reindex as well.

.. _Solr: http://lucene.apache.org/solr/
====================== ====== =============== ======== ======== ======
Lookup Type            Solr   Elasticsearch   Whoosh   Xapian   Simple
====================== ====== =============== ======== ======== ======
`within`               X      X
`dwithin`              X      X
`distance`             X      X
`order_by('distance')` X      X
`polygon`                     X
====================== ====== =============== ======== ======== ======

For more details, you can inspect http://wiki.apache.org/solr/SpatialSearch
or http://www.elasticsearch.org/guide/reference/query-dsl/geo-bounding-box-filter.html.

Geospatial Assumptions
======================

``Points``
----------

Haystack prefers to work with ``Point`` objects, which are located in
``django.contrib.gis.geos.Point`` but conveniently importable from
``haystack.utils.geo.Point``.

``Point`` objects use **LONGITUDE, LATITUDE** for their construction, regardless
of whether you use the parameters to instantiate them or WKT_/``GEOSGeometry``.

.. _WKT: http://en.wikipedia.org/wiki/Well-known_text

Examples::

    # Using positional arguments.
    from haystack.utils.geo import Point
    pnt = Point(-95.23592948913574, 38.97127105172941)

    # Using WKT.
    from django.contrib.gis.geos import GEOSGeometry
    pnt = GEOSGeometry('POINT(-95.23592948913574 38.97127105172941)')

They are preferred over just providing ``latitude, longitude`` because they are
more intelligent, have a spatial reference system attached & are more consistent
with GeoDjango's use.


``Distance``
------------

Haystack also uses the ``D`` (or ``Distance``) objects from GeoDjango,
implemented in ``django.contrib.gis.measure.Distance`` but conveniently
importable from ``haystack.utils.geo.D`` (or ``haystack.utils.geo.Distance``).

``Distance`` objects accept a very flexible set of measurements during
instantiation and can convert among them freely. This is important, because
the engines rely on measurements being in kilometers, but you're free to use
whatever units you want.

Examples::

    from haystack.utils.geo import D

    # Start at 5 miles.
    imperial_d = D(mi=5)

    # Convert to fathoms...
    fathom_d = imperial_d.fathom

    # Now to kilometers...
    km_d = imperial_d.km

    # And back to miles.
    mi = imperial_d.mi

They are preferred over just providing a raw distance because they are
more intelligent, have a well-defined unit system attached & are consistent
with GeoDjango's use.


``WGS-84``
----------

All engines assume WGS-84 (SRID 4326). At the time of writing, there does **not**
appear to be a way to switch this. Haystack will transform all points into this
coordinate system for you.


Indexing
========

Indexing is relatively simple. Simply add a ``LocationField`` (or several)
onto your ``SearchIndex`` class(es) & provide them a ``Point`` object. For
example::

    from haystack import indexes
    from shops.models import Shop


    class ShopIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        # ... the usual, then...
        location = indexes.LocationField(model_attr='coordinates')

        def get_model(self):
            return Shop

If you must manually prepare the data, you have to do something slightly less
convenient, returning a string-ified version of the coordinates in WGS-84 as
``lat,long``::

    from haystack import indexes
    from shops.models import Shop


    class ShopIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        # ... the usual, then...
        location = indexes.LocationField()

        def get_model(self):
            return Shop

        def prepare_location(self, obj):
            # If you're just storing the floats...
            return "%s,%s" % (obj.latitude, obj.longitude)

Alternatively, you could build a method/property onto the ``Shop`` model that
returns a ``Point`` based on those coordinates::

    # shops/models.py
    from django.contrib.gis.geos import Point
    from django.db import models


    class Shop(models.Model):
        # ... the usual, then...
        latitude = models.FloatField()
        longitude = models.FloatField()

        # Usual methods, then...
        def get_location(self):
            # Remember, longitude FIRST!
            return Point(self.longitude, self.latitude)


    # shops/search_indexes.py
    from haystack import indexes
    from shops.models import Shop


    class ShopIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        location = indexes.LocationField(model_attr='get_location')

        def get_model(self):
            return Shop
Querying
========

There are two types of geospatial queries you can run, ``within`` & ``dwithin``.
Like their GeoDjango counterparts (within_ & dwithin_), these methods focus on
finding results within an area.

.. _within: https://docs.djangoproject.com/en/dev/ref/contrib/gis/geoquerysets/#within
.. _dwithin: https://docs.djangoproject.com/en/dev/ref/contrib/gis/geoquerysets/#dwithin

``within``
----------

.. method:: SearchQuerySet.within(self, field, point_1, point_2)

``within`` is a bounding box comparison. A bounding box is a rectangular area
within which to search. It's composed of a bottom-left point & a top-right
point. It is faster but slightly sloppier than its counterpart.

Examples::

    from haystack.query import SearchQuerySet
    from haystack.utils.geo import Point

    downtown_bottom_left = Point(-95.23947, 38.9637903)
    downtown_top_right = Point(-95.23362278938293, 38.973081081164715)

    # 'location' is the fieldname from our ``SearchIndex``...

    # Do the bounding box query.
    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right)

    # Can be chained with other Haystack calls.
    sqs = SearchQuerySet().auto_query('coffee').within('location', downtown_bottom_left, downtown_top_right).order_by('-popularity')

.. note::

    In GeoDjango, assuming the ``Shop`` model had been properly geo-ified, this
    would have been implemented as::

        from shops.models import Shop
        Shop.objects.filter(location__within=(downtown_bottom_left, downtown_top_right))

    Haystack's form differs because it yielded a cleaner implementation, was
    no more typing than the GeoDjango version & tried to maintain the same
    terminology/similar signature.


``dwithin``
-----------

.. method:: SearchQuerySet.dwithin(self, field, point, distance)

``dwithin`` is a radius-based search. A radius-based search is a circular area
within which to search. It's composed of a center point & a radius (in
kilometers, though Haystack will use the ``D`` object's conversion utilities to
get it there). It is slower than ``within`` but very exact & can involve fewer
calculations on your part.

Examples::

    from haystack.query import SearchQuerySet
    from haystack.utils.geo import Point, D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
    # Within two miles.
    max_dist = D(mi=2)

    # 'location' is the fieldname from our ``SearchIndex``...

    # Do the radius query.
    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, max_dist)

    # Can be chained with other Haystack calls.
    sqs = SearchQuerySet().auto_query('coffee').dwithin('location', ninth_and_mass, max_dist).order_by('-popularity')

.. note::

    In GeoDjango, assuming the ``Shop`` model had been properly geo-ified, this
    would have been implemented as::

        from shops.models import Shop
        Shop.objects.filter(location__dwithin=(ninth_and_mass, D(mi=2)))

    Haystack's form differs because it yielded a cleaner implementation, was
    no more typing than the GeoDjango version & tried to maintain the same
    terminology/similar signature.


``distance``
------------

.. method:: SearchQuerySet.distance(self, field, point)

By default, search results will come back without distance information attached
to them. In the context of a bounding box, it would be ambiguous what the
distances would be calculated against. And it is extra calculation that may not
be necessary.

So, like GeoDjango, Haystack exposes a method to signify that you want
these calculated distances included on results.

Examples::

    from haystack.query import SearchQuerySet
    from haystack.utils.geo import Point, D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)

    # On a bounding box...
    downtown_bottom_left = Point(-95.23947, 38.9637903)
    downtown_top_right = Point(-95.23362278938293, 38.973081081164715)

    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass)

    # ...Or on a radius query.
    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass)

You can even apply a different field, for instance if you calculate results
against key, well-cached hotspots in town but want distances from the user's
current position::

    from haystack.query import SearchQuerySet
    from haystack.utils.geo import Point, D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
    user_loc = Point(-95.23455619812012, 38.97240128290697)

    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', user_loc)

.. note::

    The astute will notice this is Haystack's biggest departure from GeoDjango.
    In GeoDjango, this would have been implemented as::

        from shops.models import Shop
        Shop.objects.filter(location__dwithin=(ninth_and_mass, D(mi=2))).distance(user_loc)

    Note that, by default, the GeoDjango form leaves *out* the field to
    calculate against (though it's possible to override it & specify the
    field).

    Haystack's form differs because the same assumptions are difficult to make.
    GeoDjango deals with a single model at a time, where Haystack deals with
    a broad mix of models. Additionally, accessing ``Model`` information is a
    couple hops away, so Haystack favors the explicit (if slightly more typing)
    approach.
|
||||
|
||||
|
||||
Ordering
========

Because you're dealing with search, even with geospatial queries, results still
come back in **RELEVANCE** order. If you want to offer the user the ability to
order results by distance, there's a simple way to enable this ordering.

Using the standard Haystack ``order_by`` method, if you specify ``distance`` or
``-distance`` **ONLY**, you'll get geographic ordering. Additionally, you must
have a call to ``.distance()`` somewhere in the chain, otherwise there is no
distance information on the results & nothing to sort by.

Examples::

    from haystack.query import SearchQuerySet
    from haystack.utils.geo import Point, D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
    downtown_bottom_left = Point(-95.23947, 38.9637903)
    downtown_top_right = Point(-95.23362278938293, 38.973081081164715)

    # Non-geo ordering.
    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).order_by('title')
    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass).order_by('-created')

    # Geo ordering, closest to farthest.
    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass).order_by('distance')

    # Geo ordering, farthest to closest.
    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass).order_by('-distance')

.. note::

    This call is identical to the GeoDjango usage.

.. warning::

    You cannot specify both a distance & lexicographic ordering. If you specify
    more than just ``distance`` or ``-distance``, Haystack assumes ``distance``
    is a field in the index & tries to sort on it. Example::

        # May blow up!
        sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass).order_by('distance', 'title')

    This is a limitation in the engine's implementation.

    If you actually **have** a field called ``distance`` (& aren't using
    calculated distance information), Haystack will do the right thing in
    these circumstances.


Caveats
=======

In all cases, you may call the ``within/dwithin/distance`` methods as many times
as you like. However, the **LAST** call is the information that will be used.
No combination logic is available, as this is largely a backend limitation.

Combining calls to both ``within`` & ``dwithin`` may yield unexpected or broken
results. They don't overlap when performing queries, so it may be possible to
construct queries that work. Your Mileage May Vary.

.. _ref-templatetags:

=============
Template Tags
=============

Haystack comes with a couple of common template tags that make some of its
special features available to templates.


``highlight``
=============

Takes a block of text and highlights words from a provided query within that
block of text. Optionally accepts arguments to provide the HTML tag to wrap
the highlighted word in, a CSS class to use with the tag and a maximum length
of the blurb in characters.

The defaults are ``span`` for the HTML tag, ``highlighted`` for the CSS class
and 200 characters for the excerpt.

Syntax::

    {% highlight <text_block> with <query> [css_class "class_name"] [html_tag "span"] [max_length 200] %}

Example::

    # Highlight summary with default behavior.
    {% highlight result.summary with query %}

    # Highlight summary but wrap highlighted words with a div and the
    # following CSS class.
    {% highlight result.summary with query html_tag "div" css_class "highlight_me_please" %}

    # Highlight summary but only show 40 characters.
    {% highlight result.summary with query max_length 40 %}

The highlighter used by this tag can be overridden as needed. See the
:doc:`highlighting` documentation for more information.
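To make the tag's options concrete, here is a rough, standalone sketch of what a term highlighter does. This is *not* Haystack's actual implementation; the function name and defaults merely mirror the options described above:

```python
import re

def highlight(text, query, html_tag="span", css_class="highlighted", max_length=200):
    # Trim to max_length characters, then wrap each query word in the tag.
    # (A real highlighter would instead pick the best window around the hits.)
    excerpt = text[:max_length]
    for word in query.split():
        pattern = re.compile(re.escape(word), re.IGNORECASE)
        excerpt = pattern.sub(
            lambda m: '<%s class="%s">%s</%s>' % (html_tag, css_class, m.group(0), html_tag),
            excerpt,
        )
    return excerpt

print(highlight("The quick brown fox", "quick"))
# The <span class="highlighted">quick</span> brown fox
```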


``more_like_this``
==================

Fetches items from the search index whose content is similar to the content of
the provided model instance.

.. note::

    This requires a backend that has More Like This built-in.

Syntax::

    {% more_like_this model_instance as varname [for app_label.model_name,app_label.model_name,...] [limit n] %}

Example::

    # Pull a full SearchQuerySet (lazy loaded) of similar content.
    {% more_like_this entry as related_content %}

    # Pull just the top 5 similar pieces of content.
    {% more_like_this entry as related_content limit 5 %}

    # Pull just the top 5 similar entries or comments.
    {% more_like_this entry as related_content for "blog.entry,comments.comment" limit 5 %}

This tag behaves exactly like ``SearchQuerySet.more_like_this``, so all notes in
that regard apply here as well.

Table Of Contents
=================

.. toctree::
   :maxdepth: 2

   index
   tutorial
   glossary
   views_and_forms
   templatetags
   management_commands
   architecture_overview
   backend_support
   installing_search_engines
   settings
   faq
   who_uses
   other_apps
   debugging

   migration_from_1_to_2
   python3
   contributing

   best_practices
   highlighting
   faceting
   autocomplete
   boost
   signal_processors
   multiple_index
   rich_content_extraction
   spatial

   searchqueryset_api
   searchindex_api
   inputtypes
   searchfield_api
   searchresult_api
   searchquery_api
   searchbackend_api

   running_tests
   creating_new_backends
   utils


Indices and tables
==================

* :ref:`search`

.. _ref-tutorial:

=============================
Getting Started with Haystack
=============================

Search is a topic of ever-increasing importance. Users increasingly rely on
search to separate signal from noise and find what they're looking for quickly.
In addition, search can provide insight into what things are popular (many
searches), what things are difficult to find on the site and ways you can
improve the site.

To this end, Haystack tries to make integrating custom search as easy as
possible while being flexible/powerful enough to handle more advanced use cases.

Haystack is a reusable app (that is, it relies only on its own code and focuses
on providing just search) that plays nicely with both apps you control and
third-party apps (such as ``django.contrib.*``) without having to modify the
sources.

Haystack also uses pluggable backends (much like Django's database
layer), so virtually all of the code you write ought to be portable between
whichever search engine you choose.

.. note::

    If you hit a stumbling block, there is both a `mailing list`_ and
    `#haystack on irc.freenode.net`_ to get help.

.. note::

    You can participate in and/or track the development of Haystack by
    subscribing to the `development mailing list`_.

.. _mailing list: http://groups.google.com/group/django-haystack
.. _#haystack on irc.freenode.net: irc://irc.freenode.net/haystack
.. _development mailing list: http://groups.google.com/group/django-haystack-dev

This tutorial assumes that you have a basic familiarity with the various major
parts of Django (models/forms/views/settings/URLconfs) and is tailored to the
typical use case. There are shortcuts available as well as hooks for much
more advanced setups, but those will not be covered here.

For example purposes, we'll be adding search functionality to a simple
note-taking application. Here is ``myapp/models.py``::

    from django.db import models
    from django.contrib.auth.models import User


    class Note(models.Model):
        user = models.ForeignKey(User)
        pub_date = models.DateTimeField()
        title = models.CharField(max_length=200)
        body = models.TextField()

        def __unicode__(self):
            return self.title

Finally, before starting with Haystack, you will want to choose a search
backend to get started. There is a quick-start guide to
:doc:`installing_search_engines`, though you may want to defer to each engine's
official instructions.


Installation
============

Use your favorite Python package manager to install the app from PyPI, e.g.

Example::

    pip install django-haystack


Configuration
=============

Add Haystack To ``INSTALLED_APPS``
----------------------------------

As with most Django applications, you should add Haystack to the
``INSTALLED_APPS`` within your settings file (usually ``settings.py``).

Example::

    INSTALLED_APPS = [
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.sites',

        # Added.
        'haystack',

        # Then your usual apps...
        'blog',
    ]


Modify Your ``settings.py``
---------------------------

Within your ``settings.py``, you'll need to add a setting to indicate where your
site configuration file will live and which backend to use, as well as other
settings for that backend.

``HAYSTACK_CONNECTIONS`` is a required setting and should be at least one of
the following:

Solr
~~~~

Example::

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
            'URL': 'http://127.0.0.1:8983/solr'
            # ...or for multicore...
            # 'URL': 'http://127.0.0.1:8983/solr/mysite',
        },
    }


Elasticsearch
~~~~~~~~~~~~~

Example::

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
            'URL': 'http://127.0.0.1:9200/',
            'INDEX_NAME': 'haystack',
        },
    }


Whoosh
~~~~~~

Requires setting ``PATH`` to the place on your filesystem where the Whoosh
index should be located. The standard warnings apply about permissions and
about keeping the index out of any location your webserver serves documents
from.

Example::

    import os
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
            'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
        },
    }


Xapian
~~~~~~

First, install the Xapian backend (via
http://github.com/notanumber/xapian-haystack/tree/master) per the instructions
included with the backend.

Requires setting ``PATH`` to the place on your filesystem where the Xapian
index should be located. The standard warnings apply about permissions and
about keeping the index out of any location your webserver serves documents
from.

Example::

    import os
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'xapian_backend.XapianEngine',
            'PATH': os.path.join(os.path.dirname(__file__), 'xapian_index'),
        },
    }


Simple
~~~~~~

The ``simple`` backend uses very basic matching via the database itself. It's
not recommended for production use but it will return results.

.. warning::

    This backend does *NOT* work like the other backends do. Data preparation
    does nothing & advanced filtering calls do not work. You really probably
    don't want this unless you're in an environment where you just want to
    silence Haystack.

Example::

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.simple_backend.SimpleEngine',
        },
    }


Handling Data
=============

Creating ``SearchIndexes``
--------------------------

``SearchIndex`` objects are the way Haystack determines what data should be
placed in the search index and handles the flow of data in. You can think of
them as being similar to Django ``Models`` or ``Forms`` in that they are
field-based and manipulate/store data.

You generally create a unique ``SearchIndex`` for each type of ``Model`` you
wish to index, though you can reuse the same ``SearchIndex`` between different
models if you take care in doing so and your field names are very standardized.

To build a ``SearchIndex``, all that's necessary is to subclass both
``indexes.SearchIndex`` & ``indexes.Indexable``, define the fields you want to
store data with and define a ``get_model`` method.

We'll create the following ``NoteIndex`` to correspond to our ``Note`` model.
This code generally goes in a ``search_indexes.py`` file within the app it
applies to (though that is not required), which allows Haystack to
automatically pick it up. The ``NoteIndex`` should look like::

    import datetime
    from haystack import indexes
    from myapp.models import Note


    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

        def index_queryset(self, using=None):
            """Used when the entire index for model is updated."""
            return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())

Every ``SearchIndex`` requires there be one (and only one) field with
``document=True``. This indicates to both Haystack and the search engine which
field is the primary field for searching within.

.. warning::

    When you choose a ``document=True`` field, it should be consistently named
    across all of your ``SearchIndex`` classes to avoid confusing the backend.
    The convention is to name this field ``text``.

    There is nothing special about the ``text`` field name used in all of the
    examples. It could be anything; you could call it ``pink_polka_dot`` and
    it won't matter. It's simply a convention to call it ``text``.

Additionally, we're providing ``use_template=True`` on the ``text`` field. This
allows us to use a data template (rather than error-prone concatenation) to
build the document the search engine will index. You'll need to create a new
template inside your template directory called
``search/indexes/myapp/note_text.txt`` and place the following inside::

    {{ object.title }}
    {{ object.user.get_full_name }}
    {{ object.body }}

In addition, we added several other fields (``author`` and ``pub_date``). These
are useful when you want to provide additional filtering options. Haystack comes
with a variety of ``SearchField`` classes to handle most types of data.

A common theme is to allow admin users to add future content but have it not
display on the site until that future date is reached. We specify a custom
``index_queryset`` method to prevent those future items from being indexed.

.. _Django admin site: http://docs.djangoproject.com/en/dev/ref/contrib/admin/


Setting Up The Views
====================

Add The ``SearchView`` To Your URLconf
--------------------------------------

Within your URLconf, add the following line::

    (r'^search/', include('haystack.urls')),

This will pull in the default URLconf for Haystack. It consists of a single
URLconf that points to a ``SearchView`` instance. You can change this class's
behavior by passing it any of several keyword arguments or override it entirely
with your own view.


Search Template
---------------

Your search template (``search/search.html`` for the default case) will likely
be very simple. The following is enough to get going (your template/block names
will likely differ)::

    {% extends 'base.html' %}

    {% block content %}
        <h2>Search</h2>

        <form method="get" action=".">
            <table>
                {{ form.as_table }}
                <tr>
                    <td>&nbsp;</td>
                    <td>
                        <input type="submit" value="Search">
                    </td>
                </tr>
            </table>

            {% if query %}
                <h3>Results</h3>

                {% for result in page.object_list %}
                    <p>
                        <a href="{{ result.object.get_absolute_url }}">{{ result.object.title }}</a>
                    </p>
                {% empty %}
                    <p>No results found.</p>
                {% endfor %}

                {% if page.has_previous or page.has_next %}
                    <div>
                        {% if page.has_previous %}<a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">{% endif %}« Previous{% if page.has_previous %}</a>{% endif %}
                        |
                        {% if page.has_next %}<a href="?q={{ query }}&amp;page={{ page.next_page_number }}">{% endif %}Next »{% if page.has_next %}</a>{% endif %}
                    </div>
                {% endif %}
            {% else %}
                {# Show some example queries to run, maybe query syntax, something else? #}
            {% endif %}
        </form>
    {% endblock %}

Note that ``page.object_list`` is actually a list of ``SearchResult``
objects instead of individual models. These objects have all the data returned
from that record within the search index as well as the score. You can also
directly access the model for the result via ``{{ result.object }}``. So the
``{{ result.object.title }}`` uses the actual ``Note`` object in the database
and accesses its ``title`` field.


Reindex
-------

The final step, now that you have everything set up, is to put the data from
your database into the search index. Haystack ships with a management
command to make this process easy.

.. note::

    If you're using the Solr backend, you have an extra step. Solr's
    configuration is XML-based, so you'll need to manually regenerate the
    schema. You should run ``./manage.py build_solr_schema`` first, drop the
    XML output in your Solr's ``schema.xml`` file and restart your Solr server.

Simply run ``./manage.py rebuild_index``. You'll get some totals of how many
models were processed and placed in the index.

.. note::

    Using the standard ``SearchIndex``, your search index content is only
    updated whenever you run either ``./manage.py update_index`` or start
    afresh with ``./manage.py rebuild_index``.

    You should cron up a ``./manage.py update_index`` job at whatever interval
    works best for your site (using ``--age=<num_hours>`` reduces the number of
    things to update).

    Alternatively, if you have low traffic and/or your search engine can handle
    it, the ``RealtimeSignalProcessor`` automatically handles updates/deletes
    for you.

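As one concrete (purely illustrative) setup, a crontab entry could refresh recent changes hourly; the project path below is a placeholder, and ``--age=24`` limits the update to entries changed in the last 24 hours:

```shell
# Illustrative crontab entry (path is a placeholder):
# re-run update_index hourly, limited to the last 24 hours of changes.
0 * * * * cd /path/to/your/project && ./manage.py update_index --age=24
```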


Complete!
=========

You can now visit the search section of your site, enter a search query and
receive search results back for the query! Congratulations!


What's Next?
============

This tutorial just scratches the surface of what Haystack provides. The
``SearchQuerySet`` is the underpinning of all search in Haystack and provides
a powerful, ``QuerySet``-like API (see :ref:`ref-searchqueryset-api`). You can
use much more complicated ``SearchForms``/``SearchViews`` to give users a better
UI (see :ref:`ref-views-and_forms`). And the :ref:`ref-best-practices` provides
insight into non-obvious or advanced usages of Haystack.

.. _ref-utils:

=========
Utilities
=========

Included here are some of the general-use bits included with Haystack.


``get_identifier``
------------------

.. function:: get_identifier(obj_or_string)

Gets a unique identifier for the object or a string representing the
object.

If not overridden, uses ``<app_label>.<object_name>.<pk>``.
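The scheme is easy to mirror. Below is a rough, self-contained sketch of that default behavior; ``FakeNote`` and its ``_Meta`` stand-in are invented here purely for illustration, real Django models carry this metadata themselves, and the real function also accepts an already-formatted string:

```python
class _Meta:
    # Hypothetical stand-in for Django's Model._meta attributes.
    app_label = "myapp"
    module_name = "note"

class FakeNote:
    _meta = _Meta()
    pk = 1

def get_identifier(obj):
    # Mirrors the "<app_label>.<object_name>.<pk>" scheme described above.
    return "%s.%s.%s" % (obj._meta.app_label, obj._meta.module_name, obj.pk)

print(get_identifier(FakeNote()))  # myapp.note.1
```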
.. _ref-views-and_forms:

=============
Views & Forms
=============

.. note::

    As of version 2.4 the views in ``haystack.views.SearchView`` are deprecated
    in favor of the new generic views in ``haystack.generic_views.SearchView``,
    which use standard Django `class-based views`_ and are available in every
    version of Django supported by Haystack.

.. _class-based views: https://docs.djangoproject.com/en/1.7/topics/class-based-views/

Haystack comes with some default, simple views & forms as well as some
Django-style views to help you get started and to cover the common cases.
Included is a way to provide:

* Basic, query-only search.
* Search by models.
* Search with basic highlighted results.
* Faceted search.
* Search by models with basic highlighted results.

Most processing is done by the forms provided by Haystack via the ``search``
method. As a result, all but the faceted types (see :doc:`faceting`) use the
standard ``SearchView``.

There is very little coupling between the forms & the views (other than relying
on the existence of a ``search`` method on the form), so you may interchangeably
use forms and/or views anywhere within your own code.

Forms
=====

.. currentmodule:: haystack.forms

``SearchForm``
--------------

The most basic of the form types, this form consists of a single field, the
``q`` field (for query). Upon searching, the form will take the cleaned contents
of the ``q`` field and perform an ``auto_query`` on either the custom
``SearchQuerySet`` you provide or on a default ``SearchQuerySet``.

To customize the ``SearchQuerySet`` the form will use, pass a
``searchqueryset`` parameter to the constructor with the ``SearchQuerySet``
you'd like to use. If using this form in conjunction with a ``SearchView``,
the form will receive whatever ``SearchQuerySet`` you provide to the view with
no additional work needed.

The ``SearchForm`` also accepts a ``load_all`` parameter (``True`` or
``False``), which determines how the database is queried when iterating through
the results. This is also received automatically from the ``SearchView``.

All other forms in Haystack inherit (either directly or indirectly) from this
form.
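Since the only contract between a form and a view is the presence of a ``search`` method, any object exposing one will do. A minimal, self-contained sketch of that duck-typed contract (``TinyForm`` and its in-memory corpus are invented for illustration; a real form returns a ``SearchQuerySet``):

```python
class TinyForm:
    """Stand-in for a Haystack form: all a view needs is .search()."""

    def __init__(self, data):
        self.q = data.get("q", "")

    def search(self):
        # A real SearchForm would run auto_query() against a SearchQuerySet;
        # here we filter a hard-coded corpus to keep the sketch runnable.
        corpus = ["django tips", "haystack tutorial", "gardening notes"]
        return [doc for doc in corpus if self.q and self.q in doc]

form = TinyForm({"q": "haystack"})
print(form.search())  # ['haystack tutorial']
```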

``HighlightedSearchForm``
-------------------------

Identical to the ``SearchForm`` except that it tags the ``highlight`` method on
to the end of the ``SearchQuerySet`` to enable highlighted results.

``ModelSearchForm``
-------------------

This form adds new fields to the form. It iterates through all registered
models for the current ``SearchSite`` and provides a checkbox for each one. If
no models are selected, all types will show up in the results.

``HighlightedModelSearchForm``
------------------------------

Identical to the ``ModelSearchForm`` except that it tags the ``highlight``
method on to the end of the ``SearchQuerySet`` to enable highlighted results on
the selected models.

``FacetedSearchForm``
---------------------

Identical to the ``SearchForm`` except that it adds a hidden ``selected_facets``
field onto the form, allowing the form to narrow the results based on the facets
chosen by the user.

Creating Your Own Form
----------------------

The simplest way to go about creating your own form is to inherit from
``SearchForm`` (or the desired parent) and extend the ``search`` method. By
doing this, you save yourself most of the work of handling data correctly and
stay API compatible with the ``SearchView``.

For example, let's say you're providing search with a user-selectable date range
associated with it. You might create a form that looked as follows::

    from django import forms
    from haystack.forms import SearchForm


    class DateRangeSearchForm(SearchForm):
        start_date = forms.DateField(required=False)
        end_date = forms.DateField(required=False)

        def search(self):
            # First, store the SearchQuerySet received from other processing.
            sqs = super(DateRangeSearchForm, self).search()

            if not self.is_valid():
                return self.no_query_found()

            # Check to see if a start_date was chosen.
            if self.cleaned_data['start_date']:
                sqs = sqs.filter(pub_date__gte=self.cleaned_data['start_date'])

            # Check to see if an end_date was chosen.
            if self.cleaned_data['end_date']:
                sqs = sqs.filter(pub_date__lte=self.cleaned_data['end_date'])

            return sqs

This form adds two new fields for (optionally) choosing the start and end dates.
Within the ``search`` method, we grab the results from the parent form's
processing. Then, if a user has selected a start and/or end date, we apply that
filtering. Finally, we simply return the ``SearchQuerySet``.

Views
=====

.. currentmodule:: haystack.views

.. note::

    As of version 2.4 the views in ``haystack.views.SearchView`` are deprecated
    in favor of the new generic views in ``haystack.generic_views.SearchView``,
    which use standard Django `class-based views`__ and are available in every
    version of Django supported by Haystack.

.. __: https://docs.djangoproject.com/en/1.7/topics/class-based-views/

New Django Class Based Views
----------------------------

.. versionadded:: 2.4.0

The views in ``haystack.generic_views.SearchView`` inherit from Django's standard
`FormView <https://docs.djangoproject.com/en/1.7/ref/class-based-views/generic-editing/#formview>`_.
The example views can be customized like any other Django class-based view, as
demonstrated in this example which filters the search results in ``get_queryset``::

    # views.py
    from datetime import date

    from haystack.generic_views import SearchView

    class MySearchView(SearchView):
        """My custom search view."""

        def get_queryset(self):
            queryset = super(MySearchView, self).get_queryset()
            # Further filter the queryset based on some set of criteria.
            return queryset.filter(pub_date__gte=date(2015, 1, 1))

        def get_context_data(self, *args, **kwargs):
            context = super(MySearchView, self).get_context_data(*args, **kwargs)
            # Do something with the context here.
            return context

    # urls.py
    urlpatterns = patterns('',
        url(r'^/search/?$', MySearchView.as_view(), name='search_view'),
    )


Upgrading
~~~~~~~~~

Upgrading from basic usage of the old-style views to new-style views is usually
as simple as:

#. Create new views under ``views.py`` subclassing
   ``haystack.generic_views.SearchView`` or
   ``haystack.generic_views.FacetedSearchView``.
#. Move all parameters of your old-style views from your ``urls.py`` to
   attributes on your new views. This will require renaming ``searchqueryset``
   to ``queryset`` and ``template`` to ``template_name``.
#. Review your templates and replace the ``page`` variable with ``page_object``.

Here's an example::

    ### Old-style views...
    # urls.py

    sqs = SearchQuerySet().filter(author='john')

    urlpatterns = patterns('haystack.views',
        url(r'^$', SearchView(
            template='my/special/path/john_search.html',
            searchqueryset=sqs,
            form_class=SearchForm
        ), name='haystack_search'),
    )

    ### New-style views...
    # views.py

    class JohnSearchView(SearchView):
        template_name = 'my/special/path/john_search.html'
        queryset = SearchQuerySet().filter(author='john')
        form_class = SearchForm

    # urls.py
    from myapp.views import JohnSearchView

    urlpatterns = patterns('',
        url(r'^$', JohnSearchView.as_view(), name='haystack_search'),
    )


If your views overrode methods on the old-style ``SearchView``, you will need to
refactor those methods to the equivalents on Django's generic views. For example,
if you previously used ``extra_context()`` to add additional template variables or
preprocess the values returned by Haystack, that code would move to
``get_context_data()``.

+-----------------------+-------------------------------------------+
| Old Method            | New Method                                |
+=======================+===========================================+
| ``extra_context()``   | `get_context_data()`_                     |
+-----------------------+-------------------------------------------+
| ``create_response()`` | `dispatch()`_ or ``get()`` and ``post()`` |
+-----------------------+-------------------------------------------+
| ``get_query()``       | `get_queryset()`_                         |
+-----------------------+-------------------------------------------+

.. _get_context_data(): https://docs.djangoproject.com/en/1.7/ref/class-based-views/mixins-simple/#django.views.generic.base.ContextMixin.get_context_data
.. _dispatch(): https://docs.djangoproject.com/en/1.7/ref/class-based-views/base/#django.views.generic.base.View.dispatch
.. _get_queryset(): https://docs.djangoproject.com/en/1.7/ref/class-based-views/mixins-multiple-object/#django.views.generic.list.MultipleObjectMixin.get_queryset


Old-Style Views
---------------

.. deprecated:: 2.4.0

Haystack comes bundled with three views: the class-based views (``SearchView`` &
``FacetedSearchView``) and a traditional functional view (``basic_search``).

The class-based views provide for easy extension should you need to alter the
way a view works. Except in the case of faceting (again, see :doc:`faceting`),
the ``SearchView`` works interchangeably with all other forms provided by
Haystack.

The functional view provides an example of how Haystack can be used in more
traditional settings or as an example of how to write a more complex custom
view. It is also thread-safe.

``SearchView(template=None, load_all=True, form_class=None, searchqueryset=None, context_class=RequestContext, results_per_page=None)``
|
||||
---------------------------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
The ``SearchView`` is designed to be easy/flexible enough to override common
|
||||
changes as well as being internally abstracted so that only altering a specific
|
||||
portion of the code should be easy to do.
|
||||
|
||||
Without touching any of the internals of the ``SearchView``, you can modify
|
||||
which template is used, which form class should be instantiated to search with,
|
||||
what ``SearchQuerySet`` to use in the event you wish to pre-filter the results.
|
||||
what ``Context``-style object to use in the response and the ``load_all``
|
||||
performance optimization to reduce hits on the database. These options can (and
|
||||
generally should) be overridden at the URLconf level. For example, to have a
|
||||
custom search limited to the 'John' author, displaying all models to search by
|
||||
and specifying a custom template (``my/special/path/john_search.html``), your
|
||||
URLconf should look something like::
|
||||
|
||||
    from django.conf.urls.defaults import *
    from haystack.forms import ModelSearchForm
    from haystack.query import SearchQuerySet
    from haystack.views import SearchView

    sqs = SearchQuerySet().filter(author='john')

    # Without threading...
    urlpatterns = patterns('haystack.views',
        url(r'^$', SearchView(
            template='my/special/path/john_search.html',
            searchqueryset=sqs,
            form_class=ModelSearchForm
        ), name='haystack_search'),
    )

    # With threading...
    from haystack.views import SearchView, search_view_factory

    urlpatterns = patterns('haystack.views',
        url(r'^$', search_view_factory(
            view_class=SearchView,
            template='my/special/path/john_search.html',
            searchqueryset=sqs,
            form_class=ModelSearchForm
        ), name='haystack_search'),
    )

.. warning::

    The standard ``SearchView`` is not thread-safe. Use the
    ``search_view_factory`` function, which returns thread-safe instances of
    ``SearchView``.

By default, if you don't specify a ``form_class``, the view will use the
``haystack.forms.ModelSearchForm`` form.

Beyond these customizations, you can create your own ``SearchView`` and
extend/override the following methods to change the functionality.

``__call__(self, request)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generates the actual response to the search.

Relies on internal, overridable methods to construct the response. You should
generally avoid altering this method unless you need to change the flow of the
methods or add a new method into the processing.

``build_form(self, form_kwargs=None)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instantiates the form the class should use to process the search query.

Optionally accepts a dictionary of parameters that are passed on to the
form's ``__init__``. You can use this to lightly customize the form.

You should override this if you write a custom form that needs special
parameters for instantiation.

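As a rough sketch of this pattern (``StubForm`` and the keyword names here are illustrative stand-ins, not Haystack's actual form API), ``build_form`` merges the view's defaults with any caller-supplied ``form_kwargs`` before instantiating the form class:

```python
# Illustrative sketch of the ``build_form`` pattern; StubForm is a
# stand-in, not a real Haystack form.

class StubForm:
    def __init__(self, data=None, load_all=False):
        self.data = data or {}
        self.load_all = load_all


class StubSearchView:
    form_class = StubForm
    load_all = True

    def build_form(self, form_kwargs=None):
        # Start from the view's defaults, then let callers override them.
        kwargs = {"load_all": self.load_all}
        if form_kwargs:
            kwargs.update(form_kwargs)
        return self.form_class(data={"q": "hello"}, **kwargs)


form = StubSearchView().build_form(form_kwargs={"load_all": False})
```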
``get_query(self)``
~~~~~~~~~~~~~~~~~~~

Returns the query provided by the user.

Returns an empty string if the query is invalid. This pulls the cleaned query
from the form, via the ``q`` field, for use elsewhere within the ``SearchView``.
It is used to populate the ``query`` context variable.

``get_results(self)``
~~~~~~~~~~~~~~~~~~~~~

Fetches the results via the form.

Returns an empty list if there's no query to search with. This method relies on
the form to do as much of the heavy lifting as possible.

``build_page(self)``
~~~~~~~~~~~~~~~~~~~~

Paginates the results appropriately.

If you do not want to use Django's built-in pagination, it should be a simple
matter to override this method to do what you would like.

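The arithmetic ``build_page`` performs can be sketched without Django's ``Paginator``. The function below is a simplified stand-in, assuming 1-based page numbers as in the real view:

```python
# Simplified, framework-free sketch of result pagination.

def build_page(results, page_number, per_page=20):
    # Page numbers are 1-based; slice out the requested window.
    start = (page_number - 1) * per_page
    if start >= len(results) and results:
        raise ValueError("Invalid page number: %d" % page_number)
    return results[start:start + per_page]


page = build_page(list(range(45)), page_number=3, per_page=20)
```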
``extra_context(self)``
~~~~~~~~~~~~~~~~~~~~~~~

Allows the addition of more context variables as needed. Must return a
dictionary whose contents will add to or overwrite the other variables in the
context.

``create_response(self)``
~~~~~~~~~~~~~~~~~~~~~~~~~

Generates the actual ``HttpResponse`` to send back to the user. It builds the
page, creates the context and renders the response after all the
aforementioned processing.


``basic_search(request, template='search/search.html', load_all=True, form_class=ModelSearchForm, searchqueryset=None, context_class=RequestContext, extra_context=None, results_per_page=None)``
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The ``basic_search`` view tries to provide most of the same functionality as
the class-based views but resembles a more traditional generic view. It's both
a working view, if you prefer not to use the class-based views, as well as a
good starting point for writing highly custom views.

Since it is all one function, the only means of extension is passing in
kwargs, similar to the way generic views work.


Creating Your Own View
----------------------

As with the forms, inheritance is likely your best bet. In this case, the
``FacetedSearchView`` is a perfect example of how to extend the existing
``SearchView``. The complete code for the ``FacetedSearchView`` looks like::

    class FacetedSearchView(SearchView):
        def extra_context(self):
            extra = super(FacetedSearchView, self).extra_context()

            if self.results == []:
                extra['facets'] = self.form.search().facet_counts()
            else:
                extra['facets'] = self.results.facet_counts()

            return extra

It updates the name of the class (generally for documentation purposes) and
adds the facets from the ``SearchQuerySet`` to the context as the ``facets``
variable. As with the custom form example above, it relies on the parent class
to handle most of the processing and extends that only where needed.
@ -0,0 +1,357 @@
.. _ref-who-uses:

Sites Using Haystack
====================

The following sites are a partial list of people using Haystack. I'm always
interested in adding more sites, so please find me (``daniellindsley``) via
IRC or the mailing list thread.


LJWorld/Lawrence.com/KUSports
-----------------------------

For all things search-related.

Using: Solr

* http://www2.ljworld.com/search/
* http://www2.ljworld.com/search/vertical/news.story/
* http://www2.ljworld.com/marketplace/
* http://www.lawrence.com/search/
* http://www.kusports.com/search/


AltWeeklies
-----------

Providing an API to story aggregation.

Using: Whoosh

* http://www.northcoastjournal.com/altweeklies/documentation/


Trapeze
-------

Various projects.

Using: Xapian

* http://www.trapeze.com/
* http://www.windmobile.ca/
* http://www.bonefishgrill.com/
* http://www.canadiantire.ca/ (Portions of)


Vickerey.com
------------

For (really well done) search & faceting.

Using: Solr

* http://store.vickerey.com/products/search/


Eldarion
--------

Various projects.

Using: Solr

* http://eldarion.com/


Sunlight Labs
-------------

For general search.

Using: Whoosh & Solr

* http://sunlightlabs.com/
* http://subsidyscope.com/


NASA
----

For general search.

Using: Solr

* An internal site called SMD Spacebook 1.1.
* http://science.nasa.gov/


AllForLocal
-----------

For general search.

* http://www.allforlocal.com/


HUGE
----

Various projects.

Using: Solr

* http://hugeinc.com/
* http://houselogic.com/


Brick Design
------------

For search on Explore.

Using: Solr

* http://bricksf.com/
* http://explore.org/


Winding Road
------------

For general search.

Using: Solr

* http://www.windingroad.com/


Reddit
------

For Reddit Gifts.

Using: Whoosh

* http://redditgifts.com/


Pegasus News
------------

For general search.

Using: Xapian

* http://www.pegasusnews.com/


Rampframe
---------

For general search.

Using: Xapian

* http://www.rampframe.com/


Forkinit
--------

For general search, model-specific search and suggestions via MLT.

Using: Solr

* http://forkinit.com/


Structured Abstraction
----------------------

For general search.

Using: Xapian

* http://www.structuredabstraction.com/
* http://www.delivergood.org/


CustomMade
----------

For general search.

Using: Solr

* http://www.custommade.com/


University of the Andes, Dept. of Political Science
---------------------------------------------------

For general search & section-specific search. Developed by Monoku.

Using: Solr

* http://www.congresovisible.org/
* http://www.monoku.com/


Christchurch Art Gallery
------------------------

For general search & section-specific search.

Using: Solr

* http://christchurchartgallery.org.nz/search/
* http://christchurchartgallery.org.nz/collection/browse/


DevCheatSheet.com
-----------------

For general search.

Using: Xapian

* http://devcheatsheet.com/


TodasLasRecetas
---------------

For search, faceting & More Like This.

Using: Solr

* http://www.todaslasrecetas.es/receta/s/?q=langostinos
* http://www.todaslasrecetas.es/receta/9526/brochetas-de-langostinos


AstroBin
--------

For general search.

Using: Solr

* http://www.astrobin.com/


European Paper Company
----------------------

For general search.

Using: ???

* http://europeanpaper.com/


mtn-op
------

For general search.

Using: ???

* http://mountain-op.com/


Crate
-----

Crate is a PyPI mirror/replacement. It's using Haystack to power all search &
faceted navigation on the site.

Using: Elasticsearch

* https://crate.io/


Pix Populi
----------

Pix Populi is a popular French photo sharing site.

Using: Solr

* http://www.pix-populi.fr/


LocalWiki
---------

LocalWiki is a tool for collaborating in local, geographic communities.
It's using Haystack to power search on every LocalWiki instance.

Using: Solr

* http://localwiki.org/


Pitchup
-------

For faceting, geo and autocomplete.

Using: ???

* http://www.pitchup.com/search/


Gidsy
-----

Gidsy makes it easy for anyone to organize and find exciting things
to do everywhere in the world.

For activity search, area pages, forums and private messages.

Using: Elasticsearch

* https://gidsy.com/
* https://gidsy.com/search/
* https://gidsy.com/forum/


GroundCity
----------

Groundcity is a Romanian dynamic real estate site.

For real estate, forums and comments.

Using: Whoosh

* http://groundcity.ro/cautare/


Docket Alarm
------------

Docket Alarm allows people to search court dockets across
the country. With it, you can search court dockets in the International Trade
Commission (ITC), the Patent Trial and Appeal Board (PTAB) and All Federal
Courts.

Using: Elasticsearch

* https://www.docketalarm.com/search/ITC
* https://www.docketalarm.com/search/PTAB
* https://www.docketalarm.com/search/dockets


Educreations
------------

Educreations makes it easy for anyone to teach what they know and learn
what they don't with a recordable whiteboard. Haystack is used to
provide search across users and lessons.

Using: Solr

* http://www.educreations.com/browse/
@ -0,0 +1,71 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

import logging

from django.conf import settings
from django.core.exceptions import ImproperlyConfigured

from haystack.constants import DEFAULT_ALIAS
from haystack import signals
from haystack.utils import loading


__author__ = 'Daniel Lindsley'
__version__ = (2, 4, 0)


# Setup default logging.
log = logging.getLogger('haystack')
stream = logging.StreamHandler()
stream.setLevel(logging.INFO)
log.addHandler(stream)


# Help people clean up from 1.X.
if hasattr(settings, 'HAYSTACK_SITECONF'):
    raise ImproperlyConfigured('The HAYSTACK_SITECONF setting is no longer used & can be removed.')
if hasattr(settings, 'HAYSTACK_SEARCH_ENGINE'):
    raise ImproperlyConfigured('The HAYSTACK_SEARCH_ENGINE setting has been replaced with HAYSTACK_CONNECTIONS.')
if hasattr(settings, 'HAYSTACK_ENABLE_REGISTRATIONS'):
    raise ImproperlyConfigured('The HAYSTACK_ENABLE_REGISTRATIONS setting is no longer used & can be removed.')
if hasattr(settings, 'HAYSTACK_INCLUDE_SPELLING'):
    raise ImproperlyConfigured('The HAYSTACK_INCLUDE_SPELLING setting is now a per-backend setting & belongs in HAYSTACK_CONNECTIONS.')


# Check the 2.X+ bits.
if not hasattr(settings, 'HAYSTACK_CONNECTIONS'):
    raise ImproperlyConfigured('The HAYSTACK_CONNECTIONS setting is required.')
if DEFAULT_ALIAS not in settings.HAYSTACK_CONNECTIONS:
    raise ImproperlyConfigured("The default alias '%s' must be included in the HAYSTACK_CONNECTIONS setting." % DEFAULT_ALIAS)

# Load the connections.
connections = loading.ConnectionHandler(settings.HAYSTACK_CONNECTIONS)

# Load the router(s).
connection_router = loading.ConnectionRouter()

if hasattr(settings, 'HAYSTACK_ROUTERS'):
    if not isinstance(settings.HAYSTACK_ROUTERS, (list, tuple)):
        raise ImproperlyConfigured("The HAYSTACK_ROUTERS setting must be either a list or tuple.")

    connection_router = loading.ConnectionRouter(settings.HAYSTACK_ROUTERS)

# Setup the signal processor.
signal_processor_path = getattr(settings, 'HAYSTACK_SIGNAL_PROCESSOR', 'haystack.signals.BaseSignalProcessor')
signal_processor_class = loading.import_class(signal_processor_path)
signal_processor = signal_processor_class(connections, connection_router)


# Per-request, reset the ghetto query log.
# Probably not extraordinarily thread-safe but should only matter when
# DEBUG = True.
def reset_search_queries(**kwargs):
    for conn in connections.all():
        conn.reset_queries()


if settings.DEBUG:
    from django.core import signals as django_signals
    django_signals.request_started.connect(reset_search_queries)
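The checks above require a ``HAYSTACK_CONNECTIONS`` setting containing the ``'default'`` alias. A minimal settings block satisfying them might look like this (the engine path, URL and index name are placeholders; adjust them to the backend you actually use):

```python
# In your Django settings module; the values here are placeholders.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
```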
@ -0,0 +1,163 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

from django import template
from django.contrib.admin.options import csrf_protect_m, ModelAdmin
from django.contrib.admin.views.main import ChangeList, SEARCH_VAR
from django.core.exceptions import PermissionDenied
from django.core.paginator import InvalidPage, Paginator
from django.shortcuts import render_to_response
from django.utils.translation import ungettext

from haystack import connections
from haystack.query import SearchQuerySet
from haystack.utils import get_model_ct_tuple

try:
    from django.utils.encoding import force_text
except ImportError:
    from django.utils.encoding import force_unicode as force_text


def list_max_show_all(changelist):
    """
    Returns the maximum amount of results a changelist can have for the
    "Show all" link to be displayed in a manner compatible with both Django
    1.4 and 1.3. See Django ticket #15997 for details.
    """
    try:
        # This import is available in Django 1.3 and below.
        from django.contrib.admin.views.main import MAX_SHOW_ALL_ALLOWED
        return MAX_SHOW_ALL_ALLOWED
    except ImportError:
        return changelist.list_max_show_all


class SearchChangeList(ChangeList):
    def __init__(self, **kwargs):
        self.haystack_connection = kwargs.pop('haystack_connection', 'default')
        super(SearchChangeList, self).__init__(**kwargs)

    def get_results(self, request):
        if SEARCH_VAR not in request.GET:
            return super(SearchChangeList, self).get_results(request)

        # Note that pagination is 0-based, not 1-based.
        sqs = SearchQuerySet(self.haystack_connection).models(self.model).auto_query(request.GET[SEARCH_VAR]).load_all()

        paginator = Paginator(sqs, self.list_per_page)
        # Get the number of objects, with admin filters applied.
        result_count = paginator.count
        full_result_count = SearchQuerySet(self.haystack_connection).models(self.model).all().count()

        can_show_all = result_count <= list_max_show_all(self)
        multi_page = result_count > self.list_per_page

        # Get the list of objects to display on this page.
        try:
            result_list = paginator.page(self.page_num + 1).object_list
            # Grab just the Django models, since that's what everything else is
            # expecting.
            result_list = [result.object for result in result_list]
        except InvalidPage:
            result_list = ()

        self.result_count = result_count
        self.full_result_count = full_result_count
        self.result_list = result_list
        self.can_show_all = can_show_all
        self.multi_page = multi_page
        self.paginator = paginator


class SearchModelAdminMixin(object):
    # Haystack connection to use for searching.
    haystack_connection = 'default'

    @csrf_protect_m
    def changelist_view(self, request, extra_context=None):
        if not self.has_change_permission(request, None):
            raise PermissionDenied

        if SEARCH_VAR not in request.GET:
            # Do the usual song and dance.
            return super(SearchModelAdminMixin, self).changelist_view(request, extra_context)

        # Do a search of just this model and populate a Changelist with the
        # returned bits.
        if self.model not in connections[self.haystack_connection].get_unified_index().get_indexed_models():
            # Oops. That model isn't being indexed. Return the usual
            # behavior instead.
            return super(SearchModelAdminMixin, self).changelist_view(request, extra_context)

        # So. Much. Boilerplate.
        # Why copy-paste a few lines when you can copy-paste TONS of lines?
        list_display = list(self.list_display)

        kwargs = {
            'haystack_connection': self.haystack_connection,
            'request': request,
            'model': self.model,
            'list_display': list_display,
            'list_display_links': self.list_display_links,
            'list_filter': self.list_filter,
            'date_hierarchy': self.date_hierarchy,
            'search_fields': self.search_fields,
            'list_select_related': self.list_select_related,
            'list_per_page': self.list_per_page,
            'list_editable': self.list_editable,
            'model_admin': self
        }

        # Django 1.4 compatibility.
        if hasattr(self, 'list_max_show_all'):
            kwargs['list_max_show_all'] = self.list_max_show_all

        changelist = SearchChangeList(**kwargs)
        formset = changelist.formset = None
        media = self.media

        # Build the action form and populate it with available actions.
        # Check actions to see if any are available on this changelist.
        actions = self.get_actions(request)
        if actions:
            action_form = self.action_form(auto_id=None)
            action_form.fields['action'].choices = self.get_action_choices(request)
        else:
            action_form = None

        selection_note = ungettext('0 of %(count)d selected',
            'of %(count)d selected', len(changelist.result_list))
        selection_note_all = ungettext('%(total_count)s selected',
            'All %(total_count)s selected', changelist.result_count)

        context = {
            'module_name': force_text(self.model._meta.verbose_name_plural),
            'selection_note': selection_note % {'count': len(changelist.result_list)},
            'selection_note_all': selection_note_all % {'total_count': changelist.result_count},
            'title': changelist.title,
            'is_popup': changelist.is_popup,
            'cl': changelist,
            'media': media,
            'has_add_permission': self.has_add_permission(request),
            # More Django 1.4 compatibility.
            'root_path': getattr(self.admin_site, 'root_path', None),
            'app_label': self.model._meta.app_label,
            'action_form': action_form,
            'actions_on_top': self.actions_on_top,
            'actions_on_bottom': self.actions_on_bottom,
            'actions_selection_counter': getattr(self, 'actions_selection_counter', 0),
        }
        context.update(extra_context or {})
        context_instance = template.RequestContext(request, current_app=self.admin_site.name)
        app_name, model_name = get_model_ct_tuple(self.model)
        return render_to_response(self.change_list_template or [
            'admin/%s/%s/change_list.html' % (app_name, model_name),
            'admin/%s/change_list.html' % app_name,
            'admin/change_list.html'
        ], context, context_instance=context_instance)


class SearchModelAdmin(SearchModelAdminMixin, ModelAdmin):
    pass
File diff suppressed because it is too large

@ -0,0 +1,944 @@
# encoding: utf-8
|
||||
|
||||
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||
|
||||
import datetime
|
||||
import re
|
||||
import warnings
|
||||
|
||||
from django.conf import settings
|
||||
from django.core.exceptions import ImproperlyConfigured
|
||||
from django.utils import six
|
||||
|
||||
import haystack
|
||||
from haystack.backends import BaseEngine, BaseSearchBackend, BaseSearchQuery, log_query
|
||||
from haystack.constants import DEFAULT_OPERATOR, DJANGO_CT, DJANGO_ID, ID
|
||||
from haystack.exceptions import MissingDependency, MoreLikeThisError, SkipDocument
|
||||
from haystack.inputs import Clean, Exact, PythonData, Raw
|
||||
from haystack.models import SearchResult
|
||||
from haystack.utils import log as logging
|
||||
from haystack.utils import get_identifier, get_model_ct
|
||||
from haystack.utils.app_loading import haystack_get_model
|
||||
|
||||
try:
|
||||
import elasticsearch
|
||||
from elasticsearch.helpers import bulk_index
|
||||
from elasticsearch.exceptions import NotFoundError
|
||||
except ImportError:
|
||||
raise MissingDependency("The 'elasticsearch' backend requires the installation of 'elasticsearch'. Please refer to the documentation.")
|
||||
|
||||
|
||||
DATETIME_REGEX = re.compile(
|
||||
r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T'
|
||||
r'(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(\.\d+)?$')
|
||||
|
||||
|
||||
class ElasticsearchSearchBackend(BaseSearchBackend):
|
||||
# Word reserved by Elasticsearch for special use.
|
||||
RESERVED_WORDS = (
|
||||
'AND',
|
||||
'NOT',
|
||||
'OR',
|
||||
'TO',
|
||||
)
|
||||
|
||||
# Characters reserved by Elasticsearch for special use.
|
||||
# The '\\' must come first, so as not to overwrite the other slash replacements.
|
||||
RESERVED_CHARACTERS = (
|
||||
'\\', '+', '-', '&&', '||', '!', '(', ')', '{', '}',
|
||||
'[', ']', '^', '"', '~', '*', '?', ':', '/',
|
||||
)
|
||||
|
||||
# Settings to add an n-gram & edge n-gram analyzer.
|
||||
DEFAULT_SETTINGS = {
|
||||
'settings': {
|
||||
"analysis": {
|
||||
"analyzer": {
|
||||
"ngram_analyzer": {
|
||||
"type": "custom",
|
||||
"tokenizer": "standard",
|
||||
"filter": ["haystack_ngram", "lowercase"]
|
||||
},
|
||||
"edgengram_analyzer": {
|
||||
"type": "custom",
|
||||
"tokenizer": "standard",
|
||||
"filter": ["haystack_edgengram", "lowercase"]
|
||||
}
|
||||
},
|
||||
"tokenizer": {
|
||||
"haystack_ngram_tokenizer": {
|
||||
"type": "nGram",
|
||||
"min_gram": 3,
|
||||
"max_gram": 15,
|
||||
},
|
||||
"haystack_edgengram_tokenizer": {
|
||||
"type": "edgeNGram",
|
||||
"min_gram": 2,
|
||||
"max_gram": 15,
|
||||
"side": "front"
|
||||
}
|
||||
},
|
||||
"filter": {
|
||||
"haystack_ngram": {
|
||||
"type": "nGram",
|
||||
"min_gram": 3,
|
||||
"max_gram": 15
|
||||
},
|
||||
"haystack_edgengram": {
|
||||
"type": "edgeNGram",
|
||||
"min_gram": 2,
|
||||
"max_gram": 15
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
def __init__(self, connection_alias, **connection_options):
|
||||
super(ElasticsearchSearchBackend, self).__init__(connection_alias, **connection_options)
|
||||
|
||||
if not 'URL' in connection_options:
|
||||
raise ImproperlyConfigured("You must specify a 'URL' in your settings for connection '%s'." % connection_alias)
|
||||
|
||||
if not 'INDEX_NAME' in connection_options:
|
||||
raise ImproperlyConfigured("You must specify a 'INDEX_NAME' in your settings for connection '%s'." % connection_alias)
|
||||
|
||||
self.conn = elasticsearch.Elasticsearch(connection_options['URL'], timeout=self.timeout, **connection_options.get('KWARGS', {}))
|
||||
self.index_name = connection_options['INDEX_NAME']
|
||||
self.log = logging.getLogger('haystack')
|
||||
self.setup_complete = False
|
||||
self.existing_mapping = {}
|
||||
|
||||
def setup(self):
|
||||
"""
|
||||
Defers loading until needed.
|
||||
"""
|
||||
# Get the existing mapping & cache it. We'll compare it
|
||||
# during the ``update`` & if it doesn't match, we'll put the new
|
||||
# mapping.
|
||||
try:
|
||||
self.existing_mapping = self.conn.indices.get_mapping(index=self.index_name)
|
||||
except NotFoundError:
|
||||
pass
|
||||
except Exception:
|
||||
if not self.silently_fail:
|
||||
raise
|
||||
|
||||
unified_index = haystack.connections[self.connection_alias].get_unified_index()
|
||||
self.content_field_name, field_mapping = self.build_schema(unified_index.all_searchfields())
|
||||
current_mapping = {
|
||||
'modelresult': {
|
||||
'properties': field_mapping,
|
||||
'_boost': {
|
||||
'name': 'boost',
|
||||
'null_value': 1.0
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if current_mapping != self.existing_mapping:
|
||||
try:
|
||||
# Make sure the index is there first.
|
||||
self.conn.indices.create(index=self.index_name, body=self.DEFAULT_SETTINGS, ignore=400)
|
||||
self.conn.indices.put_mapping(index=self.index_name, doc_type='modelresult', body=current_mapping)
|
||||
self.existing_mapping = current_mapping
|
||||
except Exception:
|
||||
if not self.silently_fail:
|
||||
raise
|
||||
|
||||
self.setup_complete = True
|
||||
|
||||
def update(self, index, iterable, commit=True):
|
||||
if not self.setup_complete:
|
||||
try:
|
||||
self.setup()
|
||||
except elasticsearch.TransportError as e:
|
||||
if not self.silently_fail:
|
||||
raise
|
||||
|
||||
self.log.error("Failed to add documents to Elasticsearch: %s", e)
|
||||
return
|
||||
|
||||
prepped_docs = []
|
||||
|
||||
for obj in iterable:
|
||||
try:
|
||||
prepped_data = index.full_prepare(obj)
|
||||
final_data = {}
|
||||
|
||||
# Convert the data to make sure it's happy.
|
||||
for key, value in prepped_data.items():
|
||||
final_data[key] = self._from_python(value)
|
||||
                final_data['_id'] = final_data[ID]

                prepped_docs.append(final_data)
            except SkipDocument:
                self.log.debug(u"Indexing for object `%s` skipped", obj)
            except elasticsearch.TransportError as e:
                if not self.silently_fail:
                    raise

                # We'll log the object identifier but won't include the actual object
                # to avoid the possibility of that generating encoding errors while
                # processing the log message:
                self.log.error(u"%s while preparing object for update" % e.__class__.__name__, exc_info=True, extra={
                    "data": {
                        "index": index,
                        "object": get_identifier(obj)
                    }
                })

        bulk_index(self.conn, prepped_docs, index=self.index_name, doc_type='modelresult')

        if commit:
            self.conn.indices.refresh(index=self.index_name)

    def remove(self, obj_or_string, commit=True):
        doc_id = get_identifier(obj_or_string)

        if not self.setup_complete:
            try:
                self.setup()
            except elasticsearch.TransportError as e:
                if not self.silently_fail:
                    raise

                self.log.error("Failed to remove document '%s' from Elasticsearch: %s", doc_id, e)
                return

        try:
            self.conn.delete(index=self.index_name, doc_type='modelresult', id=doc_id, ignore=404)

            if commit:
                self.conn.indices.refresh(index=self.index_name)
        except elasticsearch.TransportError as e:
            if not self.silently_fail:
                raise

            self.log.error("Failed to remove document '%s' from Elasticsearch: %s", doc_id, e)

    def clear(self, models=[], commit=True):
        # We actually don't want to do this here, as mappings could be
        # very different.
        # if not self.setup_complete:
        #     self.setup()

        try:
            if not models:
                self.conn.indices.delete(index=self.index_name, ignore=404)
                self.setup_complete = False
                self.existing_mapping = {}
            else:
                models_to_delete = []

                for model in models:
                    models_to_delete.append("%s:%s" % (DJANGO_CT, get_model_ct(model)))

                # Delete by query in Elasticsearch assumes you're dealing with
                # a ``query`` root object. :/
                query = {'query': {'query_string': {'query': " OR ".join(models_to_delete)}}}
                self.conn.delete_by_query(index=self.index_name, doc_type='modelresult', body=query)
        except elasticsearch.TransportError as e:
            if not self.silently_fail:
                raise

            if len(models):
                self.log.error("Failed to clear Elasticsearch index of models '%s': %s", ','.join(models_to_delete), e)
            else:
                self.log.error("Failed to clear Elasticsearch index: %s", e)

    def build_search_kwargs(self, query_string, sort_by=None, start_offset=0, end_offset=None,
                            fields='', highlight=False, facets=None,
                            date_facets=None, query_facets=None,
                            narrow_queries=None, spelling_query=None,
                            within=None, dwithin=None, distance_point=None,
                            models=None, limit_to_registered_models=None,
                            result_class=None):
        index = haystack.connections[self.connection_alias].get_unified_index()
        content_field = index.document_field

        if query_string == '*:*':
            kwargs = {
                'query': {
                    "match_all": {}
                },
            }
        else:
            kwargs = {
                'query': {
                    'query_string': {
                        'default_field': content_field,
                        'default_operator': DEFAULT_OPERATOR,
                        'query': query_string,
                        'analyze_wildcard': True,
                        'auto_generate_phrase_queries': True,
                    },
                },
            }

        # So far, no filters.
        filters = []

        if fields:
            if isinstance(fields, (list, set)):
                fields = " ".join(fields)

            kwargs['fields'] = fields

        if sort_by is not None:
            order_list = []
            for field, direction in sort_by:
                if field == 'distance' and distance_point:
                    # Do the geo-enabled sort.
                    lng, lat = distance_point['point'].get_coords()
                    sort_kwargs = {
                        "_geo_distance": {
                            distance_point['field']: [lng, lat],
                            "order": direction,
                            "unit": "km"
                        }
                    }
                else:
                    if field == 'distance':
                        warnings.warn("In order to sort by distance, you must call the '.distance(...)' method.")

                    # Regular sorting.
                    sort_kwargs = {field: {'order': direction}}

                order_list.append(sort_kwargs)

            kwargs['sort'] = order_list

        # From/size offsets don't seem to work right in Elasticsearch's DSL. :/
        # if start_offset is not None:
        #     kwargs['from'] = start_offset

        # if end_offset is not None:
        #     kwargs['size'] = end_offset - start_offset

        if highlight is True:
            kwargs['highlight'] = {
                'fields': {
                    content_field: {'store': 'yes'},
                }
            }

        if self.include_spelling:
            kwargs['suggest'] = {
                'suggest': {
                    'text': spelling_query or query_string,
                    'term': {
                        # Using content_field here will result in suggestions of stemmed words.
                        'field': '_all',
                    },
                },
            }

        if narrow_queries is None:
            narrow_queries = set()

        if facets is not None:
            kwargs.setdefault('facets', {})

            for facet_fieldname, extra_options in facets.items():
                facet_options = {
                    'terms': {
                        'field': facet_fieldname,
                        'size': 100,
                    },
                }
                # Special cases for options applied at the facet level (not the terms level).
                if extra_options.pop('global_scope', False):
                    # Renamed "global_scope" since "global" is a Python keyword.
                    facet_options['global'] = True
                if 'facet_filter' in extra_options:
                    facet_options['facet_filter'] = extra_options.pop('facet_filter')
                facet_options['terms'].update(extra_options)
                kwargs['facets'][facet_fieldname] = facet_options

        if date_facets is not None:
            kwargs.setdefault('facets', {})

            for facet_fieldname, value in date_facets.items():
                # Need to detect on gap_by & only add amount if it's more than one.
                interval = value.get('gap_by').lower()

                # Need to detect on amount (can't be applied on months or years).
                if value.get('gap_amount', 1) != 1 and interval not in ('month', 'year'):
                    # Just the first character is valid for use.
                    interval = "%s%s" % (value['gap_amount'], interval[:1])

                kwargs['facets'][facet_fieldname] = {
                    'date_histogram': {
                        'field': facet_fieldname,
                        'interval': interval,
                    },
                    'facet_filter': {
                        "range": {
                            facet_fieldname: {
                                'from': self._from_python(value.get('start_date')),
                                'to': self._from_python(value.get('end_date')),
                            }
                        }
                    }
                }

        if query_facets is not None:
            kwargs.setdefault('facets', {})

            for facet_fieldname, value in query_facets:
                kwargs['facets'][facet_fieldname] = {
                    'query': {
                        'query_string': {
                            'query': value,
                        }
                    },
                }

        if limit_to_registered_models is None:
            limit_to_registered_models = getattr(settings, 'HAYSTACK_LIMIT_TO_REGISTERED_MODELS', True)

        if models and len(models):
            model_choices = sorted(get_model_ct(model) for model in models)
        elif limit_to_registered_models:
            # Using narrow queries, limit the results to only models handled
            # with the current routers.
            model_choices = self.build_models_list()
        else:
            model_choices = []

        if len(model_choices) > 0:
            filters.append({"terms": {DJANGO_CT: model_choices}})

        for q in narrow_queries:
            filters.append({
                'fquery': {
                    'query': {
                        'query_string': {
                            'query': q
                        },
                    },
                    '_cache': True,
                }
            })

        if within is not None:
            from haystack.utils.geo import generate_bounding_box

            ((south, west), (north, east)) = generate_bounding_box(within['point_1'], within['point_2'])
            within_filter = {
                "geo_bounding_box": {
                    within['field']: {
                        "top_left": {
                            "lat": north,
                            "lon": west
                        },
                        "bottom_right": {
                            "lat": south,
                            "lon": east
                        }
                    }
                },
            }
            filters.append(within_filter)

        if dwithin is not None:
            lng, lat = dwithin['point'].get_coords()

            # NB: the 1.0.0 release of elasticsearch introduced an
            # incompatible change in the distance filter formatting.
            if elasticsearch.VERSION >= (1, 0, 0):
                distance = "%(dist).6f%(unit)s" % {
                    'dist': dwithin['distance'].km,
                    'unit': "km"
                }
            else:
                distance = dwithin['distance'].km

            dwithin_filter = {
                "geo_distance": {
                    "distance": distance,
                    dwithin['field']: {
                        "lat": lat,
                        "lon": lng
                    }
                }
            }
            filters.append(dwithin_filter)

        # If we want to filter, change the query type to filtered.
        if filters:
            kwargs["query"] = {"filtered": {"query": kwargs.pop("query")}}
            if len(filters) == 1:
                kwargs['query']['filtered']["filter"] = filters[0]
            else:
                kwargs['query']['filtered']["filter"] = {"bool": {"must": filters}}

        return kwargs

    @log_query
    def search(self, query_string, **kwargs):
        if len(query_string) == 0:
            return {
                'results': [],
                'hits': 0,
            }

        if not self.setup_complete:
            self.setup()

        search_kwargs = self.build_search_kwargs(query_string, **kwargs)
        search_kwargs['from'] = kwargs.get('start_offset', 0)

        order_fields = set()
        for order in search_kwargs.get('sort', []):
            for key in order.keys():
                order_fields.add(key)

        geo_sort = '_geo_distance' in order_fields

        end_offset = kwargs.get('end_offset')
        start_offset = kwargs.get('start_offset', 0)
        if end_offset is not None and end_offset > start_offset:
            search_kwargs['size'] = end_offset - start_offset

        try:
            raw_results = self.conn.search(body=search_kwargs,
                                           index=self.index_name,
                                           doc_type='modelresult',
                                           _source=True)
        except elasticsearch.TransportError as e:
            if not self.silently_fail:
                raise

            self.log.error("Failed to query Elasticsearch using '%s': %s", query_string, e)
            raw_results = {}

        return self._process_results(raw_results,
                                     highlight=kwargs.get('highlight'),
                                     result_class=kwargs.get('result_class', SearchResult),
                                     distance_point=kwargs.get('distance_point'),
                                     geo_sort=geo_sort)

    def more_like_this(self, model_instance, additional_query_string=None,
                       start_offset=0, end_offset=None, models=None,
                       limit_to_registered_models=None, result_class=None, **kwargs):
        from haystack import connections

        if not self.setup_complete:
            self.setup()

        # Deferred models will have a different class ("RealClass_Deferred_fieldname")
        # which won't be in our registry:
        model_klass = model_instance._meta.concrete_model

        index = connections[self.connection_alias].get_unified_index().get_index(model_klass)
        field_name = index.get_content_field()
        params = {}

        if start_offset is not None:
            params['search_from'] = start_offset

        if end_offset is not None:
            params['search_size'] = end_offset - start_offset

        doc_id = get_identifier(model_instance)

        try:
            raw_results = self.conn.mlt(index=self.index_name, doc_type='modelresult', id=doc_id, mlt_fields=[field_name], **params)
        except elasticsearch.TransportError as e:
            if not self.silently_fail:
                raise

            self.log.error("Failed to fetch More Like This from Elasticsearch for document '%s': %s", doc_id, e)
            raw_results = {}

        return self._process_results(raw_results, result_class=result_class)

    def _process_results(self, raw_results, highlight=False,
                         result_class=None, distance_point=None,
                         geo_sort=False):
        from haystack import connections
        results = []
        hits = raw_results.get('hits', {}).get('total', 0)
        facets = {}
        spelling_suggestion = None

        if result_class is None:
            result_class = SearchResult

        if self.include_spelling and 'suggest' in raw_results:
            raw_suggest = raw_results['suggest'].get('suggest')
            if raw_suggest:
                spelling_suggestion = ' '.join([word['text'] if len(word['options']) == 0 else word['options'][0]['text'] for word in raw_suggest])

        if 'facets' in raw_results:
            facets = {
                'fields': {},
                'dates': {},
                'queries': {},
            }

            for facet_fieldname, facet_info in raw_results['facets'].items():
                if facet_info.get('_type', 'terms') == 'terms':
                    facets['fields'][facet_fieldname] = [(individual['term'], individual['count']) for individual in facet_info['terms']]
                elif facet_info.get('_type', 'terms') == 'date_histogram':
                    # Elasticsearch provides UTC timestamps with an extra three
                    # decimals of precision, which datetime barfs on.
                    facets['dates'][facet_fieldname] = [(datetime.datetime.utcfromtimestamp(individual['time'] / 1000), individual['count']) for individual in facet_info['entries']]
                elif facet_info.get('_type', 'terms') == 'query':
                    facets['queries'][facet_fieldname] = facet_info['count']

        unified_index = connections[self.connection_alias].get_unified_index()
        indexed_models = unified_index.get_indexed_models()
        content_field = unified_index.document_field

        for raw_result in raw_results.get('hits', {}).get('hits', []):
            source = raw_result['_source']
            app_label, model_name = source[DJANGO_CT].split('.')
            additional_fields = {}
            model = haystack_get_model(app_label, model_name)

            if model and model in indexed_models:
                for key, value in source.items():
                    index = unified_index.get_index(model)
                    string_key = str(key)

                    if string_key in index.fields and hasattr(index.fields[string_key], 'convert'):
                        additional_fields[string_key] = index.fields[string_key].convert(value)
                    else:
                        additional_fields[string_key] = self._to_python(value)

                del(additional_fields[DJANGO_CT])
                del(additional_fields[DJANGO_ID])

                if 'highlight' in raw_result:
                    additional_fields['highlighted'] = raw_result['highlight'].get(content_field, '')

                if distance_point:
                    additional_fields['_point_of_origin'] = distance_point

                    if geo_sort and raw_result.get('sort'):
                        from haystack.utils.geo import Distance
                        additional_fields['_distance'] = Distance(km=float(raw_result['sort'][0]))
                    else:
                        additional_fields['_distance'] = None

                result = result_class(app_label, model_name, source[DJANGO_ID], raw_result['_score'], **additional_fields)
                results.append(result)
            else:
                hits -= 1

        return {
            'results': results,
            'hits': hits,
            'facets': facets,
            'spelling_suggestion': spelling_suggestion,
        }

    def build_schema(self, fields):
        content_field_name = ''
        mapping = {
            DJANGO_CT: {'type': 'string', 'index': 'not_analyzed', 'include_in_all': False},
            DJANGO_ID: {'type': 'string', 'index': 'not_analyzed', 'include_in_all': False},
        }

        for field_name, field_class in fields.items():
            field_mapping = FIELD_MAPPINGS.get(field_class.field_type, DEFAULT_FIELD_MAPPING).copy()
            if field_class.boost != 1.0:
                field_mapping['boost'] = field_class.boost

            if field_class.document is True:
                content_field_name = field_class.index_fieldname

            # Do this last to override `text` fields.
            if field_mapping['type'] == 'string':
                if field_class.indexed is False or hasattr(field_class, 'facet_for'):
                    field_mapping['index'] = 'not_analyzed'
                    del field_mapping['analyzer']

            mapping[field_class.index_fieldname] = field_mapping

        return (content_field_name, mapping)

    def _iso_datetime(self, value):
        """
        If value appears to be something datetime-like, return it in ISO format.

        Otherwise, return None.
        """
        if hasattr(value, 'strftime'):
            if hasattr(value, 'hour'):
                return value.isoformat()
            else:
                return '%sT00:00:00' % value.isoformat()

    def _from_python(self, value):
        """Convert more Python data types to ES-understandable JSON."""
        iso = self._iso_datetime(value)
        if iso:
            return iso
        elif isinstance(value, six.binary_type):
            # TODO: Be stricter.
            return six.text_type(value, errors='replace')
        elif isinstance(value, set):
            return list(value)
        return value

    def _to_python(self, value):
        """Convert values from Elasticsearch to native Python values."""
        if isinstance(value, (int, float, complex, list, tuple, bool)):
            return value

        if isinstance(value, six.string_types):
            possible_datetime = DATETIME_REGEX.search(value)

            if possible_datetime:
                date_values = possible_datetime.groupdict()

                for dk, dv in date_values.items():
                    date_values[dk] = int(dv)

                return datetime.datetime(
                    date_values['year'], date_values['month'],
                    date_values['day'], date_values['hour'],
                    date_values['minute'], date_values['second'])

        try:
            # This is slightly gross but it's hard to tell otherwise what the
            # string's original type might have been. Be careful who you trust.
            converted_value = eval(value)

            # Try to handle most built-in types.
            if isinstance(
                    converted_value,
                    (int, list, tuple, set, dict, float, complex)):
                return converted_value
        except Exception:
            # If it fails (SyntaxError or its ilk) or we don't trust it,
            # continue on.
            pass

        return value


# DRL_FIXME: Perhaps move to something where, if none of these
# match, call a custom method on the form that returns, per-backend,
# the right type of storage?
DEFAULT_FIELD_MAPPING = {'type': 'string', 'analyzer': 'snowball'}
FIELD_MAPPINGS = {
    'edge_ngram': {'type': 'string', 'analyzer': 'edgengram_analyzer'},
    'ngram': {'type': 'string', 'analyzer': 'ngram_analyzer'},
    'date': {'type': 'date'},
    'datetime': {'type': 'date'},

    'location': {'type': 'geo_point'},
    'boolean': {'type': 'boolean'},
    'float': {'type': 'float'},
    'long': {'type': 'long'},
    'integer': {'type': 'long'},
}


# Sucks that this is almost an exact copy of what's in the Solr backend,
# but we can't import due to dependencies.
class ElasticsearchSearchQuery(BaseSearchQuery):
    def matching_all_fragment(self):
        return '*:*'

    def build_query_fragment(self, field, filter_type, value):
        from haystack import connections
        query_frag = ''

        if not hasattr(value, 'input_type_name'):
            # Handle when we've got a ``ValuesListQuerySet``...
            if hasattr(value, 'values_list'):
                value = list(value)

            if isinstance(value, six.string_types):
                # It's not an ``InputType``. Assume ``Clean``.
                value = Clean(value)
            else:
                value = PythonData(value)

        # Prepare the query using the InputType.
        prepared_value = value.prepare(self)

        if not isinstance(prepared_value, (set, list, tuple)):
            # Then convert whatever we get back to what pysolr wants if needed.
            prepared_value = self.backend._from_python(prepared_value)

        # 'content' is a special reserved word, much like 'pk' in
        # Django's ORM layer. It indicates 'no special field'.
        if field == 'content':
            index_fieldname = ''
        else:
            index_fieldname = u'%s:' % connections[self._using].get_unified_index().get_index_fieldname(field)

        filter_types = {
            'contains': u'%s',
            'startswith': u'%s*',
            'exact': u'%s',
            'gt': u'{%s TO *}',
            'gte': u'[%s TO *]',
            'lt': u'{* TO %s}',
            'lte': u'[* TO %s]',
        }

        if value.post_process is False:
            query_frag = prepared_value
        else:
            if filter_type in ['contains', 'startswith']:
                if value.input_type_name == 'exact':
                    query_frag = prepared_value
                else:
                    # Iterate over terms & incorporate the converted form of each into the query.
                    terms = []

                    if isinstance(prepared_value, six.string_types):
                        for possible_value in prepared_value.split(' '):
                            terms.append(filter_types[filter_type] % self.backend._from_python(possible_value))
                    else:
                        terms.append(filter_types[filter_type] % self.backend._from_python(prepared_value))

                    if len(terms) == 1:
                        query_frag = terms[0]
                    else:
                        query_frag = u"(%s)" % " AND ".join(terms)
            elif filter_type == 'in':
                in_options = []

                for possible_value in prepared_value:
                    in_options.append(u'"%s"' % self.backend._from_python(possible_value))

                query_frag = u"(%s)" % " OR ".join(in_options)
            elif filter_type == 'range':
                start = self.backend._from_python(prepared_value[0])
                end = self.backend._from_python(prepared_value[1])
                query_frag = u'["%s" TO "%s"]' % (start, end)
            elif filter_type == 'exact':
                if value.input_type_name == 'exact':
                    query_frag = prepared_value
                else:
                    prepared_value = Exact(prepared_value).prepare(self)
                    query_frag = filter_types[filter_type] % prepared_value
            else:
                if value.input_type_name != 'exact':
                    prepared_value = Exact(prepared_value).prepare(self)

                query_frag = filter_types[filter_type] % prepared_value

        if len(query_frag) and not isinstance(value, Raw):
            if not query_frag.startswith('(') and not query_frag.endswith(')'):
                query_frag = "(%s)" % query_frag

        return u"%s%s" % (index_fieldname, query_frag)

    def build_alt_parser_query(self, parser_name, query_string='', **kwargs):
        if query_string:
            kwargs['v'] = query_string

        kwarg_bits = []

        for key in sorted(kwargs.keys()):
            if isinstance(kwargs[key], six.string_types) and ' ' in kwargs[key]:
                kwarg_bits.append(u"%s='%s'" % (key, kwargs[key]))
            else:
                kwarg_bits.append(u"%s=%s" % (key, kwargs[key]))

        return u"{!%s %s}" % (parser_name, ' '.join(kwarg_bits))

    def build_params(self, spelling_query=None, **kwargs):
        search_kwargs = {
            'start_offset': self.start_offset,
            'result_class': self.result_class
        }
        order_by_list = None

        if self.order_by:
            if order_by_list is None:
                order_by_list = []

            for field in self.order_by:
                direction = 'asc'
                if field.startswith('-'):
                    direction = 'desc'
                    field = field[1:]
                order_by_list.append((field, direction))

            search_kwargs['sort_by'] = order_by_list

        if self.date_facets:
            search_kwargs['date_facets'] = self.date_facets

        if self.distance_point:
            search_kwargs['distance_point'] = self.distance_point

        if self.dwithin:
            search_kwargs['dwithin'] = self.dwithin

        if self.end_offset is not None:
            search_kwargs['end_offset'] = self.end_offset

        if self.facets:
            search_kwargs['facets'] = self.facets

        if self.fields:
            search_kwargs['fields'] = self.fields

        if self.highlight:
            search_kwargs['highlight'] = self.highlight

        if self.models:
            search_kwargs['models'] = self.models

        if self.narrow_queries:
            search_kwargs['narrow_queries'] = self.narrow_queries

        if self.query_facets:
            search_kwargs['query_facets'] = self.query_facets

        if self.within:
            search_kwargs['within'] = self.within

        if spelling_query:
            search_kwargs['spelling_query'] = spelling_query

        return search_kwargs

    def run(self, spelling_query=None, **kwargs):
        """Builds and executes the query. Returns a list of search results."""
        final_query = self.build_query()
        search_kwargs = self.build_params(spelling_query, **kwargs)

        if kwargs:
            search_kwargs.update(kwargs)

        results = self.backend.search(final_query, **search_kwargs)
        self._results = results.get('results', [])
        self._hit_count = results.get('hits', 0)
        self._facet_counts = self.post_process_facets(results)
        self._spelling_suggestion = results.get('spelling_suggestion', None)

    def run_mlt(self, **kwargs):
        """Builds and executes the query. Returns a list of search results."""
        if self._more_like_this is False or self._mlt_instance is None:
            raise MoreLikeThisError("No instance was provided to determine 'More Like This' results.")

        additional_query_string = self.build_query()
        search_kwargs = {
            'start_offset': self.start_offset,
            'result_class': self.result_class,
            'models': self.models
        }

        if self.end_offset is not None:
            search_kwargs['end_offset'] = self.end_offset - self.start_offset

        results = self.backend.more_like_this(self._mlt_instance, additional_query_string, **search_kwargs)
        self._results = results.get('results', [])
        self._hit_count = results.get('hits', 0)


class ElasticsearchSearchEngine(BaseEngine):
    backend = ElasticsearchSearchBackend
    query = ElasticsearchSearchQuery
@@ -0,0 +1,135 @@
# encoding: utf-8
"""
A very basic, ORM-based backend for simple search during tests.
"""

from __future__ import absolute_import, division, print_function, unicode_literals

from warnings import warn

from django.conf import settings
from django.db.models import Q
from django.utils import six

from haystack import connections
from haystack.backends import BaseEngine, BaseSearchBackend, BaseSearchQuery, log_query, SearchNode
from haystack.inputs import PythonData
from haystack.models import SearchResult
from haystack.utils import get_model_ct_tuple

if settings.DEBUG:
    import logging

    class NullHandler(logging.Handler):
        def emit(self, record):
            pass

    ch = logging.StreamHandler()
    ch.setLevel(logging.WARNING)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))

    logger = logging.getLogger('haystack.simple_backend')
    logger.setLevel(logging.WARNING)
    logger.addHandler(NullHandler())
    logger.addHandler(ch)
else:
    logger = None


class SimpleSearchBackend(BaseSearchBackend):
    def update(self, indexer, iterable, commit=True):
        warn('update is not implemented in this backend')

    def remove(self, obj, commit=True):
        warn('remove is not implemented in this backend')

    def clear(self, models=[], commit=True):
        warn('clear is not implemented in this backend')

    @log_query
    def search(self, query_string, **kwargs):
        hits = 0
        results = []
        result_class = SearchResult
        models = connections[self.connection_alias].get_unified_index().get_indexed_models()

        if kwargs.get('result_class'):
            result_class = kwargs['result_class']

        if kwargs.get('models'):
            models = kwargs['models']

        if query_string:
            for model in models:
                if query_string == '*':
                    qs = model.objects.all()
                else:
                    for term in query_string.split():
                        queries = []

                        for field in model._meta.fields:
                            if hasattr(field, 'related'):
                                continue

                            if not field.get_internal_type() in ('TextField', 'CharField', 'SlugField'):
                                continue

                            queries.append(Q(**{'%s__icontains' % field.name: term}))

                        qs = model.objects.filter(six.moves.reduce(lambda x, y: x | y, queries))

                hits += len(qs)

                for match in qs:
                    match.__dict__.pop('score', None)
                    app_label, model_name = get_model_ct_tuple(match)
                    result = result_class(app_label, model_name, match.pk, 0, **match.__dict__)
                    # For efficiency.
                    result._model = match.__class__
                    result._object = match
                    results.append(result)

        return {
            'results': results,
            'hits': hits,
        }

    def prep_value(self, db_field, value):
        return value

    def more_like_this(self, model_instance, additional_query_string=None,
                       start_offset=0, end_offset=None,
                       limit_to_registered_models=None, result_class=None, **kwargs):
        return {
            'results': [],
            'hits': 0
        }


class SimpleSearchQuery(BaseSearchQuery):
    def build_query(self):
        if not self.query_filter:
            return '*'

        return self._build_sub_query(self.query_filter)

    def _build_sub_query(self, search_node):
        term_list = []

        for child in search_node.children:
            if isinstance(child, SearchNode):
                term_list.append(self._build_sub_query(child))
            else:
                value = child[1]

                if not hasattr(value, 'input_type_name'):
                    value = PythonData(value)

                term_list.append(value.prepare(self))

        return (' ').join(map(six.text_type, term_list))


class SimpleEngine(BaseEngine):
    backend = SimpleSearchBackend
    query = SimpleSearchQuery
@@ -0,0 +1,718 @@
# encoding: utf-8
|
||||
|
||||
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||
|
||||
import warnings
|
||||
|
||||
from django.conf import settings
|
||||
from django.core.exceptions import ImproperlyConfigured
|
||||
from django.utils import six
|
||||
|
||||
from haystack.backends import BaseEngine, BaseSearchBackend, BaseSearchQuery, EmptyResults, log_query
|
||||
from haystack.constants import DJANGO_CT, DJANGO_ID, ID
|
||||
from haystack.exceptions import MissingDependency, MoreLikeThisError, SkipDocument
|
||||
from haystack.inputs import Clean, Exact, PythonData, Raw
|
||||
from haystack.models import SearchResult
|
||||
from haystack.utils import log as logging
|
||||
from haystack.utils import get_identifier, get_model_ct
|
||||
from haystack.utils.app_loading import haystack_get_model
|
||||
|
||||
try:
|
||||
from pysolr import Solr, SolrError
|
||||
except ImportError:
|
||||
raise MissingDependency("The 'solr' backend requires the installation of 'pysolr'. Please refer to the documentation.")
|
||||
|
||||
|
||||
class SolrSearchBackend(BaseSearchBackend):
|
||||
# Word reserved by Solr for special use.
|
||||
RESERVED_WORDS = (
|
||||
'AND',
|
||||
        'NOT',
        'OR',
        'TO',
    )

    # Characters reserved by Solr for special use.
    # The '\\' must come first, so as not to overwrite the other slash replacements.
    RESERVED_CHARACTERS = (
        '\\', '+', '-', '&&', '||', '!', '(', ')', '{', '}',
        '[', ']', '^', '"', '~', '*', '?', ':', '/',
    )

    def __init__(self, connection_alias, **connection_options):
        super(SolrSearchBackend, self).__init__(connection_alias, **connection_options)

        if 'URL' not in connection_options:
            raise ImproperlyConfigured("You must specify a 'URL' in your settings for connection '%s'." % connection_alias)

        self.conn = Solr(connection_options['URL'], timeout=self.timeout, **connection_options.get('KWARGS', {}))
        self.log = logging.getLogger('haystack')

    def update(self, index, iterable, commit=True):
        docs = []

        for obj in iterable:
            try:
                docs.append(index.full_prepare(obj))
            except SkipDocument:
                self.log.debug(u"Indexing for object `%s` skipped", obj)
            except UnicodeDecodeError:
                if not self.silently_fail:
                    raise

                # We'll log the object identifier but won't include the actual object
                # to avoid the possibility of that generating encoding errors while
                # processing the log message:
                self.log.error(u"UnicodeDecodeError while preparing object for update", exc_info=True, extra={
                    "data": {
                        "index": index,
                        "object": get_identifier(obj)
                    }
                })

        if len(docs) > 0:
            try:
                self.conn.add(docs, commit=commit, boost=index.get_field_weights())
            except (IOError, SolrError) as e:
                if not self.silently_fail:
                    raise

                self.log.error("Failed to add documents to Solr: %s", e)

    def remove(self, obj_or_string, commit=True):
        solr_id = get_identifier(obj_or_string)

        try:
            kwargs = {
                'commit': commit,
                'id': solr_id
            }
            self.conn.delete(**kwargs)
        except (IOError, SolrError) as e:
            if not self.silently_fail:
                raise

            self.log.error("Failed to remove document '%s' from Solr: %s", solr_id, e)

    def clear(self, models=None, commit=True):
        # Avoid a mutable default argument.
        if models is None:
            models = []

        try:
            if not models:
                # *:* matches all docs in Solr
                self.conn.delete(q='*:*', commit=commit)
            else:
                models_to_delete = []

                for model in models:
                    models_to_delete.append("%s:%s" % (DJANGO_CT, get_model_ct(model)))

                self.conn.delete(q=" OR ".join(models_to_delete), commit=commit)

            if commit:
                # Run an optimize post-clear. http://wiki.apache.org/solr/FAQ#head-9aafb5d8dff5308e8ea4fcf4b71f19f029c4bb99
                self.conn.optimize()
        except (IOError, SolrError) as e:
            if not self.silently_fail:
                raise

            if models:
                self.log.error("Failed to clear Solr index of models '%s': %s", ','.join(models_to_delete), e)
            else:
                self.log.error("Failed to clear Solr index: %s", e)
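The `clear()` method above deletes either everything (`*:*`) or an OR'd set of per-model content-type queries. A minimal standalone sketch of that query construction; the `"app_label.model"` strings are hypothetical examples of what `get_model_ct(model)` returns:

```python
# Standalone sketch of the delete-query construction used in clear() above.
DJANGO_CT = 'django_ct'  # Haystack's content-type field name

def build_clear_query(model_cts):
    """Return the Solr delete query: everything, or an OR of content types."""
    if not model_cts:
        return '*:*'  # matches every document in the index
    return " OR ".join("%s:%s" % (DJANGO_CT, ct) for ct in model_cts)

print(build_clear_query([]))                          # *:*
print(build_clear_query(['auth.user', 'blog.post']))  # django_ct:auth.user OR django_ct:blog.post
```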
    @log_query
    def search(self, query_string, **kwargs):
        if len(query_string) == 0:
            return {
                'results': [],
                'hits': 0,
            }

        search_kwargs = self.build_search_kwargs(query_string, **kwargs)

        try:
            raw_results = self.conn.search(query_string, **search_kwargs)
        except (IOError, SolrError) as e:
            if not self.silently_fail:
                raise

            self.log.error("Failed to query Solr using '%s': %s", query_string, e)
            raw_results = EmptyResults()

        return self._process_results(raw_results, highlight=kwargs.get('highlight'), result_class=kwargs.get('result_class', SearchResult), distance_point=kwargs.get('distance_point'))

    def build_search_kwargs(self, query_string, sort_by=None, start_offset=0, end_offset=None,
                            fields='', highlight=False, facets=None,
                            date_facets=None, query_facets=None,
                            narrow_queries=None, spelling_query=None,
                            within=None, dwithin=None, distance_point=None,
                            models=None, limit_to_registered_models=None,
                            result_class=None, stats=None):
        kwargs = {'fl': '* score'}

        if fields:
            if isinstance(fields, (list, set)):
                fields = " ".join(fields)

            kwargs['fl'] = fields

        if sort_by is not None:
            if sort_by in ['distance asc', 'distance desc'] and distance_point:
                # Do the geo-enabled sort.
                lng, lat = distance_point['point'].get_coords()
                kwargs['sfield'] = distance_point['field']
                kwargs['pt'] = '%s,%s' % (lat, lng)

                if sort_by == 'distance asc':
                    kwargs['sort'] = 'geodist() asc'
                else:
                    kwargs['sort'] = 'geodist() desc'
            else:
                if sort_by.startswith('distance '):
                    warnings.warn("In order to sort by distance, you must call the '.distance(...)' method.")

                # Regular sorting.
                kwargs['sort'] = sort_by

        if start_offset is not None:
            kwargs['start'] = start_offset

        if end_offset is not None:
            kwargs['rows'] = end_offset - start_offset

        if highlight is True:
            kwargs['hl'] = 'true'
            kwargs['hl.fragsize'] = '200'

        if self.include_spelling is True:
            kwargs['spellcheck'] = 'true'
            kwargs['spellcheck.collate'] = 'true'
            kwargs['spellcheck.count'] = 1

            if spelling_query:
                kwargs['spellcheck.q'] = spelling_query

        if facets is not None:
            kwargs['facet'] = 'on'
            kwargs['facet.field'] = facets.keys()

            for facet_field, options in facets.items():
                for key, value in options.items():
                    kwargs['f.%s.facet.%s' % (facet_field, key)] = self.conn._from_python(value)

        if date_facets is not None:
            kwargs['facet'] = 'on'
            kwargs['facet.date'] = date_facets.keys()
            kwargs['facet.date.other'] = 'none'

            for key, value in date_facets.items():
                kwargs["f.%s.facet.date.start" % key] = self.conn._from_python(value.get('start_date'))
                kwargs["f.%s.facet.date.end" % key] = self.conn._from_python(value.get('end_date'))
                gap_by_string = value.get('gap_by').upper()
                gap_string = "%d%s" % (value.get('gap_amount'), gap_by_string)

                if value.get('gap_amount') != 1:
                    gap_string += "S"

                kwargs["f.%s.facet.date.gap" % key] = '+%s/%s' % (gap_string, gap_by_string)

        if query_facets is not None:
            kwargs['facet'] = 'on'
            kwargs['facet.query'] = ["%s:%s" % (field, value) for field, value in query_facets]

        if limit_to_registered_models is None:
            limit_to_registered_models = getattr(settings, 'HAYSTACK_LIMIT_TO_REGISTERED_MODELS', True)

        if models and len(models):
            model_choices = sorted(get_model_ct(model) for model in models)
        elif limit_to_registered_models:
            # Using narrow queries, limit the results to only models handled
            # with the current routers.
            model_choices = self.build_models_list()
        else:
            model_choices = []

        if len(model_choices) > 0:
            if narrow_queries is None:
                narrow_queries = set()

            narrow_queries.add('%s:(%s)' % (DJANGO_CT, ' OR '.join(model_choices)))

        if narrow_queries is not None:
            kwargs['fq'] = list(narrow_queries)

        if stats:
            kwargs['stats'] = "true"

            for k in stats.keys():
                kwargs['stats.field'] = k

                for facet in stats[k]:
                    kwargs['f.%s.stats.facet' % k] = facet

        if within is not None:
            from haystack.utils.geo import generate_bounding_box

            kwargs.setdefault('fq', [])
            ((min_lat, min_lng), (max_lat, max_lng)) = generate_bounding_box(within['point_1'], within['point_2'])
            # Bounding boxes are min, min TO max, max. Solr's wiki was *NOT*
            # very clear on this.
            bbox = '%s:[%s,%s TO %s,%s]' % (within['field'], min_lat, min_lng, max_lat, max_lng)
            kwargs['fq'].append(bbox)

        if dwithin is not None:
            kwargs.setdefault('fq', [])
            lng, lat = dwithin['point'].get_coords()
            geofilt = '{!geofilt pt=%s,%s sfield=%s d=%s}' % (lat, lng, dwithin['field'], dwithin['distance'].km)
            kwargs['fq'].append(geofilt)

        # Check to see if the backend should try to include distances
        # (Solr 4.X+) in the results.
        if self.distance_available and distance_point:
            # In early testing, you can't just hand Solr 4.X a proper bounding box
            # & request distances. To enable native distance would take calculating
            # a center point & a radius off the user-provided box, which kinda
            # sucks. We'll avoid it for now, since Solr 4.x's release will be some
            # time yet.
            # kwargs['fl'] += ' _dist_:geodist()'
            pass

        return kwargs
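The `facet.date.gap` string built in `build_search_kwargs()` above follows Solr's date-math syntax: a gap of 1 uses the singular unit, anything else gets an "S" appended, and the result is rounded to the unit with `/UNIT`. A standalone sketch of just that logic:

```python
# Standalone sketch of the facet.date.gap string built in build_search_kwargs().
def build_date_gap(gap_amount, gap_by):
    gap_by_string = gap_by.upper()          # e.g. 'day' -> 'DAY'
    gap_string = "%d%s" % (gap_amount, gap_by_string)

    if gap_amount != 1:
        gap_string += "S"                   # '7DAY' -> '7DAYS'

    return '+%s/%s' % (gap_string, gap_by_string)

print(build_date_gap(1, 'month'))  # +1MONTH/MONTH
print(build_date_gap(7, 'day'))    # +7DAYS/DAY
```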
    def more_like_this(self, model_instance, additional_query_string=None,
                       start_offset=0, end_offset=None, models=None,
                       limit_to_registered_models=None, result_class=None, **kwargs):
        from haystack import connections

        # Deferred models will have a different class ("RealClass_Deferred_fieldname")
        # which won't be in our registry:
        model_klass = model_instance._meta.concrete_model

        index = connections[self.connection_alias].get_unified_index().get_index(model_klass)
        field_name = index.get_content_field()
        params = {
            'fl': '*,score',
        }

        if start_offset is not None:
            params['start'] = start_offset

        if end_offset is not None:
            params['rows'] = end_offset

        narrow_queries = set()

        if limit_to_registered_models is None:
            limit_to_registered_models = getattr(settings, 'HAYSTACK_LIMIT_TO_REGISTERED_MODELS', True)

        if models and len(models):
            model_choices = sorted(get_model_ct(model) for model in models)
        elif limit_to_registered_models:
            # Using narrow queries, limit the results to only models handled
            # with the current routers.
            model_choices = self.build_models_list()
        else:
            model_choices = []

        if len(model_choices) > 0:
            if narrow_queries is None:
                narrow_queries = set()

            narrow_queries.add('%s:(%s)' % (DJANGO_CT, ' OR '.join(model_choices)))

        if additional_query_string:
            narrow_queries.add(additional_query_string)

        if narrow_queries:
            params['fq'] = list(narrow_queries)

        query = "%s:%s" % (ID, get_identifier(model_instance))

        try:
            raw_results = self.conn.more_like_this(query, field_name, **params)
        except (IOError, SolrError) as e:
            if not self.silently_fail:
                raise

            self.log.error("Failed to fetch More Like This from Solr for document '%s': %s", query, e)
            raw_results = EmptyResults()

        return self._process_results(raw_results, result_class=result_class)
    def _process_results(self, raw_results, highlight=False, result_class=None, distance_point=None):
        from haystack import connections
        results = []
        hits = raw_results.hits
        facets = {}
        stats = {}
        spelling_suggestion = None

        if result_class is None:
            result_class = SearchResult

        if hasattr(raw_results, 'stats'):
            stats = raw_results.stats.get('stats_fields', {})

        if hasattr(raw_results, 'facets'):
            facets = {
                'fields': raw_results.facets.get('facet_fields', {}),
                'dates': raw_results.facets.get('facet_dates', {}),
                'queries': raw_results.facets.get('facet_queries', {}),
            }

            for key in ['fields']:
                for facet_field in facets[key]:
                    # Convert to a two-tuple, as Solr's json format returns a list of
                    # pairs.
                    facets[key][facet_field] = list(zip(facets[key][facet_field][::2], facets[key][facet_field][1::2]))

        if self.include_spelling is True:
            if hasattr(raw_results, 'spellcheck'):
                if len(raw_results.spellcheck.get('suggestions', [])):
                    # For some reason, it's an array of pairs. Pull off the
                    # collated result from the end.
                    spelling_suggestion = raw_results.spellcheck.get('suggestions')[-1]

        unified_index = connections[self.connection_alias].get_unified_index()
        indexed_models = unified_index.get_indexed_models()

        for raw_result in raw_results.docs:
            app_label, model_name = raw_result[DJANGO_CT].split('.')
            additional_fields = {}
            model = haystack_get_model(app_label, model_name)

            if model and model in indexed_models:
                index = unified_index.get_index(model)
                index_field_map = index.field_map

                for key, value in raw_result.items():
                    string_key = str(key)

                    # Re-map the key if an alternate name is used.
                    if string_key in index_field_map:
                        string_key = index_field_map[key]

                    if string_key in index.fields and hasattr(index.fields[string_key], 'convert'):
                        additional_fields[string_key] = index.fields[string_key].convert(value)
                    else:
                        additional_fields[string_key] = self.conn._to_python(value)

                del additional_fields[DJANGO_CT]
                del additional_fields[DJANGO_ID]
                del additional_fields['score']

                if raw_result[ID] in getattr(raw_results, 'highlighting', {}):
                    additional_fields['highlighted'] = raw_results.highlighting[raw_result[ID]]

                if distance_point:
                    additional_fields['_point_of_origin'] = distance_point

                    if raw_result.get('__dist__'):
                        from haystack.utils.geo import Distance
                        additional_fields['_distance'] = Distance(km=float(raw_result['__dist__']))
                    else:
                        additional_fields['_distance'] = None

                result = result_class(app_label, model_name, raw_result[DJANGO_ID], raw_result['score'], **additional_fields)
                results.append(result)
            else:
                hits -= 1

        return {
            'results': results,
            'hits': hits,
            'stats': stats,
            'facets': facets,
            'spelling_suggestion': spelling_suggestion,
        }
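The facet conversion in `_process_results()` above relies on the fact that Solr's JSON response returns field facets as a flat `[value, count, value, count, ...]` list; the slicing-and-zip idiom pairs them back up. A standalone sketch with hypothetical facet data:

```python
# Standalone sketch of the facet-count conversion used in _process_results().
flat = ['django', 3, 'python', 2, 'solr', 1]   # hypothetical flat facet counts
pairs = list(zip(flat[::2], flat[1::2]))        # evens are values, odds are counts
print(pairs)  # [('django', 3), ('python', 2), ('solr', 1)]
```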
    def build_schema(self, fields):
        content_field_name = ''
        schema_fields = []

        for field_name, field_class in fields.items():
            field_data = {
                'field_name': field_class.index_fieldname,
                'type': 'text_en',
                'indexed': 'true',
                'stored': 'true',
                'multi_valued': 'false',
            }

            if field_class.document is True:
                content_field_name = field_class.index_fieldname

            # DRL_FIXME: Perhaps move to something where, if none of these
            #            checks succeed, call a custom method on the form that
            #            returns, per-backend, the right type of storage?
            if field_class.field_type in ['date', 'datetime']:
                field_data['type'] = 'date'
            elif field_class.field_type == 'integer':
                field_data['type'] = 'long'
            elif field_class.field_type == 'float':
                field_data['type'] = 'float'
            elif field_class.field_type == 'boolean':
                field_data['type'] = 'boolean'
            elif field_class.field_type == 'ngram':
                field_data['type'] = 'ngram'
            elif field_class.field_type == 'edge_ngram':
                field_data['type'] = 'edge_ngram'
            elif field_class.field_type == 'location':
                field_data['type'] = 'location'

            if field_class.is_multivalued:
                field_data['multi_valued'] = 'true'

            if field_class.stored is False:
                field_data['stored'] = 'false'

            # Do this last to override `text` fields.
            if field_class.indexed is False:
                field_data['indexed'] = 'false'

                # If it's text and not being indexed, we probably don't want
                # to do the normal lowercase/tokenize/stemming/etc. dance.
                if field_data['type'] == 'text_en':
                    field_data['type'] = 'string'

            # If it's a ``FacetField``, make sure we don't postprocess it.
            if hasattr(field_class, 'facet_for'):
                # If it's text, it ought to be a string.
                if field_data['type'] == 'text_en':
                    field_data['type'] = 'string'

            schema_fields.append(field_data)

        return (content_field_name, schema_fields)
    def extract_file_contents(self, file_obj):
        """Extract text and metadata from a structured file (PDF, MS Word, etc.)

        Uses the Solr ExtractingRequestHandler, which is based on Apache Tika.
        See the Solr wiki for details:

            http://wiki.apache.org/solr/ExtractingRequestHandler

        Due to the way the ExtractingRequestHandler is implemented, it completely
        replaces the normal Haystack indexing process with several unfortunate
        restrictions: only one file per request, the extracted data is added to
        the index with no ability to modify it, etc. To simplify the process and
        allow for more advanced use, we run in extract-only mode to return the
        extracted data without adding it to the index, so we can then use it
        within Haystack's normal templating process.

        Returns None if metadata cannot be extracted; otherwise returns a
        dictionary containing at least two keys:

        :contents:
                  Extracted full-text content, if applicable
        :metadata:
                  key:value pairs of text strings
        """
        try:
            return self.conn.extract(file_obj)
        except Exception as e:
            self.log.warning(u"Unable to extract file contents: %s", e,
                             exc_info=True, extra={"data": {"file": file_obj}})
            return None
class SolrSearchQuery(BaseSearchQuery):
    def matching_all_fragment(self):
        return '*:*'

    def build_query_fragment(self, field, filter_type, value):
        from haystack import connections
        query_frag = ''

        if not hasattr(value, 'input_type_name'):
            # Handle when we've got a ``ValuesListQuerySet``...
            if hasattr(value, 'values_list'):
                value = list(value)

            if isinstance(value, six.string_types):
                # It's not an ``InputType``. Assume ``Clean``.
                value = Clean(value)
            else:
                value = PythonData(value)

        # Prepare the query using the InputType.
        prepared_value = value.prepare(self)

        if not isinstance(prepared_value, (set, list, tuple)):
            # Then convert whatever we get back to what pysolr wants if needed.
            prepared_value = self.backend.conn._from_python(prepared_value)

        # 'content' is a special reserved word, much like 'pk' in
        # Django's ORM layer. It indicates 'no special field'.
        if field == 'content':
            index_fieldname = ''
        else:
            index_fieldname = u'%s:' % connections[self._using].get_unified_index().get_index_fieldname(field)

        filter_types = {
            'contains': u'%s',
            'startswith': u'%s*',
            'exact': u'%s',
            'gt': u'{%s TO *}',
            'gte': u'[%s TO *]',
            'lt': u'{* TO %s}',
            'lte': u'[* TO %s]',
        }

        if value.post_process is False:
            query_frag = prepared_value
        else:
            if filter_type in ['contains', 'startswith']:
                if value.input_type_name == 'exact':
                    query_frag = prepared_value
                else:
                    # Iterate over terms & incorporate the converted form of each into the query.
                    terms = []

                    for possible_value in prepared_value.split(' '):
                        terms.append(filter_types[filter_type] % self.backend.conn._from_python(possible_value))

                    if len(terms) == 1:
                        query_frag = terms[0]
                    else:
                        query_frag = u"(%s)" % " AND ".join(terms)
            elif filter_type == 'in':
                in_options = []

                for possible_value in prepared_value:
                    in_options.append(u'"%s"' % self.backend.conn._from_python(possible_value))

                query_frag = u"(%s)" % " OR ".join(in_options)
            elif filter_type == 'range':
                start = self.backend.conn._from_python(prepared_value[0])
                end = self.backend.conn._from_python(prepared_value[1])
                query_frag = u'["%s" TO "%s"]' % (start, end)
            elif filter_type == 'exact':
                if value.input_type_name == 'exact':
                    query_frag = prepared_value
                else:
                    prepared_value = Exact(prepared_value).prepare(self)
                    query_frag = filter_types[filter_type] % prepared_value
            else:
                if value.input_type_name != 'exact':
                    prepared_value = Exact(prepared_value).prepare(self)

                query_frag = filter_types[filter_type] % prepared_value

        if len(query_frag) and not isinstance(value, Raw):
            if not query_frag.startswith('(') and not query_frag.endswith(')'):
                query_frag = "(%s)" % query_frag

        return u"%s%s" % (index_fieldname, query_frag)
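The `filter_types` templates in `build_query_fragment()` above map Haystack filters onto Lucene/Solr range syntax, where square brackets are inclusive bounds and curly braces are exclusive. A standalone sketch of how the templates expand:

```python
# Standalone sketch of the range templates from build_query_fragment().
FILTER_TYPES = {
    'gt': u'{%s TO *}',   # exclusive lower bound
    'gte': u'[%s TO *]',  # inclusive lower bound
    'lt': u'{* TO %s}',   # exclusive upper bound
    'lte': u'[* TO %s]',  # inclusive upper bound
}

print(FILTER_TYPES['gte'] % 10)  # [10 TO *]
print(FILTER_TYPES['lt'] % 10)   # {* TO 10}
```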
    def build_alt_parser_query(self, parser_name, query_string='', **kwargs):
        if query_string:
            query_string = Clean(query_string).prepare(self)

        kwarg_bits = []

        for key in sorted(kwargs.keys()):
            if isinstance(kwargs[key], six.string_types) and ' ' in kwargs[key]:
                kwarg_bits.append(u"%s='%s'" % (key, kwargs[key]))
            else:
                kwarg_bits.append(u"%s=%s" % (key, kwargs[key]))

        return u'_query_:"{!%s %s}%s"' % (parser_name, Clean(' '.join(kwarg_bits)), query_string)

    def build_params(self, spelling_query=None, **kwargs):
        search_kwargs = {
            'start_offset': self.start_offset,
            'result_class': self.result_class
        }
        order_by_list = None

        if self.order_by:
            if order_by_list is None:
                order_by_list = []

            for order_by in self.order_by:
                if order_by.startswith('-'):
                    order_by_list.append('%s desc' % order_by[1:])
                else:
                    order_by_list.append('%s asc' % order_by)

            search_kwargs['sort_by'] = ", ".join(order_by_list)

        if self.date_facets:
            search_kwargs['date_facets'] = self.date_facets

        if self.distance_point:
            search_kwargs['distance_point'] = self.distance_point

        if self.dwithin:
            search_kwargs['dwithin'] = self.dwithin

        if self.end_offset is not None:
            search_kwargs['end_offset'] = self.end_offset

        if self.facets:
            search_kwargs['facets'] = self.facets

        if self.fields:
            search_kwargs['fields'] = self.fields

        if self.highlight:
            search_kwargs['highlight'] = self.highlight

        if self.models:
            search_kwargs['models'] = self.models

        if self.narrow_queries:
            search_kwargs['narrow_queries'] = self.narrow_queries

        if self.query_facets:
            search_kwargs['query_facets'] = self.query_facets

        if self.within:
            search_kwargs['within'] = self.within

        if spelling_query:
            search_kwargs['spelling_query'] = spelling_query

        if self.stats:
            search_kwargs['stats'] = self.stats

        return search_kwargs
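The ordering loop in `build_params()` above converts Django-style ordering (a leading `-` for descending) into Solr's `field asc`/`field desc` sort clauses. A standalone sketch, using hypothetical field names:

```python
# Standalone sketch of the Django-style to Solr-style sort conversion
# performed in build_params().
def to_solr_sort(order_by_fields):
    order_by_list = []

    for order_by in order_by_fields:
        if order_by.startswith('-'):
            order_by_list.append('%s desc' % order_by[1:])
        else:
            order_by_list.append('%s asc' % order_by)

    return ", ".join(order_by_list)

print(to_solr_sort(['-pub_date', 'title']))  # pub_date desc, title asc
```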
    def run(self, spelling_query=None, **kwargs):
        """Builds and executes the query. Returns a list of search results."""
        final_query = self.build_query()
        search_kwargs = self.build_params(spelling_query, **kwargs)

        if kwargs:
            search_kwargs.update(kwargs)

        results = self.backend.search(final_query, **search_kwargs)
        self._results = results.get('results', [])
        self._hit_count = results.get('hits', 0)
        self._facet_counts = self.post_process_facets(results)
        self._stats = results.get('stats', {})
        self._spelling_suggestion = results.get('spelling_suggestion', None)

    def run_mlt(self, **kwargs):
        """Builds and executes the query. Returns a list of search results."""
        if self._more_like_this is False or self._mlt_instance is None:
            raise MoreLikeThisError("No instance was provided to determine 'More Like This' results.")

        additional_query_string = self.build_query()
        search_kwargs = {
            'start_offset': self.start_offset,
            'result_class': self.result_class,
            'models': self.models
        }

        if self.end_offset is not None:
            search_kwargs['end_offset'] = self.end_offset - self.start_offset

        results = self.backend.more_like_this(self._mlt_instance, additional_query_string, **search_kwargs)
        self._results = results.get('results', [])
        self._hit_count = results.get('hits', 0)


class SolrEngine(BaseEngine):
    backend = SolrSearchBackend
    query = SolrSearchQuery
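As a usage note, a minimal settings sketch (not part of the source above) showing how this engine is typically wired up; the URL is a hypothetical example, and `'URL'` is the key `SolrSearchBackend.__init__` requires:

```python
# Hypothetical Django settings fragment for the Solr engine defined above.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        # Required: __init__ raises ImproperlyConfigured without 'URL'.
        'URL': 'http://127.0.0.1:8983/solr',
    },
}
```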
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

import os
import re
import shutil
import threading
import warnings

from django.conf import settings
from django.core.exceptions import ImproperlyConfigured
from django.utils import six
from django.utils.datetime_safe import datetime

from haystack.backends import BaseEngine, BaseSearchBackend, BaseSearchQuery, EmptyResults, log_query
from haystack.constants import DJANGO_CT, DJANGO_ID, ID
from haystack.exceptions import MissingDependency, SearchBackendError, SkipDocument
from haystack.inputs import Clean, Exact, PythonData, Raw
from haystack.models import SearchResult
from haystack.utils import log as logging
from haystack.utils import get_identifier, get_model_ct
from haystack.utils.app_loading import haystack_get_model

try:
    import json
except ImportError:
    try:
        import simplejson as json
    except ImportError:
        from django.utils import simplejson as json

try:
    from django.utils.encoding import force_text
except ImportError:
    from django.utils.encoding import force_unicode as force_text

try:
    import whoosh
except ImportError:
    raise MissingDependency("The 'whoosh' backend requires the installation of 'Whoosh'. Please refer to the documentation.")

# Handle the minimum requirement.
if not hasattr(whoosh, '__version__') or whoosh.__version__ < (2, 5, 0):
    raise MissingDependency("The 'whoosh' backend requires version 2.5.0 or greater.")

# Bubble up the correct error.
from whoosh import index
from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import ID as WHOOSH_ID
from whoosh.fields import BOOLEAN, DATETIME, IDLIST, KEYWORD, NGRAM, NGRAMWORDS, NUMERIC, Schema, TEXT
from whoosh.filedb.filestore import FileStorage, RamStorage
from whoosh.highlight import highlight as whoosh_highlight
from whoosh.highlight import ContextFragmenter, HtmlFormatter
from whoosh.qparser import QueryParser
from whoosh.searching import ResultsPage
from whoosh.writing import AsyncWriter


DATETIME_REGEX = re.compile(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(\.\d{3,6}Z?)?$')
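The `DATETIME_REGEX` above matches ISO-8601-style datetime strings with named groups and an optional fractional-second part. A standalone sketch of what it captures, with the same pattern reproduced so it runs on its own:

```python
# Standalone sketch of the DATETIME_REGEX pattern defined above.
import re

DATETIME_REGEX = re.compile(
    r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T'
    r'(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(\.\d{3,6}Z?)?$'
)

match = DATETIME_REGEX.match('2015-06-09T14:30:05.123456Z')
print(match.group('year'), match.group('hour'))  # 2015 14
```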
LOCALS = threading.local()
LOCALS.RAM_STORE = None


class WhooshHtmlFormatter(HtmlFormatter):
    """
    A simpler HtmlFormatter than whoosh's HtmlFormatter.
    We use it to get consistent results across backends. Specifically,
    Solr, Xapian and Elasticsearch use this formatting.
    """
    template = '<%(tag)s>%(t)s</%(tag)s>'


class WhooshSearchBackend(BaseSearchBackend):
    # Words reserved by Whoosh for special use.
    RESERVED_WORDS = (
        'AND',
        'NOT',
        'OR',
        'TO',
    )

    # Characters reserved by Whoosh for special use.
    # The '\\' must come first, so as not to overwrite the other slash replacements.
    RESERVED_CHARACTERS = (
        '\\', '+', '-', '&&', '||', '!', '(', ')', '{', '}',
        '[', ']', '^', '"', '~', '*', '?', ':', '.',
    )
    def __init__(self, connection_alias, **connection_options):
        super(WhooshSearchBackend, self).__init__(connection_alias, **connection_options)
        self.setup_complete = False
        self.use_file_storage = True
        self.post_limit = connection_options.get('POST_LIMIT', 128 * 1024 * 1024)
        self.path = connection_options.get('PATH')

        if connection_options.get('STORAGE', 'file') != 'file':
            self.use_file_storage = False

        if self.use_file_storage and not self.path:
            raise ImproperlyConfigured("You must specify a 'PATH' in your settings for connection '%s'." % connection_alias)

        self.log = logging.getLogger('haystack')

    def setup(self):
        """
        Defers loading until needed.
        """
        from haystack import connections
        new_index = False

        # Make sure the index is there.
        if self.use_file_storage and not os.path.exists(self.path):
            os.makedirs(self.path)
            new_index = True

        if self.use_file_storage and not os.access(self.path, os.W_OK):
            raise IOError("The path to your Whoosh index '%s' is not writable for the current user/group." % self.path)

        if self.use_file_storage:
            self.storage = FileStorage(self.path)
        else:
            global LOCALS

            if LOCALS.RAM_STORE is None:
                LOCALS.RAM_STORE = RamStorage()

            self.storage = LOCALS.RAM_STORE

        self.content_field_name, self.schema = self.build_schema(connections[self.connection_alias].get_unified_index().all_searchfields())
        self.parser = QueryParser(self.content_field_name, schema=self.schema)

        if new_index is True:
            self.index = self.storage.create_index(self.schema)
        else:
            try:
                self.index = self.storage.open_index(schema=self.schema)
            except index.EmptyIndexError:
                self.index = self.storage.create_index(self.schema)

        self.setup_complete = True
    def build_schema(self, fields):
        schema_fields = {
            ID: WHOOSH_ID(stored=True, unique=True),
            DJANGO_CT: WHOOSH_ID(stored=True),
            DJANGO_ID: WHOOSH_ID(stored=True),
        }
        # Grab the number of keys that are hard-coded into Haystack.
        # We'll use this to (possibly) fail slightly more gracefully later.
        initial_key_count = len(schema_fields)
        content_field_name = ''

        for field_name, field_class in fields.items():
            if field_class.is_multivalued:
                if field_class.indexed is False:
                    schema_fields[field_class.index_fieldname] = IDLIST(stored=True, field_boost=field_class.boost)
                else:
                    schema_fields[field_class.index_fieldname] = KEYWORD(stored=True, commas=True, scorable=True, field_boost=field_class.boost)
            elif field_class.field_type in ['date', 'datetime']:
                schema_fields[field_class.index_fieldname] = DATETIME(stored=field_class.stored, sortable=True)
            elif field_class.field_type == 'integer':
                schema_fields[field_class.index_fieldname] = NUMERIC(stored=field_class.stored, numtype=int, field_boost=field_class.boost)
            elif field_class.field_type == 'float':
                schema_fields[field_class.index_fieldname] = NUMERIC(stored=field_class.stored, numtype=float, field_boost=field_class.boost)
            elif field_class.field_type == 'boolean':
                # Field boost isn't supported on BOOLEAN as of 1.8.2.
                schema_fields[field_class.index_fieldname] = BOOLEAN(stored=field_class.stored)
            elif field_class.field_type == 'ngram':
                schema_fields[field_class.index_fieldname] = NGRAM(minsize=3, maxsize=15, stored=field_class.stored, field_boost=field_class.boost)
            elif field_class.field_type == 'edge_ngram':
                schema_fields[field_class.index_fieldname] = NGRAMWORDS(minsize=2, maxsize=15, at='start', stored=field_class.stored, field_boost=field_class.boost)
            else:
                schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=field_class.boost, sortable=True)

            if field_class.document is True:
                content_field_name = field_class.index_fieldname
                schema_fields[field_class.index_fieldname].spelling = True

        # Fail more gracefully than relying on the backend to die if no fields
        # are found.
        if len(schema_fields) <= initial_key_count:
            raise SearchBackendError("No fields were found in any search_indexes. Please correct this before attempting to search.")

        return (content_field_name, Schema(**schema_fields))
def update(self, index, iterable, commit=True):
|
||||
if not self.setup_complete:
|
||||
self.setup()
|
||||
|
||||
self.index = self.index.refresh()
|
||||
writer = AsyncWriter(self.index)
|
||||
|
||||
for obj in iterable:
|
||||
try:
|
||||
doc = index.full_prepare(obj)
|
||||
except SkipDocument:
|
||||
self.log.debug(u"Indexing for object `%s` skipped", obj)
|
||||
else:
|
||||
# Really make sure it's unicode, because Whoosh won't have it any
|
||||
# other way.
|
||||
for key in doc:
|
||||
doc[key] = self._from_python(doc[key])
|
||||
|
||||
# Document boosts aren't supported in Whoosh 2.5.0+.
|
||||
if 'boost' in doc:
|
||||
del doc['boost']
|
||||
|
||||
try:
|
||||
writer.update_document(**doc)
|
||||
except Exception as e:
|
||||
if not self.silently_fail:
|
||||
raise
|
||||
|
||||
# We'll log the object identifier but won't include the actual object
|
||||
# to avoid the possibility of that generating encoding errors while
|
||||
# processing the log message:
|
||||
self.log.error(u"%s while preparing object for update" % e.__class__.__name__, exc_info=True, extra={
|
||||
"data": {
|
||||
"index": index,
|
||||
"object": get_identifier(obj)
|
||||
}
|
||||
})
|
||||
|
||||
if len(iterable) > 0:
|
||||
# For now, commit no matter what, as we run into locking issues otherwise.
|
||||
writer.commit()

    def remove(self, obj_or_string, commit=True):
        if not self.setup_complete:
            self.setup()

        self.index = self.index.refresh()
        whoosh_id = get_identifier(obj_or_string)

        try:
            self.index.delete_by_query(q=self.parser.parse(u'%s:"%s"' % (ID, whoosh_id)))
        except Exception as e:
            if not self.silently_fail:
                raise

            self.log.error("Failed to remove document '%s' from Whoosh: %s", whoosh_id, e)

    def clear(self, models=None, commit=True):
        if not self.setup_complete:
            self.setup()

        self.index = self.index.refresh()

        try:
            if not models:
                self.delete_index()
            else:
                models_to_delete = []

                for model in models:
                    models_to_delete.append(u"%s:%s" % (DJANGO_CT, get_model_ct(model)))

                self.index.delete_by_query(q=self.parser.parse(u" OR ".join(models_to_delete)))
        except Exception as e:
            if not self.silently_fail:
                raise

            self.log.error("Failed to clear documents from Whoosh: %s", e)

    def delete_index(self):
        # Per the Whoosh mailing list, if wiping out everything from the index,
        # it's much more efficient to simply delete the index files.
        if self.use_file_storage and os.path.exists(self.path):
            shutil.rmtree(self.path)
        elif not self.use_file_storage:
            self.storage.clean()

        # Recreate everything.
        self.setup()

    def optimize(self):
        if not self.setup_complete:
            self.setup()

        self.index = self.index.refresh()
        self.index.optimize()

    def calculate_page(self, start_offset=0, end_offset=None):
        # Prevent against Whoosh throwing an error. Requires an end_offset
        # greater than 0.
        if end_offset is not None and end_offset <= 0:
            end_offset = 1

        # Determine the page.
        page_num = 0

        if end_offset is None:
            end_offset = 1000000

        if start_offset is None:
            start_offset = 0

        page_length = end_offset - start_offset

        if page_length and page_length > 0:
            page_num = int(start_offset / page_length)

        # Increment because Whoosh uses 1-based page numbers.
        page_num += 1
        return page_num, page_length
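The offset-to-page arithmetic above is easy to get wrong, so here is a standalone sketch of the same computation (the function name is illustrative, not part of the backend API):

```python
def calculate_page(start_offset=0, end_offset=None):
    # Mirror of the backend's offset -> (page_num, page_length) arithmetic.
    if end_offset is not None and end_offset <= 0:
        end_offset = 1  # Whoosh requires an end_offset greater than 0.

    page_num = 0

    if end_offset is None:
        end_offset = 1000000  # effectively "no upper bound"

    if start_offset is None:
        start_offset = 0

    page_length = end_offset - start_offset

    if page_length and page_length > 0:
        page_num = int(start_offset / page_length)

    # Whoosh page numbers are 1-based.
    return page_num + 1, page_length

# A slice [20:30] maps to page 3 of 10-item pages.
print(calculate_page(20, 30))  # → (3, 10)
```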

    @log_query
    def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
               fields='', highlight=False, facets=None, date_facets=None, query_facets=None,
               narrow_queries=None, spelling_query=None, within=None,
               dwithin=None, distance_point=None, models=None,
               limit_to_registered_models=None, result_class=None, **kwargs):
        if not self.setup_complete:
            self.setup()

        # A zero length query should return no results.
        if len(query_string) == 0:
            return {
                'results': [],
                'hits': 0,
            }

        query_string = force_text(query_string)

        # A one-character query (non-wildcard) gets nabbed by a stopwords
        # filter and should yield zero results.
        if len(query_string) <= 1 and query_string != u'*':
            return {
                'results': [],
                'hits': 0,
            }

        reverse = False

        if sort_by is not None:
            # Determine if we need to reverse the results and if Whoosh can
            # handle what it's being asked to sort by. Reversing is an
            # all-or-nothing action, unfortunately.
            sort_by_list = []
            reverse_counter = 0

            for order_by in sort_by:
                if order_by.startswith('-'):
                    reverse_counter += 1

            if reverse_counter and reverse_counter != len(sort_by):
                raise SearchBackendError("Whoosh requires all order_by fields"
                                         " to use the same sort direction")

            for order_by in sort_by:
                if order_by.startswith('-'):
                    sort_by_list.append(order_by[1:])

                    if len(sort_by_list) == 1:
                        reverse = True
                else:
                    sort_by_list.append(order_by)

                    if len(sort_by_list) == 1:
                        reverse = False

            sort_by = sort_by_list[0]

        if facets is not None:
            warnings.warn("Whoosh does not handle faceting.", Warning, stacklevel=2)

        if date_facets is not None:
            warnings.warn("Whoosh does not handle date faceting.", Warning, stacklevel=2)

        if query_facets is not None:
            warnings.warn("Whoosh does not handle query faceting.", Warning, stacklevel=2)

        narrowed_results = None
        self.index = self.index.refresh()

        if limit_to_registered_models is None:
            limit_to_registered_models = getattr(settings, 'HAYSTACK_LIMIT_TO_REGISTERED_MODELS', True)

        if models and len(models):
            model_choices = sorted(get_model_ct(model) for model in models)
        elif limit_to_registered_models:
            # Using narrow queries, limit the results to only models handled
            # with the current routers.
            model_choices = self.build_models_list()
        else:
            model_choices = []

        if len(model_choices) > 0:
            if narrow_queries is None:
                narrow_queries = set()

            narrow_queries.add(' OR '.join(['%s:%s' % (DJANGO_CT, rm) for rm in model_choices]))

        narrow_searcher = None

        if narrow_queries is not None:
            # Potentially expensive? I don't see another way to do it in Whoosh...
            narrow_searcher = self.index.searcher()

            for nq in narrow_queries:
                recent_narrowed_results = narrow_searcher.search(self.parser.parse(force_text(nq)),
                                                                 limit=None)

                if len(recent_narrowed_results) <= 0:
                    return {
                        'results': [],
                        'hits': 0,
                    }

                if narrowed_results:
                    narrowed_results.filter(recent_narrowed_results)
                else:
                    narrowed_results = recent_narrowed_results

        self.index = self.index.refresh()

        if self.index.doc_count():
            searcher = self.index.searcher()
            parsed_query = self.parser.parse(query_string)

            # In the event of an invalid/stopworded query, recover gracefully.
            if parsed_query is None:
                return {
                    'results': [],
                    'hits': 0,
                }

            page_num, page_length = self.calculate_page(start_offset, end_offset)

            search_kwargs = {
                'pagelen': page_length,
                'sortedby': sort_by,
                'reverse': reverse,
            }

            # Handle the case where the results have been narrowed.
            if narrowed_results is not None:
                search_kwargs['filter'] = narrowed_results

            try:
                raw_page = searcher.search_page(
                    parsed_query,
                    page_num,
                    **search_kwargs
                )
            except ValueError:
                if not self.silently_fail:
                    raise

                return {
                    'results': [],
                    'hits': 0,
                    'spelling_suggestion': None,
                }

            # Because as of Whoosh 2.5.1, it will return the wrong page of
            # results if you request something too high. :(
            if raw_page.pagenum < page_num:
                return {
                    'results': [],
                    'hits': 0,
                    'spelling_suggestion': None,
                }

            results = self._process_results(raw_page, highlight=highlight, query_string=query_string, spelling_query=spelling_query, result_class=result_class)
            searcher.close()

            if hasattr(narrow_searcher, 'close'):
                narrow_searcher.close()

            return results
        else:
            if self.include_spelling:
                if spelling_query:
                    spelling_suggestion = self.create_spelling_suggestion(spelling_query)
                else:
                    spelling_suggestion = self.create_spelling_suggestion(query_string)
            else:
                spelling_suggestion = None

            return {
                'results': [],
                'hits': 0,
                'spelling_suggestion': spelling_suggestion,
            }

    def more_like_this(self, model_instance, additional_query_string=None,
                       start_offset=0, end_offset=None, models=None,
                       limit_to_registered_models=None, result_class=None, **kwargs):
        if not self.setup_complete:
            self.setup()

        # Deferred models will have a different class ("RealClass_Deferred_fieldname")
        # which won't be in our registry:
        model_klass = model_instance._meta.concrete_model

        field_name = self.content_field_name
        narrow_queries = set()
        narrowed_results = None
        self.index = self.index.refresh()

        if limit_to_registered_models is None:
            limit_to_registered_models = getattr(settings, 'HAYSTACK_LIMIT_TO_REGISTERED_MODELS', True)

        if models and len(models):
            model_choices = sorted(get_model_ct(model) for model in models)
        elif limit_to_registered_models:
            # Using narrow queries, limit the results to only models handled
            # with the current routers.
            model_choices = self.build_models_list()
        else:
            model_choices = []

        if len(model_choices) > 0:
            if narrow_queries is None:
                narrow_queries = set()

            narrow_queries.add(' OR '.join(['%s:%s' % (DJANGO_CT, rm) for rm in model_choices]))

        if additional_query_string and additional_query_string != '*':
            narrow_queries.add(additional_query_string)

        narrow_searcher = None

        if narrow_queries is not None:
            # Potentially expensive? I don't see another way to do it in Whoosh...
            narrow_searcher = self.index.searcher()

            for nq in narrow_queries:
                recent_narrowed_results = narrow_searcher.search(self.parser.parse(force_text(nq)),
                                                                 limit=None)

                if len(recent_narrowed_results) <= 0:
                    return {
                        'results': [],
                        'hits': 0,
                    }

                if narrowed_results:
                    narrowed_results.filter(recent_narrowed_results)
                else:
                    narrowed_results = recent_narrowed_results

        page_num, page_length = self.calculate_page(start_offset, end_offset)

        self.index = self.index.refresh()
        raw_results = EmptyResults()

        if self.index.doc_count():
            query = "%s:%s" % (ID, get_identifier(model_instance))
            searcher = self.index.searcher()
            parsed_query = self.parser.parse(query)
            results = searcher.search(parsed_query)

            if len(results):
                raw_results = results[0].more_like_this(field_name, top=end_offset)

        # Handle the case where the results have been narrowed.
        if narrowed_results is not None and hasattr(raw_results, 'filter'):
            raw_results.filter(narrowed_results)

        try:
            raw_page = ResultsPage(raw_results, page_num, page_length)
        except ValueError:
            if not self.silently_fail:
                raise

            return {
                'results': [],
                'hits': 0,
                'spelling_suggestion': None,
            }

        # Because as of Whoosh 2.5.1, it will return the wrong page of
        # results if you request something too high. :(
        if raw_page.pagenum < page_num:
            return {
                'results': [],
                'hits': 0,
                'spelling_suggestion': None,
            }

        results = self._process_results(raw_page, result_class=result_class)
        searcher.close()

        if hasattr(narrow_searcher, 'close'):
            narrow_searcher.close()

        return results

    def _process_results(self, raw_page, highlight=False, query_string='', spelling_query=None, result_class=None):
        from haystack import connections
        results = []

        # It's important to grab the hits first before slicing. Otherwise, this
        # can cause pagination failures.
        hits = len(raw_page)

        if result_class is None:
            result_class = SearchResult

        facets = {}
        spelling_suggestion = None
        unified_index = connections[self.connection_alias].get_unified_index()
        indexed_models = unified_index.get_indexed_models()

        for doc_offset, raw_result in enumerate(raw_page):
            score = raw_page.score(doc_offset) or 0
            app_label, model_name = raw_result[DJANGO_CT].split('.')
            additional_fields = {}
            model = haystack_get_model(app_label, model_name)

            if model and model in indexed_models:
                for key, value in raw_result.items():
                    index = unified_index.get_index(model)
                    string_key = str(key)

                    if string_key in index.fields and hasattr(index.fields[string_key], 'convert'):
                        # Special-cased due to the nature of KEYWORD fields.
                        if index.fields[string_key].is_multivalued:
                            if value is None or len(value) == 0:
                                additional_fields[string_key] = []
                            else:
                                additional_fields[string_key] = value.split(',')
                        else:
                            additional_fields[string_key] = index.fields[string_key].convert(value)
                    else:
                        additional_fields[string_key] = self._to_python(value)

                del additional_fields[DJANGO_CT]
                del additional_fields[DJANGO_ID]

                if highlight:
                    sa = StemmingAnalyzer()
                    formatter = WhooshHtmlFormatter('em')
                    terms = [token.text for token in sa(query_string)]

                    whoosh_result = whoosh_highlight(
                        additional_fields.get(self.content_field_name),
                        terms,
                        sa,
                        ContextFragmenter(),
                        formatter
                    )
                    additional_fields['highlighted'] = {
                        self.content_field_name: [whoosh_result],
                    }

                result = result_class(app_label, model_name, raw_result[DJANGO_ID], score, **additional_fields)
                results.append(result)
            else:
                hits -= 1

        if self.include_spelling:
            if spelling_query:
                spelling_suggestion = self.create_spelling_suggestion(spelling_query)
            else:
                spelling_suggestion = self.create_spelling_suggestion(query_string)

        return {
            'results': results,
            'hits': hits,
            'facets': facets,
            'spelling_suggestion': spelling_suggestion,
        }

    def create_spelling_suggestion(self, query_string):
        spelling_suggestion = None
        reader = self.index.reader()
        corrector = reader.corrector(self.content_field_name)
        cleaned_query = force_text(query_string)

        if not query_string:
            return spelling_suggestion

        # Clean the string.
        for rev_word in self.RESERVED_WORDS:
            cleaned_query = cleaned_query.replace(rev_word, '')

        for rev_char in self.RESERVED_CHARACTERS:
            cleaned_query = cleaned_query.replace(rev_char, '')

        # Break it down.
        query_words = cleaned_query.split()
        suggested_words = []

        for word in query_words:
            suggestions = corrector.suggest(word, limit=1)

            if len(suggestions) > 0:
                suggested_words.append(suggestions[0])

        spelling_suggestion = ' '.join(suggested_words)
        return spelling_suggestion
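The per-word suggest-and-rejoin approach above can be illustrated without a Whoosh index. This hedged sketch swaps Whoosh's `reader.corrector(...).suggest(word, limit=1)` for the stdlib's `difflib.get_close_matches`, purely to show the shape of the loop; the vocabulary is made up:

```python
import difflib

# Illustrative stand-in for the terms actually stored in the index.
VOCAB = ['search', 'index', 'whoosh', 'backend']

def spelling_suggestion(query_string):
    # Suggest the closest known word for each query word, then rejoin,
    # mirroring create_spelling_suggestion().
    suggested = []
    for word in query_string.split():
        matches = difflib.get_close_matches(word, VOCAB, n=1)
        if matches:
            suggested.append(matches[0])
    return ' '.join(suggested)

print(spelling_suggestion('serch indxe'))  # → search index
```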

    def _from_python(self, value):
        """
        Converts Python values to a string for Whoosh.

        Code courtesy of pysolr.
        """
        if hasattr(value, 'strftime'):
            if not hasattr(value, 'hour'):
                value = datetime(value.year, value.month, value.day, 0, 0, 0)
        elif isinstance(value, bool):
            if value:
                value = 'true'
            else:
                value = 'false'
        elif isinstance(value, (list, tuple)):
            value = u','.join([force_text(v) for v in value])
        elif isinstance(value, (six.integer_types, float)):
            # Leave it alone.
            pass
        else:
            value = force_text(value)
        return value

    def _to_python(self, value):
        """
        Converts values from Whoosh to native Python values.

        A port of the same method in pysolr, as they deal with data the same way.
        """
        if value == 'true':
            return True
        elif value == 'false':
            return False

        if value and isinstance(value, six.string_types):
            possible_datetime = DATETIME_REGEX.search(value)

            if possible_datetime:
                date_values = possible_datetime.groupdict()

                for dk, dv in date_values.items():
                    date_values[dk] = int(dv)

                return datetime(date_values['year'], date_values['month'], date_values['day'], date_values['hour'], date_values['minute'], date_values['second'])

        try:
            # Attempt to use json to load the values.
            converted_value = json.loads(value)

            # Try to handle most built-in types.
            if isinstance(converted_value, (list, tuple, set, dict, six.integer_types, float, complex)):
                return converted_value
        except Exception:
            # If it fails (SyntaxError or its ilk) or we don't trust it,
            # continue on.
            pass

        return value
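The `_from_python`/`_to_python` pair above defines a small serialization contract (booleans as `'true'`/`'false'`, lists as comma-joined strings, datetimes via a regex, everything else through JSON). A stdlib-only mimic of those rules, with illustrative function names that are not part of the backend:

```python
import json
import re
from datetime import date, datetime

DATETIME_REGEX = re.compile(
    r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})(T|\s+)'
    r'(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}).*?$')

def from_python(value):
    # Mirror the backend's coercions on the way into the index.
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        return 'true' if value else 'false'
    if isinstance(value, (list, tuple)):
        return ','.join(str(v) for v in value)
    if isinstance(value, (int, float)):
        return value  # numbers are left alone
    if isinstance(value, date) and not isinstance(value, datetime):
        return datetime(value.year, value.month, value.day)  # pad dates to midnight
    return str(value)

def to_python(value):
    # Mirror the backend's coercions on the way back out.
    if value == 'true':
        return True
    if value == 'false':
        return False
    if isinstance(value, str):
        m = DATETIME_REGEX.search(value)
        if m:
            return datetime(**{k: int(v) for k, v in m.groupdict().items()})
    try:
        converted = json.loads(value)
        if isinstance(converted, (list, dict, int, float)):
            return converted
    except Exception:
        pass
    return value

print(to_python(from_python(True)))      # → True
print(to_python('2015-06-01 12:30:00'))  # → 2015-06-01 12:30:00
```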


class WhooshSearchQuery(BaseSearchQuery):
    def _convert_datetime(self, date):
        if hasattr(date, 'hour'):
            return force_text(date.strftime('%Y%m%d%H%M%S'))
        else:
            return force_text(date.strftime('%Y%m%d000000'))

    def clean(self, query_fragment):
        """
        Provides a mechanism for sanitizing user input before presenting the
        value to the backend.

        Whoosh 1.X differs here in that you can no longer use a backslash
        to escape reserved characters. Instead, the whole word should be
        quoted.
        """
        words = query_fragment.split()
        cleaned_words = []

        for word in words:
            if word in self.backend.RESERVED_WORDS:
                word = word.lower()

            for char in self.backend.RESERVED_CHARACTERS:
                if char in word:
                    word = "'%s'" % word
                    break

            cleaned_words.append(word)

        return ' '.join(cleaned_words)
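The quoting strategy in `clean()` can be exercised standalone. A sketch with abbreviated reserved-word/character lists (the real lists live on the backend class; these values are illustrative):

```python
# Abbreviated stand-ins for the backend's RESERVED_WORDS / RESERVED_CHARACTERS.
RESERVED_WORDS = ('AND', 'NOT', 'OR', 'TO')
RESERVED_CHARACTERS = ('\\', '+', '-', '!', '(', ')', '{', '}',
                       '[', ']', '^', '"', '~', '*', '?', ':', '.')

def clean(query_fragment):
    # Lowercase reserved words; quote any word containing a reserved
    # character, since Whoosh no longer honors backslash escapes.
    cleaned_words = []
    for word in query_fragment.split():
        if word in RESERVED_WORDS:
            word = word.lower()
        for char in RESERVED_CHARACTERS:
            if char in word:
                word = "'%s'" % word
                break
        cleaned_words.append(word)
    return ' '.join(cleaned_words)

print(clean('hello AND goodbye'))  # → hello and goodbye
print(clean('what?'))              # → 'what?'
```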

    def build_query_fragment(self, field, filter_type, value):
        from haystack import connections
        query_frag = ''
        is_datetime = False

        if not hasattr(value, 'input_type_name'):
            # Handle when we've got a ``ValuesListQuerySet``...
            if hasattr(value, 'values_list'):
                value = list(value)

            if hasattr(value, 'strftime'):
                is_datetime = True

            if isinstance(value, six.string_types) and value != ' ':
                # It's not an ``InputType``. Assume ``Clean``.
                value = Clean(value)
            else:
                value = PythonData(value)

        # Prepare the query using the InputType.
        prepared_value = value.prepare(self)

        if not isinstance(prepared_value, (set, list, tuple)):
            # Then convert whatever we get back to what pysolr wants if needed.
            prepared_value = self.backend._from_python(prepared_value)

        # 'content' is a special reserved word, much like 'pk' in
        # Django's ORM layer. It indicates 'no special field'.
        if field == 'content':
            index_fieldname = ''
        else:
            index_fieldname = u'%s:' % connections[self._using].get_unified_index().get_index_fieldname(field)

        filter_types = {
            'contains': '%s',
            'startswith': "%s*",
            'exact': '%s',
            'gt': "{%s to}",
            'gte': "[%s to]",
            'lt': "{to %s}",
            'lte': "[to %s]",
        }

        if value.post_process is False:
            query_frag = prepared_value
        else:
            if filter_type in ['contains', 'startswith']:
                if value.input_type_name == 'exact':
                    query_frag = prepared_value
                else:
                    # Iterate over terms & incorporate the converted form of each into the query.
                    terms = []

                    if isinstance(prepared_value, six.string_types):
                        possible_values = prepared_value.split(' ')
                    else:
                        if is_datetime is True:
                            prepared_value = self._convert_datetime(prepared_value)

                        possible_values = [prepared_value]

                    for possible_value in possible_values:
                        terms.append(filter_types[filter_type] % self.backend._from_python(possible_value))

                    if len(terms) == 1:
                        query_frag = terms[0]
                    else:
                        query_frag = u"(%s)" % " AND ".join(terms)
            elif filter_type == 'in':
                in_options = []

                for possible_value in prepared_value:
                    is_datetime = False

                    if hasattr(possible_value, 'strftime'):
                        is_datetime = True

                    pv = self.backend._from_python(possible_value)

                    if is_datetime is True:
                        pv = self._convert_datetime(pv)

                    if isinstance(pv, six.string_types) and not is_datetime:
                        in_options.append('"%s"' % pv)
                    else:
                        in_options.append('%s' % pv)

                query_frag = "(%s)" % " OR ".join(in_options)
            elif filter_type == 'range':
                start = self.backend._from_python(prepared_value[0])
                end = self.backend._from_python(prepared_value[1])

                if hasattr(prepared_value[0], 'strftime'):
                    start = self._convert_datetime(start)

                if hasattr(prepared_value[1], 'strftime'):
                    end = self._convert_datetime(end)

                query_frag = u"[%s to %s]" % (start, end)
            elif filter_type == 'exact':
                if value.input_type_name == 'exact':
                    query_frag = prepared_value
                else:
                    prepared_value = Exact(prepared_value).prepare(self)
                    query_frag = filter_types[filter_type] % prepared_value
            else:
                if is_datetime is True:
                    prepared_value = self._convert_datetime(prepared_value)

                query_frag = filter_types[filter_type] % prepared_value

        if len(query_frag) and not isinstance(value, Raw):
            if not query_frag.startswith('(') and not query_frag.endswith(')'):
                query_frag = "(%s)" % query_frag

        return u"%s%s" % (index_fieldname, query_frag)
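The `filter_types` templates above map Django-style lookups to Whoosh query syntax. A minimal sketch of how the simple single-term cases render, using the same templates and parenthesizing rule (the `fragment` helper is illustrative, not the backend's API):

```python
filter_types = {
    'contains': '%s',
    'startswith': '%s*',
    'exact': '%s',
    'gt': '{%s to}',
    'gte': '[%s to]',
    'lt': '{to %s}',
    'lte': '[to %s]',
}

def fragment(field, filter_type, value):
    # Render the lookup template, then wrap as build_query_fragment does
    # when the fragment neither starts with '(' nor ends with ')'.
    frag = filter_types[filter_type] % value
    if not frag.startswith('(') and not frag.endswith(')'):
        frag = '(%s)' % frag
    return '%s:%s' % (field, frag)

print(fragment('title', 'startswith', 'hay'))         # → title:(hay*)
print(fragment('pub_date', 'gte', '20150601000000'))  # → pub_date:([20150601000000 to])
```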


#        if not filter_type in ('in', 'range'):
#            # 'in' is a bit of a special case, as we don't want to
#            # convert a valid list/tuple to string. Defer handling it
#            # until later...
#            value = self.backend._from_python(value)


class WhooshEngine(BaseEngine):
    backend = WhooshSearchBackend
    query = WhooshSearchQuery

@ -0,0 +1,33 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

from django.conf import settings

DEFAULT_ALIAS = 'default'

# Reserved field names
ID = getattr(settings, 'HAYSTACK_ID_FIELD', 'id')
DJANGO_CT = getattr(settings, 'HAYSTACK_DJANGO_CT_FIELD', 'django_ct')
DJANGO_ID = getattr(settings, 'HAYSTACK_DJANGO_ID_FIELD', 'django_id')

# Default operator. Valid options are AND/OR.
DEFAULT_OPERATOR = getattr(settings, 'HAYSTACK_DEFAULT_OPERATOR', 'AND')

# Valid expression extensions.
VALID_FILTERS = set(['contains', 'exact', 'gt', 'gte', 'lt', 'lte', 'in', 'startswith', 'range'])
FILTER_SEPARATOR = '__'

# The maximum number of items to display in a SearchQuerySet.__repr__
REPR_OUTPUT_SIZE = 20

# Number of SearchResults to load at a time.
ITERATOR_LOAD_PER_QUERY = getattr(settings, 'HAYSTACK_ITERATOR_LOAD_PER_QUERY', 10)


# A marker class in the hierarchy to indicate that it handles search data.
class Indexable(object):
    haystack_use_for_indexing = True


# For the geo bits, since that's what Solr & Elasticsearch seem to silently
# assume...
WGS_84_SRID = 4326
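Each constant above uses the `getattr(settings, NAME, default)` idiom, so any of them can be overridden per-project from Django settings while falling back to a built-in default. A self-contained sketch of the same pattern, with a stand-in for `django.conf.settings`:

```python
class Settings(object):
    # Stand-in for django.conf.settings; only attributes that are
    # actually set act as overrides.
    pass

settings = Settings()
settings.HAYSTACK_ID_FIELD = 'my_id'  # project-level override

# Overridden constant picks up the setting; the other keeps its default.
ID = getattr(settings, 'HAYSTACK_ID_FIELD', 'id')
DJANGO_CT = getattr(settings, 'HAYSTACK_DJANGO_CT_FIELD', 'django_ct')

print(ID)         # → my_id
print(DJANGO_CT)  # → django_ct
```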

@ -0,0 +1,53 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals


class HaystackError(Exception):
    """A generic exception for all others to extend."""
    pass


class SearchBackendError(HaystackError):
    """Raised when a backend can not be found."""
    pass


class SearchFieldError(HaystackError):
    """Raised when a field encounters an error."""
    pass


class MissingDependency(HaystackError):
    """Raised when a library a backend depends on can not be found."""
    pass


class NotHandled(HaystackError):
    """Raised when a model is not handled by the router setup."""
    pass


class MoreLikeThisError(HaystackError):
    """Raised when a model instance has not been provided for More Like This."""
    pass


class FacetingError(HaystackError):
    """Raised when incorrect arguments have been provided for faceting."""
    pass


class SpatialError(HaystackError):
    """Raised when incorrect arguments have been provided for spatial."""
    pass


class StatsError(HaystackError):
    """Raised when incorrect arguments have been provided for stats."""
    pass


class SkipDocument(HaystackError):
    """Raised when a document should be skipped while updating."""
    pass

@ -0,0 +1,441 @@
# encoding: utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

import re

from django.template import Context, loader
from django.utils import datetime_safe, six

from haystack.exceptions import SearchFieldError
from haystack.utils import get_model_ct_tuple


class NOT_PROVIDED:
    pass


DATETIME_REGEX = re.compile(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})(T|\s+)(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}).*?$')
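`DATETIME_REGEX` accepts either a `T` or whitespace between the date and time parts, and its named groups feed directly into a `datetime(...)` call elsewhere in the codebase. A quick demonstration against an ISO-ish timestamp:

```python
import re

DATETIME_REGEX = re.compile(
    r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})(T|\s+)'
    r'(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}).*?$')

# Trailing fractional seconds / timezone markers are swallowed by the .*?$ tail.
match = DATETIME_REGEX.search('2015-06-01T14:30:59.000Z')
parts = {k: int(v) for k, v in match.groupdict().items()}
print(parts['year'], parts['hour'])  # → 2015 14
```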


# All the SearchFields variants.

class SearchField(object):
    """The base implementation of a search field."""
    field_type = None

    def __init__(self, model_attr=None, use_template=False, template_name=None,
                 document=False, indexed=True, stored=True, faceted=False,
                 default=NOT_PROVIDED, null=False, index_fieldname=None,
                 facet_class=None, boost=1.0, weight=None):
        # Track what the index thinks this field is called.
        self.instance_name = None
        self.model_attr = model_attr
        self.use_template = use_template
        self.template_name = template_name
        self.document = document
        self.indexed = indexed
        self.stored = stored
        self.faceted = faceted
        self._default = default
        self.null = null
        self.index_fieldname = index_fieldname
        self.boost = weight or boost
        self.is_multivalued = False

        # We supply the facet_class for making it easy to create a faceted
        # field based off of this field.
        self.facet_class = facet_class

        if self.facet_class is None:
            self.facet_class = FacetCharField

        self.set_instance_name(None)

    def set_instance_name(self, instance_name):
        self.instance_name = instance_name

        if self.index_fieldname is None:
            self.index_fieldname = self.instance_name

    def has_default(self):
        """Returns a boolean of whether this field has a default value."""
        return self._default is not NOT_PROVIDED

    @property
    def default(self):
        """Returns the default value for the field."""
        if callable(self._default):
            return self._default()

        return self._default

    def prepare(self, obj):
        """
        Takes data from the provided object and prepares it for storage in the
        index.
        """
        # Give priority to a template.
        if self.use_template:
            return self.prepare_template(obj)
        elif self.model_attr is not None:
            # Check for `__` in the field for looking through the relation.
            attrs = self.model_attr.split('__')
            current_object = obj

            for attr in attrs:
                if not hasattr(current_object, attr):
                    raise SearchFieldError("The model '%s' does not have a model_attr '%s'." % (repr(current_object), attr))

                current_object = getattr(current_object, attr, None)

                if current_object is None:
                    if self.has_default():
                        current_object = self._default
                        # Fall out of the loop, given any further attempts at
                        # accesses will fail miserably.
                        break
                    elif self.null:
                        current_object = None
                        # Fall out of the loop, given any further attempts at
                        # accesses will fail miserably.
                        break
                    else:
                        raise SearchFieldError("The model '%s' combined with model_attr '%s' returned None, but doesn't allow a default or null value." % (repr(obj), self.model_attr))

            if callable(current_object):
                return current_object()

            return current_object

        if self.has_default():
            return self.default
        else:
            return None
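The `model_attr` walk in `prepare()` splits on `__` and follows each attribute in turn, calling the final value if it is callable. A standalone sketch with hypothetical `Author`/`Book` objects (none of these names exist in haystack):

```python
class Author(object):
    def __init__(self, name):
        self.name = name

    def display(self):
        return self.name.title()

class Book(object):
    def __init__(self, author):
        self.author = author

def resolve(obj, model_attr):
    # Walk 'a__b__c' through attributes, as SearchField.prepare() does,
    # calling the final value if it is callable.
    current = obj
    for attr in model_attr.split('__'):
        if not hasattr(current, attr):
            raise AttributeError("no model_attr %r on %r" % (attr, current))
        current = getattr(current, attr)
    return current() if callable(current) else current

book = Book(Author('jane doe'))
print(resolve(book, 'author__name'))     # → jane doe
print(resolve(book, 'author__display'))  # → Jane Doe
```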

    def prepare_template(self, obj):
        """
        Flattens an object for indexing.

        This loads a template
        (``search/indexes/{app_label}/{model_name}_{field_name}.txt``) and
        returns the result of rendering that template. ``object`` will be in
        its context.
        """
        if self.instance_name is None and self.template_name is None:
            raise SearchFieldError("This field requires either its instance_name variable to be populated or an explicit template_name in order to load the correct template.")

        if self.template_name is not None:
            template_names = self.template_name

            if not isinstance(template_names, (list, tuple)):
                template_names = [template_names]
        else:
            app_label, model_name = get_model_ct_tuple(obj)
            template_names = ['search/indexes/%s/%s_%s.txt' % (app_label, model_name, self.instance_name)]

        t = loader.select_template(template_names)
        return t.render(Context({'object': obj}))

    def convert(self, value):
        """
        Handles conversion between the data found and the type of the field.

        Extending classes should override this method and provide correct
        data coercion.
        """
        return value


class CharField(SearchField):
    field_type = 'string'

    def __init__(self, **kwargs):
        if kwargs.get('facet_class') is None:
            kwargs['facet_class'] = FacetCharField

        super(CharField, self).__init__(**kwargs)

    def prepare(self, obj):
        return self.convert(super(CharField, self).prepare(obj))

    def convert(self, value):
        if value is None:
            return None

        return six.text_type(value)


class LocationField(SearchField):
    field_type = 'location'

    def prepare(self, obj):
        from haystack.utils.geo import ensure_point

        value = super(LocationField, self).prepare(obj)

        if value is None:
            return None

        pnt = ensure_point(value)
        pnt_lng, pnt_lat = pnt.get_coords()
        return "%s,%s" % (pnt_lat, pnt_lng)

    def convert(self, value):
        from haystack.utils.geo import ensure_point, Point

        if value is None:
            return None

        if hasattr(value, 'geom_type'):
            value = ensure_point(value)
            return value

        if isinstance(value, six.string_types):
            lat, lng = value.split(',')
        elif isinstance(value, (list, tuple)):
            # GeoJSON-alike
            lat, lng = value[1], value[0]
        elif isinstance(value, dict):
            lat = value.get('lat', 0)
            lng = value.get('lon', 0)

        value = Point(float(lng), float(lat))
        return value
|
||||
|
||||
|
||||
class NgramField(CharField):
|
||||
field_type = 'ngram'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
if kwargs.get('faceted') is True:
|
||||
raise SearchFieldError("%s can not be faceted." % self.__class__.__name__)
|
||||
|
||||
super(NgramField, self).__init__(**kwargs)
|
||||
|
||||
|
||||
class EdgeNgramField(NgramField):
|
||||
field_type = 'edge_ngram'
|
||||
|
||||
|
||||
class IntegerField(SearchField):
|
||||
field_type = 'integer'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
if kwargs.get('facet_class') is None:
|
||||
kwargs['facet_class'] = FacetIntegerField
|
||||
|
||||
super(IntegerField, self).__init__(**kwargs)
|
||||
|
||||
def prepare(self, obj):
|
||||
return self.convert(super(IntegerField, self).prepare(obj))
|
||||
|
||||
def convert(self, value):
|
||||
if value is None:
|
||||
return None
|
||||
|
||||
return int(value)
|
||||
|
||||
|
||||
class FloatField(SearchField):
|
||||
field_type = 'float'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
if kwargs.get('facet_class') is None:
|
||||
kwargs['facet_class'] = FacetFloatField
|
||||
|
||||
super(FloatField, self).__init__(**kwargs)
|
||||
|
||||
def prepare(self, obj):
|
||||
return self.convert(super(FloatField, self).prepare(obj))
|
||||
|
||||
def convert(self, value):
|
||||
if value is None:
|
||||
return None
|
||||
|
||||
return float(value)
|
||||
|
||||
|
||||
class DecimalField(SearchField):
|
||||
field_type = 'string'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
if kwargs.get('facet_class') is None:
|
||||
kwargs['facet_class'] = FacetDecimalField
|
||||
|
||||
super(DecimalField, self).__init__(**kwargs)
|
||||
|
||||
def prepare(self, obj):
|
||||
return self.convert(super(DecimalField, self).prepare(obj))
|
||||
|
||||
def convert(self, value):
|
||||
if value is None:
|
||||
return None
|
||||
|
||||
return six.text_type(value)
|
||||
|
||||
|
||||
class BooleanField(SearchField):
|
||||
field_type = 'boolean'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
if kwargs.get('facet_class') is None:
|
||||
kwargs['facet_class'] = FacetBooleanField
|
||||
|
||||
super(BooleanField, self).__init__(**kwargs)
|
||||
|
||||
def prepare(self, obj):
|
||||
return self.convert(super(BooleanField, self).prepare(obj))
|
||||
|
||||
def convert(self, value):
|
||||
if value is None:
|
||||
return None
|
||||
|
||||
return bool(value)
|
||||
|
||||
|
||||
class DateField(SearchField):
|
||||
field_type = 'date'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
if kwargs.get('facet_class') is None:
|
||||
kwargs['facet_class'] = FacetDateField
|
||||
|
||||
super(DateField, self).__init__(**kwargs)
|
||||
|
||||
def convert(self, value):
|
||||
if value is None:
|
||||
return None
|
||||
|
||||
if isinstance(value, six.string_types):
|
||||
match = DATETIME_REGEX.search(value)
|
||||
|
||||
if match:
|
||||
data = match.groupdict()
|
||||
return datetime_safe.date(int(data['year']), int(data['month']), int(data['day']))
|
||||
else:
|
||||
raise SearchFieldError("Date provided to '%s' field doesn't appear to be a valid date string: '%s'" % (self.instance_name, value))
|
||||
|
||||
return value
|
||||
|
||||
|
||||
class DateTimeField(SearchField):
|
||||
field_type = 'datetime'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
if kwargs.get('facet_class') is None:
|
||||
kwargs['facet_class'] = FacetDateTimeField
|
||||
|
||||
super(DateTimeField, self).__init__(**kwargs)
|
||||
|
||||
def convert(self, value):
|
||||
if value is None:
|
||||
return None
|
||||
|
||||
if isinstance(value, six.string_types):
|
||||
match = DATETIME_REGEX.search(value)
|
||||
|
||||
if match:
|
||||
data = match.groupdict()
|
||||
return datetime_safe.datetime(int(data['year']), int(data['month']), int(data['day']), int(data['hour']), int(data['minute']), int(data['second']))
|
||||
else:
|
||||
raise SearchFieldError("Datetime provided to '%s' field doesn't appear to be a valid datetime string: '%s'" % (self.instance_name, value))
|
||||
|
||||
return value
|
||||
|
||||
|
||||
class MultiValueField(SearchField):
|
||||
field_type = 'string'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
if kwargs.get('facet_class') is None:
|
||||
kwargs['facet_class'] = FacetMultiValueField
|
||||
|
||||
if kwargs.get('use_template') is True:
|
||||
raise SearchFieldError("'%s' fields can not use templates to prepare their data." % self.__class__.__name__)
|
||||
|
||||
super(MultiValueField, self).__init__(**kwargs)
|
||||
self.is_multivalued = True
|
||||
|
||||
def prepare(self, obj):
|
||||
return self.convert(super(MultiValueField, self).prepare(obj))
|
||||
|
||||
def convert(self, value):
|
||||
if value is None:
|
||||
return None
|
||||
|
||||
return list(value)
|
||||
|
||||
|
||||
class FacetField(SearchField):
|
||||
"""
|
||||
``FacetField`` is slightly different than the other fields because it can
|
||||
work in conjunction with other fields as its data source.
|
||||
|
||||
Accepts an optional ``facet_for`` kwarg, which should be the field name
|
||||
(not ``index_fieldname``) of the field it should pull data from.
|
||||
"""
|
||||
instance_name = None
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
handled_kwargs = self.handle_facet_parameters(kwargs)
|
||||
super(FacetField, self).__init__(**handled_kwargs)
|
||||
|
||||
def handle_facet_parameters(self, kwargs):
|
||||
if kwargs.get('faceted', False):
|
||||
raise SearchFieldError("FacetField (%s) does not accept the 'faceted' argument." % self.instance_name)
|
||||
|
||||
if not kwargs.get('null', True):
|
||||
raise SearchFieldError("FacetField (%s) does not accept False for the 'null' argument." % self.instance_name)
|
||||
|
||||
if not kwargs.get('indexed', True):
|
||||
raise SearchFieldError("FacetField (%s) does not accept False for the 'indexed' argument." % self.instance_name)
|
||||
|
||||
if kwargs.get('facet_class'):
|
||||
raise SearchFieldError("FacetField (%s) does not accept the 'facet_class' argument." % self.instance_name)
|
||||
|
||||
self.facet_for = None
|
||||
self.facet_class = None
|
||||
|
||||
# Make sure the field is nullable.
|
||||
kwargs['null'] = True
|
||||
|
||||
if 'facet_for' in kwargs:
|
||||
self.facet_for = kwargs['facet_for']
|
||||
del(kwargs['facet_for'])
|
||||
|
||||
return kwargs
|
||||
|
||||
def get_facet_for_name(self):
|
||||
return self.facet_for or self.instance_name
|
||||
|
||||
|
||||
class FacetCharField(FacetField, CharField):
|
||||
pass
|
||||
|
||||
|
||||
class FacetIntegerField(FacetField, IntegerField):
|
||||
pass
|
||||
|
||||
|
||||
class FacetFloatField(FacetField, FloatField):
|
||||
pass
|
||||
|
||||
|
||||
class FacetDecimalField(FacetField, DecimalField):
|
||||
pass
|
||||
|
||||
|
||||
class FacetBooleanField(FacetField, BooleanField):
|
||||
pass
|
||||
|
||||
|
||||
class FacetDateField(FacetField, DateField):
|
||||
pass
|
||||
|
||||
|
||||
class FacetDateTimeField(FacetField, DateTimeField):
|
||||
pass
|
||||
|
||||
|
||||
class FacetMultiValueField(FacetField, MultiValueField):
|
||||
pass
|
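The ``convert`` methods above share one contract: pass ``None`` through untouched, coerce everything else to the field's Python type. The sketch below mimics ``DateField.convert``'s string-parsing branch in plain Python; ``convert_date``, the regex literal, and ``ValueError`` (standing in for ``SearchFieldError``) are illustrative names, not the module's own.

```python
import datetime
import re

# Illustrative stand-in for the module-level DATETIME_REGEX defined earlier
# in fields.py; the exact upstream pattern may differ.
DATETIME_REGEX = re.compile(
    r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
    r'(T|\s+)(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})'
)


def convert_date(instance_name, value):
    # Mirrors DateField.convert: None passes through, strings are parsed,
    # anything else (e.g. an actual date object) is returned unchanged.
    if value is None:
        return None

    if isinstance(value, str):
        match = DATETIME_REGEX.search(value)

        if match:
            data = match.groupdict()
            return datetime.date(int(data['year']), int(data['month']), int(data['day']))
        raise ValueError("Date provided to '%s' field doesn't appear to be "
                         "a valid date string: '%s'" % (instance_name, value))

    return value


print(convert_date('pub_date', '2010-01-31T04:19:00'))
```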
@ -0,0 +1,133 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

from django import forms
from django.db import models
from django.utils.text import capfirst
from django.utils.translation import ugettext_lazy as _

from haystack import connections
from haystack.constants import DEFAULT_ALIAS
from haystack.query import EmptySearchQuerySet, SearchQuerySet
from haystack.utils import get_model_ct

try:
    from django.utils.encoding import smart_text
except ImportError:
    from django.utils.encoding import smart_unicode as smart_text


def model_choices(using=DEFAULT_ALIAS):
    choices = [(get_model_ct(m), capfirst(smart_text(m._meta.verbose_name_plural)))
               for m in connections[using].get_unified_index().get_indexed_models()]
    return sorted(choices, key=lambda x: x[1])


class SearchForm(forms.Form):
    q = forms.CharField(required=False, label=_('Search'),
                        widget=forms.TextInput(attrs={'type': 'search'}))

    def __init__(self, *args, **kwargs):
        self.searchqueryset = kwargs.pop('searchqueryset', None)
        self.load_all = kwargs.pop('load_all', False)

        if self.searchqueryset is None:
            self.searchqueryset = SearchQuerySet()

        super(SearchForm, self).__init__(*args, **kwargs)

    def no_query_found(self):
        """
        Determines the behavior when no query was found.

        By default, no results are returned (``EmptySearchQuerySet``).

        Should you want to show all results, override this method in your
        own ``SearchForm`` subclass and do ``return self.searchqueryset.all()``.
        """
        return EmptySearchQuerySet()

    def search(self):
        if not self.is_valid():
            return self.no_query_found()

        if not self.cleaned_data.get('q'):
            return self.no_query_found()

        sqs = self.searchqueryset.auto_query(self.cleaned_data['q'])

        if self.load_all:
            sqs = sqs.load_all()

        return sqs

    def get_suggestion(self):
        if not self.is_valid():
            return None

        return self.searchqueryset.spelling_suggestion(self.cleaned_data['q'])


class HighlightedSearchForm(SearchForm):
    def search(self):
        return super(HighlightedSearchForm, self).search().highlight()


class FacetedSearchForm(SearchForm):
    def __init__(self, *args, **kwargs):
        self.selected_facets = kwargs.pop("selected_facets", [])
        super(FacetedSearchForm, self).__init__(*args, **kwargs)

    def search(self):
        sqs = super(FacetedSearchForm, self).search()

        # We need to process each facet to ensure that the field name and the
        # value are quoted correctly and separately:
        for facet in self.selected_facets:
            if ":" not in facet:
                continue

            field, value = facet.split(":", 1)

            if value:
                sqs = sqs.narrow(u'%s:"%s"' % (field, sqs.query.clean(value)))

        return sqs


class ModelSearchForm(SearchForm):
    def __init__(self, *args, **kwargs):
        super(ModelSearchForm, self).__init__(*args, **kwargs)
        self.fields['models'] = forms.MultipleChoiceField(choices=model_choices(), required=False, label=_('Search In'), widget=forms.CheckboxSelectMultiple)

    def get_models(self):
        """Return an alphabetical list of model classes in the index."""
        search_models = []

        if self.is_valid():
            for model in self.cleaned_data['models']:
                search_models.append(models.get_model(*model.split('.')))

        return search_models

    def search(self):
        sqs = super(ModelSearchForm, self).search()
        return sqs.models(*self.get_models())


class HighlightedModelSearchForm(ModelSearchForm):
    def search(self):
        return super(HighlightedModelSearchForm, self).search().highlight()


class FacetedModelSearchForm(ModelSearchForm):
    selected_facets = forms.CharField(required=False, widget=forms.HiddenInput)

    def search(self):
        sqs = super(FacetedModelSearchForm, self).search()

        if hasattr(self, 'cleaned_data') and self.cleaned_data['selected_facets']:
            sqs = sqs.narrow(self.cleaned_data['selected_facets'])

        return sqs.models(*self.get_models())
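The facet-narrowing loop in ``FacetedSearchForm.search`` can be exercised without a search backend. The sketch below reproduces only the string handling: ``narrow_expressions`` and its ``clean`` default are hypothetical stand-ins for ``sqs.narrow`` and ``sqs.query.clean``.

```python
def narrow_expressions(selected_facets, clean=lambda v: v.replace('"', '')):
    # Mirrors FacetedSearchForm.search: each "field:value" pair becomes one
    # narrow expression with only the value quoted. Entries without a colon
    # or with an empty value are skipped, exactly as in the form.
    expressions = []
    for facet in selected_facets:
        if ":" not in facet:
            continue

        field, value = facet.split(":", 1)

        if value:
            expressions.append(u'%s:"%s"' % (field, clean(value)))
    return expressions


print(narrow_expressions(['author:daniel', 'broken', 'tags:']))
```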
@ -0,0 +1,126 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

from django.conf import settings
from django.core.paginator import Paginator
from django.views.generic import FormView
from django.views.generic.edit import FormMixin
from django.views.generic.list import MultipleObjectMixin

from .forms import FacetedSearchForm, ModelSearchForm
from .query import SearchQuerySet

RESULTS_PER_PAGE = getattr(settings, 'HAYSTACK_SEARCH_RESULTS_PER_PAGE', 20)


class SearchMixin(MultipleObjectMixin, FormMixin):
    """
    A mixin that adds Haystack's search functionality to another view class.

    This mixin exhibits similar end functionality to the base Haystack search
    view, but with some important distinctions oriented around greater
    compatibility with Django's built-in class-based views and mixins.

    Normal flow:

        self.request = request

        self.form = self.build_form()
        self.query = self.get_query()
        self.results = self.get_results()

        return self.create_response()

    This mixin should:

    1. Make the form
    2. Get the queryset
    3. Return the paginated queryset

    """
    template_name = 'search/search.html'
    load_all = True
    form_class = ModelSearchForm
    queryset = SearchQuerySet()
    context_object_name = None
    paginate_by = RESULTS_PER_PAGE
    paginate_orphans = 0
    paginator_class = Paginator
    page_kwarg = 'page'
    form_name = 'form'
    search_field = 'q'
    object_list = None

    def get_form_kwargs(self):
        """
        Returns the keyword arguments for instantiating the form.
        """
        kwargs = {'initial': self.get_initial()}
        if self.request.method == 'GET':
            kwargs.update({
                'data': self.request.GET,
            })
        kwargs.update({'searchqueryset': self.get_queryset()})
        return kwargs

    def form_invalid(self, form):
        context = self.get_context_data(**{
            self.form_name: form,
            'object_list': self.get_queryset()
        })
        return self.render_to_response(context)

    def form_valid(self, form):
        self.queryset = form.search()
        context = self.get_context_data(**{
            self.form_name: form,
            'query': form.cleaned_data.get(self.search_field),
            'object_list': self.queryset
        })
        return self.render_to_response(context)


class FacetedSearchMixin(SearchMixin):
    """
    A mixin that adds Haystack search functionality with faceting.
    """
    form_class = FacetedSearchForm

    def get_form_kwargs(self):
        # Call the immediate parent so SearchMixin.get_form_kwargs (which
        # supplies 'data' and 'searchqueryset') is not skipped.
        kwargs = super(FacetedSearchMixin, self).get_form_kwargs()
        kwargs.update({
            'selected_facets': self.request.GET.getlist("selected_facets")
        })
        return kwargs

    def get_context_data(self, **kwargs):
        context = super(FacetedSearchMixin, self).get_context_data(**kwargs)
        # ``form_valid`` stores the search results on ``self.queryset``.
        context.update({'facets': self.queryset.facet_counts()})
        return context


class SearchView(SearchMixin, FormView):
    """A view class for searching a Haystack managed search index."""

    def get(self, request, *args, **kwargs):
        """
        Handles GET requests and instantiates a blank version of the form.
        """
        form_class = self.get_form_class()
        form = self.get_form(form_class)

        if form.is_valid():
            return self.form_valid(form)
        else:
            return self.form_invalid(form)


class FacetedSearchView(FacetedSearchMixin, SearchView):
    """
    A view class for searching a Haystack managed search index with
    facets.
    """
    pass
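``SearchView.get`` reduces to: build a form from ``request.GET``, then branch on validity. A backend-free sketch of that dispatch, where ``DummyForm`` and ``get`` are hypothetical stand-ins rather than Django or Haystack API:

```python
class DummyForm(object):
    # Hypothetical minimal form: valid only when a non-empty 'q' is supplied,
    # mimicking SearchForm's behavior of bailing out on an empty query.
    def __init__(self, data):
        self.data = data
        self.cleaned_data = {}

    def is_valid(self):
        q = self.data.get('q', '')
        self.cleaned_data = {'q': q}
        return bool(q)


def get(request_get):
    # Mirrors SearchView.get: instantiate the form from GET data, then
    # branch to the valid/invalid handler.
    form = DummyForm(request_get)

    if form.is_valid():
        return ('valid', form.cleaned_data['q'])
    return ('invalid', None)


print(get({'q': 'django'}))
```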
@ -0,0 +1,497 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

import copy
import threading
import warnings

from django.core.exceptions import ImproperlyConfigured
from django.utils.six import with_metaclass

from haystack import connection_router, connections
from haystack.constants import DEFAULT_ALIAS, DJANGO_CT, DJANGO_ID, ID, Indexable
from haystack.fields import *
from haystack.manager import SearchIndexManager
from haystack.utils import get_facet_field_name, get_identifier, get_model_ct

try:
    from django.utils.encoding import force_text
except ImportError:
    from django.utils.encoding import force_unicode as force_text


class DeclarativeMetaclass(type):
    def __new__(cls, name, bases, attrs):
        attrs['fields'] = {}

        # Inherit any fields from parent(s).
        try:
            parents = [b for b in bases if issubclass(b, SearchIndex)]
            # Simulate the MRO.
            parents.reverse()

            for p in parents:
                fields = getattr(p, 'fields', None)

                if fields:
                    attrs['fields'].update(fields)
        except NameError:
            pass

        # Build a dictionary of faceted fields for cross-referencing.
        facet_fields = {}

        for field_name, obj in attrs.items():
            # Only need to check the FacetFields.
            if hasattr(obj, 'facet_for'):
                if obj.facet_for not in facet_fields:
                    facet_fields[obj.facet_for] = []

                facet_fields[obj.facet_for].append(field_name)

        built_fields = {}

        for field_name, obj in attrs.items():
            if isinstance(obj, SearchField):
                field = attrs[field_name]
                field.set_instance_name(field_name)
                built_fields[field_name] = field

                # Only check non-faceted fields for the following info.
                if not hasattr(field, 'facet_for'):
                    if field.faceted is True:
                        # If no other field is claiming this field as
                        # ``facet_for``, create a shadow ``FacetField``.
                        if field_name not in facet_fields:
                            shadow_facet_name = get_facet_field_name(field_name)
                            shadow_facet_field = field.facet_class(facet_for=field_name)
                            shadow_facet_field.set_instance_name(shadow_facet_name)
                            built_fields[shadow_facet_name] = shadow_facet_field

        attrs['fields'].update(built_fields)

        # Assigning default 'objects' query manager if it does not already exist
        if 'objects' not in attrs:
            try:
                attrs['objects'] = SearchIndexManager(attrs['Meta'].index_label)
            except (KeyError, AttributeError):
                attrs['objects'] = SearchIndexManager(DEFAULT_ALIAS)

        return super(DeclarativeMetaclass, cls).__new__(cls, name, bases, attrs)
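A minimal sketch of what ``DeclarativeMetaclass`` does for the common case: collect field attributes into a class-level ``fields`` dict and tell each field the attribute name it was bound to. Parent-field inheritance and shadow ``FacetField`` creation are omitted, and ``Field`` / ``DeclarativeMeta`` / ``NoteIndex`` are illustrative names, not Haystack's.

```python
class Field(object):
    # Hypothetical minimal stand-in for SearchField.
    def set_instance_name(self, name):
        self.instance_name = name


class DeclarativeMeta(type):
    # Simplified sketch of DeclarativeMetaclass: gather Field attributes
    # into `fields`, keyed by attribute name.
    def __new__(cls, name, bases, attrs):
        fields = {}

        for attr_name, obj in list(attrs.items()):
            if isinstance(obj, Field):
                obj.set_instance_name(attr_name)
                fields[attr_name] = obj

        attrs['fields'] = fields
        return super(DeclarativeMeta, cls).__new__(cls, name, bases, attrs)


class NoteIndex(metaclass=DeclarativeMeta):
    text = Field()
    author = Field()


print(sorted(NoteIndex.fields))
```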
class SearchIndex(with_metaclass(DeclarativeMetaclass, threading.local)):
    """
    Base class for building indexes.

    An example might look like this::

        import datetime
        from haystack import indexes
        from myapp.models import Note

        class NoteIndex(indexes.SearchIndex, indexes.Indexable):
            text = indexes.CharField(document=True, use_template=True)
            author = indexes.CharField(model_attr='user')
            pub_date = indexes.DateTimeField(model_attr='pub_date')

            def get_model(self):
                return Note

            def index_queryset(self, using=None):
                return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())

    """
    def __init__(self):
        self.prepared_data = None
        content_fields = []

        self.field_map = dict()
        for field_name, field in self.fields.items():
            # Form the field map.
            self.field_map[field.index_fieldname] = field_name
            if field.document is True:
                content_fields.append(field_name)

        if len(content_fields) != 1:
            raise SearchFieldError("The index '%s' must have one (and only one) SearchField with document=True." % self.__class__.__name__)

    def get_model(self):
        """
        Should return the ``Model`` class (not an instance) that the rest of the
        ``SearchIndex`` should use.

        This method is required & you must override it to return the correct class.
        """
        raise NotImplementedError("You must provide a 'get_model' method for the '%r' index." % self)

    def index_queryset(self, using=None):
        """
        Get the default QuerySet to index when doing a full update.

        Subclasses can override this method to avoid indexing certain objects.
        """
        return self.get_model()._default_manager.all()

    def read_queryset(self, using=None):
        """
        Get the default QuerySet for read actions.

        Subclasses can override this method to work with other managers.
        Useful when working with default managers that filter some objects.
        """
        return self.index_queryset(using=using)

    def build_queryset(self, using=None, start_date=None, end_date=None):
        """
        Get the default QuerySet to index when doing an index update.

        Subclasses can override this method to take into account related
        model modification times.

        The default is to use ``SearchIndex.index_queryset`` and filter
        based on ``SearchIndex.get_updated_field``.
        """
        extra_lookup_kwargs = {}
        model = self.get_model()
        updated_field = self.get_updated_field()

        update_field_msg = ("No updated date field found for '%s' "
                            "- not restricting by age.") % model.__name__

        if start_date:
            if updated_field:
                extra_lookup_kwargs['%s__gte' % updated_field] = start_date
            else:
                warnings.warn(update_field_msg)

        if end_date:
            if updated_field:
                extra_lookup_kwargs['%s__lte' % updated_field] = end_date
            else:
                warnings.warn(update_field_msg)

        index_qs = None

        if hasattr(self, 'get_queryset'):
            warnings.warn("'SearchIndex.get_queryset' was deprecated in Haystack v2. Please rename the method 'index_queryset'.")
            index_qs = self.get_queryset()
        else:
            index_qs = self.index_queryset(using=using)

        if not hasattr(index_qs, 'filter'):
            raise ImproperlyConfigured("The '%r' class must return a 'QuerySet' in the 'index_queryset' method." % self)

        # `.select_related()` seems like a good idea here but can fail on
        # nullable `ForeignKey` as well as what seems like other cases.
        return index_qs.filter(**extra_lookup_kwargs).order_by(model._meta.pk.name)

    def prepare(self, obj):
        """
        Fetches and adds/alters data before indexing.
        """
        self.prepared_data = {
            ID: get_identifier(obj),
            DJANGO_CT: get_model_ct(obj),
            DJANGO_ID: force_text(obj.pk),
        }

        for field_name, field in self.fields.items():
            # Use the possibly overridden name, which will default to the
            # variable name of the field.
            self.prepared_data[field.index_fieldname] = field.prepare(obj)

            if hasattr(self, "prepare_%s" % field_name):
                value = getattr(self, "prepare_%s" % field_name)(obj)
                self.prepared_data[field.index_fieldname] = value

        return self.prepared_data

    def full_prepare(self, obj):
        self.prepared_data = self.prepare(obj)

        for field_name, field in self.fields.items():
            # Duplicate data for faceted fields.
            if getattr(field, 'facet_for', None):
                source_field_name = self.fields[field.facet_for].index_fieldname

                # If there's data there, leave it alone. Otherwise, populate it
                # with whatever the related field has.
                if self.prepared_data[field_name] is None and source_field_name in self.prepared_data:
                    self.prepared_data[field.index_fieldname] = self.prepared_data[source_field_name]

            # Remove any fields that lack a value and are ``null=True``.
            if field.null is True:
                if self.prepared_data[field.index_fieldname] is None:
                    del self.prepared_data[field.index_fieldname]

        return self.prepared_data

    def get_content_field(self):
        """Returns the field that supplies the primary document to be indexed."""
        for field_name, field in self.fields.items():
            if field.document is True:
                return field.index_fieldname

    def get_field_weights(self):
        """Returns a dict of fields with weight values."""
        weights = {}
        for field_name, field in self.fields.items():
            if field.boost:
                weights[field_name] = field.boost
        return weights

    def _get_backend(self, using):
        if using is None:
            try:
                using = connection_router.for_write(index=self)[0]
            except IndexError:
                # There's no backend to handle it. Bomb out.
                return None

        return connections[using].get_backend()

    def update(self, using=None):
        """
        Updates the entire index.

        If ``using`` is provided, it specifies which connection should be
        used. Default relies on the routers to decide which backend should
        be used.
        """
        backend = self._get_backend(using)

        if backend is not None:
            backend.update(self, self.index_queryset(using=using))

    def update_object(self, instance, using=None, **kwargs):
        """
        Update the index for a single object. Attached to the class's
        post-save hook.

        If ``using`` is provided, it specifies which connection should be
        used. Default relies on the routers to decide which backend should
        be used.
        """
        # Check to make sure we want to index this first.
        if self.should_update(instance, **kwargs):
            backend = self._get_backend(using)

            if backend is not None:
                backend.update(self, [instance])

    def remove_object(self, instance, using=None, **kwargs):
        """
        Remove an object from the index. Attached to the class's
        post-delete hook.

        If ``using`` is provided, it specifies which connection should be
        used. Default relies on the routers to decide which backend should
        be used.
        """
        backend = self._get_backend(using)

        if backend is not None:
            backend.remove(instance, **kwargs)

    def clear(self, using=None):
        """
        Clears the entire index.

        If ``using`` is provided, it specifies which connection should be
        used. Default relies on the routers to decide which backend should
        be used.
        """
        backend = self._get_backend(using)

        if backend is not None:
            backend.clear(models=[self.get_model()])

    def reindex(self, using=None):
        """
        Completely clear the index for this model and rebuild it.

        If ``using`` is provided, it specifies which connection should be
        used. Default relies on the routers to decide which backend should
        be used.
        """
        self.clear(using=using)
        self.update(using=using)

    def get_updated_field(self):
        """
        Get the field name that represents the updated date for the model.

        If specified, this is used by the reindex command to filter out results
        from the QuerySet, enabling you to reindex only recent records. This
        method should either return None (reindex everything always) or a
        string of the Model's DateField/DateTimeField name.
        """
        return None

    def should_update(self, instance, **kwargs):
        """
        Determine if an object should be updated in the index.

        It's useful to override this when an object may save frequently and
        cause excessive reindexing. You should check conditions on the instance
        and return False if it is not to be indexed.

        By default, returns True (always reindex).
        """
        return True

    def load_all_queryset(self):
        """
        Provides the ability to override how objects get loaded in conjunction
        with ``SearchQuerySet.load_all``.

        This is useful for post-processing the results from the query, enabling
        things like adding ``select_related`` or filtering certain data.

        By default, returns ``all()`` on the model's default manager.
        """
        return self.get_model()._default_manager.all()
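The ``prepare``/``prepare_FOO`` convention above can be shown without Django: every field is prepared from the object, then a ``prepare_<field>`` method, if defined on the index, overrides that value. ``TinySearchIndex`` and ``Note`` below are hypothetical stand-ins, not Haystack classes.

```python
class TinySearchIndex(object):
    # Backend-free sketch of SearchIndex.prepare's override convention.
    fields = ('title', 'author')

    def prepare(self, obj):
        prepared = {}

        for field_name in self.fields:
            # Default preparation: read the attribute off the object.
            prepared[field_name] = getattr(obj, field_name, None)

            # A prepare_<field> method, when present, wins.
            if hasattr(self, 'prepare_%s' % field_name):
                prepared[field_name] = getattr(self, 'prepare_%s' % field_name)(obj)

        return prepared

    def prepare_author(self, obj):
        return obj.author.upper()


class Note(object):
    title = 'First post'
    author = 'daniel'


print(TinySearchIndex().prepare(Note()))
```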
class BasicSearchIndex(SearchIndex):
    text = CharField(document=True, use_template=True)


# End SearchIndexes
# Begin ModelSearchIndexes


def index_field_from_django_field(f, default=CharField):
    """
    Returns the Haystack field type that would likely be associated with each
    Django type.
    """
    result = default

    if f.get_internal_type() in ('DateField', 'DateTimeField'):
        result = DateTimeField
    elif f.get_internal_type() in ('BooleanField', 'NullBooleanField'):
        result = BooleanField
    elif f.get_internal_type() in ('CommaSeparatedIntegerField',):
        result = MultiValueField
    elif f.get_internal_type() in ('DecimalField', 'FloatField'):
        result = FloatField
    elif f.get_internal_type() in ('IntegerField', 'PositiveIntegerField', 'PositiveSmallIntegerField', 'SmallIntegerField'):
        result = IntegerField

    return result
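``index_field_from_django_field`` dispatches purely on ``Field.get_internal_type()``. A standalone sketch of that dispatch with the Haystack classes replaced by their names; ``FakeDjangoField`` and ``pick_index_field`` are illustrative, not part of either library.

```python
class FakeDjangoField(object):
    # Hypothetical stand-in for a Django model field; the dispatch only
    # needs get_internal_type().
    def __init__(self, internal_type):
        self._internal_type = internal_type

    def get_internal_type(self):
        return self._internal_type


def pick_index_field(f):
    # Same shape as index_field_from_django_field, returning class names
    # instead of field classes.
    internal = f.get_internal_type()

    if internal in ('DateField', 'DateTimeField'):
        return 'DateTimeField'
    if internal in ('BooleanField', 'NullBooleanField'):
        return 'BooleanField'
    if internal in ('CommaSeparatedIntegerField',):
        return 'MultiValueField'
    if internal in ('DecimalField', 'FloatField'):
        return 'FloatField'
    if internal in ('IntegerField', 'PositiveIntegerField',
                    'PositiveSmallIntegerField', 'SmallIntegerField'):
        return 'IntegerField'
    return 'CharField'


print(pick_index_field(FakeDjangoField('DateField')))
```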
class ModelSearchIndex(SearchIndex):
    """
    Introspects the model assigned to it and generates a `SearchIndex` based on
    the fields of that model.

    In addition, it adds a `text` field that is the `document=True` field and
    has `use_template=True` option set, just like the `BasicSearchIndex`.

    Usage of this class might result in inferior `SearchIndex` objects, which
    can directly affect your search results. Use this to establish basic
    functionality and move to custom `SearchIndex` objects for better control.

    At this time, it does not handle related fields.
    """
    text = CharField(document=True, use_template=True)
    # List of reserved field names.
    fields_to_skip = (ID, DJANGO_CT, DJANGO_ID, 'content', 'text')

    def __init__(self, extra_field_kwargs=None):
        self.model = None

        self.prepared_data = None
        content_fields = []
        self.extra_field_kwargs = extra_field_kwargs or {}

        # Introspect the model, adding/removing fields as needed.
        # Adds/Excludes should happen only if the fields are not already
        # defined in `self.fields`.
        self._meta = getattr(self, 'Meta', None)

        if self._meta:
            self.model = getattr(self._meta, 'model', None)
            fields = getattr(self._meta, 'fields', [])
            excludes = getattr(self._meta, 'excludes', [])

            # Add in the new fields.
            self.fields.update(self.get_fields(fields, excludes))

        for field_name, field in self.fields.items():
            if field.document is True:
                content_fields.append(field_name)

        if len(content_fields) != 1:
            raise SearchFieldError("The index '%s' must have one (and only one) SearchField with document=True." % self.__class__.__name__)

    def should_skip_field(self, field):
        """
        Given a Django model field, return if it should be included in the
        contributed SearchFields.
        """
        # Skip fields in the skip list.
        if field.name in self.fields_to_skip:
            return True

        # Ignore certain fields (AutoField, related fields).
        if field.primary_key or getattr(field, 'rel'):
            return True

        return False

    def get_model(self):
        return self.model

    def get_index_fieldname(self, f):
        """
        Given a Django field, return the appropriate index fieldname.
        """
        return f.name

    def get_fields(self, fields=None, excludes=None):
        """
        Given any explicit fields to include and fields to exclude, add
        additional fields based on the associated model.
        """
        final_fields = {}
        fields = fields or []
        excludes = excludes or []

        for f in self.model._meta.fields:
            # If the field name is already present, skip it.
            if f.name in self.fields:
                continue

            # If the field is not in the explicit field listing, skip it.
            if fields and f.name not in fields:
                continue

            # If the field is in the exclude list, skip it.
            if excludes and f.name in excludes:
                continue

            if self.should_skip_field(f):
                continue

            index_field_class = index_field_from_django_field(f)

            kwargs = copy.copy(self.extra_field_kwargs)
            kwargs.update({
                'model_attr': f.name,
            })

            if f.null is True:
                kwargs['null'] = True

            if f.has_default():
                kwargs['default'] = f.default

            final_fields[f.name] = index_field_class(**kwargs)
            final_fields[f.name].set_instance_name(self.get_index_fieldname(f))

        return final_fields
|
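The include/exclude filtering in `get_fields` above reduces to a chain of guards over the model's field names. A standalone sketch of that selection logic (plain Python with hypothetical field names, no Django required):

```python
# Sketch of the field-selection rules used by ModelSearchIndex.get_fields:
# a field is kept only if it is not already defined on the index, matches the
# explicit `fields` list (when one is given), is not excluded, and is not
# a reserved name.
RESERVED = ('id', 'django_ct', 'django_id', 'content', 'text')


def select_fields(model_fields, existing=(), fields=(), excludes=()):
    final = []
    for name in model_fields:
        if name in existing:                # already defined on the index
            continue
        if fields and name not in fields:   # not in explicit include list
            continue
        if excludes and name in excludes:   # explicitly excluded
            continue
        if name in RESERVED:                # reserved names are always skipped
            continue
        final.append(name)
    return final


print(select_fields(['title', 'body', 'text', 'created'], excludes=['created']))
# -> ['title', 'body']
```

The real method additionally skips primary keys and related fields via `should_skip_field`, which this sketch omits.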
@@ -0,0 +1,159 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

import re
import warnings

from django.utils.encoding import python_2_unicode_compatible

try:
    from django.utils.encoding import force_text
except ImportError:
    from django.utils.encoding import force_unicode as force_text


@python_2_unicode_compatible
class BaseInput(object):
    """
    The base input type. Doesn't do much. You want ``Raw`` instead.
    """
    input_type_name = 'base'
    post_process = True

    def __init__(self, query_string, **kwargs):
        self.query_string = query_string
        self.kwargs = kwargs

    def __repr__(self):
        return u"<%s '%s'>" % (self.__class__.__name__, force_text(self.query_string))

    def __str__(self):
        return force_text(self.query_string)

    def prepare(self, query_obj):
        return self.query_string


class Raw(BaseInput):
    """
    An input type for passing a query directly to the backend.

    Prone to not being very portable.
    """
    input_type_name = 'raw'
    post_process = False


class PythonData(BaseInput):
    """
    Represents a bare Python non-string type.

    Largely only for internal use.
    """
    input_type_name = 'python_data'


class Clean(BaseInput):
    """
    An input type for sanitizing user/untrusted input.
    """
    input_type_name = 'clean'

    def prepare(self, query_obj):
        query_string = super(Clean, self).prepare(query_obj)
        return query_obj.clean(query_string)


class Exact(BaseInput):
    """
    An input type for making exact matches.
    """
    input_type_name = 'exact'

    def prepare(self, query_obj):
        query_string = super(Exact, self).prepare(query_obj)

        if self.kwargs.get('clean', False):
            # We need to clean each part of the exact match.
            exact_bits = [Clean(bit).prepare(query_obj) for bit in query_string.split(' ') if bit]
            query_string = u' '.join(exact_bits)

        return query_obj.build_exact_query(query_string)


class Not(Clean):
    """
    An input type for negating a query.
    """
    input_type_name = 'not'

    def prepare(self, query_obj):
        query_string = super(Not, self).prepare(query_obj)
        return query_obj.build_not_query(query_string)


class AutoQuery(BaseInput):
    """
    A convenience class that handles common user queries.

    In addition to cleaning all tokens, it handles double-quoted bits as
    exact matches & terms with '-' in front as NOT queries.
    """
    input_type_name = 'auto_query'
    post_process = False
    exact_match_re = re.compile(r'"(?P<phrase>.*?)"')

    def prepare(self, query_obj):
        query_string = super(AutoQuery, self).prepare(query_obj)
        exacts = self.exact_match_re.findall(query_string)
        tokens = []
        query_bits = []

        for rough_token in self.exact_match_re.split(query_string):
            if not rough_token:
                continue
            elif rough_token not in exacts:
                # We have something that's not an exact match but may have
                # more than one word in it.
                tokens.extend(rough_token.split(' '))
            else:
                tokens.append(rough_token)

        for token in tokens:
            if not token:
                continue
            if token in exacts:
                query_bits.append(Exact(token, clean=True).prepare(query_obj))
            elif token.startswith('-') and len(token) > 1:
                # This might break Xapian. Check on this.
                query_bits.append(Not(token[1:]).prepare(query_obj))
            else:
                query_bits.append(Clean(token).prepare(query_obj))

        return u' '.join(query_bits)


class AltParser(BaseInput):
    """
    If the engine supports it, this input type allows for submitting a query
    that uses a different parser.
    """
    input_type_name = 'alt_parser'
    post_process = False
    use_parens = False

    def __init__(self, parser_name, query_string='', **kwargs):
        self.parser_name = parser_name
        self.query_string = query_string
        self.kwargs = kwargs

    def __repr__(self):
        return u"<%s '%s' '%s' '%s'>" % (self.__class__.__name__, self.parser_name, self.query_string, self.kwargs)

    def prepare(self, query_obj):
        if not hasattr(query_obj, 'build_alt_parser_query'):
            warnings.warn("Use of the 'AltParser' input type is being ignored, as the '%s' backend doesn't support it." % query_obj)
            return ''

        return query_obj.build_alt_parser_query(self.parser_name, self.query_string, **self.kwargs)
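The `AutoQuery.prepare` loop above can be illustrated outside of Haystack: split the query on double-quoted phrases, then classify each token as an exact phrase, a NOT term (leading '-'), or a plain term. A minimal standalone sketch, where the `EXACT(...)`/`NOT(...)` markers are illustrative stand-ins for the backend calls, not any backend's actual syntax:

```python
import re

exact_match_re = re.compile(r'"(?P<phrase>.*?)"')


def auto_query(query_string):
    # Phrases inside double quotes are collected as exact matches.
    exacts = exact_match_re.findall(query_string)
    tokens = []

    for rough_token in exact_match_re.split(query_string):
        if not rough_token:
            continue
        elif rough_token not in exacts:
            # Not an exact match; may contain several words.
            tokens.extend(rough_token.split(' '))
        else:
            tokens.append(rough_token)

    bits = []
    for token in tokens:
        if not token:
            continue
        if token in exacts:
            bits.append('EXACT(%s)' % token)    # would call Exact(...).prepare()
        elif token.startswith('-') and len(token) > 1:
            bits.append('NOT(%s)' % token[1:])  # would call Not(...).prepare()
        else:
            bits.append(token)                  # would call Clean(...).prepare()
    return ' '.join(bits)


print(auto_query('panda "giant panda" -zoo'))
# -> panda EXACT(giant panda) NOT(zoo)
```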
@@ -0,0 +1,70 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

import sys
from optparse import make_option

from django.core.exceptions import ImproperlyConfigured
from django.core.management.base import BaseCommand
from django.template import Context, loader

from haystack import constants
from haystack.backends.solr_backend import SolrSearchBackend


class Command(BaseCommand):
    help = "Generates a Solr schema that reflects the indexes."
    base_options = (
        make_option("-f", "--filename", action="store", type="string", dest="filename",
                    help='If provided, directs output to a file instead of stdout.'),
        make_option("-u", "--using", action="store", type="string", dest="using", default=constants.DEFAULT_ALIAS,
                    help='If provided, chooses a connection to work with.'),
    )
    option_list = BaseCommand.option_list + base_options

    def handle(self, **options):
        """Generates a Solr schema that reflects the indexes."""
        using = options.get('using')
        schema_xml = self.build_template(using=using)

        if options.get('filename'):
            self.write_file(options.get('filename'), schema_xml)
        else:
            self.print_stdout(schema_xml)

    def build_context(self, using):
        from haystack import connections
        backend = connections[using].get_backend()

        if not isinstance(backend, SolrSearchBackend):
            raise ImproperlyConfigured("'%s' isn't configured as a SolrEngine." % backend.connection_alias)

        content_field_name, fields = backend.build_schema(connections[using].get_unified_index().all_searchfields())
        return Context({
            'content_field_name': content_field_name,
            'fields': fields,
            'default_operator': constants.DEFAULT_OPERATOR,
            'ID': constants.ID,
            'DJANGO_CT': constants.DJANGO_CT,
            'DJANGO_ID': constants.DJANGO_ID,
        })

    def build_template(self, using):
        t = loader.get_template('search_configuration/solr.xml')
        c = self.build_context(using=using)
        return t.render(c)

    def print_stdout(self, schema_xml):
        sys.stderr.write("\n")
        sys.stderr.write("\n")
        sys.stderr.write("\n")
        sys.stderr.write("Save the following output to 'schema.xml' and place it in your Solr configuration directory.\n")
        sys.stderr.write("--------------------------------------------------------------------------------------------\n")
        sys.stderr.write("\n")
        print(schema_xml)

    def write_file(self, filename, schema_xml):
        with open(filename, 'w') as schema_file:
            schema_file.write(schema_xml)
@@ -0,0 +1,59 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

import sys
from optparse import make_option

from django.core.management.base import BaseCommand
from django.utils import six


class Command(BaseCommand):
    help = "Clears out the search index completely."
    base_options = (
        make_option('--noinput', action='store_false', dest='interactive', default=True,
                    help='If provided, no prompts will be issued to the user and the data will be wiped out.'),
        make_option("-u", "--using", action="append", dest="using", default=[],
                    help='Update only the named backend (can be used multiple times). '
                         'By default all backends will be updated.'),
        make_option('--nocommit', action='store_false', dest='commit', default=True,
                    help='Will pass commit=False to the backend.'),
    )
    option_list = BaseCommand.option_list + base_options

    def handle(self, **options):
        """Clears out the search index completely."""
        from haystack import connections
        self.verbosity = int(options.get('verbosity', 1))
        self.commit = options.get('commit', True)

        using = options.get('using')
        if not using:
            using = connections.connections_info.keys()

        if options.get('interactive', True):
            print()
            print("WARNING: This will irreparably remove EVERYTHING from your search index in connection '%s'." % "', '".join(using))
            print("Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.")

            yes_or_no = six.moves.input("Are you sure you wish to continue? [y/N] ")
            print()

            if not yes_or_no.lower().startswith('y'):
                print("No action taken.")
                sys.exit()

        if self.verbosity >= 1:
            print("Removing all documents from your index because you said so.")

        for backend_name in using:
            backend = connections[backend_name].get_backend()
            backend.clear(commit=self.commit)

        if self.verbosity >= 1:
            print("All documents removed.")
@@ -0,0 +1,21 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

from django.core.management.base import NoArgsCommand


class Command(NoArgsCommand):
    help = "Provides feedback about the current Haystack setup."

    def handle_noargs(self, **options):
        """Provides feedback about the current Haystack setup."""
        from haystack import connections

        unified_index = connections['default'].get_unified_index()
        indexed = unified_index.get_indexed_models()
        index_count = len(indexed)
        print("Number of handled indexes: %s." % index_count)

        for index in indexed:
            print(" - Model: %s by Index: %s" % (index.__name__, unified_index.get_indexes()[index]))
@@ -0,0 +1,26 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

from django.core.management import call_command
from django.core.management.base import BaseCommand

from haystack.management.commands.clear_index import Command as ClearCommand
from haystack.management.commands.update_index import Command as UpdateCommand

__all__ = ['Command']

_combined_options = list(BaseCommand.option_list)
_combined_options.extend(option for option in UpdateCommand.base_options
                         if option.get_opt_string() not in [i.get_opt_string() for i in _combined_options])
_combined_options.extend(option for option in ClearCommand.base_options
                         if option.get_opt_string() not in [i.get_opt_string() for i in _combined_options])


class Command(BaseCommand):
    help = "Completely rebuilds the search index by removing the old data and then updating."
    option_list = _combined_options

    def handle(self, **options):
        call_command('clear_index', **options)
        call_command('update_index', **options)
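`rebuild_index` merges the option lists of `update_index` and `clear_index`, keeping only the first definition of each flag so shared options like `--using` and `--nocommit` are not duplicated. The dedup-by-key merge in isolation, with hypothetical `(opt_string, origin)` tuples standing in for optparse option objects:

```python
def merge_options(*option_lists):
    # Keep the first occurrence of each option string, in order, across lists.
    combined = []
    for options in option_lists:
        combined.extend(opt for opt in options
                        if opt[0] not in [c[0] for c in combined])
    return combined


base = [('--verbosity', 'base')]
update = [('--using', 'update'), ('--nocommit', 'update')]
clear = [('--noinput', 'clear'), ('--using', 'clear'), ('--nocommit', 'clear')]

print(merge_options(base, update, clear))
# clear's duplicate --using/--nocommit are dropped; its --noinput survives
```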
@@ -0,0 +1,289 @@
# encoding: utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

import logging
import os
import sys
import warnings
from datetime import timedelta
from optparse import make_option

try:
    from django.db import close_old_connections
except ImportError:
    # This can be removed when we drop support for Django 1.7 and earlier:
    from django.db import close_connection as close_old_connections

from django.core.management.base import LabelCommand
from django.db import reset_queries

from haystack import connections as haystack_connections
from haystack.query import SearchQuerySet
from haystack.utils.app_loading import haystack_get_models, haystack_load_apps

try:
    from django.utils.encoding import force_text
except ImportError:
    from django.utils.encoding import force_unicode as force_text

try:
    from django.utils.encoding import smart_bytes
except ImportError:
    from django.utils.encoding import smart_str as smart_bytes

try:
    from django.utils.timezone import now
except ImportError:
    from datetime import datetime
    now = datetime.now


DEFAULT_BATCH_SIZE = None
DEFAULT_AGE = None
APP = 'app'
MODEL = 'model'


def worker(bits):
    # We need to reset the connections, otherwise the different processes
    # will try to share the connection, which causes things to blow up.
    from django.db import connections

    for alias, info in connections.databases.items():
        # We need to also tread lightly with SQLite, because blindly wiping
        # out connections (via ``... = {}``) destroys in-memory DBs.
        if 'sqlite3' not in info['ENGINE']:
            try:
                close_old_connections()
                if isinstance(connections._connections, dict):
                    del connections._connections[alias]
                else:
                    delattr(connections._connections, alias)
            except KeyError:
                pass

    if bits[0] == 'do_update':
        func, model, start, end, total, using, start_date, end_date, verbosity, commit = bits
    elif bits[0] == 'do_remove':
        func, model, pks_seen, start, upper_bound, using, verbosity, commit = bits
    else:
        return

    unified_index = haystack_connections[using].get_unified_index()
    index = unified_index.get_index(model)
    backend = haystack_connections[using].get_backend()

    if func == 'do_update':
        qs = index.build_queryset(start_date=start_date, end_date=end_date)
        do_update(backend, index, qs, start, end, total, verbosity=verbosity, commit=commit)
    else:
        raise NotImplementedError('Unknown function %s' % func)


def do_update(backend, index, qs, start, end, total, verbosity=1, commit=True):
    # Get a clone of the QuerySet so that the cache doesn't bloat up
    # in memory. Useful when reindexing large amounts of data.
    small_cache_qs = qs.all()
    current_qs = small_cache_qs[start:end]

    if verbosity >= 2:
        if hasattr(os, 'getppid') and os.getpid() == os.getppid():
            print("  indexed %s - %d of %d." % (start + 1, end, total))
        else:
            print("  indexed %s - %d of %d (by %s)." % (start + 1, end, total, os.getpid()))

    # FIXME: Get the right backend.
    backend.update(index, current_qs, commit=commit)

    # Clear out the DB connections queries because it bloats up RAM.
    reset_queries()


class Command(LabelCommand):
    help = "Freshens the index for the given app(s)."
    base_options = (
        make_option('-a', '--age', action='store', dest='age',
                    default=DEFAULT_AGE, type='int',
                    help='Number of hours back to consider objects new.'),
        make_option('-s', '--start', action='store', dest='start_date',
                    default=None, type='string',
                    help='The start date for indexing within. Can be any dateutil-parsable string; YYYY-MM-DDTHH:MM:SS is recommended.'),
        make_option('-e', '--end', action='store', dest='end_date',
                    default=None, type='string',
                    help='The end date for indexing within. Can be any dateutil-parsable string; YYYY-MM-DDTHH:MM:SS is recommended.'),
        make_option('-b', '--batch-size', action='store', dest='batchsize',
                    default=None, type='int',
                    help='Number of items to index at once.'),
        make_option('-r', '--remove', action='store_true', dest='remove',
                    default=False,
                    help='Remove objects from the index that are no longer present in the database.'),
        make_option("-u", "--using", action="append", dest="using", default=[],
                    help='Update only the named backend (can be used multiple times). '
                         'By default all backends will be updated.'),
        make_option('-k', '--workers', action='store', dest='workers',
                    default=0, type='int',
                    help='Allows for the use of multiple workers to parallelize indexing. Requires multiprocessing.'),
        make_option('--nocommit', action='store_false', dest='commit',
                    default=True, help='Will pass commit=False to the backend.'),
    )
    option_list = LabelCommand.option_list + base_options

    def handle(self, *items, **options):
        self.verbosity = int(options.get('verbosity', 1))
        self.batchsize = options.get('batchsize', DEFAULT_BATCH_SIZE)
        self.start_date = None
        self.end_date = None
        self.remove = options.get('remove', False)
        self.workers = int(options.get('workers', 0))
        self.commit = options.get('commit', True)

        if sys.version_info < (2, 7):
            warnings.warn('multiprocessing is disabled on Python 2.6 and earlier. '
                          'See https://github.com/toastdriven/django-haystack/issues/1001')
            self.workers = 0

        self.backends = options.get('using')
        if not self.backends:
            self.backends = haystack_connections.connections_info.keys()

        age = options.get('age', DEFAULT_AGE)
        start_date = options.get('start_date')
        end_date = options.get('end_date')

        if age is not None:
            self.start_date = now() - timedelta(hours=int(age))

        if start_date is not None:
            from dateutil.parser import parse as dateutil_parse

            try:
                self.start_date = dateutil_parse(start_date)
            except ValueError:
                pass

        if end_date is not None:
            from dateutil.parser import parse as dateutil_parse

            try:
                self.end_date = dateutil_parse(end_date)
            except ValueError:
                pass

        if not items:
            items = haystack_load_apps()

        return super(Command, self).handle(*items, **options)

    def handle_label(self, label, **options):
        for using in self.backends:
            try:
                self.update_backend(label, using)
            except Exception:
                logging.exception("Error updating %s using %s", label, using)
                raise

    def update_backend(self, label, using):
        from haystack.exceptions import NotHandled

        backend = haystack_connections[using].get_backend()
        unified_index = haystack_connections[using].get_unified_index()

        if self.workers > 0:
            import multiprocessing

        for model in haystack_get_models(label):
            try:
                index = unified_index.get_index(model)
            except NotHandled:
                if self.verbosity >= 2:
                    print("Skipping '%s' - no index." % model)
                continue

            if self.workers > 0:
                # Workers resetting connections leads to references to models /
                # connections getting stale and having their connection
                # disconnected from under them. Resetting before the loop
                # continues and it accesses the ORM makes it better.
                close_old_connections()

            qs = index.build_queryset(using=using, start_date=self.start_date,
                                      end_date=self.end_date)

            total = qs.count()

            if self.verbosity >= 1:
                print(u"Indexing %d %s" % (total, force_text(model._meta.verbose_name_plural)))

            batch_size = self.batchsize or backend.batch_size

            if self.workers > 0:
                ghetto_queue = []

            for start in range(0, total, batch_size):
                end = min(start + batch_size, total)

                if self.workers == 0:
                    do_update(backend, index, qs, start, end, total, verbosity=self.verbosity, commit=self.commit)
                else:
                    ghetto_queue.append(('do_update', model, start, end, total, using, self.start_date, self.end_date, self.verbosity, self.commit))

            if self.workers > 0:
                pool = multiprocessing.Pool(self.workers)
                pool.map(worker, ghetto_queue)
                pool.close()
                pool.join()

            if self.remove:
                if self.start_date or self.end_date or total <= 0:
                    # They're using a reduced set, which may not incorporate
                    # all pks. Rebuild the list with everything.
                    qs = index.index_queryset().values_list('pk', flat=True)
                    database_pks = set(smart_bytes(pk) for pk in qs)

                    total = len(database_pks)
                else:
                    database_pks = set(smart_bytes(pk) for pk in qs.values_list('pk', flat=True))

                # Since records may still be in the search index but not the
                # local database, we'll use that to create batches for
                # processing.
                # See https://github.com/django-haystack/django-haystack/issues/1186
                index_total = SearchQuerySet(using=backend.connection_alias).models(model).count()

                # Retrieve PKs from the index. Note that this cannot be a
                # numeric range query because although pks are normally numeric
                # they can be non-numeric UUIDs or other custom values. To
                # reduce load on the search engine, we only retrieve the pk
                # field, which will be checked against the full list obtained
                # from the database, and the id field, which will be used to
                # delete the record should it be found to be stale.
                index_pks = SearchQuerySet(using=backend.connection_alias).models(model)
                index_pks = index_pks.values_list('pk', 'id')

                # We'll collect all of the record IDs which are no longer
                # present in the database and delete them after walking the
                # entire index. This uses more memory than the incremental
                # approach but avoids needing the pagination logic below to
                # account for both commit modes:
                stale_records = set()

                for start in range(0, index_total, batch_size):
                    upper_bound = start + batch_size

                    # If the database pk is no longer present, queue the index key for removal:
                    for pk, rec_id in index_pks[start:upper_bound]:
                        if smart_bytes(pk) not in database_pks:
                            stale_records.add(rec_id)

                if stale_records:
                    if self.verbosity >= 1:
                        print("  removing %d stale records." % len(stale_records))

                    for rec_id in stale_records:
                        # Since the PK was not in the database list, we'll
                        # delete the record from the search index:
                        if self.verbosity >= 2:
                            print("  removing %s." % rec_id)

                        backend.remove(rec_id, commit=self.commit)
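The batching loop in `update_backend` walks `range(0, total, batch_size)` and clamps each slice's end to `total`, so the final batch may be short. The slice arithmetic in isolation:

```python
def batch_bounds(total, batch_size):
    # Yield (start, end) pairs covering [0, total) in batch_size steps,
    # with the final batch clamped to the total count.
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


print(list(batch_bounds(10, 4)))
# -> [(0, 4), (4, 8), (8, 10)]  (the last batch holds only 2 items)
```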
@@ -0,0 +1,107 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

from haystack.query import EmptySearchQuerySet, SearchQuerySet


class SearchIndexManager(object):
    def __init__(self, using=None):
        super(SearchIndexManager, self).__init__()
        self.using = using

    def get_search_queryset(self):
        """
        Returns a new SearchQuerySet object. Subclasses can override this
        method to easily customize the behavior of the Manager.
        """
        return SearchQuerySet(using=self.using)

    def get_empty_query_set(self):
        return EmptySearchQuerySet(using=self.using)

    def all(self):
        return self.get_search_queryset()

    def none(self):
        return self.get_empty_query_set()

    def filter(self, *args, **kwargs):
        return self.get_search_queryset().filter(*args, **kwargs)

    def exclude(self, *args, **kwargs):
        return self.get_search_queryset().exclude(*args, **kwargs)

    def filter_and(self, *args, **kwargs):
        return self.get_search_queryset().filter_and(*args, **kwargs)

    def filter_or(self, *args, **kwargs):
        return self.get_search_queryset().filter_or(*args, **kwargs)

    def order_by(self, *args):
        return self.get_search_queryset().order_by(*args)

    def highlight(self):
        return self.get_search_queryset().highlight()

    def boost(self, term, boost):
        return self.get_search_queryset().boost(term, boost)

    def facet(self, field):
        return self.get_search_queryset().facet(field)

    def within(self, field, point_1, point_2):
        return self.get_search_queryset().within(field, point_1, point_2)

    def dwithin(self, field, point, distance):
        return self.get_search_queryset().dwithin(field, point, distance)

    def distance(self, field, point):
        return self.get_search_queryset().distance(field, point)

    def date_facet(self, field, start_date, end_date, gap_by, gap_amount=1):
        return self.get_search_queryset().date_facet(field, start_date, end_date, gap_by, gap_amount=gap_amount)

    def query_facet(self, field, query):
        return self.get_search_queryset().query_facet(field, query)

    def narrow(self, query):
        return self.get_search_queryset().narrow(query)

    def raw_search(self, query_string, **kwargs):
        return self.get_search_queryset().raw_search(query_string, **kwargs)

    def load_all(self):
        return self.get_search_queryset().load_all()

    def auto_query(self, query_string, fieldname='content'):
        return self.get_search_queryset().auto_query(query_string, fieldname=fieldname)

    def autocomplete(self, **kwargs):
        return self.get_search_queryset().autocomplete(**kwargs)

    def using(self, connection_name):
        return self.get_search_queryset().using(connection_name)

    def count(self):
        return self.get_search_queryset().count()

    def best_match(self):
        return self.get_search_queryset().best_match()

    def latest(self, date_field):
        return self.get_search_queryset().latest(date_field)

    def more_like_this(self, model_instance):
        return self.get_search_queryset().more_like_this(model_instance)

    def facet_counts(self):
        return self.get_search_queryset().facet_counts()

    def spelling_suggestion(self, preferred_query=None):
        return self.get_search_queryset().spelling_suggestion(preferred_query=preferred_query)

    def values(self, *fields):
        return self.get_search_queryset().values(*fields)

    def values_list(self, *fields, **kwargs):
        return self.get_search_queryset().values_list(*fields, **kwargs)
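`SearchIndexManager` is pure delegation: every method builds a fresh queryset via `get_search_queryset()` and forwards the call, so a subclass only overrides that one hook to change the behavior of every method at once. The pattern in miniature, with toy classes rather than Haystack's API:

```python
class QuerySet(object):
    def __init__(self, items):
        self.items = list(items)

    def filter(self, pred):
        return QuerySet(i for i in self.items if pred(i))

    def count(self):
        return len(self.items)


class Manager(object):
    def get_queryset(self):
        # Single hook: subclasses override this to change every method below.
        return QuerySet([1, 2, 3, 4])

    def filter(self, pred):
        return self.get_queryset().filter(pred)

    def count(self):
        return self.get_queryset().count()


class EvenManager(Manager):
    # Overriding only the hook narrows what filter() and count() operate on.
    def get_queryset(self):
        return super(EvenManager, self).get_queryset().filter(lambda i: i % 2 == 0)


print(Manager().count(), EvenManager().count())
# -> 4 2
```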
@@ -0,0 +1,247 @@
# encoding: utf-8

# "Hey, Django! Look at me, I'm an app! For Serious!"

from __future__ import absolute_import, division, print_function, unicode_literals

from django.conf import settings
from django.core.exceptions import ObjectDoesNotExist
from django.db import models
from django.utils import six
from django.utils.text import capfirst

from haystack.exceptions import NotHandled, SpatialError
from haystack.utils import log as logging

try:
    from django.utils.encoding import force_text
except ImportError:
    from django.utils.encoding import force_unicode as force_text

try:
    from geopy import distance as geopy_distance
except ImportError:
    geopy_distance = None


# Not a Django model, but tightly tied to them and there doesn't seem to be a
# better spot in the tree.
class SearchResult(object):
    """
    A single search result. The actual object is loaded lazily by accessing
    object; until then, this object only stores the model, pk, and score.

    Note that iterating over SearchResults and getting the object for each
    result will do O(N) database queries, which may not fit your needs for
    performance.
    """
    def __init__(self, app_label, model_name, pk, score, **kwargs):
        self.app_label, self.model_name = app_label, model_name
        self.pk = pk
        self.score = score
        self._object = None
        self._model = None
        self._verbose_name = None
        self._additional_fields = []
        self._point_of_origin = kwargs.pop('_point_of_origin', None)
        self._distance = kwargs.pop('_distance', None)
        self.stored_fields = None
        self.log = self._get_log()

        for key, value in kwargs.items():
            if key not in self.__dict__:
                self.__dict__[key] = value
                self._additional_fields.append(key)

    def _get_log(self):
        return logging.getLogger('haystack')

    def __repr__(self):
        return "<SearchResult: %s.%s (pk=%r)>" % (self.app_label, self.model_name, self.pk)

    def __unicode__(self):
        return force_text(self.__repr__())

    def __getattr__(self, attr):
        if attr == '__getnewargs__':
            raise AttributeError

        return self.__dict__.get(attr, None)

    def _get_searchindex(self):
        from haystack import connections
        return connections['default'].get_unified_index().get_index(self.model)

    searchindex = property(_get_searchindex)

    def _get_object(self):
        if self._object is None:
            if self.model is None:
                self.log.error("Model could not be found for SearchResult '%s'.", self)
                return None

            try:
                try:
                    self._object = self.searchindex.read_queryset().get(pk=self.pk)
                except NotHandled:
                    self.log.warning("Model '%s.%s' not handled by the routers.", self.app_label, self.model_name)
                    # Revert to the old behaviour.
                    self._object = self.model._default_manager.get(pk=self.pk)
            except ObjectDoesNotExist:
                self.log.error("Object could not be found in database for SearchResult '%s'.", self)
                self._object = None

        return self._object

    def _set_object(self, obj):
        self._object = obj

    object = property(_get_object, _set_object)

    def _get_model(self):
        if self._model is None:
            try:
                self._model = models.get_model(self.app_label, self.model_name)
            except LookupError:
                # Django 1.7 changed this to raise an error instead of
                # returning None when the model isn't found, so catch the
                # LookupError and keep self._model == None.
                pass

        return self._model

    def _set_model(self, obj):
        self._model = obj

    model = property(_get_model, _set_model)

    def _get_distance(self):
        from haystack.utils.geo import Distance

        if self._distance is None:
            # We didn't get it from the backend & we haven't tried calculating
            # it yet. Check if geopy is available to do it the "slow" way
            # (even though slow meant 100 distance calculations in 0.004
            # seconds in my testing).
            if geopy_distance is None:
                raise SpatialError("The backend doesn't have 'DISTANCE_AVAILABLE' enabled & the 'geopy' library could not be imported, so distance information is not available.")
|
||||
|
||||
if not self._point_of_origin:
|
||||
raise SpatialError("The original point is not available.")
|
||||
|
||||
if not hasattr(self, self._point_of_origin['field']):
|
||||
raise SpatialError("The field '%s' was not included in search results, so the distance could not be calculated." % self._point_of_origin['field'])
|
||||
|
||||
po_lng, po_lat = self._point_of_origin['point'].get_coords()
|
||||
location_field = getattr(self, self._point_of_origin['field'])
|
||||
|
||||
if location_field is None:
|
||||
return None
|
||||
|
||||
lf_lng, lf_lat = location_field.get_coords()
|
||||
self._distance = Distance(km=geopy_distance.distance((po_lat, po_lng), (lf_lat, lf_lng)).km)
|
||||
|
||||
# We've either already calculated it or the backend returned it, so
|
||||
# let's use that.
|
||||
return self._distance
|
||||
|
||||
def _set_distance(self, dist):
|
||||
self._distance = dist
|
||||
|
||||
distance = property(_get_distance, _set_distance)
|
||||
|
||||
def _get_verbose_name(self):
|
||||
if self.model is None:
|
||||
self.log.error("Model could not be found for SearchResult '%s'.", self)
|
||||
return u''
|
||||
|
||||
return force_text(capfirst(self.model._meta.verbose_name))
|
||||
|
||||
verbose_name = property(_get_verbose_name)
|
||||
|
||||
def _get_verbose_name_plural(self):
|
||||
if self.model is None:
|
||||
self.log.error("Model could not be found for SearchResult '%s'.", self)
|
||||
return u''
|
||||
|
||||
return force_text(capfirst(self.model._meta.verbose_name_plural))
|
||||
|
||||
verbose_name_plural = property(_get_verbose_name_plural)
|
||||
|
||||
def content_type(self):
|
||||
"""Returns the content type for the result's model instance."""
|
||||
if self.model is None:
|
||||
self.log.error("Model could not be found for SearchResult '%s'.", self)
|
||||
return u''
|
||||
|
||||
return six.text_type(self.model._meta)
|
||||
|
||||
def get_additional_fields(self):
|
||||
"""
|
||||
Returns a dictionary of all of the fields from the raw result.
|
||||
|
||||
Useful for serializing results. Only returns what was seen from the
|
||||
search engine, so it may have extra fields Haystack's indexes aren't
|
||||
aware of.
|
||||
"""
|
||||
additional_fields = {}
|
||||
|
||||
for fieldname in self._additional_fields:
|
||||
additional_fields[fieldname] = getattr(self, fieldname)
|
||||
|
||||
return additional_fields
|
||||
|
||||
def get_stored_fields(self):
|
||||
"""
|
||||
Returns a dictionary of all of the stored fields from the SearchIndex.
|
||||
|
||||
Useful for serializing results. Only returns the fields Haystack's
|
||||
indexes are aware of as being 'stored'.
|
||||
"""
|
||||
if self._stored_fields is None:
|
||||
from haystack import connections
|
||||
from haystack.exceptions import NotHandled
|
||||
|
||||
try:
|
||||
index = connections['default'].get_unified_index().get_index(self.model)
|
||||
except NotHandled:
|
||||
# Not found? Return nothing.
|
||||
return {}
|
||||
|
||||
self._stored_fields = {}
|
||||
|
||||
# Iterate through the index's fields, pulling out the fields that
|
||||
# are stored.
|
||||
for fieldname, field in index.fields.items():
|
||||
if field.stored is True:
|
||||
self._stored_fields[fieldname] = getattr(self, fieldname, u'')
|
||||
|
||||
return self._stored_fields
|
||||
|
||||
def __getstate__(self):
|
||||
"""
|
||||
Returns a dictionary representing the ``SearchResult`` in order to
|
||||
make it pickleable.
|
||||
"""
|
||||
# The ``log`` is excluded because, under the hood, ``logging`` uses
|
||||
# ``threading.Lock``, which doesn't pickle well.
|
||||
ret_dict = self.__dict__.copy()
|
||||
del(ret_dict['log'])
|
||||
return ret_dict
|
||||
|
||||
def __setstate__(self, data_dict):
|
||||
"""
|
||||
Updates the object's attributes according to data passed by pickle.
|
||||
"""
|
||||
self.__dict__.update(data_dict)
|
||||
self.log = self._get_log()
|
||||
|
||||
|
||||
def reload_indexes(sender, *args, **kwargs):
|
||||
from haystack import connections
|
||||
|
||||
for conn in connections.all():
|
||||
ui = conn.get_unified_index()
|
||||
# Note: Unlike above, we're resetting the ``UnifiedIndex`` here.
|
||||
# Thi gives us a clean slate.
|
||||
ui.reset()
|
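The ``SearchResult`` above defers the database hit until ``.object`` is first accessed and memoizes it in ``_object``, which is why iterating results and touching each object costs O(N) queries. A minimal standalone sketch of that lazy-loading pattern (the ``fake_db_get`` loader is hypothetical, standing in for the ORM lookup, and is not part of Haystack):

```python
class LazyResult(object):
    """Stores (pk, score) cheaply; loads the heavy object only on demand."""

    def __init__(self, pk, score, loader):
        self.pk = pk
        self.score = score
        self._loader = loader  # callable standing in for the ORM lookup
        self._object = None

    @property
    def object(self):
        if self._object is None:
            # One "database query" per result, performed lazily.
            self._object = self._loader(self.pk)
        return self._object


calls = []

def fake_db_get(pk):
    calls.append(pk)
    return {'pk': pk, 'title': 'doc-%d' % pk}

results = [LazyResult(pk, 1.0, fake_db_get) for pk in (1, 2, 3)]
assert calls == []                 # nothing loaded yet
first = results[0].object
assert calls == [1]                # only the accessed result hit the "database"
assert results[0].object is first  # memoized; no second lookup
```

This mirrors why ``SearchQuerySet.load_all()`` exists further down: it bulk-loads objects per model instead of paying one query per result.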
@ -0,0 +1,86 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

import datetime

from debug_toolbar.panels import DebugPanel
from django.template.loader import render_to_string
from django.utils import six
from django.utils.translation import ugettext_lazy as _

from haystack import connections


class HaystackDebugPanel(DebugPanel):
    """
    Panel that displays information about the Haystack queries run while
    processing the request.
    """
    name = 'Haystack'
    has_content = True

    def __init__(self, *args, **kwargs):
        super(self.__class__, self).__init__(*args, **kwargs)
        self._offset = dict((alias, len(connections[alias].queries)) for alias in connections.connections_info.keys())
        self._search_time = 0
        self._queries = []
        self._backends = {}

    def nav_title(self):
        return _('Haystack')

    def nav_subtitle(self):
        self._queries = []
        self._backends = {}

        for alias in connections.connections_info.keys():
            search_queries = connections[alias].queries[self._offset[alias]:]
            self._backends[alias] = {
                'time_spent': sum(float(q['time']) for q in search_queries),
                'queries': len(search_queries),
            }
            self._queries.extend([(alias, q) for q in search_queries])

        self._queries.sort(key=lambda x: x[1]['start'])
        # ``itervalues`` is Python 2-only; ``values`` works on both.
        self._search_time = sum(d['time_spent'] for d in self._backends.values())
        num_queries = len(self._queries)
        return "%d %s in %.2fms" % (
            num_queries,
            (num_queries == 1) and 'query' or 'queries',
            self._search_time
        )

    def title(self):
        return _('Search Queries')

    def url(self):
        return ''

    def content(self):
        width_ratio_tally = 0

        for alias, query in self._queries:
            query['alias'] = alias
            query['query'] = query['query_string']

            if query.get('additional_kwargs'):
                if query['additional_kwargs'].get('result_class'):
                    query['additional_kwargs']['result_class'] = six.text_type(query['additional_kwargs']['result_class'])

            try:
                query['width_ratio'] = (float(query['time']) / self._search_time) * 100
            except ZeroDivisionError:
                query['width_ratio'] = 0

            query['start_offset'] = width_ratio_tally
            width_ratio_tally += query['width_ratio']

        context = self.context.copy()
        context.update({
            'backends': sorted(self._backends.items(), key=lambda x: -x[1]['time_spent']),
            'queries': [q for a, q in self._queries],
            'sql_time': self._search_time,
        })

        return render_to_string('panels/haystack.html', context)
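The ``content`` method above renders each query as a bar on a timeline: ``width_ratio`` is the query's share of total search time as a percentage, and ``start_offset`` is the running sum of the widths before it. A self-contained sketch of just that tally loop (``timeline`` is an illustrative helper name, not part of the panel):

```python
def timeline(queries, total_time):
    """Compute (width_ratio, start_offset) percentages for a timeline bar,
    mirroring the tally loop in HaystackDebugPanel.content()."""
    tally = 0.0
    out = []

    for q in queries:
        try:
            width = (float(q['time']) / total_time) * 100
        except ZeroDivisionError:
            # No measurable total time: draw a zero-width bar.
            width = 0
        out.append({'width_ratio': width, 'start_offset': tally})
        tally += width

    return out


bars = timeline([{'time': '1.0'}, {'time': '3.0'}], total_time=4.0)
assert bars[0] == {'width_ratio': 25.0, 'start_offset': 0.0}
assert bars[1] == {'width_ratio': 75.0, 'start_offset': 25.0}
```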
@ -0,0 +1,841 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

import operator
import warnings

from django.utils import six

from haystack import connection_router, connections
from haystack.backends import SQ
from haystack.constants import DEFAULT_OPERATOR, ITERATOR_LOAD_PER_QUERY, REPR_OUTPUT_SIZE
from haystack.exceptions import NotHandled
from haystack.inputs import AutoQuery, Clean, Raw
from haystack.utils import log as logging


class SearchQuerySet(object):
    """
    Provides a way to specify search parameters and lazily load results.

    Supports chaining (a la QuerySet) to narrow the search.
    """
    def __init__(self, using=None, query=None):
        # ``_using`` should only ever be a value other than ``None`` if it's
        # been forced with the ``.using`` method.
        self._using = using
        self.query = None
        self._determine_backend()

        # If ``query`` is present, it should override even what the routers
        # think.
        if query is not None:
            self.query = query

        self._result_cache = []
        self._result_count = None
        self._cache_full = False
        self._load_all = False
        self._ignored_result_count = 0
        self.log = logging.getLogger('haystack')

    def _determine_backend(self):
        from haystack import connections

        # A backend has been manually selected. Use it instead.
        if self._using is not None:
            self.query = connections[self._using].get_query()
            return

        # No backend, so rely on the routers to figure out what's right.
        hints = {}

        if self.query:
            hints['models'] = self.query.models

        backend_alias = connection_router.for_read(**hints)

        if isinstance(backend_alias, (list, tuple)) and len(backend_alias):
            # We can only effectively read from one engine.
            backend_alias = backend_alias[0]

        # The ``SearchQuery`` might swap itself out for a different variant
        # here.
        if self.query:
            self.query = self.query.using(backend_alias)
        else:
            self.query = connections[backend_alias].get_query()

    def __getstate__(self):
        """
        For pickling.
        """
        len(self)
        obj_dict = self.__dict__.copy()
        obj_dict['_iter'] = None
        obj_dict['log'] = None
        return obj_dict

    def __setstate__(self, data_dict):
        """
        For unpickling.
        """
        self.__dict__ = data_dict
        self.log = logging.getLogger('haystack')

    def __repr__(self):
        data = list(self[:REPR_OUTPUT_SIZE])

        if len(self) > REPR_OUTPUT_SIZE:
            data[-1] = "...(remaining elements truncated)..."

        return repr(data)

    def __len__(self):
        if not self._result_count:
            self._result_count = self.query.get_count()

            # Some backends give weird, false-y values here. Convert to zero.
            if not self._result_count:
                self._result_count = 0

        # This needs to return the actual number of hits, not what's in the cache.
        return self._result_count - self._ignored_result_count

    def __iter__(self):
        if self._cache_is_full():
            # We've got a fully populated cache. Let Python do the hard work.
            return iter(self._result_cache)

        return self._manual_iter()

    def __and__(self, other):
        if isinstance(other, EmptySearchQuerySet):
            return other._clone()

        combined = self._clone()
        combined.query.combine(other.query, SQ.AND)
        return combined

    def __or__(self, other):
        combined = self._clone()

        if isinstance(other, EmptySearchQuerySet):
            return combined

        combined.query.combine(other.query, SQ.OR)
        return combined

    def _cache_is_full(self):
        if not self.query.has_run():
            return False

        if len(self) <= 0:
            return True

        try:
            self._result_cache.index(None)
            return False
        except ValueError:
            # No ``None``s found in the results. Check the length of the cache.
            return len(self._result_cache) > 0

    def _manual_iter(self):
        # If we're here, our cache isn't fully populated.
        # For efficiency, fill the cache as we go if we run out of results.
        # Also, this can't be part of the __iter__ method due to Python's rules
        # about generator functions.
        current_position = 0
        current_cache_max = 0

        while True:
            if len(self._result_cache) > 0:
                try:
                    current_cache_max = self._result_cache.index(None)
                except ValueError:
                    current_cache_max = len(self._result_cache)

            while current_position < current_cache_max:
                yield self._result_cache[current_position]
                current_position += 1

            if self._cache_is_full():
                raise StopIteration

            # We've run out of results and haven't hit our limit.
            # Fill more of the cache.
            if not self._fill_cache(current_position, current_position + ITERATOR_LOAD_PER_QUERY):
                raise StopIteration

    def _fill_cache(self, start, end, **kwargs):
        # Tell the query where to start from and how many we'd like.
        self.query._reset()
        self.query.set_limits(start, end)
        results = self.query.get_results(**kwargs)

        if results is None or len(results) == 0:
            return False

        # Setup the full cache now that we know how many results there are.
        # We need the ``None``s as placeholders to know what parts of the
        # cache we have/haven't filled.
        # Using ``None`` like this takes up very little memory. In testing,
        # an array of 100,000 ``None``s consumed less than 0.5 MB, which ought
        # to be an acceptable loss for consistent and more efficient caching.
        if len(self._result_cache) == 0:
            self._result_cache = [None for i in range(self.query.get_count())]

        if start is None:
            start = 0

        if end is None:
            end = self.query.get_count()

        to_cache = self.post_process_results(results)

        # Assign by slice.
        self._result_cache[start:start + len(to_cache)] = to_cache
        return True

    def post_process_results(self, results):
        to_cache = []

        # Check if we wish to load all objects.
        if self._load_all:
            models_pks = {}
            loaded_objects = {}

            # Remember the search position for each result so we don't have to resort later.
            for result in results:
                models_pks.setdefault(result.model, []).append(result.pk)

            # Load the objects for each model in turn.
            for model in models_pks:
                try:
                    ui = connections[self.query._using].get_unified_index()
                    index = ui.get_index(model)
                    objects = index.read_queryset(using=self.query._using)
                    loaded_objects[model] = objects.in_bulk(models_pks[model])
                except NotHandled:
                    self.log.warning("Model '%s' not handled by the routers.", model)
                    # Revert to the old behaviour.
                    loaded_objects[model] = model._default_manager.in_bulk(models_pks[model])

        for result in results:
            if self._load_all:
                # We have to deal with integer keys being cast from strings.
                model_objects = loaded_objects.get(result.model, {})
                if result.pk not in model_objects:
                    try:
                        result.pk = int(result.pk)
                    except ValueError:
                        pass
                try:
                    result._object = model_objects[result.pk]
                except KeyError:
                    # The object was either deleted since we indexed or should
                    # be ignored; fail silently.
                    self._ignored_result_count += 1
                    continue

            to_cache.append(result)

        return to_cache
    def __getitem__(self, k):
        """
        Retrieves an item or slice from the set of results.
        """
        if not isinstance(k, (slice, six.integer_types)):
            raise TypeError

        assert ((not isinstance(k, slice) and (k >= 0))
                or (isinstance(k, slice) and (k.start is None or k.start >= 0)
                    and (k.stop is None or k.stop >= 0))), \
            "Negative indexing is not supported."

        # Remember if it's a slice or not. We're going to treat everything as
        # a slice to simplify the logic and will `.pop()` at the end as needed.
        if isinstance(k, slice):
            is_slice = True
            start = k.start

            if k.stop is not None:
                bound = int(k.stop)
            else:
                bound = None
        else:
            is_slice = False
            start = k
            bound = k + 1

        # We need to check whether we need to populate more of the cache.
        if len(self._result_cache) <= 0 or (None in self._result_cache[start:bound] and not self._cache_is_full()):
            try:
                self._fill_cache(start, bound)
            except StopIteration:
                # There's nothing left, even though the bound is higher.
                pass

        # Cache should be full enough for our needs.
        if is_slice:
            return self._result_cache[start:bound]
        else:
            return self._result_cache[start]

    # Methods that return a SearchQuerySet.
    def all(self):
        """Returns all results for the query."""
        return self._clone()

    def none(self):
        """Returns an empty result list for the query."""
        return self._clone(klass=EmptySearchQuerySet)

    def filter(self, *args, **kwargs):
        """Narrows the search based on certain attributes and the default operator."""
        if DEFAULT_OPERATOR == 'OR':
            return self.filter_or(*args, **kwargs)
        else:
            return self.filter_and(*args, **kwargs)

    def exclude(self, *args, **kwargs):
        """Narrows the search by ensuring certain attributes are not included."""
        clone = self._clone()
        clone.query.add_filter(~SQ(*args, **kwargs))
        return clone

    def filter_and(self, *args, **kwargs):
        """Narrows the search by looking for (and including) certain attributes."""
        clone = self._clone()
        clone.query.add_filter(SQ(*args, **kwargs))
        return clone

    def filter_or(self, *args, **kwargs):
        """Narrows the search by looking for (and including) certain attributes, combining filters with OR."""
        clone = self._clone()
        clone.query.add_filter(SQ(*args, **kwargs), use_or=True)
        return clone

    def order_by(self, *args):
        """Alters the order in which the results should appear."""
        clone = self._clone()

        for field in args:
            clone.query.add_order_by(field)

        return clone

    def highlight(self):
        """Adds highlighting to the results."""
        clone = self._clone()
        clone.query.add_highlight()
        return clone

    def models(self, *models):
        """Accepts an arbitrary number of Model classes to include in the search."""
        clone = self._clone()

        for model in models:
            if model not in connections[self.query._using].get_unified_index().get_indexed_models():
                warnings.warn('The model %r is not registered for search.' % (model,))

            clone.query.add_model(model)

        return clone

    def result_class(self, klass):
        """
        Allows specifying a different class to use for results.

        Overrides any previous usages. If ``None`` is provided, Haystack will
        revert back to the default ``SearchResult`` object.
        """
        clone = self._clone()
        clone.query.set_result_class(klass)
        return clone

    def boost(self, term, boost):
        """Boosts a certain aspect of the query."""
        clone = self._clone()
        clone.query.add_boost(term, boost)
        return clone

    def facet(self, field, **options):
        """Adds faceting to a query for the provided field."""
        clone = self._clone()
        clone.query.add_field_facet(field, **options)
        return clone

    def within(self, field, point_1, point_2):
        """Spatial: Adds a bounding box search to the query."""
        clone = self._clone()
        clone.query.add_within(field, point_1, point_2)
        return clone

    def dwithin(self, field, point, distance):
        """Spatial: Adds a distance-based search to the query."""
        clone = self._clone()
        clone.query.add_dwithin(field, point, distance)
        return clone

    def stats(self, field):
        """Adds stats to a query for the provided field."""
        return self.stats_facet(field, facet_fields=None)

    def stats_facet(self, field, facet_fields=None):
        """
        Adds a stats facet for the given field; ``facet_fields`` lists the
        fields to facet the stats on.
        """
        clone = self._clone()
        stats_facets = []

        try:
            stats_facets.append(sum(facet_fields, []))
        except TypeError:
            if facet_fields:
                stats_facets.append(facet_fields)

        clone.query.add_stats_query(field, stats_facets)
        return clone

    def distance(self, field, point):
        """
        Spatial: Denotes results must have distance measurements from the
        provided point.
        """
        clone = self._clone()
        clone.query.add_distance(field, point)
        return clone

    def date_facet(self, field, start_date, end_date, gap_by, gap_amount=1):
        """Adds faceting to a query for the provided field by date."""
        clone = self._clone()
        clone.query.add_date_facet(field, start_date, end_date, gap_by, gap_amount=gap_amount)
        return clone

    def query_facet(self, field, query):
        """Adds faceting to a query for the provided field with a custom query."""
        clone = self._clone()
        clone.query.add_query_facet(field, query)
        return clone

    def narrow(self, query):
        """Pushes existing facet choices into the search."""
        if isinstance(query, SQ):
            # Produce a query string using an empty query of the same class.
            empty_query = self.query._clone()
            empty_query._reset()
            query = query.as_query_string(empty_query.build_query_fragment)

        clone = self._clone()
        clone.query.add_narrow_query(query)
        return clone

    def raw_search(self, query_string, **kwargs):
        """Passes a raw query directly to the backend."""
        return self.filter(content=Raw(query_string, **kwargs))

    def load_all(self):
        """Efficiently populates the objects in the search results."""
        clone = self._clone()
        clone._load_all = True
        return clone

    def auto_query(self, query_string, fieldname='content'):
        """
        Performs a best guess constructing the search query.

        This method is somewhat naive but works well enough for the simple,
        common cases.
        """
        kwargs = {
            fieldname: AutoQuery(query_string)
        }
        return self.filter(**kwargs)

    def autocomplete(self, **kwargs):
        """
        A shortcut method to perform an autocomplete search.

        Must be run against fields that are either ``NgramField`` or
        ``EdgeNgramField``.
        """
        clone = self._clone()
        query_bits = []

        for field_name, query in kwargs.items():
            for word in query.split(' '):
                bit = clone.query.clean(word.strip())

                if bit:
                    kwargs = {
                        field_name: bit,
                    }
                    query_bits.append(SQ(**kwargs))

        return clone.filter(six.moves.reduce(operator.__and__, query_bits))

    def using(self, connection_name):
        """
        Allows switching which connection the ``SearchQuerySet`` uses to
        search in.
        """
        clone = self._clone()
        clone.query = self.query.using(connection_name)
        clone._using = connection_name
        return clone
# Methods that do not return a SearchQuerySet.
|
||||
|
||||
def count(self):
|
||||
"""Returns the total number of matching results."""
|
||||
return len(self)
|
||||
|
||||
def best_match(self):
|
||||
"""Returns the best/top search result that matches the query."""
|
||||
return self[0]
|
||||
|
||||
def latest(self, date_field):
|
||||
"""Returns the most recent search result that matches the query."""
|
||||
clone = self._clone()
|
||||
clone.query.clear_order_by()
|
||||
clone.query.add_order_by("-%s" % date_field)
|
||||
return clone.best_match()
|
||||
|
||||
def more_like_this(self, model_instance):
|
||||
"""Finds similar results to the object passed in."""
|
||||
clone = self._clone()
|
||||
clone.query.more_like_this(model_instance)
|
||||
return clone
|
||||
|
||||
def facet_counts(self):
|
||||
"""
|
||||
Returns the facet counts found by the query.
|
||||
|
||||
This will cause the query to execute and should generally be used when
|
||||
presenting the data.
|
||||
"""
|
||||
if self.query.has_run():
|
||||
return self.query.get_facet_counts()
|
||||
else:
|
||||
clone = self._clone()
|
||||
return clone.query.get_facet_counts()
|
||||
|
||||
def stats_results(self):
|
||||
"""
|
||||
Returns the stats results found by the query.
|
||||
"""
|
||||
if self.query.has_run():
|
||||
return self.query.get_stats()
|
||||
else:
|
||||
clone = self._clone()
|
||||
return clone.query.get_stats()
|
||||
|
||||
def spelling_suggestion(self, preferred_query=None):
|
||||
"""
|
||||
Returns the spelling suggestion found by the query.
|
||||
|
||||
To work, you must set ``INCLUDE_SPELLING`` within your connection's
|
||||
settings dictionary to ``True``. Otherwise, ``None`` will be returned.
|
||||
|
||||
This will cause the query to execute and should generally be used when
|
||||
presenting the data.
|
||||
"""
|
||||
if self.query.has_run():
|
||||
return self.query.get_spelling_suggestion(preferred_query)
|
||||
else:
|
||||
clone = self._clone()
|
||||
return clone.query.get_spelling_suggestion(preferred_query)
|
||||
|
||||
def values(self, *fields):
|
||||
"""
|
||||
Returns a list of dictionaries, each containing the key/value pairs for
|
||||
the result, exactly like Django's ``ValuesQuerySet``.
|
||||
"""
|
||||
qs = self._clone(klass=ValuesSearchQuerySet)
|
||||
qs._fields.extend(fields)
|
||||
return qs
|
||||
|
||||
def values_list(self, *fields, **kwargs):
|
||||
"""
|
||||
Returns a list of field values as tuples, exactly like Django's
|
||||
``QuerySet.values``.
|
||||
|
||||
Optionally accepts a ``flat=True`` kwarg, which in the case of a
|
||||
single field being provided, will return a flat list of that field
|
||||
rather than a list of tuples.
|
||||
"""
|
||||
flat = kwargs.pop("flat", False)
|
||||
|
||||
if flat and len(fields) > 1:
|
||||
raise TypeError("'flat' is not valid when values_list is called with more than one field.")
|
||||
|
||||
qs = self._clone(klass=ValuesListSearchQuerySet)
|
||||
qs._fields.extend(fields)
|
||||
qs._flat = flat
|
||||
return qs
|
||||
|
||||
# Utility methods.
|
||||
|
||||
def _clone(self, klass=None):
|
||||
if klass is None:
|
||||
klass = self.__class__
|
||||
|
||||
query = self.query._clone()
|
||||
clone = klass(query=query)
|
||||
clone._load_all = self._load_all
|
||||
return clone
|
||||
|
||||
|
||||
class EmptySearchQuerySet(SearchQuerySet):
|
||||
"""
|
||||
A stubbed SearchQuerySet that behaves as normal but always returns no
|
||||
results.
|
||||
"""
|
||||
def __len__(self):
|
||||
return 0
|
||||
|
||||
def _cache_is_full(self):
|
||||
# Pretend the cache is always full with no results.
|
||||
return True
|
||||
|
||||
def _clone(self, klass=None):
|
||||
clone = super(EmptySearchQuerySet, self)._clone(klass=klass)
|
||||
clone._result_cache = []
|
||||
return clone
|
||||
|
||||
def _fill_cache(self, start, end):
|
||||
return False
|
||||
|
||||
def facet_counts(self):
|
||||
return {}
|
||||
|
||||
|
||||
class ValuesListSearchQuerySet(SearchQuerySet):
|
||||
"""
|
||||
A ``SearchQuerySet`` which returns a list of field values as tuples, exactly
|
||||
like Django's ``ValuesListQuerySet``.
|
||||
"""
|
||||
def __init__(self, *args, **kwargs):
|
||||
super(ValuesListSearchQuerySet, self).__init__(*args, **kwargs)
|
||||
self._flat = False
|
||||
self._fields = []
|
||||
|
||||
# Removing this dependency would require refactoring much of the backend
|
||||
# code (_process_results, etc.) and these aren't large enough to make it
|
||||
# an immediate priority:
|
||||
self._internal_fields = ['id', 'django_ct', 'django_id', 'score']
|
||||
|
||||
def _clone(self, klass=None):
|
||||
clone = super(ValuesListSearchQuerySet, self)._clone(klass=klass)
|
||||
clone._fields = self._fields
|
||||
clone._flat = self._flat
|
||||
return clone
|
||||
|
||||
def _fill_cache(self, start, end):
|
||||
query_fields = set(self._internal_fields)
|
||||
query_fields.update(self._fields)
|
||||
kwargs = {
|
||||
'fields': query_fields
|
||||
}
|
||||
return super(ValuesListSearchQuerySet, self)._fill_cache(start, end, **kwargs)
|
||||
|
||||
def post_process_results(self, results):
|
||||
to_cache = []
|
||||
|
||||
if self._flat:
|
||||
accum = to_cache.extend
|
||||
else:
|
||||
accum = to_cache.append
|
||||
|
||||
for result in results:
|
||||
accum([getattr(result, i, None) for i in self._fields])
|
||||
|
||||
return to_cache
|
||||
|
||||
|
||||
class ValuesSearchQuerySet(ValuesListSearchQuerySet):
|
||||
"""
|
||||
A ``SearchQuerySet`` which returns a list of dictionaries, each containing
|
||||
the key/value pairs for the result, exactly like Django's
|
||||
``ValuesQuerySet``.
|
||||
"""
|
||||
def _fill_cache(self, start, end):
|
||||
query_fields = set(self._internal_fields)
|
||||
query_fields.update(self._fields)
|
||||
kwargs = {
|
||||
'fields': query_fields
|
||||
}
|
||||
return super(ValuesListSearchQuerySet, self)._fill_cache(start, end, **kwargs)
|
||||
|
||||
def post_process_results(self, results):
|
||||
to_cache = []
|
||||
|
||||
for result in results:
|
||||
to_cache.append(dict((i, getattr(result, i, None)) for i in self._fields))
|
||||
|
||||
return to_cache
|
||||
|
||||
|
||||
class RelatedSearchQuerySet(SearchQuerySet):
    """
    A variant of the SearchQuerySet that can handle `load_all_queryset`s.

    This differs mainly in the `_fill_cache` method: it is far less
    efficient, but must fill the cache up front to maintain consistency.
    """

    def __init__(self, *args, **kwargs):
        super(RelatedSearchQuerySet, self).__init__(*args, **kwargs)
        self._load_all_querysets = {}
        self._result_cache = []

    def _cache_is_full(self):
        return len(self._result_cache) >= len(self)

    def _manual_iter(self):
        # If we're here, our cache isn't fully populated.
        # For efficiency, fill the cache as we go if we run out of results.
        # Also, this can't be part of the __iter__ method due to Python's rules
        # about generator functions.
        current_position = 0
        current_cache_max = 0

        while True:
            current_cache_max = len(self._result_cache)

            while current_position < current_cache_max:
                yield self._result_cache[current_position]
                current_position += 1

            if self._cache_is_full():
                # PEP 479: a plain return ends the generator; raising
                # StopIteration here would error on Python 3.7+.
                return

            # We've run out of results and haven't hit our limit.
            # Fill more of the cache.
            start = current_position + self._ignored_result_count

            if not self._fill_cache(start, start + ITERATOR_LOAD_PER_QUERY):
                return

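The loop in `_manual_iter` yields whatever is already cached, then asks for another page until the fetch comes back empty. A simplified, self-contained version of that fill-as-you-iterate pattern (the page source here is a plain function standing in for the search backend):

```python
LOAD_PER_QUERY = 3  # stand-in for ITERATOR_LOAD_PER_QUERY


def iter_with_cache(fetch_page, cache):
    """Yield cached items, fetching more pages on demand."""
    pos = 0
    while True:
        # Drain everything currently cached.
        while pos < len(cache):
            yield cache[pos]
            pos += 1
        # Fetch the next page; an empty page means we're done.
        page = fetch_page(pos, pos + LOAD_PER_QUERY)
        if not page:
            return  # PEP 479: return, never raise StopIteration
        cache.extend(page)


data = list(range(7))


def fetch(start, end):
    return data[start:end]


cache = []
result = list(iter_with_cache(fetch, cache))
print(result)  # [0, 1, 2, 3, 4, 5, 6]
```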
    def _fill_cache(self, start, end):
        # Tell the query where to start from and how many we'd like.
        self.query._reset()
        self.query.set_limits(start, end)
        results = self.query.get_results()

        if len(results) == 0:
            return False

        if start is None:
            start = 0

        if end is None:
            end = self.query.get_count()

        # Check if we wish to load all objects.
        if self._load_all:
            models_pks = {}
            loaded_objects = {}

            # Remember the search position for each result so we don't have to resort later.
            for result in results:
                models_pks.setdefault(result.model, []).append(result.pk)

            # Load the objects for each model in turn.
            for model in models_pks:
                if model in self._load_all_querysets:
                    # Use the overriding queryset.
                    loaded_objects[model] = self._load_all_querysets[model].in_bulk(models_pks[model])
                else:
                    # Check the SearchIndex for the model for an override.
                    try:
                        index = connections[self.query._using].get_unified_index().get_index(model)
                        qs = index.load_all_queryset()
                        loaded_objects[model] = qs.in_bulk(models_pks[model])
                    except NotHandled:
                        # The model returned doesn't seem to be handled by the
                        # routers. We should silently fail and populate
                        # nothing for those objects.
                        loaded_objects[model] = []

        if len(results) + len(self._result_cache) < len(self) and len(results) < ITERATOR_LOAD_PER_QUERY:
            self._ignored_result_count += ITERATOR_LOAD_PER_QUERY - len(results)

        for result in results:
            if self._load_all:
                # We have to deal with integer keys being cast from strings; if this
                # fails we've got a character pk.
                try:
                    result.pk = int(result.pk)
                except ValueError:
                    pass
                try:
                    result._object = loaded_objects[result.model][result.pk]
                except (KeyError, IndexError):
                    # The object was either deleted since we indexed or should
                    # be ignored; fail silently.
                    self._ignored_result_count += 1
                    continue

            self._result_cache.append(result)

        return True

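The `_load_all` branch groups result pks per model with `setdefault`, does one bulk lookup per model, then attaches objects and silently drops results whose object has disappeared. A standalone sketch of that group-then-bulk-fetch flow, using a plain callable as a stand-in for Django's `Manager.in_bulk` (the `managers` mapping is illustrative):

```python
def bulk_attach(results, managers):
    """Group pks by model, fetch each model's objects once, attach them."""
    models_pks = {}
    for r in results:
        models_pks.setdefault(r['model'], []).append(r['pk'])

    loaded = {}
    for model, pks in models_pks.items():
        # in_bulk-style contract: returns {pk: object} for found pks only.
        loaded[model] = managers[model](pks)

    kept = []
    for r in results:
        try:
            r['object'] = loaded[r['model']][r['pk']]
        except KeyError:
            continue  # deleted since indexing; skip silently
        kept.append(r)
    return kept


# Pretend pk 3 was deleted from the database after indexing.
managers = {'note': lambda pks: dict((pk, 'note-%d' % pk) for pk in pks if pk != 3)}
results = [{'model': 'note', 'pk': 1}, {'model': 'note', 'pk': 3}]
kept = bulk_attach(results, managers)
print(kept)
```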
    def __getitem__(self, k):
        """
        Retrieves an item or slice from the set of results.
        """
        if not isinstance(k, (slice, six.integer_types)):
            raise TypeError
        assert ((not isinstance(k, slice) and (k >= 0))
                or (isinstance(k, slice) and (k.start is None or k.start >= 0)
                    and (k.stop is None or k.stop >= 0))), \
            "Negative indexing is not supported."

        # Remember if it's a slice or not. We're going to treat everything as
        # a slice to simplify the logic and will `.pop()` at the end as needed.
        if isinstance(k, slice):
            is_slice = True
            start = k.start

            if k.stop is not None:
                bound = int(k.stop)
            else:
                bound = None
        else:
            is_slice = False
            start = k
            bound = k + 1

        # We need to check whether we should populate more of the cache.
        if len(self._result_cache) <= 0 or not self._cache_is_full():
            try:
                while len(self._result_cache) < bound and not self._cache_is_full():
                    current_max = len(self._result_cache) + self._ignored_result_count
                    self._fill_cache(current_max, current_max + ITERATOR_LOAD_PER_QUERY)
            except StopIteration:
                # There's nothing left, even though the bound is higher.
                pass

        # Cache should be full enough for our needs.
        if is_slice:
            return self._result_cache[start:bound]
        else:
            return self._result_cache[start]

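`__getitem__` above normalizes an integer index into the slice case (start, bound) so one code path serves both. A small sketch of just that normalization step, with the same non-negativity checks (the `normalize_key` helper is illustrative):

```python
def normalize_key(k):
    """Treat an int index as a one-element slice; return (is_slice, start, bound)."""
    if isinstance(k, slice):
        if (k.start or 0) < 0 or (k.stop or 0) < 0:
            raise AssertionError("Negative indexing is not supported.")
        return True, k.start, k.stop
    if not isinstance(k, int):
        raise TypeError
    if k < 0:
        raise AssertionError("Negative indexing is not supported.")
    # An int index k is the slice [k:k+1], unwrapped at the end.
    return False, k, k + 1


print(normalize_key(5))           # (False, 5, 6)
print(normalize_key(slice(2, 8))) # (True, 2, 8)
```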
    def load_all_queryset(self, model, queryset):
        """
        Allows for specifying a custom ``QuerySet`` that changes how ``load_all``
        will fetch records for the provided model.

        This is useful for post-processing the results from the query, enabling
        things like adding ``select_related`` or filtering certain data.
        """
        clone = self._clone()
        clone._load_all_querysets[model] = queryset
        return clone

    def _clone(self, klass=None):
        if klass is None:
            klass = self.__class__

        query = self.query._clone()
        clone = klass(query=query)
        clone._load_all = self._load_all
        clone._load_all_querysets = self._load_all_querysets
        return clone
@@ -0,0 +1,18 @@
# encoding: utf-8

from __future__ import absolute_import, division, print_function, unicode_literals

from haystack.constants import DEFAULT_ALIAS


class BaseRouter(object):
    # Reserved for future extension.
    pass


class DefaultRouter(BaseRouter):
    def for_read(self, **hints):
        return DEFAULT_ALIAS

    def for_write(self, **hints):
        return DEFAULT_ALIAS
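`DefaultRouter` always answers with `DEFAULT_ALIAS`; haystack's connection-router machinery walks an ordered list of routers and uses the first non-`None` answer. A minimal sketch of that first-match chain under those assumptions (the `ReadReplicaRouter` and `connection_for_read` names are illustrative, not haystack API; the `for_write` walk would be symmetric):

```python
DEFAULT_ALIAS = 'default'


class BaseRouter(object):
    pass


class ReadReplicaRouter(BaseRouter):
    """Illustrative: route reads to a replica, stay silent on writes."""
    def for_read(self, **hints):
        return 'replica'


class DefaultRouter(BaseRouter):
    def for_read(self, **hints):
        return DEFAULT_ALIAS

    def for_write(self, **hints):
        return DEFAULT_ALIAS


def connection_for_read(routers, **hints):
    """Ask each router in order; first non-None alias wins."""
    for router in routers:
        action = getattr(router, 'for_read', None)
        if action is not None:
            alias = action(**hints)
            if alias is not None:
                return alias
    return DEFAULT_ALIAS


print(connection_for_read([ReadReplicaRouter(), DefaultRouter()]))  # replica
print(connection_for_read([DefaultRouter()]))                       # default
```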