Adds a whole comparison with other ORM caches.
This commit is contained in:
parent
6249ade208
commit
130be2a434
|
@ -4,7 +4,7 @@ API
|
|||
---
|
||||
|
||||
Use these tools to interact with django-cachalot, especially if you face
|
||||
:ref:`Raw queries limits` or if you need to create a cache key from the
|
||||
:ref:`raw queries limits <Raw SQL queries>` or if you need to create a cache key from the
|
||||
last table invalidation timestamp.
|
||||
|
||||
.. automodule:: cachalot.api
|
||||
|
|
|
@ -112,7 +112,7 @@ pygments_style = 'sphinx'
|
|||
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
html_theme = 'default'
|
||||
html_theme = 'sphinx_rtd_theme'
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
|
|
|
@ -24,6 +24,7 @@ Caches your Django ORM queries and automatically invalidates them.
|
|||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
introduction
|
||||
quickstart
|
||||
limits
|
||||
api
|
||||
|
|
|
@ -0,0 +1,210 @@
|
|||
Introduction
|
||||
------------
|
||||
|
||||
Should you use it?
|
||||
..................
|
||||
|
||||
Django-cachalot is the perfect speedup tool for most Django projects.
|
||||
It will speed up a website of 100 000 visits per month without any problem.
|
||||
In fact, **the more visitors you have, the faster the website becomes**.
|
||||
That’s because every possible SQL query on the project ends up being cached.
|
||||
|
||||
Django-cachalot is especially efficient in the Django administration website
|
||||
since it’s unfortunately badly optimised (use foreign keys in list_editable
|
||||
if you need to be convinced).
|
||||
|
||||
However, it’s not suited for projects where there is **a high number
|
||||
of modifications per minute** on each table, like a social network with
|
||||
more than 30 messages per minute. Django-cachalot may still give a small
|
||||
speedup in such cases, but it may also slow things down a bit
|
||||
(in the worst case scenario, a 20% slowdown,
|
||||
according to :ref:`the benchmark <Benchmark>`).
|
||||
If you have a website like that, optimising your SQL database and queries
|
||||
is the number one thing you have to do.
|
||||
|
||||
There is also an obvious case where you don’t need django-cachalot:
|
||||
when the project is already fast enough (all pages load in less than 300 ms).
|
||||
Like any other dependency, django-cachalot is a potential source of problems
|
||||
(even though it’s currently bug free).
|
||||
Don’t use dependencies you can avoid; a “future you” may thank you for that.
|
||||
|
||||
Features
|
||||
........
|
||||
|
||||
- **Saves in cache the results of any SQL query** generated by the Django ORM
|
||||
that reads data. These saved results are then returned instead
|
||||
of executing the same SQL query, which is faster.
|
||||
- The first time a query is executed is about 10% slower, then the following
|
||||
times are way faster (7× faster being the average).
|
||||
- Automatically invalidates saved results,
|
||||
so that **you never get stale results**.
|
||||
- **Invalidates per table, not per object**: if you change an object,
|
||||
all the queries done on other objects of the same model are also invalidated.
|
||||
It is unfortunately technically impossible to make a reliable
|
||||
per-object cache. Don’t be fooled by packages claiming to have
|
||||
that per-object feature: they are unreliable and dangerous for your data.
|
||||
- **Handles everything in the ORM**. You can use the most advanced features
|
||||
from the ORM without a single issue, django-cachalot is extremely robust.
|
||||
- Easy control thanks to :ref:`settings` and :ref:`a simple API <API>`.
|
||||
But that’s only required if you have a complex infrastructure. Most people
|
||||
will never use settings or the API.
|
||||
- A few bonus features like
|
||||
:ref:`a signal triggered at each database change <Signal>`
|
||||
(including bulk changes) and
|
||||
:ref:`a template tag for a better template fragment caching <Template tag>`.
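
The per-table invalidation described in the list above can be illustrated with a toy model. This is a minimal sketch, not django-cachalot’s actual code: the ``PerTableQueryCache`` class, its method names and the logical clock are all assumptions made for the illustration. Each cached result remembers when it was stored, and it is reused only if none of the tables it reads has been written to since.

```python
class PerTableQueryCache:
    """Toy per-table cache; an illustration, not django-cachalot's code."""

    def __init__(self):
        self._results = {}          # SQL text -> (stored_at, rows)
        self._invalidated_at = {}   # table name -> time of last write
        self._clock = 0             # logical clock instead of wall time

    def _tick(self):
        self._clock += 1
        return self._clock

    def get_or_run(self, sql, tables, run_query):
        entry = self._results.get(sql)
        if entry is not None:
            stored_at, rows = entry
            # Valid only if no table it reads was written to since storage.
            if all(self._invalidated_at.get(t, 0) < stored_at for t in tables):
                return rows
        rows = run_query()
        self._results[sql] = (self._tick(), rows)
        return rows

    def invalidate(self, table):
        # Called on any write to the table, whichever row changed.
        self._invalidated_at[table] = self._tick()


executions = []


def run_query():
    # Hypothetical stand-in for the real database call.
    executions.append(1)
    return ['alice', 'bob']


cache = PerTableQueryCache()
sql = 'SELECT name FROM auth_user'
cache.get_or_run(sql, ['auth_user'], run_query)   # executed
cache.get_or_run(sql, ['auth_user'], run_query)   # served from the cache
cache.invalidate('auth_user')                     # any write to the table
cache.get_or_run(sql, ['auth_user'], run_query)   # executed again
print(len(executions))  # 2
```

Invalidating the whole table is coarse, but it needs no bookkeeping about which rows a query touched, which is what keeps this approach simple.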
|
||||
|
||||
Comparison with similar tools
|
||||
.............................
|
||||
|
||||
This comparison was done in October 2015. It compares django-cachalot
|
||||
to the other popular automatic ORM caches at the moment:
|
||||
`django-cache-machine <https://github.com/django-cache-machine/django-cache-machine>`_
|
||||
& `django-cacheops <https://github.com/Suor/django-cacheops>`_.
|
||||
|
||||
Features
|
||||
~~~~~~~~
|
||||
|
||||
======================================================== ========= ============= =========
Feature                                                  cachalot  cache-machine cacheops
======================================================== ========= ============= =========
Type of invalidation                                     per table per object    per table
CPU & memory performance                                 optimal   bad           terrible
Easy to install                                          ✔         ✘             quite
Cache agnostic                                           ✔         ✔             ✘
Reliable                                                 ✔         ✘             quite
Handles ``QuerySet.count``                               ✔         ✘             ✔
Handles empty queries                                    ✔         ✘             ✔
Handles multi-table inheritance                          ✔         probably not  ✘
Handles proxy models                                     ✔         ✘             ✔
Handles many-to-many fields                              ✔         ✘             ✔
Handles transactions                                     ✔         probably not  ✘
Handles ``QuerySet.aggregate``/``annotate``              ✔         probably not  ✘
Handles ``QuerySet.bulk_create``/``update``/``delete``   ✔         probably not  ✘
Handles ``QuerySet.select_related``/``prefetch_related`` ✔         partially     ✘
Handles ``cursor.execute``                               ✔         ✘             ✘
Handles GeoDjango                                        ✔         maybe         ✔
Handles django.contrib.postgres                          ✔         maybe         partially
======================================================== ========= ============= =========
|
||||
|
||||
To find out whether a package supports a feature, I searched the documentation,
|
||||
the issues, the tests and the code.
|
||||
I really tried to avoid writing “maybe”, “probably not”, etc.
|
||||
Unfortunately, the absence of tests for such cases and sometimes the confusion
|
||||
of the authors themselves about these features make it difficult to know
|
||||
whether they support a feature or not.
|
||||
|
||||
Explanations
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Of course, I can’t just throw a table with such
|
||||
“Reliable” and “CPU & memory performance” lines without explanation.
|
||||
My goal is not to start another stupid open source conflict, nor
|
||||
to be pretentious about my work. I’m just trying to inform users here, so they
|
||||
can fully grasp the consequences of using one or another tool.
|
||||
I actually used django-cache-machine in production for a week
|
||||
and django-cacheops for a month. On both solutions, I faced a lot
|
||||
of invalidation issues, and the bigger the cache became,
|
||||
the worse the performance was.
|
||||
|
||||
I now know the reason for these issues: in short, this is due to
|
||||
their invalidation systems. Read the following paragraphs for more detail.
|
||||
|
||||
django-cache-machine
|
||||
''''''''''''''''''''
|
||||
|
||||
django-cache-machine uses “flush lists” to remember which SQL queries are
|
||||
linked to which objects. This is the approach I chose when I created
|
||||
a prototype of django-cachalot, except it was invalidated per table,
|
||||
not per object like django-cache-machine does. Unfortunately, there are several
|
||||
important issues due to this approach that led me to drop it.
|
||||
|
||||
The smaller issue is that each time you execute a new SQL query,
|
||||
django-cache-machine needs to fetch the “flush list” from the cache,
|
||||
update it and add it back to the cache. This means we have to make two
|
||||
cache calls in addition to the cache call to store the SQL query results.
|
||||
It may seem tiny, but when your cache size increases,
|
||||
the “flush lists” start becoming huge (a list of hundreds of cache keys
|
||||
for each database object), leading to an exponentially growing cache size
|
||||
and a longer time to fetch the always-growing “flush list”.
|
||||
So **bad memory and CPU usage**.
|
||||
|
||||
The second issue is linked only to the per-object invalidation.
|
||||
When django-cache-machine invalidates an object, it also needs to invalidate
|
||||
the queries of the related objects, otherwise they may contain stale data.
|
||||
Django-cache-machine invalidates foreign keys only, not many-to-many
|
||||
or generic foreign keys (because… I don’t know). This degrades performance
|
||||
at each write operation to the database, because it needs to fetch
|
||||
related objects, fetch “flush lists” and delete these cache keys.
|
||||
And of course it can’t invalidate basic queries such as count or empty queries
|
||||
(probably aggregations too, but I’m not sure).
|
||||
|
||||
And last but not least: a critical issue. It simply proves that the
|
||||
django-cache-machine team **doesn’t know how caches work**.
|
||||
Caches are fast because they are stupid: when your cache is full and
|
||||
needs room, it randomly fetches a few keys, selects the older ones if possible
|
||||
then deletes them. This means that **a cache key with a 1 year timeout
|
||||
can be deleted before a cache key with a 1 minute timeout**.
|
||||
But django-cache-machine assumes its “flush lists” will always stay longer
|
||||
in cache than the saved query results will, because they have the same timeout
|
||||
and “flush lists” are saved a few milliseconds after query results.
|
||||
Until the cache is full, this is kind of true because no cache key is deleted.
|
||||
But when it is full, the “flush list” can be removed at any moment,
|
||||
so the other cache keys will never be invalidated until they are deleted.
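
The flush-list bookkeeping and its failure mode can be reproduced with a toy model. This is a minimal sketch under stated assumptions, not django-cache-machine’s code: the plain dict stands in for a real cache backend, and the ``flush:<id>`` key layout and both function names are hypothetical. It shows the two extra cache calls per stored query, then what happens when the cache evicts a flush list before the results it tracks.

```python
cache = {}  # stands in for a cache backend that may evict any key at any time


def cache_query(sql, rows, object_ids):
    """Store query results and register the key in each object's flush list."""
    cache[sql] = rows
    for object_id in object_ids:
        flush_key = 'flush:%s' % object_id       # hypothetical key layout
        flush_list = cache.get(flush_key, [])    # extra cache call 1: fetch
        flush_list.append(sql)
        cache[flush_key] = flush_list            # extra cache call 2: store


def invalidate_object(object_id):
    """Delete every cached query listed in the object's flush list."""
    for sql in cache.pop('flush:%s' % object_id, []):
        cache.pop(sql, None)


sql = 'SELECT * FROM book WHERE id = 1'
cache_query(sql, [{'id': 1, 'title': 'Old title'}], [1])

# Simulate a full cache evicting the flush list to make room.
del cache['flush:1']

# The object is modified, but nothing records which queries depended on it.
invalidate_object(1)
stale = cache.get(sql)
print(stale)  # [{'id': 1, 'title': 'Old title'}]
```

The stale entry stays in the cache until its own timeout expires or the cache happens to evict it.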
|
||||
|
||||
**To sum up, django-cache-machine has bad memory and CPU performance
|
||||
and is absolutely not reliable.**
|
||||
|
||||
django-cacheops
|
||||
'''''''''''''''
|
||||
|
||||
django-cacheops uses
|
||||
`a debug feature from Redis, KEYS, <http://redis.io/commands/KEYS>`_
|
||||
to invalidate cache keys (that’s why it only supports Redis).
|
||||
It’s a feature that becomes linearly slower as your cache size grows.
|
||||
I measured that a single call of this command by django-cacheops
|
||||
slows down any database save by 50 ms to 3.5 seconds,
|
||||
depending on your database and cache sizes.
|
||||
The problem is also that django-cacheops runs this command several times
|
||||
at each save. Suppose you have a model with 3 many-to-many fields. Suppose you save
|
||||
an object with 3 related objects in each many-to-many field. django-cacheops will run
|
||||
the Redis ``KEYS`` command at least 10 times! If you have
|
||||
a large cache and database, it means **you can wait 30 seconds
|
||||
while this object is saved!**
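
Why a ``KEYS``-style lookup slows down as the cache grows can be seen with a toy scan. This is a sketch only: ``scan_keys``, ``make_store`` and the ``q:<model>:<n>`` key layout are hypothetical and do not come from django-cacheops or Redis. The point is that a pattern scan has to test every key in the store, however few keys actually match.

```python
from fnmatch import fnmatchcase


def scan_keys(store, pattern):
    """Toy stand-in for a KEYS-style scan: every stored key is tested."""
    examined = 0
    matches = []
    for key in store:
        examined += 1
        if fnmatchcase(key, pattern):
            matches.append(key)
    return matches, examined


def make_store(n_other_keys):
    # Hypothetical key layout: 3 keys for the model being invalidated,
    # plus n_other_keys unrelated entries.
    store = {'q:author:%d' % i: b'...' for i in range(3)}
    store.update({'q:book:%d' % i: b'...' for i in range(n_other_keys)})
    return store


small_matches, small_examined = scan_keys(make_store(100), 'q:author:*')
large_matches, large_examined = scan_keys(make_store(100000), 'q:author:*')
print(len(small_matches), small_examined)   # 3 103
print(len(large_matches), large_examined)   # 3 100003
```

Both scans find the same 3 keys, but the second one examines about a thousand times more entries, and that cost is paid again for each call made during a save.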
|
||||
|
||||
Another bad consequence of that use of the ``KEYS`` command is that Redis jumps
|
||||
to a 100% CPU usage when the command is running, degrading performance for
|
||||
other users or even blocking them until the command is finished.
|
||||
|
||||
More generally, the workflow of django-cacheops is totally unoptimised.
|
||||
When an object is modified, an ``invalidate_obj`` function is called,
|
||||
calling an ``invalidate_dict`` function, calling the ``manage.py invalidate``
|
||||
command with a serialized version of the object (yes!),
|
||||
calling an ``invalidate_model`` function that calls the Redis ``KEYS`` command
|
||||
to get all the cache keys from that model then delete them.
|
||||
And as I said above, it executes all that N times,
|
||||
N being the number of related objects to the current object,
|
||||
even though multiple objects have the same model and we therefore
|
||||
don’t need to invalidate the model multiple times.
|
||||
|
||||
**To sum up, django-cacheops has terrible performance,
|
||||
but is reliable for what it handles.
|
||||
If you set it up correctly and never use some features such as
|
||||
transactions (used by Django admin),
|
||||
multi-table inheritance, or
|
||||
raw queries (the three features being used by Wagtail and django CMS),
|
||||
you’re good to go.**
|
||||
|
||||
Number of lines of code
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Django-cachalot tries to be as minimalist as possible, while handling most
|
||||
use cases. Being minimalist is essential to create maintainable projects,
|
||||
and having a large test suite is essential to achieve excellent quality.
|
||||
The statistics below speak for themselves…
|
||||
|
||||
============ ======== ============= ========
Project part cachalot cache-machine cacheops
============ ======== ============= ========
Application  743      843           1662
Tests        3023     659           1491
============ ======== ============= ========
|
|
@ -44,6 +44,7 @@ memcached. If you use Ubuntu and installed the package, you can modify
|
|||
per cache key to 10 MB, and if you want, increase the already existing ``-m 64``
|
||||
to something like ``-m 1000`` to set the maximum cache size to 1 GB.
|
||||
|
||||
.. _Locmem:
|
||||
|
||||
Locmem
|
||||
......
|
||||
|
@ -62,6 +63,8 @@ If you use range fields from `django.contrib.postgres` and your Django
|
|||
version is affected by this bug, you need to add the tables using range fields
|
||||
to :ref:`CACHALOT_UNCACHABLE_TABLES`.
|
||||
|
||||
.. _MySQL:
|
||||
|
||||
MySQL
|
||||
.....
|
||||
|
||||
|
@ -72,7 +75,7 @@ Django-cachalot will slow down your queries if that query cache is enabled.
|
|||
If it’s not enabled, django-cachalot will make queries much faster.
|
||||
But you should probably enable the query cache instead.
|
||||
|
||||
.. _Raw queries limits:
|
||||
.. _Raw SQL queries:
|
||||
|
||||
Raw SQL queries
|
||||
...............
|
||||
|
|
|
@ -1,33 +1,6 @@
|
|||
Quick start
|
||||
-----------
|
||||
|
||||
Should you use it?
|
||||
..................
|
||||
|
||||
Django-cachalot is the perfect speedup tool for most Django projects.
|
||||
It will speed up a website of 100 000 visits per month without any problem.
|
||||
In fact, **the more visitors you have, the faster the website becomes**.
|
||||
That’s because every possible SQL query on the project ends up being cached.
|
||||
|
||||
Django-cachalot is especially efficient in the Django administration website
|
||||
since it’s unfortunately badly optimised (use foreign keys in list_editable
|
||||
if you need to be convinced).
|
||||
|
||||
However, it’s not suited for projects where there is **a high number
|
||||
of modifications per minute** on each table, like a social network with
|
||||
more than 30 messages per minute. Django-cachalot may still give a small
|
||||
speedup in such cases, but it may also slow things down a bit
|
||||
(in the worst case scenario, a 20% slowdown,
|
||||
according to :ref:`the benchmark <Benchmark>`).
|
||||
If you have a website like that, optimising your SQL database and queries
|
||||
is the number one thing you have to do.
|
||||
|
||||
There is also an obvious case where you don’t need django-cachalot:
|
||||
when the project is already fast enough (all pages load in less than 300 ms).
|
||||
Like any other dependency, django-cachalot is a potential source of problems
|
||||
(even though it’s currently bug free).
|
||||
Don’t use dependencies you can avoid; a “future you” may thank you for that.
|
||||
|
||||
Requirements
|
||||
............
|
||||
|
||||
|
@ -40,14 +13,14 @@ Requirements
|
|||
(using either python-memcached or pylibmc)
|
||||
- `filebased <https://docs.djangoproject.com/en/1.7/topics/cache/#filesystem-caching>`_
|
||||
- `locmem <https://docs.djangoproject.com/en/1.7/topics/cache/#local-memory-caching>`_
|
||||
(but it’s not shared between processes, see :ref:`Limits`)
|
||||
(but it’s not shared between processes, see :ref:`locmem limits <Locmem>`)
|
||||
|
||||
- one of these databases:
|
||||
|
||||
- PostgreSQL
|
||||
- SQLite
|
||||
- MySQL (but you probably don’t need django-cachalot in this case,
|
||||
see :ref:`Limits`)
|
||||
see :ref:`MySQL limits <MySQL>`)
|
||||
|
||||
Usage
|
||||
.....
|
||||
|
@ -56,17 +29,19 @@ Usage
|
|||
#. Add ``'cachalot',`` to your ``INSTALLED_APPS``
|
||||
#. If you use multiple servers with a common cache server,
|
||||
:ref:`double check their clock synchronisation <multiple servers>`
|
||||
#. If you modify data outside Django
|
||||
– typically after restoring a SQL database –, run
|
||||
``./manage.py invalidate_cachalot``
|
||||
#. Be aware of :ref:`the few other limits <limits>`
|
||||
#. If you use
|
||||
`django-debug-toolbar <https://github.com/django-debug-toolbar/django-debug-toolbar>`_,
|
||||
you can add ``'cachalot.panels.CachalotPanel',``
|
||||
to your ``DEBUG_TOOLBAR_PANELS``
|
||||
#. If you need to invalidate all django-cachalot cache keys from an external script
|
||||
– typically after restoring a SQL database –, simply run
|
||||
``./manage.py invalidate_cachalot``
|
||||
#. Enjoy!
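
The two settings mentioned in the steps above could look like the following fragment. It is only a sketch of a ``settings.py``: the placeholder comments stand for whatever your project already lists, and the second setting applies only if django-debug-toolbar is installed.

```python
# settings.py (fragment)

INSTALLED_APPS = [
    # ... your other apps ...
    'cachalot',
]

# Only needed with django-debug-toolbar. Defining this setting replaces the
# toolbar's default panel list, so keep the panels you already rely on.
DEBUG_TOOLBAR_PANELS = [
    # ... the panels you already use ...
    'cachalot.panels.CachalotPanel',
]
```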
|
||||
|
||||
|
||||
.. _Settings:
|
||||
|
||||
Settings
|
||||
........
|
||||
|
||||
|
@ -100,7 +75,7 @@ Settings
|
|||
|
||||
:Default: ``True``
|
||||
:Description: If set to ``False``, disables automatic invalidation on raw
|
||||
SQL queries – read :ref:`Raw queries limits` for more info
|
||||
SQL queries – read :ref:`raw queries limits <Raw SQL queries>` for more info
|
||||
|
||||
|
||||
``CACHALOT_ONLY_CACHABLE_TABLES``
|
||||
|
@ -168,6 +143,8 @@ For example:
|
|||
settings.CACHALOT_ENABLED = False
|
||||
|
||||
|
||||
.. _Template tag:
|
||||
|
||||
Template tag
|
||||
............
|
||||
|
||||
|
@ -215,6 +192,8 @@ are also available (see
|
|||
:meth:`cachalot.api.get_last_invalidation`).
|
||||
|
||||
|
||||
.. _Signal:
|
||||
|
||||
Signal
|
||||
......
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@ What still needs to be done
|
|||
---------------------------
|
||||
|
||||
- Cache raw queries (may not be possible due to database cursors
|
||||
being written in C)
|
||||
being written in C)
|
||||
- Test multi-location caches if possible
|
||||
- Allow setting `CACHALOT_CACHE` to `None` in order to disable django-cachalot
|
||||
persistence. SQL queries would only be cached during transactions, so setting
|
||||
|
|