Adds a whole comparison with other ORM caches.

Bertrand Bordage 2015-10-25 20:08:18 +01:00
parent 6249ade208
commit 130be2a434
7 changed files with 230 additions and 37 deletions


@ -4,7 +4,7 @@ API
---
Use these tools to interact with django-cachalot, especially if you face
:ref:`Raw queries limits` or if you need to create a cache key from the
:ref:`raw queries limits <Raw SQL queries>` or if you need to create a cache key from the
last table invalidation timestamp.
.. automodule:: cachalot.api


@ -112,7 +112,7 @@ pygments_style = 'sphinx'
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the


@ -24,6 +24,7 @@ Caches your Django ORM queries and automatically invalidates them.
.. toctree::
:maxdepth: 2
introduction
quickstart
limits
api

docs/introduction.rst (new file, 210 lines)

@ -0,0 +1,210 @@
Introduction
------------
Should you use it?
..................
Django-cachalot is the perfect speedup tool for most Django projects.
It will speed up a website with 100 000 visits per month without any problem.
In fact, **the more visitors you have, the faster the website becomes**.
That's because every possible SQL query on the project ends up being cached.
Django-cachalot is especially efficient in the Django administration website
since it's unfortunately badly optimised (use foreign keys in ``list_editable``
if you need to be convinced).
However, it's not suited for projects where there is **a high number
of modifications per minute** on each table, like a social network with
more than 30 messages per minute. Django-cachalot may still give a small
speedup in such cases, but it may also slow things down a bit
(in the worst case scenario, a 20% slowdown,
according to :ref:`the benchmark <Benchmark>`).
If you have a website like that, optimising your SQL database and queries
is the number one thing you have to do.
There is also an obvious case where you don't need django-cachalot:
when the project is already fast enough (all pages load in less than 300 ms).
Like any other dependency, django-cachalot is a potential source of problems
(even though it's currently bug free).
Don't use dependencies you can avoid; a “future you” may thank you for that.
Features
........
- **Saves in cache the results of any SQL query** generated by the Django ORM
that reads data. These saved results are then returned instead
of executing the same SQL query, which is faster.
- The first execution of a query is about 10% slower, but the following
executions are much faster (7× faster on average).
- Automatically invalidates saved results,
so that **you never get stale results**.
- **Invalidates per table, not per object**: if you change an object,
all the queries done on other objects of the same model are also invalidated.
It is unfortunately technically impossible to make a reliable
per-object cache. Don't be fooled by packages that claim to have
that per-object feature: they are unreliable and dangerous for your data.
- **Handles everything in the ORM**. You can use the most advanced features
from the ORM without a single issue; django-cachalot is extremely robust.
- Easy control thanks to :ref:`settings` and :ref:`a simple API <API>`.
But that's only required if you have a complex infrastructure. Most people
will never use settings or the API.
- A few bonus features like
:ref:`a signal triggered at each database change <Signal>`
(including bulk changes) and
:ref:`a template tag for a better template fragment caching <Template tag>`.
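
To make the per-table invalidation from the feature list concrete, here is a minimal sketch in plain Python. It is not django-cachalot's actual code: the ``PerTableCache`` class, its method names and the logical clock are assumptions made for illustration. Each cached result remembers when it was stored, each table remembers when it was last written, and a result is reused only if none of its tables changed in between.

```python
from itertools import count


class PerTableCache:
    """Toy per-table query cache (hypothetical, for illustration only)."""

    def __init__(self):
        self._results = {}      # SQL string -> (logical time stored, rows)
        self._invalidated = {}  # table name -> logical time of last write
        self._clock = count(1)  # logical clock, avoids wall-clock ties

    def invalidate(self, table):
        # Called after any write to `table`: every query using it becomes stale.
        self._invalidated[table] = next(self._clock)

    def get_or_run(self, sql, tables, run_query):
        cached = self._results.get(sql)
        if cached is not None:
            stored_at, rows = cached
            # Reuse the result only if no involved table changed since storage.
            if all(self._invalidated.get(t, 0) < stored_at for t in tables):
                return rows
        rows = run_query()
        self._results[sql] = (next(self._clock), rows)
        return rows


executions = []


def run_query():
    executions.append('SELECT')  # stands in for a real database round trip
    return ['alice', 'bob']


cache = PerTableCache()
sql = 'SELECT name FROM auth_user'
cache.get_or_run(sql, ['auth_user'], run_query)  # miss: runs the query
cache.get_or_run(sql, ['auth_user'], run_query)  # hit: served from the cache
cache.invalidate('auth_user')                    # any write to the table
cache.get_or_run(sql, ['auth_user'], run_query)  # miss again: runs the query
```

Changing a single row invalidates every cached query on that table, which is the trade-off described above: some extra cache misses in exchange for never returning stale rows.
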
Comparison with similar tools
.............................
This comparison was done in October 2015. It compares django-cachalot
to the other popular automatic ORM caches at the moment:
`django-cache-machine <https://github.com/django-cache-machine/django-cache-machine>`_
& `django-cacheops <https://github.com/Suor/django-cacheops>`_.
Features
~~~~~~~~
======================================================== ========= ============= =========
Feature                                                  cachalot  cache-machine cacheops
======================================================== ========= ============= =========
Type of invalidation                                     per table per object    per table
CPU & memory performance                                 optimal   bad           terrible
Easy to install                                          ✔         ✘             quite
Cache agnostic                                           ✔         ✔             ✘
Reliable                                                 ✔         ✘             quite
Handles ``QuerySet.count``                               ✔         ✘             ✔
Handles empty queries                                    ✔         ✘             ✔
Handles multi-table inheritance                          ✔         probably not  ✘
Handles proxy models                                     ✔         ✘             ✔
Handles many-to-many fields                              ✔         ✘             ✔
Handles transactions                                     ✔         probably not  ✘
Handles ``QuerySet.aggregate``/``annotate``              ✔         probably not  ✘
Handles ``QuerySet.bulk_create``/``update``/``delete``   ✔         probably not  ✘
Handles ``QuerySet.select_related``/``prefetch_related`` ✔         partially     ✘
Handles ``cursor.execute``                               ✔         ✘             ✘
Handles GeoDjango                                        ✔         maybe         ✔
Handles django.contrib.postgres                          ✔         maybe         partially
======================================================== ========= ============= =========
To find out whether a package supports a feature, I searched the documentation,
the issues, the tests and the code.
I really tried to avoid writing “maybe”, “probably not”, etc.
Unfortunately, the absence of tests for such cases and sometimes the confusion
of the authors themselves about these features make it difficult to know
whether they support a feature or not.
Explanations
~~~~~~~~~~~~
Of course, I can't just throw in a table with lines such as
“Reliable” and “CPU & memory performance” without explanation.
My goal is not to start another stupid open source conflict, nor
to be pretentious about my work. I'm just trying to inform users here, so they
can fully grasp the consequences of using one tool or another.
I actually used django-cache-machine in production for a week
and django-cacheops for a month. With both solutions, I faced a lot
of invalidation issues, and the bigger the cache became,
the worse the performance was.
I now know the reason for these issues: in short, they come from
the invalidation systems of these tools. Read the following paragraphs for more detail.
django-cache-machine
''''''''''''''''''''
django-cache-machine uses “flush lists” to remember which SQL queries are
linked to which objects. This is the approach I chose when I created
a prototype of django-cachalot, except that it invalidated per table,
not per object like django-cache-machine does. Unfortunately, this approach
has several important issues that led me to drop it.
The smallest issue is that each time you execute a new SQL query,
django-cache-machine needs to fetch the “flush list” from the cache,
update it and add it back to the cache. This means we have to make two
cache calls in addition to the cache call that stores the SQL query results.
It may seem tiny, but when your cache size increases,
the “flush lists” start becoming huge (a list of hundreds of cache keys
for each database object), leading to a fast-growing cache size
and a longer time to fetch the ever-growing “flush list”.
So **bad memory and CPU usage**.
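
The bookkeeping cost described above can be sketched with a toy in-memory cache that counts round trips. This is only an illustration under stated assumptions, not django-cache-machine's code: ``CountingCache``, ``store_with_flush_list`` and the ``flush:`` key prefix are hypothetical names.

```python
class CountingCache(dict):
    """In-memory stand-in for a cache backend that counts round trips."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def cache_get(self, key, default=None):
        self.calls += 1
        return self.get(key, default)

    def cache_set(self, key, value):
        self.calls += 1
        self[key] = value


def store_with_flush_list(cache, obj_key, query_key, rows):
    """Store query results and register them in the object's flush list."""
    cache.cache_set(query_key, rows)                       # call 1: the results
    flush_key = 'flush:' + obj_key
    flush_list = cache.cache_get(flush_key, [])            # call 2: fetch the list
    cache.cache_set(flush_key, flush_list + [query_key])   # call 3: write it back


cache = CountingCache()
for i in range(100):
    # 100 different queries that all involve the same database object.
    store_with_flush_list(cache, 'book:1', 'query:%d' % i, ['row'])

total_calls = cache.calls
flush_list_size = len(cache['flush:book:1'])
```

In this sketch every new query costs three cache calls instead of one, and the flush list that must be transferred on each of them keeps growing with the number of cached queries.
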
The second issue is specific to per-object invalidation.
When django-cache-machine invalidates an object, it also needs to invalidate
the queries of the related objects, otherwise they may contain stale data.
Django-cache-machine invalidates foreign keys only, not many-to-many
or generic foreign keys (because… I don't know). This degrades performance
on each write operation to the database, because it needs to fetch
related objects, fetch “flush lists” and delete these cache keys.
And of course it can't invalidate basic queries such as count or empty queries
(probably aggregations too, but I'm not sure).
Last but not least, there is a critical issue. It simply proves that the
django-cache-machine team **doesn't know how caches work**.
Caches are fast because they are stupid: when your cache is full and
needs room, it randomly fetches a few keys, selects the oldest ones if possible,
then deletes them. This means that **a cache key with a 1 year timeout
can be deleted before a cache key with a 1 minute timeout**.
But django-cache-machine assumes its “flush lists” will always stay longer
in cache than the saved query results will, because they have the same timeout
and “flush lists” are saved a few milliseconds after query results.
Until the cache is full, this is kind of true because no cache key is deleted.
But when it is full, the “flush list” can be removed at any moment,
so the query results it referenced are no longer invalidated: they stay stale
until they expire or are evicted themselves.
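
The failure mode described above can be reproduced with a plain dictionary standing in for the cache. This is a toy sketch, not django-cache-machine's code: the ``invalidate`` helper and the key names are hypothetical, and the eviction is simulated by deleting the flush list by hand.

```python
def invalidate(cache, obj_key):
    """Delete every query result listed in the object's flush list."""
    for query_key in cache.pop('flush:' + obj_key, []):
        cache.pop(query_key, None)


# Normal case: the flush list is still there, so invalidation works.
healthy = {'query:1': ['old title'], 'flush:book:1': ['query:1']}
invalidate(healthy, 'book:1')
result_after_invalidation = healthy.get('query:1')

# Full cache: the backend may evict any key, whatever its timeout.
# Here the flush list is evicted while the query result survives.
full = {'query:1': ['old title'], 'flush:book:1': ['query:1']}
del full['flush:book:1']

# The book is then modified in the database, which triggers invalidation,
# but there is no flush list left to say which results to delete.
invalidate(full, 'book:1')
stale_result = full.get('query:1')
```

In the second scenario the cached rows still show the old title after the write, and nothing will remove them until the cache itself drops that key.
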
**To sum up, django-cache-machine has bad memory and CPU performance
and is absolutely not reliable.**
django-cacheops
'''''''''''''''
django-cacheops uses
`a debug feature from Redis, KEYS, <http://redis.io/commands/KEYS>`_
to invalidate cache keys (that's why it only supports Redis).
It's a feature that becomes linearly slower as your cache size grows.
I measured that a single call of this command by django-cacheops
slows down any database save by 50 ms to 3.5 seconds,
depending on your database and cache sizes.
The problem is also that django-cacheops runs this command several times
at each save. Suppose you have a model with 3 many-to-many fields. Suppose you save
an object with 3 related objects in each many-to-many field. django-cacheops will run
the Redis ``KEYS`` command at least 10 times! If you have
a large cache and database, it means **you can wait 30 seconds
while this object is saved!**
Another bad consequence of that use of the ``KEYS`` command is that Redis jumps
to 100% CPU usage when the command is running, degrading performance for
other users or even blocking them until the command is finished.
More generally, the workflow of django-cacheops is totally unoptimised.
When an object is modified, an ``invalidate_obj`` function is called,
calling an ``invalidate_dict`` function, calling the ``manage.py invalidate``
command with a serialized version of the object (yes!)
calling an ``invalidate_model`` function that calls the Redis ``KEYS`` command
to get all the cache keys from that model then delete them.
And as I said above, it executes all that N times,
N being the number of related objects to the current object,
even though multiple objects have the same model and we therefore
don't need to invalidate the model multiple times.
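
The arithmetic behind “at least 10 times” can be written out as a short sketch. It only counts invalidations and does not reproduce django-cacheops code; the function names and the ``Book``, ``Author``, ``Tag`` and ``Shop`` models are hypothetical, and it assumes each many-to-many field points to a different model.

```python
def invalidations_per_object(related_models):
    """One invalidation for the saved object plus one per related object."""
    return 1 + len(related_models)


def invalidations_per_model(own_model, related_models):
    """One invalidation per distinct model involved in the save."""
    return len({own_model, *related_models})


# A Book with 3 many-to-many fields and 3 related objects in each field.
related = ['Author'] * 3 + ['Tag'] * 3 + ['Shop'] * 3

per_object = invalidations_per_object(related)        # 1 + 9 = 10
per_model = invalidations_per_model('Book', related)  # Book, Author, Tag, Shop = 4
```

Under these assumptions, grouping related objects by model before invalidating would bring the count from 10 down to 4 for the same save.
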
**To sum up, django-cacheops has terrible performance,
but is reliable for what it handles.
If you set it up correctly and never use some features such as
transactions (used by Django admin),
multi-table inheritance, or
raw queries (the three features being used by Wagtail and django CMS),
you're good to go.**
Number of lines of code
~~~~~~~~~~~~~~~~~~~~~~~
Django-cachalot tries to be as minimalist as possible, while handling most
use cases. Being minimalist is essential to create maintainable projects,
and having a large test suite is essential to achieve excellent quality.
The statistics below speak for themselves…
============ ======== ============= ========
Project part cachalot cache-machine cacheops
============ ======== ============= ========
Application  743      843           1662
Tests        3023     659           1491
============ ======== ============= ========


@ -44,6 +44,7 @@ memcached. If you use Ubuntu and installed the package, you can modify
per cache key to 10 MB, and if you want increase the already existing ``-m 64``
to something like ``-m 1000`` to set the maximum cache size to 1 GB.
.. _Locmem:
Locmem
......
@ -62,6 +63,8 @@ If you use range fields from `django.contrib.postgres` and your Django
version is affected by this bug, you need to add the tables using range fields
to :ref:`CACHALOT_UNCACHABLE_TABLES`.
.. _MySQL:
MySQL
.....
@ -72,7 +75,7 @@ Django-cachalot will slow down your queries if that query cache is enabled.
If it's not enabled, django-cachalot will make queries much faster.
But you should probably enable the query cache instead.
.. _Raw queries limits:
.. _Raw SQL queries:
Raw SQL queries
...............


@ -1,33 +1,6 @@
Quick start
-----------
Should you use it?
..................
Django-cachalot is the perfect speedup tool for most Django projects.
It will speed up a website with 100 000 visits per month without any problem.
In fact, **the more visitors you have, the faster the website becomes**.
That's because every possible SQL query on the project ends up being cached.
Django-cachalot is especially efficient in the Django administration website
since it's unfortunately badly optimised (use foreign keys in ``list_editable``
if you need to be convinced).
However, it's not suited for projects where there is **a high number
of modifications per minute** on each table, like a social network with
more than 30 messages per minute. Django-cachalot may still give a small
speedup in such cases, but it may also slow things down a bit
(in the worst case scenario, a 20% slowdown,
according to :ref:`the benchmark <Benchmark>`).
If you have a website like that, optimising your SQL database and queries
is the number one thing you have to do.
There is also an obvious case where you don't need django-cachalot:
when the project is already fast enough (all pages load in less than 300 ms).
Like any other dependency, django-cachalot is a potential source of problems
(even though it's currently bug free).
Don't use dependencies you can avoid; a “future you” may thank you for that.
Requirements
............
@ -40,14 +13,14 @@ Requirements
(using either python-memcached or pylibmc)
- `filebased <https://docs.djangoproject.com/en/1.7/topics/cache/#filesystem-caching>`_
- `locmem <https://docs.djangoproject.com/en/1.7/topics/cache/#local-memory-caching>`_
(but it's not shared between processes, see :ref:`Limits`)
(but it's not shared between processes, see :ref:`locmem limits <Locmem>`)
- one of these databases:
- PostgreSQL
- SQLite
- MySQL (but you probably don't need django-cachalot in this case,
see :ref:`Limits`)
see :ref:`MySQL limits <MySQL>`)
Usage
.....
@ -56,17 +29,19 @@ Usage
#. Add ``'cachalot',`` to your ``INSTALLED_APPS``
#. If you use multiple servers with a common cache server,
:ref:`double check their clock synchronisation <multiple servers>`
#. If you modify data outside Django,
typically after restoring a SQL database, run
``./manage.py invalidate_cachalot``
#. Be aware of :ref:`the few other limits <limits>`
#. If you use
`django-debug-toolbar <https://github.com/django-debug-toolbar/django-debug-toolbar>`_,
you can add ``'cachalot.panels.CachalotPanel',``
to your ``DEBUG_TOOLBAR_PANELS``
#. If you need to invalidate all django-cachalot cache keys from an external script,
typically after restoring a SQL database, simply run
``./manage.py invalidate_cachalot``
#. Enjoy!
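
As an illustration of the settings steps above, a ``settings.py`` fragment could look like the following sketch. The surrounding apps and panels are placeholders for whatever your project already lists, and the ``DEBUG_TOOLBAR_PANELS`` part only applies if django-debug-toolbar is installed and you declare its panels explicitly.

```python
# settings.py (fragment)

INSTALLED_APPS = [
    # ... your existing apps ...
    'cachalot',
]

# Optional: only relevant when django-debug-toolbar is in use.
DEBUG_TOOLBAR_PANELS = [
    # ... your existing panels ...
    'cachalot.panels.CachalotPanel',
]
```
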
.. _Settings:
Settings
........
@ -100,7 +75,7 @@ Settings
:Default: ``True``
:Description: If set to ``False``, disables automatic invalidation on raw
SQL queries; read :ref:`Raw queries limits` for more info
SQL queries; read :ref:`raw queries limits <Raw SQL queries>` for more info
``CACHALOT_ONLY_CACHABLE_TABLES``
@ -168,6 +143,8 @@ For example:
settings.CACHALOT_ENABLED = False
.. _Template tag:
Template tag
............
@ -215,6 +192,8 @@ are also available (see
:meth:`cachalot.api.get_last_invalidation`).
.. _Signal:
Signal
......


@ -2,7 +2,7 @@ What still needs to be done
---------------------------
- Cache raw queries (may not be possible due to database cursors
being written in C)
being written in C)
- Test multi-location caches if possible
- Allow setting `CACHALOT_CACHE` to `None` in order to disable django-cachalot
persistence. SQL queries would only be cached during transactions, so setting