Quixote Session Management
==========================

HTTP was originally designed as a stateless protocol, meaning that every
request for a document or image was conducted in a separate TCP
connection, and that there was no way for a web server to tell if two
separate requests actually came from the same user. It's no longer
necessarily true that every request is conducted in a separate TCP
connection, but HTTP is still fundamentally stateless. However, there
are many applications where it is desirable or even essential to
establish a "session" for each user, i.e. where all requests performed by
that user are somehow tied together on the server.

HTTP cookies were invented to address this requirement, and they are
still the best solution for establishing sessions on top of HTTP. Thus,
the session management mechanism that comes with Quixote is
cookie-based. (The most common alternative is to embed the session
identifier in the URL. Since Quixote views the URL as a fundamental
part of the web user interface, a URL-based session management scheme is
considered un-Quixotic.)

For further reading: the standard for cookies that is approximately
implemented by most current browsers is RFC 2109; the latest version of
the standard is RFC 2965.

In a nutshell, session management with Quixote works like this:

* when a user agent first requests a page from a Quixote application
  that implements session management, Quixote creates a Session object
  and generates a session ID (a random 128-bit number). The Session
  object is attached to the current HTTPRequest object, so that
  application code involved in processing this request has access to
  the Session object. The quixote.get_session() function provides
  uniform access to the current Session object.

* if, at the end of processing that request, the application code has
  stored any information in the Session object, Quixote saves the
  session in its SessionManager object for use by future requests and
  sends a session cookie, called ``QX_session`` by default, to the user.
  The session cookie contains the session ID encoded as a URL-safe
  base-64 string, and is included in the response headers, e.g. ::

    Set-Cookie: QX_session="pJX1bU47T-6hbfjP2f5pPA"

  (You can instruct Quixote to specify the domain and path for
  URLs to which this cookie should be sent.)

* the user agent stores this cookie for future requests

* the next time the user agent requests a resource that matches the
  cookie's domain and path, it includes the ``QX_session`` cookie
  previously generated by Quixote in the request headers, e.g. ::

    Cookie: QX_session="pJX1bU47T-6hbfjP2f5pPA"

* while processing the request, Quixote decodes the session ID and
  looks up the corresponding Session object in its SessionManager. If
  there is no such session, the session cookie is bogus or
  out-of-date, so Quixote raises SessionError; ultimately the user
  gets an error page. Otherwise, the Session object is made
  available, through the get_session() function, as the application
  code processes the request.

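The ID handling in the steps above can be sketched in a few lines of
standard-library Python. This is only an illustration, not Quixote's own
code: ``make_session_id()``, ``lookup_session()`` and the ``SessionError``
class defined here are hypothetical stand-ins, and a plain dictionary plays
the role of the session manager's mapping.

```python
import base64
import os


class SessionError(Exception):
    """Stand-in for Quixote's SessionError (hypothetical, for this sketch only)."""


def make_session_id():
    """Return 128 random bits as a 22-character URL-safe base-64 string."""
    raw = os.urandom(16)  # 16 bytes == 128 bits
    # 16 bytes encode to 24 characters, the last two of which are '=' padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def lookup_session(sessions, session_id):
    """Return the stored session, or raise SessionError for an unknown ID."""
    try:
        return sessions[session_id]
    except KeyError:
        raise SessionError("bogus or out-of-date session ID: %r" % session_id) from None


sessions = {}                    # plays the role of the session manager's mapping
sid = make_session_id()
sessions[sid] = {"user": None}   # a plain dict stands in for a Session object
```

An ID produced this way has the same shape as the ``pJX1bU47T-6hbfjP2f5pPA``
value shown in the headers above: 22 characters drawn from the URL-safe
base-64 alphabet.
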
There are two caveats to keep in mind before proceeding, one major and
one minor:

* Quixote's standard Session and SessionManager classes do not
  implement any sort of persistence, meaning that all sessions
  disappear when the process handling web requests terminates.
  Thus, session management is completely useless with a plain
  CGI driver script unless you add some persistence to the mix;
  see "Session persistence" below for information.

* Quixote never expires sessions; if you want user sessions to
  be cleaned up after a period of inactivity, you will have to
  write code to do it yourself.


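As a rough idea of what such clean-up code could look like, here is a
minimal sketch that removes idle sessions from a mapping. It assumes each
stored session object carries a numeric ``last_access`` timestamp; that
attribute, the ``expire_sessions()`` helper and the ``FakeSession`` class
are all hypothetical names invented for this example, not part of Quixote.

```python
import time


def expire_sessions(sessions, max_idle_seconds, now=None):
    """Delete sessions idle for longer than max_idle_seconds.

    `sessions` is any mapping of session ID -> object with a numeric
    `last_access` attribute (a hypothetical attribute for this sketch).
    Returns the list of IDs that were removed.
    """
    if now is None:
        now = time.time()
    # Collect first, then delete, so the mapping is not mutated mid-iteration.
    stale = [sid for sid, session in list(sessions.items())
             if now - session.last_access > max_idle_seconds]
    for sid in stale:
        del sessions[sid]
    return stale


class FakeSession:
    """Minimal stand-in for a session object."""

    def __init__(self, last_access):
        self.last_access = last_access


sessions = {"old": FakeSession(last_access=0),
            "new": FakeSession(last_access=950)}
removed = expire_sessions(sessions, max_idle_seconds=600, now=1000)
```

How often to run something like this (on every request, from a periodic
job, and so on) is an application decision.
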
Session management demo
-----------------------

There's a simple demo of Quixote's session management in demo/altdemo.py.
If the durus (http://www.mems-exchange.org/software/durus/) package is
installed, the demo uses a durus database to store sessions, so sessions
will be preserved, even if you are running it with plain CGI.

This particular application uses sessions to keep track of just two
things: the user's identity and the number of requests made in this
session. The first is addressed by Quixote's standard Session class --
every Session object has a ``user`` attribute, which you can use for
anything you like. In the session demo, we simply store a string, the
user's name, which is entered by the user.

Tracking the number of requests is a bit more interesting: from the
DemoSession class in altdemo.py::

    def __init__ (self, id):
        Session.__init__(self, id)
        self.num_requests = 0

    def start_request (self):
        Session.start_request(self)
        self.num_requests += 1

When the session is created, we initialize the request counter; and
when we start processing each request, we increment it. Using the
session information in the application code is simple. If you want the
value of the user attribute of the current session, just call
get_user(). If you need other attributes (such as ``num_requests`` in
the demo) or methods of the current Session instance, use
get_session() to get it.

Note that the Session class initializes the user attribute to None,
so get_user() will return None if no user has been identified for
this session. Application code can use this to change behavior,
as in the following::

    if not get_user():
        content += htmltext('<p>%s</p>' % href('login', 'login'))
    else:
        content += htmltext(
            '<p>Hello, %s.</p>') % get_user()
        content += htmltext('<p>%s</p>' % href('logout', 'logout'))


Note that we must quote the user's name, because they are free to enter
anything they please, including special HTML characters like ``&`` or
``<``. That is why the greeting applies ``%`` to an htmltext template:
values substituted into an htmltext instance are quoted.

Of course, ``session.user`` will never be set if we don't set it
ourselves. The code that processes the login form is just this (from
``login()`` in ``demo/altdemo.py``) ::

    if get_field("name"):
        session = get_session()
        session.set_user(get_field("name")) # This is the important part.

This is obviously a very simple application -- we're not doing any
verification of the user's input. We have no user database, no
passwords, and no limitations on what constitutes a "user name". A real
application would have all of these, as well as a way for users to add
themselves to the user database -- i.e. register with your web site.


Configuring the session cookie
------------------------------

Quixote allows you to configure several aspects of the session cookie
that it exchanges with clients. First, you can set the name of the
cookie; this is important if you have multiple independent Quixote
applications running on the same server. For example, the config file
for the first application might have ::

    SESSION_COOKIE_NAME = "foo_session"

and the second application might have ::

    SESSION_COOKIE_NAME = "bar_session"

Next, you can use ``SESSION_COOKIE_DOMAIN`` and ``SESSION_COOKIE_PATH``
to set the cookie attributes that control which requests the cookie is
included with. By default, these are both ``None``, which instructs
Quixote to send the cookie without ``Domain`` or ``Path`` qualifiers.
For example, if the client requests ``/foo/bar/`` from
www.example.com, and Quixote decides that it must set the session
cookie in the response to that request, then the server would send ::

    Set-Cookie: QX_session="pJX1bU47T-6hbfjP2f5pPA"

in the response headers. Since neither a domain nor a path was specified
with that cookie, the browser will only include the cookie with requests
to www.example.com for URIs that start with ``/foo/bar/``.

If you want to ensure that your session cookie is included with all
requests to www.example.com, you should set ``SESSION_COOKIE_PATH`` in your
config file::

    SESSION_COOKIE_PATH = "/"

which will cause Quixote to set the cookie like this::

    Set-Cookie: QX_session="pJX1bU47T-6hbfjP2f5pPA"; Path="/"

which will instruct the browser to include that cookie with *all*
requests to www.example.com.

However, think carefully about what you set ``SESSION_COOKIE_PATH`` to
-- e.g. if you set it to "/", but all of your Quixote code is under "/q/"
in your server's URL-space, then your users' session cookies could be
unnecessarily exposed. On shared servers where you don't control all of
the code, this is especially dangerous; be sure to use (e.g.) ::

    SESSION_COOKIE_PATH = "/q/"

on such servers. The trailing slash is important; without it, your
session cookies will be sent to URIs like ``/qux`` and ``/qix``, even if
you don't control those URIs.

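The effect of the trailing slash is easy to see with a toy model of the
matching rule. The ``cookie_sent_for()`` function below is a hypothetical
helper that implements only the plain prefix comparison described above; it
is not browser code, and browsers that follow the later RFC 6265 compare
whole path segments instead, so treat this as an illustration of the
reasoning rather than of any particular client.

```python
def cookie_sent_for(cookie_path, request_path):
    """Simplified model: the cookie is sent when the request path
    starts with the cookie's Path attribute (plain prefix match)."""
    return request_path.startswith(cookie_path)


# Compare Path="/q" with Path="/q/" for a few request paths.
for request_path in ("/q/login", "/qux", "/qix"):
    print(request_path,
          cookie_sent_for("/q", request_path),
          cookie_sent_for("/q/", request_path))
```

Under this model ``/q`` matches all three request paths, while ``/q/``
matches only ``/q/login``.
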
If you want to share the cookie across servers in your domain,
e.g. www1.example.com and www2.example.com, you'll also need to set
``SESSION_COOKIE_DOMAIN``::

    SESSION_COOKIE_DOMAIN = ".example.com"

Finally, note that the ``SESSION_COOKIE_*`` configuration variables
*only* affect Quixote's session cookie; if you set your own cookies
using the ``HTTPResponse.set_cookie()`` method, then the cookie sent to
the client is completely determined by that ``set_cookie()`` call.

See RFCs 2109 and 2965 for more information on the rules browsers are
supposed to follow for including cookies with HTTP requests.


Writing the session class
-------------------------

You will almost certainly have to write a custom session class for your
application by subclassing Quixote's standard Session class. Every
custom session class has two essential responsibilities:

* initialize the attributes that will be used by your application

* override the ``has_info()`` method, so the session manager knows when
  it must save your session object

The first one is fairly obvious and just good practice. The second is
essential, and not at all obvious. The ``has_info()`` method exists because
SessionManager does not automatically hang on to all session objects;
this is a defense against clients that ignore cookies, making your
session manager create lots of session objects that are just used once.
As long as those session objects are not saved, the burden imposed by
these clients is not too bad -- at least they aren't sucking up your
memory, or bogging down the database that you save session data to.
Thus, the session manager uses ``has_info()`` to know if it should hang on
to a session object or not: if a session has information that must be
saved, the session manager saves it and sends a session cookie to the
client.

For development/testing work, it's fine to say that your session objects
should always be saved::

    def has_info (self):
        return True

The opposite extreme is to forget to override ``has_info()`` altogether,
in which case session management most likely won't work: unless you
tickle the Session object such that the base ``has_info()`` method
returns true, the session manager won't save the sessions that it
creates, and Quixote will never drop a session cookie on the client.

In a real application, you need to think carefully about what data to
store in your sessions, and how ``has_info()`` should react to the
presence of that data. If you try to track something about every
single visitor to your site, sooner or later a broken or malicious
client that ignores cookies and ``robots.txt`` will
come along and crawl your entire site, wreaking havoc on your Quixote
application (or the database underlying it).


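Between "always save" and "never save" sits a ``has_info()`` that looks at
the data the session actually holds. The sketch below shows one way that
might look. The ``Session`` class here is a simplified stand-in written for
this example (it assumes the base ``has_info()`` only counts an identified
user), and ``CartSession`` with its ``cart`` attribute is hypothetical.

```python
class Session:
    """Simplified stand-in for Quixote's Session class (not the real one)."""

    def __init__(self, id):
        self.id = id
        self.user = None

    def has_info(self):
        # Assumption for this sketch: only an identified user counts as info.
        return self.user is not None


class CartSession(Session):
    """Hypothetical session that also carries a shopping cart."""

    def __init__(self, id):
        Session.__init__(self, id)
        self.cart = []

    def has_info(self):
        # Worth saving once the cart is non-empty, or whenever the base
        # class already considers the session worth saving.
        return bool(self.cart) or Session.has_info(self)


anonymous = CartSession("a")      # a visitor who has done nothing yet
shopper = CartSession("b")
shopper.cart.append("widget")     # now there is something to remember
```

With this definition a crawler that never adds anything to a cart never
causes a session to be stored, while a real shopper gets a cookie as soon
as the cart has content.
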
Session persistence
-------------------

Keeping session data across requests is all very nice, but in the real
world you want that data to survive across process termination. With
CGI, this is essential, since each process serves exactly one request
and then terminates. With other execution mechanisms, though, it's
still important -- you don't want to lose all your session data just
because your long-lived server process was restarted, or your server
machine was rebooted.

However, every application is different, so Quixote doesn't provide any
built-in mechanism for session persistence. Instead, it provides a
number of hooks, most in the SessionManager class, that let you plug in
your preferred persistence mechanism.

The first and most important hook is in the SessionManager
constructor: you can provide an alternate mapping object that
SessionManager will use to store session objects in. By default,
SessionManager uses an ordinary dictionary; if you provide a mapping
object that implements persistence, then your session data will
automatically persist across processes.

The second hook (two hooks, really) applies if you use a transactional
persistence mechanism to provide your SessionManager's mapping. The
``altdemo.py`` script does this with Durus, if the durus package is
installed, but you could also use ZODB or a relational database for
this purpose. The hooks make sure that session (and other) changes
get committed or aborted at the appropriate times. SessionManager
provides two methods for you to override: ``forget_changes()`` and
``commit_changes()``. ``forget_changes()`` is called by
SessionPublisher whenever a request crashes, i.e. whenever your
application raises an exception other than PublishError.
``commit_changes()`` is called for requests that complete
successfully, or that raise a PublishError exception. You'll have to
use your own SessionManager subclass if you need to take advantage of
these hooks for transactional session persistence.

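The shape of such a subclass can be sketched as follows. Everything here is
a stand-in so that the example runs on its own: the ``SessionManager`` base
class is a simplified placeholder rather than Quixote's, the ``session``
argument to the two hooks is an assumption, and ``RecordingConnection`` is
a hypothetical transactional store that merely records the calls it
receives.

```python
class SessionManager:
    """Simplified stand-in for Quixote's SessionManager (not the real class)."""

    def forget_changes(self, session):
        pass

    def commit_changes(self, session):
        pass


class RecordingConnection:
    """Hypothetical transactional store that records what it is asked to do."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def abort(self):
        self.calls.append("abort")


class TransactionalSessionManager(SessionManager):
    def __init__(self, connection):
        self.connection = connection

    def forget_changes(self, session):
        # The request crashed: discard everything changed during it.
        self.connection.abort()

    def commit_changes(self, session):
        # The request completed (or raised PublishError): keep the changes.
        self.connection.commit()


connection = RecordingConnection()
manager = TransactionalSessionManager(connection)
manager.commit_changes(session=None)   # what a publisher would do after a good request
manager.forget_changes(session=None)   # what it would do after a crash
```

In a real application the connection would be whatever object your
persistence layer provides for committing and aborting transactions.
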
The third available hook is the Session's ``is_dirty()`` method. This is
used when your mapping class uses a more primitive storage mechanism,
such as the standard ``shelve`` module, which provides a
mapping object on top of a DBM or Berkeley DB file::

    import shelve
    sessions = shelve.open("/tmp/quixote-sessions")
    session_manager = SessionManager(session_mapping=sessions)

If you use one of these relatively simple persistent mapping types,
you'll also need to override ``is_dirty()`` in your Session class.
That's in addition to overriding ``has_info()``, which determines if a
session object is *ever* saved; ``is_dirty()`` is only called on
sessions that have already been added to the session mapping, to see
if they need to be "re-added". The default implementation always
returns false, because once an object has been added to a normal
dictionary, there's no need to add it again. However, with simple
persistent mapping types like shelve, you need to store the object
again each time it changes. Thus, ``is_dirty()`` should return true
if the session object needs to be re-written. For a simple, naive,
but inefficient implementation, making ``is_dirty()`` an alias for
``has_info()`` will work -- that just means that once the session has
been written once, it will be re-written on every request.

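The reason a shelve-backed mapping needs the object to be stored again can
be demonstrated with the standard library alone. In this sketch a plain
dictionary stands in for a session object and the shelf lives in a
temporary directory; the key ``"abc"`` and the variable names are
arbitrary choices for the example.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sessions")

# Store a "session" (a plain dict here) for the first time.
with shelve.open(path) as sessions:
    sessions["abc"] = {"num_requests": 1}

# Change the object fetched from the shelf, but do not store it again.
with shelve.open(path) as sessions:
    session = sessions["abc"]       # this is an unpickled copy
    session["num_requests"] += 1    # only the copy changes

with shelve.open(path) as sessions:
    after_mutation_only = sessions["abc"]["num_requests"]

# Change it and store it again, which is what a true is_dirty() asks for.
with shelve.open(path) as sessions:
    session = sessions["abc"]
    session["num_requests"] += 1
    sessions["abc"] = session

with shelve.open(path) as sessions:
    after_storing_again = sessions["abc"]["num_requests"]
```

The first change is lost because the shelf hands back a copy of the stored
object; only assigning the object back under its key writes the new state
to disk.
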