This repository has been archived on 2023-02-21. You can view files and clone it, but cannot push or open issues or pull requests.
bistro/stackexchange/README

63 lines
1.9 KiB
Plaintext

this app's function will be to:
* install it's own tables (#todo: not yet automated)
* read SE xml dump into DjangoDB (automated)
* populate askbot database (automated)
* remove SE tables (#todo: not done yet)
Current process to load SE data into Askbot is:
==============================================
1) backup database
2) unzip SE dump into dump_dir (any directory name)
you may want to make sure that your dump directory in .gitignore file
so that you don't publish it by mistake
3) enable 'stackexchange' in the list of installed apps (probably aready in settings.py)
4) (optional - create models.py for SE, which is included anyway) run:
#a) run in-place removal of xml namspace prefix to make parsing easier
perl -pi -w -e 's/xs://g' $SE_DUMP_PATH/xsd/*.xsd
cd stackexchange
python parse_models.py $SE_DUMP_PATH/xsd/*.xsd > models.py
5) Install stackexchange models (as well as any other missing models)
python manage.py syncdb
6) make sure that badges are installed
if not, run (example for mysql):
mysql -u user -p dbname < sql_scripts/badges.sql
7) load SE data:
python manage.py load_stackexchange dump_dir
if anything doesn't go right - run 'python manage.py flush' and repeat
steps 6 and 7
NOTES:
============
Here is the load script that I used for the testing
it assumes that SE dump has been unzipped inside the tmp directory
#!/bin/sh$
python manage.py flush
#delete all data
mysql -u askbot -p aksbot < sql_scripts/badges.sql
python manage.py load_stackexchange tmp
Untested parts are tagged with comments starting with
#todo:
The test set did not have all the usage cases of StackExchange represented so
it may break with other sets.
The job takes some time to run, especially
content revisions and votes - may be optimized
Some of the fringe cases are described in file stackexchange/ANOMALIES