TODO
    clean html
    consistent schema (_book _metadata) ?
    /usr/bin/convert ?
------------------------

Create a virtualenv
    virtualenv booksearch_env
    cd booksearch_env
    # activate this virtualenv
    . ./bin/activate

Check for /usr/bin/convert (see the Popen call in web.py)

Install dependencies
    easy_install pip
    pip install whoosh
    pip install pypdf
    pip install flask
    pip install pdfminer

( Clone )
    git clone http://xapek.org/~yvesf/repos/booksearch.git
    cd booksearch

Create index
    python indexer.py ~/my_books

Test index
    python query.py
    query> test
    Term('content', 'test')
    Match in /home/XXXXX6-4.pdf
    Match in /home/XXXXX6-4.pdf
    2 results
    query>

Run Webapp
    python web.py
     * Running on http://0.0.0.0:5000/
     * Restarting with reloader...

Check your whoosh version (temp directories may not get deleted), see:
    http://bitbucket.org/mchaput/whoosh/issue/48/temp-directories-are-not-deleted-when

Using gunicorn
    $ pip install gunicorn
    $ pip install eventlet
    $ gunicorn -w `getconf _NPROCESSORS_ONLN` -b 0.0.0.0:8000 web:app
    # freebsd:
    $ gunicorn -w `sysctl -n kern.smp.cpus` -b 0.0.0.0:8000 web:app
    # gunicorn then listens on http://localhost:8000
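
Preflight check (optional)
    Two steps in this README depend on the host: the convert binary that
    web.py calls through Popen, and the CPU count passed to gunicorn -w.
    The sketch below shows one way to check both from Python before
    starting the webapp. It is not part of the booksearch repository; the
    function names find_convert and worker_count are made up for this
    example, and it assumes Python 3.3 or newer (for shutil.which).

```python
import os
import shutil


def find_convert(preferred="/usr/bin/convert"):
    """Return a usable path to the convert binary, or None if none is found.

    The preferred path is the one this README mentions. If it is missing
    or not executable, fall back to searching PATH.
    """
    if os.path.isfile(preferred) and os.access(preferred, os.X_OK):
        return preferred
    return shutil.which("convert")


def worker_count():
    """Return a worker count for gunicorn -w, at least 1.

    os.cpu_count() can return None when the count is unknown, so fall
    back to a single worker in that case.
    """
    return os.cpu_count() or 1


if __name__ == "__main__":
    convert = find_convert()
    if convert is None:
        print("convert not found, web.py will probably fail when it calls it")
    else:
        print("convert found at", convert)
    print("suggested gunicorn call: gunicorn -w %d -b 0.0.0.0:8000 web:app"
          % worker_count())
```

    On Linux the value from worker_count() usually matches
    `getconf _NPROCESSORS_ONLN`, but treat it as a starting point rather
    than a tuned setting.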