TODO

clean html
consistent schema (_book _metadata)
 ? /usr/bin/convert ?
check pdfminer for better text-extraction (whitespace)


------------------------
Create a virtualenv
   virtualenv booksearch_env
   cd booksearch_env
   # activate this virtualenv
   . ./bin/activate
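
To confirm the activation worked, a small Python check can help. This is
only a sketch and not part of booksearch; the function name in_virtualenv
is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Old virtualenv releases set sys.real_prefix; newer virtualenv and the
    # stdlib venv module make sys.base_prefix differ from sys.prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```

Run it with the python from the activated shell; it should report True.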

Check that /usr/bin/convert exists (typically provided by ImageMagick; web.py runs it via Popen)
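
The check can also be scripted. The sketch below is not taken from web.py;
find_convert is a hypothetical helper that first looks at /usr/bin/convert
and then falls back to searching PATH.

```python
import os
import shutil


def find_convert(preferred="/usr/bin/convert"):
    """Return a path to an executable `convert`, or None if none is found."""
    # First try the fixed location named in this README.
    if os.path.isfile(preferred) and os.access(preferred, os.X_OK):
        return preferred
    # Otherwise search the directories listed in PATH.
    return shutil.which("convert")


if __name__ == "__main__":
    path = find_convert()
    print(path if path else "convert not found")
```

If convert is only found somewhere other than /usr/bin/convert, check
whether the path used in web.py needs adjusting.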

Install dependencies
   easy_install pip
   pip install whoosh
   pip install pypdf
   pip install flask

Clone the repository
   git clone http://xapek.org/~yvesf/repos/booksearch.git
   cd booksearch

Create index
   python indexer.py ~/my_books
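
Before indexing a large directory it can be useful to preview which files
are candidates. The sketch below is not indexer.py. It assumes, without
confirmation from the source, that the indexer walks the directory
recursively and only cares about .pdf files; find_pdfs is a hypothetical
helper.

```python
import os


def find_pdfs(root):
    """Yield paths of all .pdf files below root (case-insensitive suffix)."""
    root = os.path.expanduser(root)  # allow arguments like ~/my_books
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in find_pdfs("~/my_books"):
        print(path)
```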

Test index
  python query.py
    query> test
    Term('content', 'test')
    Match in /home/XXXXX6-4.pdf
    Match in /home/XXXXX6-4.pdf
    2 results
    query> 

Run Webapp
  python web.py
   * Running on http://0.0.0.0:5000/
   * Restarting with reloader...


Check your whoosh version; some versions do not delete their temp directories, see:
    http://bitbucket.org/mchaput/whoosh/issue/48/temp-directories-are-not-deleted-when
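
One way to see which whoosh version is installed is to ask the package
metadata. This is a sketch assuming Python 3.8 or newer (for
importlib.metadata); installed_version is a hypothetical helper. It does
not say which version contains the fix, so compare the result with the
issue linked above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("whoosh:", installed_version("whoosh"))
```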

Using gunicorn
    $ pip install gunicorn
    $ pip install eventlet
    $ gunicorn -w `getconf _NPROCESSORS_ONLN` -b 0.0.0.0:8000 web:app
    # FreeBSD:
    $ gunicorn -w `sysctl -n kern.smp.cpus` -b 0.0.0.0:8000 web:app
    # then open http://localhost:8000 in a browser
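
The two commands above differ only in how they ask the OS for the number
of CPUs. As a portable alternative, the count can be taken from Python.
The sketch below only builds the argument list; gunicorn_command is a
hypothetical helper, and os.cpu_count() may not report exactly the same
number as getconf or sysctl on every system.

```python
import os


def gunicorn_command(bind="0.0.0.0:8000", app="web:app"):
    """Build a gunicorn argument list with one worker per CPU."""
    # os.cpu_count() can return None when the count is unknown; use 1 then.
    workers = os.cpu_count() or 1
    return ["gunicorn", "-w", str(workers), "-b", bind, app]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(gunicorn_command()))
```

The resulting list can be passed to subprocess.call() to start the server.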
