TODO

clean up the HTML
consistent schema (_book, _metadata)
/usr/bin/convert: is it still needed?
check pdfminer for better text extraction (whitespace handling)



------------------------
Create a virtualenv
   virtualenv booksearch_env
   cd booksearch_env
   # activate this virtualenv
   . ./bin/activate
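
To confirm that the virtualenv is active, the sketch below can be run with the
environment's python. The helper name in_virtualenv is made up for this
example. It assumes that legacy virtualenv sets sys.real_prefix, while venv and
newer virtualenv releases make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtualenv/venv."""
    # Legacy virtualenv (< 20) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```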

Check that /usr/bin/convert (ImageMagick) is installed; web.py calls it via Popen.
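
A quick way to check this from Python is sketched below. The helper
find_convert is hypothetical and not part of booksearch. It tries the
hard-coded path first and then falls back to a PATH lookup. If web.py only uses
/usr/bin/convert, the fallback result is informational only.

```python
import shutil


def find_convert(default="/usr/bin/convert"):
    """Return the path of ImageMagick's convert, or None if it is missing."""
    # With an absolute path, shutil.which only checks that the file
    # exists and is executable.
    if shutil.which(default):
        return default
    # Otherwise look for any "convert" on PATH.
    return shutil.which("convert")


if __name__ == "__main__":
    print("convert:", find_convert() or "NOT FOUND")
```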

Install dependencies
   easy_install pip   # only if pip is missing; virtualenv normally installs it
   pip install whoosh
   pip install pypdf
   pip install flask
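
To check that the dependencies are importable inside the virtualenv, the sketch
below can be used. The import names are an assumption. The old pyPdf package is
imported as "pyPdf" and the newer pypdf package as "pypdf", so either one is
accepted. The helper missing_packages is hypothetical.

```python
import importlib.util

# Package name -> candidate top-level import names (assumed, see above).
REQUIRED = {
    "whoosh": ("whoosh",),
    "flask": ("flask",),
    "pypdf": ("pyPdf", "pypdf"),
}


def missing_packages(required=REQUIRED):
    """Return the package names for which no candidate module can be found."""
    missing = []
    for package, modules in required.items():
        # find_spec returns None for a top-level module that is not installed.
        if not any(importlib.util.find_spec(m) is not None for m in modules):
            missing.append(package)
    return missing


if __name__ == "__main__":
    print("missing:", missing_packages() or "none")
```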

Clone the source (if you don't have it yet)
   git clone http://xapek.org/~yvesf/repos/booksearch.git
   cd booksearch

Create the index (pass the directory that contains your books)
   python indexer.py ~/my_books

Test the index
  python query.py
    query> test
    Term('content', 'test')
    Match in /home/XXXXX6-4.pdf
    Match in /home/XXXXX6-4.pdf
    2 results
    query> 

Run the webapp (Flask development server)
  python web.py
   * Running on http://0.0.0.0:5000/
   * Restarting with reloader...


Check your whoosh version; see this issue about temporary directories not being deleted:
    http://bitbucket.org/mchaput/whoosh/issue/48/temp-directories-are-not-deleted-when
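
The installed version can be printed with the standard library, as sketched
below. The sketch assumes Python 3.8+ for importlib.metadata and assumes the
distribution is registered as "Whoosh". The helper installed_version is
hypothetical.

```python
from importlib import metadata


def installed_version(distribution="Whoosh"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("whoosh version:", installed_version() or "not installed")
```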

Using gunicorn
    $ pip install gunicorn
    $ pip install eventlet
    $ gunicorn -w `getconf _NPROCESSORS_ONLN` -b 0.0.0.0:8000 web:app
    # FreeBSD:
    $ gunicorn -w `sysctl -n kern.smp.cpus` -b 0.0.0.0:8000 web:app
    # then open http://localhost:8000
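
The same settings can also go into a gunicorn config file, which is plain
Python. The sketch below uses the file name gunicorn.conf.py as an example and
loads it with "gunicorn -c gunicorn.conf.py web:app". multiprocessing.cpu_count()
stands in for getconf/sysctl and works on both Linux and FreeBSD. The eventlet
worker line is left commented out as an option, since the commands above install
eventlet but do not select it.

```python
# gunicorn.conf.py: hypothetical config equivalent to the flags above.
import multiprocessing

# Listen on all interfaces, port 8000 (same as -b 0.0.0.0:8000).
bind = "0.0.0.0:8000"

# One worker per CPU, roughly what getconf _NPROCESSORS_ONLN
# or sysctl -n kern.smp.cpus report.
workers = multiprocessing.cpu_count()

# Optional: use the eventlet worker installed above (same as -k eventlet).
# worker_class = "eventlet"
```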
