Extending Dokang

Dokang currently supports a single backend: Whoosh, which handles both indexing and searching. Dokang does not yet make it easy to use another backend such as Elasticsearch. Contributions are welcome.

You can, however, add your own harvester. A harvester is responsible for retrieving data (title and content) from a document. Dokang provides a few harvesters, and you may implement others.

A harvester should be a subclass of dokang.harvesters.Harvester and implement a harvest_file(path) method that returns a dictionary with the following keys:

title
    The title of the document.
content
    The concatenated content of the document.
kind
    The kind of document: HTML, PDF, etc.
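
When writing a harvester, it can help to check its return value against the keys listed above. The following is only a minimal sketch and is not part of Dokang: the names REQUIRED_KEYS and check_harvest_result are hypothetical and introduced here for illustration.

```python
# Hypothetical helper, not part of Dokang: it only checks that the
# dictionary returned by harvest_file() has the three documented keys.
REQUIRED_KEYS = ('title', 'content', 'kind')


def check_harvest_result(result):
    """Return the required keys that are missing from ``result``."""
    if not isinstance(result, dict):
        raise TypeError('harvest_file() should return a dictionary')
    return [key for key in REQUIRED_KEYS if key not in result]
```

An empty list means that all three keys are present. The helper does not inspect the values themselves.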

Here is an example of a simple harvester for text files.

import os

from dokang.harvesters import Harvester

class TextHarvester(Harvester):

    def harvest_file(self, path):
        with open(path, encoding='utf-8') as fp:
            return {
                'title': os.path.basename(path),  # Use the filename as the title
                'content': fp.read(),  # The whole file, decoded as UTF-8
                'kind': 'TXT',
            }
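
To try such a harvester without running Dokang, one option is to call harvest_file() directly on a temporary file. The sketch below is self-contained and rests on two assumptions: it defines a placeholder Harvester class so that it runs without Dokang installed (with Dokang, you would import dokang.harvesters.Harvester instead), and it assumes that the harvester can be instantiated without arguments, which this page does not confirm. The harvest_sample function is a hypothetical helper.

```python
import os
import tempfile


class Harvester:
    """Placeholder standing in for dokang.harvesters.Harvester (assumption)."""


class TextHarvester(Harvester):

    def harvest_file(self, path):
        with open(path, encoding='utf-8') as fp:
            return {
                'title': os.path.basename(path),  # Use the filename as the title
                'content': fp.read(),
                'kind': 'TXT',
            }


def harvest_sample(text):
    """Write ``text`` to a temporary .txt file and harvest it (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'sample.txt')
        with open(path, 'w', encoding='utf-8') as fp:
            fp.write(text)
        # Assumes a no-argument constructor; the real base class may differ.
        return TextHarvester().harvest_file(path)


result = harvest_sample('Hello, Dokang!')
print(result['title'], result['kind'])
```

How a harvester is registered with Dokang is not covered here, so this only exercises the harvest_file() method in isolation.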