API

Backends

Index and search backends built on Whoosh.

class dokang.backends.whoosh.WhooshIndexer(index_path)

Encapsulate indexing through Whoosh.

initialize()

Initialize the index.

If an index already exists, it is deleted and recreated from scratch.

clear_set(doc_set)

Remove all documents of this set from the index.

index_documents(documents)

Add or update documents in the index.

delete_documents(doc_set, paths)

Delete documents from the index.
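The four methods above can be illustrated with a small in-memory stand-in. This is only a sketch of the documented semantics, not Dokang's implementation and not a use of Whoosh. The ToyIndexer class and the document keys ("set", "path", "content") are assumptions made for the example.

```python
class ToyIndexer:
    """In-memory stand-in that mimics the documented method semantics."""

    def __init__(self):
        # Hypothetical storage: (doc_set, path) -> document dict.
        self.docs = {}

    def initialize(self):
        # Any existing index is discarded and recreated empty.
        self.docs = {}

    def clear_set(self, doc_set):
        # Drop every document that belongs to the given set.
        self.docs = {key: doc for key, doc in self.docs.items()
                     if key[0] != doc_set}

    def index_documents(self, documents):
        # Add new documents; overwrite those already present
        # (same set and same path).
        for doc in documents:
            self.docs[(doc["set"], doc["path"])] = doc

    def delete_documents(self, doc_set, paths):
        # Remove only the listed paths of a single set.
        for path in paths:
            self.docs.pop((doc_set, path), None)


indexer = ToyIndexer()
indexer.initialize()
indexer.index_documents([
    {"set": "guide", "path": "intro.html", "content": "Hello"},
    {"set": "guide", "path": "setup.html", "content": "Install"},
    {"set": "api", "path": "index.html", "content": "Reference"},
])
# Indexing the same set and path again updates the stored document.
indexer.index_documents([
    {"set": "guide", "path": "intro.html", "content": "Hello again"},
])
indexer.delete_documents("guide", ["setup.html"])
indexer.clear_set("api")
remaining = sorted(indexer.docs)
```

After these calls, only the updated "intro.html" document of the "guide" set remains in the toy index.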

class dokang.backends.whoosh.WhooshSearcher(index_path)

Encapsulate search through Whoosh.

get_hashes()

Return the hash of each indexed document.

search(query_string, limit=None)

Search the index for the query string.
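One plausible use of get_hashes() is to decide which documents need to be reindexed or deleted. The sketch below assumes that the hashes are available as a mapping from document path to hash, and that hashes are SHA-256 digests of the content. Neither assumption is confirmed by this reference, and content_hash and plan_update are hypothetical helpers.

```python
import hashlib


def content_hash(content):
    # Assumed hashing scheme, chosen for illustration only.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_update(indexed_hashes, current_contents):
    """Compare indexed hashes with the current content of each document.

    indexed_hashes: assumed shape {path: hash} for indexed documents.
    current_contents: {path: text} for documents found on disk.
    Returns (paths to index or update, paths to delete).
    """
    current = {path: content_hash(text)
               for path, text in current_contents.items()}
    to_index = sorted(path for path, digest in current.items()
                      if indexed_hashes.get(path) != digest)
    to_delete = sorted(path for path in indexed_hashes
                       if path not in current)
    return to_index, to_delete


indexed = {
    "a.html": content_hash("A"),
    "b.html": content_hash("old"),
    "c.html": content_hash("C"),
}
on_disk = {"a.html": "A", "b.html": "new", "d.html": "D"}
to_index, to_delete = plan_update(indexed, on_disk)
```

Here "b.html" (changed) and "d.html" (new) would go to index_documents(), while "c.html" (no longer present) would go to delete_documents().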

Predefined harvesting configuration

dokang.harvesters.html.html_config(harvester=<class 'dokang.harvesters.html.HtmlHarvester'>, include=None, exclude=None, **extensions)

Return a configuration that is suitable for an HTML document set.

dokang.harvesters.sphinx.sphinx_config(harvester=<class 'dokang.harvesters.sphinx.SphinxHarvester'>, include=None, exclude=None, **extensions)

Return a configuration that is suitable for Sphinx-based documentation.

If the documentation uses the “Read The Docs” theme, use sphinx_rtd_config instead.

dokang.harvesters.sphinx.sphinx_rtd_config(harvester=<class 'dokang.harvesters.sphinx.ReadTheDocsSphinxHarvester'>, include=None, exclude=None, **extensions)

Return a configuration that is suitable for Sphinx-based documentation that uses the “Read The Docs” theme.

Harvesters

class dokang.harvesters.base.Harvester

An abstract class for all harvesters.

class dokang.harvesters.html.HtmlHarvester

Harvest content from HTML files.

class dokang.harvesters.sphinx.SphinxHarvester

Harvest content from the HTML rendered version of a Sphinx-based set of documents.

We look at the rendered HTML and not the source files to avoid wrongly indexing files included with the include directive.

class dokang.harvesters.sphinx.ReadTheDocsSphinxHarvester

Harvest content from the HTML rendered version of a Sphinx-based set of documents that uses the “Read The Docs” theme.

The “Read The Docs” theme does not generate the <div> that the superclass looks for, so this harvester looks for a different one.
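The relationship between the two Sphinx harvesters can be sketched with the standard library's html.parser: a base extractor collects the text inside one target <div>, and a subclass changes only which <div> is targeted. This is not Dokang's code. The class names, the extract helper, and both attribute values are placeholders chosen for illustration, not necessarily what Dokang matches.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text found inside a single target <div>."""

    # Hypothetical target attribute; subclasses override it.
    target_attr = ("class", "body")

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside the target <div>
        self.chunks = []  # text fragments found inside the target

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # Nested <div> inside the target: track nesting depth.
            self.depth += 1
        elif self.target_attr in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


class OtherThemeExtractor(DivTextExtractor):
    # Same logic, different (equally hypothetical) target <div>.
    target_attr = ("itemprop", "articleBody")


def extract(parser_class, html):
    parser = parser_class()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


default_page = (
    '<div class="nav">Menu</div>'
    '<div class="body"><h1>Title</h1><div>Nested</div><p>Text</p></div>'
    '<div>Footer</div>'
)
themed_page = (
    '<div class="nav">Menu</div>'
    '<div itemprop="articleBody"><p>Content</p></div>'
)
default_text = extract(DivTextExtractor, default_page)
themed_text = extract(OtherThemeExtractor, themed_page)
```

Keeping the traversal in the base class and the target in a class attribute means that supporting another theme only requires overriding that attribute.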