* Start Listeners documentation
* intro table of different architectures
* initialization, linking, dim inference
* internal comm (WIP)
* expand internal comm section
* frozen components and replacing listeners
* various small fixes
* fix content table
* fix link
* overfitting test on non-overlapping entities
* add failing overfitting test for overlapping entities
* failing test for list comprehension
* remove test that was put in separate PR
* bugfix
* cleanup
* test for error after Doc has been garbage collected
* warn about using a SpanGroup when the Doc has been garbage collected
* add warning to the docs
* rephrase slightly
* raise error instead of warning
* update
* move warning to doc property
* Fix incorrect pickling of Japanese and Korean pipelines, which led to
the entire pipeline being reset if pickled
* Enable pickling of Vietnamese tokenizer
* Update tokenizer APIs for Chinese, Japanese, Korean, Thai, and
Vietnamese so that only the `Vocab` is required for initialization
* Refactor to use list comps and enumerate.
Replace loops that append to a list with list comprehensions where this does not change the behavior; replace range(len(...)) loops with enumerate. Correct one typo in a comment. Replace a call to set() with a set literal.
* Undo double assignment.
Expand `tokens_to_key[j] = k = self._get_matcher_key(key, i, j)` to two statements.
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Sign contributors agreement
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add training data section
Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?
* Add pointer from binary format to training data section
* Minor cleanup
* Add to ToC, fix filename
* Update website/docs/usage/training.md
Co-authored-by: Ines Montani <ines@ines.io>
* Update website/docs/usage/training.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/usage/training.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Move the training data section further down the page
* Update website/docs/usage/training.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/usage/training.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Run prettier
Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
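
Illustration for the "Refactor to use list comps and enumerate" and "Undo double assignment" entries: a minimal sketch of the before/after shapes those messages describe. The data and the `make_key` helper are hypothetical and are not taken from the changed code.

```python
# Hypothetical data and helper; none of these names come from the commits.
words = ["alpha", "beta", "gamma"]


def make_key(i, j):
    """Stand-in for a key-building method such as the one in the commit."""
    return f"key-{i}-{j}"


# Before: append in a loop, index via range(len(...)), build a set with set().
lengths_loop = []
for i in range(len(words)):
    lengths_loop.append((i, len(words[i])))
seen_call = set(["alpha", "beta"])

# After: list comprehension over enumerate, and a set literal.
lengths_comp = [(i, len(word)) for i, word in enumerate(words)]
seen_literal = {"alpha", "beta"}

# Double assignment: one statement binds both the dict entry and the name.
chained = {}
chained[0] = k_chained = make_key(0, 0)

# Expanded form: two statements with the same end state.
expanded = {}
k_expanded = make_key(0, 0)
expanded[0] = k_expanded
```

Both forms produce the same values; the refactor only changes readability, which is why the commit restricts it to places "where this does not change the behavior".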