* add language extensions for norwegian nynorsk and faroese
* update docstring for nn/examples.py
* use relative imports
* add fo and nn tokenizers to pytest fixtures
* add unittests for fo and nn and fix bug in nn
* remove module docstring from fo/__init__.py
* add comments about example sentences' origin
* add license information to faroese data credit
* format unittests using black
* add __init__ files to test/lang/nn and tests/lang/fo
* fix import order and use relative imports in fo/__nit__.py and nn/__init__.py
* Make the tests a bit more compact
* Add fo and nn to website languages
* Add note about jul.
* Add "jul." as exception
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* adding rolegal model to the spaCy universe
* Fix formatting
* Use raw URL
* update image url and example
* fix pip and update url to raw
* okay, let's add thumb instead of image 🐙
* Update website/meta/universe.json
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* initial
* initial documentation run
* fix typo
* Remove mentions of Torchscript and quantization
Both are disabled in the initial release of `spacy-curated-transformers`.
* Fix `piece_encoder` entries
* Remove `spacy-transformers`-specific warning
* Fix duplicate entries in tables
* Doc fixes
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Remove type aliases
* Fix copy-paste typo
* Change `debug pieces` version tag to `3.7`
* Set curated transformers API version to `3.7`
* Fix transformer listener naming
* Add docs for `init fill-config-transformer`
* Update CLI command invocation syntax
* Update intro section of the pipeline component docs
* Fix source URL
* Add a note to the architectures section about the `init fill-config-transformer` CLI command
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update CLI command name, args
* Remove hyphen from the `curated-transformers.mdx` filename
* Fix links
* Remove placeholder text
* Add text to the model/tokenizer loader sections
* Fill in the `DocTransformerOutput` section
* Formatting fixes
* Add curated transformer page to API docs sidebar
* More formatting fixes
* Remove TODO comment
* Remove outdated info about default config
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add link to HF model hub
* `prettier`
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Support registered vectors
* Format
* Auto-fill [nlp] on load from config and from bytes/disk
* Only auto-fill [nlp]
* Undo all changes to Language.from_disk
* Expand BaseVectors
These methods are needed in various places for training and vector
similarity.
* isort
* More linting
* Only fill [nlp.vectors]
* Update spacy/vocab.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Revert changes to test related to auto-filling [nlp]
* Add vectors registry
* Rephrase error about vocab methods for vectors
* Switch to dummy implementation for BaseVectors.to_ops
* Add initial draft of docs
* Remove example from BaseVectors docs
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/basevectors.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix type and lint bpemb example
* Update website/docs/api/basevectors.mdx
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add SpanMarker for NER to spaCy universe
* Escape the newlines in the text in the code example
Or at least, attempt to
* Remove now unnecessary import
* Disable NER pipeline component in code example
* span finder integrated into spacy from experimental
* black
* isort
* black
* default spankey constant
* black
* Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* rename
* rename
* max_length and min_length as Optional[int] and strict checking
* black
* mypy fix for integer type infinity
* revert line order
* implement all comparison operators for inf int
* avoid two for loops over all docs by not precomputing
* interleave thresholding with span creation
* black
* revert to not interleaving (relized its faster)
* black
* Update spacy/errors.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* update dosctring
* enforce that the gold and predicted documents have the same text
* new error for ensuring reference and predicted texts are the same
* remove todo
* adjust test
* black
* handle misaligned tokenization
* return correct variable
* failing overfit test
* only use a single spans_key like in spancat
* black
* remove debug lines
* typo
* remove comment
* remove near duplicate reduntant method
* use the 'spans_key' variable name everywhere
* Update spacy/pipeline/span_finder.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* flaky test fix suggestion, hand set bias terms
* only test suggester and test result exhaustively
* make it clear that the span_finder_suggester is more general (not specific to span_finder)
* Update spacy/tests/pipeline/test_span_finder.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Apply suggestions from code review
* remove question comment
* move preset_spans_suggester test to spancat tests
* Add docs and unify default configs for spancat and span finder
* Add `allow_overlap=True` to span finder scorer
* Fix offset bug in set_annotations
* Ignore labels in span finder scorer
* Format
* Add span_finder to quickstart template
* Move settings to self.cfg, store min/max unset as None
* Remove debugging
* Update docstrings and docs
* Update spacy/pipeline/span_finder.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix imports
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* add vetiver to spacy universe
* remove image
* update logo to render correctly in thumbnail
* apply Basil's suggestion
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
* refer to the same model
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
* parsigs universe
* added model installation explanation in the description
* Update website/meta/universe.json
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
* added model installement instruction in the code example
---------
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>