* Matcher support for Span, as well as Doc #5056
* Removes an import unused
* Signed contributors agreement
* Code optimization and better test
* Add error message for bad Matcher call argument
* Fix merging
* Use max(uint64) for OOV lexeme rank
* Add test for default OOV rank
* Revert back to thinc==7.4.0
Requiring the updated version of thinc was unnecessary.
* Define OOV_RANK in one place
Define OOV_RANK in one place in `util`.
* Fix formatting [ci skip]
* Switch to external definitions of max(uint64)
Switch to external defintions of max(uint64) and confirm that they are
equal.
* Add Doc init from list of words and text
Add an option to initialize a `Doc` from a text and list of words where
the words may or may not include all whitespace tokens. If the text and
words are mismatched, raise an error.
* Fix error code
* Remove all whitespace before aligning words/text
* Move words/text init to util function
* Update error message
* Rename to get_words_and_spaces
* Fix formatting
* Fixed typo in cli warning
Fixed a typo in the warning for the provision of exactly two labels, which have not been designated as binary, to textcat.
* Create and signed contributor form
* Add "whatlies"
We're releasing it on our side officially on the 16th of April. If possible, let's announce around the same time :)
* sign contributor thing
* Added fancy gif
as the image
* Update universe.json
Spellin error and spaCy clarification.
* Use inline flags in token_match patterns
Use inline flags in `token_match` patterns so that serializing does not
lose the flag information.
* Modify inline flag
* Modify inline flag
* Add "whatlies"
We're releasing it on our side officially on the 16th of April. If possible, let's announce around the same time :)
* sign contributor thing
* Added fancy gif
as the image
* Update universe.json
Spellin error and spaCy clarification.
* Add pos and morph scoring to Scorer
Add pos, morph, and morph_per_type to `Scorer`. Report pos and morph
accuracy in `spacy evaluate`.
* Update morphologizer for v3
* switch to tagger-based morphologizer
* use `spacy.HashCharEmbedCNN` for morphologizer defaults
* add `Doc.is_morphed` flag
* Add morphologizer to train CLI
* Add basic morphologizer pipeline tests
* Add simple morphologizer training example
* Remove subword_features from CharEmbed models
Remove `subword_features` argument from `spacy.HashCharEmbedCNN.v1` and
`spacy.HashCharEmbedBiLSTM.v1` since in these cases `subword_features`
is always `False`.
* Rename setting in morphologizer example
Use `with_pos_tags` instead of `without_pos_tags`.
* Fix kwargs for spacy.HashCharEmbedBiLSTM.v1
* Remove defaults for spacy.HashCharEmbedBiLSTM.v1
Remove default `nM/nC` for `spacy.HashCharEmbedBiLSTM.v1`.
* Set random seed for textcat overfitting test