* add v1 and v2 tests for tok2vec architectures
* textcat architectures are not "layers"
* test older textcat architectures
* test older parser architecture
* Fix StringStore.__getitem__ return type depending on parameter types
Small fix using `@overload` so that `StringStore.__getitem__` returns an `int` when given a `str` or `bytes` and a `str` when given an `int`.
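A minimal sketch of the `@overload` pattern described above. `TinyStringStore` is a hypothetical stand-in, not spaCy's `StringStore`; it assigns sequential IDs where the real store uses hashes, and only illustrates how the return type can follow the parameter type.

```python
from typing import Dict, List, Union, overload


class TinyStringStore:
    """Hypothetical string/ID store used only to illustrate the typing pattern."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._strings: List[str] = []

    def add(self, string: str) -> int:
        # Sequential IDs are an assumption made for this sketch.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    # The overloads tell a type checker which return type goes with which key type.
    @overload
    def __getitem__(self, key: Union[str, bytes]) -> int: ...
    @overload
    def __getitem__(self, key: int) -> str: ...

    def __getitem__(self, key: Union[str, bytes, int]) -> Union[int, str]:
        if isinstance(key, int):
            return self._strings[key]
        if isinstance(key, bytes):
            key = key.decode("utf8")
        return self._ids[key]


store = TinyStringStore()
apple_id = store.add("apple")
```

With the overloads in place, a type checker can infer `store["apple"]` as `int` and `store[apple_id]` as `str` instead of a union of both.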
* Update spacy/strings.pyi
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update docs for displacy style kwargs
Added "span" to the accepted values for the style kwarg in the displacy.serve and displacy.render top-level functions. This style is new as of spaCy 3.3, so I added the "new" tag for that option only.
* restored alpha ordering
* Document different ways to create a pipeline: moved up/slightly modified paragraph on pipeline creation.
* Document different ways to create a pipeline: changed Finnish to Ukrainian in example for language without trained pipeline.
* Document different ways to create a pipeline: added explanation of blank pipeline.
* Document different ways to create a pipeline: exchanged Ukrainian with Yoruba.
The list of stop words for Spanish contained many inadequate words, see:
https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100
Removed words:
- verb forms of 'trabajar' (work) and 'intentar' (try)
- words related to 'empleo' (employment)
- incorrect words: ampleamos, arribaabajo, soyos, paìs
- miscellaneous words due to being too significant or too infrequent:
actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general,
pais, principalmente, raras
Added other stop words for completeness:
- Spanish one-letter words
- numbers up to twelve
Some reformatting to 79 columns.
When in doubt, the English and German lists have been consulted as good
examples.
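The kind of edit described above can be sketched with plain set operations. The word lists here are a tiny hypothetical sample chosen for illustration, not the actual Spanish list or the full set of removed and added words.

```python
# Hypothetical miniature stop-word list, built from a whitespace-separated string.
STOP_WORDS = set(
    """
a al de el en la los que y
trabajar intentar empleo pais
""".split()
)

# Words considered too significant, too infrequent, or incorrect (sample only).
removed = {"trabajar", "intentar", "empleo", "pais"}

# One-letter words and small numbers added for completeness (sample only).
added = {"e", "o", "u", "uno", "dos", "tres"}

STOP_WORDS = (STOP_WORDS - removed) | added
```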
* `Matcher`: Remove superfluous GIL-acquiring check in `get_is_final`
This check incurred a significant performance penalty due to implicit interactions between the GIL and Cython ref-counting code.
* `Matcher`: Inline `PatternStateC` accessors
* signing contributor agreement
* adding new content to the spaCy universe
* updating outdated code examples
* resolving issues for the PR
* resolve review for klayers
* remove contributor-agreement file from the PR
* Update code example of spaCySentiWS
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy-sentiws code example
Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Test for arc levels for identical arcs
Also moves the test into order with the other numbered tests.
* displaCy: filter identical arcs
Avoid increased levels due to identical arcs by first
filtering any identical arcs.
* Sort keys before filtering
Manual entries with keys out of order would previously become
different tuples and therefore not be filtered correctly.
Co-authored-by: Joachim Fainberg <joachimfainberg@Joachims-MBP.lan>
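The filtering idea in the commits above can be sketched as follows. This is an illustration under assumptions, not the code from the change itself: `filter_identical_arcs` is a hypothetical helper, and the arc dictionaries use the `start`, `end`, `label` and `dir` keys of displaCy's manual input format.

```python
from typing import Any, Dict, List


def filter_identical_arcs(arcs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop arcs with identical key/value pairs, keeping the first occurrence."""
    seen = set()
    unique = []
    for arc in arcs:
        # Sorting the items first means key order in the input dict does not matter.
        key = tuple(sorted(arc.items()))
        if key not in seen:
            seen.add(key)
            unique.append(arc)
    return unique


arcs = [
    {"start": 0, "end": 1, "label": "det", "dir": "left"},
    {"label": "det", "dir": "left", "start": 0, "end": 1},  # same arc, keys reordered
    {"start": 1, "end": 2, "label": "nsubj", "dir": "left"},
]
unique_arcs = filter_identical_arcs(arcs)
```

Without the `sorted` call, the first two arcs would produce different tuples and both would survive the filter, which is the case the "Sort keys before filtering" commit addresses.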
* added crosslingual coreference to spacy universe
* Updated example to demonstrate batching.
Co-authored-by: David Berenstein <david.berenstein@pandoraintelligence.com>