* Allow assets to be optional in spacy project: draft for optional flag/download_all options.
* Allow assets to be optional in spacy project: added OPTIONAL_DEFAULT reflecting default asset optionality.
* Allow assets to be optional in spacy project: renamed --all to --extra.
* Allow assets to be optional in spacy project: included optional flag in project config test.
* Allow assets to be optional in spacy project: added documentation.
* Allow assets to be optional in spacy project: fixing deprecated --all reference.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Allow assets to be optional in spacy project: fixed project_assets() docstring.
* Allow assets to be optional in spacy project: adjusted wording in justification of optional assets.
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Allow assets to be optional in spacy project: switched to as keyword in project.yml. Updated docs.
* Allow assets to be optional in spacy project: updated comment.
* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in output.
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in docstring..
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test..
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test.
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Allow assets to be optional in spacy project: renamed OPTIONAL_DEFAULT to EXTRA_DEFAULT.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* add v1 and v2 tests for tok2vec architectures
* textcat architectures are not "layers"
* test older textcat architectures
* test older parser architecture
* Document different ways to create a pipeline: moved up/slightly modified paragraph on pipeline creation.
* Document different ways to create a pipeline: changed Finnish to Ukrainian in example for language without trained pipeline.
* Document different ways to create a pipeline: added explanation of blank pipeline.
* Document different ways to create a pipeline: exchanged Ukrainian with Yoruba.
* Fix StringStore.__getitem__ return type depending on parameter types
Small fix using `@overload` so that `StringStore.__getitem__` returns an `int` when given a `str` or `bytes` and a `str` when given an `int`.
* Update spacy/strings.pyi
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
The list of stop words for Spanish contained many inadequate words, see:
https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100
Removed words:
- verb forms of 'trabajar' (work) and intentar (try)
- words related to 'empleo' (employment)
- incorrect words: ampleamos, arribaabajo, soyos, paìs
- miscellaneous words due to being too significant of too infrequent:
actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general,
pais, principalmente, raras
Added other stop words for completion:
- Spanish one-letter words
- numbers up to twelve
Some reformatting to 79 columns.
When in doubt, the English and German lists have been consulted as good
examples.
* `Matcher`: Remove superfluous GIL-acquiring check in `get_is_final`
This check incurred a significant performance penalty due to implict interactions between the GIL and Cython ref-counting code.
* `Matcher`: Inline `PatternStateC` accessors
* signing contributor agreement
* adding new content to the spaCy universe
* updating outdated example codes
* resolving issues for the PR
* resolve review for klayers
* remove contributor-agreement file from the PR
* Update code example of spaCySentiWS
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy-sentiws code example
Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Test for arc levels for identical arcs
Also moves the test in order with the other numbered tests.
* displaCy: filter identical arcs
Avoid increased levels due to identical arcs by first
filtering any identical arcs.
* Sort keys before filtering
Manual entry with keys out of order would previously become
different tuples and therefore not filtered correctly.
Co-authored-by: Joachim Fainberg <joachimfainberg@Joachims-MBP.lan>
This is necessary because one of the three old methods relied on scipy
for some complex problem solving. LEA is generally better for
evaluations.
The downside is that this means evaluations aren't comparable with many
papers, but canonical scoring can be supported using external eval
scripts or other methods.