Commit Graph

15698 Commits

Author SHA1 Message Date
github-actions[bot]
71884d0942
Auto-format code with black ()
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-02 11:43:20 +02:00
Madeesh Kannan
d1760ebe02
Better handling of unexpected types in SetPredicate ()
* `Matcher`: Better type checking of values in `SetPredicate`
`SetPredicate`: Emit warning and return `False` on unexpected value types

* Rename `value_type_mismatch` variable

* Inline warning

* Remove unexpected type warning from `_SetPredicate`

* Ensure that `str` values are not interpreted as sequences
Check elements of sequence values for convertibility to `str` or `int`

* Add more `INTERSECT` and `IN` test cases

* Test for inputs with multiple characters

* Return `False` early instead of using a boolean flag

* Remove superfluous `int` check, parentheses

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

* Clarify test comment

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-02 09:09:48 +02:00
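
The type handling described in commit d1760ebe02 can be sketched as follows. This is a minimal illustration, not spaCy's `_SetPredicate` code: `intersects` and its parameters are hypothetical names, and the sketch assumes only what the message states (a `str` is not treated as a sequence, elements must be `str` or `int`, and the check returns `False` early).

```python
from collections.abc import Iterable
from typing import Any, Set


def intersects(value: Any, allowed: Set[str]) -> bool:
    """Hypothetical INTERSECTS-style check against a set of allowed strings."""
    # A plain string is iterable, but it must not be read as a sequence of
    # characters, so reject it before the generic Iterable check.
    if isinstance(value, str) or not isinstance(value, Iterable):
        return False
    items = list(value)
    for item in items:
        # Return early on the first element of an unexpected type
        # instead of carrying a boolean flag through the loop.
        if not isinstance(item, (str, int)):
            return False
    return any(str(item) in allowed for item in items)
```

With this sketch, `intersects(["a", "b"], {"a"})` matches, while `intersects("ab", {"a"})` does not, because the multi-character string is never split into characters.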
Adriane Boyd
78f5503a29
Check for any non-Doc returned value for components ()
2022-09-01 19:37:23 +02:00
Madeesh Kannan
604a7c3c26
SpanGroup(s)-related optimizations ()
* `SpanGroup`: Add support for binding copies to a new reference document

* `SpanGroups`: Replace superfluous serialize-deserialize roundtrip in `copy`

Instead, directly copy the in-memory representations of the constituent `SpanGroup`s.

* Update `SpanGroup.copy()` signature

* Rename `new_doc` param to `doc`

* Fix kwdarg

* Update `.pyi` file and docstrings

* `mypy` fix

* Update spacy/tokens/span_group.pyx

* Update docs

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-31 09:03:20 +02:00
Sofie Van Landeghem
8fc0efc502
Allow string argument for disable/enable/exclude ()
* adding unit test for spacy.load with disable/exclude string arg

* allow pure strings in from_config

* update docs

* upstream type adjustments

* docs update

* make docstring more consistent

* Update spacy/language.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* two more cleanups

* fix type in internal method

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-31 09:02:34 +02:00
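
One way to accept either a single string or a list for arguments such as `disable`, `enable`, and `exclude`, as commit 8fc0efc502 describes, is to normalize the value up front. The helper below is a hypothetical sketch, not the code in `spacy/language.py`; `as_name_list` is an assumed name.

```python
from typing import Iterable, List, Union


def as_name_list(names: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: accept one component name or several."""
    # Wrap a bare string so that "parser" is not split into characters.
    if isinstance(names, str):
        return [names]
    return list(names)
```

Normalizing once at the entry point lets the rest of the code keep working with a list of names, whichever form the caller passed.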
Daniël de Kok
3f4b4b7b4f
Fix test_{prefer,require}_gpu ()
* Fix `test_{prefer,require}_gpu`

These tests assumed that GPUs are only supported with CuPy, but since Thinc 8.1
we also support Metal Performance Shaders.

* test_misc: arrange thinc imports to be together
2022-08-30 14:21:02 +02:00
Patrick J. Burns
5ae63b1fbd
Add Latin language support ()
* Add lang folder for la (Latin)

* Add Latin lang classes

* Add minimal tokenizer exceptions

* Add minimal stopwords

* Add minimal lex_attrs

* Update stopwords, tokenizer exceptions

* Add la tests; register la_tokenizer in conftest.py

* Update spacy/lang/la/lex_attrs.py

Remove duplicate form in Latin lex_attrs

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update natto-py version spec ()

* Update natto-py version spec

* Update setup.cfg

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add scorer to textcat API docs config settings ()

* Update docs for pipeline initialize() methods ()

* Update documentation for dependency parser

* Update documentation for trainable_lemmatizer

* Update documentation for entity_linker

* Update documentation for ner

* Update documentation for morphologizer

* Update documentation for senter

* Update documentation for spancat

* Update documentation for tagger

* Update documentation for textcat

* Update documentation for tok2vec

* Run prettier on edited files

* Apply similar changes in transformer docs

* Remove need to say annotated example explicitly

I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.

* Run prettier on transformer docs

* chore: add 'concepCy' to spacy universe ()

* chore: add 'concepCy' to spacy universe

* docs: add 'slogan' to concepCy

* Support full prerelease versions in the compat table ()

* Support full prerelease versions in the compat table

* Fix types

* adding spans to doc_annotation in Example.to_dict ()

* adding spans to doc_annotation in Example.to_dict

* to_dict compatible with from_dict: tuples instead of spans

* use strings for label and kb_id

* Simplify test

* Update data formats docs

Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix regex invalid escape sequences ()

* Add W605 to the errors raised by flake8 in the CI ()

* Clean up automated label-based issue handling ()

* Clean up automated label-based issue handling

1. upgrade tiangolo/issue-manager to latest
2. move needs-more-info to tiangolo
3. change needs-more-info close time to 7 days
4. delete old needs-more-info config

* Use old, longer message

* Fix label name

* Fix Dutch noun chunks to skip overlapping spans ()

* Add test for overlapping noun chunks

* Skip overlapping noun chunks

* Update spacy/tests/lang/nl/test_noun_chunks.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Docs: displaCy documentation - data types, `parse_{deps,ents,spans}`, spans example ()

* add in spans example and parse references

* rm autoformatter

* rm extra ents copy

* TypedDict draft

* type fixes

* restore non-documentation files

* docs update

* fix spans example

* fix hyperlinks

* add parse example

* example fix + argument fix

* fix api arg in docs

* fix bad variable replacement

* fix spacing in style

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* fix spacing on table

* fix spacing on table

* rm temp files

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* include span_ruler for default warning filter ()

* Add uk pipelines to website ()

* Check for . in factory names ()

* Make fixes for PR 

* Fix roman numeral coverage in 

Co-authored-by: Patrick J. Burns <patricks@diyclassics.org>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>
Co-authored-by: Jules Belveze <32683010+JulesBelveze@users.noreply.github.com>
Co-authored-by: stefawolf <wlf.ste@gmail.com>
Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
2022-08-30 14:04:54 +02:00
Paul O'Leary McCann
aafee5e1b7
Fix lookup usage in French/Catalan (fix ) ()
* Fix lookup usage (fix )

Before using the lookups table in the French (and Catalan) lemmatizers,
there's a check to see if the current term is in the table. But it's
checking a string against hashes, so it's always false. Also the table
lookup function is designed so you don't have to do that anyway.

* Use the lookup table directly

* Use string, not token
2022-08-29 10:32:38 +02:00
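
The bug described in commit aafee5e1b7 can be reproduced with a toy table. This sketch does not use spaCy's lookups classes: `HashedTable`, `toy_hash`, and `raw_keys` are hypothetical stand-ins that assume only what the message says, namely that keys are stored as hashes and that the table's own lookup function accepts strings.

```python
from typing import Dict, KeysView


def toy_hash(text: str) -> int:
    """Hypothetical stand-in for a string-to-hash function."""
    return sum((i + 1) * ord(ch) for i, ch in enumerate(text))


class HashedTable:
    """Toy lookup table that stores its keys as integer hashes."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = {toy_hash(key): value for key, value in data.items()}

    def raw_keys(self) -> KeysView[int]:
        return self._data.keys()

    def get(self, key: str, default: str) -> str:
        # Hash the string on the way in, so callers never handle hashes.
        return self._data.get(toy_hash(key), default)


table = HashedTable({"chevaux": "cheval"})

# Buggy pattern: a string compared against integer hashes never matches.
found_by_membership = "chevaux" in table.raw_keys()

# Fixed pattern: ask the table directly and fall back to the input string.
lemma = table.get("chevaux", "chevaux")
unknown = table.get("xyz", "xyz")
```

Because `get` takes a default, the membership check is unnecessary as well as wrong, which is the point of the "Use the lookup table directly" step.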
Edward
6723d76f24
Add ConsoleLogger.v2 ()
* Init

* Change logger to ConsoleLogger.v2

* adjust naming

* More naming adjustments

* Fix output_file reference error

* ignore type

* Add basic test for logger

* Hopefully fix mypy issue

* mypy ignore line

* Update mypy line

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update test method name

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Change file saving logic

* Fix finalize method

* increase spacy-legacy version in requirements

* Update docs

* small adjustments

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-29 10:23:05 +02:00
Adriane Boyd
ba33200979
Remove pathy from pyproject.toml ()
2022-08-26 16:07:16 +02:00
Paul O'Leary McCann
7a2c58864c
Move deps outside explosion to "third-party" ()
2022-08-26 10:23:10 +02:00
Adriane Boyd
6fd3b4d9d6
Merge pull request from adrianeboyd/chore/update-develop-from-master-v3.5-1
Update develop from master for v3.5
2022-08-24 20:41:25 +02:00
Adriane Boyd
81874265e9
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1
2022-08-24 12:47:42 +02:00
Tobius Saul
c09d2fa25b
luganda language extension ()
* luganda language extension

* __init__.py changes

* New enhancements

* Lexical attribute changed

* punctuation and sentence additions

* Remove comment header

* Fix typos, reformat

* reformatted version

* Add tokenizer test

* Remove contractions from stop words

* Format

* Add Luganda to website

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 13:09:36 +02:00
Edward
5afa98aabf
Support custom attributes for tokens and spans in json conversion ()
* Add token and span custom attributes to to_json()

* Change logic for to_json

* Add functionality to from_json

* Small adjustments

* Move token/span attributes to new dict key

* Fix test

* Fix the same test but much better

* Add backwards compatibility tests and adjust logic

* Add test to check if attributes not set in underscore are not saved in the json

* Add tests for json compatibility

* Adjust test names

* Fix tests and clean up code

* Fix assert json tests

* small adjustment

* adjust naming and code readability

* Adjust naming, added more tests and changed logic

* Fix typo

* Adjust errors, naming, and small test optimization

* Fix byte tests

* Fix bytes tests

* Change naming and json structure

* update schema

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update schema for underscore attributes

* Adjust underscore schema

* adjust schema tests

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 10:05:02 +02:00
Tal Zussman
7e75327893
Fix menu order in linguistic-features.md ()
Swap 'Vectors & Similarity' and 'Mappings & Exceptions' in menu to match order in body
2022-08-23 14:40:38 +09:00
Sofie Van Landeghem
6e20842370
dev docs: numeric comparators ()
* add section on numeric comparators

* edit

* prettier

* Update extra/DEVELOPER_DOCS/Code Conventions.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* note on typing imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-22 15:52:53 +02:00
Adriane Boyd
f55bb7470d
Clean up warnings in the test suite ()
2022-08-22 12:04:30 +02:00
Paul O'Leary McCann
0f07defe2c
Remove reference to voting on issue ()
Not clear which issue this refers to, we don't suggest this for any
other issues, and we don't use votes in general.
2022-08-22 11:29:05 +02:00
Adriane Boyd
04c6e5cb95
Improve floret vectors display in pipeline docs ()
2022-08-22 11:28:13 +02:00
Adriane Boyd
5fa8f4faca
Switch ru and uk lemmatizers to pymorphy3 ()
* Switch ru and uk lemmatizers to pymorphy3

* Switch to pymorphy3 in tests
2022-08-22 11:27:14 +02:00
Adriane Boyd
3e4cf1bbe1
Check for . in factory names ()
2022-08-19 09:52:12 +02:00
Adriane Boyd
09b3118b26
Add uk pipelines to website ()
2022-08-18 14:04:57 +02:00
Sofie Van Landeghem
cab263791f
include span_ruler for default warning filter ()
2022-08-17 19:55:54 +02:00
Peter Baumgartner
db7b9938a4
Docs: displaCy documentation - data types, parse_{deps,ents,spans}, spans example ()
* add in spans example and parse references

* rm autoformatter

* rm extra ents copy

* TypedDict draft

* type fixes

* restore non-documentation files

* docs update

* fix spans example

* fix hyperlinks

* add parse example

* example fix + argument fix

* fix api arg in docs

* fix bad variable replacement

* fix spacing in style

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* fix spacing on table

* fix spacing on table

* rm temp files

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-08-16 11:23:34 -04:00
Adriane Boyd
ed4ad309e6
Fix Dutch noun chunks to skip overlapping spans ()
* Add test for overlapping noun chunks

* Skip overlapping noun chunks

* Update spacy/tests/lang/nl/test_noun_chunks.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-08-10 09:49:08 +02:00
Paul O'Leary McCann
231a17817d
Clean up automated label-based issue handling ()
* Clean up automated label-based issue handling

1. upgrade tiangolo/issue-manager to latest
2. move needs-more-info to tiangolo
3. change needs-more-info close time to 7 days
4. delete old needs-more-info config

* Use old, longer message

* Fix label name
2022-08-09 14:50:50 +02:00
Adriane Boyd
e700358ba0
Add W605 to the errors raised by flake8 in the CI ()
2022-08-09 12:15:13 +02:00
Adriane Boyd
fc4246558b
Fix regex invalid escape sequences ()
2022-08-09 10:59:36 +02:00
stefawolf
23749cfc91
adding spans to doc_annotation in Example.to_dict ()
* adding spans to doc_annotation in Example.to_dict

* to_dict compatible with from_dict: tuples instead of spans

* use strings for label and kb_id

* Simplify test

* Update data formats docs

Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-05 12:26:38 +02:00
Luka Dragar
b64243ed55
Updates to Slovenian language ()
* Added examples for Slovene

* Update spacy/lang/sl/examples.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Corrected a typo in one of the sentences

* Updated support for Slovenian

* Some minor changes to corrections

* Added forint currency

* Corrected HYPHENS_PERMITTED regex and some formatting

* Minor changes

* Un-xfail tokenizer test

* Format

Co-authored-by: Luka Dragar <D20124481@mytudublin.ie>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-05 10:10:18 +02:00
Adriane Boyd
b5d9d0897e
Merge pull request from adrianeboyd/chore/update-develop-v3.5
Prepare develop for v3.5
2022-08-04 21:17:26 +02:00
Adriane Boyd
a3f6d6bce1
Merge remote-tracking branch 'upstream/master' into develop
2022-08-04 18:19:28 +02:00
Adriane Boyd
b07708d5d0
Support full prerelease versions in the compat table ()
* Support full prerelease versions in the compat table

* Fix types
2022-08-04 15:14:19 +02:00
Jules Belveze
cd09614ab2
chore: add 'concepCy' to spacy universe ()
* chore: add 'concepCy' to spacy universe

* docs: add 'slogan' to concepCy
2022-08-04 15:42:38 +09:00
Lj Miranda
d993df41e5
Update docs for pipeline initialize() methods ()
* Update documentation for dependency parser

* Update documentation for trainable_lemmatizer

* Update documentation for entity_linker

* Update documentation for ner

* Update documentation for morphologizer

* Update documentation for senter

* Update documentation for spancat

* Update documentation for tagger

* Update documentation for textcat

* Update documentation for tok2vec

* Run prettier on edited files

* Apply similar changes in transformer docs

* Remove need to say annotated example explicitly

I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.

* Run prettier on transformer docs
2022-08-03 16:53:02 +02:00
Adriane Boyd
d0578c2ede
Add scorer to textcat API docs config settings ()
2022-08-03 16:41:20 +02:00
Paul O'Leary McCann
2d89dd9db8
Update natto-py version spec ()
* Update natto-py version spec

* Update setup.cfg

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-07-28 07:45:02 +02:00
ninjalu
95a1b8aca6
add additional REL_OP ()
* add additional REL_OP

* change to condition and new rel_op symbols

* add operators to docs

* add the anchor while we're in here

* add tests

Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
2022-07-27 13:16:44 +02:00
Madeesh Kannan
1829d7120a
ExplosionBot: Add note about case-sensitivity ()
2022-07-27 14:24:22 +09:00
Edward
360a702ecd
Add parent argument ()
2022-07-26 14:35:18 +02:00
Adriane Boyd
5c2a00cef0
Set version to v3.4.1 ()
2022-07-26 12:52:38 +02:00
Adriane Boyd
c8f5b752bb
Add link to developer docs code conventions ()
2022-07-26 10:56:53 +02:00
Daniël de Kok
4ee8a06149
Fix compatibility with CuPy 9.x ()
After the precomputable affine table of shape [nB, nF, nO, nP] is
computed, padding with shape [1, nF, nO, nP] is assigned to the first
row of the precomputed affine table. However, when we are indexing the
precomputed table, we get a row of shape [nF, nO, nP]. CuPy versions
before 10.0 cannot paper over this shape difference.

This change fixes compatibility with CuPy < 10.0 by squeezing the first
dimension of the padding before assignment.
2022-07-26 10:52:01 +02:00
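
The shape mismatch described in commit 4ee8a06149 can be shown without a GPU. The sketch below uses plain nested lists rather than CuPy arrays, so it illustrates only the shapes involved, not CuPy's behavior; `shape`, `squeeze_first`, and the sizes chosen for `nF`, `nO`, and `nP` are assumptions made for the example.

```python
from typing import Any, List, Tuple


def shape(x: Any) -> Tuple[int, ...]:
    """Shape of a non-empty rectangular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def squeeze_first(x: List[Any]) -> Any:
    """Drop a leading axis of length 1, as the fix does for the padding."""
    if len(x) != 1:
        raise ValueError("leading axis must have length 1")
    return x[0]


nB, nF, nO, nP = 4, 2, 3, 1  # arbitrary illustrative sizes
table = [[[[0.0] * nP for _ in range(nO)] for _ in range(nF)] for _ in range(nB)]
padding = [[[[1.0] * nP for _ in range(nO)] for _ in range(nF)]]

# table[0] has shape (nF, nO, nP), but padding has shape (1, nF, nO, nP).
shapes_before = (shape(table[0]), shape(padding))

# Squeezing the first axis makes the two shapes match before assignment.
table[0] = squeeze_first(padding)
```

After the squeeze, the assigned row has exactly the shape of the slot it fills, so no implicit reshaping is needed.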
Adriane Boyd
36ff2a5441
Merge pull request from adrianeboyd/chore/reenable-model-tests
Revert "Temporarily skip tests that require models/compat"
2022-07-25 20:13:44 +02:00
Adriane Boyd
e5990db713
Revert "Temporarily skip tests that require models/compat"
This reverts commit d9320db7db.
2022-07-25 18:12:18 +02:00
Paul O'Leary McCann
1c12812d1a
Replace link to old label ()
2022-07-25 16:39:34 +09:00
Adriane Boyd
7a99fe3c65
Move sent-patterns to correct section of universe.json ()
2022-07-25 09:14:50 +02:00
0xpeIpeI
93960dc4b5
[universe project] create English interpretation project ()
* [add] my universe project setting

* [modify] A few adjustments

* [Modify] change package description
2022-07-24 19:01:04 +09:00
Dan Radenkovic
a5aa3a818f
fix docs ()
2022-07-24 17:16:36 +09:00