Commit Graph

15192 Commits

Author SHA1 Message Date
Paul O'Leary McCann
0f01f46e02
Update Cython string types (#9143)
* Replace all basestring references with unicode

`basestring` was a compatability type introduced by Cython to make
dealing with utf-8 strings in Python2 easier. In Python3 it is
equivalent to the unicode (or str) type.

I replaced all references to basestring with unicode, since that was
used elsewhere, but we could also just replace them with str, which
shoudl also be equivalent.

All tests pass locally.

* Replace all references to unicode type with str

Since we only support python3 this is simpler.

* Remove all references to unicode type

This removes all references to the unicode type across the codebase and
replaces them with `str`, which makes it more drastic than the prior
commits. In order to make this work importing `unicode_literals` had to
be removed, and one explicit unicode literal also had to be removed (it
is unclear why this is necessary in Cython with language level 3, but
without doing it there were errors about implicit conversion).

When `unicode` is used as a type in comments it was also edited to be
`str`.

Additionally `coding: utf8` headers were removed from a few files.
2021-09-13 17:02:17 +02:00
j-frei
5d0cc0d2ab Correct parser.py use_upper param info (#9180) 2021-09-13 09:29:11 +02:00
Renat Shigapov
d5cc009faf
Merge branch 'explosion:master' into master 2021-09-13 08:43:48 +02:00
Renat Shigapov
e61d93f8c3
add NEL-visualisation to manual-usage 2021-09-13 08:38:58 +02:00
Renat Shigapov
f4b5c4209d
specify kb_id and kb_url for URL visualisation 2021-09-13 08:15:07 +02:00
Renat Shigapov
7562fb5354
add links to entities into the TPL_ENT-template 2021-09-13 08:06:54 +02:00
Paul O'Leary McCann
9c4e84d4a1 Minor typo fix in docs 2021-09-11 14:23:11 +09:00
Paul O'Leary McCann
f89e1c34c9
Minor typo fix in docs 2021-09-11 14:22:05 +09:00
Renat Shigapov
2e2d0e8701 added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:25:25 +09:00
mylibrar
d621df6422 Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:25:17 +09:00
Renat Shigapov
646f3a54db
added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:16:51 +09:00
mylibrar
ee28aac68e
Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:13:13 +09:00
j-frei
462b009648
Correct parser.py use_upper param info (#9180) 2021-09-10 16:19:58 +02:00
Renat Shigapov
c1927fe994
fix misprint in tags 2021-09-09 15:37:34 +02:00
Renat Shigapov
8940e0baca
add agreement 2021-09-09 15:33:29 +02:00
Renat Shigapov
ea58294076
add spaCyOpenTapioca to universe 2021-09-09 15:13:18 +02:00
Adriane Boyd
aba6ce3a43
Handle spacy-legacy in package CLI for dependencies (#9163)
* Handle spacy-legacy in package CLI for dependencies

* Implement legacy backoff in spacy registry.find

* Remove unused import

* Update and format test
2021-09-08 11:46:40 +02:00
Sofie Van Landeghem
632d8d4c35
bump thinc to 8.0.9 (#9133) 2021-09-03 13:34:42 +02:00
github-actions[bot]
584fae5807
Auto-format code with black (#9130)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-03 10:47:03 +02:00
Kevin Humphreys
ca93504660
Pass alignments to Matcher callbacks (#9001)
* pass alignments to callbacks

* refactor for single callback loop

* Update spacy/matcher/matcher.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-02 12:58:05 +02:00
Sofie Van Landeghem
721f4554c8 matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:29:44 +02:00
Sofie Van Landeghem
8895e3c9ad
matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:26:33 +02:00
Robyn Speer
d60b748e3c
Fix surprises when asking for the root of a git repo (#9074)
* Fix surprises when asking for the root of a git repo

In the case of the first asset I wanted to get from git, the data I
wanted was the entire repository. I tried leaving "path" blank, which
gave a less-than-helpful error, and then I tried `path: "/"`, which
started copying my entire filesystem into the project. The path I should
have used was "".

I've made two changes to make this smoother for others:

- The 'path' within a git clone defaults to ""
- If the path points outside of the tmpdir that the git clone goes
into, we fail with an error

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* use a descriptive error instead of a default

plus some minor fixes from PR review

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* check for None values in assets

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
2021-09-01 22:52:08 +02:00
Paul O'Leary McCann
752696f134 Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs

* Add NER docs

* Add Entity Linker docs

* Add assigned fields docs for the tagger

This also adds a preamble, since there wasn't one.

* Add morphologizer docs

* Add dependency parser docs

* Update entityrecognizer docs

This is a little weird because `Doc.ents` is the only thing assigned to,
but it's actually a bidirectional property.

* Add token fields for entityrecognizer

* Fix section name

* Add entity ruler docs

* Add lemmatizer docs

* Add sentencizer/recognizer docs

* Update website/docs/api/entityrecognizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be
correct.

* Run prettier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

* Add transformers section

This basically just moves and renames the "custom attributes" section
from the bottom of the page to be consistent with "assigned attributes"
on other pages.

I looked at moving the paragraph just above the section into the
section, but it includes the unrelated registry additions, so it seemed
better to leave it unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-01 12:10:52 +02:00
Paul O'Leary McCann
ba6a37d358
Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs

* Add NER docs

* Add Entity Linker docs

* Add assigned fields docs for the tagger

This also adds a preamble, since there wasn't one.

* Add morphologizer docs

* Add dependency parser docs

* Update entityrecognizer docs

This is a little weird because `Doc.ents` is the only thing assigned to,
but it's actually a bidirectional property.

* Add token fields for entityrecognizer

* Fix section name

* Add entity ruler docs

* Add lemmatizer docs

* Add sentencizer/recognizer docs

* Update website/docs/api/entityrecognizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be
correct.

* Run prettier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

* Add transformers section

This basically just moves and renames the "custom attributes" section
from the bottom of the page to be consistent with "assigned attributes"
on other pages.

I looked at moving the paragraph just above the section into the
section, but it includes the unrelated registry additions, so it seemed
better to leave it unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-01 12:09:39 +02:00
Paul O'Leary McCann
f803a84571
Fix inference of epoch_resume (#9084)
* Fix inference of epoch_resume

When an epoch_resume value is not specified individually, it can often
be inferred from the filename. The value inference code was there but
the value wasn't passed back to the training loop.

This also adds a specific error in the case where no epoch_resume value
is provided and it can't be inferred from the filename.

* Add new error

* Always use the epoch resume value if specified

Before this the value in the filename was used if found
2021-09-01 14:17:42 +09:00
Sofie Van Landeghem
a17b06d18b
allow typer 0.4 (#9089) 2021-08-31 20:53:51 +10:00
svlandeg
3f16c45281 Merge branch 'spacy.io' of https://github.com/explosion/spaCy into spacy.io 2021-08-31 10:58:40 +02:00
Davide Fiocco
5c88998b9d Fix point typo on docbin docs (#9097) 2021-08-31 10:58:31 +02:00
Ines Montani
753149bc88 Update references to contributor agreement [ci skip] 2021-08-31 10:58:22 +02:00
Davide Fiocco
1dd69be1f1
Fix point typo on docbin docs (#9097) 2021-08-31 10:55:44 +02:00
Ines Montani
1a86d545af Update references to contributor agreement [ci skip] 2021-08-31 10:03:38 +10:00
Sofie Van Landeghem
5af88427a2
Dev docs: listeners (#9061)
* Start Listeners documentation

* intro tabel of different architectures

* initialization, linking, dim inference

* internal comm (WIP)

* expand internal comm section

* frozen components and replacing listeners

* various small fixes

* fix content table

* fix link
2021-08-30 14:56:35 +02:00
Adriane Boyd
1e9b4b55ee
Pass overrides to subcommands in workflows (#9059)
* Pass overrides to subcommands in workflows

* Add missing docstring
2021-08-30 09:23:54 +02:00
Meenal Jhajharia
db42ba5240 benepar usage example has deprecated imports 2021-08-29 14:44:18 +09:00
Paul O'Leary McCann
6ff8d90070
Merge pull request #9081 from mjhajharia/patch-1
benepar usage example has deprecated imports
2021-08-29 14:41:52 +09:00
Meenal Jhajharia
2613f0e98f
benepar usage example has deprecated imports 2021-08-28 16:35:58 +05:30
Sofie Van Landeghem
689535c264 config is not Optional (#9024) 2021-08-27 11:53:54 +02:00
Sofie Van Landeghem
1e974de837
config is not Optional (#9024) 2021-08-27 11:44:31 +02:00
github-actions[bot]
fb9c31fbda
Auto-format code with black (#9065)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-27 11:42:27 +02:00
Sofie Van Landeghem
8c1d86ea92 Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:53:29 +02:00
Sofie Van Landeghem
31c0a75e6d fix docs for Span constructor arguments (#9023) 2021-08-26 09:52:59 +02:00
Sofie Van Landeghem
4d39430b82
Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:50:35 +02:00
Sofie Van Landeghem
94fb840443
fix docs for Span constructor arguments (#9023) 2021-08-25 16:06:22 +02:00
David Strouk
31e9b126a0
Fix verbs list in lang/fr/tokenizer_exceptions.py (#9033) 2021-08-25 15:55:09 +02:00
Ines Montani
4cd052e81d
Include component factories in third-party dependencies resolver (#9009)
* Include component factories in third-party dependencies resolver

* Increment catalogue and update test
2021-08-25 14:58:01 +02:00
svlandeg
fb8c2f794a Merge remote-tracking branch 'upstream/master' into spacy.io 2021-08-20 14:49:51 +02:00
Sofie Van Landeghem
e1f88de729
bump to 3.1.2 (#9008) 2021-08-20 12:41:09 +02:00
Sofie Van Landeghem
4d52d7051c
Fix spancat training on nested entities (#9007)
* overfitting test on non-overlapping entities

* add failing overfitting test for overlapping entities

* failing test for list comprehension

* remove test that was put in separate PR

* bugfix

* cleanup
2021-08-20 12:37:50 +02:00
Paul O'Leary McCann
9cc3dc2b67
Add glossary entry for _SP (#8983) 2021-08-20 12:04:02 +02:00