Commit Graph

14812 Commits

Author SHA1 Message Date
github-actions[bot]
015d439eb6
Auto-format code with black (#9234)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-20 08:49:19 +02:00
Edward
8bda39f088
Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:32:44 +02:00
Paul O'Leary McCann
c4f0800fb8
Validate pos values when creating Doc (#9148)
* Validate pos values when creating Doc

* Add clear error when setting invalid pos

This also changes the error language slightly.

* Fix variable name

* Update spacy/tokens/doc.pyx

* Test that setting invalid pos raises an error

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 13:28:05 +02:00
Jozef Harag
865cfbc903
feat: add spacy.WandbLogger.v3 with optional run_name and entity parameters (#9202)
* feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters

* update versioning in docs

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-09-16 12:26:41 +02:00
Paul O'Leary McCann
1d57d78758 Make docs consistent (fix #9126) 2021-09-16 15:54:12 +09:00
Paul O'Leary McCann
9ceb8f413c
StringStore/Vocab dev docs (#9142)
* First take at StringStore/Vocab docs

Things to check:

1. The mysterious vocab members
2. How to make table of contents? Is it autogenerated?
3. Anything I missed / needs more detail?

* Update docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updates based on review feedback

* Minor fix

* Move example code down

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 12:50:22 +09:00
Ines Montani
20f63e7154
Only include runtime-relevant config in package CLI dependency detection (#9211) 2021-09-15 23:16:01 +02:00
Adriane Boyd
d74870d38c
Prepare for v3.1.3 (#9200)
* Update thinc and spacy-legacy requirements

* Set version to v3.1.3
2021-09-14 11:03:51 +02:00
Paul O'Leary McCann
f89e1c34c9
Minor typo fix in docs 2021-09-11 14:22:05 +09:00
Renat Shigapov
646f3a54db
added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:16:51 +09:00
mylibrar
ee28aac68e
Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:13:13 +09:00
j-frei
462b009648
Correct parser.py use_upper param info (#9180) 2021-09-10 16:19:58 +02:00
Adriane Boyd
aba6ce3a43
Handle spacy-legacy in package CLI for dependencies (#9163)
* Handle spacy-legacy in package CLI for dependencies

* Implement legacy backoff in spacy registry.find

* Remove unused import

* Update and format test
2021-09-08 11:46:40 +02:00
Sofie Van Landeghem
632d8d4c35
bump thinc to 8.0.9 (#9133) 2021-09-03 13:34:42 +02:00
github-actions[bot]
584fae5807
Auto-format code with black (#9130)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-03 10:47:03 +02:00
Kevin Humphreys
ca93504660
Pass alignments to Matcher callbacks (#9001)
* pass alignments to callbacks

* refactor for single callback loop

* Update spacy/matcher/matcher.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-02 12:58:05 +02:00
Sofie Van Landeghem
8895e3c9ad
matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:26:33 +02:00
Robyn Speer
d60b748e3c
Fix surprises when asking for the root of a git repo (#9074)
* Fix surprises when asking for the root of a git repo

In the case of the first asset I wanted to get from git, the data I
wanted was the entire repository. I tried leaving "path" blank, which
gave a less-than-helpful error, and then I tried `path: "/"`, which
started copying my entire filesystem into the project. The path I should
have used was "".

I've made two changes to make this smoother for others:

- The 'path' within a git clone defaults to ""
- If the path points outside of the tmpdir that the git clone goes
into, we fail with an error

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* use a descriptive error instead of a default

plus some minor fixes from PR review

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* check for None values in assets

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
2021-09-01 22:52:08 +02:00
Paul O'Leary McCann
ba6a37d358
Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs

* Add NER docs

* Add Entity Linker docs

* Add assigned fields docs for the tagger

This also adds a preamble, since there wasn't one.

* Add morphologizer docs

* Add dependency parser docs

* Update entityrecognizer docs

This is a little weird because `Doc.ents` is the only thing assigned to,
but it's actually a bidirectional property.

* Add token fields for entityrecognizer

* Fix section name

* Add entity ruler docs

* Add lemmatizer docs

* Add sentencizer/recognizer docs

* Update website/docs/api/entityrecognizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be
correct.

* Run prettier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

* Add transformers section

This basically just moves and renames the "custom attributes" section
from the bottom of the page to be consistent with "assigned attributes"
on other pages.

I looked at moving the paragraph just above the section into the
section, but it includes the unrelated registry additions, so it seemed
better to leave it unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-01 12:09:39 +02:00
Paul O'Leary McCann
f803a84571
Fix inference of epoch_resume (#9084)
* Fix inference of epoch_resume

When an epoch_resume value is not specified individually, it can often
be inferred from the filename. The value inference code was there but
the value wasn't passed back to the training loop.

This also adds a specific error in the case where no epoch_resume value
is provided and it can't be inferred from the filename.

* Add new error

* Always use the epoch resume value if specified

Before this the value in the filename was used if found
2021-09-01 14:17:42 +09:00
Sofie Van Landeghem
a17b06d18b
allow typer 0.4 (#9089) 2021-08-31 20:53:51 +10:00
Davide Fiocco
1dd69be1f1
Fix point typo on docbin docs (#9097) 2021-08-31 10:55:44 +02:00
Ines Montani
1a86d545af Update references to contributor agreement [ci skip] 2021-08-31 10:03:38 +10:00
Sofie Van Landeghem
5af88427a2
Dev docs: listeners (#9061)
* Start Listeners documentation

* intro tabel of different architectures

* initialization, linking, dim inference

* internal comm (WIP)

* expand internal comm section

* frozen components and replacing listeners

* various small fixes

* fix content table

* fix link
2021-08-30 14:56:35 +02:00
Adriane Boyd
1e9b4b55ee
Pass overrides to subcommands in workflows (#9059)
* Pass overrides to subcommands in workflows

* Add missing docstring
2021-08-30 09:23:54 +02:00
Paul O'Leary McCann
6ff8d90070
Merge pull request #9081 from mjhajharia/patch-1
benepar usage example has deprecated imports
2021-08-29 14:41:52 +09:00
Meenal Jhajharia
2613f0e98f
benepar usage example has deprecated imports 2021-08-28 16:35:58 +05:30
Sofie Van Landeghem
1e974de837
config is not Optional (#9024) 2021-08-27 11:44:31 +02:00
github-actions[bot]
fb9c31fbda
Auto-format code with black (#9065)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-27 11:42:27 +02:00
Sofie Van Landeghem
4d39430b82
Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:50:35 +02:00
Sofie Van Landeghem
94fb840443
fix docs for Span constructor arguments (#9023) 2021-08-25 16:06:22 +02:00
David Strouk
31e9b126a0
Fix verbs list in lang/fr/tokenizer_exceptions.py (#9033) 2021-08-25 15:55:09 +02:00
Ines Montani
4cd052e81d
Include component factories in third-party dependencies resolver (#9009)
* Include component factories in third-party dependencies resolver

* Increment catalogue and update test
2021-08-25 14:58:01 +02:00
Sofie Van Landeghem
e1f88de729
bump to 3.1.2 (#9008) 2021-08-20 12:41:09 +02:00
Sofie Van Landeghem
4d52d7051c
Fix spancat training on nested entities (#9007)
* overfitting test on non-overlapping entities

* add failing overfitting test for overlapping entities

* failing test for list comprehension

* remove test that was put in separate PR

* bugfix

* cleanup
2021-08-20 12:37:50 +02:00
Paul O'Leary McCann
9cc3dc2b67
Add glossary entry for _SP (#8983) 2021-08-20 12:04:02 +02:00
Sofie Van Landeghem
de025beb5f
Warn and document spangroup.doc weakref (#8980)
* test for error after Doc has been garbage collected

* warn about using a SpanGroup when the Doc has been garbage collected

* add warning to the docs

* rephrase slightly

* raise error instead of warning

* update

* move warning to doc property
2021-08-20 11:06:19 +02:00
Paul O'Leary McCann
37fe847af4 Fix type annotation in docs 2021-08-20 15:34:22 +09:00
Ines Montani
f2b61b77a5 Fix universe.json [ci skip] 2021-08-20 11:26:29 +10:00
Ines Montani
894e16f5ca
Merge pull request #9003 from bbieniek/add-spacy-api-v3 [ci skip] 2021-08-20 11:23:30 +10:00
Baltazar
4d85cb88a5 added contribution license 2021-08-19 21:45:18 +02:00
Baltazar
71e65fe943 added spacy api v3 docker 2021-08-19 21:29:25 +02:00
Adriane Boyd
6722dc3dc5
Fix allow_overlap default for spancat scoring (#8970)
* Remove irrelevant default options
2021-08-18 09:56:56 +02:00
Steele Farnsworth
b18cb1cd2a
Refactor dependencymatcher.pyx to use list comps and enumerate. (#8956)
* Refactor to use list comps and enumerate.

Replace loops that append to a list with a list comprehensions where this does not change the behavior; replace range(len(...)) loops with enumerate. Correct one typo in a comment. Replace a call to set() with a set literal.

* Undo double assignment.

Expand `tokens_to_key[j] = k = self._get_matcher_key(key, i, j)` to two statements.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Sign contributors agreement

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-18 09:55:45 +02:00
Ines Montani
d94ddd5686
Auto-detect package dependencies in spacy package (#8948)
* Auto-detect package dependencies in spacy package

* Add simple get_third_party_dependencies test

* Import packages_distributions explicitly

* Inline packages_distributions

* Fix docstring [ci skip]

* Relax catalogue requirement

* Move importlib_metadata to spacy.compat with note

* Include license information [ci skip]
2021-08-17 14:05:13 +02:00
Sofie Van Landeghem
0a6b68848f
Fix making span_group (#8975)
* fix _make_span_group

* fix imports
2021-08-17 10:36:34 +02:00
Ines Montani
593a22cf2d
Add development docs for Language and code conventions (#8745)
* WIP: add dev docs for Language / config [ci skip]

* Add section on initialization [ci skip]

* Fix wording [ci skip]

* Add code conventions WIP [ci skip]

* Update code convention docs [ci skip]

* Update contributing guide and conventions [ci skip]

* Update Code Conventions.md [ci skip]

* Clarify sourced components + vectors

* Apply suggestions from code review [ci skip]

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update wording and add link [ci skip]

* restructure slightly + extended index

* remove paragraph that breaks flow and is repeated in more detail later

* fix anchors

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-08-17 09:38:15 +02:00
Paul O'Leary McCann
9391998c77
Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:37:21 +02:00
Ines Montani
a894fe0440
Merge pull request #8951 from HLasse/master 2021-08-16 11:41:32 +10:00
Lasse
839ea0f987 change tags formatting to match 2021-08-13 14:40:08 +02:00