Commit Graph

14946 Commits

Author SHA1 Message Date
mylibrar
d621df6422 Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:25:17 +09:00
Sofie Van Landeghem
721f4554c8 matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:29:44 +02:00
Paul O'Leary McCann
752696f134 Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs

* Add NER docs

* Add Entity Linker docs

* Add assigned fields docs for the tagger

This also adds a preamble, since there wasn't one.

* Add morphologizer docs

* Add dependency parser docs

* Update entityrecognizer docs

This is a little weird because `Doc.ents` is the only thing assigned to,
but it's actually a bidirectional property.

* Add token fields for entityrecognizer

* Fix section name

* Add entity ruler docs

* Add lemmatizer docs

* Add sentencizer/recognizer docs

* Update website/docs/api/entityrecognizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be
correct.

* Run prettier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

* Add transformers section

This basically just moves and renames the "custom attributes" section
from the bottom of the page to be consistent with "assigned attributes"
on other pages.

I looked at moving the paragraph just above the section into the
section, but it includes the unrelated registry additions, so it seemed
better to leave it unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-01 12:10:52 +02:00
svlandeg
3f16c45281 Merge branch 'spacy.io' of https://github.com/explosion/spaCy into spacy.io 2021-08-31 10:58:40 +02:00
Davide Fiocco
5c88998b9d Fix point typo on docbin docs (#9097) 2021-08-31 10:58:31 +02:00
Ines Montani
753149bc88 Update references to contributor agreement [ci skip] 2021-08-31 10:58:22 +02:00
Meenal Jhajharia
db42ba5240 benepar usage example has deprecated imports 2021-08-29 14:44:18 +09:00
Sofie Van Landeghem
689535c264 config is not Optional (#9024) 2021-08-27 11:53:54 +02:00
Sofie Van Landeghem
8c1d86ea92 Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:53:29 +02:00
Sofie Van Landeghem
31c0a75e6d fix docs for Span constructor arguments (#9023) 2021-08-26 09:52:59 +02:00
svlandeg
fb8c2f794a Merge remote-tracking branch 'upstream/master' into spacy.io 2021-08-20 14:49:51 +02:00
Sofie Van Landeghem
e1f88de729
bump to 3.1.2 (#9008) 2021-08-20 12:41:09 +02:00
Sofie Van Landeghem
4d52d7051c
Fix spancat training on nested entities (#9007)
* overfitting test on non-overlapping entities

* add failing overfitting test for overlapping entities

* failing test for list comprehension

* remove test that was put in separate PR

* bugfix

* cleanup
2021-08-20 12:37:50 +02:00
Paul O'Leary McCann
9cc3dc2b67
Add glossary entry for _SP (#8983) 2021-08-20 12:04:02 +02:00
Sofie Van Landeghem
de025beb5f
Warn and document spangroup.doc weakref (#8980)
* test for error after Doc has been garbage collected

* warn about using a SpanGroup when the Doc has been garbage collected

* add warning to the docs

* rephrase slightly

* raise error instead of warning

* update

* move warning to doc property
2021-08-20 11:06:19 +02:00
Paul O'Leary McCann
0e4da8ed70 Fix type annotation in docs 2021-08-20 15:35:41 +09:00
Paul O'Leary McCann
37fe847af4 Fix type annotation in docs 2021-08-20 15:34:22 +09:00
Ines Montani
8444aa75e2 Fix universe.json [ci skip] 2021-08-20 11:26:46 +10:00
Ines Montani
f2b61b77a5 Fix universe.json [ci skip] 2021-08-20 11:26:29 +10:00
Ines Montani
f2d19e6dc2 Merge pull request #9003 from bbieniek/add-spacy-api-v3 [ci skip] 2021-08-20 11:23:50 +10:00
Ines Montani
894e16f5ca
Merge pull request #9003 from bbieniek/add-spacy-api-v3 [ci skip] 2021-08-20 11:23:30 +10:00
Baltazar
4d85cb88a5 added contribution license 2021-08-19 21:45:18 +02:00
Baltazar
71e65fe943 added spacy api v3 docker 2021-08-19 21:29:25 +02:00
Adriane Boyd
6722dc3dc5
Fix allow_overlap default for spancat scoring (#8970)
* Remove irrelevant default options
2021-08-18 09:56:56 +02:00
Steele Farnsworth
b18cb1cd2a
Refactor dependencymatcher.pyx to use list comps and enumerate. (#8956)
* Refactor to use list comps and enumerate.

Replace loops that append to a list with a list comprehensions where this does not change the behavior; replace range(len(...)) loops with enumerate. Correct one typo in a comment. Replace a call to set() with a set literal.

* Undo double assignment.

Expand `tokens_to_key[j] = k = self._get_matcher_key(key, i, j)` to two statements.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Sign contributors agreement

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-18 09:55:45 +02:00
Ines Montani
d94ddd5686
Auto-detect package dependencies in spacy package (#8948)
* Auto-detect package dependencies in spacy package

* Add simple get_third_party_dependencies test

* Import packages_distributions explicitly

* Inline packages_distributions

* Fix docstring [ci skip]

* Relax catalogue requirement

* Move importlib_metadata to spacy.compat with note

* Include license information [ci skip]
2021-08-17 14:05:13 +02:00
Sofie Van Landeghem
0a6b68848f
Fix making span_group (#8975)
* fix _make_span_group

* fix imports
2021-08-17 10:36:34 +02:00
Ines Montani
593a22cf2d
Add development docs for Language and code conventions (#8745)
* WIP: add dev docs for Language / config [ci skip]

* Add section on initialization [ci skip]

* Fix wording [ci skip]

* Add code conventions WIP [ci skip]

* Update code convention docs [ci skip]

* Update contributing guide and conventions [ci skip]

* Update Code Conventions.md [ci skip]

* Clarify sourced components + vectors

* Apply suggestions from code review [ci skip]

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update wording and add link [ci skip]

* restructure slightly + extended index

* remove paragraph that breaks flow and is repeated in more detail later

* fix anchors

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-08-17 09:38:15 +02:00
Paul O'Leary McCann
4ed5d9ad5a Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:39:19 +02:00
Paul O'Leary McCann
9391998c77
Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:37:21 +02:00
Ines Montani
d65e03adae Merge pull request #8951 from HLasse/master 2021-08-16 11:41:53 +10:00
Ines Montani
a894fe0440
Merge pull request #8951 from HLasse/master 2021-08-16 11:41:32 +10:00
Lasse
839ea0f987 change tags formatting to match 2021-08-13 14:40:08 +02:00
Lasse
70ab596f61 Merge branch 'master' of https://github.com/HLasse/spaCy 2021-08-13 14:35:21 +02:00
Lasse
195e4e48c3 add textdescriptives to universe 2021-08-13 14:35:18 +02:00
github-actions[bot]
92071326d8
Auto-format code with black (#8950)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-13 11:48:38 +02:00
Adriane Boyd
8448c7dbc5
Update da trf recommendation (#8921)
Update the da trf recommendation to the same model used in the
pretrained pipelines.
2021-08-12 13:54:02 +02:00
Ines Montani
647abe186c Merge pull request #8938 from explosion/docs/prodigy-v1-11-project [ci skip]
Update Prodigy project template for v1.11
2021-08-12 21:17:14 +10:00
Ines Montani
6260f044cc
Merge pull request #8938 from explosion/docs/prodigy-v1-11-project [ci skip]
Update Prodigy project template for v1.11
2021-08-12 21:16:49 +10:00
Ines Montani
4f769ff913 Update Prodigy project template for v1.11 [ci skip] 2021-08-12 13:46:20 +10:00
Paul O'Leary McCann
e227d24d43
Allow passing in array vars for speedup (#8882)
* Allow passing in array vars for speedup

This fixes #8845. Not sure about the docstring changes here...

* Update docs

Types maybe need more detail? Maybe not?

* Run prettier on docs

* Update spacy/tokens/span.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-10 15:13:53 +02:00
Paul O'Leary McCann
6029cfc391
Add scores to output in spancat (#8855)
* Add scores to output in spancat

This exposes the scores as an attribute on the SpanGroup. Includes a
basic test.

* Add basic doc note

* Vectorize score calcs

* Add "annotation format" section

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Clean up doc section

* Ran prettier on docs

* Get arrays off the gpu before iterating over them

* Remove int() calls

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-08-10 13:47:49 +02:00
Ines Montani
c581848cbb Merge pull request #8910 from DuyguA/patch-1 [ci skip]
updated unv json for new book
2021-08-09 23:13:17 +10:00
Ines Montani
a1e9f19460
Merge pull request #8910 from DuyguA/patch-1 [ci skip]
updated unv json for new book
2021-08-09 23:12:50 +10:00
Paul O'Leary McCann
35255786a1 Fix #8902 (bad link in docs)
typo fix
2021-08-09 13:59:59 +02:00
Duygu Altinok
380b2817cf
updated unv json for new book 2021-08-09 12:39:22 +02:00
Paul O'Leary McCann
cac298471f
Fix #8902 (bad link in docs)
typo fix
2021-08-08 22:04:00 +09:00
Eduard Zorita
439f30faad
Add stub files for main cython classes (#8427)
* Add stub files for main API classes

* Add contributor agreement for ezorita

* Update types for ndarray and hash()

* Fix __getitem__ and __iter__

* Add attributes of Doc and Token classes

* Overload type hints for Span.__getitem__

* Fix type hint overload for Span.__getitem__

Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>
2021-08-07 12:30:03 +02:00
github-actions[bot]
56d4d87aeb
Auto-format code with black (#8895)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-06 13:38:06 +02:00
Kabir Khan
1dfffe5fb4
No output info message in train (#8885)
* Add info message that no output directory was provided in train

* Update train.py

* Fix logging
2021-08-05 09:21:22 +02:00