Commit Graph

15756 Commits

Author SHA1 Message Date
kadarakos
7cf6bcca0e merge misery 2022-05-10 17:19:16 +00:00
kadarakos
e512874c80 small refactor and docs 2022-05-10 16:40:31 +00:00
Paul O'Leary McCann
33f4f90ff0 Formatting 2022-05-10 19:09:52 +09:00
Paul O'Leary McCann
41fc092674 Split span predictor model into its own file 2022-05-10 19:08:21 +09:00
Paul O'Leary McCann
f852c5cea4 Split span predictor component into its own file
This runs. The imports in both of the split files could probably use a
close check to remove extras.
2022-05-10 18:53:45 +09:00
Paul O'Leary McCann
117a9ef2bf Initial coref docs
A few unresolved points:

- SpanPredictor should probably get its own file
- What's the right way to document MentionClusters?
2022-05-10 18:33:25 +09:00
Raphael Mitsch
2904359685
Allow assets to be optional in spacy project (#10714)
* Allow assets to be optional in spacy project: draft for optional flag/download_all options.

* Allow assets to be optional in spacy project: added OPTIONAL_DEFAULT reflecting default asset optionality.

* Allow assets to be optional in spacy project: renamed --all to --extra.

* Allow assets to be optional in spacy project: included optional flag in project config test.

* Allow assets to be optional in spacy project: added documentation.

* Allow assets to be optional in spacy project: fixing deprecated --all reference.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Allow assets to be optional in spacy project: fixed project_assets() docstring.

* Allow assets to be optional in spacy project: adjusted wording in justification of optional assets.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: switched to  as keyword in project.yml. Updated docs.

* Allow assets to be optional in spacy project: updated comment.

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in output.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in docstring.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: renamed OPTIONAL_DEFAULT to EXTRA_DEFAULT.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-05-10 10:40:11 +02:00
Sofie Van Landeghem
1543558d08
Add test for old architectures (#10751)
* add v1 and v2 tests for tok2vec architectures

* textcat architectures are not "layers"

* test older textcat architectures

* test older parser architecture
2022-05-10 08:24:42 +02:00
Madeesh Kannan
733114bdd9
training.md: Fix typos (#10775) 2022-05-09 19:44:14 +02:00
svlandeg
6b51258a58 clean up unused imports + black formatting 2022-05-09 13:34:50 +02:00
Raphael Mitsch
e626df959f
Document different ways to create a pipeline (#10762)
* Document different ways to create a pipeline: moved up/slightly modified paragraph on pipeline creation.

* Document different ways to create a pipeline: changed Finnish to Ukrainian in example for language without trained pipeline.

* Document different ways to create a pipeline: added explanation of blank pipeline.

* Document different ways to create a pipeline: exchanged Ukrainian with Yoruba.
2022-05-06 15:40:59 +02:00
Richard Hudson
c32e1a0079
Updated Coreferee Universe entry (#10763) 2022-05-06 13:21:39 +02:00
Luca Dorigo
0a92d5644e
Fix StringStore.__getitem__ return type depending on parameter types (#10741)
* Fix StringStore.__getitem__ return type depending on parameter types

Small fix using `@overload` so that `StringStore.__getitem__` returns an `int` when given a `str` or `bytes` and a `str` when given an `int`.

* Update spacy/strings.pyi

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-03 17:57:07 +02:00
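Note on the commit above: the `@overload` pattern it describes can be sketched as follows. This is a minimal illustration only, not the contents of `spacy/strings.pyi`. The class `TinyStringStore` and its sequential integer IDs are hypothetical stand-ins; only the shape of the two overloads follows the commit message.

```python
from typing import Dict, List, Union, overload


class TinyStringStore:
    """Hypothetical string table used only to illustrate the typing pattern."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._strings: List[str] = []

    def add(self, string: str) -> int:
        # Assumption: IDs are list positions, which keeps the sketch small.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    # A type checker picks the matching overload from the argument type.
    @overload
    def __getitem__(self, key: Union[str, bytes]) -> int: ...

    @overload
    def __getitem__(self, key: int) -> str: ...

    def __getitem__(self, key: Union[str, bytes, int]) -> Union[int, str]:
        # Single runtime implementation behind both overloads.
        if isinstance(key, int):
            return self._strings[key]
        if isinstance(key, bytes):
            key = key.decode("utf8")
        return self._ids[key]


store = TinyStringStore()
coffee_id = store.add("coffee")
```

With these overloads a type checker should infer `int` for `store["coffee"]` and `str` for `store[coffee_id]`, rather than a union that callers have to narrow themselves.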
Sofie Van Landeghem
e03b9f8095
Small doc typos (#10750)
* fix typos

* formatting
2022-05-03 13:55:27 +02:00
Raphael Mitsch
f5390e278a
Refactor error messages to remove hardcoded strings (#10729)
* Use custom error msg instead of hardcoded string: replaced remaining hardcoded error message strings.

* Use custom error msg instead of hardcoded string: fixing faulty Errors import.
2022-05-02 13:38:46 +02:00
Madeesh Kannan
0a503ce5e0
Remove vestigial debug print statement in walk_head_nodes (#10718)
* `graph`: Remove vestigial debug print statement in `walk_head_nodes`

* Revert whitespace changes

* Remove more debug print statements
2022-05-02 13:36:35 +02:00
vincent d warmerdam
f3de976513
Update universe.json to Include spaCy video #6 (#10723)
* Update universe.json

I noticed that episode 6 was missing, so I added it.

* Update universe.json

* Update universe.json
2022-05-02 13:35:14 +02:00
Adriane Boyd
497a708c71
Docs for v3.3 (#10628)
* Temporarily disable CI tests

* Start v3.3 website updates

* Add trainable lemmatizer to pipeline design

* Fix Vectors.most_similar

* Add floret vector info to pipeline design

* Add Lower and Upper Sorbian

* Add span to sidebar

* Work on release notes

* Copy from release notes

* Update pipeline design graphic

* Upgrading note about Doc.from_docs

* Add tables and details

* Update website/docs/models/index.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix da lemma acc

* Add minimal intro, various updates

* Round lemma acc

* Add section on floret / word lists

* Add new pipelines table, minor edits

* Fix displacy spans example title

* Clarify adding non-trainable lemmatizer

* Update adding-languages URLs

* Revert "Temporarily disable CI tests"

This reverts commit 1dee505920.

* Spell out words/sec

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-28 14:09:35 +02:00
Adriane Boyd
10377fb945
Set version to v3.3.0 (#10614)
* Set version to v3.3.0

* Revert "Temporarily skip tests that require models/compat"

This reverts commit e422101e00.
2022-04-28 13:07:49 +02:00
Raphael Mitsch
3579507ba1
Bumped black to 22.3.0 due to a fix for https://github.com/psf/black/issues/2964. (#10715) 2022-04-27 14:49:24 +02:00
harmbuisman
c066fb8a4e
#10672: fixes displacy output for manual unsorted entities (#10673)
* #10672: fixes displacy output for manual unsorted entities

* #10672: removed unused import

* fix prettier formatting

Co-authored-by: Harm Buisman <h.buisman@iknl.nl>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-27 09:51:58 +02:00
Sofie Van Landeghem
b3717ba53a
removing print statements from the test suite (#10712) 2022-04-27 09:14:25 +02:00
Adriane Boyd
455f089c9b
Support exclude in Doc.from_docs (#10689)
* Support exclude in Doc.from_docs

* Update API docs

* Add new tag to docs
2022-04-25 18:19:03 +02:00
Mike
3b208197c3
Fixed example for spacy_syllables (#10705)
There was a typo in the example for the spacy_syllables project.
2022-04-25 16:40:54 +02:00
github-actions[bot]
e07500369c
Auto-format code with black (#10687)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-04-22 11:24:53 +02:00
Sofie Van Landeghem
2c2dbb844c Syntax for a branch from a PR 2022-04-22 09:45:49 +02:00
Ryn Daniels
29afbdb91e
add readme for explosion-bot (#10677) 2022-04-20 09:52:34 +02:00
Richard Hudson
4b227f4861
Merge pull request #10669 from mgrojo/develop
Fix some issues in Spanish stop-word list and examples
2022-04-19 09:37:34 +02:00
mgr
3d50b1a989 Fix some issues in Spanish examples
- Spelling: nationalities in lowercase, accent.
- Incorrect verb composition
- Untranslated word
2022-04-18 22:12:57 +02:00
mgr
2a2654c756 Remove significant or not very frequent words from stop word list [es]
The list of stop words for Spanish contained many inadequate words, see:

https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100

Removed words:
- verb forms of 'trabajar' (work) and 'intentar' (try)
- words related to 'empleo' (employment)
- incorrect words: ampleamos, arribaabajo, soyos, paìs
- miscellaneous words due to being too significant or too infrequent:
  actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general,
  pais, principalmente, raras

Added other stop words for completion:
- Spanish one-letter words
- numbers up to twelve

Some reformatting to 79 columns.

When in doubt, the English and German lists have been consulted as good
examples.
2022-04-18 22:04:02 +02:00
Madeesh Kannan
aa6780eb27
Matcher: Remove superfluous GIL-acquiring check in get_is_final (#10659)
* `Matcher`: Remove superfluous GIL-acquiring check in `get_is_final`

This check incurred a significant performance penalty due to implicit interactions between the GIL and Cython ref-counting code.

* `Matcher`: Inline `PatternStateC` accessors
2022-04-18 12:59:34 +02:00
Duy Ngo
229ecaf0ea
Add numbers and definitions (#10665) 2022-04-18 12:58:32 +02:00
Paul O'Leary McCann
683f470852 Merge branch 'master' into feature/coref 2022-04-18 18:39:08 +09:00
Schero1994
d622883a42
Adding and updating content in the spacy universe (#10493)
* signing contributor agreement

* adding new content to the spaCy universe

* updating outdated example codes

* resolving issues for the PR

* resolve review for klayers

* remove contributor-agreement file from the PR

* Update code example of spaCySentiWS

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy-sentiws code example

Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-04-15 15:36:54 +02:00
Joachim Fainberg
4e1716223c
displaCy: Avoid increasing levels for identical arcs (#10639)
* Test for arc levels for identical arcs

Also moves the test in order with the other numbered tests.

* displaCy: filter identical arcs

Avoid increased levels due to identical arcs by first
filtering any identical arcs.

* Sort keys before filtering

Manual entry with keys out of order would previously become
different tuples and therefore not filtered correctly.

Co-authored-by: Joachim Fainberg <joachimfainberg@Joachims-MBP.lan>
2022-04-14 16:48:00 +02:00
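Note on the commit above: the reason for sorting keys before filtering can be sketched as follows. This is a minimal illustration under assumptions, not displaCy's actual code. The function name `filter_identical_arcs` is hypothetical, and the arc dictionaries are assumed to hold hashable values such as `start`, `end`, `label` and `dir`.

```python
from typing import Any, Dict, List


def filter_identical_arcs(arcs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop arcs that repeat an earlier arc, keeping first occurrences in order."""
    seen = set()
    unique = []
    for arc in arcs:
        # Sorting the items makes the tuple independent of key order, so
        # manually entered arcs with shuffled keys still compare equal.
        key = tuple(sorted(arc.items()))
        if key not in seen:
            seen.add(key)
            unique.append(arc)
    return unique


arcs = [
    {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
    {"label": "nsubj", "dir": "left", "end": 1, "start": 0},  # same arc, keys reordered
    {"start": 1, "end": 2, "label": "dobj", "dir": "right"},
]
filtered = filter_identical_arcs(arcs)
```

Without the `sorted` call, the first two arcs would produce different tuples and both would survive the filter.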
Philip Vollet
e63a5d4888
Update newsletter id (#10655) 2022-04-14 13:34:01 +02:00
Paul O'Leary McCann
afd255c0ed Undo multiply by 100
This was mistaken, not sure why my score seemed to be off before.
2022-04-14 18:42:09 +09:00
Paul O'Leary McCann
08729e0fbd Remove end adjustment
The difference in environments was due to a change in Thinc, the code
here is fine.
2022-04-14 18:31:30 +09:00
fonfonx
028cbad05e
Add feminine form of word "one" in French (#10653)
* Add French number

* Add fonfonx.md

* Add feminine ordinal words for French
2022-04-14 10:21:27 +02:00
Schero1994
caf8528af7
Batch #1 | spaCy universe cleanup (#10642)
* delete universe object: wmd-relax

* delete universe object: spaCy.jl

* delete universe object: saber

* delete universe object: languagecrunch

* delete universe object: gracyql

* delete universe object: ExcelCy

* delete universe object: EpiTator

Co-authored-by: schaeran <schaeran1994@gmail.com>
2022-04-14 10:08:19 +02:00
single-fingal
4228f3c757
Fix a few minor bugs in the SpanGroup API web docs (#10650)
* Fix a few minor bugs in the SpanGroup API web docs

* Update SpanGroup docs examples to have Spans reflect intended "errors"
2022-04-14 09:59:48 +02:00
Paul O'Leary McCann
8181d4570c Multiply accuracy by 100
This seems to match with the scorer expectations better
2022-04-14 15:56:38 +09:00
Paul O'Leary McCann
e8af02700f Remove all coref scoring except LEA
This is necessary because one of the three old methods relied on scipy
for some complex problem solving. LEA is generally better for
evaluations.

The downside is that this means evaluations aren't comparable with many
papers, but canonical scoring can be supported using external eval
scripts or other methods.
2022-04-13 21:02:18 +09:00
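Note on the commit above: LEA (link-based entity-aware) scoring needs only counting, which is consistent with dropping the scipy dependency. The sketch below is a simplified illustration, not the scorer in this branch. The function names are hypothetical, clusters are assumed to be sets of hashable mention IDs, and singleton clusters are skipped rather than given the special handling of the full metric.

```python
from typing import Hashable, List, Set, Tuple

Cluster = Set[Hashable]


def link(n: int) -> int:
    """Number of coreference links among n mentions."""
    return n * (n - 1) // 2


def lea_one_side(key: List[Cluster], response: List[Cluster]) -> float:
    """Size-weighted fraction of each key cluster's links found in the response."""
    numerator = 0.0
    denominator = 0
    for k in key:
        if len(k) < 2:
            continue  # simplification: singletons are ignored in this sketch
        resolved = sum(link(len(k & r)) for r in response)
        numerator += len(k) * resolved / link(len(k))
        denominator += len(k)
    return numerator / denominator if denominator else 0.0


def lea(key: List[Cluster], response: List[Cluster]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); precision swaps the two sides."""
    recall = lea_one_side(key, response)
    precision = lea_one_side(response, key)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = lea([{1, 2, 3}], [{1, 2}, {3, 4}])
```

In this example the key cluster has three links and the response recovers one of them, so recall is 1/3. One of the two response clusters is fully correct, so precision is 1/2.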
Paul O'Leary McCann
2300f4df3d Fix span score logging 2022-04-13 20:37:06 +09:00
Paul O'Leary McCann
d470fa03c1 Adjust end indices
It's not clear if this is technically correct or not but it won't run
without it for me.
2022-04-13 20:19:21 +09:00
kadarakos
b53113e3b8
Preparing span predictor for predicting from gold (#10547)
Note this is squashed because rebasing had conflicts.

* remove unnecessary .device

* span predictor debug start

* gearing up SpanPredictor for gold-heads

* merge SpanPredictor attributes

* remove useless extra prefix and device from spanpredictor

* make sure predicted and reference keeps aligned

* handle empty head_ids

* handle empty clusters

* addressing suggestions by @polm

* nicer restore

* fix score overwriting bug

* prepare for aligned heads-spans training

* span accuracy score

* update with eg.predicted as other components

* add backprop callback to spanpredictor

* report start- and end-accuracies separately

* fixing scorer

Co-authored-by: Kádár Ákos <akos@onyx.uvt.nl>
2022-04-13 19:42:49 +09:00
Adriane Boyd
64602d997d
Require srsly v2.4.3+ due to buffer overflow vulnerability (#10651) 2022-04-13 11:41:40 +02:00
Richard Hudson
75fbbcdc18
Display warning when spacy.explain() finds no term (#10645)
* Display warning when spacy.explain() finds no term

* Updated warning message text
2022-04-12 10:48:28 +02:00
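Note on the commit above: the behaviour it describes, warning when a lookup finds no term, can be sketched with the standard `warnings` module. This is an illustration under assumptions, not spaCy's implementation. The `GLOSSARY` contents, the function `explain_term` and the warning text are all hypothetical.

```python
import warnings
from typing import Dict, Optional

# Hypothetical glossary with a single entry, for illustration only.
GLOSSARY: Dict[str, str] = {"NOUN": "noun"}


def explain_term(term: str) -> Optional[str]:
    """Return the description for a term, warning if the term is unknown."""
    if term in GLOSSARY:
        return GLOSSARY[term]
    # Assumed message text; the real wording is not given in the commit log.
    warnings.warn(f"No description found for term '{term}'", UserWarning)
    return None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    known = explain_term("NOUN")
    unknown = explain_term("NO_SUCH_TERM")
```

Returning `None` while also warning keeps existing callers working and still makes a mistyped label visible.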
Kádár Ákos
6aedd98d02 fixing scorer 2022-04-11 16:10:14 +02:00
Kádár Ákos
7a239f2ec7 report start- and end-accuracies separately 2022-04-08 14:57:19 +02:00