Commit Graph

15078 Commits

Author SHA1 Message Date
Paul O'Leary McCann
fd759a881b
Fix inconsistent lemmas (#9405)
* Add util function to unique lists and preserve order

* Use unique function instead of list(set())

list(set()) has the issue that it's not consistent between runs of the
Python interpreter, so order can vary.

list(set()) calls were left in a few places where they were behind calls
to sorted(). I think in this case the calls to list() can be removed,
but this commit doesn't do that.

* Use the existing pattern for this
2021-10-11 11:38:45 +02:00
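The ordering problem described in this commit can be reproduced outside spaCy. The following is a minimal sketch, not the code from the commit: the helper name `unique_in_order` is hypothetical (the commit does not name its utility, and its last bullet says it switched to an existing pattern instead). It relies on `dict.fromkeys`, which keeps first-seen order, whereas iterating a `set` of strings can vary between interpreter runs because string hashing is randomized per process by default.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first-seen order.

    Hypothetical helper for illustration; not the function from the commit.
    """
    # dicts preserve insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(items))


lemmas = ["run", "running", "run", "ran", "running"]

stable = unique_in_order(lemmas)    # same order on every interpreter run
unstable = list(set(lemmas))        # order may differ between interpreter runs
also_stable = sorted(set(lemmas))   # sorting makes the order deterministic again
```

The last line corresponds to the case the commit message leaves alone: when the result is passed to `sorted()`, the intermediate `list()` call is redundant and the output order no longer depends on set iteration.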
Adriane Boyd
fd91e6a33c Fix types descriptions of sm and sent models (#9401) 2021-10-11 11:18:10 +02:00
Adriane Boyd
fd7edbc645
Fix types descriptions of sm and sent models (#9401) 2021-10-11 11:17:18 +02:00
Adriane Boyd
bbe4d3300a Remove traces of lexemes from vocab serialization (#9400) 2021-10-11 11:15:51 +02:00
Sofie Van Landeghem
a6ac36bcb3 Doc fixes in convert API (#9350)
* add more info on the spacy debug command

* formatting
2021-10-11 11:15:20 +02:00
Adriane Boyd
a5231cb044
Remove traces of lexemes from vocab serialization (#9400) 2021-10-11 11:13:35 +02:00
Jette16
3b144a3a51 Add universe test (#9278)
* Added test for universe.json

* Added contributor agreement

* Ran black on test_universe_json.py
2021-10-11 11:08:46 +02:00
Ines Montani
5003a9c3c7
Move core training logic in CLI into standalone function (#9398) 2021-10-11 10:56:14 +02:00
Paul O'Leary McCann
2a7e327310
Fix Dependency Matcher Ordering Issue (#9337)
* Fix inconsistency

This makes the failing test pass, so that behavior is consistent whether
patterns are added in one call or two.

The issue is that the hash for patterns depended on the index of the
pattern in the list of current patterns, not the list of total patterns,
so a second call would get identical match ids.

* Add illustrative test case

* Add failing test for remove case

Patterns are not removed from the internal matcher on calls to remove,
which causes spurious weird matches (or misses).

* Fix removal issue

Remove patterns from the internal matcher.

* Check that the single add call also gets no matches
2021-10-11 10:26:13 +02:00
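The key collision described in this commit can be illustrated with a toy model. This sketch is not spaCy's `DependencyMatcher` code: the function names, the `"FOUNDED"` rule name, and the string key format are all hypothetical, and only the indexing logic follows the commit message.

```python
def make_keys_buggy(name, existing, new_patterns):
    # The index restarts at 0 on every call, so a second call
    # reuses keys that were already handed out by the first call.
    return [f"{name}:{i}" for i, _ in enumerate(new_patterns)]


def make_keys_fixed(name, existing, new_patterns):
    # Offset by the number of patterns already stored, so the index
    # refers to the list of total patterns rather than the current batch.
    start = len(existing)
    return [f"{name}:{start + i}" for i, _ in enumerate(new_patterns)]


def add_in_two_calls(make_keys):
    """Simulate adding two patterns under one name in two separate calls."""
    stored = []
    keys = []
    for batch in (["pattern_a"], ["pattern_b"]):
        keys.extend(make_keys("FOUNDED", stored, batch))
        stored.extend(batch)
    return keys


buggy_keys = add_in_two_calls(make_keys_buggy)   # both entries are identical
fixed_keys = add_in_two_calls(make_keys_fixed)   # entries are distinct
one_call_keys = make_keys_fixed("FOUNDED", [], ["pattern_a", "pattern_b"])
```

With the offset applied, adding patterns in two calls produces the same keys as adding them in one call, which is the consistency the failing test in this commit checks for.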
Paul O'Leary McCann
5dbe4e8392 Update new issue config with Python 3.10 info
Also adds note that Install issues go to Discussions.
2021-10-11 15:41:32 +09:00
Paul O'Leary McCann
48ba4e60f4
Add new style citation file (#9388) 2021-10-07 17:47:39 +02:00
Sofie Van Landeghem
f87ae3cb7d
Doc fixes in convert API (#9350)
* add more info on the spacy debug command

* formatting
2021-10-06 13:13:18 +09:00
Adriane Boyd
4192e71599
Sync vocab in vectors and components sourced in configs (#9335)
Since a component may reference anything in the vocab, share the full
vocab when loading source components and vectors (which will include
`strings` as of #8909).

When loading a source component from a config, save and restore the
vocab state after loading source pipelines, in particular to preserve
the original state without vectors, since `[initialize.vectors]
= null` skips rather than resets the vectors.

The vocab references are not synced for components loaded with
`Language.add_pipe(source=)` because the pipelines are already loaded
and not necessarily with the same vocab. A warning could be added in
`Language.create_pipe_from_source` that it may be necessary to save and
reload before training, but it's a rare enough case that this kind of
warning may be too noisy overall.
2021-10-04 12:19:02 +02:00
Paul O'Leary McCann
23badbd55c Updating Troubleshooting Docs (#9329)
* Add link to Discussions FAQ

* Remove old FAQ entries

I think these are no longer relevant.

- no-cache-dir: affected pip versions are *very* old now
- narrow unicode: not an issue from py3.3+
- utf-8 osx: upstream bug closed in 2019

Some of the other issues are also maybe not frequent.
2021-10-01 12:31:41 +02:00
Paul O'Leary McCann
6e833b617a
Updating Troubleshooting Docs (#9329)
* Add link to Discussions FAQ

* Remove old FAQ entries

I think these are no longer relevant.

- no-cache-dir: affected pip versions are *very* old now
- narrow unicode: not an issue from py3.3+
- utf-8 osx: upstream bug closed in 2019

Some of the other issues are also maybe not frequent.
2021-10-01 12:28:22 +02:00
github-actions[bot]
42a76c758f
Auto-format code with black (#9346)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-10-01 11:17:11 +02:00
Adriane Boyd
b3192ddea3
Sync thinc install dep in setup, fix test packaging (#9336)
* Sync thinc install dep in setup

* Add __init__.py to include package tests in package

* Include *.toml in package
2021-09-30 19:02:10 +02:00
Paul O'Leary McCann
0508795d67 Fix invalid json 2021-09-30 15:24:47 +09:00
Paul O'Leary McCann
78a88f7de7 Fix invalid json 2021-09-30 15:23:55 +09:00
Martin Vallone
f15bb40941 Adding PhruzzMatcher to spaCy universe (#9321)
* Adding PhruzzMatcher to spaCy universe

* Fixes to make the package work properly
2021-09-30 14:26:40 +09:00
Martin Vallone
a14ab7e882
Adding PhruzzMatcher to spaCy universe (#9321)
* Adding PhruzzMatcher to spaCy universe

* Fixes to make the package work properly
2021-09-30 13:46:53 +09:00
Adriane Boyd
e750c1760c
Restore tokenization timing in Language.evaluate (#9305)
Restore tokenization timing steps that were accidentally removed in #6765.
2021-09-27 20:44:14 +02:00
Sofie Van Landeghem
a361df00cd
Raise E983 early on in docbin init (#9247)
* raise E983 early on in docbin init

* catch situation before error is raised

* add more info on the spacy debug command
2021-09-27 20:43:03 +02:00
Adriane Boyd
effae12cbd
Update slow readers test to use textcat_multilabel (#9300) 2021-09-27 20:04:02 +02:00
github-actions[bot]
4da2af4e0e
Auto-format code with black (#9284)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-24 10:46:43 +02:00
Ines Montani
6bb0324b81 Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
Ines Montani
beb4a8c524
Merge pull request #9199 from shigapov/master (resolves #9129) 2021-09-23 19:41:53 +10:00
Philip Vollet
d2adfe1efa
Add projects to spaCy Universe (#9269)
* Added spaCy Universe projects

* Added user license agreement Philip Vollet

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-23 10:56:45 +02:00
Ines Montani
57b5fc1995
Apply suggestions from code review
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-23 17:58:32 +10:00
Sofie Van Landeghem
3fc3b7a13a
avoid crash when unicode in title (#9254) 2021-09-22 21:01:34 +02:00
Daniël de Kok
17802836be
Allow overriding vars in the project assets subcommand (#9248)
This change makes the `project assets` subcommand accept variables to
override as well, making the interface more similar to `project run`.
2021-09-21 10:49:45 +02:00
Adriane Boyd
00bdb31150
Fix vector for 0-length span (#9244) 2021-09-20 20:22:49 +02:00
svlandeg
ec621e6853 Merge remote-tracking branch 'upstream/master' into spacy.io 2021-09-20 15:54:00 +02:00
svlandeg
e0e3e9653b Revert "raise E983 early on in docbin init"
This reverts commit f3f7afa21f.
2021-09-20 15:52:02 +02:00
svlandeg
f3f7afa21f raise E983 early on in docbin init 2021-09-20 15:49:31 +02:00
github-actions[bot]
015d439eb6
Auto-format code with black (#9234)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-20 08:49:19 +02:00
Edward
79c7c62970 Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:35:00 +02:00
Edward
8bda39f088
Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:32:44 +02:00
Paul O'Leary McCann
c4f0800fb8
Validate pos values when creating Doc (#9148)
* Validate pos values when creating Doc

* Add clear error when setting invalid pos

This also changes the error language slightly.

* Fix variable name

* Update spacy/tokens/doc.pyx

* Test that setting invalid pos raises an error

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 13:28:05 +02:00
Jozef Harag
865cfbc903
feat: add spacy.WandbLogger.v3 with optional run_name and entity parameters (#9202)
* feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters

* update versioning in docs

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-09-16 12:26:41 +02:00
Sofie Van Landeghem
00836c2d7d
Update spacy/displacy/templates.py 2021-09-16 09:23:21 +02:00
Sofie Van Landeghem
4bf2606adf
Update spacy/displacy/render.py
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-16 09:22:38 +02:00
Paul O'Leary McCann
fd99438fb2 Make docs consistent (fix #9126) 2021-09-16 15:56:19 +09:00
Paul O'Leary McCann
1d57d78758 Make docs consistent (fix #9126) 2021-09-16 15:54:12 +09:00
Paul O'Leary McCann
9ceb8f413c
StringStore/Vocab dev docs (#9142)
* First take at StringStore/Vocab docs

Things to check:

1. The mysterious vocab members
2. How to make table of contents? Is it autogenerated?
3. Anything I missed / needs more detail?

* Update docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updates based on review feedback

* Minor fix

* Move example code down

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 12:50:22 +09:00
Ines Montani
20f63e7154
Only include runtime-relevant config in package CLI dependency detection (#9211) 2021-09-15 23:16:01 +02:00
Adriane Boyd
d74870d38c
Prepare for v3.1.3 (#9200)
* Update thinc and spacy-legacy requirements

* Set version to v3.1.3
2021-09-14 11:03:51 +02:00
j-frei
5d0cc0d2ab Correct parser.py use_upper param info (#9180) 2021-09-13 09:29:11 +02:00
Renat Shigapov
d5cc009faf
Merge branch 'explosion:master' into master 2021-09-13 08:43:48 +02:00
Renat Shigapov
e61d93f8c3
add NEL-visualisation to manual-usage 2021-09-13 08:38:58 +02:00