Commit Graph

14847 Commits

Author SHA1 Message Date
Paul O'Leary McCann
fd759a881b
Fix inconsistent lemmas (#9405)
* Add util function to unique lists and preserve order

* Use unique function instead of list(set())

list(set()) has the issue that it's not consistent between runs of the
Python interpreter, so order can vary.

list(set()) calls were left in a few places where they were behind calls
to sorted(). I think in this case the calls to list() can be removed,
but this commit doesn't do that.

* Use the existing pattern for this
2021-10-11 11:38:45 +02:00
Adriane Boyd
fd7edbc645
Fix types descriptions of sm and sent models (#9401) 2021-10-11 11:17:18 +02:00
Adriane Boyd
a5231cb044
Remove traces of lexemes from vocab serialization (#9400) 2021-10-11 11:13:35 +02:00
Jette16
3b144a3a51 Add universe test (#9278)
* Added test for universe.json

* Added contributor agreement

* Ran black on test_universe_json.py
2021-10-11 11:08:46 +02:00
Ines Montani
5003a9c3c7
Move core training logic in CLI into standalone function (#9398) 2021-10-11 10:56:14 +02:00
Paul O'Leary McCann
2a7e327310
Fix Dependency Matcher Ordering Issue (#9337)
* Fix inconsistency

This makes the failing test pass, so that behavior is consistent whether
patterns are added in one call or two.

The issue is that the hash for patterns depended on the index of the
pattern in the list of current patterns, not the list of total patterns,
so a second call would get identical match ids.

* Add illustrative test case

* Add failing test for remove case

Patterns are not removed from the internal matcher on calls to remove,
which causes spurious weird matches (or misses).

* Fix removal issue

Remove patterns from the internal matcher.

* Check that the single add call also gets no matches
2021-10-11 10:26:13 +02:00
Paul O'Leary McCann
5dbe4e8392 Update new issue config with Python 3.10 info
Also adds note that Install issues go to Discussions.
2021-10-11 15:41:32 +09:00
Paul O'Leary McCann
48ba4e60f4
Add new style citation file (#9388) 2021-10-07 17:47:39 +02:00
Sofie Van Landeghem
f87ae3cb7d
Doc fixes in convert API (#9350)
* add more info on the spacy debug command

* formatting
2021-10-06 13:13:18 +09:00
Adriane Boyd
4192e71599
Sync vocab in vectors and components sourced in configs (#9335)
Since a component may reference anything in the vocab, share the full
vocab when loading source components and vectors (which will include
`strings` as of #8909).

When loading a source component from a config, save and restore the
vocab state after loading source pipelines, in particular to preserve
the original state without vectors, since `[initialize.vectors]
= null` skips rather than resets the vectors.

The vocab references are not synced for components loaded with
`Language.add_pipe(source=)` because the pipelines are already loaded
and not necessarily with the same vocab. A warning could be added in
`Language.create_pipe_from_source` that it may be necessary to save and
reload before training, but it's a rare enough case that this kind of
warning may be too noisy overall.
2021-10-04 12:19:02 +02:00
Paul O'Leary McCann
6e833b617a
Updating Troubleshooting Docs (#9329)
* Add link to Discussions FAQ

* Remove old FAQ entries

I think these are no longer relevant.

- no-cache-dir: affected pip versions are *very* old now
- narrow unicode: not an issue from py3.3+
- utf-8 osx: upstream bug closed in 2019

Some of the other issues are also maybe not frequent.
2021-10-01 12:28:22 +02:00
github-actions[bot]
42a76c758f
Auto-format code with black (#9346)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-10-01 11:17:11 +02:00
Adriane Boyd
b3192ddea3
Sync thinc install dep in setup, fix test packaging (#9336)
* Sync thinc install dep in setup

* Add __init__.py to include package tests in package

* Include *.toml in package
2021-09-30 19:02:10 +02:00
Paul O'Leary McCann
78a88f7de7 Fix invalid json 2021-09-30 15:23:55 +09:00
Martin Vallone
a14ab7e882
Adding PhruzzMatcher to spaCy universe (#9321)
* Adding PhruzzMatcher to spaCy universe

* Fixes to make the package work properly
2021-09-30 13:46:53 +09:00
Adriane Boyd
e750c1760c
Restore tokenization timing in Language.evaluate (#9305)
Restore tokenization timing steps that were accidentally removed in #6765.
2021-09-27 20:44:14 +02:00
Sofie Van Landeghem
a361df00cd
Raise E983 early on in docbin init (#9247)
* raise E983 early on in docbin init

* catch situation before error is raised

* add more info on the spacy debug command
2021-09-27 20:43:03 +02:00
Adriane Boyd
effae12cbd
Update slow readers test to use textcat_multilabel (#9300) 2021-09-27 20:04:02 +02:00
github-actions[bot]
4da2af4e0e
Auto-format code with black (#9284)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-24 10:46:43 +02:00
Ines Montani
6bb0324b81 Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
Ines Montani
beb4a8c524
Merge pull request #9199 from shigapov/master (resolves #9129) 2021-09-23 19:41:53 +10:00
Philip Vollet
d2adfe1efa
Add projects to spaCy Universe (#9269)
* Added spaCy Universe projects

* Added user license agreement Philip Vollet

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-23 10:56:45 +02:00
Ines Montani
57b5fc1995
Apply suggestions from code review
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-23 17:58:32 +10:00
Sofie Van Landeghem
3fc3b7a13a
avoid crash when unicode in title (#9254) 2021-09-22 21:01:34 +02:00
Daniël de Kok
17802836be
Allow overriding vars in the project assets subcommand (#9248)
This change makes the `project assets` subcommand accept variables to
override as well, making the interface more similar to `project run`.
2021-09-21 10:49:45 +02:00
Adriane Boyd
00bdb31150
Fix vector for 0-length span (#9244) 2021-09-20 20:22:49 +02:00
github-actions[bot]
015d439eb6
Auto-format code with black (#9234)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-20 08:49:19 +02:00
Edward
8bda39f088
Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:32:44 +02:00
Paul O'Leary McCann
c4f0800fb8
Validate pos values when creating Doc (#9148)
* Validate pos values when creating Doc

* Add clear error when setting invalid pos

This also changes the error language slightly.

* Fix variable name

* Update spacy/tokens/doc.pyx

* Test that setting invalid pos raises an error

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 13:28:05 +02:00
Jozef Harag
865cfbc903
feat: add spacy.WandbLogger.v3 with optional run_name and entity parameters (#9202)
* feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters

* update versioning in docs

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-09-16 12:26:41 +02:00
Sofie Van Landeghem
00836c2d7d
Update spacy/displacy/templates.py 2021-09-16 09:23:21 +02:00
Sofie Van Landeghem
4bf2606adf
Update spacy/displacy/render.py
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-16 09:22:38 +02:00
Paul O'Leary McCann
1d57d78758 Make docs consistent (fix #9126) 2021-09-16 15:54:12 +09:00
Paul O'Leary McCann
9ceb8f413c
StringStore/Vocab dev docs (#9142)
* First take at StringStore/Vocab docs

Things to check:

1. The mysterious vocab members
2. How to make table of contents? Is it autogenerated?
3. Anything I missed / needs more detail?

* Update docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updates based on review feedback

* Minor fix

* Move example code down

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 12:50:22 +09:00
Ines Montani
20f63e7154
Only include runtime-relevant config in package CLI dependency detection (#9211) 2021-09-15 23:16:01 +02:00
Adriane Boyd
d74870d38c
Prepare for v3.1.3 (#9200)
* Update thinc and spacy-legacy requirements

* Set version to v3.1.3
2021-09-14 11:03:51 +02:00
Renat Shigapov
d5cc009faf
Merge branch 'explosion:master' into master 2021-09-13 08:43:48 +02:00
Renat Shigapov
e61d93f8c3
add NEL-visualisation to manual-usage 2021-09-13 08:38:58 +02:00
Renat Shigapov
f4b5c4209d
specify kb_id and kb_url for URL visualisation 2021-09-13 08:15:07 +02:00
Renat Shigapov
7562fb5354
add links to entities into the TPL_ENT-template 2021-09-13 08:06:54 +02:00
Paul O'Leary McCann
f89e1c34c9
Minor typo fix in docs 2021-09-11 14:22:05 +09:00
Renat Shigapov
646f3a54db
added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:16:51 +09:00
mylibrar
ee28aac68e
Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:13:13 +09:00
j-frei
462b009648
Correct parser.py use_upper param info (#9180) 2021-09-10 16:19:58 +02:00
Renat Shigapov
c1927fe994
fix misprint in tags 2021-09-09 15:37:34 +02:00
Renat Shigapov
8940e0baca
add agreement 2021-09-09 15:33:29 +02:00
Renat Shigapov
ea58294076
add spaCyOpenTapioca to universe 2021-09-09 15:13:18 +02:00
Adriane Boyd
aba6ce3a43
Handle spacy-legacy in package CLI for dependencies (#9163)
* Handle spacy-legacy in package CLI for dependencies

* Implement legacy backoff in spacy registry.find

* Remove unused import

* Update and format test
2021-09-08 11:46:40 +02:00
Sofie Van Landeghem
632d8d4c35
bump thinc to 8.0.9 (#9133) 2021-09-03 13:34:42 +02:00
github-actions[bot]
584fae5807
Auto-format code with black (#9130)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-03 10:47:03 +02:00