Commit Graph

15443 Commits

Author SHA1 Message Date
richardpaulhudson
5d150f2fc0 Add comment 2022-07-20 11:18:57 +02:00
richardpaulhudson
9d005a7d1e Correct test 2022-07-20 08:59:09 +02:00
richardpaulhudson
bc79e5a41d Fix for Windows 2022-07-19 19:44:37 +02:00
richardpaulhudson
d87fcabc66 Corrections / improvements 2022-07-19 19:25:16 +02:00
richardpaulhudson
44d51f4103 Add failure test 2022-07-19 17:41:47 +02:00
richardpaulhudson
012578f0f6 Improvements / corrections 2022-07-19 17:10:59 +02:00
richardpaulhudson
2c1f58e74f Formal state machine 2022-07-19 15:17:25 +02:00
richardpaulhudson
83d0738ff3 Remove unnecessary changes 2022-07-19 14:17:10 +02:00
richardpaulhudson
4c2fc56a5b Refactoring into separate module 2022-07-19 14:08:46 +02:00
richardpaulhudson
e9ee680873 Saved (intermediate version, doesn't compile yet) 2022-07-18 18:26:11 +02:00
richardpaulhudson
ca403f8b02 Seems to work, not yet tested in a structured way 2022-07-18 16:40:18 +02:00
richardpaulhudson
3c64e825d0 Correction 2022-07-14 22:19:57 +02:00
richardpaulhudson
9d7a79e305 First draft of new implementation (incomplete, doesn't run yet) 2022-07-14 22:14:57 +02:00
richardpaulhudson
e6a11b58ca Extend scope of test 2022-06-20 14:34:09 +02:00
richardpaulhudson
1cdb92d1bf Correction 2022-06-20 13:03:18 +02:00
richardpaulhudson
5de1009654 Add max_parallel_processes documentation 2022-06-20 12:47:41 +02:00
richardpaulhudson
9e665f9ad2 Changes after internal discussions 2022-06-20 12:38:22 +02:00
richardpaulhudson
2eb13f2656 Readability improvement 2022-05-23 10:13:03 +02:00
richardpaulhudson
4daffdd653 Changes based on review 2022-05-23 10:03:59 +02:00
richardpaulhudson
ae825686aa Changes after review 2022-05-20 21:50:56 +02:00
richardpaulhudson
a481698ae6 Corrections 2022-05-10 09:58:47 +02:00
richardpaulhudson
8c8b81a413 Fixed formatting issues 2022-05-10 09:51:37 +02:00
richardpaulhudson
a2bd489a8c Secondary functionality and documentation 2022-05-10 09:40:10 +02:00
richardpaulhudson
e3b4ee7b15 Mypy corrections 2022-05-09 19:03:57 +02:00
richardpaulhudson
12e86004c8 Basic multiprocessing functionality 2022-05-09 17:30:26 +02:00
richardpaulhudson
8d08a68174 Permit multiprocessing groups in YAML 2022-05-09 12:50:25 +02:00
Raphael Mitsch
e626df959f
Document different ways to create a pipeline (#10762)
* Document different ways to create a pipeline: moved up/slightly modified paragraph on pipeline creation.

* Document different ways to create a pipeline: changed Finnish to Ukrainian in example for language without trained pipeline.

* Document different ways to create a pipeline: added explanation of blank pipeline.

* Document different ways to create a pipeline: exchanged Ukrainian with Yoruba.
2022-05-06 15:40:59 +02:00
Richard Hudson
c32e1a0079
Updated Coreferee Universe entry (#10763) 2022-05-06 13:21:39 +02:00
Luca Dorigo
0a92d5644e
Fix StringStore.__getitem__ return type depending on parameter types (#10741)
* Fix StringStore.__getitem__ return type depending on parameter types

Small fix using `@overload` so that `StringStore.__getitem__` returns an `int` when given a `str` or `bytes` and a `str` when given an `int`.

* Update spacy/strings.pyi

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-03 17:57:07 +02:00
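
The `@overload` approach described in the commit above can be illustrated with a minimal sketch. `TinyStringStore` is a hypothetical stand-in, not spaCy's `StringStore`: it assumes sequential integer IDs purely for illustration, and only the shape of the overloaded `__getitem__` signatures follows the commit message.

```python
from typing import Dict, List, Union, overload


class TinyStringStore:
    """Hypothetical string <-> ID store (illustration only, not spaCy's code)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._strings: List[str] = []

    def add(self, string: str) -> int:
        # Assumption: IDs are list positions, assigned on first insertion.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    # The overloads tell a type checker which return type goes with
    # which argument type; they have no effect at runtime.
    @overload
    def __getitem__(self, key: Union[str, bytes]) -> int: ...

    @overload
    def __getitem__(self, key: int) -> str: ...

    def __getitem__(self, key: Union[str, bytes, int]) -> Union[int, str]:
        if isinstance(key, int):
            return self._strings[key]
        if isinstance(key, bytes):
            key = key.decode("utf8")
        return self._ids[key]


store = TinyStringStore()
apple_id = store.add("apple")
```

With these signatures, a type checker such as mypy infers `store["apple"]` as `int` and `store[apple_id]` as `str`, rather than the union of both.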
Sofie Van Landeghem
e03b9f8095
Small doc typos (#10750)
* fix typos

* formatting
2022-05-03 13:55:27 +02:00
Raphael Mitsch
f5390e278a
Refactor error messages to remove hardcoded strings (#10729)
* Use custom error msg instead of hardcoded string: replaced remaining hardcoded error message strings.

* Use custom error msg instead of hardcoded string: fixing faulty Errors import.
2022-05-02 13:38:46 +02:00
Madeesh Kannan
0a503ce5e0
Remove vestigial debug print statement in walk_head_nodes (#10718)
* `graph`: Remove vestigial debug print statement in `walk_head_nodes`

* Revert whitespace changes

* Remove more debug print statements
2022-05-02 13:36:35 +02:00
vincent d warmerdam
f3de976513
Update universe.json to Include spaCy video #6 (#10723)
* Update universe.json

I noticed that episode 6 was missing, so I added it.

* Update universe.json

* Update universe.json
2022-05-02 13:35:14 +02:00
Adriane Boyd
497a708c71
Docs for v3.3 (#10628)
* Temporarily disable CI tests

* Start v3.3 website updates

* Add trainable lemmatizer to pipeline design

* Fix Vectors.most_similar

* Add floret vector info to pipeline design

* Add Lower and Upper Sorbian

* Add span to sidebar

* Work on release notes

* Copy from release notes

* Update pipeline design graphic

* Upgrading note about Doc.from_docs

* Add tables and details

* Update website/docs/models/index.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix da lemma acc

* Add minimal intro, various updates

* Round lemma acc

* Add section on floret / word lists

* Add new pipelines table, minor edits

* Fix displacy spans example title

* Clarify adding non-trainable lemmatizer

* Update adding-languages URLs

* Revert "Temporarily disable CI tests"

This reverts commit 1dee505920.

* Spell out words/sec

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-28 14:09:35 +02:00
Adriane Boyd
10377fb945
Set version to v3.3.0 (#10614)
* Set version to v3.3.0

* Revert "Temporarily skip tests that require models/compat"

This reverts commit e422101e00.
2022-04-28 13:07:49 +02:00
Raphael Mitsch
3579507ba1
Bumped black to 22.3.0 due to a fix for https://github.com/psf/black/issues/2964. (#10715) 2022-04-27 14:49:24 +02:00
harmbuisman
c066fb8a4e
#10672: fixes displacy output for manual unsorted entities (#10673)
* #10672: fixes displacy output for manual unsorted entities

* #10672: removed unused import

* fix prettier formatting

Co-authored-by: Harm Buisman <h.buisman@iknl.nl>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-27 09:51:58 +02:00
Sofie Van Landeghem
b3717ba53a
removing print statements from the test suite (#10712) 2022-04-27 09:14:25 +02:00
Adriane Boyd
455f089c9b
Support exclude in Doc.from_docs (#10689)
* Support exclude in Doc.from_docs

* Update API docs

* Add new tag to docs
2022-04-25 18:19:03 +02:00
Mike
3b208197c3
Fixed example for spacy_syllables (#10705)
There was a typo in the example for the spacy_syllables project.
2022-04-25 16:40:54 +02:00
github-actions[bot]
e07500369c
Auto-format code with black (#10687)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-04-22 11:24:53 +02:00
Sofie Van Landeghem
2c2dbb844c Syntax for a branch from a PR 2022-04-22 09:45:49 +02:00
Ryn Daniels
29afbdb91e
add readme for explosion-bot (#10677) 2022-04-20 09:52:34 +02:00
Richard Hudson
4b227f4861
Merge pull request #10669 from mgrojo/develop
Fix some issues in Spanish stop-word list and examples
2022-04-19 09:37:34 +02:00
mgr
3d50b1a989 Fix some issues in Spanish examples
- Spelling: nationalities in lowercase, accent.
- Incorrect verb composition
- Untranslated word
2022-04-18 22:12:57 +02:00
mgr
2a2654c756 Remove significant or not very frequent words from stop word list [es]
The list of stop words for Spanish contained many inadequate words, see:

https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100

Removed words:
- verb forms of 'trabajar' (work) and intentar (try)
- words related to 'empleo' (employment)
- incorrect words: ampleamos, arribaabajo, soyos, paìs
- miscellaneous words due to being too significant or too infrequent:
  actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general,
  pais, principalmente, raras

Added other stop words for completion:
- Spanish one-letter words
- numbers up to twelve

Some reformatting to 79 columns.

When in doubt, the English and German lists have been consulted as good
examples.
2022-04-18 22:04:02 +02:00
Madeesh Kannan
aa6780eb27
Matcher: Remove superfluous GIL-acquiring check in get_is_final (#10659)
* `Matcher`: Remove superfluous GIL-acquiring check in `get_is_final`

This check incurred a significant performance penalty due to implicit interactions between the GIL and Cython ref-counting code.

* `Matcher`: Inline `PatternStateC` accessors
2022-04-18 12:59:34 +02:00
Duy Ngo
229ecaf0ea
Add numbers and definitions (#10665) 2022-04-18 12:58:32 +02:00
Schero1994
d622883a42
Adding and updating content in the spacy universe (#10493)
* signing contributor agreement

* adding new content to the spaCy universe

* updating outdated code examples

* resolving issues for the PR

* resolve review for klayers

* remove contributor-agreement file from the PR

* Update code example of spaCySentiWS

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy-sentiws code example

Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-04-15 15:36:54 +02:00
Joachim Fainberg
4e1716223c
displaCy: Avoid increasing levels for identical arcs (#10639)
* Test for arc levels for identical arcs

Also moves the test in order with the other numbered tests.

* displaCy: filter identical arcs

Avoid increased levels due to identical arcs by first
filtering any identical arcs.

* Sort keys before filtering

Manual entry with keys out of order would previously become
different tuples and therefore not filtered correctly.

Co-authored-by: Joachim Fainberg <joachimfainberg@Joachims-MBP.lan>
2022-04-14 16:48:00 +02:00
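
The "sort keys before filtering" step in the commit above can be sketched as follows. This is a minimal illustration, not displaCy's actual code: the function name `filter_identical_arcs` and the arc keys (`start`, `end`, `label`, `dir`) are assumptions chosen for the example. The point is that two dicts with the same content but different key order only compare equal as tuples once their items are sorted.

```python
from typing import Any, Dict, List


def filter_identical_arcs(arcs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop arcs that duplicate an earlier arc, keeping the original order."""
    seen = set()
    unique = []
    for arc in arcs:
        # Sorting the items makes the tuple independent of key order,
        # so manually entered arcs with shuffled keys are still matched.
        key = tuple(sorted(arc.items()))
        if key not in seen:
            seen.add(key)
            unique.append(arc)
    return unique


arcs = [
    {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
    {"dir": "left", "label": "nsubj", "end": 1, "start": 0},  # same arc, keys reordered
    {"start": 1, "end": 2, "label": "dobj", "dir": "right"},
]
unique_arcs = filter_identical_arcs(arcs)
```

Without the `sorted` call, the first two arcs would produce different tuples and both would survive the filter.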