Dimitar Ganev
733ffe439d
Improve the stop words and the tokenizer exceptions in Bulgarian language. ( #8862 )
...
* Add more stop words and Improve the readability
* Add and categorize the tokenizer exceptions for `bg` lang
* Create syrull.md
* Add references for the additional stop words and tokenizer exc abbrs
2021-08-10 13:44:23 +02:00
Adriane Boyd
415dee587c
Merge pull request #8911 from adrianeboyd/chore/update-develop-from-master-v3.1-1
...
Update develop from master
2021-08-09 15:41:36 +02:00
Ines Montani
a1e9f19460
Merge pull request #8910 from DuyguA/patch-1 [ci skip]
...
updated unv json for new book
2021-08-09 23:12:50 +10:00
Adriane Boyd
a79888ed67
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.1-1
2021-08-09 13:13:13 +02:00
Duygu Altinok
380b2817cf
updated unv json for new book
2021-08-09 12:39:22 +02:00
Paul O'Leary McCann
cac298471f
Fix #8902 (bad link in docs)
...
typo fix
2021-08-08 22:04:00 +09:00
Eduard Zorita
439f30faad
Add stub files for main cython classes ( #8427 )
...
* Add stub files for main API classes
* Add contributor agreement for ezorita
* Update types for ndarray and hash()
* Fix __getitem__ and __iter__
* Add attributes of Doc and Token classes
* Overload type hints for Span.__getitem__
* Fix type hint overload for Span.__getitem__
Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>
2021-08-07 12:30:03 +02:00
github-actions[bot]
56d4d87aeb
Auto-format code with black ( #8895 )
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-06 13:38:06 +02:00
Kabir Khan
1dfffe5fb4
No output info message in train ( #8885 )
...
* Add info message that no output directory was provided in train
* Update train.py
* Fix logging
2021-08-05 09:21:22 +02:00
Adriane Boyd
fa2e7a4bbf
Fix spancat tests on GPU ( #8872 )
...
* Fix spancat tests on GPU
* Fix more spancat tests
2021-08-04 14:29:43 +02:00
Paul O'Leary McCann
77d698dcae
Fix check for RIGHT_ATTRS in dep matcher ( #8807 )
...
* Fix check for RIGHT_ATTRs in dep matcher
If a non-anchor node does not have RIGHT_ATTRS, the dep matcher throws
an E100, which says that non-anchor nodes must have LEFT_ID, REL_OP, and
RIGHT_ID. It specifically does not say RIGHT_ATTRS is required.
A blank RIGHT_ATTRS is also valid, and patterns with one will be
excepted. While not normal, sometimes a REL_OP is enough to specify a
non-anchor node - maybe you just want the head of another node
unconditionally, for example.
This change just sets RIGHT_ATTRS to {} if not present. Alternatively
changing E100 to state RIGHT_ATTRS is required could also be reasonable.
* Fix test
This test was written on the assumption that if `RIGHT_ATTRS` isn't
present an error will be raised. Since the proposed changes make it so
an error won't be raised this is no longer necessary.
* Revert test, update error message
Error message now lists missing keys, and RIGHT_ATTRS is required.
* Use list of required keys in error message
Also removes unused key param arg.
2021-08-04 09:20:41 +02:00
Adriane Boyd
941a591f3c
Pass excludes when serializing vocab ( #8824 )
...
* Pass excludes when serializing vocab
Additional minor bug fix:
* Deserialize vocab in `EntityLinker.from_disk`
* Add test for excluding strings on load
* Fix formatting
2021-08-03 14:42:44 +02:00
Adriane Boyd
175847f92c
Support list values and INTERSECTS in Matcher ( #8784 )
...
* Support list values and IS_INTERSECT in Matcher
* Support list values as token attributes for set operators, not just as
pattern values.
* Add `IS_INTERSECT` operator.
* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.
* Rename IS_INTERSECT to INTERSECTS
2021-08-02 19:39:26 +02:00
Adriane Boyd
fbbbda1954
Fix start/end chars for empty and out-of-bounds spans ( #8816 )
2021-08-02 19:07:19 +02:00
Adriane Boyd
9ad3b8cf8d
Only add sourced vectors hashes to meta if necessary ( #8830 )
2021-08-02 18:22:35 +02:00
Nick Sorros
0485cdefcc
Add logger debug for project push and pull ( #8860 )
...
* Add logger debug for project push and pull
* Sign contributor agreement
2021-08-02 18:13:53 +02:00
themrmax
de076194c4
Make ConsoleLogger flush after each logging line ( #8810 )
...
This is necessary to avoid "logging blackouts" when running training on Kubernetes pods
2021-08-02 14:33:38 +02:00
Ines Montani
30f20496d5
Merge pull request #8840 from polm/docs/evaluate-speed [ci skip]
2021-07-30 09:10:15 +10:00
Ines Montani
65d163fab5
Adjust formatting [ci skip]
2021-07-30 09:10:04 +10:00
Ines Montani
3a701d3645
Merge pull request #8841 from adrianeboyd/docs/ent-id-sep [ci skip]
...
Fix formatting of ent_id_sep in EntityRuler API docs
2021-07-30 09:09:25 +10:00
Ines Montani
f08be084fb
Merge pull request #8844 from thomashacker/bugfix/fix-doc-transformer-typo [ci skip]
...
Fix typo in Tok2VecTransformer example config
2021-07-30 09:08:59 +10:00
thomashacker
02258916c8
Fix example config typo for transformer architecture
2021-07-29 11:19:40 +02:00
Adriane Boyd
15b12f3e35
Fix formatting of ent_id_sep in EntityRuler API docs
2021-07-29 10:10:12 +02:00
Paul O'Leary McCann
a60cb13910
Update speed entry in metrics table
2021-07-29 16:35:19 +09:00
Paul O'Leary McCann
e125313a50
Revert "Add note about SPEED in output"
...
This reverts commit c92d268176
.
2021-07-29 16:34:08 +09:00
Ines Montani
0a1e299d30
Merge pull request #8814 from polm/docs/migrate-lexeme-tables [ci skip]
2021-07-29 17:18:02 +10:00
Paul O'Leary McCann
c92d268176
Add note about SPEED in output
...
In #8823 it was pointed out that the `SPEED` value wasn't documented
anywhere.
2021-07-29 15:03:07 +09:00
Paul O'Leary McCann
8867e60fbb
Update website/docs/usage/v3.md
...
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-29 14:56:56 +09:00
Adriane Boyd
8547514aa4
Remove labels from textcat component config example ( #8815 )
2021-07-27 13:14:38 +02:00
Paul O'Leary McCann
76ac95923a
Add note to migration guide about lexeme tables ( fix #7290 )
...
This just adds the resolution from #6388 to the docs.
2021-07-27 19:19:25 +09:00
Paul O'Leary McCann
67ecdcc3ac
Update subset/superset docs ( #8795 )
...
* Update subset/superset docs
* Update website/docs/usage/rule-based-matching.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 12:08:46 +02:00
Adriane Boyd
81d3a1edb1
Use tokenizer URL_MATCH pattern in LIKE_URL ( #8765 )
2021-07-27 12:07:01 +02:00
Adriane Boyd
4f28190afe
Merge pull request #8813 from adrianeboyd/chore/develop-v3.2
...
Update develop for v3.2
2021-07-27 11:26:18 +02:00
Ines Montani
7f21c7dfa2
Merge pull request #8794 from explosion/autoblack
...
Auto-format code with black
2021-07-27 12:17:15 +10:00
Ines Montani
34c401f04f
Merge pull request #8801 from polm/fix/respect-no-skip ( fixes #8796 )
...
Respect the no_skip value
2021-07-27 12:16:47 +10:00
Ines Montani
134cb06af3
Merge pull request #8808 from kevinlu1248/master [ci skip]
...
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:16 +10:00
Ines Montani
9bf0d6f2fd
Merge pull request #8806 from Ledenel/master [ci skip]
...
fix typo
2021-07-27 12:14:22 +10:00
Kevin Lu
4a8e9e4e4e
Update data-formats.md
2021-07-25 22:58:53 -07:00
Ledenel
413f745c68
fix broken example in spaCy universe Chatterbot
2021-07-25 15:53:32 +00:00
Paul O'Leary McCann
284b530c63
Respect the no_skip value
...
Seems like the logic for this was just left out. See #8796 .
2021-07-24 15:31:17 +09:00
explosion-bot
a58ab6ea22
Auto-format code with black
2021-07-23 08:04:09 +00:00
Adriane Boyd
6bbc2b1956
Reload train corpus in debug data after initialize ( #8776 )
2021-07-21 22:38:40 +02:00
Adriane Boyd
d48c01a6f7
Remove extraneous grc test file ( #8768 )
2021-07-20 15:51:15 +02:00
Sofie Van Landeghem
ffaead8fe0
bump to 3.1.1
2021-07-19 14:48:27 +02:00
Sofie Van Landeghem
83e27d262e
negative tag annotation ( #8731 )
...
* unit test to unlearn tag via negative annotation
* bump thinc to 8.0.8
2021-07-19 14:39:11 +02:00
Adriane Boyd
0e4b96c97e
Update lexeme ranks for loaded vectors ( #8640 )
...
Update the ranks for any lexemes that have been added to the vocab
before the vectors are added to the model.
2021-07-19 18:25:54 +10:00
Adriane Boyd
e532c69475
Update Language.replace_pipe for disabled components ( #8729 )
...
* Fix the index where the replacement in inserted to account for
disabled components
* Allow `Language.replace_pipe` to replace disabled components
2021-07-19 18:06:12 +10:00
Paul O'Leary McCann
d717593eb7
Merge pull request #8754 from KennethEnevoldsen/patch-1
...
[minor] removed outdated spacy version for spacymoji
2021-07-18 19:17:33 +09:00
Paul O'Leary McCann
ac67639eaf
Merge pull request #8755 from KennethEnevoldsen/patch-2
...
fixed GitHub link and thumbnail
2021-07-18 19:14:57 +09:00
Kenneth Enevoldsen
5d6aed0773
fixed GitHub link and thumbnail
...
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 10:22:00 +02:00