spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-10 18:51:21 +03:00

Author	SHA1	Message	Date
Adriane Boyd	b278f31ee6	Document scorers in registry and components from #8766 (#8929) * Document scorers in registry and components from #8766 * Update spacy/pipeline/lemmatizer.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/api/dependencyparser.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Reformat Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-12 12:50:03 +02:00
Edward	944ad6b1d4	Add new parameter for saving every n epoch in pretraining (#8912) * Add parameter for saving every n epoch * Add new parameter in schemas * Add new parameter in default_config * Adjust schemas * format code	2021-08-12 11:14:48 +02:00
Adriane Boyd	f99d6d5e39	Refactor scoring methods to use registered functions (#8766) * Add scorer option to components Add an optional `scorer` parameter to all pipeline components. If a scoring function is provided, it overrides the default scoring method for that component. * Add registered scorers for all components * Add `scorers` registry * Move all scoring methods outside of components as independent functions and register * Use the registered scoring methods as defaults in configs and inits Additional: * The scoring methods no longer have access to the full component, so use settings from `cfg` as default scorer options to handle settings such as `labels`, `threshold`, and `positive_label` * The `attribute_ruler` scoring method no longer has access to the patterns, so all scoring methods are called * Bug fix: `spancat` scoring method is updated to set `allow_overlap` to score overlapping spans correctly * Update Russian lemmatizer to use direct score method * Check type of cfg in Pipe.score * Fix check * Update spacy/pipeline/sentencizer.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Remove validate_examples from scoring functions * Use Pipe.labels instead of Pipe.cfg["labels"] Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-10 15:13:39 +02:00
fgaim	ee011ca963	Update Tigrinya ትግርኛ language support (#8900) * Add missing punctuation for Tigrinya and Amharic * Fix numeral and ordinal numbers for Tigrinya - Amharic was used in many cases - Also fixed some typos * Update Tigrinya stop-words * Contributor agreement for fgaim * Fix typo in "ti" lang test * Remove multi-word entries from numbers and ordinals	2021-08-10 13:55:08 +02:00
Dimitar Ganev	733ffe439d	Improve the stop words and the tokenizer exceptions in Bulgarian language. (#8862) * Add more stop words and Improve the readability * Add and categorize the tokenizer exceptions for `bg` lang * Create syrull.md * Add references for the additional stop words and tokenizer exc abbrs	2021-08-10 13:44:23 +02:00
Adriane Boyd	415dee587c	Merge pull request #8911 from adrianeboyd/chore/update-develop-from-master-v3.1-1 Update develop from master	2021-08-09 15:41:36 +02:00
Adriane Boyd	a79888ed67	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.1-1	2021-08-09 13:13:13 +02:00
Paul O'Leary McCann	cac298471f	Fix #8902 (bad link in docs) typo fix	2021-08-08 22:04:00 +09:00
Eduard Zorita	439f30faad	Add stub files for main cython classes (#8427) * Add stub files for main API classes * Add contributor agreement for ezorita * Update types for ndarray and hash() * Fix __getitem__ and __iter__ * Add attributes of Doc and Token classes * Overload type hints for Span.__getitem__ * Fix type hint overload for Span.__getitem__ Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>	2021-08-07 12:30:03 +02:00
github-actions[bot]	56d4d87aeb	Auto-format code with black (#8895) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-06 13:38:06 +02:00
Kabir Khan	1dfffe5fb4	No output info message in train (#8885) * Add info message that no output directory was provided in train * Update train.py * Fix logging	2021-08-05 09:21:22 +02:00
Adriane Boyd	fa2e7a4bbf	Fix spancat tests on GPU (#8872) * Fix spancat tests on GPU * Fix more spancat tests	2021-08-04 14:29:43 +02:00
Paul O'Leary McCann	77d698dcae	Fix check for RIGHT_ATTRS in dep matcher (#8807) * Fix check for RIGHT_ATTRS in dep matcher If a non-anchor node does not have RIGHT_ATTRS, the dep matcher throws an E100, which says that non-anchor nodes must have LEFT_ID, REL_OP, and RIGHT_ID. It specifically does not say RIGHT_ATTRS is required. A blank RIGHT_ATTRS is also valid, and patterns with one will be accepted. While not normal, sometimes a REL_OP is enough to specify a non-anchor node - maybe you just want the head of another node unconditionally, for example. This change just sets RIGHT_ATTRS to {} if not present. Alternatively changing E100 to state RIGHT_ATTRS is required could also be reasonable. * Fix test This test was written on the assumption that if `RIGHT_ATTRS` isn't present an error will be raised. Since the proposed changes make it so an error won't be raised this is no longer necessary. * Revert test, update error message Error message now lists missing keys, and RIGHT_ATTRS is required. * Use list of required keys in error message Also removes unused key param arg.	2021-08-04 09:20:41 +02:00
Adriane Boyd	941a591f3c	Pass excludes when serializing vocab (#8824) * Pass excludes when serializing vocab Additional minor bug fix: * Deserialize vocab in `EntityLinker.from_disk` * Add test for excluding strings on load * Fix formatting	2021-08-03 14:42:44 +02:00
Adriane Boyd	175847f92c	Support list values and INTERSECTS in Matcher (#8784) * Support list values and IS_INTERSECT in Matcher * Support list values as token attributes for set operators, not just as pattern values. * Add `IS_INTERSECT` operator. * Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs. * Rename IS_INTERSECT to INTERSECTS	2021-08-02 19:39:26 +02:00
Adriane Boyd	fbbbda1954	Fix start/end chars for empty and out-of-bounds spans (#8816)	2021-08-02 19:07:19 +02:00
Adriane Boyd	9ad3b8cf8d	Only add sourced vectors hashes to meta if necessary (#8830)	2021-08-02 18:22:35 +02:00
Nick Sorros	0485cdefcc	Add logger debug for project push and pull (#8860) * Add logger debug for project push and pull * Sign contributor agreement	2021-08-02 18:13:53 +02:00
themrmax	de076194c4	Make ConsoleLogger flush after each logging line (#8810 ) This is necessary to avoid "logging blackouts" when running training on Kubernetes pods	2021-08-02 14:33:38 +02:00
Ines Montani	30f20496d5	Merge pull request #8840 from polm/docs/evaluate-speed [ci skip]	2021-07-30 09:10:15 +10:00
Ines Montani	65d163fab5	Adjust formatting [ci skip]	2021-07-30 09:10:04 +10:00
Ines Montani	3a701d3645	Merge pull request #8841 from adrianeboyd/docs/ent-id-sep [ci skip] Fix formatting of ent_id_sep in EntityRuler API docs	2021-07-30 09:09:25 +10:00
Ines Montani	f08be084fb	Merge pull request #8844 from thomashacker/bugfix/fix-doc-transformer-typo [ci skip] Fix typo in Tok2VecTransformer example config	2021-07-30 09:08:59 +10:00
thomashacker	02258916c8	Fix example config typo for transformer architecture	2021-07-29 11:19:40 +02:00
Adriane Boyd	15b12f3e35	Fix formatting of ent_id_sep in EntityRuler API docs	2021-07-29 10:10:12 +02:00
Paul O'Leary McCann	a60cb13910	Update speed entry in metrics table	2021-07-29 16:35:19 +09:00
Paul O'Leary McCann	e125313a50	Revert "Add note about SPEED in output" This reverts commit `c92d268176`.	2021-07-29 16:34:08 +09:00
Ines Montani	0a1e299d30	Merge pull request #8814 from polm/docs/migrate-lexeme-tables [ci skip]	2021-07-29 17:18:02 +10:00
Paul O'Leary McCann	c92d268176	Add note about SPEED in output In #8823 it was pointed out that the `SPEED` value wasn't documented anywhere.	2021-07-29 15:03:07 +09:00
Paul O'Leary McCann	8867e60fbb	Update website/docs/usage/v3.md Co-authored-by: Ines Montani <ines@ines.io>	2021-07-29 14:56:56 +09:00
Adriane Boyd	8547514aa4	Remove labels from textcat component config example (#8815)	2021-07-27 13:14:38 +02:00
Paul O'Leary McCann	76ac95923a	Add note to migration guide about lexeme tables (fix #7290) This just adds the resolution from #6388 to the docs.	2021-07-27 19:19:25 +09:00
Paul O'Leary McCann	67ecdcc3ac	Update subset/superset docs (#8795) * Update subset/superset docs * Update website/docs/usage/rule-based-matching.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-07-27 12:08:46 +02:00
Adriane Boyd	81d3a1edb1	Use tokenizer URL_MATCH pattern in LIKE_URL (#8765)	2021-07-27 12:07:01 +02:00
Adriane Boyd	4f28190afe	Merge pull request #8813 from adrianeboyd/chore/develop-v3.2 Update develop for v3.2	2021-07-27 11:26:18 +02:00
Ines Montani	7f21c7dfa2	Merge pull request #8794 from explosion/autoblack Auto-format code with black	2021-07-27 12:17:15 +10:00
Ines Montani	34c401f04f	Merge pull request #8801 from polm/fix/respect-no-skip (fixes #8796) Respect the no_skip value	2021-07-27 12:16:47 +10:00
Ines Montani	134cb06af3	Merge pull request #8808 from kevinlu1248/master [ci skip] Changed a CLI command in data-formats.md due to erroneous information	2021-07-27 12:15:16 +10:00
Ines Montani	9bf0d6f2fd	Merge pull request #8806 from Ledenel/master [ci skip] fix typo	2021-07-27 12:14:22 +10:00
Kevin Lu	4a8e9e4e4e	Update data-formats.md	2021-07-25 22:58:53 -07:00
Ledenel	413f745c68	fix broken example in spaCy universe Chatterbot	2021-07-25 15:53:32 +00:00
Paul O'Leary McCann	284b530c63	Respect the no_skip value Seems like the logic for this was just left out. See #8796.	2021-07-24 15:31:17 +09:00
explosion-bot	a58ab6ea22	Auto-format code with black	2021-07-23 08:04:09 +00:00
Adriane Boyd	6bbc2b1956	Reload train corpus in debug data after initialize (#8776 )	2021-07-21 22:38:40 +02:00
Adriane Boyd	d48c01a6f7	Remove extraneous grc test file (#8768 )	2021-07-20 15:51:15 +02:00
Sofie Van Landeghem	ffaead8fe0	bump to 3.1.1	2021-07-19 14:48:27 +02:00
Sofie Van Landeghem	83e27d262e	negative tag annotation (#8731) * unit test to unlearn tag via negative annotation * bump thinc to 8.0.8	2021-07-19 14:39:11 +02:00
Adriane Boyd	0e4b96c97e	Update lexeme ranks for loaded vectors (#8640) Update the ranks for any lexemes that have been added to the vocab before the vectors are added to the model.	2021-07-19 18:25:54 +10:00
Adriane Boyd	e532c69475	Update Language.replace_pipe for disabled components (#8729) * Fix the index where the replacement is inserted to account for disabled components * Allow `Language.replace_pipe` to replace disabled components	2021-07-19 18:06:12 +10:00
Paul O'Leary McCann	d717593eb7	Merge pull request #8754 from KennethEnevoldsen/patch-1 [minor] removed outdated spacy version for spacymoji	2021-07-18 19:17:33 +09:00
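
Commit 175847f92c (#8784) lets the Matcher's set operators compare list-valued token attributes (such as morphological features) against a list in the pattern, and adds an INTERSECTS operator alongside IS_SUBSET and IS_SUPERSET. The sketch below only illustrates the intended set semantics in plain Python; `matches_set_op` is a hypothetical helper, not spaCy's implementation. In a real spaCy pattern the same idea is written as a token attribute dict, e.g. `{"MORPH": {"IS_SUPERSET": ["Number=Sing"]}}`.

```python
from typing import Any, Iterable


def matches_set_op(token_value: Any, op: str, pattern_values: Iterable[Any]) -> bool:
    """Hypothetical helper: evaluate one Matcher-style set operator.

    Illustrates the documented meaning of IS_SUBSET, IS_SUPERSET and
    INTERSECTS; it is not spaCy's actual implementation.
    """
    # The token attribute may itself be a list (e.g. morph features or a
    # custom list attribute), so normalize both sides to sets.
    if isinstance(token_value, (list, tuple, set, frozenset)):
        token_set = set(token_value)
    else:
        token_set = {token_value}
    values = set(pattern_values)

    if op == "IS_SUBSET":
        return token_set <= values  # every token value appears in the pattern list
    if op == "IS_SUPERSET":
        return token_set >= values  # every pattern value appears on the token
    if op == "INTERSECTS":
        return bool(token_set & values)  # at least one value in common
    raise ValueError(f"Unsupported operator: {op}")


morph = ["Number=Sing", "Person=3"]
print(matches_set_op(morph, "IS_SUBSET", ["Number=Sing", "Person=3", "Tense=Pres"]))  # True
print(matches_set_op(morph, "INTERSECTS", ["Number=Plur", "Person=3"]))  # True
print(matches_set_op(morph, "IS_SUPERSET", ["Number=Plur"]))  # False
```

The difference between the three operators is only which set inclusion is tested; INTERSECTS is the loosest, succeeding as soon as a single value is shared.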


14761 Commits
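
Commit 77d698dcae (#8807) settles on keeping RIGHT_ATTRS required for dependency-matcher nodes (an empty `{}` is still valid) and changes the error message to list the missing keys. The following is a minimal, stdlib-only sketch of that kind of check, assuming the anchor node needs RIGHT_ID and RIGHT_ATTRS and every other node needs LEFT_ID, REL_OP, RIGHT_ID and RIGHT_ATTRS. `find_missing_keys` and `validate_pattern` are hypothetical helpers, not spaCy code, and the error text is illustrative rather than the real E100 wording.

```python
from typing import Any, Dict, List

# Assumed required keys: the anchor (first node) and all non-anchor nodes.
ANCHOR_KEYS = ["RIGHT_ID", "RIGHT_ATTRS"]
NON_ANCHOR_KEYS = ["LEFT_ID", "REL_OP", "RIGHT_ID", "RIGHT_ATTRS"]


def find_missing_keys(pattern: List[Dict[str, Any]]) -> List[List[str]]:
    """Hypothetical helper: list the required keys each node lacks."""
    missing = []
    for i, node in enumerate(pattern):
        required = ANCHOR_KEYS if i == 0 else NON_ANCHOR_KEYS
        # An empty RIGHT_ATTRS ({}) counts as present; only absence is an error.
        missing.append([key for key in required if key not in node])
    return missing


def validate_pattern(pattern: List[Dict[str, Any]]) -> None:
    """Raise an error that names the missing keys (illustrative message)."""
    for i, keys in enumerate(find_missing_keys(pattern)):
        if keys:
            raise ValueError(f"Node {i} is missing required keys: {', '.join(keys)}")


pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    # Empty RIGHT_ATTRS: match any direct dependent of the verb.
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "dependent", "RIGHT_ATTRS": {}},
]
validate_pattern(pattern)  # no error
```

Naming every missing key in one message, rather than restating a fixed list, tells the pattern author exactly which node and which key to fix.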