spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-16 07:01:58 +03:00

Author	SHA1	Message	Date
Adriane Boyd	00bdb31150	Fix vector for 0-length span (#9244 )	2021-09-20 20:22:49 +02:00
github-actions[bot]	015d439eb6	Auto-format code with black (#9234 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-09-20 08:49:19 +02:00
Paul O'Leary McCann	c4f0800fb8	Validate pos values when creating Doc (#9148 ) * Validate pos values when creating Doc * Add clear error when setting invalid pos This also changes the error language slightly. * Fix variable name * Update spacy/tokens/doc.pyx * Test that setting invalid pos raises an error Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-09-16 13:28:05 +02:00
Jozef Harag	865cfbc903	feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters (#9202 ) * feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters * update versioning in docs Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-09-16 12:26:41 +02:00
Sofie Van Landeghem	00836c2d7d	Update spacy/displacy/templates.py	2021-09-16 09:23:21 +02:00
Sofie Van Landeghem	4bf2606adf	Update spacy/displacy/render.py Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>	2021-09-16 09:22:38 +02:00
Ines Montani	20f63e7154	Only include runtime-relevant config in package CLI dependency detection (#9211 )	2021-09-15 23:16:01 +02:00
Adriane Boyd	d74870d38c	Prepare for v3.1.3 (#9200 ) * Update thinc and spacy-legacy requirements * Set version to v3.1.3	2021-09-14 11:03:51 +02:00
Renat Shigapov	d5cc009faf	Merge branch 'explosion:master' into master	2021-09-13 08:43:48 +02:00
Renat Shigapov	f4b5c4209d	specify kb_id and kb_url for URL visualisation	2021-09-13 08:15:07 +02:00
Renat Shigapov	7562fb5354	add links to entities into the TPL_ENT-template	2021-09-13 08:06:54 +02:00
j-frei	462b009648	Correct parser.py use_upper param info (#9180 )	2021-09-10 16:19:58 +02:00
Adriane Boyd	aba6ce3a43	Handle spacy-legacy in package CLI for dependencies (#9163 ) * Handle spacy-legacy in package CLI for dependencies * Implement legacy backoff in spacy registry.find * Remove unused import * Update and format test	2021-09-08 11:46:40 +02:00
github-actions[bot]	584fae5807	Auto-format code with black (#9130 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-09-03 10:47:03 +02:00
Kevin Humphreys	ca93504660	Pass alignments to Matcher callbacks (#9001 ) * pass alignments to callbacks * refactor for single callback loop * Update spacy/matcher/matcher.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-09-02 12:58:05 +02:00
Sofie Van Landeghem	8895e3c9ad	matcher doc corrections (#9115 ) * update error message to current UX * clarify uppercase effect * fix docstring	2021-09-02 09:26:33 +02:00
Robyn Speer	d60b748e3c	Fix surprises when asking for the root of a git repo (#9074 ) * Fix surprises when asking for the root of a git repo In the case of the first asset I wanted to get from git, the data I wanted was the entire repository. I tried leaving "path" blank, which gave a less-than-helpful error, and then I tried `path: "/"`, which started copying my entire filesystem into the project. The path I should have used was "". I've made two changes to make this smoother for others: - The 'path' within a git clone defaults to "" - If the path points outside of the tmpdir that the git clone goes into, we fail with an error Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * use a descriptive error instead of a default plus some minor fixes from PR review Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * check for None values in assets Signed-off-by: Elia Robyn Speer <elia@explosion.ai> Co-authored-by: Elia Robyn Speer <elia@explosion.ai>	2021-09-01 22:52:08 +02:00
Paul O'Leary McCann	f803a84571	Fix inference of epoch_resume (#9084 ) * Fix inference of epoch_resume When an epoch_resume value is not specified individually, it can often be inferred from the filename. The value inference code was there but the value wasn't passed back to the training loop. This also adds a specific error in the case where no epoch_resume value is provided and it can't be inferred from the filename. * Add new error * Always use the epoch resume value if specified Before this the value in the filename was used if found	2021-09-01 14:17:42 +09:00
Adriane Boyd	1e9b4b55ee	Pass overrides to subcommands in workflows (#9059 ) * Pass overrides to subcommands in workflows * Add missing docstring	2021-08-30 09:23:54 +02:00
Sofie Van Landeghem	1e974de837	config is not Optional (#9024 )	2021-08-27 11:44:31 +02:00
github-actions[bot]	fb9c31fbda	Auto-format code with black (#9065 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-27 11:42:27 +02:00
Sofie Van Landeghem	4d39430b82	Document use-case of freezing tok2vec (#8992 ) * update error msg * add sentence to docs * expand note on frozen components	2021-08-26 09:50:35 +02:00
Sofie Van Landeghem	94fb840443	fix docs for Span constructor arguments (#9023 )	2021-08-25 16:06:22 +02:00
David Strouk	31e9b126a0	Fix verbs list in lang/fr/tokenizer_exceptions.py (#9033 )	2021-08-25 15:55:09 +02:00
Ines Montani	4cd052e81d	Include component factories in third-party dependencies resolver (#9009 ) * Include component factories in third-party dependencies resolver * Increment catalogue and update test	2021-08-25 14:58:01 +02:00
Sofie Van Landeghem	e1f88de729	bump to 3.1.2 (#9008 )	2021-08-20 12:41:09 +02:00
Sofie Van Landeghem	4d52d7051c	Fix spancat training on nested entities (#9007 ) * overfitting test on non-overlapping entities * add failing overfitting test for overlapping entities * failing test for list comprehension * remove test that was put in separate PR * bugfix * cleanup	2021-08-20 12:37:50 +02:00
Paul O'Leary McCann	9cc3dc2b67	Add glossary entry for _SP (#8983 )	2021-08-20 12:04:02 +02:00
Sofie Van Landeghem	de025beb5f	Warn and document spangroup.doc weakref (#8980 ) * test for error after Doc has been garbage collected * warn about using a SpanGroup when the Doc has been garbage collected * add warning to the docs * rephrase slightly * raise error instead of warning * update * move warning to doc property	2021-08-20 11:06:19 +02:00
Adriane Boyd	6722dc3dc5	Fix allow_overlap default for spancat scoring (#8970 ) * Remove irrelevant default options	2021-08-18 09:56:56 +02:00
Steele Farnsworth	b18cb1cd2a	Refactor dependencymatcher.pyx to use list comps and enumerate. (#8956 ) * Refactor to use list comps and enumerate. Replace loops that append to a list with a list comprehensions where this does not change the behavior; replace range(len(...)) loops with enumerate. Correct one typo in a comment. Replace a call to set() with a set literal. * Undo double assignment. Expand `tokens_to_key[j] = k = self._get_matcher_key(key, i, j)` to two statements. Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Sign contributors agreement Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-18 09:55:45 +02:00
Ines Montani	d94ddd5686	Auto-detect package dependencies in spacy package (#8948 ) * Auto-detect package dependencies in spacy package * Add simple get_third_party_dependencies test * Import packages_distributions explicitly * Inline packages_distributions * Fix docstring [ci skip] * Relax catalogue requirement * Move importlib_metadata to spacy.compat with note * Include license information [ci skip]	2021-08-17 14:05:13 +02:00
Sofie Van Landeghem	0a6b68848f	Fix making span_group (#8975 ) * fix _make_span_group * fix imports	2021-08-17 10:36:34 +02:00
github-actions[bot]	92071326d8	Auto-format code with black (#8950 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-13 11:48:38 +02:00
Adriane Boyd	8448c7dbc5	Update da trf recommendation (#8921 ) Update the da trf recommendation to the same model used in the pretrained pipelines.	2021-08-12 13:54:02 +02:00
Paul O'Leary McCann	e227d24d43	Allow passing in array vars for speedup (#8882 ) * Allow passing in array vars for speedup This fixes #8845. Not sure about the docstring changes here... * Update docs Types maybe need more detail? Maybe not? * Run prettier on docs * Update spacy/tokens/span.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-10 15:13:53 +02:00
Paul O'Leary McCann	6029cfc391	Add scores to output in spancat (#8855 ) * Add scores to output in spancat This exposes the scores as an attribute on the SpanGroup. Includes a basic test. * Add basic doc note * Vectorize score calcs * Add "annotation format" section * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Clean up doc section * Ran prettier on docs * Get arrays off the gpu before iterating over them * Remove int() calls Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-08-10 13:47:49 +02:00
Eduard Zorita	439f30faad	Add stub files for main cython classes (#8427 ) * Add stub files for main API classes * Add contributor agreement for ezorita * Update types for ndarray and hash() * Fix __getitem__ and __iter__ * Add attributes of Doc and Token classes * Overload type hints for Span.__getitem__ * Fix type hint overload for Span.__getitem__ Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>	2021-08-07 12:30:03 +02:00
github-actions[bot]	56d4d87aeb	Auto-format code with black (#8895 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-06 13:38:06 +02:00
Kabir Khan	1dfffe5fb4	No output info message in train (#8885 ) * Add info message that no output directory was provided in train * Update train.py * Fix logging	2021-08-05 09:21:22 +02:00
Adriane Boyd	fa2e7a4bbf	Fix spancat tests on GPU (#8872 ) * Fix spancat tests on GPU * Fix more spancat tests	2021-08-04 14:29:43 +02:00
Paul O'Leary McCann	77d698dcae	Fix check for RIGHT_ATTRS in dep matcher (#8807 ) * Fix check for RIGHT_ATTRs in dep matcher If a non-anchor node does not have RIGHT_ATTRS, the dep matcher throws an E100, which says that non-anchor nodes must have LEFT_ID, REL_OP, and RIGHT_ID. It specifically does not say RIGHT_ATTRS is required. A blank RIGHT_ATTRS is also valid, and patterns with one will be excepted. While not normal, sometimes a REL_OP is enough to specify a non-anchor node - maybe you just want the head of another node unconditionally, for example. This change just sets RIGHT_ATTRS to {} if not present. Alternatively changing E100 to state RIGHT_ATTRS is required could also be reasonable. * Fix test This test was written on the assumption that if `RIGHT_ATTRS` isn't present an error will be raised. Since the proposed changes make it so an error won't be raised this is no longer necessary. * Revert test, update error message Error message now lists missing keys, and RIGHT_ATTRS is required. * Use list of required keys in error message Also removes unused key param arg.	2021-08-04 09:20:41 +02:00
Adriane Boyd	941a591f3c	Pass excludes when serializing vocab (#8824 ) * Pass excludes when serializing vocab Additional minor bug fix: * Deserialize vocab in `EntityLinker.from_disk` * Add test for excluding strings on load * Fix formatting	2021-08-03 14:42:44 +02:00
Adriane Boyd	175847f92c	Support list values and INTERSECTS in Matcher (#8784 ) * Support list values and IS_INTERSECT in Matcher * Support list values as token attributes for set operators, not just as pattern values. * Add `IS_INTERSECT` operator. * Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs. * Rename IS_INTERSECT to INTERSECTS	2021-08-02 19:39:26 +02:00
Adriane Boyd	fbbbda1954	Fix start/end chars for empty and out-of-bounds spans (#8816 )	2021-08-02 19:07:19 +02:00
Adriane Boyd	9ad3b8cf8d	Only add sourced vectors hashes to meta if necessary (#8830 )	2021-08-02 18:22:35 +02:00
Nick Sorros	0485cdefcc	Add logger debug for project push and pull (#8860 ) * Add logger debug for project push and pull * Sign contributor agreement	2021-08-02 18:13:53 +02:00
themrmax	de076194c4	Make ConsoleLogger flush after each logging line (#8810 ) This is necessary to avoid "logging blackouts" when running training on Kubernetes pods	2021-08-02 14:33:38 +02:00
Ines Montani	7f21c7dfa2	Merge pull request #8794 from explosion/autoblack Auto-format code with black	2021-07-27 12:17:15 +10:00
Paul O'Leary McCann	284b530c63	Respect the no_skip value Seems like the logic for this was just left out. See #8796.	2021-07-24 15:31:17 +09:00
explosion-bot	a58ab6ea22	Auto-format code with black	2021-07-23 08:04:09 +00:00
Adriane Boyd	6bbc2b1956	Reload train corpus in debug data after initialize (#8776 )	2021-07-21 22:38:40 +02:00
Adriane Boyd	d48c01a6f7	Remove extraneous grc test file (#8768 )	2021-07-20 15:51:15 +02:00
Sofie Van Landeghem	ffaead8fe0	bump to 3.1.1	2021-07-19 14:48:27 +02:00
Sofie Van Landeghem	83e27d262e	negative tag annotation (#8731 ) * unit test to unlearn tag via negative annotation * bump thinc to 8.0.8	2021-07-19 14:39:11 +02:00
Adriane Boyd	0e4b96c97e	Update lexeme ranks for loaded vectors (#8640 ) Update the ranks for any lexemes that have been added to the vocab before the vectors are added to the model.	2021-07-19 18:25:54 +10:00
Adriane Boyd	e532c69475	Update Language.replace_pipe for disabled components (#8729 ) * Fix the index where the replacement in inserted to account for disabled components * Allow `Language.replace_pipe` to replace disabled components	2021-07-19 18:06:12 +10:00
Ines Montani	f90482d077	Tidy up and auto-format	2021-07-18 15:44:56 +10:00
Ines Montani	c0f436efbc	Merge pull request #8735 from explosion/autoblack	2021-07-17 13:46:17 +10:00
Ines Montani	483f3175cb	Tidy up [ci skip]	2021-07-17 13:43:15 +10:00
Ines Montani	15e6578f7d	Adjust formatting	2021-07-17 10:49:13 +10:00
explosion-bot	eff3d1088b	Auto-format code with black	2021-07-16 08:03:36 +00:00
Adriane Boyd	ac45c7c045	Add pre-commit to ignored requirements (#8728 )	2021-07-15 16:41:15 +02:00
jmyerston	993b0fab0e	Added ancient Greek language support (#8606 ) * Add ancient Greek language support Initial commit * Contributor Agreement * grc tokenizer test added and files formatted with black, unnecessary import removed Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Commas in lists fixed. __init__py added to test * Update lex_attrs.py * Update stop_words.py * Update stop_words.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-07-15 10:27:17 +02:00
Sofie Van Landeghem	77859beb99	spacy.ngram_range_suggester.v1 (#8699 )	2021-07-15 10:01:22 +02:00
Julien Rossi	e117573822	Adding noun_chunks to the DUTCH language model (nl) (#8529 ) * ✨ implement noun_chunks for dutch language * copy/paste FR and SV syntax iterators to accomodate UD tags * added tests with dutch text * signed contributor agreement * 🐛 fix noun chunks generator * built from scratch * define noun chunk as a single Noun-Phrase * includes some corner cases debugging (incorrect POS tagging) * test with provided annotated sample (POS, DEP) * ✅ fix failing test * CI pipeline did not like the added sample file * add the sample as a pytest fixture * Update spacy/lang/nl/syntax_iterators.py * Update spacy/lang/nl/syntax_iterators.py Code readability Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/tests/lang/nl/test_noun_chunks.py correct comment Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * finalize code * change "if next_word" into "if next_word is not None" Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-07-14 14:01:02 +02:00
Adriane Boyd	f9fd2889b7	Use 0-vector for OOV lexemes (#8639 )	2021-07-13 14:48:12 +10:00
Edward	8233359225	Fix preservation of spacy package meta (#8663 ) * update package meta with existing_meta and nlp_meta * Add spaCy contributor agreement * Added more info when creating readme	2021-07-12 11:18:52 +02:00
Adriane Boyd	d8805a1073	Fix ru/uk lemmatizer mp with spawn (#8657 ) Use an instance variable instead a class variable for the morphological analzyer so that multiprocessing with spawn is possible.	2021-07-09 15:36:56 +02:00
Adriane Boyd	b8e720fdb9	Fix Azerbaijani init, extend lang init tests (#8656 ) * Extend langs in initialize tests * Fix az init	2021-07-09 15:36:35 +02:00
explosion-bot	334f1f98d8	Auto-format code with black	2021-07-09 08:06:06 +00:00
Sofie Van Landeghem	64fac754fe	add spacy prefix to ngram_suggester.v1 (#8623 )	2021-07-07 08:09:30 +02:00
Sofie Van Landeghem	733e8ceea9	fix spancat initialize with labels (#8620 )	2021-07-06 19:08:25 +02:00
Sofie Van Landeghem	608fc1d623	avoid msg var impliciteness (#8619 ) * avoid msg var impliciteness * rename local msg * Add CI tests for debug data and train * Adjust debug data CLI test Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-07-06 19:08:08 +02:00
Sofie Van Landeghem	e7d747e3ee	TransitionBasedParser.v1 to legacy (#8586 ) * TransitionBasedParser.v1 to legacy * register sublayers * bump spacy-legacy to 3.0.7	2021-07-06 15:26:45 +02:00
Luca Dorigo	e8ef4a46d5	Add the right return type for Language.pipe and an overload for the as_tuples case (#8441 ) * Add the right return type for Language.pipe and an overload for the as_tuples version * Reformat, tidy up Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-07-06 14:18:40 +02:00
Sofie Van Landeghem	b9f59118bf	Fix silent evaluation (#8581 ) * fix silentness * sneak in docs typo fix * pass silent boolean instead	2021-07-06 14:16:19 +02:00
Sofie Van Landeghem	3daf57d70c	Small spancat fixes (#8614 ) * two small fixes + additional tests * rename	2021-07-06 14:15:41 +02:00
Ines Montani	327f83573a	Move scores per type handling into util function (#8590 )	2021-07-06 13:02:37 +02:00
Adriane Boyd	5fd0b5207e	Fix vectors check for sourced components (#8559 ) * Fix vectors check for sourced components Since vectors are not loaded when components are sourced, store a hash for the vectors of each sourced component and compare it to the loaded vectors after the vectors are loaded from the `[initialize]` block. * Pop temporary info * Remove stored hash in remove_pipe * Add default for pop * Add additional convert/debug/assemble CLI tests	2021-07-06 12:43:17 +02:00
Adriane Boyd	29906884c5	Raise an error for textcat with <2 labels (#8584 ) * Raise an error for textcat with <2 labels Raise an error if initializing a `textcat` component without at least two labels. * Add similar note to docs * Update positive_label description in API docs	2021-07-06 12:35:22 +02:00
explosion-bot	ee37288a1f	Auto-format code with black	2021-07-02 07:48:26 +00:00
Ines Montani	af9d984407	Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip]	2021-06-30 20:52:59 +10:00
Adriane Boyd	2b8c679a3d	Fix duplicate spacy package CLI opts (#8551 ) Use `-c` for `--code` and not additionally for `--create-meta`, in line with the docs.	2021-06-30 11:23:26 +02:00
Ines Montani	7f65902702	Merge pull request #8522 from adrianeboyd/chore/update-flake8 Update flake8 version in reqs and CI	2021-06-28 21:46:06 +10:00
Adriane Boyd	86d01e9229	Tidy up with flake8: imports, comparisons, etc.	2021-06-28 12:08:15 +02:00
Adriane Boyd	5eeb25f043	Tidy up code	2021-06-28 12:08:15 +02:00
Adriane Boyd	4b0ed73ed4	Update flake8 version in reqs and CI * Update some unneeded forward refs related to flake8 checks	2021-06-28 11:29:36 +02:00
Paul O'Leary McCann	f144888793	Merge pull request #8504 from bryant1410/patch-1 Fix typo in comment	2021-06-27 13:51:19 +09:00
Santiago Castro	ee63b2b199	Fix typo in `train_cli` docstring	2021-06-25 22:45:03 -07:00
Santiago Castro	a2bc743e47	Fix typo in comment	2021-06-25 18:58:38 -07:00
Adrian Zuber	f5aee0bbdf	Raise custom error in EntityLinker when KB is not set (#8442 ) * Raise custom error in EntityLinker when KB is not set * add contributor agreement * Update E1018 error message	2021-06-25 23:04:00 +02:00
Matthew Honnibal	f9946154d9	Add SpanCategorizer component (#6747 ) * Draft spancat model * Add spancat model * Add test for extract_spans * Add extract_spans layer * Upd extract_spans * Add spancat model * Add test for spancat model * Upd spancat model * Update spancat component * Upd spancat * Update spancat model * Add quick spancat test * Import SpanCategorizer * Fix SpanCategorizer component * Import SpanGroup * Fix span extraction * Fix import * Fix import * Upd model * Update spancat models * Add scoring, update defaults * Update and add docs * Fix type * Update spacy/ml/extract_spans.py * Auto-format and fix import * Fix comment * Fix type * Fix type * Update website/docs/api/spancategorizer.md * Fix comment Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Better defense Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix labels list Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/ml/extract_spans.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/pipeline/spancat.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Set annotations during update * Set annotations in spancat * fix imports in test * Update spacy/pipeline/spancat.py * replace MaxoutLogistic with LinearLogistic * fix config * various small fixes * remove set_annotations parameter in update * use our beloved tupley format with recent support for doc.spans * bugfix to allow renaming the default span_key (scores weren't showing up) * use different key in docs example * change defaults to better-working parameters from project (WIP) * register spacy.extract_spans.v1 for legacy purposes * Upd dev version so can build wheel * layers instead of architectures for smaller building blocks * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Include additional scores from overrides in combined score weights * Parameterize spans key in scoring Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so that it's possible to evaluate multiple `spancat` components in the same pipeline. * Use the (intentionally very short) default spans key `sc` in the `SpanCategorizer` * Adjust the default score weights to include the default key * Adjust the scorer to use `spans_{spans_key}` as the prefix for the returned score * Revert addition of `attr_name` argument to `score_spans` and adjust the key in the `getter` instead. Note that for `spancat` components with a custom `span_key`, the score weights currently need to be modified manually in `[training.score_weights]` for them to be available during training. To suppress the default score weights `spans_sc_p/r/f` during training, set them to `null` in `[training.score_weights]`. * Update website/docs/api/scorer.md * Fix scorer for spans key containing underscore * Increment version * Add Spans to Evaluate CLI (#8439) * Add Spans to Evaluate CLI * Change to spans_key * Add spans per_type output Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Fix spancat GPU issues (#8455) * Fix GPU issues * Require thinc >=8.0.6 * Switch to glorot_uniform_init * Fix and test ngram suggester * Include final ngram in doc for all sizes * Fix ngrams for docs of the same length as ngram size * Handle batches of docs that result in no ngrams * Add tests Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Nirant <NirantK@users.noreply.github.com>	2021-06-24 12:35:27 +02:00
Ines Montani	fb9b389f52	Merge pull request #8486 from adrianeboyd/bugfix/template-paths-vectors Preserve paths.vectors/initialize.vectors setting in quickstart template	2021-06-24 13:12:18 +10:00
Ines Montani	a8e8d02ba7	Merge pull request #8465 from explosion/feature/spacy-package-readme	2021-06-24 13:11:08 +10:00
Ines Montani	3982be14e8	Improve fallbacks	2021-06-24 11:55:50 +10:00
Adriane Boyd	393c3c70d7	Various fixes for spans in Docs.from_docs (#8487 ) * Fix spans offsets if a doc ends in a single space and no space is inserted * Also include spans key in merged doc for empty spans lists	2021-06-23 15:51:35 +02:00
Adriane Boyd	5aa099505f	Preserve paths.vectors/initialize.vectors setting in quickstart template	2021-06-23 11:07:14 +02:00
Ines Montani	cdcbd1023a	Auto-generate README in spacy packge	2021-06-22 12:06:25 +10:00
Adriane Boyd	caba63b74f	Set version to v3.1.0 (#8452 ) * Update test for v3.1 * Set version to v3.1.0	2021-06-21 10:41:40 +02:00

1 2 3 4 5 ...

8840 Commits