spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-09-22 20:09:18 +03:00

Author	SHA1	Message	Date
Adriane Boyd	6108dabdc8	Rephrase error related to sample data initialization Now that the initialize step is fully implemented, the source of E923 is typically missing or improperly converted/formatted data rather than a bug in spaCy, so rephrase the error and message and remove the prompt to open an issue.	2021-02-08 09:21:36 +01:00
Ines Montani	d0c3775712	Replace links to nightly docs [ci skip]	2021-01-30 20:09:38 +11:00
Ines Montani	526b416118	Tidy up comments	2021-01-30 12:34:09 +11:00
Ines Montani	30765674d0	Merge branch 'master' into develop	2021-01-30 12:20:28 +11:00
Ines Montani	7694f76dd1	Update warning and mention replace_listeners	2021-01-29 23:46:01 +11:00
Ines Montani	94232aea08	Improve E889	2021-01-29 23:39:23 +11:00
Ines Montani	bbb94b37c6	Update error handling and docstring	2021-01-29 16:27:49 +11:00
Adriane Boyd	fcce3600ed	Forbid OP matching 2+ tokens in DependencyMatcher (#6824) Instead of silently using only the first token in each matched span: * Forbid `OP: ?/*/+` through `DependencyMatcher` validation As a fail-safe, add warning if a token match that's not exactly one token long is found by a token pattern.	2021-01-29 08:52:01 +08:00
Sofie Van Landeghem	24a697abb8	avoid empty aliases and improve UX and docs (#6840)	2021-01-29 08:51:40 +08:00
Adriane Boyd	4096a79de7	Add alignment mode error and fix Doc.char_span docs (#6820) * Raise an error on an unrecognized alignment mode rather than defaulting to `strict` * Fix the `Doc.char_span` API doc alignment mode details	2021-01-27 23:40:42 +11:00
Ines Montani	c0926c9088	WIP: Various small training changes (#6818) * Allow output_path to be None during training * Fix cat scoring (?) * Improve error message for weighted None score * Improve messages So we can call this in other places etc. * FIx output path check * Use latest wasabi * Revert "Improve error message for weighted None score" This reverts commit `7059926763`. * Exclude None scores from final score by default It's otherwise very difficult to keep track of the score weights if we modify a config programmatically, source components etc. * Update warnings and use logger.warning	2021-01-26 14:51:52 +11:00
Ines Montani	1090d3d675	Merge branch 'develop' into feature/spacy-legacy	2021-01-18 11:43:39 +11:00
Sofie Van Landeghem	fed8f48965	raise NotImplementedError when noun_chunks iterator is not implemented (#6711) * raise NotImplementedError when noun_chunks iterator is not implemented * bring back, fix and document span.noun_chunks * formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-01-17 19:56:05 +08:00
Ines Montani	a552db2819	Include available registry names in error	2021-01-16 14:35:03 +11:00
Ines Montani	a203e3dbb8	Support spacy-legacy via the registry	2021-01-15 21:42:40 +11:00
Adriane Boyd	681a6195f7	Validate seed and gpu_allocator manually	2021-01-14 16:57:57 +01:00
Sofie Van Landeghem	afc5714d32	multi-label textcat component (#6474) * multi-label textcat component * formatting * fix comment * cleanup * fix from #6481 * random edit to push the tests * add explicit error when textcat is called with multi-label gold data * fix error nr * small fix	2021-01-06 13:07:14 +11:00
Adriane Boyd	5ca57d8221	Add logger warning when serializing user hooks (#6595) Add a warning that user hooks are lost on serialization. Add a `user_hooks` exclude to skip the warning with pickle.	2020-12-29 11:54:32 +01:00
Ines Montani	dfaef27f90	Merge pull request #6503 from adrianeboyd/feature/lemmatizer-rule-warning-pos Warn on empty POS for the rule-based lemmatizer	2020-12-09 11:34:16 +11:00
Sofie Van Landeghem	de108ed3e8	Add specific error when StaticVectors can't read the vectors data (#6450)	2020-12-09 06:16:07 +08:00
Sofie Van Landeghem	f98a04434a	pretrain architectures (#6451) * define new architectures for the pretraining objective * add loss function as attr of the omdel * cleanup * cleanup * shorten name * fix typo * remove unused error	2020-12-08 14:41:03 +08:00
Ines Montani	ee2ec52f48	Merge pull request #6409 from svlandeg/feature/trf-docs	2020-12-08 06:32:10 +01:00
Adriane Boyd	d70950605c	Warn on empty POS for the rule-based lemmatizer Add a warning to the rule-based lemmatizer for any tokens without POS annotation.	2020-12-04 11:46:15 +01:00
Adriane Boyd	26296ab223	Add error message if DocBin zlib decompress fails (#6394) Add a better error message if DocBin zlib decompress fails, indicating that the data is not in `DocBin` format.	2020-11-27 14:39:49 +08:00
svlandeg	789fb3d124	add docs for upstream argument of TransformerListener	2020-11-09 21:42:58 +01:00
Adriane Boyd	1c4df8fd09	Replace pytokenizations with internal alignment (#6293) * Replace pytokenizations with internal alignment Replace pytokenizations with internal alignment algorithm that is restricted to only allow differences in whitespace and capitalization. * Rename `spacy.training.align` to `spacy.training.alignment` to contain the `Alignment` dataclass * Implement `get_alignments` in `spacy.training.align` * Refactor trailing whitespace handling * Remove unnecessary exception for empty docs Allow a non-empty whitespace-only doc to be aligned with an empty doc * Remove empty docs exceptions completely	2020-11-03 16:24:38 +01:00
Sofie Van Landeghem	75a202ce65	TextCat updates and fixes (#6263) * small fix in example imports * throw error when train_corpus or dev_corpus is not a string * small fix in custom logger example * limit macro_auc to labels with 2 annotations * fix typo * also create parents of output_dir if need be * update documentation of textcat scores * refactor TextCatEnsemble * fix tests for new AUC definition * bump to 3.0.0a42 * update docs * rename to spacy.TextCatEnsemble.v2 * spacy.TextCatEnsemble.v1 in legacy * cleanup * small fix * update to 3.0.0rc2 * fix import that got lost in merge * cursed IDE * fix two typos	2020-10-18 14:50:41 +02:00
Ines Montani	bfa3931c9d	Revert added_strings change (#6236)	2020-10-10 18:55:07 +02:00
Ines Montani	8ac5f22253	Adjust error message	2020-10-09 18:00:16 +02:00
svlandeg	06b9d213fd	formatting	2020-10-09 12:19:47 +02:00
svlandeg	2cafba5f50	shorten error message for clarity	2020-10-09 12:17:35 +02:00
svlandeg	18dfb27985	Add custom error when evaluation throws a KeyError	2020-10-09 12:05:33 +02:00
Sofie Van Landeghem	d093d6343b	TrainablePipe (#6213) * rename Pipe to TrainablePipe * split functionality between Pipe and TrainablePipe * remove unnecessary methods from certain components * cleanup * hasattr(component, "pipe") should be sufficient again * remove serialization and vocab/cfg from Pipe * unify _ensure_examples and validate_examples * small fixes * hasattr checks for self.cfg and self.vocab * make is_resizable and is_trainable properties * serialize strings.json instead of vocab * fix KB IO + tests * fix typos * more typos * _added_strings as a set * few more tests specifically for _added_strings field * bump to 3.0.0a36	2020-10-08 21:33:49 +02:00
Ines Montani	be99f1e4de	Remove output dirs before training (#6204) * Remove output dirs before training * Re-raise error if cleaning fails	2020-10-05 20:11:16 +02:00
svlandeg	fd2d48556c	fix E902 and E903 numbering	2020-10-05 13:43:32 +02:00
Ines Montani	d38dc466c5	Adjust error [ci skip]	2020-10-04 15:26:01 +02:00
Ines Montani	bcd52e5486	Tidy up errors and warnings	2020-10-04 11:16:31 +02:00
Ines Montani	d3b3663942	Adjust error message and add test	2020-10-04 10:11:27 +02:00
Ines Montani	cc08c88a89	Merge pull request #6187 from svlandeg/fix/begin_training_pipe	2020-10-04 10:01:02 +02:00
svlandeg	3f657ed3a1	implement warning in __init_subclass__ instead	2020-10-03 22:34:10 +02:00
Ines Montani	dd542ec6a4	Fix label initialization of textcat component (#6190)	2020-10-03 17:07:38 +02:00
svlandeg	fb48de349c	bwd compat for pipe.begin_training	2020-10-02 20:31:14 +02:00
Sofie Van Landeghem	09dcb75076	small UX fix for DocBin (#6167) * add informative warning when messing up store_user_data DocBin flags * add informative warning when messing up store_user_data DocBin flags * cleanup test * rename to patterns_path	2020-10-02 15:43:32 +02:00
Ines Montani	f0b30aedad	Make lemmatizers use initialize logic (#6182) * Make lemmatizer use initialize logic and tidy up * Fix typo * Raise for uninitialized tables	2020-10-02 15:42:36 +02:00
Ines Montani	01c1538c72	Integrate file readers	2020-10-02 01:36:06 +02:00
Adriane Boyd	86c3ec9c2b	Refactor Token morph setting (#6175) * Refactor Token morph setting * Remove `Token.morph_` * Add `Token.set_morph()` * `0` resets `token.c.morph` to unset * Any other values are passed to `Morphology.add` * Add token.morph setter to set from MorphAnalysis	2020-10-01 22:21:46 +02:00
Ines Montani	381258b75b	Merge pull request #6165 from explosion/feature/update-tokenizers-initialize	2020-10-01 09:49:47 +02:00
Ines Montani	6f29f68f69	Update errors and make Tokenizer.initialize args less strict	2020-09-30 23:48:47 +02:00
Ines Montani	a103ab5f1a	Update augmenter lookups and docs	2020-09-30 23:03:47 +02:00
Adriane Boyd	6b7bb32834	Refactor Chinese initialization	2020-09-30 11:46:45 +02:00


298 Commits