spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-09-22 03:49:17 +03:00

Author	SHA1	Message	Date
Adriane Boyd	348d1829c7	Preserve user data for DependencyMatcher on spans (#7528) * Preserve user data for DependencyMatcher on spans * Clean underscore in test * Modify test to use extensions stored in user data	2021-03-30 12:26:22 +02:00
m0canu1	921feee092	Added more exception to the italian language from https://forum.wordr … (#7246) * Added more exception to the italian language from https://forum.wordreference.com/threads/le-abbreviazioni-nella-lingua-italiana-abbreviations-in-italian.2464189/ * Remove unnecessary exception Co-authored-by: Alexandru Mocanu <alexandru.mocanu@augeos.it> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-03-30 10:23:32 +02:00
Adriane Boyd	27a48f2802	Fix/update extension copying in Span.as_doc and Doc.from_docs (#7574) * Adjust custom extension data when copying user data in `Span.as_doc()` * Restrict `Doc.from_docs()` to adjusting offsets for custom extension data * Update test to use extension * (Duplicate bug fix for character offset from #7497)	2021-03-30 09:49:12 +02:00
Adriane Boyd	3ae8661085	Fix tensor retokenization for non-numpy ops (#7527) Implement manual `append` and `delete` for non-numpy ops.	2021-03-29 22:34:48 +11:00
Adriane Boyd	139f655f34	Merge doc.spans in Doc.from_docs() (#7497) Merge data from `doc.spans` in `Doc.from_docs()`. * Fix internal character offset set when merging empty docs (only affects tokens and spans in `user_data` if an empty doc is in the list of docs)	2021-03-29 22:34:01 +11:00
Adriane Boyd	d59f968d08	Keep sent starts without parse in retokenization (#7424) In the retokenizer, only reset sent starts (with `set_children_from_head`) if the doc is parsed. If there is no parse, merged tokens have the unset `token.is_sent_start == None` by default after retokenization.	2021-03-29 22:32:00 +11:00
Paul O'Leary McCann	cdab341a75	Remove mention of -1 for early stopping (fix #7535) Maybe this used to work differently, but currently a negative patience just causes immediate termination.	2021-03-23 11:50:35 +09:00
Ines Montani	4bd3d01aaf	Merge pull request #7471 from polm/fix/listener-warnings	2021-03-22 12:45:02 +01:00
Ines Montani	d545ab4ca4	Merge pull request #7495 from adrianeboyd/bugfix/norm-ux Update lexeme_norm checks	2021-03-22 12:44:52 +01:00
Ines Montani	66ebd5c69e	Merge pull request #7491 from adrianeboyd/bugfix/corpus-depr-props Update deprecated doc.is_sentenced in Corpus	2021-03-21 02:17:24 +01:00
Adriane Boyd	39153ef90f	Update lexeme_norm checks * Add util method for check * Add new languages to list with lexeme norm tables * Add check to all relevant components * Add config details to warning message Note that we're not actually inspecting the model config to see if `NORM` is used as an attribute, so it may warn in cases where it's not relevant.	2021-03-19 10:59:27 +01:00
Adriane Boyd	c771ec22f0	Update matcher errors and docs * Mention `tagger+attribute_ruler` in `POS`/`MORPH` error messages for `Matcher` and `PhraseMatcher` * Document `Matcher.__call__(allow_missing=)`	2021-03-19 10:11:18 +01:00
Adriane Boyd	48b90c8e1c	Update deprecated doc.is_sentenced in Corpus	2021-03-19 09:43:52 +01:00
Ines Montani	4f9aaa2366	Merge pull request #7451 from adrianeboyd/chore/add-py.typed Add py.typed	2021-03-19 02:08:16 +01:00
Ines Montani	66b900a76d	Merge pull request #7440 from adrianeboyd/bugfix/ru-pymorph2-lookup-lemmatize Rename and update Russian pymorphy2 lookup lemmatize	2021-03-19 01:54:08 +01:00
Ines Montani	2c6fa8c890	Merge pull request #7489 from adrianeboyd/bugfix/callbacks-entry-points Check for callbacks entry points	2021-03-19 01:53:53 +01:00
Adriane Boyd	0ad9e16ec3	Check for callbacks entry points	2021-03-18 21:18:25 +01:00
Lukas Winkler	3c362ac520	replace "is not" with !=	2021-03-18 21:09:11 +01:00
Paul O'Leary McCann	40bc01e668	Proactively remove unused listeners With this the changes in initialize.py might be unnecessary. Requires testing.	2021-03-17 22:41:41 +09:00
Paul O'Leary McCann	ef77c88638	Don't warn about components not in the pipeline See here: https://github.com/explosion/spaCy/discussions/7463 Still need to check if there are any side effects of listeners being present but not in the pipeline, but this commit will silence the warnings.	2021-03-17 14:56:04 +09:00
Adriane Boyd	02b5c8a1a2	Add py.typed	2021-03-16 09:48:31 +01:00
Adriane Boyd	3bcf74aca7	Rename and update ru pymorphy2 lookup lemmatize * To allow default lookup lemmatization with a blank Russian model, rename pymorphy2 lookup mode to `pymorphy2_lookup` * Bug fix: update pymorphy2 lookup lemmatize to return list rather than string	2021-03-15 11:11:06 +01:00
Ines Montani	068b97a617	Merge pull request #7408 from adrianeboyd/bugfix/load-keyword-only	2021-03-13 04:25:50 +01:00
Adriane Boyd	03e9e7b567	Add --code option to init fill-config	2021-03-12 10:03:57 +01:00
Adriane Boyd	ce6317231f	Add --code to spacy debug CLI	2021-03-12 09:51:26 +01:00
Adriane Boyd	508cb3bef7	Also exclude user hooks in displacy conversion (#7419)	2021-03-12 09:41:59 +01:00
Adriane Boyd	deffc3a532	Update package requirements tests (#7409) * Add hypothesis to packages skipped in version check * Add numpy back to tests following `2df1ab8a`	2021-03-11 16:24:31 +01:00
Adriane Boyd	124304b146	Add vocab kwarg back to spacy.load * Additional minor formatting and docs cleanup	2021-03-11 10:58:59 +01:00
Adriane Boyd	fbf3a755d7	Make spacy.load kwargs keyword-only	2021-03-11 09:36:58 +01:00
Adriane Boyd	53a3b967ac	Update thinc pin and set version to v3.0.5 (#7389)	2021-03-10 11:10:53 +01:00
Adriane Boyd	3b911ee5ef	Set version to v3.0.4 (#7376)	2021-03-09 16:49:41 +01:00
Adriane Boyd	d746ea6278	Add warning about GPU selection in Jupyter notebooks (#7075) * Initial warning * Update check * Redo edit * Move jupyter warning to helper method * Add link with details to warnings	2021-03-09 15:35:21 +01:00
Ines Montani	37fc495f5d	Merge pull request #7353 from jankrepl/fix_entity_rules_labels	2021-03-09 15:09:24 +01:00
Sofie Van Landeghem	932887b950	textcat scoring fix and multi_label docs (#6974) * add multi-label textcat to menu * add infobox on textcat API * add info to v3 migration guide * small edits * further fixes in doc strings * add infobox to textcat architectures * add textcat_multilabel to overview of built-in components * spelling * fix unrelated warn msg * Add textcat_multilabel to quickstart [ci skip] * remove separate documentation page for multilabel_textcategorizer * small edits * positive label clarification * avoid duplicating information in self.cfg and fix textcat.score * fix multilabel textcat too * revert threshold to storage in cfg * revert threshold stuff for multi-textcat Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:04:22 +11:00
Sofie Van Landeghem	39de3602e0	return custom error in nlp.initialize (#7104) * return custom error in nlp.initialize * Rename error Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:01:31 +11:00
Jan Krepl	f26b61e001	Make sure sorted	2021-03-09 10:49:53 +01:00
Adriane Boyd	3f3e8110dc	Fix lowercase augmentation (#7336) * Fix aborted/skipped augmentation for `spacy.orth_variants.v1` if lowercasing was enabled for an example * Simplify `spacy.orth_variants.v1` for `Example` vs. `GoldParse` * Preserve reference tokenization in `spacy.lower_case.v1`	2021-03-09 14:02:32 +11:00
Sofie Van Landeghem	cd70c3cb79	Fixing pretrain (#7342) * initialize NLP with train corpus * add more pretraining tests * more tests * function to fetch tok2vec layer for pretraining * clarify parameter name * test different objectives * formatting * fix check for static vectors when using vectors objective * clarify docs * logger statement * fix init_tok2vec and proc.initialize order * test training after pretraining * add init_config tests for pretraining * pop pretraining block to avoid config validation errors * custom errors	2021-03-09 14:01:13 +11:00
Adriane Boyd	97bcf2ae3a	Fix patience for identical scores (#7250) * Fix patience for identical scores Fix training patience so that the earliest best step is chosen for identical max scores. * Restore break, remove print * Explicitly define best_step for clarity	2021-03-06 18:42:14 +11:00
Ines Montani	ea555b03e0	Merge pull request #7255 from adrianeboyd/bugfix/extraneous-tok2vec Omit unused tok2vec/transformer components	2021-03-03 23:15:06 +11:00
svlandeg	d900c55061	consistently use registry as callable	2021-03-02 17:56:28 +01:00
Adriane Boyd	8a4200d4e9	Omit unused tok2vec/transformer components Omit unused tok2vec/transformer components in quickstart template.	2021-03-02 15:53:30 +01:00
Sofie Van Landeghem	212f0e779e	Support doc.spans in Example.from_dict (#7197) * add support for spans in Example.from_dict * add unit tests * update error to E879	2021-03-03 01:12:54 +11:00
Adriane Boyd	fb98862337	Add hint for --gpu-id to CLI device info (#7234) * Add hint for --gpu-id to CLI device info If the user has `cupy` and an available GPU, add a hint about using `--gpu-id 0` to the CLI output. * Undo change to original CPU message	2021-03-03 01:11:18 +11:00
Ines Montani	635ae55b74	Merge pull request #7237 from adrianeboyd/bugfix/is-cython-func-7224	2021-03-03 00:05:16 +11:00
Adriane Boyd	0efb7413f9	Use make_tempdir instead	2021-03-01 17:54:14 +01:00
Adriane Boyd	e9f7f9a4bc	Fix is_cython_func for additional imported code * Fix `is_cython_func` for imported code loaded under `python_code` module name * Add `make_named_tempfile` context manager to test utils to test loading of imported code * Add test for validation of `initialize` params in custom module	2021-03-01 16:37:39 +01:00
Sofie Van Landeghem	dd99872bb0	Fix spans weak ref in doc copy (#7225) * failing unit test * ensure that doc.spans refers to the copied doc, not the old * add type info	2021-02-28 12:32:48 +11:00
Ines Montani	408b94887a	Merge pull request #7207 from adrianeboyd/docs/get-noun-chunks [ci skip] Extend docs related to Vocab.get_noun_chunks	2021-02-27 11:51:08 +11:00
Ines Montani	dc46fa078f	Merge pull request #7220 from svlandeg/docs/has_annotation [ci skip] has_annotation docs fix	2021-02-27 11:50:34 +11:00
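
Note on 97bcf2ae3a (#7250): the message says the earliest best step is chosen when several evaluations share the same maximum score. The sketch below illustrates that tie-breaking rule in plain Python. It is not spaCy's training loop: the function name `best_step_and_stop`, the `(score, step)` layout of `results`, and the assumption that `patience` is a positive number of steps are all hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def best_step_and_stop(
    results: List[Tuple[float, int]], patience: int
) -> Tuple[int, bool]:
    """Hypothetical helper: `results` holds (score, step) pairs in evaluation order."""
    best_score, best_step = results[0]
    for score, step in results[1:]:
        # Strict ">" keeps the earliest step when later scores only tie the maximum.
        if score > best_score:
            best_score, best_step = score, step
    last_step = results[-1][1]
    # Stop once `patience` steps have passed since the (earliest) best step.
    stop = (last_step - best_step) >= patience
    return best_step, stop


# Three evaluations tie at 0.85; the earliest of them (step 200) counts as best.
results = [(0.70, 100), (0.85, 200), (0.85, 300), (0.85, 400)]
best_step, stop = best_step_and_stop(results, patience=200)
```

If ties were instead resolved in favour of the latest step, the best step here would be 400 and the patience window would restart on every identical score, so training would not stop.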

8589 Commits
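
Note on fbf3a755d7 and 124304b146: these commits make the keyword arguments of `spacy.load` keyword-only and add the `vocab` argument back. In Python, keyword-only parameters are declared after a bare `*` in the signature. The sketch below shows the mechanism with a hypothetical stand-in, `load_pipeline`; it is not spaCy's function, and the `disable` parameter and the returned dict are assumptions made for illustration.

```python
def load_pipeline(name, *, vocab=True, disable=()):
    """Hypothetical stand-in: everything after the bare `*` must be passed by keyword."""
    return {"name": name, "vocab": vocab, "disable": list(disable)}


# Passing options by keyword works.
ok = load_pipeline("en_core_web_sm", disable=["parser"])

# Passing an option positionally raises TypeError.
try:
    load_pipeline("en_core_web_sm", True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```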