spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-11 13:25:43 +03:00

Author	SHA1	Message	Date
Adriane Boyd	6788d90f61	Preserve existing ENT_KB_ID annotation in NER (#7988 ) * Preserve existing ENT_KB_ID annotation in NER Preserve `ent_kb_id` annotation on existing entity spans, which is not preserved by the transition system. * Simplify kb_id assignment * Simplify further	2021-05-06 18:49:55 +10:00
Sofie Van Landeghem	02a6a5fea0	Fix 'debug model' for transformers + generalize (#7973 ) * add overrides to docs * fix debug model with transformer * assume training data is set in config	2021-05-06 18:43:32 +10:00
Adriane Boyd	cc5aeaed29	Add Chinese PTB tags to glossary (#7993 )	2021-05-06 18:43:03 +10:00
Adriane Boyd	0a22fed634	Fix span offsets for Matcher(as_spans) on spans (#7992 ) Fix returned span offsets for `Matcher(as_spans=True)(span)`.	2021-05-06 18:42:44 +10:00
Adriane Boyd	7d5db41ac3	Skip vector ngram backoff if minn is not set (#7925 )	2021-05-06 18:34:35 +10:00
Sofie Van Landeghem	e9037d8fc0	make EntityLinker robust for nO=None (#7930 )	2021-05-06 18:14:47 +10:00
Paul O'Leary McCann	66bfabd839	Fix pretraining objectives fragment (#8005 ) * Fix pretraining objectives fragment The fragment here is reused from a heading higher up, so you couldn't link to this section. * Fix section link to new fragment	2021-05-06 08:27:36 +02:00
Adriane Boyd	a71194362f	Fix Docs.from_docs for all empty docs (#8009 )	2021-05-05 18:44:14 +02:00
meghanabhange	46311cf03f	Update details in universe denomme | Multilingual Name Detection (#7982 ) * Add denomme * spaCy contributor agreement * Update install and thumb Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-05-05 17:14:14 +02:00
meghanabhange	debaab7021	Update details in universe denomme | Multilingual Name Detection (#7982 ) * Add denomme * spaCy contributor agreement * Update install and thumb Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-05-05 17:12:13 +02:00
Adriane Boyd	31528f62ed	Add / to nb infixes (#7991 )	2021-05-04 11:00:10 +02:00
Santiago Castro	e99ff6f255	Fix typo in Language docstrings (#7958 )	2021-05-03 14:44:09 +02:00
Ines Montani	62a01956c3	Fix quickstart default checked of conditional fields [ci skip]	2021-05-03 14:04:45 +02:00
Ines Montani	12d3d0fedd	Fix quickstart default checked of conditional fields [ci skip]	2021-05-03 11:48:12 +10:00
Adriane Boyd	ffaa0d6b9b	Fix Transformer.initialize example (#7963 )	2021-04-30 12:21:59 +02:00
Adriane Boyd	2320791f6d	Fix Transformer.initialize example (#7963 )	2021-04-30 12:21:31 +02:00
Adriane Boyd	cf032ec31e	Update to catalogue>=2.0.4 (#7951 )	2021-04-29 19:11:28 +02:00
Adriane Boyd	7cf5bd072f	Refactor util.to_ternary_int (#7944 ) * Refactor to avoid literal comparison with `is` * Extend tests	2021-04-29 16:58:54 +02:00
Sevdimali	49aed683cc	Azerbaijani language added (#7911 )	2021-04-28 14:42:02 +02:00
Adriane Boyd	f4080983ea	Extend to cupy 9.0.0 (#7914 )	2021-04-28 10:18:24 +02:00
Paul O'Leary McCann	8007d5c814	Check if the resume path points to a directory (#7919 ) This came up in #7878, but if --resume-path is a directory then loading the weights will fail. On Linux this will give a straightforward error message, but on Windows it gives "Permission Denied", which is confusing.	2021-04-28 09:17:15 +02:00
Paul O'Leary McCann	de6b5ed14d	Fix percent unk display in debug data (#7886 ) * Fix percent unk display This was showing (ratio %), so 10% would show as 0.10%. Fix by multiplying ratio by 100. Might want to add a warning if this is over a threshold. * Only show whole-integer percents	2021-04-27 09:16:35 +02:00
Janis Klaise	b33fb9ac1e	Update load_lookups return type and docstring (#7907 ) * Update load_lookups return type and docstring * Add contributor agreement	2021-04-27 09:14:59 +02:00
Janis Klaise	1690595e4d	Update load_lookups return type and docstring (#7907 ) * Update load_lookups return type and docstring * Add contributor agreement	2021-04-27 09:13:39 +02:00
Adriane Boyd	946a4284be	Set spacy-legacy to >=3.0.5 (#7897 ) Set `spacy-legacy` to `>=3.0.5` due to `spacy.StaticVectors.v1` init bug.	2021-04-26 18:25:39 +02:00
Adriane Boyd	874cd02539	Set spacy-legacy to >=3.0.5 (#7897 ) Set `spacy-legacy` to `>=3.0.5` due to `spacy.StaticVectors.v1` init bug.	2021-04-26 17:06:32 +02:00
Adriane Boyd	ae855a4625	Clean up Morphology imports and definitions (#7441 ) * Clean up Morphology imports and definitions * Whitespace formatting	2021-04-26 16:54:23 +02:00
Adriane Boyd	ceee1ecf17	Replace cpdef variables with cdef (#7834 )	2021-04-26 16:54:02 +02:00
Adriane Boyd	95c0833656	Add training option to set annotations on update (#7767 ) * Add training option to set annotations on update Add a `[training]` option called `set_annotations_on_update` to specify a list of components for which the predicted annotations should be set on `example.predicted` immediately after that component has been updated. The predicted annotations can be accessed by later components in the pipeline during the processing of the batch in the same `update` call. * Rename to annotates / annotating_components * Add test for `annotating_components` when training from config * Add documentation	2021-04-26 16:53:53 +02:00
Jacopo Farina	c105ed10fd	Remove torino from stop words (#7634 ) Torino is the proper name of a city and the token has no other meaning	2021-04-26 16:53:43 +02:00
Sofie Van Landeghem	e0b29f8ef7	Fix scoring normalization (#7629 ) * fix scoring normalization * score weights by total sum instead of per component * cleanup * more cleanup	2021-04-26 16:53:38 +02:00
Sofie Van Landeghem	95e3cf576b	Optionally append lang for packaged model name (#7417 ) * Add empty lines at the end of Python files * Only prepend the lang code if it's not there already * Update spacy/cli/package.py * fix whitespace stripping	2021-04-26 16:53:21 +02:00
Adriane Boyd	29ac7f776a	Merge branch 'master' into spacy.io	2021-04-24 12:58:47 +02:00
Adriane Boyd	df3444421a	Update spacy-legacy to >=3.0.4 (#7865 )	2021-04-23 12:16:12 +02:00
Adriane Boyd	8a95475b3d	Set version to v3.0.6 (#7854 )	2021-04-22 16:33:26 +02:00
Adriane Boyd	36ecba224e	Set up GPU CI testing (#7293 ) * Set up CI for tests with GPU agent * Update tests for enabled GPU * Fix steps filename * Add parallel build jobs as a setting * Fix test requirements * Fix install test requirements condition * Fix pipeline models test * Reset current ops in prefer/require testing * Fix more tests * Remove separate test_models test * Fix regression 5551 * fix StaticVectors for GPU use * fix vocab tests * Fix regression test 5082 * Move azure steps to .github and reenable default pool jobs * Consolidate/rename azure steps Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-22 14:58:29 +02:00
Adriane Boyd	bdb485cc80	Add callback to copy vocab/tokenizer from model (#7750 ) * Add callback to copy vocab/tokenizer from model Add callback `spacy.copy_from_base_model.v1` to copy the tokenizer settings and/or vocab (including vectors) from a base model. * Move spacy.copy_from_base_model.v1 to spacy.training.callbacks * Add documentation * Modify to specify model as tokenizer and vocab params	2021-04-22 12:36:50 +02:00
Adriane Boyd	f68fc29130	Update sent_starts in Example.from_dict (#7847 ) * Update sent_starts in Example.from_dict Update `sent_starts` for `Example.from_dict` so that `Optional[bool]` values have the same meaning as for `Token.is_sent_start`. Use `Optional[bool]` as the type for sent start values in the docs. * Use helper function for conversion to ternary ints	2021-04-22 11:32:45 +02:00
Adriane Boyd	f4339f9bff	Fix tokenizer cache flushing (#7836 ) * Fix tokenizer cache flushing Fix/simplify tokenizer init detection in order to fix cache flushing when properties are modified. * Remove init reloading logic * Remove logic disabling `_reload_special_cases` on init * Setting `rules` last in `__init__` (as before) means that setting other properties doesn't reload any special cases * Reset `rules` first in `from_bytes` so that setting other properties during deserialization doesn't reload any special cases unnecessarily * Reset all properties in `Tokenizer.from_bytes` to allow any settings to be `None` * Also reset special matcher when special cache is flushed * Remove duplicate special case validation * Add test for special cases flushing * Extend test for tokenizer deserialization of None values	2021-04-22 18:14:57 +10:00
Sofie Van Landeghem	047d912904	fix typo in entity_linker docs	2021-04-22 10:10:31 +02:00
Sofie Van Landeghem	cfad7e21d5	fix config parsing of ints/strings (#7755 ) * add few failing tests for parsing integers and strings * bump thinc to 8.0.3	2021-04-22 18:09:13 +10:00
Adriane Boyd	d2bdaa7823	Replace negative rows with 0 in StaticVectors (#7674 ) * Replace negative rows with 0 in StaticVectors Replace negative row indices with 0-vectors in `StaticVectors`. * Increase versions related to StaticVectors * Increase versions of all architectures and layers related to `StaticVectors` * Improve efficiency of 0-vector operations Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5 * Update config defaults to new versions * Update docs	2021-04-22 18:04:15 +10:00
Sofie Van Landeghem	6f565cf39d	fix typo in entity_linker docs	2021-04-22 09:59:24 +02:00
Sofie Van Landeghem	47bbc46392	update EL training data format in docs (#7839 ) * update EL training data format * fix typo * all -1 because reasons	2021-04-22 08:50:31 +02:00
Sofie Van Landeghem	2e746dbf32	update EL training data format in docs (#7839 ) * update EL training data format * fix typo * all -1 because reasons	2021-04-22 08:50:09 +02:00
meghanabhange	7985e6bb39	Project Idea : denomme | Multilingual Name Detection (#7845 ) * Add denomme * spaCy contributor agreement Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-22 08:48:41 +02:00
meghanabhange	49ff1126bf	Project Idea : denomme | Multilingual Name Detection (#7845 ) * Add denomme * spaCy contributor agreement Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-22 08:48:17 +02:00
Sam Edwardes	05c609cdeb	Added a logo to spaCyTextBlob (#7818 ) * Added a logo to spaCyTextBlob * Updated to better thumb	2021-04-22 08:42:14 +02:00
Sam Edwardes	b8c6c10c6f	Added a logo to spaCyTextBlob (#7818 ) * Added a logo to spaCyTextBlob * Updated to better thumb	2021-04-22 08:41:55 +02:00
Diego Palma	ac101cba00	Add TRUNAJOD to spaCy universe. (#7754 ) * Add TRUNAJOD to spaCy universe. * Add trunajod logo and thumb. Co-authored-by: Diego <dpalma@evernote.com>	2021-04-22 08:41:03 +02:00

14988 Commits