spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-19 10:32:40 +03:00

Author	SHA1	Message	Date
Ines Montani	595ef03e23	Merge pull request #8096 from juliensalinas/master [ci skip]	2021-05-17 13:58:37 +10:00
Paul O'Leary McCann	7c42a8c90a	Migrate coref code This includes the coref code that was being tested separately, modified to work in spaCy. It hasn't been tested yet and presumably still needs fixes. In particular, the evaluation code is currently omitted. It's unclear at the moment whether we want to use a complex scorer similar to the official one, or a simpler scorer using more modern evaluation methods.	2021-05-15 21:36:10 +09:00
Paul O'Leary McCann	3608b7b3f9	Merge branch 'master' into feature/coref	2021-05-15 20:05:17 +09:00
Julien Salinas	c496f78245	Add NLP Cloud to Universe.	2021-05-14 11:13:44 +02:00
Julien Salinas	a176d2209a	Sign contributors agreement.	2021-05-14 11:00:27 +02:00
Paul O'Leary McCann	2dc6db53fd	Merge pull request #8072 from medianeuroscience/master Added eMFDscore to universe.json	2021-05-14 11:58:30 +09:00
Frederic R. Hopp	c5962b9fba	Update universe.json fixed typo	2021-05-13 07:40:05 -07:00
Frederic R. Hopp	a9ca221e03	Update universe.json Added more detailed description to eMFDscore project	2021-05-12 09:20:17 -07:00
svlandeg	235e9f5488	call replace_listener_cfg attr if it's available	2021-05-12 17:19:38 +02:00
svlandeg	44a3a58599	call replace_listener attr if it's available	2021-05-12 16:01:02 +02:00
svlandeg	ece8be4fec	extend test to training with replaced tok2vec layer	2021-05-12 11:32:22 +02:00
Frederic R. Hopp	7bba9cdc14	Update universe.json	2021-05-11 19:18:19 -07:00
Adriane Boyd	d5bbd1f94f	Handle partial entities in Span.as_doc (#8055 ) * Handle partial entities in Span.as_doc In `Span.as_doc` replace partial entities at the beginning or end of the span with missing entity annotation. Fixes a bug where invalid entity annotation (no initial `B`) was returned for an initial partial entity. * Check for empty span in ents conversion Note: `Span.as_doc()` will still fail on an empty span due to failures in `Span.vector`.	2021-05-11 17:10:16 +02:00
Ines Montani	3883d49446	Fix default transformer in quickstart generator (resolves #8018 ) [ci skip]	2021-05-11 11:27:08 +10:00
Paul O'Leary McCann	bdeaf3a18b	Fix/fix en ordinals (#8028 ) * Fix #8019 "th" is not the only ordinal ending. * Add some more ordinal tests	2021-05-07 10:26:42 +02:00
Adriane Boyd	71c2a3ab47	Fix new version for match_alignments (#8021 )	2021-05-07 09:55:20 +02:00
Jeno Pizarro	5cf76ab608	Update negspacy example code for spaCy 3.0 (#8022 )	2021-05-07 09:33:21 +02:00
Adriane Boyd	6788d90f61	Preserve existing ENT_KB_ID annotation in NER (#7988 ) * Preserve existing ENT_KB_ID annotation in NER Preserve `ent_kb_id` annotation on existing entity spans, which is not preserved by the transition system. * Simplify kb_id assignment * Simplify further	2021-05-06 18:49:55 +10:00
Sofie Van Landeghem	02a6a5fea0	Fix 'debug model' for transformers + generalize (#7973 ) * add overrides to docs * fix debug model with transformer * assume training data is set in config	2021-05-06 18:43:32 +10:00
Adriane Boyd	cc5aeaed29	Add Chinese PTB tags to glossary (#7993 )	2021-05-06 18:43:03 +10:00
Adriane Boyd	0a22fed634	Fix span offsets for Matcher(as_spans) on spans (#7992 ) Fix returned span offsets for `Matcher(as_spans=True)(span)`.	2021-05-06 18:42:44 +10:00
Adriane Boyd	7d5db41ac3	Skip vector ngram backoff if minn is not set (#7925 )	2021-05-06 18:34:35 +10:00
Sofie Van Landeghem	e9037d8fc0	make EntityLinker robust for nO=None (#7930 )	2021-05-06 18:14:47 +10:00
Paul O'Leary McCann	66bfabd839	Fix pretraining objectives fragment (#8005 ) * Fix pretraining objectives fragment The fragment here is reused from a heading higher up, so you couldn't link to this section. * Fix section link to new fragment	2021-05-06 08:27:36 +02:00
Adriane Boyd	a71194362f	Fix Docs.from_docs for all empty docs (#8009 )	2021-05-05 18:44:14 +02:00
meghanabhange	debaab7021	Update details in universe denomme | Multilingual Name Detection (#7982 ) * Add denomme * spaCy contributor agreement * Update install and thumb Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-05-05 17:12:13 +02:00
Adriane Boyd	31528f62ed	Add / to nb infixes (#7991 )	2021-05-04 11:00:10 +02:00
Santiago Castro	e99ff6f255	Fix typo in Language docstrings (#7958 )	2021-05-03 14:44:09 +02:00
Ines Montani	12d3d0fedd	Fix quickstart default checked of conditional fields [ci skip]	2021-05-03 11:48:12 +10:00
Adriane Boyd	2320791f6d	Fix Transformer.initialize example (#7963 )	2021-04-30 12:21:31 +02:00
Adriane Boyd	cf032ec31e	Update to catalogue>=2.0.4 (#7951 )	2021-04-29 19:11:28 +02:00
Adriane Boyd	7cf5bd072f	Refactor util.to_ternary_int (#7944 ) * Refactor to avoid literal comparison with `is` * Extend tests	2021-04-29 16:58:54 +02:00
Sevdimali	49aed683cc	Azerbaijani language added (#7911 )	2021-04-28 14:42:02 +02:00
Adriane Boyd	f4080983ea	Extend to cupy 9.0.0 (#7914 )	2021-04-28 10:18:24 +02:00
Paul O'Leary McCann	8007d5c814	Check if the resume path points to a directory (#7919 ) This came up in #7878, but if --resume-path is a directory then loading the weights will fail. On Linux this will give a straightforward error message, but on Windows it gives "Permission Denied", which is confusing.	2021-04-28 09:17:15 +02:00
Paul O'Leary McCann	de6b5ed14d	Fix percent unk display in debug data (#7886 ) * Fix percent unk display This was showing (ratio %), so 10% would show as 0.10%. Fix by multiplying ratio by 100. Might want to add a warning if this is over a threshold. * Only show whole-integer percents	2021-04-27 09:16:35 +02:00
Janis Klaise	1690595e4d	Update load_lookups return type and docstring (#7907 ) * Update load_lookups return type and docstring * Add contributor agreement	2021-04-27 09:13:39 +02:00
Adriane Boyd	946a4284be	Set spacy-legacy to >=3.0.5 (#7897 ) Set `spacy-legacy` to `>=3.0.5` due to `spacy.StaticVectors.v1` init bug.	2021-04-26 18:25:39 +02:00
Adriane Boyd	874cd02539	Set spacy-legacy to >=3.0.5 (#7897 ) Set `spacy-legacy` to `>=3.0.5` due to `spacy.StaticVectors.v1` init bug.	2021-04-26 17:06:32 +02:00
Adriane Boyd	ae855a4625	Clean up Morphology imports and definitions (#7441 ) * Clean up Morphology imports and definitions * Whitespace formatting	2021-04-26 16:54:23 +02:00
Adriane Boyd	ceee1ecf17	Replace cpdef variables with cdef (#7834 )	2021-04-26 16:54:02 +02:00
Adriane Boyd	95c0833656	Add training option to set annotations on update (#7767 ) * Add training option to set annotations on update Add a `[training]` option called `set_annotations_on_update` to specify a list of components for which the predicted annotations should be set on `example.predicted` immediately after that component has been updated. The predicted annotations can be accessed by later components in the pipeline during the processing of the batch in the same `update` call. * Rename to annotates / annotating_components * Add test for `annotating_components` when training from config * Add documentation	2021-04-26 16:53:53 +02:00
Jacopo Farina	c105ed10fd	Remove torino from stop words (#7634 ) Torino is the proper name of a city and the token has no other meaning	2021-04-26 16:53:43 +02:00
Sofie Van Landeghem	e0b29f8ef7	Fix scoring normalization (#7629 ) * fix scoring normalization * score weights by total sum instead of per component * cleanup * more cleanup	2021-04-26 16:53:38 +02:00
Sofie Van Landeghem	95e3cf576b	Optionally append lang for packaged model name (#7417 ) * Add empty lines at the end of Python files * Only prepend the lang code if it's not there already * Update spacy/cli/package.py * fix whitespace stripping	2021-04-26 16:53:21 +02:00
Adriane Boyd	df3444421a	Update spacy-legacy to >=3.0.4 (#7865 )	2021-04-23 12:16:12 +02:00
Adriane Boyd	8a95475b3d	Set version to v3.0.6 (#7854 )	2021-04-22 16:33:26 +02:00
Adriane Boyd	36ecba224e	Set up GPU CI testing (#7293 ) * Set up CI for tests with GPU agent * Update tests for enabled GPU * Fix steps filename * Add parallel build jobs as a setting * Fix test requirements * Fix install test requirements condition * Fix pipeline models test * Reset current ops in prefer/require testing * Fix more tests * Remove separate test_models test * Fix regression 5551 * fix StaticVectors for GPU use * fix vocab tests * Fix regression test 5082 * Move azure steps to .github and reenable default pool jobs * Consolidate/rename azure steps Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-22 14:58:29 +02:00
Adriane Boyd	bdb485cc80	Add callback to copy vocab/tokenizer from model (#7750 ) * Add callback to copy vocab/tokenizer from model Add callback `spacy.copy_from_base_model.v1` to copy the tokenizer settings and/or vocab (including vectors) from a base model. * Move spacy.copy_from_base_model.v1 to spacy.training.callbacks * Add documentation * Modify to specify model as tokenizer and vocab params	2021-04-22 12:36:50 +02:00
Adriane Boyd	f68fc29130	Update sent_starts in Example.from_dict (#7847 ) * Update sent_starts in Example.from_dict Update `sent_starts` for `Example.from_dict` so that `Optional[bool]` values have the same meaning as for `Token.is_sent_start`. Use `Optional[bool]` as the type for sent start values in the docs. * Use helper function for conversion to ternary ints	2021-04-22 11:32:45 +02:00
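
The percent display bug described in commit de6b5ed14d (a ratio printed with a percent sign, so 10% appeared as 0.10%) can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from spaCy's `debug data` code; they only show the general shape of the problem and of the fix the commit message describes (multiply the ratio by 100 and show whole-integer percents).

```python
def format_percent_buggy(ratio: float) -> str:
    # Hypothetical reconstruction of the bug: the raw ratio is printed
    # with a percent sign, so 0.10 is displayed as "0.10%".
    return f"{ratio:.2f}%"


def format_percent_fixed(ratio: float) -> str:
    # Hypothetical fix: scale the ratio to a percentage first, then
    # format it without decimals so only whole-integer percents appear.
    return f"{ratio * 100:.0f}%"


if __name__ == "__main__":
    unk_ratio = 0.10  # assumed example value: 10% of tokens are unknown
    print(format_percent_buggy(unk_ratio))
    print(format_percent_fixed(unk_ratio))
```

Formatting with `.0f` rounds to the nearest integer, which may or may not match the rounding the actual commit chose.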


14759 Commits