spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-19 20:52:23 +03:00

Author	SHA1	Message	Date
svlandeg	910026582d	set versions to v1 instead of v0	2021-05-27 16:17:20 +02:00
svlandeg	2e3c0e2256	delete outdated tests	2021-05-27 13:54:31 +02:00
svlandeg	ba2e491cc4	Merge remote-tracking branch 'upstream/master' into feature/coref	2021-05-27 13:50:32 +02:00
Sofie Van Landeghem	3c58c0323f	fix docs (#8200)	2021-05-27 10:48:59 +02:00
Sofie Van Landeghem	290bd6ed39	ensure tolerance is properly passed on (#8158)	2021-05-27 18:10:28 +10:00
Paul O'Leary McCann	0c553ecd4e	Fix docs (fix #8189)	2021-05-24 19:47:30 +09:00
Paul O'Leary McCann	a484245f35	Remove references to coref_er	2021-05-24 19:08:45 +09:00
Paul O'Leary McCann	d6389b133d	Don't use a generator for no reason	2021-05-24 19:06:15 +09:00
Paul O'Leary McCann	d6fd5fe1c0	Minor cleanup	2021-05-24 14:56:43 +09:00
Paul O'Leary McCann	0942a0b51b	Remove coref_er.py The intent of this was that it would be a component pipeline that used entities as input, but that's now covered by the get_mentions function as a pipeline arg.	2021-05-21 18:20:25 +09:00
Paul O'Leary McCann	f6652c9252	Add new coref scoring This is closer to the traditional evaluation method. That uses an average of three scores, this is just using the bcubed metric for now (nothing special about bcubed, just picked one). The scoring implementation comes from the coval project. It relies on scipy, which is one issue, and is rather involved, which is another. Besides being comparable with traditional evaluations, this scoring is relatively fast.	2021-05-21 15:56:40 +09:00
Paul O'Leary McCann	e1b4a85bb9	Fix loss The loss was being returned as a single element array, which caused training to die when it attempted to turn it into JSON.	2021-05-21 15:46:50 +09:00
Adriane Boyd	cd6bd91c3a	Switch default train corpus max_length to 0 in quickstart (#8142) The behavior of `spacy.Corpus.v1` is unexpected enough for `max_length != 0` that `0` is a better default for users creating a new config with the quickstart. If not, documents are skipped, sometimes the entire corpus is skipped, and sometimes documents are (quite unexpectedly for your average user) split into sentences.	2021-05-20 14:48:09 +02:00
Paul O'Leary McCann	ff3fed06cf	Catch a stray reference	2021-05-20 21:30:46 +09:00
Sofie Van Landeghem	202943bc8c	KB & NEL to/from bytes (#8113) * unit test for pickling KB * add pickling test for NEL * KB to_bytes and from_bytes * NEL to_bytes and from_bytes * xfail pickle tests for now * fix docs * cleanup	2021-05-20 18:11:30 +10:00
Paul O'Leary McCann	8c5df622d8	Help out python gc in coref backprop	2021-05-20 16:40:55 +09:00
Paul O'Leary McCann	fa92daf052	Break pairwise operations into pseudolayers This makes their scope tighter and more contained, and has the nice side effect that fewer things need to be passed around for backprop.	2021-05-20 15:59:51 +09:00
Adriane Boyd	4e69fcaa50	Disable GPU CI tests (#8143)	2021-05-19 12:00:31 +02:00
Adriane Boyd	f6128c06b0	Disable GPU CI tests (#8143)	2021-05-19 12:00:07 +02:00
Paul O'Leary McCann	d22acee4f7	Fix backprop Training seems to actually run now!	2021-05-18 20:09:27 +09:00
Paul O'Leary McCann	2486b8ad4d	Fix pipeline initialize	2021-05-18 19:56:27 +09:00
Paul O'Leary McCann	0620820857	Deal with generators in tuplify	2021-05-18 19:55:52 +09:00
Paul O'Leary McCann	a7d9c8156d	Make get_sentence_map work with init When sentences are not available, just treat the whole doc as one sentence. A reasonable general fallback, but important due to the init call, where upstream components aren't run.	2021-05-18 19:54:54 +09:00
Paul O'Leary McCann	883c137b26	Add basic tuplify init	2021-05-18 19:53:59 +09:00
Paul O'Leary McCann	051715506e	Fiddle with get_mentions definition Ended up not making a difference, but oh well.	2021-05-18 19:53:33 +09:00
Adriane Boyd	06324e5a5e	Update pydantic requirements (#8127) Update pydantic requirements following https://github.com/explosion/thinc/pull/499	2021-05-18 11:35:50 +02:00
Paul O'Leary McCann	a33d29441a	Merge remote-tracking branch 'upstream/develop' into feature/coref	2021-05-18 17:00:17 +09:00
Adriane Boyd	6baab565eb	Minor updates to quickstart settings/instructions (#7965) * Minor updates to quickstart settings/instructions * set default value of textcat exclusive to `false` until the default checkbox behavior is updated * add the `morphologizer` to the list of components * add a note that v3.0.6+ is required * Switch to warning above quickstart * Undo changes to textcat default in quickstart Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-05-17 16:55:22 +02:00
Adriane Boyd	2c545c4c5b	Fix offsets in Span.get_lca_matrix (#8116) * Fix range in Span.get_lca_matrix Fix the adjusted token index / lca matrix index ranges for `_get_lca_matrix` for spans. * The range for `k` should correspond to the adjusted indices in `lca_matrix` with the `start` indexed at `0` * Update test for v3.x	2021-05-17 16:54:23 +02:00
Sofie Van Landeghem	0dffc5d9e2	Custom warning if the doc_bin is too large (#8069) * custom warning if the doc_bin is too large * cleanup * Update spacy/errors.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * fix numbering * fixing numbering once more * fixing this seems to be pretty hard Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-05-17 15:48:40 +02:00
Adriane Boyd	b120fb3511	Handle errors while multiprocessing (#8004) * Handle errors while multiprocessing Handle errors while multiprocessing without hanging. * Return the traceback for errors raised while processing a batch, which can be handled by the top-level error handler * Allow for shortened batches due to custom error handlers that ignore errors and skip documents * Define custom components at a higher level * Also move up custom error handler * Use simpler component for test * Switch error type * Adjust test * Only call top-level error handler for exceptions * Register custom test components within tests Use global functions (so they can be pickled) but register the components only within the individual tests.	2021-05-17 13:28:39 +02:00
Adriane Boyd	8a2602051c	Update debug data for textcat (#8066) * Check for unsupported cats values * Only show labels if train/dev mismatched * Don't show label counts (only counting positive labels seems odd) * Use warnings for mismatched train/dev labels	2021-05-17 13:27:04 +02:00
Adriane Boyd	1d59fdbd39	Update Vietnamese tokenizer (#8099) * Adapt tokenization methods from `pyvi` to preserve text encoding and whitespace * Add serialization support similar to Chinese and Japanese Note: as for Chinese and Japanese, some settings are duplicated in `config.cfg` and `tokenizer/cfg`.	2021-05-17 18:16:20 +10:00
Adriane Boyd	fe3a4aa846	Add ENT_ID and NORM to DocBin strings (#8054) Save strings for token attributes `ENT_ID` and `NORM` in `DocBin` strings.	2021-05-17 18:06:11 +10:00
Adriane Boyd	82fa81d095	Make all Span attrs writable (#8062) Also allow `Span` string properties `label_` and `kb_id_` to be writable following #6696.	2021-05-17 18:05:45 +10:00
svlandeg	b403f924ee	Merge remote-tracking branch 'upstream/master' into bugfix/replace-trf	2021-05-17 09:47:47 +02:00
Paul O'Leary McCann	e303628205	Attempt to use registry correctly	2021-05-17 14:52:48 +09:00
Paul O'Leary McCann	91b111467b	Minor fixes	2021-05-17 14:52:30 +09:00
Ines Montani	595ef03e23	Merge pull request #8096 from juliensalinas/master [ci skip]	2021-05-17 13:58:37 +10:00
Paul O'Leary McCann	7c42a8c90a	Migrate coref code This includes the coref code that was being tested separately, modified to work in spaCy. It hasn't been tested yet and presumably still needs fixes. In particular, the evaluation code is currently omitted. It's unclear at the moment whether we want to use a complex scorer similar to the official one, or a simpler scorer using more modern evaluation methods.	2021-05-15 21:36:10 +09:00
Paul O'Leary McCann	3608b7b3f9	Merge branch 'master' into feature/coref	2021-05-15 20:05:17 +09:00
Julien Salinas	c496f78245	Add NLP Cloud to Universe.	2021-05-14 11:13:44 +02:00
Julien Salinas	a176d2209a	Sign contributors agreement.	2021-05-14 11:00:27 +02:00
Paul O'Leary McCann	2dc6db53fd	Merge pull request #8072 from medianeuroscience/master Added eMFDscore to universe.json	2021-05-14 11:58:30 +09:00
Frederic R. Hopp	c5962b9fba	Update universe.json fixed typo	2021-05-13 07:40:05 -07:00
Frederic R. Hopp	a9ca221e03	Update universe.json Added more detailed description to eMFDscore project	2021-05-12 09:20:17 -07:00
svlandeg	235e9f5488	call replace_listener_cfg attr if it's available	2021-05-12 17:19:38 +02:00
svlandeg	44a3a58599	call replace_listener attr if it's available	2021-05-12 16:01:02 +02:00
svlandeg	ece8be4fec	extend test to training with replaced tok2vec layer	2021-05-12 11:32:22 +02:00
Frederic R. Hopp	7bba9cdc14	Update universe.json	2021-05-11 19:18:19 -07:00
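
Commit f6652c9252 above notes that the new coref scoring uses the B-cubed metric, with an implementation taken from the coval project. For readers unfamiliar with the metric, the following is a minimal, simplified sketch of what B-cubed computes. It is not the coval or spaCy code: the function names `mention_to_cluster` and `b_cubed` are hypothetical, and the sketch assumes that only mentions present in both the gold and predicted clusterings are scored.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

Clusters = Iterable[Iterable[Hashable]]


def mention_to_cluster(clusters: Clusters) -> Dict[Hashable, FrozenSet[Hashable]]:
    """Map every mention to the (frozen) cluster that contains it."""
    mapping: Dict[Hashable, FrozenSet[Hashable]] = {}
    for cluster in clusters:
        members = frozenset(cluster)
        for mention in members:
            mapping[mention] = members
    return mapping


def b_cubed(gold: Clusters, pred: Clusters) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under a simplified B-cubed definition.

    Assumption (not from the source): mentions missing from either
    clustering are ignored rather than penalised.
    """
    gold_map = mention_to_cluster(gold)
    pred_map = mention_to_cluster(pred)
    mentions = gold_map.keys() & pred_map.keys()
    if not mentions:
        return 0.0, 0.0, 0.0

    precision = 0.0
    recall = 0.0
    for mention in mentions:
        # Overlap between the gold and predicted clusters of this mention.
        overlap = len(gold_map[mention] & pred_map[mention])
        precision += overlap / len(pred_map[mention])
        recall += overlap / len(gold_map[mention])
    precision /= len(mentions)
    recall /= len(mentions)

    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold_clusters = [["a", "b", "c"], ["d"]]
    pred_clusters = [["a", "b"], ["c", "d"]]
    print(b_cubed(gold_clusters, pred_clusters))
```

Mentions can be any hashable value, for example `(start, end)` token offsets. The traditional evaluation mentioned in the commit message averages three scores; this sketch covers only the B-cubed part.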


14747 Commits