spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-20 19:12:36 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	4a4ef72191	Clean up unused functions `make_clean_doc` is not needed and was removed. `logsumexp` may be needed if I misunderstood the loss calculation, so I left it in for now with a note.	2021-06-02 21:42:23 +09:00
Jean-Hugues Roy	ff5cf3606c	Improvements to French stopwords list (#7941) * "y" etc. Many changes described in pull request * Update spacy/lang/fr/stop_words.py * Update spacy/lang/fr/stop_words.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-06-02 11:50:49 +02:00
Vito De Tullio	3672464e25	applying suggestion to avoid mypy errors (#8265) * applying suggestion to avoid mypy errors * sign contributor agreement	2021-06-02 19:25:30 +10:00
Adriane Boyd	4aa1a7d5a3	Remove unsupported attrs from attrs.IDS (#8132) The attributes `PROB`, `CLUSTER` and `SENT_END` are not supported by `Lexeme.get_struct_attr` so should not be included through `attrs.IDS` as supported attributes in `Doc.to_array` and other methods.	2021-06-02 19:16:57 +10:00
Paul O'Leary McCann	d54631f68b	Fix other open calls without context managers (#8245)	2021-05-31 19:04:29 +10:00
Paul O'Leary McCann	5aba213349	Fix skweak Github URL Github entry should not contain url, just user/repo	2021-05-31 18:00:43 +09:00
Kristian Boda	0035db4103	Add hmrb to spaCy Universe (#8129) * docs: add hmrb to spacy universe * docs: add sentence on spacy versions * docs: update description and images * misc: add spaCy Contributor Agreement	2021-05-31 10:41:34 +02:00
Kristian Boda	dc8d8d15d2	Add hmrb to spaCy Universe (#8129) * docs: add hmrb to spacy universe * docs: add sentence on spacy versions * docs: update description and images * misc: add spaCy Contributor Agreement	2021-05-31 18:40:48 +10:00
Dhruv Naik	283f64a98d	Fix bug from Entityruler: ent_ids returns None for phrases (#8169) * bugfix for explosion/spaCy#8168 * add test for explosion/spaCy#8168	2021-05-31 18:38:53 +10:00
Michael K	b0467d2972	Add project urls to package metadata (#7728) This adds the links to PyPI. To see that in action check out https://pypi.org/project/Django/ (source code: `b8c9e9fae1/setup.cfg (L27-L32)`)	2021-05-31 18:38:29 +10:00
Narayan Acharya	6b79714080	Address missing config overrides post load of models (#8208)	2021-05-31 18:36:52 +10:00
Sofie Van Landeghem	fff662e41f	Ensemble textcat with listener (#8012) * add unit test for two listeners, with a textcat ensemble in the middle * return zero gradients instead of None in accumulate_gradient	2021-05-31 18:21:06 +10:00
Sofie Van Landeghem	ff91e6dac7	Show warning if entity_ruler runs without patterns (#7807) * Show warning if entity_ruler runs without patterns * Show warning if matcher runs without patterns * fix wording * unit test for warning once (WIP) * warn W036 only once * cleanup * create filter_warning helper	2021-05-31 18:20:27 +10:00
Paul O'Leary McCann	d1a221a374	Add all symbols in Unicode Currency Symbols block (#8212) * Add all symbols in Unicode Currency Symbols block In #8102 it came up that the rupee symbol was treated different from dollar / euro / yen symbols. This adds many symbols not already included. * Fix test * Fix training test	2021-05-31 18:03:40 +10:00
Paul O'Leary McCann	04239e94c7	Use a context manager when reading model (fix #7036) (#8244)	2021-05-31 17:36:17 +10:00
Sofie Van Landeghem	fc37715cfb	ensure 'spacy ray' works (#7799) * ensure 'spacy ray' works * better fix by changing entry point	2021-05-28 18:15:31 +02:00
svlandeg	0aa1083ce8	avoid repetitive entities in the output	2021-05-28 16:52:51 +02:00
svlandeg	0d81bce9cc	add failing test for too short a sentence	2021-05-28 15:10:35 +02:00
svlandeg	0f5c586e2f	add basic tests for debugging	2021-05-28 14:19:55 +02:00
Ines Montani	5957ab74f7	Merge pull request #8112 from svlandeg/bugfix/replace-trf	2021-05-28 11:35:17 +10:00
svlandeg	391b512afd	fix types of fwd functions	2021-05-27 16:36:46 +02:00
svlandeg	04b55bf054	removing unused imports	2021-05-27 16:31:38 +02:00
svlandeg	910026582d	set versions to v1 instead of v0	2021-05-27 16:17:20 +02:00
svlandeg	2e3c0e2256	delete outdated tests	2021-05-27 13:54:31 +02:00
svlandeg	ba2e491cc4	Merge remote-tracking branch 'upstream/master' into feature/coref	2021-05-27 13:50:32 +02:00
Sofie Van Landeghem	4b81f58eda	fix docs (#8200)	2021-05-27 10:50:46 +02:00
Sofie Van Landeghem	3c58c0323f	fix docs (#8200)	2021-05-27 10:48:59 +02:00
Sofie Van Landeghem	290bd6ed39	ensure tolerance is properly passed on (#8158)	2021-05-27 18:10:28 +10:00
Paul O'Leary McCann	ee62344970	Fix skweak Github URL Github entry should not contain url, just user/repo	2021-05-24 20:31:43 +09:00
Paul O'Leary McCann	68ccfc4c39	Fix docs (fix #8189)	2021-05-24 19:49:21 +09:00
Paul O'Leary McCann	0c553ecd4e	Fix docs (fix #8189)	2021-05-24 19:47:30 +09:00
Paul O'Leary McCann	a484245f35	Remove references to coref_er	2021-05-24 19:08:45 +09:00
Paul O'Leary McCann	d6389b133d	Don't use a generator for no reason	2021-05-24 19:06:15 +09:00
Paul O'Leary McCann	d6fd5fe1c0	Minor cleanup	2021-05-24 14:56:43 +09:00
Paul O'Leary McCann	0942a0b51b	Remove coref_er.py The intent of this was that it would be a component pipeline that used entities as input, but that's now covered by the get_mentions function as a pipeline arg.	2021-05-21 18:20:25 +09:00
Paul O'Leary McCann	f6652c9252	Add new coref scoring This is closer to the traditional evaluation method. That uses an average of three scores, this is just using the bcubed metric for now (nothing special about bcubed, just picked one). The scoring implementation comes from the coval project. It relies on scipy, which is one issue, and is rather involved, which is another. Besides being comparable with traditional evaluations, this scoring is relatively fast.	2021-05-21 15:56:40 +09:00
Paul O'Leary McCann	e1b4a85bb9	Fix loss The loss was being returned as a single element array, which caused training to die when it attempted to turn it into JSON.	2021-05-21 15:46:50 +09:00
Adriane Boyd	cd6bd91c3a	Switch default train corpus max_length to 0 in quickstart (#8142) The behavior of `spacy.Corpus.v1` is unexpected enough for `max_length != 0` that `0` is a better default for users creating a new config with the quickstart. If not, documents are skipped, sometimes the entire corpus is skipped, and sometimes documents are (quite unexpectedly for your average user) split into sentences.	2021-05-20 14:48:09 +02:00
Paul O'Leary McCann	ff3fed06cf	Catch a stray reference	2021-05-20 21:30:46 +09:00
Sofie Van Landeghem	202943bc8c	KB & NEL to/from bytes (#8113) * unit test for pickling KB * add pickling test for NEL * KB to_bytes and from_bytes * NEL to_bytes and from_bytes * xfail pickle tests for now * fix docs * cleanup	2021-05-20 18:11:30 +10:00
Paul O'Leary McCann	8c5df622d8	Help out python gc in coref backprop	2021-05-20 16:40:55 +09:00
Paul O'Leary McCann	fa92daf052	Break pairwise operations into pseudolayers This makes their scope tighter and more contained, and has the nice side effect that fewer things need to be passed around for backprop.	2021-05-20 15:59:51 +09:00
Adriane Boyd	4e69fcaa50	Disable GPU CI tests (#8143)	2021-05-19 12:00:31 +02:00
Adriane Boyd	f6128c06b0	Disable GPU CI tests (#8143)	2021-05-19 12:00:07 +02:00
Paul O'Leary McCann	d22acee4f7	Fix backprop Training seems to actually run now!	2021-05-18 20:09:27 +09:00
Paul O'Leary McCann	2486b8ad4d	Fix pipeline initialize	2021-05-18 19:56:27 +09:00
Paul O'Leary McCann	0620820857	Deal with generators in tuplify	2021-05-18 19:55:52 +09:00
Paul O'Leary McCann	a7d9c8156d	Make get_sentence_map work with init When sentences are not available, just treat the whole doc as one sentence. A reasonable general fallback, but important due to the init call, where upstream components aren't run.	2021-05-18 19:54:54 +09:00
Paul O'Leary McCann	883c137b26	Add basic tuplify init	2021-05-18 19:53:59 +09:00
Paul O'Leary McCann	051715506e	Fiddle with get_mentions definition Ended up not making a difference, but oh well.	2021-05-18 19:53:33 +09:00

15678 Commits