spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-28 00:50:07 +03:00

Author	SHA1	Message	Date
svlandeg	fba219f737	remove unnecessary itertools call	2020-03-16 08:31:36 +01:00
Alan Chan	1ae01684cf	Fill in contributor agreement	2020-03-15 03:45:20 +08:00
Alan Chan	2124be100d	Tweak run-on sentence	2020-03-15 03:45:20 +08:00
Alan Chan	7c3a4ce933	Missing word in api/cli doc	2020-03-15 03:45:20 +08:00
Alan Chan	36e3532475	Remove unfinished sentence	2020-03-15 03:45:17 +08:00
nihil	9cde7eb08c	add spacy_syllables to universe + sign contributor agreement	2020-03-13 18:09:42 +01:00
svlandeg	59000ee21d	fix serialization of empty doc + unit test	2020-03-13 16:07:56 +01:00
Mark Abraham	a0ffa346c0	Fix broken link in docs	2020-03-13 14:07:26 +01:00
Adriane Boyd	423849f94a	Fix sents comparison in test util. Due to changes to `Span` (#5005), spans from different documents are now never equal. Check `Token.is_sent_start` values instead.	2020-03-13 09:25:23 +01:00
Ines Montani	353f8486f5	Merge branch 'master' into spacy.io	2020-03-12 14:45:33 +01:00
Matthew Honnibal	26a90f011b	Set version to v2.2.4	2020-03-12 11:30:41 +01:00
Ines Montani	c669435c62	Merge pull request #5125 from renaud/patch-1: small typo in code sample	2020-03-12 11:19:12 +01:00
Ines Montani	4130fef4ec	Merge pull request #5127 from svlandeg/docs/empty-doc: is_XXX is True if doc is empty	2020-03-12 11:18:10 +01:00
Ines Montani	3497b2973d	Merge pull request #5130 from merrcury/patch-1: DOC : Update LICENSE Year	2020-03-12 11:17:38 +01:00
Himanshu Garg	27d1300bdb	Create merrcury.md	2020-03-10 15:11:07 +05:30
Himanshu Garg	ba47d5a5cb	Update LICENSE Year	2020-03-10 15:03:29 +05:30
svlandeg	c4d030dbf6	remove accidental commit	2020-03-09 18:10:54 +01:00
svlandeg	1724a4f75b	additional information if doc is empty	2020-03-09 18:08:18 +01:00
Renaud Richardet	eccf6b1686	small typo in code sample	2020-03-09 14:49:11 +01:00
Adriane Boyd	0c31f03ec5	Update docs [ci skip]	2020-03-09 13:41:17 +01:00
Adriane Boyd	1139247532	Revert changes to token_match priority from #4374 * Revert changes to priority of `token_match` so that it has priority over all other tokenizer patterns * Add lookahead and potentially slow lookbehind back to the default URL pattern * Expand character classes in URL pattern to improve matching around lookaheads and lookbehinds related to #4882 * Revert changes to Hungarian tokenizer * Revert (xfail) several URL tests to their status before #4374 * Update `tokenizer.explain()` and docs accordingly	2020-03-09 12:09:41 +01:00
Ines Montani	1d6aec805d	Fix formatting and update docs for v2.2.4	2020-03-09 11:17:20 +01:00
Ines Montani	5f68004264	Port over gitignore changes from develop. Prevents stale files when switching branches	2020-03-09 11:05:00 +01:00
Mark Abraham	0345135167	Tokenizer to_disk and from_disk now ensure paths (#5116) * Tokenizer to_disk and from_disk now ensure strings are converted to paths. Fixes #5115 * Sign contributor agreement	2020-03-08 13:25:56 +01:00
Yohei Tamura	31755630a7	fix typ (#5106)	2020-03-08 13:24:38 +01:00
adrianeboyd	9dd98a4b27	Improve Makefile (#5105) * Explicitly upgrade pip * Include spacy-lookups-data in pex	2020-03-08 13:24:19 +01:00
Sofie Van Landeghem	5847be6022	Tok2Vec: extract-embed-encode (#5102) * avoid changing original config * fix elif structure, batch with just int crashes otherwise * tok2vec example with doc2feats, encode and embed architectures * further clean up MultiHashEmbed * further generalize Tok2Vec to work with extract-embed-encode parts * avoid initializing the charembed layer with Docs (for now ?) * small fixes for bilstm config (still does not run) * rename to core layer * move new configs * walk model to set nI instead of using core ref * fix senter overfitting test to be more similar to the training data (avoid flakey behaviour)	2020-03-08 13:23:18 +01:00
adrianeboyd	993758c58f	Remove unnecessary iterator in Language.pipe (#5101). Remove iterator over `raw_texts` with `iterator.tee()` in `Language.pipe` that is never consumed and consumes memory unnecessarily.	2020-03-08 13:22:25 +01:00
Ines Montani	cd79c7bd26	Merge pull request #5110 from dhpollack/dhp/fix-minor-svg-error: fix typo in svg file - caused documentation build error	2020-03-06 15:32:43 +01:00
Sofie Van Landeghem	1a2b8fc264	set vector of merged entity (#5085) * merge_entities sets the vector in the vocab for the merged token * add unit test * import unicode_literals * move code to _merge function * only set vector if vocab has non-zero vectors	2020-03-06 14:45:28 +01:00
adrianeboyd	c95ce96c44	Update sentence recognizer (#5109) * Update sentence recognizer * rename `sentrec` to `senter` * use `spacy.HashEmbedCNN.v1` by default * update to follow `Tagger` modifications * remove component methods that can be inherited from `Tagger` * add simple initialization and overfitting pipeline tests * Update serialization test for senter	2020-03-06 14:45:02 +01:00
Sofie Van Landeghem	6ac9fc0619	Unit test for NEL functionality (#5114) * empty begin_training for sentencizer * overfitting unit test for entity linker * fixed NEL IO by storing the entity_vector_length in the cfg	2020-03-06 14:42:23 +01:00
David Pollack	80004930ed	fix typo in svg file	2020-03-05 17:04:33 +01:00
Matthew Honnibal	3440a72ecb	Update Makefile (#5099)	2020-03-04 19:28:16 +01:00
Ines Montani	31faab3647	Merge pull request #5097 from mirfan899/master: Basque language support added.	2020-03-04 17:20:23 +01:00
Ines Montani	3adc511cb0	Merge pull request #5070 from explosion/refactor/simplify-warnings: Simplify warnings	2020-03-04 17:11:18 +01:00
Ines Montani	b0cfab317f	Merge branch 'develop' into refactor/simplify-warnings	2020-03-04 16:38:55 +01:00
Ines Montani	99d8ee506f	Merge pull request #5100 from adrianeboyd/feature/bump-srsly-1.0.2: Require srsly >=1.0.2	2020-03-04 16:32:52 +01:00
Adriane Boyd	4d655b1d45	Require srsly >=1.0.2	2020-03-04 13:50:37 +01:00
Muhammad Irfan	224a7f8e94	examples	2020-03-04 15:49:06 +05:00
Muhammad Irfan	03376c9d9b	Basque language added and tested.	2020-03-04 11:58:56 +05:00
adrianeboyd	9be90dbca3	Improve token head verification (#5079) * Improve token head verification. Improve the verification for valid token heads when heads are set: * in `Token.head`: heads come from the same document * in `Doc.from_array()`: head indices are within the bounds of the document * Improve error message	2020-03-03 21:44:51 +01:00
adrianeboyd	8c20dae6f7	Fix model-final/model-best meta from train CLI (#5093) * Fix model-final/model-best meta * include speed and accuracy from final iteration * combine with speeds from base model if necessary * Include token_acc metric for all components	2020-03-03 21:43:25 +01:00
Sofie Van Landeghem	a0998868ff	prevent updating cfg if the Model was already defined (#5078)	2020-03-03 13:58:56 +01:00
Sofie Van Landeghem	d307e9ca58	take care of global vectors in multiprocessing (#5081) * restore load_nlp.VECTORS in the child process * add unit test * fix test * remove unnecessary import * add utf8 encoding * import unicode_literals	2020-03-03 13:58:22 +01:00
adrianeboyd	d078b47c81	Break out of infinite loop as intended (#5077)	2020-03-03 12:29:05 +01:00
adrianeboyd	697bec764d	Normalize IS_SENT_START to SENT_START for Matcher (#5080)	2020-03-03 12:22:39 +01:00
adrianeboyd	2281c4708c	Restore empty tokenizer properties (#5026) * Restore empty tokenizer properties * Check for types in tokenizer.from_bytes() * Add test for setting empty tokenizer rules	2020-03-02 11:55:02 +01:00
Sofie Van Landeghem	c6b12ab02a	Bugfix/get doc (#5049) * new (broken) unit test * fixing get_doc method	2020-03-02 11:49:28 +01:00
Ines Montani	648f61d077	Tidy up compiler flags and imports (#5071)	2020-03-02 11:48:10 +01:00

12055 Commits