spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-12 01:02:23 +03:00

Author	SHA1	Message	Date
Paolo Arduin	1ca32d8f9c	Matcher support for Span as well as Doc (#5113 ) * Matcher support for Span, as well as Doc #5056 * Removes an import unused * Signed contributors agreement * Code optimization and better test * Add error message for bad Matcher call argument * Fix merging	2020-04-15 13:51:33 +02:00
adrianeboyd	98c59027ed	Use max(uint64) for OOV lexeme rank (#5303 ) * Use max(uint64) for OOV lexeme rank * Add test for default OOV rank * Revert back to thinc==7.4.0 Requiring the updated version of thinc was unnecessary. * Define OOV_RANK in one place Define OOV_RANK in one place in `util`. * Fix formatting [ci skip] * Switch to external definitions of max(uint64) Switch to external definitions of max(uint64) and confirm that they are equal.	2020-04-15 13:49:47 +02:00
adrianeboyd	3d2c308906	Add Doc init from list of words and text (#5251 ) * Add Doc init from list of words and text Add an option to initialize a `Doc` from a text and list of words where the words may or may not include all whitespace tokens. If the text and words are mismatched, raise an error. * Fix error code * Remove all whitespace before aligning words/text * Move words/text init to util function * Update error message * Rename to get_words_and_spaces * Fix formatting	2020-04-14 19:15:52 +02:00
Paolo Arduin	8ce408d2e1	Comparison predicate handling for `!=` (#5282 ) * Fix #5281 * Optim test	2020-04-14 19:14:15 +02:00
Leander Fiedler	d60e2d3ebf	issue5230 added unit test for dumping and loading knowledgebase	2020-04-12 09:08:41 +02:00
Leander Fiedler	d2bb649227	issue5230 filter warnings in addition to filterwarnings to prevent deprecation warnings in python35(win) setup to pop up	2020-04-10 23:21:13 +02:00
Leander Fiedler	ca2a7a44db	issue5230 store string values of warnings to remotely debug failing python35(win) setup	2020-04-10 22:26:55 +02:00
Leander Fiedler	88ca40a15d	issue5230 raise warnings as errors to remotely debug failing python35(win) setup	2020-04-10 21:45:53 +02:00
Leander Fiedler	a7bdfe42e1	issue5230 added print statement to warnings filter to remotely debug failing python35(win) setup	2020-04-10 21:14:33 +02:00
Leander Fiedler	8c1d0d628f	issue5230 writer now checks instance of loc parameter before trying to operate on it	2020-04-10 20:35:52 +02:00
adrianeboyd	cf579a398d	Add __init__.py to eu and hy tests (#5278 )	2020-04-08 20:03:06 +02:00
lfiedler	e1e25c7e30	issue5230: added unittest test case for completion	2020-04-06 21:36:02 +02:00
Leander Fiedler	cde96f6c64	issue5230: optimized unit test a bit	2020-04-06 20:51:12 +02:00
Leander Fiedler	71cc903d65	issue5230: replaced open statements on path objects so that serialization still works and files are closed	2020-04-06 20:30:41 +02:00
Leander Fiedler	273ed452bb	issue5230: added unicode declaration at top of the file	2020-04-06 19:22:32 +02:00
Leander Fiedler	1cd975d4a5	issue5230: fixed resource warnings in language	2020-04-06 18:54:32 +02:00
Leander Fiedler	493c77462a	issue5230: test cases covering known sources of resource warnings	2020-04-06 18:46:51 +02:00
YohannesDatasci	beef184e53	Armenian language support (#5246 ) * add Armenian language and test cases * agreement submission	2020-04-03 13:02:18 +02:00
adrianeboyd	963bd890c1	Modify Vectors.resize to work with cupy and improve resizing (#5216 ) * Modify Vectors.resize to work with cupy Modify `Vectors.resize` to work with cupy. Modify behavior when resizing to a different vector dimension so that individual vectors are truncated or extended with zeros instead of having the original values filled into the new shape without regard for the original axes. * Update spacy/tests/vocab_vectors/test_vectors.py Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com> Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-03-29 13:51:20 +02:00
Ines Montani	828acffc12	Tidy up and auto-format	2020-03-25 12:28:12 +01:00
adrianeboyd	86c43e55fa	Improve Lithuanian tokenization (#5205 ) * Improve Lithuanian tokenization Modify Lithuanian tokenization to improve performance for UD_Lithuanian-ALKSNIS. * Update Lithuanian tokenizer tests	2020-03-25 11:28:12 +01:00
Adriane Boyd	09d442f5ad	Merge remote-tracking branch 'upstream/master' into feature/ud-tokenization-da	2020-03-25 09:41:52 +01:00
Adriane Boyd	cba2d1d972	Disable failing abbreviation test UD_Danish-DDT has (as far as I can tell) hallucinated periods after abbreviations, so the changes are an artifact of the corpus and not due to anything meaningful about Danish tokenization.	2020-03-25 09:39:26 +01:00
Adriane Boyd	423849f94a	Fix sents comparison in test util Due to changes to `Span` (#5005), spans from different documents are now never equal. Check `Token.is_sent_start` values instead.	2020-03-13 09:25:23 +01:00
Adriane Boyd	1139247532	Revert changes to token_match priority from #4374 * Revert changes to priority of `token_match` so that it has priority over all other tokenizer patterns * Add lookahead and potentially slow lookbehind back to the default URL pattern * Expand character classes in URL pattern to improve matching around lookaheads and lookbehinds related to #4882 * Revert changes to Hungarian tokenizer * Revert (xfail) several URL tests to their status before #4374 * Update `tokenizer.explain()` and docs accordingly	2020-03-09 12:09:41 +01:00
Sofie Van Landeghem	1a2b8fc264	set vector of merged entity (#5085 ) * merge_entities sets the vector in the vocab for the merged token * add unit test * import unicode_literals * move code to _merge function * only set vector if vocab has non-zero vectors	2020-03-06 14:45:28 +01:00
Muhammad Irfan	03376c9d9b	Basque language added and tested.	2020-03-04 11:58:56 +05:00
adrianeboyd	9be90dbca3	Improve token head verification (#5079 ) * Improve token head verification Improve the verification for valid token heads when heads are set: * in `Token.head`: heads come from the same document * in `Doc.from_array()`: head indices are within the bounds of the document * Improve error message	2020-03-03 21:44:51 +01:00
Sofie Van Landeghem	d307e9ca58	take care of global vectors in multiprocessing (#5081 ) * restore load_nlp.VECTORS in the child process * add unit test * fix test * remove unnecessary import * add utf8 encoding * import unicode_literals	2020-03-03 13:58:22 +01:00
adrianeboyd	697bec764d	Normalize IS_SENT_START to SENT_START for Matcher (#5080 )	2020-03-03 12:22:39 +01:00
adrianeboyd	2281c4708c	Restore empty tokenizer properties (#5026 ) * Restore empty tokenizer properties * Check for types in tokenizer.from_bytes() * Add test for setting empty tokenizer rules	2020-03-02 11:55:02 +01:00
Sofie Van Landeghem	c6b12ab02a	Bugfix/get doc (#5049 ) * new (broken) unit test * fixing get_doc method	2020-03-02 11:49:28 +01:00
Ines Montani	4440a072d2	Merge pull request #5006 from svlandeg/bugfix/multiproc-underscore load Underscore state when multiprocessing	2020-02-25 14:46:02 +01:00
svlandeg	b49a3afd0c	use clean_underscore fixture	2020-02-23 15:49:20 +01:00
Tom Keefe	ddf63b97a8	make idx available via to_array (#5030 )	2020-02-22 14:13:06 +01:00
Sofie Van Landeghem	44f4142ce4	add two abbreviations and some additional unit tests (#5040 )	2020-02-22 14:12:32 +01:00
Sofie Van Landeghem	479bd8d09f	add lemma option to displacy 'dep' visualiser (#5041 ) * add lemma option to displacy 'dep' visualiser * more compact list comprehension * add option to doc * fix test and add lemmas to util.get_doc * fix capital * remove lemma from get_doc * cleanup	2020-02-22 14:11:51 +01:00
adrianeboyd	3b22eb651b	Sync Span __eq__ and __hash__ (#5005 ) * Sync Span __eq__ and __hash__ Use the same tuple for `__eq__` and `__hash__`, including all attributes except `vector` and `vector_norm`. * Update entity comparison in tests Update `assert_docs_equal()` test util to compare `Span` properties for ents rather than `Span` objects.	2020-02-16 17:20:36 +01:00
adrianeboyd	5b102963bf	Require HEAD for is_parsed in Doc.from_array() (#5011 ) Modify flag settings so that `DEP` is not sufficient to set `is_parsed` and only run `set_children_from_heads()` if `HEAD` is provided. Then the combination `[SENT_START, DEP]` will set deps and not clobber sent starts with a lot of one-word sentences.	2020-02-16 17:17:09 +01:00
svlandeg	6e717c62ed	avoid the tests interacting with each other through the global Underscore variable	2020-02-12 13:21:31 +01:00
svlandeg	7939c63886	use English instead of model	2020-02-12 12:26:27 +01:00
svlandeg	46628d8890	add some asserts	2020-02-12 12:12:52 +01:00
svlandeg	51d37033c8	remove old comment	2020-02-12 12:10:05 +01:00
svlandeg	05dedaa2cf	add unit test	2020-02-12 12:00:13 +01:00
Antti Ajanki	e1f777b151	Improvements for Finnish tokenizer (#4985 ) * don't split on a colon. Colon is used to attach suffixes for abbreviations * tokenize on any of LIST_HYPHENS (except a single hyphen), not just on -- * simplify infix rules by merging similar rules	2020-02-10 20:32:43 -05:00
Tyler Couto	9fa9d7f2cb	Fix for Issue 4665 - conllu2json (#4953 ) * Fix for Issue 4665 - conllu2json - Allowing HEAD to be an underscore * Added contributor agreement	2020-02-03 13:01:48 +01:00
adrianeboyd	a938566b62	Fix Sentencizer.pipe() for empty doc (#4940 )	2020-01-28 11:36:49 +01:00
Yohei Tamura	708a4d27eb	fix nlp.evaluate (#4924 ) (#4925 ) * new file: test_issue4924.py * modified: spacy/gold.pyx * modified: test_issue4924.py for python2	2020-01-20 12:17:46 +01:00
Kabir Khan	b9afcd56e3	Fix ent_ids and labels properties when id attribute used in patterns (#4900 ) * Fix ent_ids and labels properties when id attribute used in patterns * use set for labels * sort ent_ids for comparison in entity_ruler tests * fixing entity_ruler ent_ids test * add to set	2020-01-16 02:01:31 +01:00
adrianeboyd	d24bca62f6	Add CJK to character classes (#4884 ) * Add CJK character class as uncased * Incorporate Chinese URL test case Un-xfail Chinese URL test instance	2020-01-08 16:50:19 +01:00
adrianeboyd	aef83e8070	Mark most Hungarian tokenizer test cases as slow (#4883 ) * Mark most Hungarian tokenizer test cases as slow Mark most Hungarian tokenizer test cases as slow to reduce the runtime of the test suite in ordinary usage: * for normal tests: run default tests plus 10% of the detailed tests * for slow tests: run all tests * Rework to mark individual tests as slow	2020-01-08 12:34:06 +01:00
adrianeboyd	d652ff215d	Add trailing whitespace to multiline test text (#4877 )	2020-01-06 14:58:59 +01:00
adrianeboyd	de69bc6509	Fix and improve URL pattern (#4882 ) * match domains longer than `hostname.domain.tld` like `www.foo.co.uk` * expand allowed characters in domain names while only matching lowercase TLDs so that "this.That" isn't matched as a URL and can be split on the period as an infix (relevant for at least English, German, and Tatar)	2020-01-06 14:58:30 +01:00
Sofie Van Landeghem	a1b22e90cd	serialize ENT_ID (#4852 ) * expand serialization test for custom token attribute * add failing test for issue 4849 * define ENT_ID as attr and use in doc serialization * fix few typos	2020-01-06 14:57:34 +01:00
Ines Montani	3431ac42de	Fix typo	2019-12-21 21:17:45 +01:00
Ines Montani	7c69d30de5	Tidy up and expect warning	2019-12-21 21:14:52 +01:00
Ines Montani	cb4145adc7	Tidy up and auto-format	2019-12-21 19:04:17 +01:00
Olamilekan Wahab	a741de7cf6	Adding support for Yoruba Language (#4614 ) * Adding Support for Yoruba * test text * Updated test string. * Fixing encoding declaration. * Adding encoding to stop_words.py * Added contributor agreement and removed iranlowo. * Added removed test files and removed iranlowo to keep project bare. * Returned CONTRIBUTING.md to default state. * Added deleted conftest entries * Tidy up and auto-format * Revert CONTRIBUTING.md Co-authored-by: Ines Montani <ines@ines.io>	2019-12-21 14:11:50 +01:00
tamuhey	1707e77c5e	add char_span to Span (#4793 )	2019-12-13 15:54:58 +01:00
Sofie Van Landeghem	f9b541f9ef	More robust set entities method in KB (#4794 ) * add unit test for setting entities with duplicate identifiers * count the number of actual unique identifiers and throw duplicate warning	2019-12-13 10:45:29 +01:00
adrianeboyd	676e75838f	Include Doc.cats in serialization of Doc and DocBin (#4774 ) * Include Doc.cats in to_bytes() * Include Doc.cats in DocBin serialization * Add tests for serialization of cats Test serialization of cats for Doc and DocBin.	2019-12-06 14:07:39 +01:00
Antti Ajanki	e626a011cc	Improvements to the Finnish language data (#4738 ) * Enable lex_attrs on Finnish * Copy the Danish tokenizer rules to Finnish Specifically, don't break hyphenated compound words * Contributor agreement * A new file for Finnish tokenizer rules instead of including the Danish ones	2019-12-03 12:55:28 +01:00
Christoph Purschke	a7ee4b6f17	new tests & tokenization fixes (#4734 ) - added some tests for tokenization issues - fixed some issues with tokenization of words with hyphen infix - rewrote the "tokenizer_exceptions.py" file (stemming from the German version)	2019-12-01 23:08:21 +01:00
adrianeboyd	48ea2e8d0f	Restructure Sentencizer to follow Pipe API (#4721 ) * Restructure Sentencizer to follow Pipe API Restructure Sentencizer to follow Pipe API so that it can be scored with `nlp.evaluate()`. * Add Sentencizer pipe() test	2019-11-27 16:33:34 +01:00
Ines Montani	5b36dec7eb	Auto-exclude disabled when calling from_disk during load (#4708 )	2019-11-25 16:01:22 +01:00
adrianeboyd	2d8c6e1124	Iterate over lr_edges until sents are correct (#4702 ) Iterate over lr_edges until all heads are within the current sentence. Instead of iterating over them for a fixed number of iterations, check whether the sentence boundaries are correct for the heads and stop when all are correct. Stop after a maximum of 10 iterations, providing a warning in this case since the sentence boundaries may not be correct.	2019-11-25 13:06:36 +01:00
Paul O'Leary McCann	f0e3e606a6	Replace mecab-python3 with fugashi for Japanese (#4621 ) * Switch from mecab-python3 to fugashi mecab-python3 has been the best MeCab binding for a long time but it's not very actively maintained, and since it's based on old SWIG code distributed with MeCab there's a limit to how effectively it can be maintained. Fugashi is a new Cython-based MeCab wrapper I wrote. Since it's not based on the old SWIG code it's easier to keep it current and make small deviations from the MeCab C/C++ API where that makes sense. * Change mecab-python3 to fugashi in setup.cfg * Change "mecab tags" to "unidic tags" The tags come from MeCab, but the tag schema is specified by Unidic, so it's more proper to refer to it that way. * Update conftest * Add fugashi link to external deps list for Japanese	2019-11-23 14:31:04 +01:00
Ines Montani	5d4eede1e4	Fix test util imports	2019-11-21 16:28:29 +01:00
GuiGel	8f7ab70870	Bugfix/fix entity ruler from disk (#4670 ) * fix EntityRuler from_disk bug * add contributor file * Test EntityRuler PhraseMatcher deserialization (#4651) * newline at end of file * fix copy paste error * serializing the EntityRuler by itself * Add unicode declarations for Python 2 and auto-format	2019-11-21 16:26:37 +01:00
adrianeboyd	054df5d90a	Add error for non-string labels (#4690 ) Add error when attempting to add non-string labels to `Tagger` or `TextCategorizer`.	2019-11-21 16:24:10 +01:00
adrianeboyd	d7f32b285c	Detect more empty matches in tokenizer.explain() (#4675 ) * Detect more empty matches in tokenizer.explain() * Include a few languages in explain non-slow tests Mark a few languages in tokenizer.explain() tests as not slow so they're run by default.	2019-11-20 16:31:29 +01:00
Ines Montani	5bf9ab5b03	Tidy up and auto-format	2019-11-20 13:16:33 +01:00
Ines Montani	7f3b00164a	Re-add slow marker	2019-11-20 13:15:59 +01:00
Ines Montani	6e303de717	Auto-format	2019-11-20 13:15:24 +01:00
Ines Montani	2e7c896fe5	Update Tokenizer.explain tests	2019-11-20 13:14:11 +01:00
adrianeboyd	2c876eb672	Add tokenizer explain() debugging method (#4596 ) * Expose tokenizer rules as a property Expose the tokenizer rules property in the same way as the other core properties. (The cache resetting is overkill, but consistent with `from_bytes` for now.) Add tests and update Tokenizer API docs. * Update Hungarian punctuation to remove empty string Update Hungarian punctuation definitions so that `_units` does not match an empty string. * Use _load_special_tokenization consistently Use `_load_special_tokenization()` and have it to handle `None` checks. * Fix precedence of `token_match` vs. special cases Remove `token_match` check from `_split_affixes()` so that special cases have precedence over `token_match`. `token_match` is checked only before infixes are split. * Add `make_debug_doc()` to the Tokenizer Add `make_debug_doc()` to the Tokenizer as a working implementation of the pseudo-code in the docs. Add a test (marked as slow) that checks that `nlp.tokenizer()` and `nlp.tokenizer.make_debug_doc()` return the same non-whitespace tokens for all languages that have `examples.sentences` that can be imported. * Update tokenization usage docs Update pseudo-code and algorithm description to correspond to `nlp.tokenizer.make_debug_doc()` with example debugging usage. Add more examples for customizing tokenizers while preserving the existing defaults. Minor edits / clarifications. * Revert "Update Hungarian punctuation to remove empty string" This reverts commit `f0a577f7a5`. * Rework `make_debug_doc()` as `explain()` Rework `make_debug_doc()` as `explain()`, which returns a list of `(pattern_string, token_string)` tuples rather than a non-standard `Doc`. Update docs and tests accordingly, leaving the visualization for future work. * Handle cases with bad tokenizer patterns Detect when tokenizer patterns match empty prefixes and suffixes so that `explain()` does not hang on bad patterns. * Remove unused displacy image * Add tokenizer.explain() to usage docs	2019-11-20 13:07:25 +01:00
Matthew Honnibal	4b123952aa	Add option for improved NER feature extraction (#4671 ) * Support option of three NER features * Expose nr_feature parser model setting * Give feature tokens better name * Test nr_feature=3 for NER * Format	2019-11-19 15:03:14 +01:00
Ines Montani	74b951fe61	Fix xpassing tests (#4657 ) * Ignore internal warnings * Un-xfail passing tests * Skip instead of xfail	2019-11-16 20:20:53 +01:00
Ines Montani	3bd15055ce	Fix bug in Language.evaluate for components without .pipe (#4662 )	2019-11-16 20:20:37 +01:00
Christoph Purschke	433748e867	Fix basic language support for Luxembourgish (by adding punctuation.py) (#4648 ) * Update __init__.py * Create punctuation.py * Update tokenizer_exceptions.py * Create questoph.md * Update questoph.md * Update test_text.py * Update test_text.py * Update test_text.py * Update test_text.py	2019-11-15 16:16:47 +01:00
adrianeboyd	91f89f9693	Fix realloc in retokenizer.split() (#4606 ) Always realloc to a size larger than `doc.max_length` in `retokenizer.split()` (or cymem will throw errors).	2019-11-11 16:26:46 +01:00
adrianeboyd	0b9a5f4074	Rework Chinese language initialization and tokenization (#4619 ) * Rework Chinese language initialization * Create a `ChineseTokenizer` class * Modify jieba post-processing to handle whitespace correctly * Modify non-jieba character tokenization to handle whitespace correctly * Add a `create_tokenizer()` method to `ChineseDefaults` * Load lexical attributes * Update Chinese tag_map for UD v2 * Add very basic Chinese tests * Test tokenization with and without jieba * Test `like_num` attribute * Fix try_jieba_import() * Fix zh code formatting	2019-11-11 14:23:21 +01:00
Priscilla de Abreu Lopes	39e79fcc86	Bugfix/dep matcher issue 4590 (#4601 ) * add contributor agreement for prilopes * add test for issue #4590 * fix on_match params for DependencyMatcher (#4590)	2019-11-07 12:01:06 +01:00
Ines Montani	09cec3e41b	Replace function registries with catalogue (#4584 ) * Replace functions registries with catalogue * Update __init__.py * Fix test * Revert unrelated flag [ci skip]	2019-11-07 11:45:22 +01:00
adrianeboyd	56ad3a3988	Add LAS per dependency to Scorer (#4560 )	2019-10-31 21:18:16 +01:00
Matthew Honnibal	e82306937e	Put Tok2Vec refactor behind feature flag (#4563 ) * Add back pre-2.2.2 tok2vec * Add simple tok2vec tests * Add simple tok2vec tests * Reformat * Fix CharacterEmbed in new tok2vec * Fix legacy tok2vec * Resolve circular imports * Fix test for Python 2	2019-10-31 15:01:15 +01:00
Ines Montani	5e9849b60f	Auto-format [ci skip]	2019-10-30 19:27:18 +01:00
Ines Montani	afe4a428f7	Fix pipeline analysis on remove pipe (#4557 ) Validate after component is removed, not before	2019-10-30 19:04:17 +01:00
Ines Montani	85f2b04c45	Support span._. in component decorator attrs (#4555 ) * Support span._. in component decorator attrs * Adjust error [ci skip]	2019-10-30 17:19:36 +01:00
Matthew Honnibal	a927b3a21e	Put new alignment behind flag for v2.2.2 release (#4541 ) * Xfail new tokenization test * Put new alignment behind feature flag * Move USE_ALIGN to top of the file [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2019-10-28 16:12:32 +01:00
Ines Montani	a90025b277	Fix serialization of extension attr values in DocBin (#4540 )	2019-10-28 16:02:13 +01:00
tamuhey	df293f3894	modified gold.align to handle space tokens (#4537 ) Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2019-10-28 15:44:28 +01:00
adrianeboyd	f2bfaa1b38	Filter subtoken matches in merge_subtokens() (#4539 ) The `Matcher` in `merge_subtokens()` returns all possible subsequences of `subtok`, so for sequences of two or more subtoks it's necessary to filter the matches so that the retokenizer is only merging the longest matches with no overlapping spans.	2019-10-28 15:40:28 +01:00
Ines Montani	96bb8f2187	Add regression test for #4528 [ci skip]	2019-10-28 14:36:03 +01:00
Ines Montani	c5e41247e8	Tidy up and auto-format	2019-10-28 12:43:55 +01:00
Matthw Honnibal	426b745640	Fix tests for gpu	2019-10-27 22:19:18 +01:00
Sofie Van Landeghem	8e7414dace	Match pop with append for training format (#4516 ) * trying to fix script - not successful yet * match pop() with extend() to avoid changing the data * few more pop-extend fixes * reinsert deleted print statement * fix print statement * add last tested version * append instead of extend * add in few comments * quick fix for 4402 + unit test * fixing number of docs (not counting cats) * more fixes * fix len * print tmp file instead of using data from examples dir * print tmp file instead of using data from examples dir (2)	2019-10-27 16:01:32 +01:00
tamuhey	fcd25db033	[#4529 ] fix: gold pyx (#4530 ) * fix: gold pyx * remove print * skip test in python2 * Add unicode declarations and don't skip test on Python 2	2019-10-27 13:50:07 +01:00
tamuhey	554850206c	[#4525 ] fix gold.align (#4526 ) * fix: gold.align * fix align * remove old align	2019-10-27 13:38:04 +01:00
Ines Montani	a9c6104047	Component decorator and component analysis (#4517 ) * Add work in progress * Update analysis helpers and component decorator * Fix porting of docstrings for Python 2 * Fix docstring stuff on Python 2 * Support meta factories when loading model * Put auto pipeline analysis behind flag for now * Analyse pipes on remove_pipe and replace_pipe * Move analysis to root for now Try to find a better place for it, but it needs to go for now to avoid circular imports * Simplify decorator Don't return a wrapped class and instead just write to the object * Update existing components and factories * Add condition in factory for classes vs. functions * Add missing from_nlp classmethods * Add "retokenizes" to printed overview * Update assigns/requires declarations of builtins * Only return data if no_print is enabled * Use multiline table for overview * Don't support Span * Rewrite errors/warnings and move them to spacy.errors	2019-10-27 13:35:49 +01:00

1578 Commits