spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-20 09:46:02 +03:00

Author	SHA1	Message	Date
svlandeg	b49a3afd0c	use clean_underscore fixture	2020-02-23 15:49:20 +01:00
svlandeg	6e717c62ed	avoid the tests interacting with eachother through the global Underscore variable	2020-02-12 13:21:31 +01:00
svlandeg	7939c63886	use English instead of model	2020-02-12 12:26:27 +01:00
svlandeg	46628d8890	add some asserts	2020-02-12 12:12:52 +01:00
svlandeg	51d37033c8	remove old comment	2020-02-12 12:10:05 +01:00
svlandeg	05dedaa2cf	add unit test	2020-02-12 12:00:13 +01:00
Tyler Couto	9fa9d7f2cb	Fix for Issue 4665 - conllu2json (#4953) * Fix for Issue 4665 - conllu2json - Allowing HEAD to be an underscore * Added contributor agreement	2020-02-03 13:01:48 +01:00
Yohei Tamura	708a4d27eb	fix nlp.evaluate (#4924) (#4925) * new file: test_issue4924.py * modified: spacy/gold.pyx * modified: test_issue4924.py for python2	2020-01-20 12:17:46 +01:00
Sofie Van Landeghem	a1b22e90cd	serialize ENT_ID (#4852) * expand serialization test for custom token attribute * add failing test for issue 4849 * define ENT_ID as attr and use in doc serialization * fix few typos	2020-01-06 14:57:34 +01:00
Ines Montani	3431ac42de	Fix typo	2019-12-21 21:17:45 +01:00
Ines Montani	7c69d30de5	Tidy up and expect warning	2019-12-21 21:14:52 +01:00
Ines Montani	cb4145adc7	Tidy up and auto-format	2019-12-21 19:04:17 +01:00
Sofie Van Landeghem	f9b541f9ef	More robust set entities method in KB (#4794) * add unit test for setting entities with duplicate identifiers * count the number of actual unique identifiers and throw duplicate warning	2019-12-13 10:45:29 +01:00
Ines Montani	5b36dec7eb	Auto-exclude disabled when calling from_disk during load (#4708)	2019-11-25 16:01:22 +01:00
Ines Montani	5d4eede1e4	Fix test util imports	2019-11-21 16:28:29 +01:00
GuiGel	8f7ab70870	Bugfix/fix entity ruler from disk (#4670) * fix EntityRuler from_disk bug * add contributor file * Test EntityRuler PhraseMatcher deserialization (#4651) * newline at end of file * fix copy paste error * serializing the EntityRuler by itself * Add unicode declarations for Python 2 and auto-format	2019-11-21 16:26:37 +01:00
Ines Montani	5bf9ab5b03	Tidy up and auto-format	2019-11-20 13:16:33 +01:00
Ines Montani	6e303de717	Auto-format	2019-11-20 13:15:24 +01:00
Ines Montani	74b951fe61	Fix xpassing tests (#4657) * Ignore internal warnings * Un-xfail passing tests * Skip instead of xfail	2019-11-16 20:20:53 +01:00
Priscilla de Abreu Lopes	39e79fcc86	Bugfix/dep matcher issue 4590 (#4601) * add contributor agreement for prilopes * add test for issue #4590 * fix on_match params for DependencyMacther (#4590)	2019-11-07 12:01:06 +01:00
Ines Montani	a90025b277	Fix serialization of extension attr values in DocBin (#4540)	2019-10-28 16:02:13 +01:00
Ines Montani	96bb8f2187	Add regression test for #4528 [ci skip]	2019-10-28 14:36:03 +01:00
Ines Montani	c5e41247e8	Tidy up and auto-format	2019-10-28 12:43:55 +01:00
Sofie Van Landeghem	8e7414dace	Match pop with append for training format (#4516) * trying to fix script - not succesful yet * match pop() with extend() to avoid changing the data * few more pop-extend fixes * reinsert deleted print statement * fix print statement * add last tested version * append instead of extend * add in few comments * quick fix for 4402 + unit test * fixing number of docs (not counting cats) * more fixes * fix len * print tmp file instead of using data from examples dir * print tmp file instead of using data from examples dir (2)	2019-10-27 16:01:32 +01:00
tamuhey	fcd25db033	[#4529] fix: gold pyx (#4530) * fix: gold pyx * remove print * skip test in python2 * Add unicode declarations and don't skip test on Python 2	2019-10-27 13:50:07 +01:00
Ines Montani	cfffdba7b1	Implement new API for {Phrase}Matcher.add (backwards-compatible) (#4522) * Implement new API for {Phrase}Matcher.add (backwards-compatible) * Update docs * Also update DependencyMatcher.add * Update internals * Rewrite tests to use new API * Add basic check for common mistake Raise error with suggestion if user likely passed in a pattern instead of a list of patterns * Fix typo [ci skip]	2019-10-25 22:21:08 +02:00
Ines Montani	d2da117114	Also support passing list to Language.disable_pipes (#4521) * Also support passing list to Language.disable_pipes * Adjust internals	2019-10-25 16:19:08 +02:00
Ines Montani	cc05d9dad6	Auto-format [ci skip]	2019-10-24 16:21:08 +02:00
Sofie Van Landeghem	d5d55312b2	prevent division by zero in most_similar method (#4488)	2019-10-21 12:04:46 +02:00
Ines Montani	181c01f629	Tidy up and auto-format	2019-10-18 11:27:38 +02:00
Sofie Van Landeghem	2d249a9502	KB extensions and better parsing of WikiData (#4375) * fix overflow error on windows * more documentation & logging fixes * md fix * 3 different limit parameters to play with execution time * bug fixes directory locations * small fixes * exclude dev test articles from prior probabilities stats * small fixes * filtering wikidata entities, removing numeric and meta items * adding aliases from wikidata also to the KB * fix adding WD aliases * adding also new aliases to previously added entities * fixing comma's * small doc fixes * adding subclassof filtering * append alias functionality in KB * prevent appending the same entity-alias pair * fix for appending WD aliases * remove date filter * remove unnecessary import * small corrections and reformatting * remove WD aliases for now (too slow) * removing numeric entities from training and evaluation * small fixes * shortcut during prediction if there is only one candidate * add counts and fscore logging, remove FP NER from evaluation * fix entity_linker.predict to take docs instead of single sentences * remove enumeration sentences from the WP dataset * entity_linker.update to process full doc instead of single sentence * spelling corrections and dump locations in readme * NLP IO fix * reading KB is unnecessary at the end of the pipeline * small logging fix * remove empty files	2019-10-14 12:28:53 +02:00
Ines Montani	fec9433044	Make PhraseMatcher.vocab consistent with Matcher.vocab (closes #4373)	2019-10-04 12:18:41 +02:00
Sofie Van Landeghem	4e7259c6cf	Bugfix initializing DocBin with attributes (#4368) * docbin init fix + documentation fix + unit tests * newline * try with zlib instead of gzip (python 2 incompatibilities)	2019-10-03 14:48:45 +02:00
Sofie Van Landeghem	9d3ce7cba2	Ensure training doesn't crash with empty batches (#4360) * unit test for previously resolved unflatten issue * prevent batch of empty docs to cause problems	2019-10-02 12:50:47 +02:00
Ines Montani	cf65a80f36	Refactor lemmatizer and data table integration (#4353) * Move test * Allow default in Lookups.get_table * Start with blank tables in Lookups.from_bytes * Refactor lemmatizer to hold instance of Lookups * Get lookups table within the lemmatization methods to make sure it references the correct table (even if the table was replaced or modified, e.g. when loading a model from disk) * Deprecate other arguments on Lemmatizer.__init__ and expect Lookups for consistency * Remove old and unsupported Lemmatizer.load classmethod * Refactor language-specific lemmatizers to inherit as much as possible from base class and override only what they need * Update tests and docs * Fix more tests * Fix lemmatizer * Upgrade pytest to try and fix weird CI errors * Try pytest 4.6.5	2019-10-01 21:36:03 +02:00
Ines Montani	e0cf4796a5	Move lookup tables out of the core library (#4346) * Add default to util.get_entry_point * Tidy up entry points * Read lookups from entry points * Remove lookup tables and related tests * Add lookups install option * Remove lemmatizer tests * Remove logic to process language data files * Update setup.cfg	2019-10-01 00:01:27 +02:00
Ines Montani	0226b3bf0e	Fix test imports	2019-09-29 17:34:56 +02:00
Ines Montani	3d8fd4b461	Revert #4334	2019-09-29 17:32:12 +02:00
Ines Montani	c9cd516d96	Move tests out of package (#4334) * Move tests out of package * Fix typo	2019-09-28 18:05:00 +02:00
Sofie Van Landeghem	22b9e12159	Ensure the NER remains consistent after resizing (#4330) * test and fix for second bug of issue 4042 * fix for first bug in 4042 * crashing test for Issue 4313 * forgot one instance of resize * remove prints * undo uncomment * delete test for 4313 (uses third party lib) * add fix for Issue 4313 * unit test for 4313	2019-09-27 20:57:13 +02:00
Matthew Honnibal	46c02d25b1	Merge changes to test_ner	2019-09-18 21:41:24 +02:00
Sofie Van Landeghem	de5a9ecdf3	Distinction between outside, missing and blocked NER annotations (#4307) * remove duplicate unit test * unit test (currently failing) for issue 4267 * bugfix: ensure doc.ents preserves kb_id annotations * fix in setting doc.ents with empty label * rename * test for presetting an entity to a certain type * allow overwriting Outside + blocking presets * fix actions when previous label needs to be kept * fix default ent_iob in set entities * cleaner solution with U- action * remove debugging print statements * unit tests with explicit transitions and is_valid testing * remove U- from move_names explicitly * remove unit tests with pre-trained models that don't work * remove (working) unit tests with pre-trained models * clean up unit tests * move unit tests * small fixes * remove two TODO's from doc.ents comments	2019-09-18 21:37:17 +02:00
tamuhey	875f3e5d8c	remove redundant __call__ method in pipes.TextCategorizer (#4305) * remove redundant __call__ method in pipes.TextCategorizer Because the parent __call__ method behaves in the same way. * fix: Pipe.__call__ arg * fix: invalid arg in Pipe.__call__ * modified: spacy/tests/regression/test_issue4278.py (#4278) * deleted: Pipfile	2019-09-18 21:31:27 +02:00
Ines Montani	139428c20f	Set unique vector names in tests	2019-09-16 15:16:54 +02:00
Ines Montani	655b434553	Merge branch 'master' into develop	2019-09-12 11:39:18 +02:00
tamuhey	71909cdf22	Fix iss4278 (#4279) * fix: len(tuple) == 2 * (#4278) add fail test * add contributor's aggreement	2019-09-12 10:44:49 +02:00
Ines Montani	e82a8d0d7a	Merge branch 'master' into develop	2019-09-11 11:52:38 +02:00
Ines Montani	8f9f48b04c	Add GreekLemmatizer.lookup (resolves #4272)	2019-09-11 11:44:40 +02:00
Ines Montani	6279d74c65	Tidy up and auto-format	2019-09-11 11:38:22 +02:00
Matthew Honnibal	7b858ba606	Update from master	2019-09-10 20:14:08 +02:00

448 Commits