spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-11 12:18:04 +03:00

Author	SHA1	Message	Date
Ines Montani	8d3bfb3c04	Remove outdated options and fix formatting	2018-11-28 23:33:34 +01:00
Nathaniel J. Smith	73255091f8	Fix conftest getoption	2018-11-28 19:07:24 +01:00
Ines Montani	b6e991440c	💫 Tidy up and auto-format tests (#2967) * Auto-format tests with black * Add flake8 config * Tidy up and remove unused imports * Fix redefinitions of test functions * Replace orths_and_spaces with words and spaces * Fix compatibility with pytest 4.0 * xfail test for now Test was previously overwritten by following test due to naming conflict, so failure wasn't reported * Unfail passing test * Only use fixture via arguments Fixes pytest 4.0 compatibility	2018-11-27 01:09:36 +01:00
Matthew Honnibal	4336397ecb	Update develop from master	2018-08-14 03:04:28 +02:00
Ines Montani	75f3234404	💫 Refactor test suite (#2568) ## Description Related issues: #2379 (should be fixed by separating model tests) * total execution time down from > 300 seconds to under 60 seconds 🎉 * removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure * changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version) * merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways) * tidied up and rewrote existing tests wherever possible ### Todo - [ ] move tests to `/tests` and adjust CI commands accordingly - [x] move model test suite from internal repo to `spacy-models` - [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~ - [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted - [ ] update documentation on how to run tests ### Types of change enhancement, tests ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-24 23:38:44 +02:00
Matthew Honnibal	6303ce3d0e	Try to fix memory error by moving fr_tokenizer to module scope	2018-07-24 20:09:06 +02:00
Matthew Honnibal	b2e9e958b9	Add session scoping to tokenizers to try to fix oom on Appveyor	2018-07-24 19:44:18 +02:00
Paul O'Leary McCann	1987f3f784	Add Japanese lemmas (#2543) This info was already available from Mecab, forgot to add it before.	2018-07-13 10:55:14 +02:00
Eleni170	6042723535	Add support for Greek language (#2535) * Add contributor agreement * Support for Greek language * Fix missing el_tokenizer	2018-07-10 13:48:38 +02:00
Muhammad Irfan	f33c703066	Add Urdu Language Support (#2430) * added Urdu language support. * added Urdu language tests. * modified conftest.py for Urdu language support. * added spacy contributor agreement.	2018-06-22 11:14:03 +02:00
Aliia E	428bae66b5	Add Tatar Language Support (#2444) * add Tatar lang support * add Tatar letters * add Tatar tests * sign contributor agreement * sign contributor agreement [x] * remove comments from Language class * remove all template comments	2018-06-19 10:17:53 +02:00
ines	b8ef9c1000	Fix model names in conftest (see #2379)	2018-05-30 14:10:20 +02:00
Jani Monoses	ec62cadf4c	Updates to Romanian support (#2354) * Add back Romanian in conftest * Romanian lex_attr * More tokenizer exceptions for Romanian * Add tests for some Romanian tokenizer exceptions	2018-05-24 11:40:00 +02:00
Matthew Honnibal	581d318971	Fix conftest	2018-05-15 00:54:45 +02:00
Tahar Zanouda	00417794d3	Add Arabic language (#2314) * added support for Arabic lang * added Arabic language support * updated conftest	2018-05-15 00:27:19 +02:00
Jani Monoses	0e08e49e87	Lemmatizer ro (#2319) * Add Romanian lemmatizer lookup table. Adapted from http://www.lexiconista.com/datasets/lemmatization/ by replacing cedillas with commas (ș and ț). The original dataset is licensed under the Open Database License. * Fix one blatant issue in the Romanian lemmatizer * Romanian examples file * Add ro_tokenizer in conftest * Add Romanian lemmatizer test	2018-05-12 15:20:04 +02:00
Paul O'Leary McCann	bd72fbf09c	Port Japanese mecab tokenizer from v1 (#2036) * Port Japanese mecab tokenizer from v1 This brings the Mecab-based Japanese tokenization introduced in #1246 to spaCy v2. There isn't a JapaneseTagger implementation yet, but POS tag information from Mecab is stored in a token extension. A tag map is also included. As a reminder, Mecab is required because Universal Dependencies are based on Unidic tags, and Janome doesn't support Unidic. Things to check: 1. Is this the right way to use a token extension? 2. What's the right way to implement a JapaneseTagger? The approach in #1246 relied on `tag_from_strings` which is just gone now. I guess the best thing is to just try training spaCy's default Tagger? -POLM * Add tagging/make_doc and tests	2018-05-03 18:38:26 +02:00
Matthew Honnibal	95a9615221	Fix loading of multiple pre-trained vectors This patch addresses #1660, which was caused by keying all pre-trained vectors with the same ID when telling Thinc how to refer to them. This meant that if multiple models were loaded that had pre-trained vectors, errors or incorrect behaviour resulted. The vectors class now includes a .name attribute, which defaults to: {nlp.meta['lang']_nlp.meta['name']}.vectors The vectors name is set in the cfg of the pipeline components under the key pretrained_vectors. This replaces the previous cfg key pretrained_dims. In order to make existing models compatible with this change, we check for the pretrained_dims key when loading models in from_disk and from_bytes, and add the cfg key pretrained_vectors if we find it.	2018-03-28 16:02:59 +02:00
Canbey Bilgili	abe098b255	Adds Turkish Lemmatization	2017-12-01 17:04:32 +03:00
Vadim Mazaev	4ba7ddf651	Bugfixies	2017-11-30 12:29:38 +03:00
Vadim Mazaev	53e7c38637	Fixed tests depends on pymorphy2	2017-11-26 21:04:44 +03:00
Vadim Mazaev	cacd859dcd	Added tag map, fixed tests fails, added more exceptions	2017-11-26 20:54:48 +03:00
Vadim Mazaev	81314f8659	Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests	2017-11-21 22:23:59 +03:00
ines	3af281a334	Update test model name	2017-11-01 23:02:00 +01:00
Jim O'Regan	34ca59691b	no idea what is wrong here	2017-10-31 14:50:13 +00:00
Jim O'Regan	41dd29e48e	merge	2017-10-31 14:07:45 +00:00
Ines Montani	facf77e541	Merge branch 'develop' into support-danish	2017-10-24 11:53:19 +02:00
ines	612224c10d	Port over changes from #1157	2017-10-14 13:11:39 +02:00
ines	9b3f8f9ec3	Fix formatting and add comment on languages	2017-10-14 13:11:18 +02:00
ines	61a503a611	Fix parser test	2017-10-07 00:38:51 +02:00
Wannaphong Phatthiyaphaibun	7b5263ffa4	fix thai test	2017-09-26 23:54:15 +07:00
Wannaphong Phatthiyaphaibun	5cba67146c	add thai in spacy2	2017-09-26 21:36:27 +07:00
Jim O'Regan	7de709483b	missed adding here	2017-09-11 10:51:21 +01:00
Jim O'Regan	b1b6123867	add ga_tokenizer	2017-09-11 10:31:41 +01:00
Matthew Honnibal	cb4839033c	Fix loader for EN tests	2017-09-04 15:19:18 +02:00
Jim Geovedi	713d7c0aa0	added indonesian lang test	2017-08-20 12:17:14 +07:00
mollerhoj	e840077601	Add some basic tests for Danish	2017-07-03 15:49:51 +02:00
ines	a0f4592f0a	Update tests	2017-06-05 02:26:13 +02:00
ines	3e105bcd36	Update tests	2017-06-05 02:09:27 +02:00
ines	078232932c	Fix tokenizer fixture scope	2017-06-05 01:06:34 +02:00
Matthew Honnibal	55d0621532	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-06-04 15:53:25 -05:00
Matthew Honnibal	5b9f116aca	Update tests	2017-06-04 15:53:17 -05:00
ines	f432bb4b48	Fix fixture scopes	2017-06-04 22:34:31 +02:00
ines	20a7003c0d	Update model fixtures and reorganise tests	2017-05-29 22:14:31 +02:00
ines	6e3937efc5	Check for arguments of model markers to specify models to test Lets user set --models --en for only English models	2017-05-29 22:10:16 +02:00
ines	b462076d80	Merge load_lang_class and get_lang_class	2017-05-14 01:31:10 +02:00
ines	5858857a78	Update languages list in conftest	2017-05-13 15:37:54 +02:00
ines	bd57b611cc	Update conftest to lazy load languages	2017-05-09 00:02:21 +02:00
Gregory Howard	c0afcd22bb	Merge remote-tracking branch 'remotes/upstream/master'	2017-04-27 14:42:54 +02:00
Gregory Howard	8ff4682255	correcting tokenizer exception. Adding tests for lemmatization	2017-04-27 11:52:14 +02:00

85 Commits