spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-08 22:03:24 +03:00

Author	SHA1	Message	Date
Ines Montani	a6b76440b7	Update project CLI	2020-06-22 14:53:31 +02:00
svlandeg	c5ac382f0a	fix name clash	2020-06-02 22:24:57 +02:00
svlandeg	85b0597ed5	add test for minibatch util	2020-06-02 18:26:21 +02:00
Sofie Van Landeghem	311133e579	Train textcat with config (#5143 ) * bring back default build_text_classifier method * remove _set_dims_ hack in favor of proper dim inference * add tok2vec initialize to unit test * small fixes * add unit test for various textcat config settings * logistic output layer does not have nO * fix window_size setting * proper fix * fix W initialization * Update textcat training example * Use ml_datasets * Convert training data to `Example` format * Use `n_texts` to set proportionate dev size * fix _init renaming on latest thinc * avoid setting a non-existing dim * update to thinc==8.0.0a2 * add BOW and CNN defaults for easy testing * various experiments with train_textcat script, fix softmax activation in textcat bow * allow textcat train script to work on other datasets as well * have dataset as a parameter * train textcat from config, with example config * add config for training textcat * formatting * fix exclusive_classes * fixing BOW for GPU * bump thinc to 8.0.0a3 (not published yet so CI will fail) * add in link_vectors_to_models which got deleted Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2020-03-29 19:40:36 +02:00
Ines Montani	46568f40a7	Merge branch 'master' into tmp/sync	2020-03-26 13:38:14 +01:00
Ines Montani	828acffc12	Tidy up and auto-format	2020-03-25 12:28:12 +01:00
Adriane Boyd	423849f94a	Fix sents comparison in test util Due to changes to `Span` (#5005), spans from different documents are now never equal. Check `Token.is_sent_start` values instead.	2020-03-13 09:25:23 +01:00
Sofie Van Landeghem	c6b12ab02a	Bugfix/get doc (#5049 ) * new (broken) unit test * fixing get_doc method	2020-03-02 11:49:28 +01:00
Ines Montani	de11ea753a	Merge branch 'master' into develop	2020-02-18 14:47:23 +01:00
adrianeboyd	3b22eb651b	Sync Span __eq__ and __hash__ (#5005 ) * Sync Span __eq__ and __hash__ Use the same tuple for `__eq__` and `__hash__`, including all attributes except `vector` and `vector_norm`. * Update entity comparison in tests Update `assert_docs_equal()` test util to compare `Span` properties for ents rather than `Span` objects.	2020-02-16 17:20:36 +01:00
Ines Montani	db55577c45	Drop Python 2.7 and 3.5 (#4828 ) * Remove unicode declarations * Remove Python 3.5 and 2.7 from CI * Don't require pathlib * Replace compat helpers * Remove OrderedDict * Use f-strings * Set Cython compiler language level * Fix typo * Re-add OrderedDict for Table * Update setup.cfg * Revert CONTRIBUTING.md * Revert lookups.md * Revert top-level.md * Small adjustments and docs [ci skip]	2019-12-22 01:53:56 +01:00
Ines Montani	3d8fd4b461	Revert #4334	2019-09-29 17:32:12 +02:00
Ines Montani	c9cd516d96	Move tests out of package (#4334 ) * Move tests out of package * Fix typo	2019-09-28 18:05:00 +02:00
Ines Montani	f37863093a	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003 ) Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉 See here: https://github.com/explosion/srsly Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place. At the same time, we noticed that having a lot of small dependencies was making maintainence harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel. srsly currently includes forks of the following packages: ujson msgpack msgpack-numpy cloudpickle * WIP: replace json/ujson with srsly * Replace ujson in examples Use regular json instead of srsly to make code easier to read and follow * Update requirements * Fix imports * Fix typos * Replace msgpack with srsly * Fix warning	2018-12-03 01:28:22 +01:00
Ines Montani	b6e991440c	💫 Tidy up and auto-format tests (#2967 ) * Auto-format tests with black * Add flake8 config * Tidy up and remove unused imports * Fix redefinitions of test functions * Replace orths_and_spaces with words and spaces * Fix compatibility with pytest 4.0 * xfail test for now Test was previously overwritten by following test due to naming conflict, so failure wasn't reported * Unfail passing test * Only use fixture via arguments Fixes pytest 4.0 compatibility	2018-11-27 01:09:36 +01:00
Ines Montani	75f3234404	💫 Refactor test suite (#2568 ) ## Description Related issues: #2379 (should be fixed by separating model tests) * total execution time down from > 300 seconds to under 60 seconds 🎉 * removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure * changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version) * merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways) * tidied up and rewrote existing tests wherever possible ### Todo - [ ] move tests to `/tests` and adjust CI commands accordingly - [x] move model test suite from internal repo to `spacy-models` - [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~ - [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted - [ ] update documentation on how to run tests ### Types of change enhancement, tests ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-24 23:38:44 +02:00
Matthew Honnibal	77d8f5de9a	Revise and simplify Vectors class	2017-10-31 18:25:08 +01:00
Matthew Honnibal	ef87562741	Restore vectors test utils	2017-08-19 20:35:16 +02:00
ines	1ebd0d3f27	Add assert_packed_msg_equal util function	2017-06-03 17:04:30 +02:00
ines	9692c98f57	Add test utils for temp file and temp dir	2017-06-02 10:56:09 +02:00
ines	5e1c361270	Update tests README with info on model tests	2017-05-31 12:22:58 +02:00
ines	795fe43a4d	Add load_test_model function with importorskip() Loads model only if it can be imported, i.e. if it's installed as a package.	2017-05-29 22:11:31 +02:00
Matthew Honnibal	fe11564b8e	Finish stringstore change. Also xfail vectors tests	2017-05-28 15:10:22 +02:00
Ines Montani	dc2bb1259f	Add util function to add vectors to vocab	2017-01-13 15:12:07 +01:00
Ines Montani	db9b25663d	Reformat add_docs_equal and add docstring	2017-01-13 15:12:07 +01:00
Ines Montani	442237787c	Add assert_docs_equal util to compare two docs	2017-01-12 21:56:52 +01:00
Ines Montani	bd20ec0a6a	Add get_cosine util function	2017-01-12 16:51:13 +01:00
Ines Montani	c2406e92bc	Allow setting ents in get_doc	2017-01-12 12:25:10 +01:00
Ines Montani	a6790b6694	Rename tags to pos in get_doc and allow adding tags to tokens	2017-01-12 11:18:36 +01:00
Ines Montani	342cb41782	Add apply_transition_sequence util function to utils	2017-01-11 21:30:14 +01:00
Ines Montani	8bf3bb5c44	Make words optional for get_doc	2017-01-11 18:05:10 +01:00
Ines Montani	909f24d7df	Add test utils and get_doc helper function Create Doc object from given vocab, words and annotations to allow tests not to depend on loading the models.	2017-01-11 13:55:33 +01:00

32 Commits