2361 Commits

Author SHA1 Message Date
Ines Montani
0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani
6baa98f774 Merge pull request #769 from raphael0202/spacy-768: Allow zero-width 'infix' token 2017-01-23 21:24:33 +01:00
Raphaël Bournhonesque
dce8f5515e Allow zero-width 'infix' token 2017-01-23 18:28:01 +01:00
Ines Montani
5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Ines Montani
09ecc39b4e Fix multi-line string of NUM_WORDS (resolves #759) 2017-01-20 15:11:48 +01:00
Magnus Burton
69eab727d7 Added loops to handle contractions with verbs 2017-01-19 14:08:52 +01:00
Matthew Honnibal
be26085277 Fix missing import. Closes #755 2017-01-19 22:03:52 +11:00
Ines Montani
7e36568d5b Fix title to accommodate sputnik 2017-01-17 00:51:09 +01:00
Ines Montani
d704cfa60d Fix typo 2017-01-16 21:30:33 +01:00
Ines Montani
64e142f460 Update about.py 2017-01-16 14:23:08 +01:00
Matthew Honnibal
e889cd698e Increment version 2017-01-16 14:01:35 +01:00
Matthew Honnibal
e7f8e13cf3 Make Token hashable. Fixes #743 2017-01-16 13:27:57 +01:00
Matthew Honnibal
2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Matthew Honnibal
48c712f1c1 Merge branch 'master' of ssh://github.com/explosion/spaCy 2017-01-16 13:18:06 +01:00
Matthew Honnibal
7ccf490c73 Increment version 2017-01-16 13:17:58 +01:00
Ines Montani
50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani
e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani
116c675c3c Merge pull request #742 from oroszgy/hu_tokenizer_fix: Improved Hungarian tokenizer 2017-01-14 23:52:44 +01:00
Gyorgy Orosz
92345b6a41 Further numeric test. 2017-01-14 22:44:19 +01:00
Gyorgy Orosz
b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00
Gyorgy Orosz
b03a46792c Better error handling 2017-01-14 22:09:29 +01:00
Gyorgy Orosz
a45f22913f Added further abbreviations present in the Szeged corpus 2017-01-14 22:08:55 +01:00
Ines Montani
332ce2d758 Update README.md 2017-01-14 21:12:11 +01:00
Gyorgy Orosz
9505c6a72b Passing all old tests. 2017-01-14 20:39:21 +01:00
Gyorgy Orosz
63037e79af Fixed hyphen handling in the Hungarian tokenizer. 2017-01-14 16:30:11 +01:00
Gyorgy Orosz
f77c0284d6 Maintaining compatibility with other spacy tokenizers. 2017-01-14 16:19:15 +01:00
Gyorgy Orosz
be7a7aeb1a Reversed accidental changes. 2017-01-14 15:59:36 +01:00
Gyorgy Orosz
1be5da1ac6 Fixed Hungarian tokenizer for numbers 2017-01-14 15:51:59 +01:00
Ines Montani
a89e269a5a Fix test formatting and consistency 2017-01-14 13:41:19 +01:00
Ines Montani
3424e3a7e5 Update README.md 2017-01-13 15:54:54 +01:00
Ines Montani
49186b34a1 Mark lemmatizer tests as models since they use installed data 2017-01-13 15:12:07 +01:00
Ines Montani
138deb80a1 Modernise vector tests, use add_vecs_to_vocab and don't depend on models 2017-01-13 15:12:07 +01:00
Ines Montani
96f0caa28a Fix test name for consistency 2017-01-13 15:12:07 +01:00
Ines Montani
dc2bb1259f Add util function to add vectors to vocab 2017-01-13 15:12:07 +01:00
Ines Montani
db9b25663d Reformat add_docs_equal and add docstring 2017-01-13 15:12:07 +01:00
Ines Montani
62ce0a0073 Add README.md to tests to explain organisation and conventions 2017-01-13 15:11:18 +01:00
Ines Montani
38d60f6b90 Modernise serializer I/O tests and don't depend on models where possible 2017-01-13 02:24:56 +01:00
Ines Montani
4bb5b89ee4 Add text_file_b fixture using BytesIO 2017-01-13 02:23:50 +01:00
Ines Montani
49febd8c62 Modernise noun chunks tests and don't depend on models 2017-01-13 02:01:00 +01:00
Ines Montani
3ee97b5686 Rename test_parser to test_noun_chunks 2017-01-13 01:36:33 +01:00
Ines Montani
a308703f47 Remove old tests 2017-01-13 01:34:48 +01:00
Ines Montani
12eb8edf26 Move parser tests from unit to parser 2017-01-13 01:34:38 +01:00
Ines Montani
138c53ff2e Merge tokenizer tests 2017-01-13 01:34:14 +01:00
Ines Montani
01f36ca3ff Move attrs tests from unit to root and modernise 2017-01-13 01:33:50 +01:00
Ines Montani
3610d27967 Move alignment tests from munge to gold and modernise 2017-01-13 01:33:31 +01:00
Ines Montani
094ff7396a Reformat and rename Pragmatic Segmenter tests and mark xfails 2017-01-13 01:30:20 +01:00
Ines Montani
affcf1b19d Modernise lemmatizer tests 2017-01-12 23:41:17 +01:00
Ines Montani
33d9cf87f9 Modernise tagger tests and fix xpassing test 2017-01-12 23:40:52 +01:00
Ines Montani
33e5f8dc2e Create basic and extended test set for URLs 2017-01-12 23:40:02 +01:00
Ines Montani
5e4f5ebfc8 Modernise BILUO tests 2017-01-12 23:39:18 +01:00
Ines Montani
09acfbca01 Add Lemmatizer fixture 2017-01-12 23:38:55 +01:00
Ines Montani
514bfa2597 Add path fixture for spaCy data path 2017-01-12 23:38:47 +01:00
Ines Montani
0894b8c0ef Don't split tokens with digits and "/" infixes (resolves #740) 2017-01-12 22:58:26 +01:00
Ines Montani
e9e99a5670 Add regression test for #740 2017-01-12 22:57:38 +01:00
Ines Montani
6935d55409 Fix formatting 2017-01-12 22:56:20 +01:00
Ines Montani
5f0d196a31 Modernise and merge matcher tests 2017-01-12 22:23:11 +01:00
Ines Montani
d5d774413a Update comments on EN and DE fixtures 2017-01-12 22:03:07 +01:00
Ines Montani
9b4bea1df9 Tidy up and rename regression tests and remove unnecessary imports 2017-01-12 22:00:37 +01:00
Ines Montani
5e1b6178e3 Fix formatting and consistency 2017-01-12 22:00:06 +01:00
Ines Montani
a3fd32455e Remove redundant language loading integration tests 2017-01-12 21:59:48 +01:00
Ines Montani
61f1ca09c2 Modernise serializer codecs tests 2017-01-12 21:58:55 +01:00
Ines Montani
5dbc6e59f6 Modernise Huffman tests 2017-01-12 21:58:40 +01:00
Ines Montani
edeeeccea5 Modernise packer tests and don't depend on models where possible 2017-01-12 21:58:07 +01:00
Ines Montani
d084676cd0 Modernise and merge serialization tests 2017-01-12 21:57:19 +01:00
Ines Montani
442237787c Add assert_docs_equal util to compare two docs 2017-01-12 21:56:52 +01:00
Ines Montani
eac3f700fb Add fixture for entity recognizer 2017-01-12 21:56:32 +01:00
Ines Montani
b438cfddbc Modernise matcher tests and split into two files 2017-01-12 17:51:46 +01:00
Ines Montani
27482ebed8 Move matcher tests for #188 and #242 to regression tests. Modernise tests and remove unnecessary imports 2017-01-12 17:33:57 +01:00
Ines Montani
0a4dc632bd Update test to not create redundant Doc object 2017-01-12 17:33:18 +01:00
Ines Montani
a2526e66d8 Fix formatting, naming and unicode declaration 2017-01-12 16:51:13 +01:00
Ines Montani
052cdff07d Modernise vector similarity tests 2017-01-12 16:51:13 +01:00
Ines Montani
bd20ec0a6a Add get_cosine util function 2017-01-12 16:51:13 +01:00
Ines Montani
51ef75f629 Fix regression test for #615 and remove unnecessary imports 2017-01-12 16:51:12 +01:00
Ines Montani
aeb747e10c Adjust formatting 2017-01-12 16:51:12 +01:00
Ines Montani
8e3e58a7e6 Modernise and merge lexeme vocab tests 2017-01-12 16:51:12 +01:00
Ines Montani
c3d4516fc2 Move test for #361 to regression tests 2017-01-12 16:51:12 +01:00
Daniel Hershcovich
99eb494a82 Fix #737: support loading word vectors with " " as a word 2017-01-12 17:00:14 +02:00
Ines Montani
7cb3d74426 Modernise span tests and don't depend on models 2017-01-12 15:30:49 +01:00
Ines Montani
92e3d8b3ee Modernise vocab API tests and remove old xfailing tests 2017-01-12 15:27:46 +01:00
Ines Montani
7ea87684cd Rename test_vocab.py to test_vocab_api.py 2017-01-12 15:12:21 +01:00
Ines Montani
0da2ee5c68 Merge flag features tests into orth tests in tests root 2017-01-12 15:12:00 +01:00
Ines Montani
03c136cfd3 Remove StringStore tests from vocab tests 2017-01-12 15:11:15 +01:00
Ines Montani
d7bd57abdf Modernise add vectors vocab test 2017-01-12 15:09:49 +01:00
Ines Montani
89525ef345 Use consistent test names 2017-01-12 15:09:21 +01:00
Ines Montani
f8803808ce Remove old unused tests and conftest files 2017-01-12 15:09:05 +01:00
Ines Montani
4d0bfebcd9 Move Pragmatic Segmenter test cases (currently unused) to parser tests 2017-01-12 15:08:02 +01:00
Ines Montani
26d018d874 Add tests for StringStore 2017-01-12 15:07:31 +01:00
Ines Montani
9b6784bab5 Add fixture for StringStore 2017-01-12 15:05:40 +01:00
Ines Montani
99d66d613a Modernise tests for merging spans and don't depend on models 2017-01-12 12:26:26 +01:00
Ines Montani
fa8f67596d Remove unused old test 2017-01-12 12:26:08 +01:00
Ines Montani
359f73a96b Move test for #54 to regression tests 2017-01-12 12:25:51 +01:00
Ines Montani
3f3a46722c Remove unused conftest 2017-01-12 12:25:24 +01:00
Ines Montani
c2406e92bc Allow setting ents in get_doc 2017-01-12 12:25:10 +01:00
Ines Montani
c5914c6fe5 Fix and pass regression test for #736 2017-01-12 11:48:56 +01:00
Matthew Honnibal
4e48862fa8 Remove print statement 2017-01-12 11:25:39 +01:00
Matthew Honnibal
d1d8214767 Increment version 2017-01-12 11:21:57 +01:00
Matthew Honnibal
fba67fa342 Fix Issue #736: Times were being tokenized with incorrect string values. 2017-01-12 11:21:01 +01:00
Ines Montani
a6790b6694 Rename tags to pos in get_doc and allow adding tags to tokens 2017-01-12 11:18:36 +01:00
Ines Montani
1add8ace67 Merge lemmatizer tests 2017-01-12 11:16:53 +01:00
Ines Montani
3bc082abdf Modernise morph exceptions test and don't depend on models 2017-01-12 11:14:29 +01:00