Commit Graph

507 Commits

Author SHA1 Message Date
ines
66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Em
9c809efc25 Removed mapStr 2017-03-11 16:23:26 -08:00
Matthew Honnibal
ea2592879f Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-11 11:13:37 -06:00
Em
426d17167f Added string manipulation for spans 2017-03-10 16:50:02 -08:00
ines
10e29189ac Adjust URL testcases and xfail problems (instead of comment) 2017-03-10 14:22:50 +01:00
Matthew Honnibal
ea53647362 Merge branch 'develop' 2017-03-10 02:49:39 -06:00
Dan Rapp
123d3f2d38 Fix error in test case parameterization 2017-03-09 12:18:21 -07:00
Dan Rapp
b9307dfcd7 Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix 2017-03-09 11:42:14 -07:00
Dan Rapp
3b1df3808d Issue #840 - URL pattenr too broad 2017-03-09 11:39:39 -07:00
Matthew Honnibal
5b0b968d13 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-03-08 15:03:10 +01:00
Matthew Honnibal
0ac3d27689 Fix handling of trailing whitespace
Fix off-by-one error that meant trailing spaces were being dropped.
Closes #792
2017-03-08 15:01:40 +01:00
ines
c2e3e651b8 Re-add regression test for #859 2017-03-08 14:36:09 +01:00
Matthew Honnibal
16670d3251 Xfail the vocab pickling for now 2017-03-07 21:43:28 +01:00
Matthew Honnibal
a89c3500f6 Fixes to hacky vocab pickling 2017-03-07 20:58:55 +01:00
Matthew Honnibal
3edb8ae207 Whitespace 2017-03-07 17:16:26 +01:00
Matthew Honnibal
5de7e712b7 Add support for pickling StringStore. 2017-03-07 17:15:18 +01:00
Matthew Honnibal
4e75e74247 Update regression test for variable-length pattern problem in the matcher. 2017-03-07 16:08:32 +01:00
Matthew Honnibal
6d67213b80 Add test for 850: Matcher fails on zero-or-more. 2017-03-07 15:55:28 +01:00
Aniruddha Adhikary
696215a3fb add tests for Bengali 2017-03-05 11:25:12 +06:00
ines
8dff040032 Revert "Add regression test for #859"
This reverts commit c4f16c66d1.
2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela
a8cfde46d3 #781 Fix test — colocalizes is lemmatized to colocaliz and colicalize 2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela
a471114eb2 #781 add regression test, failing previous bug fix 2017-03-01 21:30:51 +01:00
ines
c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
Matthew Honnibal
34bcc8706d Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:21:21 +01:00
Matthew Honnibal
0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
ines
376c5813a7 Remove print statements from test 2017-02-24 18:26:32 +01:00
ines
7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines
51eb190ef4 Remove print statements from test 2017-02-24 17:41:12 +01:00
Matthew Honnibal
db5ada3995 Merge branch 'master' of https://github.com/explosion/spaCy 2017-02-24 14:28:12 +01:00
Matthew Honnibal
8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
ines
67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines
44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
ines
3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines
85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines
ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque
06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque
3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
ines
21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
2017-02-10 13:17:05 +01:00
ines
f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque
309da78bf0 Merge branch 'master' into tokenizer_exceptions 2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque
4ce0bbc6b6 Update unit tests 2017-02-09 16:30:43 +01:00
ines
654fe447b1 Add Swedish tokenizer tests (see #807) 2017-02-05 11:47:07 +01:00
Michael Wallin
35100c8bdd [issue 805] Add regression test and the required fixture 2017-02-04 16:21:34 +02:00
Michael Wallin
1a1952afa5 [finnish] Add initial tests for tokenizer 2017-02-04 13:54:10 +02:00
Ines Montani
afc6365388 Update regression test for #801 to match current expected behaviour 2017-02-02 16:23:05 +01:00
Ines Montani
13a4ab37e0 Add regression test for #801 2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque
85f951ca99 Add tokenizer exceptions for French 2017-02-02 08:36:16 +01:00
Ines Montani
e4875834fe Fix formatting 2017-01-31 15:19:33 +01:00
Ines Montani
c304834e45 Add missing import 2017-01-31 15:18:30 +01:00
Ines Montani
e6465b9ca3 Parametrize test cases and mark as xfail 2017-01-31 15:14:42 +01:00
latkins
e4c84321a5 Added regression test for Issue #792. 2017-01-31 13:47:42 +00:00
Ines Montani
19501f3340 Add regression test for #775 2017-01-25 13:16:52 +01:00
Raphaël Bournhonesque
1be9c0e724 Add fr tokenization unit tests 2017-01-24 10:57:37 +01:00
Ines Montani
0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani
5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Ines Montani
d704cfa60d Fix typo 2017-01-16 21:30:33 +01:00
Matthew Honnibal
2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Ines Montani
50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani
e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani
116c675c3c Merge pull request #742 from oroszgy/hu_tokenizer_fix
Improved Hungarian tokenizer
2017-01-14 23:52:44 +01:00
Gyorgy Orosz
92345b6a41 Further numeric test. 2017-01-14 22:44:19 +01:00
Gyorgy Orosz
b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00
Gyorgy Orosz
b03a46792c Better error handling 2017-01-14 22:09:29 +01:00
Ines Montani
332ce2d758 Update README.md 2017-01-14 21:12:11 +01:00
Gyorgy Orosz
9505c6a72b Passing all old tests. 2017-01-14 20:39:21 +01:00
Gyorgy Orosz
63037e79af Fixed hyphen handling in the Hungarian tokenizer. 2017-01-14 16:30:11 +01:00
Gyorgy Orosz
f77c0284d6 Maintaining compatibility with other spacy tokenizers. 2017-01-14 16:19:15 +01:00
Gyorgy Orosz
1be5da1ac6 Fixed Hungarian tokenizer for numbers 2017-01-14 15:51:59 +01:00
Ines Montani
a89e269a5a Fix test formatting and consistency 2017-01-14 13:41:19 +01:00
Ines Montani
3424e3a7e5 Update README.md 2017-01-13 15:54:54 +01:00
Ines Montani
49186b34a1 Mark lemmatizer tests as models since they use installed data 2017-01-13 15:12:07 +01:00
Ines Montani
138deb80a1 Modernise vector tests, use add_vecs_to_vocab and don't depend on models 2017-01-13 15:12:07 +01:00
Ines Montani
96f0caa28a Fix test name for consistency 2017-01-13 15:12:07 +01:00
Ines Montani
dc2bb1259f Add util function to add vectors to vocab 2017-01-13 15:12:07 +01:00
Ines Montani
db9b25663d Reformat add_docs_equal and add docstring 2017-01-13 15:12:07 +01:00
Ines Montani
62ce0a0073 Add README.md to tests to explain organisation and conventions 2017-01-13 15:11:18 +01:00
Ines Montani
38d60f6b90 Modernise serializer I/O tests and don't depend on models where possible 2017-01-13 02:24:56 +01:00
Ines Montani
4bb5b89ee4 Add text_file_b fixture using BytesIO 2017-01-13 02:23:50 +01:00
Ines Montani
49febd8c62 Modernise noun chunks tests and don't depend on models 2017-01-13 02:01:00 +01:00
Ines Montani
3ee97b5686 Rename test_parser to test_noun_chunks 2017-01-13 01:36:33 +01:00
Ines Montani
a308703f47 Remove old tests 2017-01-13 01:34:48 +01:00
Ines Montani
12eb8edf26 Move parser tests from unit to parser 2017-01-13 01:34:38 +01:00
Ines Montani
138c53ff2e Merge tokenizer tests 2017-01-13 01:34:14 +01:00
Ines Montani
01f36ca3ff Move attrs tests from unit to root and modernise 2017-01-13 01:33:50 +01:00
Ines Montani
3610d27967 Move alignment tests from munge to gold and modernise 2017-01-13 01:33:31 +01:00
Ines Montani
094ff7396a Reformat and rename Pragmatic Segmenter tests and mark xfails 2017-01-13 01:30:20 +01:00
Ines Montani
affcf1b19d Modernise lemmatizer tests 2017-01-12 23:41:17 +01:00
Ines Montani
33d9cf87f9 Modernise tagger tests and fix xpassing test 2017-01-12 23:40:52 +01:00
Ines Montani
33e5f8dc2e Create basic and extended test set for URLs 2017-01-12 23:40:02 +01:00
Ines Montani
5e4f5ebfc8 Modernise BILUO tests 2017-01-12 23:39:18 +01:00
Ines Montani
09acfbca01 Add Lemmatizer fixture 2017-01-12 23:38:55 +01:00
Ines Montani
514bfa2597 Add path fixture for spaCy data path 2017-01-12 23:38:47 +01:00
Ines Montani
e9e99a5670 Add regression test for #740 2017-01-12 22:57:38 +01:00
Ines Montani
6935d55409 Fix formatting 2017-01-12 22:56:20 +01:00
Ines Montani
5f0d196a31 Modernise and merge matcher tests 2017-01-12 22:23:11 +01:00
Ines Montani
d5d774413a Update comments on EN and DE fixtures 2017-01-12 22:03:07 +01:00
Ines Montani
9b4bea1df9 Tidy up and rename regression tests and remove unnecessary imports 2017-01-12 22:00:37 +01:00
Ines Montani
5e1b6178e3 Fix formatting and consistency 2017-01-12 22:00:06 +01:00
Ines Montani
a3fd32455e Remove redundant language loading integration tests 2017-01-12 21:59:48 +01:00
Ines Montani
61f1ca09c2 Modernise serializer codecs tests 2017-01-12 21:58:55 +01:00