Commit Graph

2703 Commits

Author SHA1 Message Date
ines
d8c984b65e Don't exit if no model meta data is present 2017-03-16 20:33:33 +01:00
Matthew Honnibal
2524efc0ac Merge remote-tracking branch 'origin/develop-downloads' 2017-03-16 20:20:41 +01:00
ines
8253581057 Link model automatically if not direct download 2017-03-16 19:54:51 +01:00
Matthew Honnibal
8843b84bd1 Merge remote-tracking branch 'origin/develop-downloads' 2017-03-16 12:00:42 -05:00
Matthew Honnibal
55f813bfbb Don't reapply the model during training 2017-03-16 11:59:43 -05:00
Matthew Honnibal
c90dc7ac29 Clean up state initiatisation in transition system 2017-03-16 11:59:11 -05:00
Matthew Honnibal
a46933a8fe Clean up FTRL parsing stuff. 2017-03-16 11:58:20 -05:00
ines
618ce3b425 Add .meta to Language object
Allows getting the current model's meta data, e.g.:
nlp = spacy.load('my-model')
print(nlp.meta)
2017-03-16 17:14:56 +01:00
ines
e348d4434c Add spacy.info(model_name) to show model meta
Allows "previewing" model before loading and making sure it's linked
correctly.
2017-03-16 17:13:40 +01:00
ines
eea3b35e3f Update model loading to support links
Remove match_best_version check, fetch model language from meta instead
of directory name, and don't make too many assumptions – if model is
downloaded via downloader, version should match anyway. (Otherwise,
users should be free to add and load whichever models they want.)
2017-03-16 17:13:08 +01:00
ines
5f3f04bd0a Add util function to load and parse package meta.json 2017-03-16 17:10:05 +01:00
ines
7f920c2f75 Don't break text in when rendering print_msg 2017-03-16 17:09:50 +01:00
ines
16a63d9676 Add docstring 2017-03-16 17:09:11 +01:00
ines
68c04fa897 Move sys_exit() function to util 2017-03-16 17:08:58 +01:00
ines
ccd1a79988 Add spacy.link module to link model directories to shortcuts 2017-03-16 17:01:51 +01:00
Matthew Honnibal
2611ac2a89 Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens 2017-03-16 09:38:28 -05:00
ines
595d89698a Add basestring 2017-03-16 10:01:14 +01:00
ines
7b2eca36e4 Revert "Fix formatting and remove unused code"
This reverts commit d7898d586f.
2017-03-16 09:58:41 +01:00
ines
2f0db1dd36 Use small English model as default 2017-03-16 09:54:40 +01:00
Matthew Honnibal
3d0833c3df Fix off-by-1 in parse features fill_context 2017-03-15 19:55:35 -05:00
Matthew Honnibal
4ef68c413f Approximate cost in Break transition, to speed things up a bit. 2017-03-15 16:40:27 -05:00
Matthew Honnibal
8543db8a5b Use ftrl optimizer in parser 2017-03-15 11:56:37 -05:00
ines
4cfc8ffbd2 Reformat pickle tests 2017-03-15 17:39:54 +01:00
ines
2a0fcf1354 Add tests for new download module 2017-03-15 17:39:43 +01:00
ines
71956c94db Handle deprecated language-specific model downloading 2017-03-15 17:37:55 +01:00
ines
58b884b6d4 Refactor download script and about.py to use new download method 2017-03-15 17:37:18 +01:00
ines
f5d1a39a5b Add util functions for printing and wrapping messages 2017-03-15 17:35:57 +01:00
ines
d7898d586f Fix formatting and remove unused code 2017-03-15 17:35:41 +01:00
ines
b672e95045 Fix formatting 2017-03-15 17:35:04 +01:00
ines
0474e706a0 Remove unused deprecated functions for sputnik 2017-03-15 17:34:54 +01:00
ines
b13e7f79b4 Fix formatting and remove unused imports 2017-03-15 17:33:57 +01:00
ines
1101fd3855 Fix formatting and remove unused imports 2017-03-15 17:33:39 +01:00
ines
842782c128 Move fix_deprecated_glove_vectors_loading to deprecated.py 2017-03-15 17:33:29 +01:00
Matthew Honnibal
4cab8ac136 Update morph exceptions test 2017-03-15 09:31:34 -05:00
Matthew Honnibal
d719f8e77e Use nogil in parser, and set L1 to 0.0 by default 2017-03-15 09:31:01 -05:00
Matthew Honnibal
c61c501406 Update beam-parser to allow parser to maintain nogil 2017-03-15 09:30:22 -05:00
Matthew Honnibal
3d4e389d23 Whitespace 2017-03-15 09:29:42 -05:00
Matthew Honnibal
7769bc31e3 Add beam-search classes 2017-03-15 09:27:41 -05:00
Matthew Honnibal
c79b3129e3 Fix setting of empty lexeme in initial parse state 2017-03-15 09:26:53 -05:00
Matthew Honnibal
d864708072 Add more morphology names in attrs.pyx 2017-03-15 09:26:16 -05:00
Matthew Honnibal
b382dc902c Add morph rules in Language 2017-03-15 09:24:40 -05:00
Matthew Honnibal
8dbff4f5f4 Wire up English lemma and morph rules. 2017-03-15 09:23:22 -05:00
Matthew Honnibal
f70be44746 Use lemmatizer in code, not from downloaded model. 2017-03-15 04:52:50 -05:00
ines
42ba740dde Revert "Merge branch 'debug'"
This reverts commit 89b79d1178, reversing
changes made to 02bdf490a1.
2017-03-13 20:11:52 +01:00
ines
4c5f51e49e Update regression test 2017-03-13 15:16:11 +01:00
ines
02bdf490a1 Remove regression test to see if it caused pytest Travis error 2017-03-13 13:00:22 +01:00
ines
17018750ac Add regression test for #717 2017-03-13 12:58:22 +01:00
ines
2883ebfca2 Remove print statement 2017-03-13 12:30:42 +01:00
ines
98c13d8aa9 Add regression test for #401 2017-03-13 12:28:41 +01:00
ines
444d665f9d Add regression test for #686 2017-03-13 12:23:35 +01:00
ines
46b17e5b51 Add regression test for #719 2017-03-13 12:17:35 +01:00
ines
c8ae682ff9 Add regression test for #636 2017-03-13 12:08:31 +01:00
ines
337f9601f2 Add missing unicode declaration 2017-03-13 12:08:19 +01:00
ines
d70386ec6e Update docstring in #886 regression test 2017-03-13 12:00:38 +01:00
ines
51ba3ef0a8 Add regression test for #886 2017-03-13 11:44:58 +01:00
ines
eec3f21c50 Add WordNet license 2017-03-12 13:58:24 +01:00
ines
f9e603903b Rename stop_words.py to word_sets.py and include more sets
NUM_WORDS and ORDINAL_WORDS are currently not used, but the hard-coded
list should be removed from orth.pyx and replaced to use
language-specific functions. This will later allow other languages to
use their own functions to set those flags. (In English, this is easier
because it only needs to be checked against a set – in German for
example, this requires a more complex function, as most number words
are one word.)
2017-03-12 13:58:22 +01:00
ines
f24f9b4b7b Remove unused code 2017-03-12 13:58:22 +01:00
ines
1da29a7146 Use new Lemmatizer data and remove file import
Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is unideal and still needs to be
fixed.
2017-03-12 13:58:22 +01:00
ines
0957737ee8 Add Python-formatted lemmatizer data and rules 2017-03-12 13:58:22 +01:00
ines
c89e30d1a3 Add test for English time exceptions ("1a.m." etc.) 2017-03-12 13:58:22 +01:00
ines
ce9568af84 Move English time exceptions ("1a.m." etc.) and refactor 2017-03-12 13:58:22 +01:00
ines
6b30541774 Fix formatting 2017-03-12 13:58:22 +01:00
Ines Montani
e97a30b99a Merge pull request #885 from PySUST/master
[Bengali] 	Spell checked and add new stop words
2017-03-12 13:20:59 +01:00
ines
66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
shuvanon
91cb4cdb2b Sort stop_words 2017-03-12 17:55:51 +06:00
shuvanon
784f6cfa49 Update stop_words 2017-03-12 17:41:01 +06:00
shuvanon
73cc17078e Merge branch 'master' of https://github.com/PySUST/spaCy 2017-03-12 14:52:17 +06:00
shuvanon
35ec7135bb Spell checked and add new stop words 2017-03-12 14:51:34 +06:00
Em
9c809efc25 Removed mapStr 2017-03-11 16:23:26 -08:00
Matthew Honnibal
fa23278ee3 Add classes for beam parser and beam NER 2017-03-11 12:45:37 -06:00
Matthew Honnibal
6c4108c073 Add header for beam parser 2017-03-11 12:45:12 -06:00
Matthew Honnibal
4382f175b3 Squelch compiler warnings 2017-03-11 12:44:43 -06:00
Matthew Honnibal
ea2592879f Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-11 11:13:37 -06:00
Matthew Honnibal
1224c4d3c6 Improve output on trainer 2017-03-11 11:12:48 -06:00
Matthew Honnibal
b438dfd3f3 Add itn argument to tagger.update 2017-03-11 11:12:21 -06:00
Matthew Honnibal
931feb3360 Allow beam parsing for NER 2017-03-11 11:12:01 -06:00
Matthew Honnibal
f77a5bb60a Switch back to greedy parser 2017-03-11 11:11:30 -06:00
Matthew Honnibal
ca9c8c57c0 Add iteration argument to parser.update 2017-03-11 07:00:47 -06:00
Matthew Honnibal
dcce9ca3f3 Use beam parser 2017-03-11 07:00:20 -06:00
Matthew Honnibal
e30ffdd003 Use ftrl optimizer in tagger 2017-03-11 06:59:13 -06:00
Matthew Honnibal
d59c6926c1 I think this fixes the segfault 2017-03-11 06:58:34 -06:00
Matthew Honnibal
318b9e32ff WIP on beam parser. Currently segfaults. 2017-03-11 06:19:52 -06:00
Em
426d17167f Added string manipulation for spans 2017-03-10 16:50:02 -08:00
Matthew Honnibal
b0d80dc9ae Update name of 'train' function in BeamParser 2017-03-10 14:35:43 -06:00
Matthew Honnibal
d11f1a4ddf Record negative costs in non-monotonic arc eager oracle 2017-03-10 11:22:04 -06:00
Matthew Honnibal
ecf91a2dbb Support beam parser 2017-03-10 11:21:21 -06:00
Ines Montani
a16aff17aa Merge pull request #876 from PySUST/master
[Bangla] Update "tokenizer_exceptions.py"
2017-03-10 14:46:00 +01:00
ines
10e29189ac Adjust URL testcases and xfail problems (instead of comment) 2017-03-10 14:22:50 +01:00
ines
b04893a059 Make regex locale-independent for Python 2 2017-03-10 14:21:57 +01:00
Matthew Honnibal
ea53647362 Merge branch 'develop' 2017-03-10 02:49:39 -06:00
Ines Montani
1c40890321 Add missing comma
Should fix Travis build error
2017-03-10 09:34:54 +01:00
Shuvanon Razik
c251703428 Update abbreviations 2017-03-10 10:45:01 +06:00
Matthew Honnibal
b5247c49eb Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-03-09 18:45:43 -06:00
Matthew Honnibal
798450136d Set L1 penalty to 0 in tagger. 2017-03-09 18:43:47 -06:00
Matthew Honnibal
c62da02344 Use ftrl training, to learn compressed model. 2017-03-09 18:43:21 -06:00
Matthew Honnibal
f71eeef9bb Pass path argument to end_training 2017-03-09 18:42:40 -06:00
Dan Rapp
123d3f2d38 Fix error in test case parameterization 2017-03-09 12:18:21 -07:00
Dan Rapp
b9307dfcd7 Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix 2017-03-09 11:42:14 -07:00
Dan Rapp
3b1df3808d Issue #840 - URL pattenr too broad 2017-03-09 11:39:39 -07:00
Matthew Honnibal
5b0b968d13 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-03-08 15:03:10 +01:00
Matthew Honnibal
0ac3d27689 Fix handling of trailing whitespace
Fix off-by-one error that meant trailing spaces were being dropped.
Closes #792
2017-03-08 15:01:40 +01:00
ines
c2e3e651b8 Re-add regression test for #859 2017-03-08 14:36:09 +01:00
Matthew Honnibal
0a6d7ca200 Fix spacing after token_match
The boolean flag indicating a space after the token was
being set incorrectly after the token_match regex was applied.
Fixes #859.
2017-03-08 14:33:32 +01:00
shuvanon
85438aee1b update tokenizertokenizer 2017-03-08 17:29:39 +06:00
shuvanon
45bc78461c update tokenizertokenizer 2017-03-08 17:27:12 +06:00
Matthew Honnibal
cd33b39a04 Fix 2/3 problem for json save/load 2017-03-08 01:39:13 +01:00
Matthew Honnibal
40703988bc Use FTRL training in parser 2017-03-08 01:38:51 +01:00
Matthew Honnibal
d108534dc2 Fix 2/3 problems for training 2017-03-08 01:37:52 +01:00
Matthew Honnibal
d03d6a13f1 Merge branch 'rominf-ud20' into develop 2017-03-07 21:48:56 +01:00
Matthew Honnibal
f7374d0b86 Merge branch 'ud20' of https://github.com/rominf/spaCy into rominf-ud20 2017-03-07 21:48:37 +01:00
Matthew Honnibal
16670d3251 Xfail the vocab pickling for now 2017-03-07 21:43:28 +01:00
Matthew Honnibal
a89c3500f6 Fixes to hacky vocab pickling 2017-03-07 20:58:55 +01:00
Matthew Honnibal
d814892805 Hackish pickle support for Vocab. 2017-03-07 20:25:12 +01:00
Matthew Honnibal
26614e028f Add hacky support for StringCFile, to make pickling easier. 2017-03-07 20:24:37 +01:00
Matthew Honnibal
3edb8ae207 Whitespace 2017-03-07 17:16:26 +01:00
Matthew Honnibal
5de7e712b7 Add support for pickling StringStore. 2017-03-07 17:15:18 +01:00
Matthew Honnibal
4e75e74247 Update regression test for variable-length pattern problem in the matcher. 2017-03-07 16:08:32 +01:00
Matthew Honnibal
6d67213b80 Add test for 850: Matcher fails on zero-or-more. 2017-03-07 15:55:28 +01:00
Aniruddha Adhikary
696215a3fb add tests for Bengali 2017-03-05 11:25:12 +06:00
Aniruddha Adhikary
8f3bfe9bfc [Bengali] basic tag map, morph, lemma rules and exceptions 2017-03-04 12:36:59 +06:00
Roman Inflianskas
66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
ines
8dff040032 Revert "Add regression test for #859"
This reverts commit c4f16c66d1.
2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela
25c29f072d apply patch 2017-03-01 21:44:17 +01:00
Juan Miguel Cejuela
a8cfde46d3 #781 Fix test — colocalizes is lemmatized to colocaliz and colicalize 2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela
a471114eb2 #781 add regression test, failing previous bug fix 2017-03-01 21:30:51 +01:00
ines
c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
Aniruddha Adhikary
d91be7aed4 add punctuations for Bengali 2017-02-28 21:07:14 +06:00
Aniruddha Adhikary
5a4fc09576 add basic Bengali support 2017-02-28 07:48:37 +06:00
Matthew Honnibal
cc9b2b74e3 Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:44:39 +01:00
Matthew Honnibal
bd4375a2e6 Remove comment 2017-02-27 11:44:26 +01:00
Matthew Honnibal
e7e22d8be6 Move import within get_exceptions() function, to speed import 2017-02-27 11:34:48 +01:00
Matthew Honnibal
34bcc8706d Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:21:21 +01:00
Matthew Honnibal
0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
Matthew Honnibal
26446aa728 Avoid loading all French exceptions on import
Move exceptions loading behind a get_tokenizer_exceptions() function
for French, instead of loading into the top-level namespace. This
cuts import times from 0.6s to 0.2s, at the expense of making the
French data a little different from the others (there's no top-level
TOKENIZER_EXCEPTIONS variable.) The current solution feels somewhat
unsatisfying.
2017-02-25 11:55:00 +01:00
ines
376c5813a7 Remove print statements from test 2017-02-24 18:26:32 +01:00
ines
7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines
0e2e331b58 Convert exceptions to Python list 2017-02-24 18:22:40 +01:00
ines
51eb190ef4 Remove print statements from test 2017-02-24 17:41:12 +01:00
Matthew Honnibal
db5ada3995 Merge branch 'master' of https://github.com/explosion/spaCy 2017-02-24 14:28:12 +01:00
Matthew Honnibal
8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
ines
67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines
30ce2a6793 Exclude "shed" and "Shed" from tokenizer exceptions (see #847) 2017-02-18 14:10:44 +01:00
Ines Montani
de997c1a33 Merge pull request #842 from magnusburton/master
Added regular verb rules for Swedish
2017-02-17 11:18:20 +01:00
Magnus Burton
41fcfd06b8 Added regular verb rules for Swedish 2017-02-17 10:04:04 +01:00
ines
aa92d4e9b5 Fix unicode regex for Python 2 (see #834) 2017-02-16 23:49:54 +01:00
ines
44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
ines
3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines
85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines
ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque
06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque
3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
Raphaël Bournhonesque
e17dc2db75 Remove useless import 2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque
3fd2742649 load_vectors should accept arbitrary space characters as word tokens
Fix bug  #834
2017-02-16 12:08:30 +01:00
ines
f08e180a47 Make groups non-capturing
Prevents hitting the 100 named groups limit in Python
2017-02-10 13:35:02 +01:00
ines
fa3b8512da Use consistent imports and exports
Bundle everything in language_data to keep it consistent with other
languages and make TOKENIZER_EXCEPTIONS importable from there.
2017-02-10 13:34:09 +01:00
ines
21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
2017-02-10 13:17:05 +01:00
ines
f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque
309da78bf0 Merge branch 'master' into tokenizer_exceptions 2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque
4ce0bbc6b6 Update unit tests 2017-02-09 16:30:43 +01:00
Raphaël Bournhonesque
5d706ab95d Merge tokenizer exceptions from PR #802 2017-02-09 16:30:28 +01:00
ines
654fe447b1 Add Swedish tokenizer tests (see #807) 2017-02-05 11:47:07 +01:00
ines
6715615d55 Add missing EXC variable and combine tokenizer exceptions 2017-02-05 11:42:52 +01:00
Ines Montani
30a52d576b Merge pull request #807 from magnusburton/master
Added swedish lemma rules and more verb contractions
2017-02-05 11:34:19 +01:00
Magnus Burton
19c0ce745a Added swedish lemma rules 2017-02-04 17:53:32 +01:00
Michael Wallin
d25556bf80 [issue 805] Fix issue 2017-02-04 16:22:21 +02:00
Michael Wallin
35100c8bdd [issue 805] Add regression test and the required fixture 2017-02-04 16:21:34 +02:00
ines
0ab353b0ca Add line breaks to Finnish stop words for better readability 2017-02-04 13:40:25 +01:00
Michael Wallin
1a1952afa5 [finnish] Add initial tests for tokenizer 2017-02-04 13:54:10 +02:00
Michael Wallin
f9bb25d1cf [finnish] Reformat and correct stop words 2017-02-04 13:54:10 +02:00
Michael Wallin
73f66ec570 Add preliminary support for Finnish 2017-02-04 13:54:10 +02:00
Ines Montani
65d6202107 Merge pull request #802 from Tpt/fr-tokenizer
Adds more French tokenizer exceptions
2017-02-03 10:52:20 +01:00
Tpt
75a74857bb Adds more French tokenizer exceptions 2017-02-03 13:45:18 +04:00
Ines Montani
afc6365388 Update regression test for #801 to match current expected behaviour 2017-02-02 16:23:05 +01:00
Ines Montani
012f4820cb Keep infixes of punctuation + hyphens as one token (see #801) 2017-02-02 16:22:40 +01:00
Ines Montani
1219a5f513 Add = to tokenizer prefixes 2017-02-02 16:21:11 +01:00
Ines Montani
ff04748eb6 Add missing emoticon 2017-02-02 16:21:00 +01:00
Ines Montani
13a4ab37e0 Add regression test for #801 2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque
85f951ca99 Add tokenizer exceptions for French 2017-02-02 08:36:16 +01:00
Matvey Ezhov
32a22291bc Small Doc.count_by documentation update
Current example doesn't work
2017-01-31 19:18:45 +03:00
Ines Montani
e4875834fe Fix formatting 2017-01-31 15:19:33 +01:00
Ines Montani
c304834e45 Add missing import 2017-01-31 15:18:30 +01:00
Ines Montani
e6465b9ca3 Parametrize test cases and mark as xfail 2017-01-31 15:14:42 +01:00
latkins
e4c84321a5 Added regression test for Issue #792. 2017-01-31 13:47:42 +00:00
Matthew Honnibal
6c665b81df Fix redundant == TAG in from_array conditional 2017-01-31 00:46:21 +11:00
Ines Montani
19501f3340 Add regression test for #775 2017-01-25 13:16:52 +01:00
Ines Montani
209c37bbcf Exclude "shell" and "Shell" from English tokenizer exceptions (resolves #775) 2017-01-25 13:15:02 +01:00
Raphaël Bournhonesque
1be9c0e724 Add fr tokenization unit tests 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque
1faaf698ca Add infixes and abbreviation exceptions (fr) 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque
cf8474401b Remove unused import statement 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque
902f136f18 Add support for elision in French 2017-01-24 10:57:37 +01:00
Ines Montani
55c9c62abc Use relative import 2017-01-23 21:27:49 +01:00
Ines Montani
0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani
6baa98f774 Merge pull request #769 from raphael0202/spacy-768
Allow zero-width 'infix' token
2017-01-23 21:24:33 +01:00
Raphaël Bournhonesque
dce8f5515e Allow zero-width 'infix' token 2017-01-23 18:28:01 +01:00
Ines Montani
5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Ines Montani
09ecc39b4e Fix multi-line string of NUM_WORDS (resolves #759) 2017-01-20 15:11:48 +01:00
Magnus Burton
69eab727d7 Added loops to handle contractions with verbs 2017-01-19 14:08:52 +01:00
Matthew Honnibal
be26085277 Fix missing import
Closes #755
2017-01-19 22:03:52 +11:00
Ines Montani
7e36568d5b Fix title to accommodate sputnik 2017-01-17 00:51:09 +01:00
Ines Montani
d704cfa60d Fix typo 2017-01-16 21:30:33 +01:00
Ines Montani
64e142f460 Update about.py 2017-01-16 14:23:08 +01:00
Matthew Honnibal
e889cd698e Increment version 2017-01-16 14:01:35 +01:00
Matthew Honnibal
e7f8e13cf3 Make Token hashable. Fixes #743 2017-01-16 13:27:57 +01:00
Matthew Honnibal
2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Matthew Honnibal
48c712f1c1 Merge branch 'master' of ssh://github.com/explosion/spaCy 2017-01-16 13:18:06 +01:00
Matthew Honnibal
7ccf490c73 Increment version 2017-01-16 13:17:58 +01:00
Ines Montani
50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani
e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani
116c675c3c Merge pull request #742 from oroszgy/hu_tokenizer_fix
Improved Hungarian tokenizer
2017-01-14 23:52:44 +01:00
Gyorgy Orosz
92345b6a41 Further numeric test. 2017-01-14 22:44:19 +01:00
Gyorgy Orosz
b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00
Gyorgy Orosz
b03a46792c Better error handling 2017-01-14 22:09:29 +01:00
Gyorgy Orosz
a45f22913f Added further abbreviations present in the Szeged corpus 2017-01-14 22:08:55 +01:00
Ines Montani
332ce2d758 Update README.md 2017-01-14 21:12:11 +01:00
Gyorgy Orosz
9505c6a72b Passing all old tests. 2017-01-14 20:39:21 +01:00
Gyorgy Orosz
63037e79af Fixed hyphen handling in the Hungarian tokenizer. 2017-01-14 16:30:11 +01:00
Gyorgy Orosz
f77c0284d6 Maintaining compatibility with other spacy tokenizers. 2017-01-14 16:19:15 +01:00
Gyorgy Orosz
be7a7aeb1a Reversed accidental changes. 2017-01-14 15:59:36 +01:00
Gyorgy Orosz
1be5da1ac6 Fixed Hungarian tokenizer for numbers 2017-01-14 15:51:59 +01:00
Ines Montani
a89e269a5a Fix test formatting and consistency 2017-01-14 13:41:19 +01:00
Ines Montani
3424e3a7e5 Update README.md 2017-01-13 15:54:54 +01:00
Ines Montani
49186b34a1 Mark lemmatizer tests as models since they use installed data 2017-01-13 15:12:07 +01:00
Ines Montani
138deb80a1 Modernise vector tests, use add_vecs_to_vocab and don't depend on models 2017-01-13 15:12:07 +01:00
Ines Montani
96f0caa28a Fix test name for consistency 2017-01-13 15:12:07 +01:00
Ines Montani
dc2bb1259f Add util function to add vectors to vocab 2017-01-13 15:12:07 +01:00
Ines Montani
db9b25663d Reformat add_docs_equal and add docstring 2017-01-13 15:12:07 +01:00
Ines Montani
62ce0a0073 Add README.md to tests to explain organisation and conventions 2017-01-13 15:11:18 +01:00
Ines Montani
38d60f6b90 Modernise serializer I/O tests and don't depend on models where possible 2017-01-13 02:24:56 +01:00
Ines Montani
4bb5b89ee4 Add text_file_b fixture using BytesIO 2017-01-13 02:23:50 +01:00
Ines Montani
49febd8c62 Modernise noun chunks tests and don't depend on models 2017-01-13 02:01:00 +01:00
Ines Montani
3ee97b5686 Rename test_parser to test_noun_chunks 2017-01-13 01:36:33 +01:00
Ines Montani
a308703f47 Remove old tests 2017-01-13 01:34:48 +01:00
Ines Montani
12eb8edf26 Move parser tests from unit to parser 2017-01-13 01:34:38 +01:00
Ines Montani
138c53ff2e Merge tokenizer tests 2017-01-13 01:34:14 +01:00
Ines Montani
01f36ca3ff Move attrs tests from unit to root and modernise 2017-01-13 01:33:50 +01:00
Ines Montani
3610d27967 Move alignment tests from munge to gold and modernise 2017-01-13 01:33:31 +01:00
Ines Montani
094ff7396a Reformat and rename Pragmatic Segmenter tests and mark xfails 2017-01-13 01:30:20 +01:00
Ines Montani
affcf1b19d Modernise lemmatizer tests 2017-01-12 23:41:17 +01:00
Ines Montani
33d9cf87f9 Modernise tagger tests and fix xpassing test 2017-01-12 23:40:52 +01:00
Ines Montani
33e5f8dc2e Create basic and extended test set for URLs 2017-01-12 23:40:02 +01:00
Ines Montani
5e4f5ebfc8 Modernise BILUO tests 2017-01-12 23:39:18 +01:00
Ines Montani
09acfbca01 Add Lemmatizer fixture 2017-01-12 23:38:55 +01:00
Ines Montani
514bfa2597 Add path fixture for spaCy data path 2017-01-12 23:38:47 +01:00
Ines Montani
0894b8c0ef Don't split tokens with digits and "/" infixes (resolves #740) 2017-01-12 22:58:26 +01:00
Ines Montani
e9e99a5670 Add regression test for #740 2017-01-12 22:57:38 +01:00
Ines Montani
6935d55409 Fix formatting 2017-01-12 22:56:20 +01:00
Ines Montani
5f0d196a31 Modernise and merge matcher tests 2017-01-12 22:23:11 +01:00
Ines Montani
d5d774413a Update comments on EN and DE fixtures 2017-01-12 22:03:07 +01:00
Ines Montani
9b4bea1df9 Tidy up and rename regression tests and remove unnecessary imports 2017-01-12 22:00:37 +01:00