2462 Commits

Author SHA1 Message Date
Matthew Honnibal
3d4e389d23 Whitespace 2017-03-15 09:29:42 -05:00
Matthew Honnibal
7769bc31e3 Add beam-search classes 2017-03-15 09:27:41 -05:00
Matthew Honnibal
c79b3129e3 Fix setting of empty lexeme in initial parse state 2017-03-15 09:26:53 -05:00
Matthew Honnibal
d864708072 Add more morphology names in attrs.pyx 2017-03-15 09:26:16 -05:00
Matthew Honnibal
b382dc902c Add morph rules in Language 2017-03-15 09:24:40 -05:00
Matthew Honnibal
8dbff4f5f4 Wire up English lemma and morph rules. 2017-03-15 09:23:22 -05:00
Matthew Honnibal
f70be44746 Use lemmatizer in code, not from downloaded model. 2017-03-15 04:52:50 -05:00
ines
42ba740dde Revert "Merge branch 'debug'"
This reverts commit 89b79d1178, reversing
changes made to 02bdf490a1.
2017-03-13 20:11:52 +01:00
ines
4c5f51e49e Update regression test 2017-03-13 15:16:11 +01:00
ines
02bdf490a1 Remove regression test to see if it caused pytest Travis error 2017-03-13 13:00:22 +01:00
ines
17018750ac Add regression test for #717 2017-03-13 12:58:22 +01:00
ines
2883ebfca2 Remove print statement 2017-03-13 12:30:42 +01:00
ines
98c13d8aa9 Add regression test for #401 2017-03-13 12:28:41 +01:00
ines
444d665f9d Add regression test for #686 2017-03-13 12:23:35 +01:00
ines
46b17e5b51 Add regression test for #719 2017-03-13 12:17:35 +01:00
ines
c8ae682ff9 Add regression test for #636 2017-03-13 12:08:31 +01:00
ines
337f9601f2 Add missing unicode declaration 2017-03-13 12:08:19 +01:00
ines
d70386ec6e Update docstring in #886 regression test 2017-03-13 12:00:38 +01:00
ines
51ba3ef0a8 Add regression test for #886 2017-03-13 11:44:58 +01:00
ines
eec3f21c50 Add WordNet license 2017-03-12 13:58:24 +01:00
ines
f9e603903b Rename stop_words.py to word_sets.py and include more sets
NUM_WORDS and ORDINAL_WORDS are currently not used, but the hard-coded
list should be removed from orth.pyx and replaced to use
language-specific functions. This will later allow other languages to
use their own functions to set those flags. (In English, this is easier
because it only needs to be checked against a set – in German for
example, this requires a more complex function, as most number words
are one word.)
2017-03-12 13:58:22 +01:00
ines
f24f9b4b7b Remove unused code 2017-03-12 13:58:22 +01:00
ines
1da29a7146 Use new Lemmatizer data and remove file import
Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is unideal and still needs to be
fixed.
2017-03-12 13:58:22 +01:00
ines
0957737ee8 Add Python-formatted lemmatizer data and rules 2017-03-12 13:58:22 +01:00
ines
c89e30d1a3 Add test for English time exceptions ("1a.m." etc.) 2017-03-12 13:58:22 +01:00
ines
ce9568af84 Move English time exceptions ("1a.m." etc.) and refactor 2017-03-12 13:58:22 +01:00
ines
6b30541774 Fix formatting 2017-03-12 13:58:22 +01:00
Ines Montani
e97a30b99a Merge pull request #885 from PySUST/master
[Bengali] Spell checked and add new stop words
2017-03-12 13:20:59 +01:00
ines
66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
shuvanon
91cb4cdb2b Sort stop_words 2017-03-12 17:55:51 +06:00
shuvanon
784f6cfa49 Update stop_words 2017-03-12 17:41:01 +06:00
shuvanon
73cc17078e Merge branch 'master' of https://github.com/PySUST/spaCy 2017-03-12 14:52:17 +06:00
shuvanon
35ec7135bb Spell checked and add new stop words 2017-03-12 14:51:34 +06:00
Matthew Honnibal
fa23278ee3 Add classes for beam parser and beam NER 2017-03-11 12:45:37 -06:00
Matthew Honnibal
6c4108c073 Add header for beam parser 2017-03-11 12:45:12 -06:00
Matthew Honnibal
4382f175b3 Squelch compiler warnings 2017-03-11 12:44:43 -06:00
Matthew Honnibal
ea2592879f Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-11 11:13:37 -06:00
Matthew Honnibal
1224c4d3c6 Improve output on trainer 2017-03-11 11:12:48 -06:00
Matthew Honnibal
b438dfd3f3 Add itn argument to tagger.update 2017-03-11 11:12:21 -06:00
Matthew Honnibal
931feb3360 Allow beam parsing for NER 2017-03-11 11:12:01 -06:00
Matthew Honnibal
f77a5bb60a Switch back to greedy parser 2017-03-11 11:11:30 -06:00
Matthew Honnibal
ca9c8c57c0 Add iteration argument to parser.update 2017-03-11 07:00:47 -06:00
Matthew Honnibal
dcce9ca3f3 Use beam parser 2017-03-11 07:00:20 -06:00
Matthew Honnibal
e30ffdd003 Use ftrl optimizer in tagger 2017-03-11 06:59:13 -06:00
Matthew Honnibal
d59c6926c1 I think this fixes the segfault 2017-03-11 06:58:34 -06:00
Matthew Honnibal
318b9e32ff WIP on beam parser. Currently segfaults. 2017-03-11 06:19:52 -06:00
Matthew Honnibal
b0d80dc9ae Update name of 'train' function in BeamParser 2017-03-10 14:35:43 -06:00
Matthew Honnibal
d11f1a4ddf Record negative costs in non-monotonic arc eager oracle 2017-03-10 11:22:04 -06:00
Matthew Honnibal
ecf91a2dbb Support beam parser 2017-03-10 11:21:21 -06:00
Ines Montani
a16aff17aa Merge pull request #876 from PySUST/master
[Bangla] Update "tokenizer_exceptions.py"
2017-03-10 14:46:00 +01:00