Matthew Honnibal
d44b1b337a
Try using LinearModel in tagger.
2017-03-13 11:24:02 +01:00
Ines Montani
13b1e6b3e0
Update README.rst
2017-03-12 20:03:15 +01:00
Ines Montani
fa0be1f2fe
Update README.rst
2017-03-12 20:00:41 +01:00
Ines Montani
39dc0d98d2
Update changelog to use table
2017-03-12 19:32:17 +01:00
ines
eec3f21c50
Add WordNet license
2017-03-12 13:58:24 +01:00
ines
0fcdde6c4d
Remove old WordNet data files
2017-03-12 13:58:24 +01:00
ines
f9e603903b
Rename stop_words.py to word_sets.py and include more sets
NUM_WORDS and ORDINAL_WORDS are currently not used, but the hard-coded
list should be removed from orth.pyx and replaced with
language-specific functions. This will later allow other languages to
use their own functions to set those flags. (In English, this is easier
because it only needs to be checked against a set; in German, for
example, this requires a more complex function, as most compound number
words are written as a single word.)
2017-03-12 13:58:22 +01:00
ines
f24f9b4b7b
Remove unused code
2017-03-12 13:58:22 +01:00
ines
1da29a7146
Use new Lemmatizer data and remove file import
Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is not ideal and still needs to be
fixed.
2017-03-12 13:58:22 +01:00
ines
0957737ee8
Add Python-formatted lemmatizer data and rules
2017-03-12 13:58:22 +01:00
ines
c89e30d1a3
Add test for English time exceptions ("1a.m." etc.)
2017-03-12 13:58:22 +01:00
ines
ce9568af84
Move English time exceptions ("1a.m." etc.) and refactor
2017-03-12 13:58:22 +01:00
ines
6b30541774
Fix formatting
2017-03-12 13:58:22 +01:00
Ines Montani
e9524b7647
Update CONTRIBUTORS.md
2017-03-12 13:22:30 +01:00
Ines Montani
e97a30b99a
Merge pull request #885 from PySUST/master
[Bengali] Spell checked and add new stop words
2017-03-12 13:20:59 +01:00
ines
66c1f194f9
Use consistent unicode declarations
2017-03-12 13:07:28 +01:00
shuvanon
91cb4cdb2b
Sort stop_words
2017-03-12 17:55:51 +06:00
shuvanon
784f6cfa49
Update stop_words
2017-03-12 17:41:01 +06:00
shuvanon
8a2d22222d
Filled out CONTRIBUTOR_AGREEMENT.md
2017-03-12 17:07:55 +06:00
shuvanon
73cc17078e
Merge branch 'master' of https://github.com/PySUST/spaCy
2017-03-12 14:52:17 +06:00
shuvanon
35ec7135bb
Spell checked and add new stop words
2017-03-12 14:51:34 +06:00
Em
9c809efc25
Removed mapStr
2017-03-11 16:23:26 -08:00
Matthew Honnibal
fa23278ee3
Add classes for beam parser and beam NER
2017-03-11 12:45:37 -06:00
Matthew Honnibal
cb39b6e337
Require recent thinc
2017-03-11 12:45:22 -06:00
Matthew Honnibal
6c4108c073
Add header for beam parser
2017-03-11 12:45:12 -06:00
Matthew Honnibal
4382f175b3
Squelch compiler warnings
2017-03-11 12:44:43 -06:00
Matthew Honnibal
93ab888d1d
Require recent preshed
2017-03-11 12:33:56 -06:00
Matthew Honnibal
ea2592879f
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-11 11:13:37 -06:00
Matthew Honnibal
1224c4d3c6
Improve output on trainer
2017-03-11 11:12:48 -06:00
Matthew Honnibal
b438dfd3f3
Add itn argument to tagger.update
2017-03-11 11:12:21 -06:00
Matthew Honnibal
931feb3360
Allow beam parsing for NER
2017-03-11 11:12:01 -06:00
Matthew Honnibal
f77a5bb60a
Switch back to greedy parser
2017-03-11 11:11:30 -06:00
Matthew Honnibal
a155482fda
Improve printing in train_ud script
2017-03-11 11:11:05 -06:00
Ines Montani
dae0701bbd
Fix typo
2017-03-11 16:43:51 +01:00
Matthew Honnibal
ca9c8c57c0
Add iteration argument to parser.update
2017-03-11 07:00:47 -06:00
Matthew Honnibal
dcce9ca3f3
Use beam parser
2017-03-11 07:00:20 -06:00
Matthew Honnibal
e30ffdd003
Use ftrl optimizer in tagger
2017-03-11 06:59:13 -06:00
Matthew Honnibal
d59c6926c1
I think this fixes the segfault
2017-03-11 06:58:34 -06:00
Matthew Honnibal
318b9e32ff
WIP on beam parser. Currently segfaults.
2017-03-11 06:19:52 -06:00
Em
1bb364a3b5
Adding venv to .gitignore
2017-03-10 16:52:04 -08:00
Em
426d17167f
Added string manipulation for spans
2017-03-10 16:50:02 -08:00
Matthew Honnibal
b0d80dc9ae
Update name of 'train' function in BeamParser
2017-03-10 14:35:43 -06:00
Matthew Honnibal
0ed2afde89
Compile beam parser
2017-03-10 11:22:22 -06:00
Matthew Honnibal
d11f1a4ddf
Record negative costs in non-monotonic arc eager oracle
2017-03-10 11:22:04 -06:00
Matthew Honnibal
ecf91a2dbb
Support beam parser
2017-03-10 11:21:21 -06:00
Ines Montani
a16aff17aa
Merge pull request #876 from PySUST/master
[Bangla] Update "tokenizer_exceptions.py"
2017-03-10 14:46:00 +01:00
ines
10e29189ac
Adjust URL testcases and xfail problems (instead of comment)
2017-03-10 14:22:50 +01:00
ines
b04893a059
Make regex locale-independent for Python 2
2017-03-10 14:21:57 +01:00
Ines Montani
9019658b40
Update CONTRIBUTORS.md
2017-03-10 13:37:41 +01:00
Matthew Honnibal
ea53647362
Merge branch 'develop'
2017-03-10 02:49:39 -06:00