Commit Graph

4140 Commits

Author SHA1 Message Date
Christos Savvopoulos
c19b83f6ae use model_dir inside of load_model 2016-12-12 20:23:24 +00:00
Christos Savvopoulos
93cf4af701 actually commit load_ner.py 2016-12-12 20:13:33 +00:00
Matthew Honnibal
c4d9ea1186 Merge pull request #679 from savvopoulos/train-ner-update
train_ner should save vocab; add load_ner example
2016-12-13 07:13:30 +11:00
Christos Savvopoulos
ad54a929f8 train_ner should save vocab; add load_ner example 2016-12-12 20:09:49 +00:00
Matthew Honnibal
bf59420b1f Merge pull request #677 from explosion/revert-676-patch-2
Revert "Add acl to symbols.pyx"
2016-12-12 10:15:13 +11:00
Matthew Honnibal
5965d3c2a7 Revert "Add acl to symbols.pyx" 2016-12-12 10:10:28 +11:00
Matthew Honnibal
6dee76dfed Update symbols.pxd 2016-12-12 10:09:58 +11:00
Ines Montani
fe10f9c702 Merge pull request #676 from pokey/patch-2
Add acl to symbols.pyx
2016-12-11 22:06:13 +01:00
Pokey Rule
18a15c0777 Add acl to symbols.pyx 2016-12-11 20:00:07 +00:00
Gyorgy Orosz
0cf2144d24 Adding partial hyphen and quote handling support. 2016-12-11 00:14:36 +01:00
Gyorgy Orosz
2051726fd3 Passing Hungarian abbrev tests. 2016-12-10 23:37:58 +01:00
Ines Montani
61783c5025 Merge pull request #675 from jaspb/patch-1
added 'en' to spacy.load(..)
2016-12-10 20:25:46 +01:00
jaspb
3d7f81ddf5 added 'en' to spacy.load(..) 2016-12-10 19:18:13 +00:00
Ines Montani
63024466a9 Add Portuguese stopwords 2016-12-08 20:45:07 +01:00
Ines Montani
7bfe2d4abc Update Portuguese language data 2016-12-08 20:41:41 +01:00
Ines Montani
c0c5f31950 Remove unused data and download script 2016-12-08 20:39:49 +01:00
Ines Montani
0a6d529104 Remove unused data 2016-12-08 20:36:56 +01:00
Ines Montani
1b3b043660 Add French stopwords 2016-12-08 20:12:43 +01:00
Ines Montani
8863e504eb Update French language data 2016-12-08 20:07:14 +01:00
Ines Montani
7cb9f51be6 Add Italian stopwords 2016-12-08 20:05:25 +01:00
Ines Montani
470a0e0bea Update Italian language data 2016-12-08 19:52:18 +01:00
Ines Montani
1a284d342e Add Spanish language data 2016-12-08 19:47:03 +01:00
Ines Montani
0c39654786 Remove unused import 2016-12-08 19:46:53 +01:00
Ines Montani
e47ee94761 Split punctuation into its own file 2016-12-08 19:46:43 +01:00
Ines Montani
70b51ed7c8 Remove time from German language data 2016-12-08 19:45:50 +01:00
Ines Montani
e8ae588be9 Add emoticons 2016-12-08 19:45:18 +01:00
Ines Montani
5908c0ed9f Fix formatting 2016-12-08 19:45:11 +01:00
Ines Montani
311b30ab35 Reorganize exceptions for English and German 2016-12-08 13:58:32 +01:00
Ines Montani
66c7348cda Add update_exc util function 2016-12-08 13:58:12 +01:00
Ines Montani
1256232fad Fix formatting 2016-12-08 13:56:40 +01:00
Ines Montani
8e977cc71c Fix formatting 2016-12-08 13:56:17 +01:00
Ines Montani
0176b99004 Fix formatting 2016-12-08 12:48:02 +01:00
Ines Montani
877f09218b Add more custom rules for abbreviations 2016-12-08 12:47:01 +01:00
Gyorgy Orosz
0289b8ceaa Additional abbreviation tests. 2016-12-08 12:17:44 +01:00
Gyorgy Orosz
90d22db023 Added Hungarian resource files. 2016-12-08 12:06:36 +01:00
Ines Montani
bfaa42636c Update language data for German 2016-12-08 12:01:09 +01:00
Ines Montani
ec44bee321 Fix capitalization on morphological features 2016-12-08 12:00:54 +01:00
Gyorgy Orosz
5b00039955 First steps towards the Hungarian tokenizer code. 2016-12-07 23:07:43 +01:00
Ines Montani
ce979553df Resolve conflict 2016-12-07 21:16:52 +01:00
Ines Montani
8350d65695 Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
Ines Montani
52e7d634df Remove trailing whitespace 2016-12-07 21:12:19 +01:00
Ines Montani
0d07d7fc80 Apply emoticon exceptions to tokenizer 2016-12-07 21:11:59 +01:00
Ines Montani
71f0f34cb3 Fix formatting 2016-12-07 21:11:29 +01:00
Ines Montani
9413bcd9ee Declare encoding and unicode literals 2016-12-07 21:10:34 +01:00
Ines Montani
a280ff2657 Fix __all__ 2016-12-07 21:10:12 +01:00
Ines Montani
ba8721953c Add missing emoticons 2016-12-07 21:09:44 +01:00
Ines Montani
1285c4ba93 Update English language data 2016-12-07 20:33:28 +01:00
Ines Montani
4a1e206064 Remove old lang_data directory 2016-12-07 20:33:28 +01:00
Ines Montani
79dce0aabe Add emoticons 2016-12-07 20:33:28 +01:00
Ines Montani
a662a95294 Add line breaks 2016-12-07 20:33:28 +01:00