Matthew Honnibal | 4bbd1388bd | * Whitespace | 2015-10-10 16:03:48 +11:00
Matthew Honnibal | bdcb8d695c | * Add non-breaking space to specials.json | 2015-10-10 15:54:06 +11:00
Matthew Honnibal | c12d36d5f4 | * Fix quote marks in lemma_rules | 2015-10-10 15:03:36 +11:00
Matthew Honnibal | 57b3cd4661 | * Add smart-quotes to lemma rules | 2015-10-10 14:06:46 +11:00
Matthew Honnibal | 7e7f28e1fd | * Add smart-quote possessive marker in generate_specials | 2015-10-10 14:06:09 +11:00
Matthew Honnibal | a510858f5a | * Pretty-print specials.json, and add the em dash | 2015-10-09 11:07:45 +02:00
Matthew Honnibal | 49600a44a8 | * Fix trailing comma in lemma_rules.json | 2015-10-09 11:06:57 +02:00
Matthew Honnibal | 0e92e8574a | * Fix pos tag in em-dash in specials | 2015-10-09 11:06:37 +02:00
Matthew Honnibal | d341443282 | * Remove em-dash from lemma rules. Handle instead in specials. | 2015-10-09 10:27:13 +02:00
Matthew Honnibal | b6047afe4c | * Fix punctuation lemma rules, to resolve Issue #130 | 2015-10-09 10:25:37 +02:00
Matthew Honnibal | 393a13d1af | * Add unicode em dash to specials.json, so that we can control what POS tag it gets. This way we can prevent sentence boundary detection errors, to address Issue #130. | 2015-10-09 19:24:33 +11:00
Matthew Honnibal | 1490feda29 | * Make generate_specials pretty-print the specials.json file | 2015-10-09 19:23:47 +11:00
Matthew Honnibal | 1842a53e73 | * Lemmatize smart quotes as plain quotes | 2015-10-09 19:09:36 +11:00
Matthew Honnibal | 5332c0b697 | * Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130 | 2015-10-09 18:54:40 +11:00
Matthew Honnibal | 095831e5bf | * Start adding auxiliaries to morphs.json | 2015-09-27 16:56:34 +10:00
Matthew Honnibal | c579b6b96c | * Update English morphs.json | 2015-09-24 22:38:41 +10:00
Matthew Honnibal | 3b3547251c | * Fix Issue #102: DT tag was mapped to DET. | 2015-09-24 18:38:47 +10:00
Matthew Honnibal | be4848fbcb | * Update morphs.json with universal dependencies/interset morphological features | 2015-09-24 00:59:42 +10:00
Henning Peters | 911de2ae49 | add overseen (?) char | 2015-09-22 12:29:47 +02:00
Matthew Honnibal | b9e31dc245 | * Bug fix to gazetteer.json | 2015-09-10 14:50:44 +02:00
Matthew Honnibal | 623329b19a | Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop | 2015-09-08 14:27:01 +02:00
Matthew Honnibal | 86c888667f | * Merge in changes from de branch | 2015-09-06 19:49:28 +02:00
Matthew Honnibal | b3703836f9 | * Add en lemma rules | 2015-09-06 17:56:11 +02:00
Matthew Honnibal | c9f2082e3c | * Fix compilation error in en/tag_map.json | 2015-09-06 17:54:51 +02:00
Matthew Honnibal | 0af139e183 | * Tagger training now working. Still need to test load/save of model. Morphology still broken. | 2015-08-27 09:16:11 +02:00
Matthew Honnibal | 56c4e07a59 | Update gazetteer.json | 2015-08-27 08:53:48 +10:00
Matthew Honnibal | 494da25872 | * Refactor for more universal spacy | 2015-08-26 19:13:50 +02:00
jxs8172 | 85f01c5e16 | Add contributor agreement. Add exception to 'it' so that 'its' and 'Its' isn't generated (its =/= it's) | 2015-08-24 18:20:06 -04:00
jxs8172 | 5876248109 | Add missing we've and hardcoded 's and 'S | 2015-08-21 22:57:47 -04:00
jxs8172 | a5e0a0073b | Add a script to generate the specials.json file, to take care of handling uppercase and missing apostrophe contractions | 2015-08-21 22:39:33 -04:00
Matthew Honnibal | b27bd18d6e | * Add spaCy to gazetteer | 2015-08-08 23:30:49 +02:00
Matthew Honnibal | 855af087fc | * Fix gazetteer.json | 2015-08-06 17:27:51 +02:00
Matthew Honnibal | 0e098815cc | * Expand gazetteer with some of the errors from the reddit parse | 2015-08-06 17:13:27 +02:00
Matthew Honnibal | 6fcc3df989 | * Expand gazetteer with some of the errors from the reddit parse | 2015-08-06 17:11:00 +02:00
Matthew Honnibal | 832896ea6c | * Add html to gazetteer | 2015-08-06 16:36:54 +02:00
Matthew Honnibal | 5c3c962038 | * Add html to gazetteer | 2015-08-06 16:34:51 +02:00
Matthew Honnibal | 91a94e152b | * Make initial gazetteer | 2015-08-06 16:10:04 +02:00
Matthew Honnibal | af84669306 | * Add smart-quote possessive marker to tokenizer | 2015-07-30 05:12:48 +02:00
Matthew Honnibal | 3cfe3d8c1c | * Revert bad infix change | 2015-07-26 17:32:37 +02:00
Matthew Honnibal | bd608559bc | * Fix infix-period tokenization | 2015-07-26 17:14:52 +02:00
Matthew Honnibal | 94f314c271 | * Fix tokenization of email addresses. | 2015-07-26 16:38:08 +02:00
Matthew Honnibal | 6c01e01f12 | * Fix some casing problems in specials.json | 2015-07-26 01:38:29 +02:00
Matthew Honnibal | 9cae1b4cad | * Restore accidentally clobbered updates to specials.json | 2015-07-20 12:19:46 +02:00
Matthew Honnibal | 14e9e6ec6c | * Fix ... tokenization, and correct orth inconsistencies in specials.json | 2015-07-20 12:10:56 +02:00
Matthew Honnibal | d8bc279e0c | * Fix 'you' contraction capitals in specials.json | 2015-07-16 01:28:32 +02:00
Matthew Honnibal | 3c1e3e9ee8 | * Fix capitalization problems in specials.json | 2015-07-14 23:46:31 +02:00
Matthew Honnibal | b5223c4824 | * Add whitespace to specials.json | 2015-07-09 13:31:12 +02:00
Matthew Honnibal | aba0257894 | * Add lemma rule for better and best in morphs.json | 2015-06-28 09:26:25 +02:00
Matthew Honnibal | b5b869366b | * Adjust hyphenation rule in tokenizer | 2015-06-28 06:18:58 +02:00
Matthew Honnibal | 45ec92243a | * Add hyphenation rule to infix.txt for tokenizer | 2015-06-06 05:56:00 +02:00