Matthew Honnibal
|
36e481c584
|
Revert "Improve parser oracle around sentence breaks."
This reverts commit 50817dc9ad.
|
2018-02-26 10:53:55 +01:00 |
|
Matthew Honnibal
|
f0478635df
|
Fix Japanese tokenizer flag
|
2018-02-26 10:32:12 +01:00 |
|
Matthew Honnibal
|
5faae803c6
|
Add option to not use Janome for Japanese tokenization
|
2018-02-26 09:39:46 +01:00 |
|
Matthew Honnibal
|
9b406181cd
|
Add Chinese.Defaults.use_jieba setting, for UD
|
2018-02-25 15:12:38 +01:00 |
|
Matthew Honnibal
|
9ccd0c643b
|
Add Vietnamese
|
2018-02-25 15:00:46 +01:00 |
|
Matthew Honnibal
|
d4fdb97c87
|
Fix alignment for words with spaces
|
2018-02-25 14:55:00 +01:00 |
|
Matthew Honnibal
|
9e960d24fc
|
Refactor conllu script, fix interface, generalize
|
2018-02-25 14:54:47 +01:00 |
|
Matthew Honnibal
|
551c93fe01
|
Shuffle data after each epoch. Improve script
|
2018-02-25 13:35:32 +01:00 |
|
Matthew Honnibal
|
bdb0174571
|
Update conllu training script
|
2018-02-25 13:12:39 +01:00 |
|
Matthew Honnibal
|
e09070eca7
|
Refactor conllu script
|
2018-02-25 12:50:29 +01:00 |
|
Matthew Honnibal
|
44e496a82e
|
Refactor conllu script
|
2018-02-25 12:48:22 +01:00 |
|
Matthew Honnibal
|
c388833ca6
|
Minibatch by number of tokens, support other vectors, refactor CoNLL printing
|
2018-02-25 10:38:06 +01:00 |
|
Matthew Honnibal
|
dd78ef066a
|
Unset data size limit in conll script
|
2018-02-24 18:14:57 +01:00 |
|
Matthew Honnibal
|
6d2c1ef52c
|
Fix SP tag in generic tag map
|
2018-02-24 16:04:56 +01:00 |
|
Matthew Honnibal
|
8adeea3746
|
Generalize conllu script. Now handling Chinese (maybe badly)
|
2018-02-24 16:04:27 +01:00 |
|
Matthew Honnibal
|
5cc3bd1c1d
|
Update alignment tests
|
2018-02-24 16:03:58 +01:00 |
|
Matthew Honnibal
|
6138439469
|
Fix many-to-one alignment
|
2018-02-24 16:03:50 +01:00 |
|
Matthew Honnibal
|
4890ee1732
|
Fix scoring of tokenization for punct
|
2018-02-24 10:32:32 +01:00 |
|
Matthew Honnibal
|
12b39f87da
|
Move cython declarations in matcher.pyx
|
2018-02-24 10:32:18 +01:00 |
|
Matthew Honnibal
|
329b14c9e6
|
Clean up conllu script
|
2018-02-24 10:31:53 +01:00 |
|
Matthew Honnibal
|
01d1b7abdf
|
Support many-to-one alignment in GoldParse
|
2018-02-24 10:17:01 +01:00 |
|
Matthew Honnibal
|
7865746574
|
Support many-to-one alignment
|
2018-02-24 02:09:53 +01:00 |
|
Matthew Honnibal
|
458710b831
|
Poke matcher test for appveyor
|
2018-02-23 23:53:48 +01:00 |
|
Matthew Honnibal
|
5be092ee72
|
CONLLU scoring 80.9% UAS with no oracle segments
|
2018-02-23 23:49:17 +01:00 |
|
Matthew Honnibal
|
968dabdde4
|
Fix bug in multi-task objective
|
2018-02-23 23:48:09 +01:00 |
|
Matthew Honnibal
|
2c9c8b8d72
|
Try commenting out emoji test in matcher
|
2018-02-23 23:34:35 +01:00 |
|
Matthew Honnibal
|
980ad68cbe
|
Try to find test that fails on appveyor
|
2018-02-23 21:27:53 +01:00 |
|
Matthew Honnibal
|
39de8cd4d3
|
Try to find test failing on appveyor
|
2018-02-23 20:59:21 +01:00 |
|
Matthew Honnibal
|
4492a33a9d
|
Fix sent_start multi-task objective when alignment fails
|
2018-02-23 16:50:59 +01:00 |
|
Matthew Honnibal
|
5fa44e93f1
|
Set unicode_literals in matcher
|
2018-02-23 16:48:54 +01:00 |
|
Matthew Honnibal
|
12264f9296
|
Add multi-task objective for sentence segmentation
|
2018-02-23 16:25:57 +01:00 |
|
Matthew Honnibal
|
e7deadb519
|
Set version to 2.1.0.dev1
|
2018-02-23 16:22:24 +01:00 |
|
Matthew Honnibal
|
7b575a119e
|
Try to reduce memory usage of test_matcher
|
2018-02-23 15:34:37 +01:00 |
|
Matthew Honnibal
|
24563f4026
|
Fix data typing in align
|
2018-02-23 15:08:06 +01:00 |
|
Matthew Honnibal
|
7a5ba20692
|
Fix integer typing in _align
|
2018-02-23 14:51:24 +01:00 |
|
Matthew Honnibal
|
875411b875
|
Set unicode types in _align.pyx and test
|
2018-02-23 14:35:38 +01:00 |
|
Matthew Honnibal
|
51d9679aa3
|
Fix broken span.as_doc test
|
2018-02-23 14:22:24 +01:00 |
|
Matthew Honnibal
|
92892cbfee
|
Try to reduce appveyor memory usage
|
2018-02-23 13:48:05 +01:00 |
|
dejanmarich
|
71c261d58b
|
Update stop_words.py
Added more words
|
2018-02-23 10:31:01 +01:00 |
|
Matthew Honnibal
|
dd3ebe4931
|
Merge pull request #2019 from explosion/feature/better-gold
Make Levenshtein alignment faster, bug fixes to parser, add UD parsing script
|
2018-02-23 04:41:26 +01:00 |
|
Matthew Honnibal
|
3e6c1111b7
|
Remove obsolete test
|
2018-02-23 03:22:07 +01:00 |
|
Matthew Honnibal
|
6b30dbd736
|
Merge pull request #1999 from explosion/feature/better-faster-matcher
Improved Matcher engine
|
2018-02-22 21:50:05 +01:00 |
|
Matthew Honnibal
|
331904fa9c
|
Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher
|
2018-02-22 21:47:10 +01:00 |
|
Matthew Honnibal
|
a4fdec524a
|
Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-gold
|
2018-02-22 21:44:28 +01:00 |
|
Matthew Honnibal
|
23236340f4
|
Update CoNLL script. Don't preset SBD. Set batch size to 8, avoid writing twice
|
2018-02-22 21:35:50 +01:00 |
|
Matthew Honnibal
|
a26e399f84
|
Update conllu script
|
2018-02-22 19:43:54 +01:00 |
|
ines
|
9c8a0f6eba
|
Version-lock msgpack-python (see #2015)
|
2018-02-22 19:42:03 +01:00 |
|
Matthew Honnibal
|
50817dc9ad
|
Improve parser oracle around sentence breaks.
|
2018-02-22 19:22:26 +01:00 |
|
Matthew Honnibal
|
001e2ec6d6
|
Refactor CoNLL training script
|
2018-02-22 16:00:34 +01:00 |
|
ines
|
8c09850354
|
Version-lock msgpack-python (see #2015)
|
2018-02-22 13:25:52 +01:00 |
|