Commit Graph

8333 Commits

Author SHA1 Message Date
Matthew Honnibal
14f729c72a Add subtok label to parser 2018-02-26 12:26:35 +01:00
Matthew Honnibal
7137ad8b0b Make label filtering clearer for projectivisation 2018-02-26 12:02:01 +01:00
Matthew Honnibal
b8d52cb285 Fix inconsistent label freq cutoff for projectivisation 2018-02-26 12:01:44 +01:00
Matthew Honnibal
7b66ec896a Revert "Revert "Improve parser oracle around sentence breaks.""
This reverts commit 36e481c584.
2018-02-26 10:57:37 +01:00
Matthew Honnibal
36e481c584 Revert "Improve parser oracle around sentence breaks."
This reverts commit 50817dc9ad.
2018-02-26 10:53:55 +01:00
Matthew Honnibal
f0478635df Fix Japanese tokenizer flag 2018-02-26 10:32:12 +01:00
Matthew Honnibal
5faae803c6 Add option to not use Janome for Japanese tokenization 2018-02-26 09:39:46 +01:00
Matthew Honnibal
9b406181cd Add Chinese.Defaults.use_jieba setting, for UD 2018-02-25 15:12:38 +01:00
Matthew Honnibal
9ccd0c643b Add Vietnamese 2018-02-25 15:00:46 +01:00
Matthew Honnibal
d4fdb97c87 Fix alignment for words with spaces 2018-02-25 14:55:00 +01:00
Matthew Honnibal
9e960d24fc Refactor conllu script, fix interface, generalize 2018-02-25 14:54:47 +01:00
Matthew Honnibal
551c93fe01 Shuffle data after each epoch. Improve script 2018-02-25 13:35:32 +01:00
Matthew Honnibal
bdb0174571 Update conllu training script 2018-02-25 13:12:39 +01:00
Matthew Honnibal
e09070eca7 Refactor conllu script 2018-02-25 12:50:29 +01:00
Matthew Honnibal
44e496a82e Refactor conllu script 2018-02-25 12:48:22 +01:00
Matthew Honnibal
c388833ca6 Minibatch by number of tokens, support other vectors, refactor CoNLL printing 2018-02-25 10:38:06 +01:00
Matthew Honnibal
dd78ef066a Unset data size limit in conll script 2018-02-24 18:14:57 +01:00
Matthew Honnibal
6d2c1ef52c Fix SP tag in generic tag map 2018-02-24 16:04:56 +01:00
Matthew Honnibal
8adeea3746 Generalize conllu script. Now handling Chinese (maybe badly) 2018-02-24 16:04:27 +01:00
Matthew Honnibal
5cc3bd1c1d Update alignment tests 2018-02-24 16:03:58 +01:00
Matthew Honnibal
6138439469 Fix many-to-one alignment 2018-02-24 16:03:50 +01:00
Matthew Honnibal
4890ee1732 Fix scoring of tokenization for punct 2018-02-24 10:32:32 +01:00
Matthew Honnibal
12b39f87da Move cython declarations in matcher.pyx 2018-02-24 10:32:18 +01:00
Matthew Honnibal
329b14c9e6 Clean up conllu script 2018-02-24 10:31:53 +01:00
Matthew Honnibal
01d1b7abdf Support many-to-one alignment in GoldParse 2018-02-24 10:17:01 +01:00
Matthew Honnibal
7865746574 Support many-to-one alignment 2018-02-24 02:09:53 +01:00
Matthew Honnibal
458710b831 Poke matcher test for appveyor 2018-02-23 23:53:48 +01:00
Matthew Honnibal
5be092ee72 CONLLU scoring 80.9% UAS with no oracle segments 2018-02-23 23:49:17 +01:00
Matthew Honnibal
968dabdde4 Fix bug in multi-task objective 2018-02-23 23:48:09 +01:00
Matthew Honnibal
2c9c8b8d72 Try commenting out emoji test in matcher 2018-02-23 23:34:35 +01:00
Matthew Honnibal
980ad68cbe Try to find test that fails on appveyor 2018-02-23 21:27:53 +01:00
Matthew Honnibal
39de8cd4d3 Try to find test failing on appveyor 2018-02-23 20:59:21 +01:00
Matthew Honnibal
4492a33a9d Fix sent_start multi-task objective when alignment fails 2018-02-23 16:50:59 +01:00
Matthew Honnibal
5fa44e93f1 Set unicode_literals in matcher 2018-02-23 16:48:54 +01:00
Matthew Honnibal
12264f9296 Add multi-task objective for sentence segmentation 2018-02-23 16:25:57 +01:00
Matthew Honnibal
e7deadb519 Set version to 2.1.0.dev1 2018-02-23 16:22:24 +01:00
Matthew Honnibal
7b575a119e Try to reduce memory usage of test_matcher 2018-02-23 15:34:37 +01:00
Matthew Honnibal
24563f4026 Fix data typing in align 2018-02-23 15:08:06 +01:00
Matthew Honnibal
7a5ba20692 Fix integer typing in _align 2018-02-23 14:51:24 +01:00
Matthew Honnibal
875411b875 Set unicode types in _align.pyx and test 2018-02-23 14:35:38 +01:00
Matthew Honnibal
51d9679aa3 Fix broken span.as_doc test 2018-02-23 14:22:24 +01:00
Matthew Honnibal
92892cbfee Try to reduce appveyor memory usage 2018-02-23 13:48:05 +01:00
Matthew Honnibal
dd3ebe4931 Merge pull request #2019 from explosion/feature/better-gold
Make Levenshtein alignment faster, bug fixes to parser, add UD parsing script
2018-02-23 04:41:26 +01:00
Matthew Honnibal
3e6c1111b7 Remove obsolete test 2018-02-23 03:22:07 +01:00
Matthew Honnibal
6b30dbd736 Merge pull request #1999 from explosion/feature/better-faster-matcher
Improved Matcher engine
2018-02-22 21:50:05 +01:00
Matthew Honnibal
331904fa9c Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher 2018-02-22 21:47:10 +01:00
Matthew Honnibal
a4fdec524a Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-gold 2018-02-22 21:44:28 +01:00
Matthew Honnibal
23236340f4 Update CoNLL script. Don't preset SBD. Set batch size to 8, avoid writing twice 2018-02-22 21:35:50 +01:00
Matthew Honnibal
a26e399f84 Update conllu script 2018-02-22 19:43:54 +01:00
ines
9c8a0f6eba Version-lock msgpack-python (see #2015) 2018-02-22 19:42:03 +01:00