Matthew Honnibal
e85dd038fe
Merge remote-tracking branch 'origin/master' into feature/single-thread
2018-03-16 02:41:11 +01:00
Matthew Honnibal
e3be3d65b3
Version as v2.0.10.dev0
2018-03-15 17:31:22 +01:00
ines
f3f8bfc367
Add built-in factories for merge_entities and merge_noun_chunks
...
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 17:16:54 +01:00
Ines Montani
0d17377e8b
Merge pull request #2095 from DuyguA/quick-typo-fix (resolves #2063)
Quick typo fix
2018-03-15 00:29:56 +01:00
ines
d854f69fe3
Add built-in factories for merge_entities and merge_noun_chunks
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 00:18:51 +01:00
ines
9ad5df41fe
Fix whitespace
2018-03-15 00:11:18 +01:00
Matthew Honnibal
d7ce6527fb
Use increasing batch sizes in ud-train
2018-03-14 20:15:28 +01:00
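The commit message does not say how the batch sizes grow. The sketch below shows the general idea under one assumption: batch sizes grow geometrically from a start value up to a cap. `compounding` and `minibatch_growing` are hypothetical helper names used only for this illustration. They are not taken from the ud-train script.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def compounding(start: float, stop: float, compound: float) -> Iterator[float]:
    """Hypothetical helper: yield start, start*compound, ... clipped at stop, forever."""
    curr = float(start)
    while True:
        yield min(curr, stop)
        curr *= compound


def minibatch_growing(items: Iterable[T], sizes: Iterator[float]) -> Iterator[List[T]]:
    """Hypothetical helper: split items into batches whose sizes come from `sizes`."""
    it = iter(items)
    for size in sizes:
        # Always take at least one item so a size below 1 cannot stall the loop.
        batch = list(islice(it, max(1, int(size))))
        if not batch:
            return
        yield batch
```

Here `minibatch_growing(range(20), compounding(1, 8, 2))` yields batches of 1, 2, 4, 8, and finally the remaining 5 items. Small early batches give many updates while the model is far from a good solution. Larger later batches give more stable gradient estimates.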
alldefector
f4e5904fc2
Fix Spanish noun_chunks failure caused by typo
2018-03-14 17:03:17 +01:00
Thomas Opsomer
fbf48b3f9f
lemma property to return hash instead of unicode
2018-03-14 17:03:00 +01:00
Matthew Honnibal
8cefc58abc
Fix Vectors pickling
2018-03-14 16:59:37 +01:00
DuyguA
be4f6da16b
maybe not a good idea to remove "also"
2018-03-14 14:47:24 +01:00
DuyguA
1a513f71e3
removed "also" from lookup
2018-03-14 11:57:15 +01:00
DuyguA
cca66abf1e
quick typo fix
2018-03-14 11:34:22 +01:00
Matthew Honnibal
7b755414eb
Update call into thinc
2018-03-13 13:59:59 +01:00
Matthew Honnibal
e101f10ef0
Fix header
2018-03-13 02:12:16 +01:00
Matthew Honnibal
952c87409e
Use openblas.sgemm in parser
2018-03-13 02:12:01 +01:00
Matthew Honnibal
d55620041b
Switch parser to gemm from thinc.openblas
2018-03-13 02:10:58 +01:00
Matthew Honnibal
c2f4759257
Fix test for Python 2
2018-03-12 23:03:05 +01:00
Matthew Honnibal
9aeec9c242
Increment dev version
2018-03-11 01:58:21 +01:00
Matthew Honnibal
f49d71fa7c
Merge branch 'master' of https://github.com/explosion/spaCy
2018-03-11 01:27:17 +01:00
Matthew Honnibal
5dddb30e5b
Fix ud-train script
2018-03-11 01:26:45 +01:00
Matthew Honnibal
e42960bd14
Merge pull request #2012 from alldefector/patch-1
Fix Spanish noun_chunks failure caused by typo
2018-03-11 01:05:19 +01:00
Matthew Honnibal
2cab4d6517
Remove use of attr module in ud_train
2018-03-11 00:59:39 +01:00
Matthew Honnibal
fa9fd21620
Increment dev version
2018-03-11 00:41:54 +01:00
Matthew Honnibal
53b3249e06
Add tests for arc eager oracle
2018-03-10 23:42:56 +01:00
Matthew Honnibal
754ea1b2f7
Link in spaCy CoNLL commands
2018-03-10 23:42:15 +01:00
Matthew Honnibal
3478ea76d1
Add ud_train and ud_evaluate CLI commands
2018-03-10 23:41:55 +01:00
Matthew Honnibal
4b72c38556
Fix dropout bug in beam parser
2018-03-10 23:16:40 +01:00
Matthew Honnibal
9cc202d670
Fix Vectors pickling
2018-03-10 22:53:42 +01:00
Matthew Honnibal
3d6487c734
Support dropout in beam parse
2018-03-10 22:41:55 +01:00
Matthew Honnibal
31b156d60b
Fix itershuffle
2018-03-10 22:32:59 +01:00
Matthew Honnibal
b59765ca9f
Stream gold during spacy train
2018-03-10 22:32:45 +01:00
Matthew Honnibal
c3d168509a
Stream the gold data during training, to reduce memory
2018-03-10 22:32:32 +01:00
DuyguA
cba63196f9
fixed typo
2018-03-09 10:54:18 +01:00
DuyguA
7a780476af
added more abbreviations
2018-03-09 10:13:00 +01:00
DuyguA
cca87756d7
added Sti
2018-03-08 18:07:52 +01:00
DuyguA
3c994311c5
added abbrevs
2018-03-08 18:03:27 +01:00
DuyguA
56d6fb180e
added like_num to lex
2018-03-08 15:25:25 +01:00
DuyguA
26ee0590a3
added some commonly used cases
2018-03-08 12:43:58 +01:00
DuyguA
ae6473e4d5
removed some words with negation particle.
2018-03-08 12:20:32 +01:00
DuyguA
6ed59a2198
removed number words to be carried to the lexical
2018-03-08 12:19:23 +01:00
DuyguA
04784a44a6
used alphabetical order for Turkish characters
2018-03-08 12:11:32 +01:00
DuyguA
af33e022a5
added example sentences for Turkish
2018-03-08 12:06:03 +01:00
Matthew Honnibal
a1be01185c
Fix array out of bounds error in Span
2018-02-28 12:27:09 +01:00
Thomas Opsomer
8df9e52829
lemma property to return hash instead of unicode
2018-02-27 19:50:01 +01:00
Ines Montani
35634352fe
Merge pull request #2025 from dejanmarich/patch-1
Update stop_words.py for Croatian language
2018-02-26 18:22:32 +01:00
Matthew Honnibal
14f729c72a
Add subtok label to parser
2018-02-26 12:26:35 +01:00
Matthew Honnibal
7137ad8b0b
Make label filtering clearer for projectivisation
2018-02-26 12:02:01 +01:00
Matthew Honnibal
b8d52cb285
Fix inconsistent label freq cutoff for projectivisation
2018-02-26 12:01:44 +01:00
Matthew Honnibal
7b66ec896a
Revert "Revert "Improve parser oracle around sentence breaks.""
This reverts commit 36e481c584.
2018-02-26 10:57:37 +01:00
Matthew Honnibal
36e481c584
Revert "Improve parser oracle around sentence breaks."
This reverts commit 50817dc9ad.
2018-02-26 10:53:55 +01:00
Matthew Honnibal
5faae803c6
Add option to not use Janome for Japanese tokenization
2018-02-26 09:39:46 +01:00
Matthew Honnibal
9b406181cd
Add Chinese.Defaults.use_jieba setting, for UD
2018-02-25 15:12:38 +01:00
Matthew Honnibal
9ccd0c643b
Add Vietnamese
2018-02-25 15:00:46 +01:00
Matthew Honnibal
d4fdb97c87
Fix alignment for words with spaces
2018-02-25 14:55:00 +01:00
Matthew Honnibal
6d2c1ef52c
Fix SP tag in generic tag map
2018-02-24 16:04:56 +01:00
Matthew Honnibal
5cc3bd1c1d
Update alignment tests
2018-02-24 16:03:58 +01:00
Matthew Honnibal
6138439469
Fix many-to-one alignment
2018-02-24 16:03:50 +01:00
Matthew Honnibal
4890ee1732
Fix scoring of tokenization for punct
2018-02-24 10:32:32 +01:00
Matthew Honnibal
12b39f87da
Move cython declarations in matcher.pyx
2018-02-24 10:32:18 +01:00
Matthew Honnibal
01d1b7abdf
Support many-to-one alignment in GoldParse
2018-02-24 10:17:01 +01:00
Matthew Honnibal
7865746574
Support many-to-one alignment
2018-02-24 02:09:53 +01:00
Matthew Honnibal
458710b831
Poke matcher test for appveyor
2018-02-23 23:53:48 +01:00
Matthew Honnibal
968dabdde4
Fix bug in multi-task objective
2018-02-23 23:48:09 +01:00
Matthew Honnibal
2c9c8b8d72
Try commenting out emoji test in matcher
2018-02-23 23:34:35 +01:00
Matthew Honnibal
980ad68cbe
Try to find test that fails on appveyor
2018-02-23 21:27:53 +01:00
Matthew Honnibal
39de8cd4d3
Try to find test failing on appveyor
2018-02-23 20:59:21 +01:00
Matthew Honnibal
4492a33a9d
Fix sent_start multi-task objective when alignment fails
2018-02-23 16:50:59 +01:00
Matthew Honnibal
5fa44e93f1
Set unicode_literals in matcher
2018-02-23 16:48:54 +01:00
Matthew Honnibal
12264f9296
Add multi-task objective for sentence segmentation
2018-02-23 16:25:57 +01:00
Matthew Honnibal
e7deadb519
Set version to 2.1.0.dev1
2018-02-23 16:22:24 +01:00
Matthew Honnibal
7b575a119e
Try to reduce memory usage of test_matcher
2018-02-23 15:34:37 +01:00
Matthew Honnibal
24563f4026
Fix data typing in align
2018-02-23 15:08:06 +01:00
Matthew Honnibal
7a5ba20692
Fix integer typing in _align
2018-02-23 14:51:24 +01:00
Matthew Honnibal
875411b875
Set unicode types in _align.pyx and test
2018-02-23 14:35:38 +01:00
Matthew Honnibal
51d9679aa3
Fix broken span.as_doc test
2018-02-23 14:22:24 +01:00
dejanmarich
71c261d58b
Update stop_words.py
Added more words
2018-02-23 10:31:01 +01:00
Matthew Honnibal
3e6c1111b7
Remove obsolete test
2018-02-23 03:22:07 +01:00
Matthew Honnibal
a4fdec524a
Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-gold
2018-02-22 21:44:28 +01:00
Matthew Honnibal
50817dc9ad
Improve parser oracle around sentence breaks.
2018-02-22 19:22:26 +01:00
Matthew Honnibal
307aefe131
Increment version to v2.0.9
2018-02-22 17:07:53 +01:00
Feng Niu
1c60384bed
return on empty doc
2018-02-21 15:39:04 -08:00
Feng Niu
7eb1cd100b
unbound doc var
2018-02-21 15:05:37 -08:00
Feng Niu
8df75b229c
fix unbound vars in es.syntax_iterators
2018-02-21 13:11:17 -08:00
alldefector
4244e285c2
Fix Spanish noun_chunks failure caused by typo
2018-02-21 12:43:21 -08:00
Matthew Honnibal
661873ee4c
Randomize the rebatch size in parser
2018-02-21 21:02:07 +01:00
Matthew Honnibal
0872cf611d
Don't lower-case lemmas of proper nouns
2018-02-21 16:01:16 +01:00
Matthew Honnibal
a0ddb803fd
Make error when no label found more helpful
2018-02-21 16:00:59 +01:00
Matthew Honnibal
ea2fc5d45f
Improve length and freq cutoffs in parser
2018-02-21 16:00:38 +01:00
Matthew Honnibal
e5757d4bf0
Add labels property to parser
2018-02-21 16:00:00 +01:00
Matthew Honnibal
eff4ae809a
Fix nonproj label filter
2018-02-21 15:59:04 +01:00
Matthew Honnibal
e624405cda
Temporarily remove cutoff when filtering labels in nonproj
2018-02-21 13:53:40 +01:00
Matthew Honnibal
f466f0186e
Use new alignment implementation in GoldParse
2018-02-20 21:16:35 +01:00
Matthew Honnibal
c0734ba526
Make alignment work with strings
2018-02-20 17:51:49 +01:00
Matthew Honnibal
8180c84a98
Add tests for new Levenshtein alignment
2018-02-20 17:32:25 +01:00
Matthew Honnibal
930c980570
Add improved Levenshtein alignment implementation
2018-02-20 17:31:56 +01:00
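The commit message only says that an improved Levenshtein alignment was added. The sketch below illustrates the underlying technique: a standard dynamic-programming edit distance over two token sequences, plus a traceback that recovers which tokens match exactly. It assumes unit costs for insertion, deletion, and substitution. `levenshtein_align` is a hypothetical name, and this is not the code from 930c980570. It also does not handle the many-to-one alignment added in later commits.

```python
from typing import List, Sequence, Tuple


def levenshtein_align(a: Sequence[str], b: Sequence[str]) -> Tuple[int, List[Tuple[int, int]]]:
    """Hypothetical helper: return (edit distance, exact-match index pairs) for token lists a, b."""
    n, m = len(a), len(b)
    # dist[i][j] = minimum cost of turning a[:i] into b[:j] (unit costs assumed)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete a[i-1]
                dist[i][j - 1] + 1,        # insert b[j-1]
                dist[i - 1][j - 1] + sub,  # match or substitute
            )
    # Walk back from the bottom-right cell, recording tokens that match exactly.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1  # substitution
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # deletion
        else:
            j -= 1  # insertion
    pairs.reverse()
    return dist[n][m], pairs
```

Take `levenshtein_align(["I", "can't", "go"], ["I", "ca", "n't", "go"])` as an example. It returns distance 2, and tokens 0 and 2 of the first list align with tokens 0 and 3 of the second. The split "can't" gets no one-to-one match, which is the case that many-to-one alignment is meant to cover.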
Ines Montani
14e7e0f12a
Merge pull request #2000 from jimregan/polish-tag-map
Polish tag map
2018-02-18 19:05:58 +01:00
Jim O'Regan
664407de5d
missing PrepCase attribute
2018-02-18 14:46:12 +00:00
Jim O'Regan
95f0673fbc
fix typo/missing here too
2018-02-18 14:38:27 +00:00
Matthew Honnibal
2bccad8815
Fix incorrect matcher test
2018-02-18 14:56:12 +01:00