ines
417d45f5d0
Add lemmatizer data as variable on language data
...
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines
0c2343d73a
Tidy up language data
2017-10-11 02:22:49 +02:00
Matthew Honnibal
73bca3d382
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 12:51:37 -05:00
Matthew Honnibal
5156074df1
Make loading code more consistent in train command
2017-10-10 12:51:20 -05:00
Matthew Honnibal
d70fba6807
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 19:33:10 +02:00
Matthew Honnibal
8143618497
Set prefix length back to 1
2017-10-10 19:32:54 +02:00
Matthew Honnibal
97c9b5db8b
Patch spacy.train for new pipeline management
2017-10-09 23:41:16 -05:00
Matthew Honnibal
a635240398
Add conll_ner2json converter
2017-10-09 22:03:26 -05:00
Matthew Honnibal
dce8afb9cf
Set prefix length to 3
2017-10-09 21:55:55 -05:00
Matthew Honnibal
8265b90c83
Update parser defaults
2017-10-09 21:55:20 -05:00
Matthew Honnibal
dd2b0601d1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-09 21:30:46 -05:00
Matthew Honnibal
09d61ada5e
Merge pull request #1396 from explosion/feature/pipeline-management
...
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal
19136fd155
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 03:58:30 +02:00
Matthew Honnibal
8978212ee5
Patch serialization bug raised in #1105
2017-10-10 03:58:12 +02:00
Matthew Honnibal
f0f2739ae3
Add test for serialization issue raised in #1105
2017-10-10 03:57:58 +02:00
Matthew Honnibal
735d18654d
Add NER converter for CoNLL 2003 data
2017-10-09 20:06:28 -05:00
Matthew Honnibal
808d8740d6
Remove print statement
2017-10-09 08:45:20 -05:00
Matthew Honnibal
0f41b25f60
Add speed benchmarks to metadata
2017-10-09 08:05:37 -05:00
Matthew Honnibal
d8a2506023
Merge pull request #1401 from explosion/feature/add-parser-action
...
💫 Allow labels to be added to pre-trained parser and NER modes
2017-10-09 04:57:51 +02:00
Matthew Honnibal
689349e32f
Merge pull request #1400 from explosion/feature/sentence-parsing
...
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal
e79fc41ff8
Merge pull request #1391 from explosion/feature/multilabel-textcat
...
💫 Fix multi-label support for text classification
2017-10-09 04:22:31 +02:00
Matthew Honnibal
fad2b8315f
Merge branch 'develop' into feature/add-parser-action
2017-10-09 04:13:04 +02:00
Matthew Honnibal
6c79841c0d
Fix tests for history features
2017-10-09 04:12:24 +02:00
Matthew Honnibal
dde87e6b0d
Add tests for adding parser actions
2017-10-09 03:42:35 +02:00
Matthew Honnibal
b2b8506f2c
Remove whitespace
2017-10-09 03:35:57 +02:00
Matthew Honnibal
d43a83e37a
Allow parser.add_label for pretrained models
2017-10-09 03:35:40 +02:00
Matthew Honnibal
81a64119db
Fix string-to-unicode problem
2017-10-09 00:59:49 +02:00
Matthew Honnibal
02c2af7119
Fix test
2017-10-09 00:29:37 +02:00
Matthew Honnibal
4cc84b0234
Prohibit Break when sent_start < 0
2017-10-09 00:02:45 +02:00
Matthew Honnibal
5a67efeccc
Add tests for sentence segmentation presetting
2017-10-09 00:02:23 +02:00
Matthew Honnibal
e938bce320
Adjust parsing transition system to allow preset sentence segments.
2017-10-08 23:53:34 +02:00
Matthew Honnibal
080afd4924
Add ternary value setting to Token.sent_start
2017-10-08 23:51:58 +02:00
Matthew Honnibal
7ae67ec6a1
Add Span.as_doc method
2017-10-08 23:50:20 +02:00
Matthew Honnibal
20309fb9db
Make history features default to zero
2017-10-08 20:32:14 +02:00
Matthew Honnibal
e74c8d2fad
Merge remote-tracking branch 'origin/develop' into feature/sentence-parsing
2017-10-08 20:20:41 +02:00
Matthew Honnibal
18063803de
Make TokenC.sent_tart an int, to allow ternary value
2017-10-08 19:58:54 +02:00
Matthew Honnibal
be4f0b6460
Update defaults
2017-10-08 02:08:12 -05:00
Matthew Honnibal
42b401d08b
Change default hidden depth to 1
2017-10-07 21:05:21 -05:00
Matthew Honnibal
9d66a915da
Update training defaults
2017-10-07 21:02:38 -05:00
Matthew Honnibal
d163115e91
Add non-linearity after history features
2017-10-07 21:00:43 -05:00
Matthew Honnibal
92c5d78b42
Unhack NER.add_action
2017-10-07 19:02:40 +02:00
Matthew Honnibal
f2b590f672
Increment version
2017-10-07 19:01:01 +02:00
Matthew Honnibal
eb0595bea9
Merge pull request #1392 from explosion/feature/parser-history-model
...
💫 Parser history features
2017-10-07 15:07:02 +02:00
Matthew Honnibal
3d22ccf495
Update default hyper-parameters
2017-10-07 07:16:41 -05:00
Matthew Honnibal
09442d25ec
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-07 07:05:04 -05:00
Matthew Honnibal
3b67eabfea
Allow empty dictionaries to match any token in Matcher
...
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.
The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.
This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
ines
0adadcb3f0
Fix beam parse model test
2017-10-07 02:15:15 +02:00
ines
b38a8f4a94
Fix and update pipe methods tests
2017-10-07 02:06:23 +02:00
Matthew Honnibal
0384f08218
Trigger nonproj.deprojectivize as a postprocess
2017-10-07 02:00:47 +02:00
Matthew Honnibal
3a65a0c970
Start adding tests for new pipeline management
2017-10-07 01:48:23 +02:00