8743 Commits

Author SHA1 Message Date
Matthew Honnibal
659ec5b975 Avoid importing fused token symbol in ud-run-test, until that's added 2018-05-08 19:40:33 +02:00
Matthew Honnibal
4cb0494bef Bug fixes to beam search after refactor 2018-05-08 13:48:50 +02:00
Matthew Honnibal
5ed71973b3 Add a keyword argument sink to GoldParse 2018-05-08 13:48:32 +02:00
Matthew Honnibal
8cfe326f87 Avoid relying on final gold check in beam search 2018-05-08 13:48:19 +02:00
Matthew Honnibal
fc4dd49b77 Support oracle segmentation in ud-train CLI command 2018-05-08 13:47:45 +02:00
Matthew Honnibal
c49e44349a Fix beam parsing 2018-05-08 02:53:24 +02:00
Matthew Honnibal
99649d114d Fix parser 2018-05-08 00:27:26 +02:00
Matthew Honnibal
8a82367a9d Fix beam search after refactor 2018-05-08 00:20:33 +02:00
Matthew Honnibal
5a0f26be0c Re-add beam search after refactor 2018-05-08 00:19:52 +02:00
ines
7a3599c21a Fix formatting and consistency 2018-05-07 23:02:11 +02:00
ines
37facf9b4d Add config for no-response [ci skip] 2018-05-07 22:04:54 +02:00
ines
ac25bc4016 Add docs section on sentence segmentation [ci skip] 2018-05-07 21:25:20 +02:00
ines
14148cd147 Fix formatting and wording 2018-05-07 21:24:35 +02:00
ines
f803da609f Add scattertext [ci skip] 2018-05-07 19:10:23 +02:00
ines
a685fff875 Merge branch 'master' of https://github.com/explosion/spaCy 2018-05-07 18:58:57 +02:00
Matthew Honnibal
36b2c9bdd5 Fix refactored parser 2018-05-07 18:58:09 +02:00
ines
e2241c797c Add lock-threads configuration [ci skip] 2018-05-07 18:54:22 +02:00
Matthew Honnibal
bde3be1ad1 Fix refactored parser 2018-05-07 18:31:04 +02:00
B!
414f5270b3 B Cavello's signed Contributor Agreement v2 (#2302)
This time hopefully created in the right spot. (Sorry about that!)
2018-05-07 17:48:54 +02:00
Matthew Honnibal
01c4e13b02 Update test 2018-05-07 16:59:52 +02:00
Matthew Honnibal
f6cdafc00e Fix refactored parser 2018-05-07 16:59:38 +02:00
Matthew Honnibal
3e3771c010 Compile updated parser 2018-05-07 15:54:27 +02:00
Matthew Honnibal
f56bd4736b Improve dynamic oracle when values are missing in parse 2018-05-07 15:53:18 +02:00
Matthew Honnibal
eddc0e0c74 Set gold.sent_starts in ud_train 2018-05-07 15:52:47 +02:00
Matthew Honnibal
bf19f22340 Allow gold.sent_starts to be set from Python 2018-05-07 15:51:34 +02:00
Matthew Honnibal
7f163442e6 Work on refactoring greedy parser 2018-05-07 15:45:52 +02:00
Matt Upson
9a1d3b63fb Add missing default to .set_extension (#2297)
Failing to set a default, method, or getter results in a ValueError:

ValueError: [E083] Error setting extension: only one of `default`, `method`, or `getter` (plus optional `setter`) is allowed. Got: 0
2018-05-04 18:47:01 +02:00
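The `[E083]` error quoted in the commit above comes down to a count of how many of `default`, `method`, and `getter` were supplied. The following is a minimal plain-Python sketch of that rule, not spaCy's actual implementation: the helper names `count_extension_options` and `check_extension_args` are hypothetical, and only the error wording is taken from the commit message.

```python
def count_extension_options(**kwargs):
    """Count how many of default/method/getter were supplied (hypothetical helper)."""
    supplied = (
        "default" in kwargs,                # default=None still counts as supplied
        kwargs.get("method") is not None,
        kwargs.get("getter") is not None,
    )
    return sum(supplied)


def check_extension_args(**kwargs):
    """Raise ValueError unless exactly one option is given (assumed rule)."""
    n = count_extension_options(**kwargs)
    if n != 1:
        raise ValueError(
            "Error setting extension: only one of `default`, `method`, "
            "or `getter` (plus optional `setter`) is allowed. Got: %d" % n
        )
    return n


# Supplying a default (even None) satisfies the check.
check_extension_args(default=None)
```

Under this assumption, calling `check_extension_args()` with no arguments raises the `Got: 0` error shown above, which is why the fix adds an explicit default.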
ines
929a01139a Order issue templates 2018-05-04 03:04:41 +02:00
Ines Montani
7f39c8896b Update issue templates (#2295)
* Update issue templates

* Update templates
2018-05-04 03:02:26 +02:00
Douglas Knox
9b49a40f4e Test and fix for Issue #2219 (#2272)
Test and fix for Issue #2219: Token.similarity() failed if single letter
2018-05-03 18:40:46 +02:00
Paul O'Leary McCann
bd72fbf09c Port Japanese mecab tokenizer from v1 (#2036)
* Port Japanese mecab tokenizer from v1

This brings the Mecab-based Japanese tokenization introduced in #1246 to
spaCy v2. There isn't a JapaneseTagger implementation yet, but POS tag
information from Mecab is stored in a token extension. A tag map is also
included.

As a reminder, Mecab is required because Universal Dependencies are
based on Unidic tags, and Janome doesn't support Unidic.

Things to check:

1. Is this the right way to use a token extension?

2. What's the right way to implement a JapaneseTagger? The approach in
 #1246 relied on `tag_from_strings` which is just gone now. I guess the
best thing is to just try training spaCy's default Tagger?

-POLM

* Add tagging/make_doc and tests
2018-05-03 18:38:26 +02:00
G.Pruvost
cc8e804648 #2211 - Support for ssl certs config on download command (#2212)
* Add support for SSL/Certs customization on download CLI

* Add a note on SSL options for the 'download' CLI in the README

* Add contributor agreement
2018-05-03 18:37:02 +02:00
Jens Dahl Møllerhøj
b9290397fb rename SP to _SP (#2289) 2018-05-03 18:33:49 +02:00
ines
c9547b7b8b Update Juniper (see #2293) 2018-05-03 15:36:02 +02:00
Matthew Honnibal
a8e70a4187 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-05-03 14:02:10 +02:00
Matthew Honnibal
c0e596283b Set version to 2.1.0a0 2018-05-03 14:00:11 +02:00
Alex Villarreal
647f2544c5 Fix code sample for span.set_extension (#2286) 2018-05-03 00:39:22 +02:00
Matthew Honnibal
8cd06cc763 Try to fix root-outside-sentence bug 2018-05-02 14:39:48 +00:00
Matthew Honnibal
acebd01033 Set children from heads in finalize doc 2018-05-02 14:19:22 +00:00
Alex Villarreal
13d562e1a4 Fix code sample for Doc.set_extension (#2282)
* Fix code sample for `set_extension`

The previous sample code for `set_extension` fails the assertion at the end, because `city_getter` checked whether the whole document text matched one of the city names. Now it checks whether any of the city names is contained in the document text.

* Contributor agreement
2018-05-02 10:16:05 +02:00
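The difference described in commit 13d562e1a4 can be shown without spaCy itself. This is a minimal sketch on plain strings: the function names, the city list, and the sample sentence are assumptions chosen for illustration, standing in for a getter that would receive a `Doc` and read its text.

```python
CITIES = ("New York", "Paris", "Berlin")  # illustrative list, not from the commit


def exact_match_getter(text):
    """Old behaviour as described: true only if the whole text is a city name."""
    return text in CITIES


def containment_getter(text):
    """New behaviour as described: true if any city name occurs in the text."""
    return any(city in text for city in CITIES)


text = "I like New York"
print(exact_match_getter(text), containment_getter(text))
```

With a full sentence as input, the exact-match version cannot succeed, which matches the failed assertion the commit message reports.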
Matthew Honnibal
569440a6db Don't normalize gradient by batch size 2018-05-02 08:42:10 +02:00
Matthew Honnibal
281e29cbcd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-05-02 01:36:23 +00:00
Matthew Honnibal
2338e8c7fc Update develop from master 2018-05-02 01:36:12 +00:00
Matthew Honnibal
9d147e12c4 Merge remote-tracking branch 'origin/master' into develop 2018-05-01 18:18:51 +02:00
Matthew Honnibal
8562faeb39 Fix conll2017 fab command 2018-05-01 18:04:58 +02:00
Matthew Honnibal
116ae46802 Improve experiment management 2018-05-01 17:51:22 +02:00
Matthew Honnibal
6d0fe67b72 Constrain subtok label to adjacent tokens 2018-05-01 17:34:27 +02:00
Matthew Honnibal
8f21953fc5 Constrain subtok to adjacent words 2018-05-01 17:29:00 +02:00
Matthew Honnibal
b43bfd3524 Fix arc-eager oracle tests 2018-05-01 16:16:14 +02:00
Matthew Honnibal
31ed64e9b0 Fix textcat test 2018-05-01 15:18:39 +02:00