Commit Graph

8565 Commits

Author SHA1 Message Date
Matthew Honnibal
e70f80f29e Add Language.disable_pipes() 2017-10-25 13:46:41 +02:00
Matthew Honnibal
075e8118ea Update from develop 2017-10-25 12:45:21 +02:00
ines
72497c8cb2 Remove comments and add TODO 2017-10-25 12:15:43 +02:00
ines
4d97efc3b5 Add missing docstrings 2017-10-25 12:10:16 +02:00
ines
1262aa0bf9 Implement PhraseMatcher.__contains__ 2017-10-25 12:10:04 +02:00
ines
9c733a8849 Implement PhraseMatcher.__len__ 2017-10-25 12:09:56 +02:00
ines
7eebeeaf85 Fix Matcher.__contains__ 2017-10-25 12:09:47 +02:00
ines
7bcec57462 Remove unused attribute 2017-10-25 12:08:54 +02:00
ines
0b1dcbac14 Remove unused function 2017-10-25 12:08:46 +02:00
ines
3484174e48 Add Language.path 2017-10-25 11:57:43 +02:00
ines
4a06eddb5f Update README 2017-10-24 22:18:40 +02:00
Ines Montani
dee2896133 Update PULL_REQUEST_TEMPLATE.md 2017-10-24 21:52:12 +02:00
ines
1730648e19 Update pull request template 2017-10-24 21:49:11 +02:00
ines
972d9e832c Update README for v2.0 2017-10-24 21:49:11 +02:00
ines
63683a5151 Port over contributors from master 2017-10-24 21:49:11 +02:00
ines
c815ff65f6 Update feature list 2017-10-24 21:49:11 +02:00
Ines Montani
d3bf488e16 Merge pull request #1171 from mollerhoj/support-danish
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
ines
7459ecfa87 Port over contributor agreements 2017-10-24 20:13:34 +02:00
ines
d71702b827 Fix formatting 2017-10-24 20:11:04 +02:00
Matthew Honnibal
d9bb1e5de8 Increment version 2017-10-24 17:06:19 +02:00
Matthew Honnibal
908809d488 Update tests 2017-10-24 17:05:15 +02:00
Matthew Honnibal
66766c1454 Restore SP tag to English tag_map, until models migrate 2017-10-24 17:05:00 +02:00
ines
b51dcee3ce Fix unicode in lightning tour example (resolves #1356) 2017-10-24 16:25:49 +02:00
ines
ebd2e5ff54 Fix matcher docs (resolves #1453) 2017-10-24 16:22:46 +02:00
ines
90601cf1b3 Fix formatting 2017-10-24 16:22:37 +02:00
ines
0e081d0167 Update JSON training format docs (resolves #1291) 2017-10-24 16:17:54 +02:00
ines
91dbee1b8f Add BILUO docs to NER annotation scheme 2017-10-24 16:17:03 +02:00
ines
fdd8dacb75 Fix compilation of color utility class names 2017-10-24 16:13:52 +02:00
ines
a2e7e9be98 Update landing 2017-10-24 16:12:47 +02:00
Matthew Honnibal
30e67fa808 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 16:08:23 +02:00
Matthew Honnibal
b0f6fd3f1d Disable tokenizer cache for special-cases. Fixes #1250 2017-10-24 16:08:05 +02:00
Matthew Honnibal
63f0bde749 Add test for #1250: Tokenizer cache clobbered special-case attrs 2017-10-24 16:07:18 +02:00
ines
8492d5be6d Always make lemmatizer return a list of lemmas, not a set 2017-10-24 16:00:56 +02:00
ines
95f866f99f Add lookup argument to Lemmatizer.load 2017-10-24 16:00:56 +02:00
ines
95f6174516 Remove tensorizer from model pipeline example in spacy package 2017-10-24 16:00:56 +02:00
ines
6686e53530 Allow GitHub embeds to specify optional language 2017-10-24 16:00:56 +02:00
ines
56a47f137f Add title description for tokenizer 2017-10-24 16:00:56 +02:00
ines
3944c1d6e7 Document lemmatizer 2017-10-24 16:00:56 +02:00
ines
c9dc88ddfc Document current JSON format for training 2017-10-24 16:00:56 +02:00
ines
2b8e7c45e0 Use better training data JSON example 2017-10-24 16:00:56 +02:00
ines
090aed940a Add test for currently failing span.as_doc case 2017-10-24 16:00:56 +02:00
ines
4ef81a9ebc Fix whitespace 2017-10-24 16:00:56 +02:00
Matthew Honnibal
18f1c1d0ba Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 14:29:43 +02:00
Matthew Honnibal
4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal
391d5ef0d1 Normalize imports in regression test 2017-10-24 14:25:49 +02:00
ines
c55db0a4a1 Add example sentences for Japanese and Chinese (see #1107) 2017-10-24 13:02:24 +02:00
ines
66f8f9d4a0 Fix Japanese tokenizer
JapaneseTokenizer now returns a Doc, not individual words
2017-10-24 13:02:19 +02:00
Matthew Honnibal
5ae0b8613a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 12:41:07 +02:00
Matthew Honnibal
dd5b2d8fa3 Check for out-of-memory when calling calloc. Closes #1446 2017-10-24 12:40:47 +02:00
ines
9bf5751064 Pretty-print JSON 2017-10-24 12:22:17 +02:00