Matthew Honnibal
df88690177
Fix encoding of path variable
2016-09-24 21:13:15 +02:00
Matthew Honnibal
af847e07fc
Fix usage of pathlib for Python3 -- turning paths to strings.
2016-09-24 21:05:27 +02:00
Matthew Honnibal
453683aaf0
Fix spacy/vocab.pyx
2016-09-24 20:50:31 +02:00
Matthew Honnibal
d310dc73ef
Fix bin/init_model.py after refactoring
2016-09-24 20:38:18 +02:00
Matthew Honnibal
fd65cf6cbb
Finish refactoring data loading
2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
2016-09-24 15:42:01 +02:00
Matthew Honnibal
9dc8043a7e
Refactor Language to use new Defaults class, and work on revised data loading. We're getting rid of sputnik's weird file-system wrapper, and using pathlib.
2016-09-24 14:08:53 +02:00
Matthew Honnibal
b00f683a0c
Fix matcher test
2016-09-24 11:20:58 +02:00
Matthew Honnibal
eaf4065480
Expose the _patterns private member
2016-09-24 11:20:42 +02:00
Matthew Honnibal
15e42a1ba9
Allow entities to be set by Span, or by 4-tuple (with entity ID)
2016-09-24 01:17:43 +02:00
Matthew Honnibal
60fdf4d5f1
Remove commented-out debugging code
2016-09-24 01:17:18 +02:00
Matthew Honnibal
939a791a52
Update tests
2016-09-24 01:17:03 +02:00
Matthew Honnibal
55f1f7edaf
Don't automatically write new entities into the Doc in the Matcher. This fixes a long-standing wart, but introduces a *backwards incompatibility.*
2016-09-24 01:16:45 +02:00
Matthew Honnibal
e48df859b5
Fix typedef import in span.pyx
2016-09-23 16:02:28 +02:00
Matthew Honnibal
4de13606fd
Fix token.pyx
2016-09-23 15:07:07 +02:00
Matthew Honnibal
b4de419e19
Import hash_t typedef in token.pyx
2016-09-23 14:22:06 +02:00
Matthew Honnibal
4c2ad9f063
Build travis with trusty
2016-09-23 14:17:46 +02:00
Matthew Honnibal
c1a2e96604
Clean up notes at end of token.pyx
2016-09-21 20:45:51 +02:00
Matthew Honnibal
f6e587b1c7
Fix matcher tests
2016-09-21 20:45:20 +02:00
Matthew Honnibal
58e83fe34b
Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.
2016-09-21 14:54:55 +02:00
Matthew Honnibal
2735b6247b
Fix orths_and_spaces in Doc.__init__
2016-09-21 14:52:05 +02:00
Matthew Honnibal
070af4af9d
Revert "* Working neural net, but features hacky. Switching to extractor."
This reverts commit 7c2f1a673b.
2016-09-21 12:26:14 +02:00
Matthew Honnibal
6b202ec43f
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
2016-09-21 12:08:25 +02:00
Matthew Honnibal
2f7ef4b150
Merge pull request #501 from lababidi/master
Add parameter to download()
2016-09-14 23:35:15 +02:00
Mahmoud Lababidi
4c9ccc3b8b
Add a parameter to download() so that the application does not exit if a model already exists. The default behavior is unchanged.
2016-09-14 10:04:09 -04:00
Matthew Honnibal
cccadbf2a2
Merge pull request #465 from izeye/patch-1
Fix doc
2016-08-10 05:17:22 +10:00
Matthew Honnibal
8c24fd2928
Merge pull request #447 from stared/patch-1
fixed sense2vec blog post code
2016-08-10 05:17:12 +10:00
Johnny Lim
4c53a8ecd7
Fix doc
This PR changes the `str`s to `unicode`s because `str`s throw the following error:
```
TypeError: Argument 'x' has incorrect type (expected unicode, got str)
```
2016-07-30 16:10:21 +09:00
Matthew Honnibal
d9fcf9b79e
Merge pull request #454 from adamhadani/download-noerrorifexists
Exit code 0 for when downloading a model that already was downloaded
2016-07-14 09:52:57 +10:00
Adam Ever Hadani
f1c0762443
exit code 0 for when downloading a model that already was downloaded
2016-07-13 16:22:14 -07:00
Piotr Migdał
3cd88c82b8
fixed sense2vec blog post code
it was not working - variable `tokens` does not exist; now it is fine
2016-07-06 15:09:49 +02:00
Matthew Honnibal
b1d06ff9e9
Merge pull request #422 from RahulKulhari/patch-1
plac version in requirements
2016-06-18 23:19:03 +10:00
Rahul Kulhari
afebf9ad9a
updated plac version
The current version of plac (0.9.3) causes problems, but versions <0.9.3 work.
2016-06-10 18:27:02 +05:30
Matthew Honnibal
7c2f1a673b
* Working neural net, but features hacky. Switching to extractor.
2016-05-26 19:06:10 +02:00
Matthew Honnibal
8036368d96
* Fix model saving
2016-05-23 12:01:46 +00:00
Matthew Honnibal
35214053fd
* Work around get_lex_attr bug introduced during German parsing
2016-05-23 10:53:00 +00:00
Matthew Honnibal
bc3c8d8adf
Fix lemma of "coping"
Fix Issue #389 : Incorrect lemma for "coping"
2016-05-20 19:03:41 +10:00
Matthew Honnibal
cdc10e9a1c
* Fix Issue #375 : noun phrase iteration results in index error if noun phrases are merged during the loop. Fix by accumulating the spans inside the noun_chunks property, allowing the Span index tricks to work.
2016-05-20 10:14:06 +02:00
Matthew Honnibal
13fad36e49
* Cosmetic change to english noun chunks iterator -- use enumerate instead of range loop
2016-05-20 10:11:05 +02:00
Matthew Honnibal
02276cc444
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
2016-05-17 16:56:22 +02:00
Matthew Honnibal
4d7f5468bb
* Change Language class to use a .pipeline attribute, instead of having the pipeline hard coded
2016-05-17 16:55:42 +02:00
Matthew Honnibal
2d25339c47
Merge pull request #386 from daylen/master
Fix get_lang_class parsing (take 2)
2016-05-17 23:15:47 +10:00
Daylen Yang
5405e7dd73
Fix get_lang_class parsing (take 2)
2016-05-16 16:40:31 -07:00
Matthew Honnibal
88538b339e
Merge pull request #385 from spacy-io/revert-384-master
Revert "Fix get_lang_class parsing"
2016-05-17 08:04:44 +10:00
Matthew Honnibal
b240104f40
Revert "Fix get_lang_class parsing"
2016-05-17 08:04:26 +10:00
Matthew Honnibal
9bd3c316c9
Merge pull request #384 from daylen/master
Fix get_lang_class parsing
2016-05-17 07:52:22 +10:00
Daylen Yang
bffbe9b9d0
Merge pull request #1 from daylen/fixed_get_lang_class_parse
Fix get_lang_class parsing
2016-05-16 14:40:20 -07:00
Daylen Yang
1692c2df3c
Fix get_lang_class parsing
We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens.
2016-05-16 14:38:20 -07:00
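The split rule described in the commit above can be sketched roughly as follows. This is an illustration only, not the actual spaCy code: the helper name `lang_prefix` is hypothetical, and the sketch assumes the language code is always the part of the name before the first underscore.

```python
def lang_prefix(name):
    """Hypothetical helper: return the language code at the start of a model name.

    Assumes the language code is everything before the first underscore,
    so a bare code such as "en" is returned unchanged.
    """
    return name.split("_")[0]


# Both a bare language code and a longer model name yield the same prefix.
assert lang_prefix("en") == "en"
assert lang_prefix("en_glove_cc_300_1m_vectors") == "en"
```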
Matthew Honnibal
17137f5c0c
* Fix issue #372 : mistake in Lexeme rich comparison
2016-05-12 12:58:57 +02:00
Matthew Honnibal
cc8bf62208
* Fix Issue #360 : Tokenizer failed when the infix regex matched the start of the string while trying to tokenize multi-infix tokens.
2016-05-09 13:23:47 +02:00