Commit Graph

3080 Commits

Author SHA1 Message Date
Matthew Honnibal
9dc8043a7e Refactor Language to use new Defaults class, and work on revised data loading. We're getting rid of sputnik's weird file-system wrapper, and using pathlib. 2016-09-24 14:08:53 +02:00
Matthew Honnibal
b00f683a0c Fix matcher test 2016-09-24 11:20:58 +02:00
Matthew Honnibal
eaf4065480 Expose the _patterns private member 2016-09-24 11:20:42 +02:00
Matthew Honnibal
15e42a1ba9 Allow entities to be set by Span, or by 4-tuple (with entity ID) 2016-09-24 01:17:43 +02:00
Matthew Honnibal
60fdf4d5f1 Remove commented-out debugging code 2016-09-24 01:17:18 +02:00
Matthew Honnibal
939a791a52 Update tests 2016-09-24 01:17:03 +02:00
Matthew Honnibal
55f1f7edaf Don't automatically write new entities into the Doc in the Matcher. This fixes a long-standing wart, but introduces a *backwards incompatibility.* 2016-09-24 01:16:45 +02:00
Matthew Honnibal
e48df859b5 Fix typedef import in span.pyx 2016-09-23 16:02:28 +02:00
Matthew Honnibal
4de13606fd Fix token.pyx 2016-09-23 15:07:07 +02:00
Matthew Honnibal
b4de419e19 Import hash_t typedef in token.pyx 2016-09-23 14:22:06 +02:00
Matthew Honnibal
4c2ad9f063 Build travis with trusty 2016-09-23 14:17:46 +02:00
Matthew Honnibal
c1a2e96604 Clean up notes at end of token.pyx 2016-09-21 20:45:51 +02:00
Matthew Honnibal
f6e587b1c7 Fix matcher tests 2016-09-21 20:45:20 +02:00
Matthew Honnibal
58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
Matthew Honnibal
2735b6247b Fix orths_and_spaces in Doc.__init__ 2016-09-21 14:52:05 +02:00
Matthew Honnibal
070af4af9d Revert "* Working neural net, but features hacky. Switching to extractor."
This reverts commit 7c2f1a673b.
2016-09-21 12:26:14 +02:00
Matthew Honnibal
6b202ec43f Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-09-21 12:08:25 +02:00
Matthew Honnibal
2f7ef4b150 Merge pull request #501 from lababidi/master
Add parameter to download()
2016-09-14 23:35:15 +02:00
Mahmoud Lababidi
4c9ccc3b8b Add parameter to download() for application to not exit if a Model exists. The default behavior is unchanged. 2016-09-14 10:04:09 -04:00
Matthew Honnibal
cccadbf2a2 Merge pull request #465 from izeye/patch-1
Fix doc
2016-08-10 05:17:22 +10:00
Matthew Honnibal
8c24fd2928 Merge pull request #447 from stared/patch-1
fixed sense2vec blog post code
2016-08-10 05:17:12 +10:00
Johnny Lim
4c53a8ecd7 Fix doc
This PR changes the `str`s to `unicode`s because `str`s throw the following error:

```
TypeError: Argument 'x' has incorrect type (expected unicode, got str)
```
2016-07-30 16:10:21 +09:00
Matthew Honnibal
d9fcf9b79e Merge pull request #454 from adamhadani/download-noerrorifexists
Exit code 0 for when downloading a model that already was downloaded
2016-07-14 09:52:57 +10:00
Adam Ever Hadani
f1c0762443 exit code 0 for when downloading a model that already was downloaded 2016-07-13 16:22:14 -07:00
Piotr Migdał
3cd88c82b8 fixed sense2vec blog post code
it was not working - variable `tokens` does not exist; now it is fine
2016-07-06 15:09:49 +02:00
Matthew Honnibal
b1d06ff9e9 Merge pull request #422 from RahulKulhari/patch-1
plac version in requirements
2016-06-18 23:19:03 +10:00
Rahul Kulhari
afebf9ad9a updated plac version
The current version of plac (0.9.3) causes problems, but versions <0.9.3 work.
2016-06-10 18:27:02 +05:30
Matthew Honnibal
7c2f1a673b * Working neural net, but features hacky. Switching to extractor. 2016-05-26 19:06:10 +02:00
Matthew Honnibal
8036368d96 * Fix model saving 2016-05-23 12:01:46 +00:00
Matthew Honnibal
35214053fd * Work around get_lex_attr bug introduced during German parsing 2016-05-23 10:53:00 +00:00
Matthew Honnibal
bc3c8d8adf Fix lemma of "coping"
Fix Issue #389: Incorrect lemma for "coping"
2016-05-20 19:03:41 +10:00
Matthew Honnibal
cdc10e9a1c * Fix Issue #375: noun phrase iteration results in index error if noun phrases are merged during the loop. Fix by accumulating the spans inside the noun_chunks property, allowing the Span index tricks to work. 2016-05-20 10:14:06 +02:00
Matthew Honnibal
13fad36e49 * Cosmetic change to english noun chunks iterator -- use enumerate instead of range loop 2016-05-20 10:11:05 +02:00
Matthew Honnibal
02276cc444 Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-05-17 16:56:22 +02:00
Matthew Honnibal
4d7f5468bb * Change Language class to use a .pipeline attribute, instead of having the pipeline hard coded 2016-05-17 16:55:42 +02:00
Matthew Honnibal
2d25339c47 Merge pull request #386 from daylen/master
Fix get_lang_class parsing (take 2)
2016-05-17 23:15:47 +10:00
Daylen Yang
5405e7dd73 Fix get_lang_class parsing (take 2) 2016-05-16 16:40:31 -07:00
Matthew Honnibal
88538b339e Merge pull request #385 from spacy-io/revert-384-master
Revert "Fix get_lang_class parsing"
2016-05-17 08:04:44 +10:00
Matthew Honnibal
b240104f40 Revert "Fix get_lang_class parsing" 2016-05-17 08:04:26 +10:00
Matthew Honnibal
9bd3c316c9 Merge pull request #384 from daylen/master
Fix get_lang_class parsing
2016-05-17 07:52:22 +10:00
Daylen Yang
bffbe9b9d0 Merge pull request #1 from daylen/fixed_get_lang_class_parse
Fix get_lang_class parsing
2016-05-16 14:40:20 -07:00
Daylen Yang
1692c2df3c Fix get_lang_class parsing
We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens.
2016-05-16 14:38:20 -07:00
Matthew Honnibal
17137f5c0c * Fix issue #372: mistake in Lexeme rich comparison 2016-05-12 12:58:57 +02:00
Matthew Honnibal
cc8bf62208 * Fix Issue #360: Tokenizer failed when the infix regex matched the start of the string while trying to tokenize multi-infix tokens. 2016-05-09 13:23:47 +02:00
Matthew Honnibal
eab2376547 * Allow longer ellipses to be treated as a single token, e.g. Hello......there 2016-05-09 13:22:53 +02:00
Matthew Honnibal
c61ee8f9fa * Increment version 2016-05-09 13:20:00 +02:00
Matthew Honnibal
f6ef64f02c * Update changelog in preparation for 0.101.0 release 2016-05-09 12:57:07 +02:00
Matthew Honnibal
5d86c30f0b * Fix Issue #367: Missing has_vector property on Doc and Span objects 2016-05-09 12:36:14 +02:00
Wolfgang Seeker
7b78239436 add fix for German noun chunk iterator (issue #365) 2016-05-06 01:41:26 +02:00
Matthew Honnibal
8c0888d6cb * Fix error in span.sent 2016-05-06 00:28:05 +02:00
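
Note on 1692c2df3c: the split rule described in that commit message can be illustrated with a minimal sketch. The function name `lang_code_from_model_name` is hypothetical and is not spaCy's actual `get_lang_class` code; the sketch only assumes that the language code is whatever precedes the first underscore in a model name.

```python
def lang_code_from_model_name(name):
    """Return the language-code prefix of a model name (hypothetical helper)."""
    # Assumption: the language code is the text before the first "_".
    # A name that contains no "_" is returned unchanged.
    return name.split("_")[0]


# Both names from the commit message reduce to the same language code.
print(lang_code_from_model_name("en"))
print(lang_code_from_model_name("en_glove_cc_300_1m_vectors"))
```

Both calls print `en`, which is the behaviour the commit message asks for. Splitting on "_" treats anything after the first underscore as a model variant, so this approach would presumably need revisiting for language codes that themselves contain an underscore.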