Matthew Honnibal
|
4ad7bb96c9
|
Increment version.
|
2016-10-20 20:48:30 +02:00 |
|
Matthew Honnibal
|
5ec32f5d97
|
Fix loading of GloVe vectors, to address Issue #541
|
2016-10-20 18:27:48 +02:00 |
|
Matthew Honnibal
|
ddeabd76c4
|
Fix mistake loading GloVe vectors. GloVe vectors now loaded by default if present, as promised.
|
2016-10-20 16:57:53 +02:00 |
|
Matthew Honnibal
|
bfe5cb1244
|
Increment version.
|
2016-10-20 14:52:00 +02:00 |
|
Matthew Honnibal
|
f189a3cb00
|
Fix encoding when opening files in Python 2.7, re Issue #539
|
2016-10-20 14:42:56 +02:00 |
|
Matthew Honnibal
|
c353a5214d
|
Increment version
|
2016-10-19 23:51:01 +02:00 |
|
Matthew Honnibal
|
d10c17f2a4
|
Fix Issue #536: oov_prob was 0 for OOV words.
|
2016-10-19 23:38:47 +02:00 |
|
Matthew Honnibal
|
dfa752d064
|
Increment version
|
2016-10-19 23:19:13 +02:00 |
|
Matthew Honnibal
|
3588a18fb8
|
Fix hook names in doc
|
2016-10-19 21:15:16 +02:00 |
|
Matthew Honnibal
|
5d5742b773
|
Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc.
|
2016-10-19 20:54:22 +02:00 |
|
Matthew Honnibal
|
ed5e178817
|
Add sentiment property on lexeme object
|
2016-10-19 20:52:52 +02:00 |
|
Matthew Honnibal
|
d4aaf2752c
|
Fix issue #535: Pipeline elements added even when data not installed.
|
2016-10-19 19:55:19 +02:00 |
|
Matthew Honnibal
|
04d1c959da
|
Fix version
|
2016-10-19 03:45:37 +02:00 |
|
Matthew Honnibal
|
d35aa7344e
|
Change version ID to make PyPi happy
|
2016-10-19 03:24:39 +02:00 |
|
Matthew Honnibal
|
89d2a5c8b3
|
Increment build version.
|
2016-10-19 03:05:17 +02:00 |
|
Matthew Honnibal
|
622b0a9674
|
Tweak download script
|
2016-10-19 00:52:16 +02:00 |
|
Matthew Honnibal
|
5a5c7192a5
|
Fix download.py for GloVe vectors.
|
2016-10-19 00:47:44 +02:00 |
|
Matthew Honnibal
|
edc45c19d6
|
Update download script
|
2016-10-19 00:41:14 +02:00 |
|
Matthew Honnibal
|
2bbb050500
|
Fix default of serializer_freqs
|
2016-10-18 19:55:41 +02:00 |
|
Matthew Honnibal
|
1b651db9c5
|
Fix parser creation in Language class.
|
2016-10-18 19:36:44 +02:00 |
|
Matthew Honnibal
|
45a6f9b9c7
|
Fix loading of tagger.
|
2016-10-18 19:33:04 +02:00 |
|
Matthew Honnibal
|
76c815f40d
|
Fix spacy.load
|
2016-10-18 19:23:31 +02:00 |
|
Matthew Honnibal
|
8c8f5c62c6
|
Add LANG attribute to English and German
|
2016-10-18 18:52:48 +02:00 |
|
Matthew Honnibal
|
05e2a589a4
|
Fix None label in matcher
|
2016-10-18 18:05:21 +02:00 |
|
Matthew Honnibal
|
c3a8a1cf51
|
Update serializer test.
|
2016-10-18 16:18:46 +02:00 |
|
Matthew Honnibal
|
7d5212f131
|
Refactor defaults
|
2016-10-18 16:18:25 +02:00 |
|
Matthew Honnibal
|
a45a9d5092
|
Remove stray .tensor attribute from Lexeme
|
2016-10-18 01:16:32 +02:00 |
|
Matthew Honnibal
|
9258db788a
|
Revert "Have the matcher return character offsets, to handle the match better."
This reverts commit 049c937540.
|
2016-10-17 16:49:51 +02:00 |
|
Matthew Honnibal
|
7d446e5094
|
Revert "Update matcher test, to reflect character offset return instead of token offset."
This reverts commit f8d3e3bcfe.
|
2016-10-17 16:49:49 +02:00 |
|
Matthew Honnibal
|
4bf2c53c13
|
Revert "Hack on matcher tests, for new implementation."
This reverts commit dbe60644ab.
|
2016-10-17 16:49:48 +02:00 |
|
Matthew Honnibal
|
2fd97c71cc
|
Revert "Don't try to pickle matcher."
This reverts commit 97bd0c9d00.
|
2016-10-17 16:49:43 +02:00 |
|
Matthew Honnibal
|
97bd0c9d00
|
Don't try to pickle matcher.
|
2016-10-17 16:38:40 +02:00 |
|
Matthew Honnibal
|
dbe60644ab
|
Hack on matcher tests, for new implementation.
|
2016-10-17 16:12:22 +02:00 |
|
Matthew Honnibal
|
f8d3e3bcfe
|
Update matcher test, to reflect character offset return instead of token offset.
|
2016-10-17 16:00:10 +02:00 |
|
Matthew Honnibal
|
049c937540
|
Have the matcher return character offsets, to handle the match better.
|
2016-10-17 15:58:57 +02:00 |
|
Matthew Honnibal
|
9b60186266
|
Fix doc class
|
2016-10-17 15:23:47 +02:00 |
|
Matthew Honnibal
|
6cbdc94959
|
Lots of updates to Matcher, to make entity handling sane.
|
2016-10-17 15:23:31 +02:00 |
|
Matthew Honnibal
|
7fd98fc91c
|
Remove deprecation shim around str/bytes in Token.
|
2016-10-17 14:02:47 +02:00 |
|
Matthew Honnibal
|
b67697a97b
|
Improve API for doc.merge() and span.merge(), to use keyword arguments.
|
2016-10-17 14:02:13 +02:00 |
|
Matthew Honnibal
|
fbb7f3f15c
|
Add user_data attribute to Doc object.
|
2016-10-17 11:43:22 +02:00 |
|
Matthew Honnibal
|
c1abc8f6ed
|
Fix deprecation stuff in Token: Remove the shim for the str/unicode semantics, and raise for has_repvec and repvec
|
2016-10-17 11:18:41 +02:00 |
|
Matthew Honnibal
|
4ba9eadf3d
|
Merge branch 'v1.0.0-rc1' of ssh://github.com/explosion/spaCy into v1.0.0-rc1
|
2016-10-17 02:45:44 +02:00 |
|
Matthew Honnibal
|
09ab447a18
|
Remove tensor property from token.
|
2016-10-17 02:45:09 +02:00 |
|
Matthew Honnibal
|
5d10e2005c
|
Defer some attributes to Doc, via getters_for_tokens attribute.
|
2016-10-17 02:44:49 +02:00 |
|
Matthew Honnibal
|
8829984efb
|
Remove tensor attribute from Span and Token.
|
2016-10-17 02:44:04 +02:00 |
|
Matthew Honnibal
|
d15a88c66a
|
Defer some attributes to Doc via getters_for_spans
|
2016-10-17 02:43:35 +02:00 |
|
Matthew Honnibal
|
62230dd13a
|
Add getters_for_spans and getters_for_tokens attributes to Doc. Fix docstring
|
2016-10-17 02:42:51 +02:00 |
|
Matthew Honnibal
|
ae11ea8240
|
Add getters_for_tokens and getters_for_spans attributes to Doc object.
|
2016-10-17 02:42:05 +02:00 |
|
Matthew Honnibal
|
be48a7b4f3
|
Fix conftest for website tests.
|
2016-10-17 01:54:26 +02:00 |
|
Matthew Honnibal
|
8951bf6989
|
Update matcher tests
|
2016-10-17 01:53:24 +02:00 |
|
Matthew Honnibal
|
0cf4aff470
|
Set default path in EN/DE tests.
|
2016-10-17 01:52:49 +02:00 |
|
Matthew Honnibal
|
cd71b6b0a9
|
Remove test of parser pickle
|
2016-10-17 01:52:10 +02:00 |
|
Matthew Honnibal
|
5bc101006e
|
Add cfg field to Tagger
|
2016-10-17 01:03:41 +02:00 |
|
Matthew Honnibal
|
517f090cbf
|
Use GoldParse in tagger.update
|
2016-10-17 00:55:15 +02:00 |
|
Matthew Honnibal
|
59038f7efa
|
Restore support for prior data format -- specifically, the labels field of the config.
|
2016-10-17 00:53:26 +02:00 |
|
Matthew Honnibal
|
7887ab3b36
|
Fix default use of feature_templates in parser
|
2016-10-16 21:41:56 +02:00 |
|
Matthew Honnibal
|
f787cd29fe
|
Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor.
|
2016-10-16 21:34:57 +02:00 |
|
Matthew Honnibal
|
311a985fe0
|
Add input error handling in Doc
|
2016-10-16 18:16:42 +02:00 |
|
Matthew Honnibal
|
06322ba99d
|
Add words and spaces keyword arguments to Doc.
|
2016-10-16 18:13:03 +02:00 |
|
Matthew Honnibal
|
ca51f3b77e
|
Use DependencyParser and EntityRecognizer in the Language class.
|
2016-10-16 17:58:12 +02:00 |
|
Matthew Honnibal
|
195d998a12
|
Fix GoldParse argument to tagger.update
|
2016-10-16 17:05:09 +02:00 |
|
Matthew Honnibal
|
274a4d4272
|
Fix queue Python property in StateClass
|
2016-10-16 17:04:41 +02:00 |
|
Matthew Honnibal
|
e8c8aa08ce
|
Make action_name optional in StepwiseState
|
2016-10-16 17:04:16 +02:00 |
|
Matthew Honnibal
|
4bb73b1a93
|
Fix parser labels in pipeline
|
2016-10-16 17:03:22 +02:00 |
|
Matthew Honnibal
|
a81c5a7abf
|
Fix name of labels keyword to 'actions'.
|
2016-10-16 12:00:27 +02:00 |
|
Matthew Honnibal
|
a079677984
|
Fix omission of O action when creating blank entity recognizer
|
2016-10-16 11:43:25 +02:00 |
|
Matthew Honnibal
|
5444d38cc6
|
Update test for biluo tags
|
2016-10-16 11:42:45 +02:00 |
|
Matthew Honnibal
|
4fc56d4a31
|
Rename 'labels' to 'actions' in parser options
|
2016-10-16 11:42:26 +02:00 |
|
Matthew Honnibal
|
8a6b35d266
|
Delay binding in MakeDoc
|
2016-10-16 11:41:55 +02:00 |
|
Matthew Honnibal
|
52b48b415e
|
Fix GoldParse class
|
2016-10-16 11:41:36 +02:00 |
|
Matthew Honnibal
|
3259a63779
|
Whitespace
|
2016-10-16 01:47:28 +02:00 |
|
Matthew Honnibal
|
509b30834f
|
Add a pipeline module, to collect and wrap processes for annotation
|
2016-10-16 01:47:12 +02:00 |
|
Matthew Honnibal
|
0317cea0ad
|
Fix GoldParse
|
2016-10-15 23:55:07 +02:00 |
|
Matthew Honnibal
|
1c62573a41
|
Fix spacy.train
|
2016-10-15 23:53:46 +02:00 |
|
Matthew Honnibal
|
a48aa15384
|
Improve the API for the GoldParse class.
|
2016-10-15 23:53:29 +02:00 |
|
Matthew Honnibal
|
e07fe92b27
|
Draft a refactored init for the GoldParse class
|
2016-10-15 22:09:52 +02:00 |
|
Matthew Honnibal
|
47afef7d6b
|
Add __init__.py for gold tests
|
2016-10-15 21:51:28 +02:00 |
|
Matthew Honnibal
|
86ae665c78
|
Add function for entity->biluo transformation
|
2016-10-15 21:51:04 +02:00 |
|
Matthew Honnibal
|
2163fd238f
|
Add tests for entity->biluo transformation
|
2016-10-15 21:50:43 +02:00 |
|
Matthew Honnibal
|
5e923b9bfa
|
Return None in match_best_version if path does not exist.
|
2016-10-15 14:47:29 +02:00 |
|
Matthew Honnibal
|
2516382106
|
Fix loading of English in span test
|
2016-10-15 14:44:37 +02:00 |
|
Matthew Honnibal
|
dda2fc6bef
|
Add empty data directory
|
2016-10-15 14:25:25 +02:00 |
|
Matthew Honnibal
|
049197e0ae
|
Update tests, somewhat messily.
|
2016-10-15 14:14:04 +02:00 |
|
Matthew Honnibal
|
1e1a1d9517
|
Update matcher test
|
2016-10-15 14:13:41 +02:00 |
|
Matthew Honnibal
|
9cc9ce0f14
|
Load with default path=False in tests.
|
2016-10-15 14:13:23 +02:00 |
|
Matthew Honnibal
|
08e9134760
|
Change default value of path to True
|
2016-10-15 14:12:54 +02:00 |
|
Matthew Honnibal
|
788657f062
|
Ensure words are added to vocab before test, so that the lexicon is updated correctly.
|
2016-10-15 14:12:18 +02:00 |
|
Matthew Honnibal
|
4a1a2bce68
|
Update version in about.py
|
2016-10-15 13:44:27 +02:00 |
|
Matthew Honnibal
|
6d8cb515ac
|
Break the tokenization stage out of the pipeline into a function 'make_doc'. This allows all pipeline methods to have the same signature.
|
2016-10-14 17:38:29 +02:00 |
|
Matthew Honnibal
|
2cc515b2ed
|
Add add_flag method to Vocab, re Issue #504.
|
2016-10-14 12:15:38 +02:00 |
|
Matthew Honnibal
|
f3be9d0a9a
|
Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs
|
2016-10-14 03:24:13 +02:00 |
|
Matthew Honnibal
|
9b55d97a8f
|
Update train method
|
2016-10-13 03:24:53 +02:00 |
|
Matthew Honnibal
|
645d99523a
|
Move merge_sents method into spacy.gold
|
2016-10-13 03:24:29 +02:00 |
|
Matthew Honnibal
|
41f88ce938
|
Fix dep model loading in parser
|
2016-10-12 20:26:38 +02:00 |
|
Matthew Honnibal
|
d9ae2d68af
|
Load features by string-name for backwards compatibility.
|
2016-10-12 20:15:11 +02:00 |
|
Matthew Honnibal
|
a42fbcf946
|
Require model for test_is_properties
|
2016-10-12 19:35:18 +02:00 |
|
Matthew Honnibal
|
20c948361b
|
Use local path in test_lemmatizer
|
2016-10-12 19:35:00 +02:00 |
|
Matthew Honnibal
|
1318d0bc65
|
Test with the non-loaded versions of the English and German pipelines.
|
2016-10-12 19:13:31 +02:00 |
|
Matthew Honnibal
|
0e2bedc373
|
Fix default labels for parser and NER
|
2016-10-12 19:12:40 +02:00 |
|
Matthew Honnibal
|
3a03c668c3
|
Fix message in ParserStateError
|
2016-10-12 14:44:31 +02:00 |
|
Matthew Honnibal
|
6bf505e865
|
Fix error on ParserStateError
|
2016-10-12 14:35:55 +02:00 |
|
Matthew Honnibal
|
ba5e048502
|
Add docstring for Trainer class.
|
2016-10-12 14:26:02 +02:00 |
|
Matthew Honnibal
|
847a4a4182
|
Refactor Language, dropping Language.blank() method.
|
2016-10-12 13:45:58 +02:00 |
|
Matthew Honnibal
|
ea23b64cc8
|
Refactor training, with new spacy.train module. Defaults still a little awkward.
|
2016-10-09 12:24:24 +02:00 |
|
Matthew Honnibal
|
ca32a1ab01
|
Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."
This reverts commit 8423e8627f.
|
2016-09-30 20:20:22 +02:00 |
|
Matthew Honnibal
|
90baa9c7e6
|
Revert "Changes to matcher.pyx for new StringStore scheme"
This reverts commit 3ff09614e0.
|
2016-09-30 20:20:13 +02:00 |
|
Matthew Honnibal
|
1b6b129c04
|
Revert "Changes to morphology.pyx for new StringStore scheme"
This reverts commit 95f8cfd745.
|
2016-09-30 20:20:02 +02:00 |
|
Matthew Honnibal
|
1d70db58aa
|
Revert "Changes to iterators.pyx for new StringStore scheme"
This reverts commit 4f794b215a.
|
2016-09-30 20:19:53 +02:00 |
|
Matthew Honnibal
|
de01e427fd
|
Revert "Changes to strings.pyx for new StringStore scheme"
This reverts commit 22d4752d64.
|
2016-09-30 20:19:42 +02:00 |
|
Matthew Honnibal
|
9e09b39b9f
|
Revert "Changes to transition systems for new StringStore scheme"
This reverts commit 0442e0ab1e.
|
2016-09-30 20:11:49 +02:00 |
|
Matthew Honnibal
|
e3285f6f30
|
Revert "Fix report of ParserStateError"
This reverts commit 78f19baafa.
|
2016-09-30 20:11:33 +02:00 |
|
Matthew Honnibal
|
6736977d82
|
Revert "Changes to Doc and Token for new string store scheme"
This reverts commit 99de44d864.
|
2016-09-30 20:11:15 +02:00 |
|
Matthew Honnibal
|
bd7fe6420c
|
Revert "Changes to test for new string-store"
This reverts commit 21e90d7d0b.
|
2016-09-30 20:11:01 +02:00 |
|
Matthew Honnibal
|
1f1cd5013f
|
Revert "Changes to vocab for new stringstore scheme"
This reverts commit a51149a717.
|
2016-09-30 20:10:30 +02:00 |
|
Matthew Honnibal
|
1e7d0af127
|
Revert "Changes to Lexeme for new string store scheme"
This reverts commit 717741b6cf.
|
2016-09-30 20:10:13 +02:00 |
|
Matthew Honnibal
|
ba51cb8325
|
Revert "Changes to tagger for new string store scheme"
This reverts commit f5a6aac906.
|
2016-09-30 20:09:53 +02:00 |
|
Matthew Honnibal
|
23b7244842
|
Make sure symbols are unicode strings
|
2016-09-30 20:02:19 +02:00 |
|
Matthew Honnibal
|
f5a6aac906
|
Changes to tagger for new string store scheme
|
2016-09-30 20:01:51 +02:00 |
|
Matthew Honnibal
|
717741b6cf
|
Changes to Lexeme for new string store scheme
|
2016-09-30 20:01:36 +02:00 |
|
Matthew Honnibal
|
a51149a717
|
Changes to vocab for new stringstore scheme
|
2016-09-30 20:01:19 +02:00 |
|
Matthew Honnibal
|
21e90d7d0b
|
Changes to test for new string-store
|
2016-09-30 20:00:58 +02:00 |
|
Matthew Honnibal
|
99de44d864
|
Changes to Doc and Token for new string store scheme
|
2016-09-30 20:00:21 +02:00 |
|
Matthew Honnibal
|
78f19baafa
|
Fix report of ParserStateError
|
2016-09-30 19:59:22 +02:00 |
|
Matthew Honnibal
|
0442e0ab1e
|
Changes to transition systems for new StringStore scheme
|
2016-09-30 19:58:51 +02:00 |
|
Matthew Honnibal
|
22d4752d64
|
Changes to strings.pyx for new StringStore scheme
|
2016-09-30 19:58:09 +02:00 |
|
Matthew Honnibal
|
4f794b215a
|
Changes to iterators.pyx for new StringStore scheme
|
2016-09-30 19:57:49 +02:00 |
|
Matthew Honnibal
|
95f8cfd745
|
Changes to morphology.pyx for new StringStore scheme
|
2016-09-30 19:57:10 +02:00 |
|
Matthew Honnibal
|
3ff09614e0
|
Changes to matcher.pyx for new StringStore scheme
|
2016-09-30 19:56:48 +02:00 |
|
Matthew Honnibal
|
eceeaefe53
|
Fix defaults for Parser and Entity, adding a blank= argument.
|
2016-09-30 19:56:06 +02:00 |
|
Matthew Honnibal
|
8423e8627f
|
Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good.
|
2016-09-30 10:14:47 +02:00 |
|
Matthew Honnibal
|
d3dc5718b2
|
Fix syntax error in Doc
|
2016-09-28 11:39:49 +02:00 |
|
Matthew Honnibal
|
1b520e7bab
|
Improve docstrings for Doc object
|
2016-09-28 11:15:13 +02:00 |
|
Matthew Honnibal
|
81a47c01d8
|
Fix test for empty sentence string.
|
2016-09-27 19:21:22 +02:00 |
|
Matthew Honnibal
|
4cbf0d3bb6
|
Handle errors when no valid actions are available, pointing users to the issue tracker.
|
2016-09-27 19:19:53 +02:00 |
|
Matthew Honnibal
|
430473bd98
|
Raise errors when no actions are available, re Issue #429
|
2016-09-27 19:09:37 +02:00 |
|
Matthew Honnibal
|
fc4a7ad794
|
Test and fix Issue #411: IndexError when .sents property is used on empty string.
|
2016-09-27 18:49:14 +02:00 |
|
Matthew Honnibal
|
3d370b7d45
|
Add test for Issue #445, fixed in 3cb4d455d, with improved lemmatizer logic
|
2016-09-27 18:39:46 +02:00 |
|
Matthew Honnibal
|
a2f3510d6d
|
Fix lemmatizer
|
2016-09-27 17:47:05 +02:00 |
|
Matthew Honnibal
|
07776d8096
|
Fix pos name conflict in lemmatize
|
2016-09-27 17:35:58 +02:00 |
|
Matthew Honnibal
|
35cd953f9e
|
Fix pos name conflict with morphology
|
2016-09-27 14:16:22 +02:00 |
|
Matthew Honnibal
|
8e7df3c4ca
|
Expect the parser data, if parser.load() is called.
|
2016-09-27 14:02:12 +02:00 |
|
Matthew Honnibal
|
bb4f201ad2
|
Pass morphological features from tag map into the lemmatizer.
|
2016-09-27 14:01:43 +02:00 |
|
Matthew Honnibal
|
40509e8bca
|
Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed.
|
2016-09-27 14:01:16 +02:00 |
|
Matthew Honnibal
|
9c8ac91d72
|
Add test for Issue #435
|
2016-09-27 13:52:38 +02:00 |
|
Matthew Honnibal
|
3cb4d455d2
|
Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435
|
2016-09-27 13:52:11 +02:00 |
|
Matthew Honnibal
|
e233328d38
|
Fix Issue #371: Lexeme objects were unhashable.
|
2016-09-27 13:22:30 +02:00 |
|
Matthew Honnibal
|
e382e48d9f
|
Temporarily patch handling of default templates for tagger. Need to move these to language_data.
|
2016-09-27 13:21:28 +02:00 |
|
Matthew Honnibal
|
a44763af0e
|
Fix Issue #469: Incorrectly cased root label in noun chunk iterator
|
2016-09-27 13:13:01 +02:00 |
|
Matthew Honnibal
|
b14b9b096b
|
Return None if /deps directory not present, instead of trying to load the parser.
|
2016-09-26 18:48:03 +02:00 |
|
Matthew Honnibal
|
e07b9665f7
|
Don't expect parser model
|
2016-09-26 18:09:33 +02:00 |
|