1914 Commits

Author
SHA1 Message Date
Matthew Honnibal
150e02d72e Fix Issue #566 2016-10-23 20:19:01 +02:00
Matthew Honnibal
e120561294 Fix vector_norm test. 2016-10-23 19:56:16 +02:00
Matthew Honnibal
fefde8aef8 Make installation print data path. 2016-10-23 19:46:44 +02:00
Matthew Honnibal
e7414cd064 Try to fix weird install glitch. 2016-10-23 19:46:28 +02:00
Matthew Honnibal
90f7544edd Increment version 2016-10-23 19:43:06 +02:00
Matthew Honnibal
6036ec7c77 Fix vector norm when loading lexemes. 2016-10-23 19:40:18 +02:00
Matthew Honnibal
c05cd2356e Fix similarity test for Python 3 2016-10-23 18:16:56 +02:00
Matthew Honnibal
3e688e6d4b Fix issue #514 -- serializer fails when new entity type has been added. The fix here is quite ugly. It's best to add the entities ASAP after loading the NLP pipeline, to mitigate the brittleness. 2016-10-23 17:45:44 +02:00
Matthew Honnibal
79aa03fe98 Test Issue #514: Serializer fails when new entity type has been added. 2016-10-23 17:41:44 +02:00
Matthew Honnibal
f97548c6f1 Fix broken test, re Issue #461 2016-10-23 17:02:23 +02:00
Matthew Honnibal
4de30a8e38 Test Issue #514: Serialization fails after adding a new entity label. 2016-10-23 16:40:27 +02:00
Matthew Honnibal
936e6246aa Fix Issue #459 -- failed to deserialize empty doc. 2016-10-23 16:31:05 +02:00
Matthew Honnibal
e99b3f5322 Test Issue #459: Fail to deserialize empty doc 2016-10-23 16:30:22 +02:00
Matthew Honnibal
49c117960c Fix bug where huffman codec died if given empty freqs dict. 2016-10-23 16:28:05 +02:00
Matthew Honnibal
99ff8b902f Test that huffman codec works with empty freqs dict 2016-10-23 16:27:45 +02:00
Matthew Honnibal
15c9b59f0e Fix Issue #461: O tag was being clobbered by doc.ents.__set__ 2016-10-23 15:50:26 +02:00
Matthew Honnibal
e5627134d9 Test Issue #461: ent_iob tag incorrect after setting entities. 2016-10-23 15:50:04 +02:00
Matthew Honnibal
f62088d646 Fix compile error 2016-10-23 14:50:50 +02:00
Matthew Honnibal
2c3a67b693 Fix calculation of vector norm, re Issue #522. Need to consolidate the calculations into a helper function. 2016-10-23 14:49:31 +02:00
Matthew Honnibal
a0a4ada42a Fix calculation of L2-norm for Lexeme 2016-10-23 14:44:45 +02:00
Matthew Honnibal
2989072aac Add tests to verify that Issue #442 is fixed in 1.1 2016-10-23 14:33:13 +02:00
Matthew Honnibal
739213a8af Fix create_pipeline keyword argument. 2016-10-23 14:24:16 +02:00
Matthew Honnibal
bea44bd3c4 Fix vector_norm when vector is assigned to Lexeme. 2016-10-23 14:23:56 +02:00
Matthew Honnibal
e838b6d53f Add tests for using the new Entity ID tracking in the rule matcher 2016-10-23 14:04:01 +02:00
Matthew Honnibal
e7af75e0a9 Add test for vector resizing, re Issue #544 2016-10-21 17:07:21 +02:00
Matthew Honnibal
ca8ea33abc Bump version to 1.1.0 2016-10-21 16:30:57 +02:00
Matthew Honnibal
7ab03050d4 Add resize_vectors method to Vocab 2016-10-21 01:44:50 +02:00
Matthew Honnibal
8ce8803824 Fix JSON in tokenizer 2016-10-21 01:44:20 +02:00
Matthew Honnibal
6eb73a095f Fix JSON in tagger 2016-10-21 01:44:10 +02:00
Matthew Honnibal
e16e78a737 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-10-21 00:00:15 +02:00
Matthew Honnibal
147373c807 Increment version 2016-10-21 00:00:03 +02:00
Matthew Honnibal
e80944276f Fix Span.vector_norm 2016-10-20 21:58:56 +02:00
Matthew Honnibal
f5fe4f595b Fix json loading, for Python 3. 2016-10-20 21:23:26 +02:00
Matthew Honnibal
2e92c6fb3a Fix JSON encoding issue on load 2016-10-20 21:06:48 +02:00
Matthew Honnibal
4ad7bb96c9 Increment version. 2016-10-20 20:48:30 +02:00
Matthew Honnibal
5ec32f5d97 Fix loading of GloVe vectors, to address Issue #541 2016-10-20 18:27:48 +02:00
Matthew Honnibal
ddeabd76c4 Fix mistake loading GloVe vectors. GloVe vectors now loaded by default if present, as promised. 2016-10-20 16:57:53 +02:00
Matthew Honnibal
bfe5cb1244 Increment version. 2016-10-20 14:52:00 +02:00
Matthew Honnibal
f189a3cb00 Fix encoding when opening files in Python 2.7, re Issue #539 2016-10-20 14:42:56 +02:00
Matthew Honnibal
c353a5214d Increment version 2016-10-19 23:51:01 +02:00
Matthew Honnibal
d10c17f2a4 Fix Issue #536: oov_prob was 0 for OOV words. 2016-10-19 23:38:47 +02:00
Matthew Honnibal
dfa752d064 Increment version 2016-10-19 23:19:13 +02:00
Matthew Honnibal
3588a18fb8 Fix hook names in doc 2016-10-19 21:15:16 +02:00
Matthew Honnibal
5d5742b773 Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc. 2016-10-19 20:54:22 +02:00
Matthew Honnibal
ed5e178817 Add sentiment property on lexeme object 2016-10-19 20:52:52 +02:00
Matthew Honnibal
d4aaf2752c Fix issue #535: Pipeline elements added even when data not installed. 2016-10-19 19:55:19 +02:00
Matthew Honnibal
04d1c959da Fix version 2016-10-19 03:45:37 +02:00
Matthew Honnibal
d35aa7344e Change version ID to make PyPi happy 2016-10-19 03:24:39 +02:00
Matthew Honnibal
89d2a5c8b3 Increment build version. 2016-10-19 03:05:17 +02:00
Matthew Honnibal
622b0a9674 Tweak download script 2016-10-19 00:52:16 +02:00
Matthew Honnibal
5a5c7192a5 Fix download.py for GloVe vectors. 2016-10-19 00:47:44 +02:00
Matthew Honnibal
edc45c19d6 Update download script 2016-10-19 00:41:14 +02:00
Matthew Honnibal
2bbb050500 Fix default of serializer_freqs 2016-10-18 19:55:41 +02:00
Matthew Honnibal
1b651db9c5 Fix parser creation in Language class. 2016-10-18 19:36:44 +02:00
Matthew Honnibal
45a6f9b9c7 Fix loading of tagger. 2016-10-18 19:33:04 +02:00
Matthew Honnibal
76c815f40d Fix spacy.load 2016-10-18 19:23:31 +02:00
Matthew Honnibal
8c8f5c62c6 Add LANG attribute to English and German 2016-10-18 18:52:48 +02:00
Matthew Honnibal
05e2a589a4 Fix None label in matcher 2016-10-18 18:05:21 +02:00
Matthew Honnibal
c3a8a1cf51 Update serializer test. 2016-10-18 16:18:46 +02:00
Matthew Honnibal
7d5212f131 Refactor defaults 2016-10-18 16:18:25 +02:00
Matthew Honnibal
a45a9d5092 Remove stray .tensor attribute from Lexeme 2016-10-18 01:16:32 +02:00
Matthew Honnibal
9258db788a Revert "Have the matcher return character offsets, to handle the match better." This reverts commit 049c937540. 2016-10-17 16:49:51 +02:00
Matthew Honnibal
7d446e5094 Revert "Update matcher test, to reflect character offset return instead of token offset." This reverts commit f8d3e3bcfe. 2016-10-17 16:49:49 +02:00
Matthew Honnibal
4bf2c53c13 Revert "Hack on matcher tests, for new implementation." This reverts commit dbe60644ab. 2016-10-17 16:49:48 +02:00
Matthew Honnibal
2fd97c71cc Revert "Don't try to pickle matcher." This reverts commit 97bd0c9d00. 2016-10-17 16:49:43 +02:00
Matthew Honnibal
97bd0c9d00 Don't try to pickle matcher. 2016-10-17 16:38:40 +02:00
Matthew Honnibal
dbe60644ab Hack on matcher tests, for new implementation. 2016-10-17 16:12:22 +02:00
Matthew Honnibal
f8d3e3bcfe Update matcher test, to reflect character offset return instead of token offset. 2016-10-17 16:00:10 +02:00
Matthew Honnibal
049c937540 Have the matcher return character offsets, to handle the match better. 2016-10-17 15:58:57 +02:00
Matthew Honnibal
9b60186266 Fix doc class 2016-10-17 15:23:47 +02:00
Matthew Honnibal
6cbdc94959 Lots of updates to Matcher, to make entity handling sane. 2016-10-17 15:23:31 +02:00
Matthew Honnibal
7fd98fc91c Remove deprecation shim around str/bytes in Token. 2016-10-17 14:02:47 +02:00
Matthew Honnibal
b67697a97b Improve API for doc.merge() and span.merge(), to use keyword arguments. 2016-10-17 14:02:13 +02:00
Matthew Honnibal
fbb7f3f15c Add user_data attribute to Doc object. 2016-10-17 11:43:22 +02:00
Matthew Honnibal
c1abc8f6ed Fix deprecation stuff in Token: Remove the shim for the str/unicode semantics, and raise for has_repvec and repvec 2016-10-17 11:18:41 +02:00
Matthew Honnibal
4ba9eadf3d Merge branch 'v1.0.0-rc1' of ssh://github.com/explosion/spaCy into v1.0.0-rc1 2016-10-17 02:45:44 +02:00
Matthew Honnibal
09ab447a18 Remove tensor property from token. 2016-10-17 02:45:09 +02:00
Matthew Honnibal
5d10e2005c Defer some attributes to Doc, via getters_for_tokens attribute. 2016-10-17 02:44:49 +02:00
Matthew Honnibal
8829984efb Remove tensor attribute from Span and Token. 2016-10-17 02:44:04 +02:00
Matthew Honnibal
d15a88c66a Defer some attributes to Doc via getters_for_spans 2016-10-17 02:43:35 +02:00
Matthew Honnibal
62230dd13a Add getters_for_spans and getters_for_tokens attributes to Doc. Fix docstring 2016-10-17 02:42:51 +02:00
Matthew Honnibal
ae11ea8240 Add getters_for_tokens and getters_for_spans attributes to Doc object. 2016-10-17 02:42:05 +02:00
Matthew Honnibal
be48a7b4f3 Fix conftest for website tests. 2016-10-17 01:54:26 +02:00
Matthew Honnibal
8951bf6989 Update matcher tests 2016-10-17 01:53:24 +02:00
Matthew Honnibal
0cf4aff470 Set default path in EN/DE tests. 2016-10-17 01:52:49 +02:00
Matthew Honnibal
cd71b6b0a9 Remove test of parser pickle 2016-10-17 01:52:10 +02:00
Matthew Honnibal
5bc101006e Add cfg field to Tagger 2016-10-17 01:03:41 +02:00
Matthew Honnibal
517f090cbf Use GoldParse in tagger.update 2016-10-17 00:55:15 +02:00
Matthew Honnibal
59038f7efa Restore support for prior data format -- specifically, the labels field of the config. 2016-10-17 00:53:26 +02:00
Matthew Honnibal
7887ab3b36 Fix default use of feature_templates in parser 2016-10-16 21:41:56 +02:00
Matthew Honnibal
f787cd29fe Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor. 2016-10-16 21:34:57 +02:00
Matthew Honnibal
311a985fe0 Add input error handling in Doc 2016-10-16 18:16:42 +02:00
Matthew Honnibal
06322ba99d Add words and spaces keyword arguments to Doc. 2016-10-16 18:13:03 +02:00
Matthew Honnibal
ca51f3b77e Use DependencyParser and EntityRecognizer in the Language class. 2016-10-16 17:58:12 +02:00
Matthew Honnibal
195d998a12 Fix GoldParse argument to tagger.update 2016-10-16 17:05:09 +02:00
Matthew Honnibal
274a4d4272 Fix queue Python property in StateClass 2016-10-16 17:04:41 +02:00
Matthew Honnibal
e8c8aa08ce Make action_name optional in StepwiseState 2016-10-16 17:04:16 +02:00
Matthew Honnibal
4bb73b1a93 Fix parser labels in pipeline 2016-10-16 17:03:22 +02:00
Matthew Honnibal
a81c5a7abf Fix name of labels keyword to 'actions'. 2016-10-16 12:00:27 +02:00
Matthew Honnibal
a079677984 Fix omission of O action when creating blank entity recognizer 2016-10-16 11:43:25 +02:00
Matthew Honnibal
5444d38cc6 Update test for biluo tags 2016-10-16 11:42:45 +02:00
Matthew Honnibal
4fc56d4a31 Rename 'labels' to 'actions' in parser options 2016-10-16 11:42:26 +02:00
Matthew Honnibal
8a6b35d266 Delay binding in MakeDoc 2016-10-16 11:41:55 +02:00
Matthew Honnibal
52b48b415e Fix GoldParse class 2016-10-16 11:41:36 +02:00
Matthew Honnibal
3259a63779 Whitespace 2016-10-16 01:47:28 +02:00
Matthew Honnibal
509b30834f Add a pipeline module, to collect and wrap processes for annotation 2016-10-16 01:47:12 +02:00
Matthew Honnibal
0317cea0ad Fix GoldParse 2016-10-15 23:55:07 +02:00
Matthew Honnibal
1c62573a41 Fix spacy.train 2016-10-15 23:53:46 +02:00
Matthew Honnibal
a48aa15384 Improve the API for the GoldParse class. 2016-10-15 23:53:29 +02:00
Matthew Honnibal
e07fe92b27 Draft a refactored init for the GoldParse class 2016-10-15 22:09:52 +02:00
Matthew Honnibal
47afef7d6b Add init.py for gold tests 2016-10-15 21:51:28 +02:00
Matthew Honnibal
86ae665c78 Add function for entity->biluo transformation 2016-10-15 21:51:04 +02:00
Matthew Honnibal
2163fd238f Add tests for entity->biluo transformation 2016-10-15 21:50:43 +02:00
Matthew Honnibal
5e923b9bfa Return None in match_best_version if not path exists. 2016-10-15 14:47:29 +02:00
Matthew Honnibal
2516382106 Fix loading of English in span test 2016-10-15 14:44:37 +02:00
Matthew Honnibal
dda2fc6bef Add empty data directory 2016-10-15 14:25:25 +02:00
Matthew Honnibal
049197e0ae Update tests, somewhat messily. 2016-10-15 14:14:04 +02:00
Matthew Honnibal
1e1a1d9517 Update matcher test 2016-10-15 14:13:41 +02:00
Matthew Honnibal
9cc9ce0f14 Load with default path=False in tests. 2016-10-15 14:13:23 +02:00
Matthew Honnibal
08e9134760 Change default value of path to True 2016-10-15 14:12:54 +02:00
Matthew Honnibal
788657f062 Ensure words are added to vocab before test, so that the lexicon is updated correctly. 2016-10-15 14:12:18 +02:00
Matthew Honnibal
4a1a2bce68 Update version in about.py 2016-10-15 13:44:27 +02:00
Matthew Honnibal
6d8cb515ac Break the tokenization stage out of the pipeline into a function 'make_doc'. This allows all pipeline methods to have the same signature. 2016-10-14 17:38:29 +02:00
Matthew Honnibal
2cc515b2ed Add add_flag method to Vocab, re Issue #504. 2016-10-14 12:15:38 +02:00
Matthew Honnibal
f3be9d0a9a Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs 2016-10-14 03:24:13 +02:00
Matthew Honnibal
9b55d97a8f Update train method 2016-10-13 03:24:53 +02:00
Matthew Honnibal
645d99523a Move merge_sents method into spacy.gold 2016-10-13 03:24:29 +02:00
Matthew Honnibal
41f88ce938 Fix dep model loading in parser 2016-10-12 20:26:38 +02:00
Matthew Honnibal
d9ae2d68af Load features by string-name for backwards compatibility. 2016-10-12 20:15:11 +02:00
Matthew Honnibal
a42fbcf946 Require model for test_is_properties 2016-10-12 19:35:18 +02:00
Matthew Honnibal
20c948361b Use local path in test_lemmatizer 2016-10-12 19:35:00 +02:00
Matthew Honnibal
1318d0bc65 Test with the non-loaded versions of the English and German pipelines. 2016-10-12 19:13:31 +02:00
Matthew Honnibal
0e2bedc373 Fix default labels for parser and NER 2016-10-12 19:12:40 +02:00
Matthew Honnibal
3a03c668c3 Fix message in ParserStateError 2016-10-12 14:44:31 +02:00
Matthew Honnibal
6bf505e865 Fix error on ParserStateError 2016-10-12 14:35:55 +02:00
Matthew Honnibal
ba5e048502 Add docstring for Trainer class. 2016-10-12 14:26:02 +02:00
Matthew Honnibal
847a4a4182 Refactor Language, dropping Language.blank() method. 2016-10-12 13:45:58 +02:00
Matthew Honnibal
ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Matthew Honnibal
ca32a1ab01 Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good." This reverts commit 8423e8627f. 2016-09-30 20:20:22 +02:00
Matthew Honnibal
90baa9c7e6 Revert "Changes to matcher.pyx for new StringStore scheme" This reverts commit 3ff09614e0. 2016-09-30 20:20:13 +02:00
Matthew Honnibal
1b6b129c04 Revert "Changes to morphology.pyx for new StringStore scheme" This reverts commit 95f8cfd745. 2016-09-30 20:20:02 +02:00
Matthew Honnibal
1d70db58aa Revert "Changes to iterators.pyx for new StringStore scheme" This reverts commit 4f794b215a. 2016-09-30 20:19:53 +02:00
Matthew Honnibal
de01e427fd Revert "Changes to strings.pyx for new StringStore scheme" This reverts commit 22d4752d64. 2016-09-30 20:19:42 +02:00
Matthew Honnibal
9e09b39b9f Revert "Changes to transition systems for new StringStore scheme" This reverts commit 0442e0ab1e. 2016-09-30 20:11:49 +02:00
Matthew Honnibal
e3285f6f30 Revert "Fix report of ParserStateError" This reverts commit 78f19baafa. 2016-09-30 20:11:33 +02:00
Matthew Honnibal
6736977d82 Revert "Changes to Doc and Token for new string store scheme" This reverts commit 99de44d864. 2016-09-30 20:11:15 +02:00
Matthew Honnibal
bd7fe6420c Revert "Changes to test for new string-store" This reverts commit 21e90d7d0b. 2016-09-30 20:11:01 +02:00
Matthew Honnibal
1f1cd5013f Revert "Changes to vocab for new stringstore scheme" This reverts commit a51149a717. 2016-09-30 20:10:30 +02:00
Matthew Honnibal
1e7d0af127 Revert "Changes to Lexeme for new string store scheme" This reverts commit 717741b6cf. 2016-09-30 20:10:13 +02:00
Matthew Honnibal
ba51cb8325 Revert "Changes to tagger for new string store scheme" This reverts commit f5a6aac906. 2016-09-30 20:09:53 +02:00