Matthew Honnibal
|
b86f8af0c1
|
Fix doc strings
|
2016-11-01 12:25:36 +01:00 |
|
Matthew Honnibal
|
d563f1eadb
|
Fix Issue #587: Segfault in Matcher, due to simple error in the state machine.
|
2016-10-28 17:42:00 +02:00 |
|
Matthew Honnibal
|
7e5f63a595
|
Improve test slightly
|
2016-10-28 17:41:16 +02:00 |
|
Matthew Honnibal
|
782e4814f4
|
Test Issue #587: Matcher segfaults on particular input
|
2016-10-28 16:38:32 +02:00 |
|
Matthew Honnibal
|
708ea22208
|
Infer types in transition_system.pyx
|
2016-10-27 18:08:13 +02:00 |
|
Matthew Honnibal
|
18590eba94
|
Fix training evaluate method
|
2016-10-27 18:02:19 +02:00 |
|
Matthew Honnibal
|
301f3cc898
|
Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found.
|
2016-10-27 18:01:55 +02:00 |
|
Matthew Honnibal
|
afea6505f3
|
Test Issue #429: No valid actions for NER after matcher adds a new entity label.
|
2016-10-27 18:01:34 +02:00 |
|
Matthew Honnibal
|
03a520ec4f
|
Change signature of Parser.parseC, so that nr_class is read from the transition system. This allows the transition system to modify the number of actions in initialize_state.
|
2016-10-27 17:58:56 +02:00 |
|
Matthew Honnibal
|
6c47048912
|
Fix test, after IOB tweak.
|
2016-10-26 17:22:03 +02:00 |
|
Matthew Honnibal
|
4ca31b4d87
|
Fix clobbering of 'missing' named ent values after assigning ents.
|
2016-10-26 13:13:56 +02:00 |
|
Matthew Honnibal
|
cb49189477
|
Remove dead code
|
2016-10-26 13:11:07 +02:00 |
|
Matthew Honnibal
|
a209b10579
|
Improve error message when oracle fails for non-projective trees, re Issue #571.
|
2016-10-24 20:31:30 +02:00 |
|
Matthew Honnibal
|
b2d43b93d2
|
Fix Python 3 basestring error
|
2016-10-24 14:22:51 +02:00 |
|
Matthew Honnibal
|
276478fe0f
|
Update strings.pxd
|
2016-10-24 14:00:35 +02:00 |
|
Matthew Honnibal
|
d8134817ff
|
Workaround Issue #285: Allow the StringStore to be 'frozen', in which case strings will be pushed into an OOV map. We can then flush this OOV map, freeing all of the OOV strings.
|
2016-10-24 13:49:03 +02:00 |
|
Matthew Honnibal
|
d3a617aa99
|
Test workaround for Issue #285: Streaming data memory growth
|
2016-10-24 13:48:06 +02:00 |
|
Matthew Honnibal
|
64e5f02cf7
|
Update test
|
2016-10-23 21:08:07 +02:00 |
|
Matthew Honnibal
|
66d7a6eca2
|
Update test
|
2016-10-23 21:02:05 +02:00 |
|
Matthew Honnibal
|
90bf797125
|
Update test
|
2016-10-23 20:54:17 +02:00 |
|
Matthew Honnibal
|
5e76320ffe
|
Update test
|
2016-10-23 20:44:54 +02:00 |
|
Matthew Honnibal
|
aa105927f3
|
Update test
|
2016-10-23 20:31:25 +02:00 |
|
Matthew Honnibal
|
6b9237aa83
|
Increment version
|
2016-10-23 20:22:53 +02:00 |
|
Matthew Honnibal
|
150e02d72e
|
Fix Issue #566
|
2016-10-23 20:19:01 +02:00 |
|
Matthew Honnibal
|
e120561294
|
Fix vector_norm test.
|
2016-10-23 19:56:16 +02:00 |
|
Matthew Honnibal
|
fefde8aef8
|
Make installation print data path.
|
2016-10-23 19:46:44 +02:00 |
|
Matthew Honnibal
|
e7414cd064
|
Try to fix weird install glitch.
|
2016-10-23 19:46:28 +02:00 |
|
Matthew Honnibal
|
90f7544edd
|
Increment version
|
2016-10-23 19:43:06 +02:00 |
|
Matthew Honnibal
|
6036ec7c77
|
Fix vector norm when loading lexemes.
|
2016-10-23 19:40:18 +02:00 |
|
Matthew Honnibal
|
c05cd2356e
|
Fix similarity test for Python 3
|
2016-10-23 18:16:56 +02:00 |
|
Matthew Honnibal
|
3e688e6d4b
|
Fix issue #514 -- serializer fails when new entity type has been added. The fix here is quite ugly. It's best to add the entities ASAP after loading the NLP pipeline, to mitigate the brittleness.
|
2016-10-23 17:45:44 +02:00 |
|
Matthew Honnibal
|
79aa03fe98
|
Test Issue #514: Serializer fails when new entity type has been added.
|
2016-10-23 17:41:44 +02:00 |
|
Matthew Honnibal
|
f97548c6f1
|
Fix broken test, re Issue #461
|
2016-10-23 17:02:23 +02:00 |
|
Matthew Honnibal
|
4de30a8e38
|
Test Issue #514: Serialization fails after adding a new entity label.
|
2016-10-23 16:40:27 +02:00 |
|
Matthew Honnibal
|
936e6246aa
|
Fix Issue #459 -- failed to deserialize empty doc.
|
2016-10-23 16:31:05 +02:00 |
|
Matthew Honnibal
|
e99b3f5322
|
Test Issue #459: Fail to deserialize empty doc
|
2016-10-23 16:30:22 +02:00 |
|
Matthew Honnibal
|
49c117960c
|
Fix bug where huffman codec died if given empty freqs dict.
|
2016-10-23 16:28:05 +02:00 |
|
Matthew Honnibal
|
99ff8b902f
|
Test that huffman codec works with empty freqs dict
|
2016-10-23 16:27:45 +02:00 |
|
Matthew Honnibal
|
15c9b59f0e
|
Fix Issue #461: O tag was being clobbered by doc.ents.__set__
|
2016-10-23 15:50:26 +02:00 |
|
Matthew Honnibal
|
e5627134d9
|
Test Issue #461: ent_iob tag incorrect after setting entities.
|
2016-10-23 15:50:04 +02:00 |
|
Matthew Honnibal
|
f62088d646
|
Fix compile error
|
2016-10-23 14:50:50 +02:00 |
|
Matthew Honnibal
|
2c3a67b693
|
Fix calculation of vector norm, re Issue #522. Need to consolidate the calculations into a helper function.
|
2016-10-23 14:49:31 +02:00 |
|
Matthew Honnibal
|
a0a4ada42a
|
Fix calculation of L2-norm for Lexeme
|
2016-10-23 14:44:45 +02:00 |
|
Matthew Honnibal
|
2989072aac
|
Add tests to verify that Issue #442 is fixed in 1.1
|
2016-10-23 14:33:13 +02:00 |
|
Matthew Honnibal
|
739213a8af
|
Fix create_pipeline keyword argument.
|
2016-10-23 14:24:16 +02:00 |
|
Matthew Honnibal
|
bea44bd3c4
|
Fix vector_norm when vector is assigned to Lexeme.
|
2016-10-23 14:23:56 +02:00 |
|
Matthew Honnibal
|
e838b6d53f
|
Add tests for using the new Entity ID tracking in the rule matcher
|
2016-10-23 14:04:01 +02:00 |
|
Matthew Honnibal
|
e7af75e0a9
|
Add test for vector resizing, re Issue #544
|
2016-10-21 17:07:21 +02:00 |
|
Matthew Honnibal
|
ca8ea33abc
|
Bump version to 1.1.0
|
2016-10-21 16:30:57 +02:00 |
|
Matthew Honnibal
|
7ab03050d4
|
Add resize_vectors method to Vocab
|
2016-10-21 01:44:50 +02:00 |
|
Matthew Honnibal
|
8ce8803824
|
Fix JSON in tokenizer
|
2016-10-21 01:44:20 +02:00 |
|
Matthew Honnibal
|
6eb73a095f
|
Fix JSON in tagger
|
2016-10-21 01:44:10 +02:00 |
|
Matthew Honnibal
|
e16e78a737
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2016-10-21 00:00:15 +02:00 |
|
Matthew Honnibal
|
147373c807
|
Increment version
|
2016-10-21 00:00:03 +02:00 |
|
Matthew Honnibal
|
e80944276f
|
Fix Span.vector_norm
|
2016-10-20 21:58:56 +02:00 |
|
Matthew Honnibal
|
f5fe4f595b
|
Fix json loading, for Python 3.
|
2016-10-20 21:23:26 +02:00 |
|
Matthew Honnibal
|
2e92c6fb3a
|
Fix JSON encoding issue on load
|
2016-10-20 21:06:48 +02:00 |
|
Matthew Honnibal
|
4ad7bb96c9
|
Increment version.
|
2016-10-20 20:48:30 +02:00 |
|
Matthew Honnibal
|
5ec32f5d97
|
Fix loading of GloVe vectors, to address Issue #541
|
2016-10-20 18:27:48 +02:00 |
|
Matthew Honnibal
|
ddeabd76c4
|
Fix mistake loading GloVe vectors. GloVe vectors now loaded by default if present, as promised.
|
2016-10-20 16:57:53 +02:00 |
|
Matthew Honnibal
|
bfe5cb1244
|
Increment version.
|
2016-10-20 14:52:00 +02:00 |
|
Matthew Honnibal
|
f189a3cb00
|
Fix encoding when opening files in Python 2.7, re Issue #539
|
2016-10-20 14:42:56 +02:00 |
|
Matthew Honnibal
|
c353a5214d
|
Increment version
|
2016-10-19 23:51:01 +02:00 |
|
Matthew Honnibal
|
d10c17f2a4
|
Fix Issue #536: oov_prob was 0 for OOV words.
|
2016-10-19 23:38:47 +02:00 |
|
Matthew Honnibal
|
dfa752d064
|
Increment version
|
2016-10-19 23:19:13 +02:00 |
|
Matthew Honnibal
|
3588a18fb8
|
Fix hook names in doc
|
2016-10-19 21:15:16 +02:00 |
|
Matthew Honnibal
|
5d5742b773
|
Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc.
|
2016-10-19 20:54:22 +02:00 |
|
Matthew Honnibal
|
ed5e178817
|
Add sentiment property on lexeme object
|
2016-10-19 20:52:52 +02:00 |
|
Matthew Honnibal
|
d4aaf2752c
|
Fix issue #535: Pipeline elements added even when data not installed.
|
2016-10-19 19:55:19 +02:00 |
|
Matthew Honnibal
|
04d1c959da
|
Fix version
|
2016-10-19 03:45:37 +02:00 |
|
Matthew Honnibal
|
d35aa7344e
|
Change version ID to make PyPI happy
|
2016-10-19 03:24:39 +02:00 |
|
Matthew Honnibal
|
89d2a5c8b3
|
Increment build version.
|
2016-10-19 03:05:17 +02:00 |
|
Matthew Honnibal
|
622b0a9674
|
Tweak download script
|
2016-10-19 00:52:16 +02:00 |
|
Matthew Honnibal
|
5a5c7192a5
|
Fix download.py for GloVe vectors.
|
2016-10-19 00:47:44 +02:00 |
|
Matthew Honnibal
|
edc45c19d6
|
Update download script
|
2016-10-19 00:41:14 +02:00 |
|
Matthew Honnibal
|
2bbb050500
|
Fix default of serializer_freqs
|
2016-10-18 19:55:41 +02:00 |
|
Matthew Honnibal
|
1b651db9c5
|
Fix parser creation in Language class.
|
2016-10-18 19:36:44 +02:00 |
|
Matthew Honnibal
|
45a6f9b9c7
|
Fix loading of tagger.
|
2016-10-18 19:33:04 +02:00 |
|
Matthew Honnibal
|
76c815f40d
|
Fix spacy.load
|
2016-10-18 19:23:31 +02:00 |
|
Matthew Honnibal
|
8c8f5c62c6
|
Add LANG attribute to English and German
|
2016-10-18 18:52:48 +02:00 |
|
Matthew Honnibal
|
05e2a589a4
|
Fix None label in matcher
|
2016-10-18 18:05:21 +02:00 |
|
Matthew Honnibal
|
c3a8a1cf51
|
Update serializer test.
|
2016-10-18 16:18:46 +02:00 |
|
Matthew Honnibal
|
7d5212f131
|
Refactor defaults
|
2016-10-18 16:18:25 +02:00 |
|
Matthew Honnibal
|
a45a9d5092
|
Remove stray .tensor attribute from Lexeme
|
2016-10-18 01:16:32 +02:00 |
|
Matthew Honnibal
|
9258db788a
|
Revert "Have the matcher return character offsets, to handle the match better."
This reverts commit 049c937540.
|
2016-10-17 16:49:51 +02:00 |
|
Matthew Honnibal
|
7d446e5094
|
Revert "Update matcher test, to reflect character offset return instead of token offset."
This reverts commit f8d3e3bcfe.
|
2016-10-17 16:49:49 +02:00 |
|
Matthew Honnibal
|
4bf2c53c13
|
Revert "Hack on matcher tests, for new implementation."
This reverts commit dbe60644ab.
|
2016-10-17 16:49:48 +02:00 |
|
Matthew Honnibal
|
2fd97c71cc
|
Revert "Don't try to pickle matcher."
This reverts commit 97bd0c9d00.
|
2016-10-17 16:49:43 +02:00 |
|
Matthew Honnibal
|
97bd0c9d00
|
Don't try to pickle matcher.
|
2016-10-17 16:38:40 +02:00 |
|
Matthew Honnibal
|
dbe60644ab
|
Hack on matcher tests, for new implementation.
|
2016-10-17 16:12:22 +02:00 |
|
Matthew Honnibal
|
f8d3e3bcfe
|
Update matcher test, to reflect character offset return instead of token offset.
|
2016-10-17 16:00:10 +02:00 |
|
Matthew Honnibal
|
049c937540
|
Have the matcher return character offsets, to handle the match better.
|
2016-10-17 15:58:57 +02:00 |
|
Matthew Honnibal
|
9b60186266
|
Fix doc class
|
2016-10-17 15:23:47 +02:00 |
|
Matthew Honnibal
|
6cbdc94959
|
Lots of updates to Matcher, to make entity handling sane.
|
2016-10-17 15:23:31 +02:00 |
|
Matthew Honnibal
|
7fd98fc91c
|
Remove deprecation shim around str/bytes in Token.
|
2016-10-17 14:02:47 +02:00 |
|
Matthew Honnibal
|
b67697a97b
|
Improve API for doc.merge() and span.merge(), to use keyword arguments.
|
2016-10-17 14:02:13 +02:00 |
|
Matthew Honnibal
|
fbb7f3f15c
|
Add user_data attribute to Doc object.
|
2016-10-17 11:43:22 +02:00 |
|
Matthew Honnibal
|
c1abc8f6ed
|
Fix deprecation stuff in Token: Remove the shim for the str/unicode semantics, and raise for has_repvec and repvec
|
2016-10-17 11:18:41 +02:00 |
|
Matthew Honnibal
|
4ba9eadf3d
|
Merge branch 'v1.0.0-rc1' of ssh://github.com/explosion/spaCy into v1.0.0-rc1
|
2016-10-17 02:45:44 +02:00 |
|
Matthew Honnibal
|
09ab447a18
|
Remove tensor property from token.
|
2016-10-17 02:45:09 +02:00 |
|
Matthew Honnibal
|
5d10e2005c
|
Defer some attributes to Doc, via getters_for_tokens attribute.
|
2016-10-17 02:44:49 +02:00 |
|
Matthew Honnibal
|
8829984efb
|
Remove tensor attribute from Span and Token.
|
2016-10-17 02:44:04 +02:00 |
|
Matthew Honnibal
|
d15a88c66a
|
Defer some attributes to Doc via getters_for_spans
|
2016-10-17 02:43:35 +02:00 |
|
Matthew Honnibal
|
62230dd13a
|
Add getters_for_spans and getters_for_tokens attributes to Doc. Fix docstring
|
2016-10-17 02:42:51 +02:00 |
|
Matthew Honnibal
|
ae11ea8240
|
Add getters_for_tokens and getters_for_spans attributes to Doc object.
|
2016-10-17 02:42:05 +02:00 |
|
Matthew Honnibal
|
be48a7b4f3
|
Fix conftest for website tests.
|
2016-10-17 01:54:26 +02:00 |
|
Matthew Honnibal
|
8951bf6989
|
Update matcher tests
|
2016-10-17 01:53:24 +02:00 |
|
Matthew Honnibal
|
0cf4aff470
|
Set default path in EN/DE tests.
|
2016-10-17 01:52:49 +02:00 |
|
Matthew Honnibal
|
cd71b6b0a9
|
Remove test of parser pickle
|
2016-10-17 01:52:10 +02:00 |
|
Matthew Honnibal
|
5bc101006e
|
Add cfg field to Tagger
|
2016-10-17 01:03:41 +02:00 |
|
Matthew Honnibal
|
517f090cbf
|
Use GoldParse in tagger.update
|
2016-10-17 00:55:15 +02:00 |
|
Matthew Honnibal
|
59038f7efa
|
Restore support for prior data format -- specifically, the labels field of the config.
|
2016-10-17 00:53:26 +02:00 |
|
Matthew Honnibal
|
7887ab3b36
|
Fix default use of feature_templates in parser
|
2016-10-16 21:41:56 +02:00 |
|
Matthew Honnibal
|
f787cd29fe
|
Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor.
|
2016-10-16 21:34:57 +02:00 |
|
Matthew Honnibal
|
311a985fe0
|
Add input error handling in Doc
|
2016-10-16 18:16:42 +02:00 |
|
Matthew Honnibal
|
06322ba99d
|
Add words and spaces keyword arguments to Doc.
|
2016-10-16 18:13:03 +02:00 |
|
Matthew Honnibal
|
ca51f3b77e
|
Use DependencyParser and EntityRecognizer in the Language class.
|
2016-10-16 17:58:12 +02:00 |
|
Matthew Honnibal
|
195d998a12
|
Fix GoldParse argument to tagger.update
|
2016-10-16 17:05:09 +02:00 |
|
Matthew Honnibal
|
274a4d4272
|
Fix queue Python property in StateClass
|
2016-10-16 17:04:41 +02:00 |
|
Matthew Honnibal
|
e8c8aa08ce
|
Make action_name optional in StepwiseState
|
2016-10-16 17:04:16 +02:00 |
|
Matthew Honnibal
|
4bb73b1a93
|
Fix parser labels in pipeline
|
2016-10-16 17:03:22 +02:00 |
|
Matthew Honnibal
|
a81c5a7abf
|
Fix name of labels keyword to 'actions'.
|
2016-10-16 12:00:27 +02:00 |
|
Matthew Honnibal
|
a079677984
|
Fix omission of O action when creating blank entity recognizer
|
2016-10-16 11:43:25 +02:00 |
|
Matthew Honnibal
|
5444d38cc6
|
Update test for biluo tags
|
2016-10-16 11:42:45 +02:00 |
|
Matthew Honnibal
|
4fc56d4a31
|
Rename 'labels' to 'actions' in parser options
|
2016-10-16 11:42:26 +02:00 |
|
Matthew Honnibal
|
8a6b35d266
|
Delay binding in MakeDoc
|
2016-10-16 11:41:55 +02:00 |
|
Matthew Honnibal
|
52b48b415e
|
Fix GoldParse class
|
2016-10-16 11:41:36 +02:00 |
|
Matthew Honnibal
|
3259a63779
|
Whitespace
|
2016-10-16 01:47:28 +02:00 |
|
Matthew Honnibal
|
509b30834f
|
Add a pipeline module, to collect and wrap processes for annotation
|
2016-10-16 01:47:12 +02:00 |
|
Matthew Honnibal
|
0317cea0ad
|
Fix GoldParse
|
2016-10-15 23:55:07 +02:00 |
|
Matthew Honnibal
|
1c62573a41
|
Fix spacy.train
|
2016-10-15 23:53:46 +02:00 |
|
Matthew Honnibal
|
a48aa15384
|
Improve the API for the GoldParse class.
|
2016-10-15 23:53:29 +02:00 |
|
Matthew Honnibal
|
e07fe92b27
|
Draft a refactored init for the GoldParse class
|
2016-10-15 22:09:52 +02:00 |
|
Matthew Honnibal
|
47afef7d6b
|
Add __init__.py for gold tests
|
2016-10-15 21:51:28 +02:00 |
|
Matthew Honnibal
|
86ae665c78
|
Add function for entity->biluo transformation
|
2016-10-15 21:51:04 +02:00 |
|
Matthew Honnibal
|
2163fd238f
|
Add tests for entity->biluo transformation
|
2016-10-15 21:50:43 +02:00 |
|
Matthew Honnibal
|
5e923b9bfa
|
Return None in match_best_version if no path exists.
|
2016-10-15 14:47:29 +02:00 |
|
Matthew Honnibal
|
2516382106
|
Fix loading of English in span test
|
2016-10-15 14:44:37 +02:00 |
|
Matthew Honnibal
|
dda2fc6bef
|
Add empty data directory
|
2016-10-15 14:25:25 +02:00 |
|
Matthew Honnibal
|
049197e0ae
|
Update tests, somewhat messily.
|
2016-10-15 14:14:04 +02:00 |
|
Matthew Honnibal
|
1e1a1d9517
|
Update matcher test
|
2016-10-15 14:13:41 +02:00 |
|
Matthew Honnibal
|
9cc9ce0f14
|
Load with default path=False in tests.
|
2016-10-15 14:13:23 +02:00 |
|
Matthew Honnibal
|
08e9134760
|
Change default value of path to True
|
2016-10-15 14:12:54 +02:00 |
|
Matthew Honnibal
|
788657f062
|
Ensure words are added to vocab before test, so that the lexicon is updated correctly.
|
2016-10-15 14:12:18 +02:00 |
|
Matthew Honnibal
|
4a1a2bce68
|
Update version in about.py
|
2016-10-15 13:44:27 +02:00 |
|
Matthew Honnibal
|
6d8cb515ac
|
Break the tokenization stage out of the pipeline into a function 'make_doc'. This allows all pipeline methods to have the same signature.
|
2016-10-14 17:38:29 +02:00 |
|
Matthew Honnibal
|
2cc515b2ed
|
Add add_flag method to Vocab, re Issue #504.
|
2016-10-14 12:15:38 +02:00 |
|
Matthew Honnibal
|
f3be9d0a9a
|
Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs
|
2016-10-14 03:24:13 +02:00 |
|
Matthew Honnibal
|
9b55d97a8f
|
Update train method
|
2016-10-13 03:24:53 +02:00 |
|
Matthew Honnibal
|
645d99523a
|
Move merge_sents method into spacy.gold
|
2016-10-13 03:24:29 +02:00 |
|
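Note on commit d8134817ff (workaround for Issue #285): the message describes a string store that can be "frozen", so that new strings go into an OOV map which can later be flushed. The following is a minimal sketch of that idea only, not spaCy's actual StringStore. The class name `FreezableStringStore` and the members `intern`, `frozen`, and `flush_oov` are hypothetical names chosen for illustration.

```python
class FreezableStringStore:
    """Illustrative sketch only; all names here are hypothetical."""

    def __init__(self):
        self._ids = {}        # permanent entries: string -> integer ID
        self._oov = {}        # temporary entries added while frozen
        self._next_id = 0
        self.frozen = False   # when True, new strings go to the OOV map

    def intern(self, string):
        """Return the ID for `string`, adding it if it is not known yet."""
        if string in self._ids:
            return self._ids[string]
        if string in self._oov:
            return self._oov[string]
        # Unknown string: pick the map according to the frozen flag.
        target = self._oov if self.frozen else self._ids
        target[string] = self._next_id
        self._next_id += 1
        return target[string]

    def flush_oov(self):
        """Drop every OOV entry and return how many were freed."""
        freed = len(self._oov)
        self._oov.clear()
        return freed

    def __len__(self):
        return len(self._ids) + len(self._oov)
```

In a streaming setting, one would presumably freeze the store after loading the vocabulary and call `flush_oov()` between batches, so that strings seen only in the stream do not accumulate. IDs handed out for flushed strings are no longer valid after the flush, which is one reason the commit calls this a workaround.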