Matthew Honnibal
|
c68dfe2965
|
Stub out support for Italian
|
2016-11-02 20:03:24 +01:00 |
|
Matthew Honnibal
|
6dbf4f7ad7
|
Stub out support for French, Spanish, Italian and Portuguese
|
2016-11-02 20:02:41 +01:00 |
|
Matthew Honnibal
|
6b8b05ef83
|
Specify that spacy.util is encoded in utf8
|
2016-11-02 19:58:00 +01:00 |
|
Matthew Honnibal
|
5363224395
|
Add draft Jieba tokenizer for Chinese
|
2016-11-02 19:57:38 +01:00 |
|
Matthew Honnibal
|
f7fee6c24b
|
Check for class-defined make_docs method before assigning one provided as an argument
|
2016-11-02 19:57:13 +01:00 |
|
Matthew Honnibal
|
19c1e83d3d
|
Work on draft Italian tokenizer
|
2016-11-02 19:56:32 +01:00 |
|
Matthew Honnibal
|
9efe568177
|
Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596
|
2016-11-02 12:31:34 +01:00 |
|
Matthew Honnibal
|
d8db648ebf
|
Add __init__.py file for regression tests
|
2016-11-01 13:45:06 +01:00 |
|
Matthew Honnibal
|
11664b9f20
|
Fix variable error in token
|
2016-11-01 13:28:00 +01:00 |
|
Matthew Honnibal
|
8c4d1b46ce
|
Fix variable error in Span
|
2016-11-01 13:27:44 +01:00 |
|
Matthew Honnibal
|
e7af6b937f
|
Fix syntax error while fixing doc strings
|
2016-11-01 13:27:32 +01:00 |
|
Matthew Honnibal
|
62fc6b1afa
|
Use 32 bit hashes for OOV, re Issue #589, Issue #285
|
2016-11-01 13:27:13 +01:00 |
|
Matthew Honnibal
|
6977a2b8cd
|
Add test for Issue #589
|
2016-11-01 12:33:36 +01:00 |
|
Matthew Honnibal
|
b86f8af0c1
|
Fix doc strings
|
2016-11-01 12:25:36 +01:00 |
|
Matthew Honnibal
|
d563f1eadb
|
Fix Issue #587: Segfault in Matcher, due to simple error in the state machine.
|
2016-10-28 17:42:00 +02:00 |
|
Matthew Honnibal
|
7e5f63a595
|
Improve test slightly
|
2016-10-28 17:41:16 +02:00 |
|
Matthew Honnibal
|
782e4814f4
|
Test Issue #587: Matcher segfaults on particular input
|
2016-10-28 16:38:32 +02:00 |
|
Matthew Honnibal
|
708ea22208
|
Infer types in transition_system.pyx
|
2016-10-27 18:08:13 +02:00 |
|
Matthew Honnibal
|
18590eba94
|
Fix training evaluate method
|
2016-10-27 18:02:19 +02:00 |
|
Matthew Honnibal
|
301f3cc898
|
Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found.
|
2016-10-27 18:01:55 +02:00 |
|
Matthew Honnibal
|
afea6505f3
|
Test Issue #429: No valid actions for NER after matcher adds a new entity label.
|
2016-10-27 18:01:34 +02:00 |
|
Matthew Honnibal
|
03a520ec4f
|
Change signature of Parser.parseC, so that nr_class is read from the transition system. This allows the transition system to modify the number of actions in initialize_state.
|
2016-10-27 17:58:56 +02:00 |
|
Matthew Honnibal
|
6c47048912
|
Fix test, after IOB tweak.
|
2016-10-26 17:22:03 +02:00 |
|
Matthew Honnibal
|
4ca31b4d87
|
Fix clobbering of 'missing' named ent values after assigning ents.
|
2016-10-26 13:13:56 +02:00 |
|
Matthew Honnibal
|
cb49189477
|
Remove dead code
|
2016-10-26 13:11:07 +02:00 |
|
Matthew Honnibal
|
a209b10579
|
Improve error message when oracle fails for non-projective trees, re Issue #571.
|
2016-10-24 20:31:30 +02:00 |
|
Matthew Honnibal
|
b2d43b93d2
|
Fix Python 3 basestring error
|
2016-10-24 14:22:51 +02:00 |
|
Matthew Honnibal
|
276478fe0f
|
Update strings.pxd
|
2016-10-24 14:00:35 +02:00 |
|
Matthew Honnibal
|
d8134817ff
|
Workaround Issue #285: Allow the StringStore to be 'frozen', in which case strings will be pushed into an OOV map. We can then flush this OOV map, freeing all of the OOV strings.
|
2016-10-24 13:49:03 +02:00 |
|
Matthew Honnibal
|
d3a617aa99
|
Test workaround for Issue #285: Streaming data memory growth
|
2016-10-24 13:48:06 +02:00 |
|
Matthew Honnibal
|
64e5f02cf7
|
Update test
|
2016-10-23 21:08:07 +02:00 |
|
Matthew Honnibal
|
66d7a6eca2
|
Update test
|
2016-10-23 21:02:05 +02:00 |
|
Matthew Honnibal
|
90bf797125
|
Update test
|
2016-10-23 20:54:17 +02:00 |
|
Matthew Honnibal
|
5e76320ffe
|
Update test
|
2016-10-23 20:44:54 +02:00 |
|
Matthew Honnibal
|
aa105927f3
|
Update test
|
2016-10-23 20:31:25 +02:00 |
|
Matthew Honnibal
|
6b9237aa83
|
Increment version
|
2016-10-23 20:22:53 +02:00 |
|
Matthew Honnibal
|
150e02d72e
|
Fix Issue #566
|
2016-10-23 20:19:01 +02:00 |
|
Matthew Honnibal
|
e120561294
|
Fix vector_norm test.
|
2016-10-23 19:56:16 +02:00 |
|
Matthew Honnibal
|
fefde8aef8
|
Make installation print data path.
|
2016-10-23 19:46:44 +02:00 |
|
Matthew Honnibal
|
e7414cd064
|
Try to fix weird install glitch.
|
2016-10-23 19:46:28 +02:00 |
|
Matthew Honnibal
|
90f7544edd
|
Increment version
|
2016-10-23 19:43:06 +02:00 |
|
Matthew Honnibal
|
6036ec7c77
|
Fix vector norm when loading lexemes.
|
2016-10-23 19:40:18 +02:00 |
|
Matthew Honnibal
|
c05cd2356e
|
Fix similarity test for Python 3
|
2016-10-23 18:16:56 +02:00 |
|
Matthew Honnibal
|
3e688e6d4b
|
Fix issue #514 -- serializer fails when new entity type has been added. The fix here is quite ugly. It's best to add the entities ASAP after loading the NLP pipeline, to mitigate the brittleness.
|
2016-10-23 17:45:44 +02:00 |
|
Matthew Honnibal
|
79aa03fe98
|
Test Issue #514: Serializer fails when new entity type has been added.
|
2016-10-23 17:41:44 +02:00 |
|
Matthew Honnibal
|
f97548c6f1
|
Fix broken test, re Issue #461
|
2016-10-23 17:02:23 +02:00 |
|
Matthew Honnibal
|
4de30a8e38
|
Test Issue #514: Serialization fails after adding a new entity label.
|
2016-10-23 16:40:27 +02:00 |
|
Matthew Honnibal
|
936e6246aa
|
Fix Issue #459 -- failed to deserialize empty doc.
|
2016-10-23 16:31:05 +02:00 |
|
Matthew Honnibal
|
e99b3f5322
|
Test Issue #459: Fail to deserialize empty doc
|
2016-10-23 16:30:22 +02:00 |
|
Matthew Honnibal
|
49c117960c
|
Fix bug where huffman codec died if given empty freqs dict.
|
2016-10-23 16:28:05 +02:00 |
|
Matthew Honnibal
|
99ff8b902f
|
Test that huffman codec works with empty freqs dict
|
2016-10-23 16:27:45 +02:00 |
|
Matthew Honnibal
|
15c9b59f0e
|
Fix Issue #461: O tag was being clobbered by doc.ents.__set__
|
2016-10-23 15:50:26 +02:00 |
|
Matthew Honnibal
|
e5627134d9
|
Test Issue #461: ent_iob tag incorrect after setting entities.
|
2016-10-23 15:50:04 +02:00 |
|
Matthew Honnibal
|
f62088d646
|
Fix compile error
|
2016-10-23 14:50:50 +02:00 |
|
Matthew Honnibal
|
2c3a67b693
|
Fix calculation of vector norm, re Issue #522. Need to consolidate the calculations into a helper function.
|
2016-10-23 14:49:31 +02:00 |
|
Matthew Honnibal
|
a0a4ada42a
|
Fix calculation of L2-norm for Lexeme
|
2016-10-23 14:44:45 +02:00 |
|
Matthew Honnibal
|
2989072aac
|
Add tests to verify that Issue #442 is fixed in 1.1
|
2016-10-23 14:33:13 +02:00 |
|
Matthew Honnibal
|
739213a8af
|
Fix create_pipeline keyword argument.
|
2016-10-23 14:24:16 +02:00 |
|
Matthew Honnibal
|
bea44bd3c4
|
Fix vector_norm when vector is assigned to Lexeme.
|
2016-10-23 14:23:56 +02:00 |
|
Matthew Honnibal
|
e838b6d53f
|
Add tests for using the new Entity ID tracking in the rule matcher
|
2016-10-23 14:04:01 +02:00 |
|
Matthew Honnibal
|
e7af75e0a9
|
Add test for vector resizing, re Issue #544
|
2016-10-21 17:07:21 +02:00 |
|
Matthew Honnibal
|
ca8ea33abc
|
Bump version to 1.1.0
|
2016-10-21 16:30:57 +02:00 |
|
Matthew Honnibal
|
7ab03050d4
|
Add resize_vectors method to Vocab
|
2016-10-21 01:44:50 +02:00 |
|
Matthew Honnibal
|
8ce8803824
|
Fix JSON in tokenizer
|
2016-10-21 01:44:20 +02:00 |
|
Matthew Honnibal
|
6eb73a095f
|
Fix JSON in tagger
|
2016-10-21 01:44:10 +02:00 |
|
Matthew Honnibal
|
e16e78a737
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2016-10-21 00:00:15 +02:00 |
|
Matthew Honnibal
|
147373c807
|
Increment version
|
2016-10-21 00:00:03 +02:00 |
|
Matthew Honnibal
|
e80944276f
|
Fix Span.vector_norm
|
2016-10-20 21:58:56 +02:00 |
|
Matthew Honnibal
|
f5fe4f595b
|
Fix json loading, for Python 3.
|
2016-10-20 21:23:26 +02:00 |
|
Matthew Honnibal
|
2e92c6fb3a
|
Fix JSON encoding issue on load
|
2016-10-20 21:06:48 +02:00 |
|
Matthew Honnibal
|
4ad7bb96c9
|
Increment version.
|
2016-10-20 20:48:30 +02:00 |
|
Matthew Honnibal
|
5ec32f5d97
|
Fix loading of GloVe vectors, to address Issue #541
|
2016-10-20 18:27:48 +02:00 |
|
Matthew Honnibal
|
ddeabd76c4
|
Fix mistake loading GloVe vectors. GloVe vectors now loaded by default if present, as promised.
|
2016-10-20 16:57:53 +02:00 |
|
Matthew Honnibal
|
bfe5cb1244
|
Increment version.
|
2016-10-20 14:52:00 +02:00 |
|
Matthew Honnibal
|
f189a3cb00
|
Fix encoding when opening files in Python 2.7, re Issue #539
|
2016-10-20 14:42:56 +02:00 |
|
Matthew Honnibal
|
c353a5214d
|
Increment version
|
2016-10-19 23:51:01 +02:00 |
|
Matthew Honnibal
|
d10c17f2a4
|
Fix Issue #536: oov_prob was 0 for OOV words.
|
2016-10-19 23:38:47 +02:00 |
|
Matthew Honnibal
|
dfa752d064
|
Increment version
|
2016-10-19 23:19:13 +02:00 |
|
Matthew Honnibal
|
3588a18fb8
|
Fix hook names in doc
|
2016-10-19 21:15:16 +02:00 |
|
Matthew Honnibal
|
5d5742b773
|
Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc.
|
2016-10-19 20:54:22 +02:00 |
|
Matthew Honnibal
|
ed5e178817
|
Add sentiment property on lexeme object
|
2016-10-19 20:52:52 +02:00 |
|
Matthew Honnibal
|
d4aaf2752c
|
Fix issue #535: Pipeline elements added even when data not installed.
|
2016-10-19 19:55:19 +02:00 |
|
Matthew Honnibal
|
04d1c959da
|
Fix version
|
2016-10-19 03:45:37 +02:00 |
|
Matthew Honnibal
|
d35aa7344e
|
Change version ID to make PyPI happy
|
2016-10-19 03:24:39 +02:00 |
|
Matthew Honnibal
|
89d2a5c8b3
|
Increment build version.
|
2016-10-19 03:05:17 +02:00 |
|
Matthew Honnibal
|
622b0a9674
|
Tweak download script
|
2016-10-19 00:52:16 +02:00 |
|
Matthew Honnibal
|
5a5c7192a5
|
Fix download.py for GloVe vectors.
|
2016-10-19 00:47:44 +02:00 |
|
Matthew Honnibal
|
edc45c19d6
|
Update download script
|
2016-10-19 00:41:14 +02:00 |
|
Matthew Honnibal
|
2bbb050500
|
Fix default of serializer_freqs
|
2016-10-18 19:55:41 +02:00 |
|
Matthew Honnibal
|
1b651db9c5
|
Fix parser creation in Language class.
|
2016-10-18 19:36:44 +02:00 |
|
Matthew Honnibal
|
45a6f9b9c7
|
Fix loading of tagger.
|
2016-10-18 19:33:04 +02:00 |
|
Matthew Honnibal
|
76c815f40d
|
Fix spacy.load
|
2016-10-18 19:23:31 +02:00 |
|
Matthew Honnibal
|
8c8f5c62c6
|
Add LANG attribute to English and German
|
2016-10-18 18:52:48 +02:00 |
|
Matthew Honnibal
|
05e2a589a4
|
Fix None label in matcher
|
2016-10-18 18:05:21 +02:00 |
|
Matthew Honnibal
|
c3a8a1cf51
|
Update serializer test.
|
2016-10-18 16:18:46 +02:00 |
|
Matthew Honnibal
|
7d5212f131
|
Refactor defaults
|
2016-10-18 16:18:25 +02:00 |
|
Matthew Honnibal
|
a45a9d5092
|
Remove stray .tensor attribute from Lexeme
|
2016-10-18 01:16:32 +02:00 |
|
Matthew Honnibal
|
9258db788a
|
Revert "Have the matcher return character offsets, to handle the match better."
This reverts commit 049c937540.
|
2016-10-17 16:49:51 +02:00 |
|
Matthew Honnibal
|
7d446e5094
|
Revert "Update matcher test, to reflect character offset return instead of token offset."
This reverts commit f8d3e3bcfe.
|
2016-10-17 16:49:49 +02:00 |
|
Matthew Honnibal
|
4bf2c53c13
|
Revert "Hack on matcher tests, for new implementation."
This reverts commit dbe60644ab.
|
2016-10-17 16:49:48 +02:00 |
|