Matthew Honnibal
|
e5c447e237
|
* Questionable fix to problem in Span.root
|
2016-02-05 19:18:35 +01:00 |
|
Matthew Honnibal
|
1ef84a0557
|
* Merge master into rethinc2
|
2016-02-05 12:55:59 +01:00 |
|
Matthew Honnibal
|
4cf34fc170
|
Merge branch 'rethinc2' of ssh://github.com/honnibal/spaCy into rethinc2
|
2016-02-05 12:48:28 +01:00 |
|
Matthew Honnibal
|
249dccbe95
|
* Fix Language.pipe
|
2016-02-05 12:47:57 +01:00 |
|
Matthew Honnibal
|
c0e63feccc
|
* xfail pickle tests
|
2016-02-05 12:46:58 +01:00 |
|
Matthew Honnibal
|
6aa92b70f1
|
* Fix merge problem in span
|
2016-02-05 12:46:11 +01:00 |
|
Matthew Honnibal
|
048dfe35aa
|
* cimport cython.parallel
|
2016-02-05 12:20:42 +01:00 |
|
Matthew Honnibal
|
af58f273b3
|
* Fix spacy.language.pipe
|
2016-02-05 12:20:29 +01:00 |
|
Matthew Honnibal
|
8a13cebdcc
|
* Update for modified thinc interface
|
2016-02-05 11:44:39 +01:00 |
|
Matthew Honnibal
|
48ce09687d
|
* Skip pickling the vocab in the tests
|
2016-02-04 15:51:19 +01:00 |
|
Matthew Honnibal
|
419edfab50
|
* Use generic flags for the new attributes until they're added
|
2016-02-04 15:50:54 +01:00 |
|
Matthew Honnibal
|
c4017a06d9
|
* Add placeholders for the new flags in attrs and symbols
|
2016-02-04 15:49:45 +01:00 |
|
Matthew Honnibal
|
e5c96c969f
|
* Wire up new attributes
|
2016-02-04 13:04:58 +01:00 |
|
Matthew Honnibal
|
9703ccc3de
|
* Remove unused import
|
2016-02-04 13:04:33 +01:00 |
|
Matthew Honnibal
|
11810be33e
|
* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct
|
2016-02-04 13:04:16 +01:00 |
|
Matthew Honnibal
|
fe611132f0
|
* Add stubs for is_bracket/is_quote/is_left_punct/is_right_punct functions
|
2016-02-04 13:03:04 +01:00 |
|
Matthew Honnibal
|
ee975d36d0
|
* Add stubs to test is_bracket/is_quote/is_left_punct/is_right_punct functions
|
2016-02-04 13:02:25 +01:00 |
|
Matthew Honnibal
|
f9e765cae7
|
* Add pipe() method to tokenizer
|
2016-02-03 02:32:37 +01:00 |
|
Matthew Honnibal
|
4cbad510ff
|
* Fix calculation of head for spans with punctuation.
|
2016-02-03 02:32:21 +01:00 |
|
Matthew Honnibal
|
84b247ef83
|
* Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading.
|
2016-02-03 02:10:58 +01:00 |
|
Matthew Honnibal
|
fcfc17a164
|
Merge branch 'master' into rethinc2
|
2016-02-02 23:05:34 +01:00 |
|
Matthew Honnibal
|
f204daf27b
|
* Add error warning that a gold tag is unrecognised
|
2016-02-02 22:59:59 +01:00 |
|
Matthew Honnibal
|
99b8906100
|
* Accept punct_labels as an argument to the scorer
|
2016-02-02 22:59:06 +01:00 |
|
Matthew Honnibal
|
59123443e2
|
* Check for presence/absence of the different models in Language.end_training
|
2016-02-02 22:49:55 +01:00 |
|
Matthew Honnibal
|
9e9d4c8706
|
* Fix stupid error in Language.batch
|
2016-02-01 09:49:32 +01:00 |
|
Matthew Honnibal
|
e3db39dd21
|
* Fix compiler warning about signed/unsigned comparison
|
2016-02-01 09:08:07 +01:00 |
|
Matthew Honnibal
|
98fbdf2856
|
* Add Language.batch() method, to support multi-threaded jobs
|
2016-02-01 09:01:13 +01:00 |
|
Matthew Honnibal
|
b3802562d6
|
Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2
|
2016-02-01 08:59:24 +01:00 |
|
Matthew Honnibal
|
4b08a3fafd
|
* Fix merge conflict
|
2016-02-01 08:58:18 +01:00 |
|
Matthew Honnibal
|
5188f6d9d8
|
* Fix parseC function
|
2016-02-01 08:48:48 +01:00 |
|
Matthew Honnibal
|
bcf8f7ba40
|
* Add a parse_batch method to Parser, that releases the GIL around a batch of documents.
|
2016-02-01 08:34:55 +01:00 |
|
Matthew Honnibal
|
d5579cd0d8
|
Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2
|
2016-02-01 03:08:49 +01:00 |
|
Matthew Honnibal
|
490ba65398
|
* Use openmp in parser
|
2016-02-01 03:08:42 +01:00 |
|
Matthew Honnibal
|
cb78d91ec5
|
* Fix ArcEager.set_valid
|
2016-02-01 03:07:37 +01:00 |
|
Matthew Honnibal
|
28e5ad62bc
|
* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents
|
2016-02-01 03:00:15 +01:00 |
|
Matthew Honnibal
|
a47f00901b
|
* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents
|
2016-02-01 02:58:14 +01:00 |
|
Matthew Honnibal
|
daaad66448
|
* Now fully proxied
|
2016-02-01 02:37:08 +01:00 |
|
Matthew Honnibal
|
7a0e3bb9c1
|
* Continue proxying. Some problem currently
|
2016-02-01 02:22:21 +01:00 |
|
Matthew Honnibal
|
2169bbb7ea
|
* Shadow StateClass with StateC, to start proxying
|
2016-02-01 01:16:14 +01:00 |
|
Matthew Honnibal
|
2fa228458e
|
* Add _state file, which StateClass will proxy to
|
2016-02-01 01:09:21 +01:00 |
|
Matthew Honnibal
|
6bb007d16e
|
* Make set_parse nogil
|
2016-01-30 20:27:52 +01:00 |
|
Matthew Honnibal
|
9410e74c92
|
* Switch parser to use nogil functions
|
2016-01-30 20:27:07 +01:00 |
|
Matthew Honnibal
|
10877a7791
|
* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser
|
2016-01-30 14:31:36 +01:00 |
|
Matthew Honnibal
|
ea4ff94cde
|
* Whitespace
|
2016-01-29 03:59:22 +01:00 |
|
Matthew Honnibal
|
b0718b6ee1
|
* Move to thinc 5.0
|
2016-01-29 03:58:55 +01:00 |
|
Matthew Honnibal
|
9721502c81
|
* Update version
|
2016-01-25 15:52:59 +01:00 |
|
Matthew Honnibal
|
907e8cf07d
|
* Add u prefix to string in web example
|
2016-01-25 15:51:38 +01:00 |
|
Matthew Honnibal
|
eba03695ef
|
* Comment out pickle tests
|
2016-01-25 15:51:13 +01:00 |
|
Matthew Honnibal
|
de94e6c525
|
* Mark pickle tests as xfail, due to temp files problem
|
2016-01-25 15:24:17 +01:00 |
|
Matthew Honnibal
|
87172a15c6
|
* Fix runtime error bug that arose from updated Span.root function.
|
2016-01-25 15:22:42 +01:00 |
|
Matthew Honnibal
|
2c8dd91785
|
* Fix first code example on the website
|
2016-01-23 18:09:19 +01:00 |
|
Matthew Honnibal
|
3af84cfd6e
|
* Increment version
|
2016-01-21 17:49:27 +01:00 |
|
Henning Peters
|
65aeac24cb
|
remove package version constraint
|
2016-01-21 17:40:51 +01:00 |
|
Matthew Honnibal
|
792c98a438
|
* Increment version for OSX-fixed release of v0.100
|
2016-01-21 00:23:04 +01:00 |
|
Matthew Honnibal
|
82d011ac43
|
* Fix test for whitespace
|
2016-01-19 20:38:26 +01:00 |
|
Matthew Honnibal
|
e89069dcae
|
* Fix matcher test
|
2016-01-19 20:24:01 +01:00 |
|
Matthew Honnibal
|
63e3d4e27f
|
* Add comment on Vocab.__reduce__
|
2016-01-19 20:11:25 +01:00 |
|
Matthew Honnibal
|
e1282b7f2f
|
* Require user-custom NER classes to work without adding the label.
|
2016-01-19 20:11:03 +01:00 |
|
Matthew Honnibal
|
84c5dfbfc3
|
* Clean up debugging python list
|
2016-01-19 20:10:32 +01:00 |
|
Matthew Honnibal
|
04d0686b26
|
* Make TransitionSystem.add_action idempotent, i.e. ignore duplicate added actions.
|
2016-01-19 20:10:04 +01:00 |
|
Matthew Honnibal
|
c4a89d56bd
|
* Automatically register any entity types pre-set on the tokens, so that the NER works with user-given entity types.
|
2016-01-19 20:09:26 +01:00 |
|
Matthew Honnibal
|
f0f92793f6
|
* Add test for user NER classes in matcher blocking the NER model. Re Issue #178 and Issue #217
|
2016-01-19 19:23:16 +01:00 |
|
Matthew Honnibal
|
65c5bc4988
|
* Add add_label method, to allow users to register new entity types and dependency labels.
|
2016-01-19 19:11:02 +01:00 |
|
Matthew Honnibal
|
151aa0b0e2
|
* Allow users to add_label, in order to extend the entity recogniser to new classes. Does not by itself add a class to the model
|
2016-01-19 19:09:33 +01:00 |
|
Matthew Honnibal
|
c8e0011ebc
|
* Add iterators to the NER and parser transition systems, to get the action types
|
2016-01-19 19:07:43 +01:00 |
|
Matthew Honnibal
|
515493c675
|
* Add xfail test for Issue #225: tokenization with non-whitespace delimiters
|
2016-01-19 13:20:14 +01:00 |
|
Matthew Honnibal
|
7abe653223
|
* Fix imports
|
2016-01-19 03:36:51 +01:00 |
|
Matthew Honnibal
|
590f38bdb2
|
* Add hacky solution to Issue #220. Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution.
|
2016-01-19 03:35:20 +01:00 |
|
Matthew Honnibal
|
445164d5b4
|
* Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated
|
2016-01-19 02:54:56 +01:00 |
|
Matthew Honnibal
|
04177debd0
|
* Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184, but may cause further problems. Needs testing.
|
2016-01-19 02:54:15 +01:00 |
|
Matthew Honnibal
|
7893de3203
|
* Add test for Issue #184: Whitespace at sentence boundary causes sentence boundary error.
|
2016-01-18 23:04:38 +01:00 |
|
Matthew Honnibal
|
bba0a5e078
|
* Handle string paths in default_vocab, default_parser, default_entity in Language class
|
2016-01-18 22:37:24 +01:00 |
|
Matthew Honnibal
|
e825fd9554
|
* Make some of the website tests work without models
|
2016-01-18 18:14:44 +01:00 |
|
Matthew Honnibal
|
334c4b2b57
|
* Disprefer punctuation and spaces as heads of spans
|
2016-01-18 18:14:09 +01:00 |
|
Matthew Honnibal
|
bed36ab0ff
|
* Fix import of HEAD attribute
|
2016-01-18 17:34:43 +01:00 |
|
Matthew Honnibal
|
28c659c1fe
|
* Fix import for numpy
|
2016-01-18 17:25:04 +01:00 |
|
Matthew Honnibal
|
fc36bcf458
|
* Fix import for English
|
2016-01-18 17:14:40 +01:00 |
|
Matthew Honnibal
|
cc4c335e14
|
* Set heads for test_merge_tokens, to make the test run without models
|
2016-01-18 17:00:11 +01:00 |
|
Matthew Honnibal
|
c107da9738
|
* Bug fix to _count_words_to_root
|
2016-01-18 16:59:38 +01:00 |
|
Matthew Honnibal
|
f24833d607
|
* Fix merge for coordinations
|
2016-01-18 16:03:19 +01:00 |
|
Matthew Honnibal
|
14534958a9
|
* Fix bug in Span.root
|
2016-01-18 15:40:28 +01:00 |
|
Matthew Honnibal
|
714cbc03d5
|
* Add test for Issue #203: nested noun chunks.
|
2016-01-16 18:02:30 +01:00 |
|
Matthew Honnibal
|
4e2253170c
|
* Move test for doc.merge to tokens_api file, to avoid name conflicts which upset pytest
|
2016-01-16 18:01:36 +01:00 |
|
Matthew Honnibal
|
34a157511f
|
* Move test_merge_hang to test_tokens_api
|
2016-01-16 18:00:26 +01:00 |
|
Matthew Honnibal
|
fc8f26584a
|
* Don't consider NPs connected to parse via conj relation as noun chunks. Change motivated by the nested noun chunks identified in Issue #203, but might be problematic. Also allow root NPs to be considered noun chunks.
|
2016-01-16 17:52:40 +01:00 |
|
Matthew Honnibal
|
4a16dbfeca
|
* Add test for Issue #203: noun chunks should be flat, but sometimes are nested
|
2016-01-16 17:41:25 +01:00 |
|
Matthew Honnibal
|
995b2d18fd
|
* Route token.string via token.text_with_ws, to deprecate token.string in future
|
2016-01-16 17:14:34 +01:00 |
|
Matthew Honnibal
|
54a98eaf19
|
* Fix typo text_wth_ws --> text_with_ws. Reroute .string attribute to text_with_ws, to deprecate .string in future
|
2016-01-16 17:13:50 +01:00 |
|
Matthew Honnibal
|
3e9961d2c4
|
* If final token is whitespace, don't mark it as owning a trailing space. Fixes Issue #154
|
2016-01-16 17:08:59 +01:00 |
|
Matthew Honnibal
|
223d2b3484
|
* Add test for Issue #154: Additional whitespace introduced when string ends with a whitespace token.
|
2016-01-16 17:08:07 +01:00 |
|
Matthew Honnibal
|
3dc398b727
|
* Fix merge conflict in requirements.txt
|
2016-01-16 16:20:49 +01:00 |
|
Matthew Honnibal
|
fc5962a77d
|
* Improve test for root token in Span
|
2016-01-16 16:19:09 +01:00 |
|
Matthew Honnibal
|
c025a0c64b
|
* Check for KeyboardInterrupt in parser.__call__
|
2016-01-16 16:18:44 +01:00 |
|
Matthew Honnibal
|
03e8a4293d
|
* Add loop guard to Token.lefts and Token.rights properties
|
2016-01-16 16:18:17 +01:00 |
|
Matthew Honnibal
|
304339985e
|
* Add a linear scan to Span.root method, to help with long sentences
|
2016-01-16 16:17:28 +01:00 |
|
Matthew Honnibal
|
aa0dd79f52
|
* Delete test_token_references, which checked a flakey strategy for preventing orphan tokens from a while ago. Now orphan tokens simply hold a reference to Pool, preventing the memory from being freed underneath them. This means that we don't need to run this slow test.
|
2016-01-16 16:03:35 +01:00 |
|
Matthew Honnibal
|
8cbcc3a799
|
* Fix calculation of root token in Span. Now take root to be word with shortest tree path. Avoids parse trees ending up in inconsistent state, as had occurred in Issue #214.
|
2016-01-16 15:38:50 +01:00 |
|
Matthew Honnibal
|
c1039fa4b4
|
* Add test for Issue #214. Resolved in change to Span.root
|
2016-01-16 15:37:47 +01:00 |
|
Henning Peters
|
41ea14a56f
|
fix pickling
|
2016-01-16 13:23:11 +01:00 |
|
Henning Peters
|
5551052840
|
fix py2/3 issue
|
2016-01-16 12:44:53 +01:00 |
|