Henning Peters
12d58a7099
remove text-unidecode dependency
2016-02-24 08:01:59 +01:00
Wolfgang Seeker
8d531c958b
replace tests for non-projectivity
...
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
Matthew Honnibal
141639ea3a
* Fix bug in tokenizer that caused new tokens to be added for affixes
2016-02-21 23:17:47 +00:00
Wolfgang Seeker
eae35e9b27
add tokenizer files for German, add/change code to train German pos tagger
...
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/
- init_model.py
- change doc freq threshold to 0
- add train_german_tagger.py
- expects conll09-formatted input
2016-02-18 13:24:20 +01:00
Henning Peters
9cc4f8d5b3
avoid shadowing __name__
2016-02-15 01:33:39 +01:00
Henning Peters
4c9e3c7911
upgrade spuntik, enforce data api via model version constraints
2016-02-14 16:03:17 +01:00
Henning Peters
9d8966a2c0
Update test_tokenizer.py
2016-02-10 19:24:37 +01:00
Henning Peters
3b5f1e753b
py26 compatibility
2016-02-10 14:32:54 +01:00
Henning Peters
ee1f1ac300
mark test_sentence_space() as model test
2016-02-10 07:49:11 +01:00
Matthew Honnibal
5d96b3ef4f
* Increment version
2016-02-07 13:48:58 +01:00
Matthew Honnibal
1b83cb9dfa
* Fix Issue #251 : Incorrect right edge calculation on left-clobber low in the tree
2016-02-07 00:00:42 +01:00
Matthew Honnibal
c6623889c1
* Add test for Issue #251 : Incorrect right edges, caused by bad update to r_edge in del_arc, triggered from non-monotonic left-arc
2016-02-06 23:47:51 +01:00
Matthew Honnibal
a95974ad3f
* Fix oov probability
2016-02-06 15:13:55 +01:00
Matthew Honnibal
af8514cb0c
* Refine the way the is_parsed attribute is set by from_array
2016-02-06 14:44:35 +01:00
Matthew Honnibal
161b01d4c0
* Tweak usage example for multi-processing
2016-02-06 14:44:11 +01:00
Matthew Honnibal
7f24229f10
* Don't try to pickle the tokenizer
2016-02-06 14:09:05 +01:00
Matthew Honnibal
dcb401f3e1
* Remove broken Vocab pickling
2016-02-06 14:08:47 +01:00
Matthew Honnibal
e66d45bf66
* Restore previous patch to Span.root, as it seems it wasn't the cause of the problem.
2016-02-06 13:37:41 +01:00
Matthew Honnibal
4412a70dc5
* Initialize StateC._empty_token to 0, to avoid undefined behaviour.
2016-02-06 13:34:38 +01:00
Matthew Honnibal
1b41f868d2
* Check for errors in parser, and parallelise the left-over batch
2016-02-06 10:06:30 +01:00
Matthew Honnibal
031b00cb91
* Fix Span.root calculation
2016-02-05 20:12:09 +01:00
Matthew Honnibal
165ca28b80
* Set is_parsed flag in Parser.pipe
2016-02-05 19:51:44 +01:00
Matthew Honnibal
bdd579db0a
* Set is_parsed flag in Parser.pipe
2016-02-05 19:50:11 +01:00
Matthew Honnibal
7119e77fb6
* Fix Matcher.pipe
2016-02-05 19:46:02 +01:00
Matthew Honnibal
1cf0100bf6
* Add test for multithreading
2016-02-05 19:38:22 +01:00
Matthew Honnibal
b04c9aad71
* Fix off-by-one in Parser.pipe
2016-02-05 19:37:50 +01:00
Matthew Honnibal
e5c447e237
* Questionable fix to problem in Span.root
2016-02-05 19:18:35 +01:00
Matthew Honnibal
1ef84a0557
* Merge master into rethinc2
2016-02-05 12:55:59 +01:00
Matthew Honnibal
4cf34fc170
Merge branch 'rethinc2' of ssh://github.com/honnibal/spaCy into rethinc2
2016-02-05 12:48:28 +01:00
Matthew Honnibal
249dccbe95
* Fix Language.pipe
2016-02-05 12:47:57 +01:00
Matthew Honnibal
c0e63feccc
* xfail pickle tests
2016-02-05 12:46:58 +01:00
Matthew Honnibal
6aa92b70f1
* Fix merge problem in span
2016-02-05 12:46:11 +01:00
Matthew Honnibal
048dfe35aa
* cimport cython.parallel
2016-02-05 12:20:42 +01:00
Matthew Honnibal
af58f273b3
* Fix spacy.language.pipe
2016-02-05 12:20:29 +01:00
Matthew Honnibal
8a13cebdcc
* Update for modified thinc interface
2016-02-05 11:44:39 +01:00
Matthew Honnibal
48ce09687d
* Skip pickling the vocab in the tests
2016-02-04 15:51:19 +01:00
Matthew Honnibal
419edfab50
* Use generic flags for the new attributes until they're added
2016-02-04 15:50:54 +01:00
Matthew Honnibal
c4017a06d9
* Add placeholders for the new flags in attrs and symbols
2016-02-04 15:49:45 +01:00
Matthew Honnibal
e5c96c969f
* Wire up new attributes
2016-02-04 13:04:58 +01:00
Matthew Honnibal
9703ccc3de
* Remove unused import
2016-02-04 13:04:33 +01:00
Matthew Honnibal
11810be33e
* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct
2016-02-04 13:04:16 +01:00
Matthew Honnibal
fe611132f0
* Add stubs for is_bracket/is_quote/is_left_punct/is_right_punct functions
2016-02-04 13:03:04 +01:00
Matthew Honnibal
ee975d36d0
* Add stubs to test is_bracket/is_quote/is_left_punct/is_right_punct functions
2016-02-04 13:02:25 +01:00
Matthew Honnibal
f9e765cae7
* Add pipe() method to tokenizer
2016-02-03 02:32:37 +01:00
Matthew Honnibal
4cbad510ff
* Fix calculation of head for spans with punctuation.
2016-02-03 02:32:21 +01:00
Matthew Honnibal
84b247ef83
* Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading.
2016-02-03 02:10:58 +01:00
Matthew Honnibal
fcfc17a164
Merge branch 'master' into rethinc2
2016-02-02 23:05:34 +01:00
Matthew Honnibal
f204daf27b
* Add error warning that a gold tag is unrecognised
2016-02-02 22:59:59 +01:00
Matthew Honnibal
99b8906100
* Accept punct_labels as an argument to the scorer
2016-02-02 22:59:06 +01:00
Matthew Honnibal
59123443e2
* Check for presence/absence of the different models in Language.end_training
2016-02-02 22:49:55 +01:00
Matthew Honnibal
9e9d4c8706
* Fix stupid error in Language.batch
2016-02-01 09:49:32 +01:00
Matthew Honnibal
e3db39dd21
* Fix compiler warning about signed/unsigned comparison
2016-02-01 09:08:07 +01:00
Matthew Honnibal
98fbdf2856
* Add Language.batch() method, to support multi-threaded jobs
2016-02-01 09:01:13 +01:00
Matthew Honnibal
b3802562d6
Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2
2016-02-01 08:59:24 +01:00
Matthew Honnibal
4b08a3fafd
* Fix merge conflict
2016-02-01 08:58:18 +01:00
Matthew Honnibal
5188f6d9d8
* Fix parseC function
2016-02-01 08:48:48 +01:00
Matthew Honnibal
bcf8f7ba40
* Add a parse_batch method to Parser, that releases the GIL around a batch of documents.
2016-02-01 08:34:55 +01:00
Matthew Honnibal
d5579cd0d8
Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2
2016-02-01 03:08:49 +01:00
Matthew Honnibal
490ba65398
* Use openmp in parser
2016-02-01 03:08:42 +01:00
Matthew Honnibal
cb78d91ec5
* Fix ArcEager.set_valid
2016-02-01 03:07:37 +01:00
Matthew Honnibal
28e5ad62bc
* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents
2016-02-01 03:00:15 +01:00
Matthew Honnibal
a47f00901b
* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents
2016-02-01 02:58:14 +01:00
Matthew Honnibal
daaad66448
* Now fully proxied
2016-02-01 02:37:08 +01:00
Matthew Honnibal
7a0e3bb9c1
* Continue proxying. Some problem currently
2016-02-01 02:22:21 +01:00
Matthew Honnibal
2169bbb7ea
* Shadow StateClass with StateC, to start proxying
2016-02-01 01:16:14 +01:00
Matthew Honnibal
2fa228458e
* Add _state file, which StateClass will proxy to
2016-02-01 01:09:21 +01:00
Matthew Honnibal
6bb007d16e
* Make set_parse nogil
2016-01-30 20:27:52 +01:00
Matthew Honnibal
9410e74c92
* Switch parser to use nogil functions
2016-01-30 20:27:07 +01:00
Matthew Honnibal
10877a7791
* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser
2016-01-30 14:31:36 +01:00
Matthew Honnibal
ea4ff94cde
* Whitespace
2016-01-29 03:59:22 +01:00
Matthew Honnibal
b0718b6ee1
* Move to thinc 5.0
2016-01-29 03:58:55 +01:00
Matthew Honnibal
9721502c81
* Update version
2016-01-25 15:52:59 +01:00
Matthew Honnibal
907e8cf07d
* Add u prefix to string in web example
2016-01-25 15:51:38 +01:00
Matthew Honnibal
eba03695ef
* Comment out pickle tests
2016-01-25 15:51:13 +01:00
Matthew Honnibal
de94e6c525
* Mark pickle tests as xfail, due to temp files problem
2016-01-25 15:24:17 +01:00
Matthew Honnibal
87172a15c6
* Fix runtime error bug that arose from updated Span.root function.
2016-01-25 15:22:42 +01:00
Matthew Honnibal
2c8dd91785
* Fix first code example on the website
2016-01-23 18:09:19 +01:00
Matthew Honnibal
3af84cfd6e
* Increment version
2016-01-21 17:49:27 +01:00
Henning Peters
65aeac24cb
remove package version constraint
2016-01-21 17:40:51 +01:00
Matthew Honnibal
792c98a438
* Increment version for OSX-fixed release of v0.100
2016-01-21 00:23:04 +01:00
Matthew Honnibal
82d011ac43
* Fix test for whitespace
2016-01-19 20:38:26 +01:00
Matthew Honnibal
e89069dcae
* Fix matcher test
2016-01-19 20:24:01 +01:00
Matthew Honnibal
63e3d4e27f
* Add comment on Vocab.__reduce__
2016-01-19 20:11:25 +01:00
Matthew Honnibal
e1282b7f2f
* Require user-custom NER classes to work without adding the label.
2016-01-19 20:11:03 +01:00
Matthew Honnibal
84c5dfbfc3
* Clean up debugging python list
2016-01-19 20:10:32 +01:00
Matthew Honnibal
04d0686b26
* Make TransitionSystem.add_action idempotent, i.e. ignore duplicate added actions.
2016-01-19 20:10:04 +01:00
Matthew Honnibal
c4a89d56bd
* Automatically register any entity types pre-set on the tokens, so that the NER works with user-given entity types.
2016-01-19 20:09:26 +01:00
Matthew Honnibal
f0f92793f6
* Add test for user NER classes in matcher blocking the NER model. Re Issue #178 and Issue #217
2016-01-19 19:23:16 +01:00
Matthew Honnibal
65c5bc4988
* Add add_label method, to allow users to register new entity types and dependency labels.
2016-01-19 19:11:02 +01:00
Matthew Honnibal
151aa0b0e2
* Allow users to add_label, in order to extend the entity recogniser to new classes. Does not by itself add a class to the model
2016-01-19 19:09:33 +01:00
Matthew Honnibal
c8e0011ebc
* Add iterators to the NER and parser transition systems, to get the action types
2016-01-19 19:07:43 +01:00
Matthew Honnibal
515493c675
* Add xfail test for Issue #225 : tokenization with non-whitespace delimiters
2016-01-19 13:20:14 +01:00
Matthew Honnibal
7abe653223
* Fix imports
2016-01-19 03:36:51 +01:00
Matthew Honnibal
590f38bdb2
* Add hacky solution to Issue #220 . Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution.
2016-01-19 03:35:20 +01:00
Matthew Honnibal
445164d5b4
* Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated
2016-01-19 02:54:56 +01:00
Matthew Honnibal
04177debd0
* Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184 , but may cause further problems. Needs testing.
2016-01-19 02:54:15 +01:00
Matthew Honnibal
7893de3203
* Add test for Issue #184 : Whitespace at sentence boundary causes sentence boundary error.
2016-01-18 23:04:38 +01:00
Matthew Honnibal
bba0a5e078
* Handle string paths in default_vocab, default_parser, default_entity in Language class
2016-01-18 22:37:24 +01:00
Matthew Honnibal
e825fd9554
* Make some of the website tests work without models
2016-01-18 18:14:44 +01:00
Matthew Honnibal
334c4b2b57
* Disprefer punctuation and spaces as heads of spans
2016-01-18 18:14:09 +01:00