Commit Graph

1480 Commits

Author SHA1 Message Date
Matthew Honnibal
9c73983bdd * Add test for hyphenation problem in Issue #302 2016-03-29 14:27:13 +11:00
Matthew Honnibal
ad119c074f * Fix incorrect whitespacing in Doc.text. This change is potentially breaking, to anyone who was relying on the previous incorrect semantics. 2016-03-29 13:02:42 +11:00
Matthew Honnibal
80134eb12d Merge branch 'master' of https://github.com/spacy-io/spaCy 2016-03-15 19:14:50 +00:00
Henning Peters
c12d3dd200 add __init__.py to empty package dirs 2016-03-14 11:28:03 +01:00
Henning Peters
54f3447b5f cleanup 2016-03-14 01:46:33 +01:00
Matthew Honnibal
1508528c8c * Increment version 2016-03-08 15:58:45 +00:00
Matthew Honnibal
963fe5258e * Add missing __contains__ method to vocab 2016-03-08 15:49:10 +00:00
Matthew Honnibal
478aa21cb0 * Remove broken __reduce__ method on vocab 2016-03-08 15:48:21 +00:00
Matthew Honnibal
20235bde00 Merge pull request #282 from henningpeters/switch_vectors
initial proposal for ability to switch vectors
2016-03-09 01:39:41 +11:00
Henning Peters
eb7ae61b1c cleanup api 2016-03-08 12:59:18 +01:00
Henning Peters
b740f20191 hash_string() should not depend on python's internal unicode representation, also fixes https://github.com/spacy-io/sense2vec/issues/5 for py2 2016-03-06 09:19:27 +01:00
Henning Peters
aa4d964c14 cleanup api 2016-03-05 17:51:32 +01:00
Henning Peters
931c07a609 initial proposal for separate vector package 2016-03-04 11:09:06 +01:00
Wolfgang Seeker
7adbd7a785 replace Counter with normal dict 2016-03-03 21:36:27 +01:00
Wolfgang Seeker
1ae487a4f6 add backwards compatibility with python 2.6 2016-03-03 21:18:12 +01:00
Wolfgang Seeker
9d1e6de4a0 make a proper list from zip iterator 2016-03-03 19:51:01 +01:00
Wolfgang Seeker
49f9d1c085 change test_nonproj.py to not use zip inside numpy.asarray 2016-03-03 19:42:09 +01:00
Wolfgang Seeker
72b8df0684 turned PseudoProjectivity into a normal python class 2016-03-03 19:05:08 +01:00
Matthew Honnibal
fcaa0ad7ce Merge pull request #280 from wbwseeker/german_parser
German parser
2016-03-04 03:27:42 +11:00
Wolfgang Seeker
690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Wolfgang Seeker
3448cb40a4 integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
  all functionality to implement Nivre & Nilsson 2005's pseudo-projective
  parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
  structures
2016-03-01 10:09:08 +01:00
Wolfgang Seeker
56b7210e82 moved nonproj.py to syntax/nonproj.pyx 2016-02-25 15:08:49 +01:00
Henning Peters
f3df736e0a remove unidecode-related test 2016-02-24 18:22:22 +01:00
Wolfgang Seeker
4b2297d5d4 add class PseudoProjective for pseudo-projective parsing
PseudoProjective() implements the algorithm from Nivre & Nilsson 2005
using their HEAD decoration scheme.
2016-02-24 11:26:25 +01:00
Henning Peters
12d58a7099 remove text-unidecode dependency 2016-02-24 08:01:59 +01:00
Wolfgang Seeker
8d531c958b replace tests for non-projectivity
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
Matthew Honnibal
141639ea3a * Fix bug in tokenizer that caused new tokens to be added for affixes 2016-02-21 23:17:47 +00:00
Wolfgang Seeker
eae35e9b27 add tokenizer files for German, add/change code to train German pos tagger
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/

- init_model.py
	- change doc freq threshold to 0
- add train_german_tagger.py
	- expects conll09-formatted input
2016-02-18 13:24:20 +01:00
Henning Peters
9cc4f8d5b3 avoid shadowing __name__ 2016-02-15 01:33:39 +01:00
Henning Peters
4c9e3c7911 upgrade spuntik, enforce data api via model version constraints 2016-02-14 16:03:17 +01:00
Henning Peters
9d8966a2c0 Update test_tokenizer.py 2016-02-10 19:24:37 +01:00
Henning Peters
3b5f1e753b py26 compatibility 2016-02-10 14:32:54 +01:00
Henning Peters
ee1f1ac300 mark test_sentence_space() as model test 2016-02-10 07:49:11 +01:00
Matthew Honnibal
5d96b3ef4f * Increment version 2016-02-07 13:48:58 +01:00
Matthew Honnibal
1b83cb9dfa * Fix Issue #251: Incorrect right edge calculation on left-clobber low in the tree 2016-02-07 00:00:42 +01:00
Matthew Honnibal
c6623889c1 * Add test for Issue #251: Incorrect right edges, caused by bad update to r_edge in del_arc, triggered from non-monotonic left-arc 2016-02-06 23:47:51 +01:00
Matthew Honnibal
a95974ad3f * Fix oov probability 2016-02-06 15:13:55 +01:00
Matthew Honnibal
af8514cb0c * Refine the way the is_parsed attribute is set by from_array 2016-02-06 14:44:35 +01:00
Matthew Honnibal
161b01d4c0 * Tweak usage example for multi-processing 2016-02-06 14:44:11 +01:00
Matthew Honnibal
7f24229f10 * Don't try to pickle the tokenizer 2016-02-06 14:09:05 +01:00
Matthew Honnibal
dcb401f3e1 * Remove broken Vocab pickling 2016-02-06 14:08:47 +01:00
Matthew Honnibal
e66d45bf66 * Restore previous patch to Span.root, as it seems it wasn't the cause of the problem. 2016-02-06 13:37:41 +01:00
Matthew Honnibal
4412a70dc5 * Initialize StateC._empty_token to 0, to avoid undefined behaviour. 2016-02-06 13:34:38 +01:00
Matthew Honnibal
1b41f868d2 * Check for errors in parser, and parallelise the left-over batch 2016-02-06 10:06:30 +01:00
Matthew Honnibal
031b00cb91 * Fix Span.root calculation 2016-02-05 20:12:09 +01:00
Matthew Honnibal
165ca28b80 * Set is_parsed flag in Parser.pipe 2016-02-05 19:51:44 +01:00
Matthew Honnibal
bdd579db0a * Set is_parsed flag in Parser.pipe 2016-02-05 19:50:11 +01:00
Matthew Honnibal
7119e77fb6 * Fix Matcher.pipe 2016-02-05 19:46:02 +01:00
Matthew Honnibal
1cf0100bf6 * Add test for multithreading 2016-02-05 19:38:22 +01:00
Matthew Honnibal
b04c9aad71 * Fix off-by-one in Parser.pipe 2016-02-05 19:37:50 +01:00