3063 Commits

Author SHA1 Message Date
Matthew Honnibal
519366f677 * Fix Issue #351: Indices off when leading whitespace 2016-05-04 15:53:36 +02:00
Matthew Honnibal
b4bfc6ae55 * Add test for Issue #351: Indices off when leading whitespace 2016-05-04 15:53:17 +02:00
Matthew Honnibal
76021cb853 * Fix bug in Doc.text, introduced by a862edc 2016-05-04 11:02:16 +02:00
Matthew Honnibal
1822bb4ff1 Merge pull request #359 from wbwseeker/reorganize_tests (Fix German noun chunker) 2016-05-04 18:15:17 +10:00
Wolfgang Seeker
e4ea2bea01 fix whitespace 2016-05-04 07:40:38 +02:00
Wolfgang Seeker
5bf2fd1f78 make the code less cryptic 2016-05-03 17:19:05 +02:00
Wolfgang Seeker
a06fca9fdf German noun chunk iterator now doesn't return tokens more than once 2016-05-03 16:58:59 +02:00
Wolfgang Seeker
fd8019ec92 fix typo in german_noun_chunks 2016-05-03 15:53:30 +02:00
Wolfgang Seeker
7825b75548 add tests for German noun chunker 2016-05-03 15:01:28 +02:00
Matthew Honnibal
24337175df * Register zh package in setup.py 2016-05-03 14:36:59 +02:00
Wolfgang Seeker
7b246c13cb reformulate noun chunk tests for English 2016-05-03 14:24:35 +02:00
Wolfgang Seeker
1786331cd8 add model sanity test 2016-05-03 12:51:47 +02:00
Matthew Honnibal
1f1532142f * Fix cost calculation on non-monotonic oracle 2016-05-03 00:21:08 +02:00
Matthew Honnibal
377a624046 Merge pull request #358 from wbwseeker/german_lemmatizer_dummy (German lemmatizer dummy) 2016-05-03 07:38:26 +10:00
Wolfgang Seeker
92bfbebeec remove unnecessary imports 2016-05-02 17:33:22 +02:00
Wolfgang Seeker
857454ffa0 fix indentation -.- 2016-05-02 17:10:41 +02:00
Matthew Honnibal
308a28c26c * Whitespace 2016-05-02 16:08:11 +02:00
Matthew Honnibal
29a114e645 * Don't assign 0-valued tags in Doc.from_array 2016-05-02 16:07:50 +02:00
Matthew Honnibal
c1c11a8ae0 * Fix formatting on serializer tests 2016-05-02 16:07:21 +02:00
Wolfgang Seeker
dae6bc05eb define German dummy lemmatizer until morphology is done 2016-05-02 16:04:53 +02:00
Matthew Honnibal
6e1f1c4b9e Merge pull request #357 from wbwseeker/german_ner (German ner) 2016-05-02 23:39:34 +10:00
Wolfgang Seeker
b6b96b233c don't require read_json_file to expect particular annotations 2016-05-02 15:29:30 +02:00
Matthew Honnibal
902a389d85 * Fix merge conflict in test_parse 2016-05-02 15:28:07 +02:00
Matthew Honnibal
276fbe9996 * Fix assignment of iterator on Doc object 2016-05-02 15:26:24 +02:00
Matthew Honnibal
02c23cc1d0 * Fix sentence boundary test 2016-05-02 15:26:07 +02:00
Matthew Honnibal
d2f469b809 * Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct 2016-05-02 15:25:27 +02:00
Wolfgang Seeker
b11cbb06c6 remove old tests for sentence boundary detection 2016-05-02 14:36:35 +02:00
Matthew Honnibal
508fd1f6dc * Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples. 2016-05-02 14:25:10 +02:00
Matthew Honnibal
e526be5602 Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-05-02 13:08:08 +02:00
Wolfgang Seeker
fa961ea694 add tests for serialization bug 2016-05-02 11:01:56 +02:00
Henning Peters
9b142d4438 can't work around build issue on windows 2016-05-01 12:30:59 +02:00
Henning Peters
749cbd359e Update LICENSE 2016-04-29 09:49:28 +02:00
Henning Peters
fac209cb7e add stdint.h fallback (vs 2008) 2016-04-29 00:08:14 +02:00
Henning Peters
2bf34687ea add stdint.h fallback (vs 2008) 2016-04-28 22:10:43 +02:00
Matthew Honnibal
97b2bba249 * Merge updated/simplified Break approach 2016-04-25 19:44:42 +00:00
Matthew Honnibal
77609588b6 * Fix assignment of root label to words left as root implicitly, after parsing ends. 2016-04-25 19:41:59 +00:00
Matthew Honnibal
7c2d2deaa7 * Revise transition system so that the Break transition retains sole responsibility for setting sentence boundaries. Re Issue #322 2016-04-25 19:41:59 +00:00
Wolfgang Seeker
c2f76a4024 Merge branch 'master' into german_ner 2016-04-25 13:21:23 +02:00
Matthew Honnibal
feb65fcaa1 Merge pull request #346 from wbwseeker/sentbnd_bug (introduce sentence boundaries for additional root tokens) 2016-04-25 20:31:27 +10:00
Wolfgang Seeker
1003e7ccec remove debug output from tests 2016-04-25 12:12:40 +02:00
Wolfgang Seeker
f57f843e85 fix bug in updating tree structure when introducing additional roots 2016-04-25 12:01:19 +02:00
Matthew Honnibal
478a8d1829 * Register Chinese language in spacy/__init__.py 2016-04-24 18:45:16 +02:00
Matthew Honnibal
8569dbc2d0 * Add initial stuff for Chinese parsing 2016-04-24 18:44:24 +02:00
Wolfgang Seeker
4d7f393fae don't require json-files to have syntactic annotation 2016-04-22 16:32:27 +02:00
Wolfgang Seeker
b6477fc4f4 adjusted tests to Travis Setup 2016-04-21 17:15:10 +02:00
Wolfgang Seeker
736ffcb9a2 remove whitespace 2016-04-21 16:55:55 +02:00
Wolfgang Seeker
6c7301cc6d the parser now introduces sentence boundaries properly when predicting dependents with root labels 2016-04-21 16:50:53 +02:00
Wolfgang Seeker
12024b0b0a bugfix: introducing multiple roots now updates original head's properties; adjust tests to rely less on statistical model 2016-04-20 16:42:41 +02:00
Henning Peters
c356251f45 Merge branch 'master' of github.com:spacy-io/spaCy 2016-04-19 19:50:55 +02:00
Henning Peters
bb3238bcdd pin numpy to >=1.7, ship headers 2016-04-19 19:50:42 +02:00
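
Commit 508fd1f6dc describes noun chunk iterators as simple functions that accept an object and yield (int start, int end, int label) triples, installed when the Doc is created but writable by users through the noun_chunk_iterator attribute. The following is only a minimal sketch of that design, not the code from the commit: ToyToken, ToyDoc, toy_noun_chunks, NP_LABEL, and the part-of-speech rule are all hypothetical stand-ins, and the end index is assumed to be exclusive.

```python
from typing import Iterator, List, NamedTuple, Tuple


class ToyToken(NamedTuple):
    """Hypothetical stand-in for a tagged token (not the library's Token)."""
    text: str
    pos: str  # coarse part-of-speech tag, e.g. "DET", "ADJ", "NOUN"


NP_LABEL = 1  # hypothetical integer label ID for a noun phrase
CHUNK_TAGS = ("DET", "ADJ", "NOUN")


def toy_noun_chunks(doc: "ToyDoc") -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, label) triples; `end` is exclusive (an assumption).

    Toy rule: a chunk is a run of DET/ADJ/NOUN tokens, closed at the last
    noun of a noun sequence. Each token is yielded at most once.
    """
    tokens = doc.tokens
    start = None
    for i, tok in enumerate(tokens):
        if tok.pos not in CHUNK_TAGS:
            start = None  # run broken without reaching a noun
            continue
        if start is None:
            start = i
        next_is_noun = i + 1 < len(tokens) and tokens[i + 1].pos == "NOUN"
        if tok.pos == "NOUN" and not next_is_noun:
            yield (start, i + 1, NP_LABEL)
            start = None


class ToyDoc:
    """Hypothetical container that mirrors the design in the commit message."""

    def __init__(self, tokens: List[ToyToken]) -> None:
        self.tokens = tokens
        # Installed at creation time; users may overwrite this attribute.
        self.noun_chunk_iterator = toy_noun_chunks

    def noun_chunks(self) -> List[str]:
        """Turn the iterator's index triples into chunk strings."""
        return [
            " ".join(tok.text for tok in self.tokens[start:end])
            for start, end, _label in self.noun_chunk_iterator(self)
        ]


words = [("the", "DET"), ("big", "ADJ"), ("dog", "NOUN"), ("quickly", "ADV"),
         ("chased", "VERB"), ("a", "DET"), ("cat", "NOUN")]
doc = ToyDoc([ToyToken(text, pos) for text, pos in words])
print(list(toy_noun_chunks(doc)))
print(doc.noun_chunks())
```

Because the iterator is a plain function stored on the object, swapping in a language-specific rule (for example a German one) is a single attribute assignment, which seems to be the flexibility the commit message is after.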