Matthew Honnibal
|
377a624046
|
Merge pull request #358 from wbwseeker/german_lemmatizer_dummy
German lemmatizer dummy
|
2016-05-03 07:38:26 +10:00 |
|
Wolfgang Seeker
|
92bfbebeec
|
remove unnecessary imports
|
2016-05-02 17:33:22 +02:00 |
|
Wolfgang Seeker
|
857454ffa0
|
fix indentation -.-
|
2016-05-02 17:10:41 +02:00 |
|
Matthew Honnibal
|
308a28c26c
|
* Whitespace
|
2016-05-02 16:08:11 +02:00 |
|
Matthew Honnibal
|
29a114e645
|
* Don't assign 0-valued tags in Doc.from_array
|
2016-05-02 16:07:50 +02:00 |
|
Matthew Honnibal
|
c1c11a8ae0
|
* Fix formatting on serializer tests
|
2016-05-02 16:07:21 +02:00 |
|
Wolfgang Seeker
|
dae6bc05eb
|
define German dummy lemmatizer until morphology is done
|
2016-05-02 16:04:53 +02:00 |
|
Matthew Honnibal
|
6e1f1c4b9e
|
Merge pull request #357 from wbwseeker/german_ner
German ner
|
2016-05-02 23:39:34 +10:00 |
|
Wolfgang Seeker
|
b6b96b233c
|
don't require read_json_file to expect particular annotations
|
2016-05-02 15:29:30 +02:00 |
|
Matthew Honnibal
|
902a389d85
|
* Fix merge conflict in test_parse
|
2016-05-02 15:28:07 +02:00 |
|
Matthew Honnibal
|
276fbe9996
|
* Fix assignment of iterator on Doc object
|
2016-05-02 15:26:24 +02:00 |
|
Matthew Honnibal
|
02c23cc1d0
|
* Fix sentence boundary test
|
2016-05-02 15:26:07 +02:00 |
|
Matthew Honnibal
|
d2f469b809
|
* Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct
|
2016-05-02 15:25:27 +02:00 |
|
Wolfgang Seeker
|
b11cbb06c6
|
remove old tests for sentence boundary detection
|
2016-05-02 14:36:35 +02:00 |
|
Matthew Honnibal
|
508fd1f6dc
|
* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.
|
2016-05-02 14:25:10 +02:00 |
|
Matthew Honnibal
|
e526be5602
|
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
|
2016-05-02 13:08:08 +02:00 |
|
Wolfgang Seeker
|
fa961ea694
|
add tests for serialization bug
|
2016-05-02 11:01:56 +02:00 |
|
Henning Peters
|
9b142d4438
|
can't work around build issue on windows
|
2016-05-01 12:30:59 +02:00 |
|
Henning Peters
|
749cbd359e
|
Update LICENSE
|
2016-04-29 09:49:28 +02:00 |
|
Henning Peters
|
fac209cb7e
|
add stdint.h fallback (vs 2008)
|
2016-04-29 00:08:14 +02:00 |
|
Henning Peters
|
2bf34687ea
|
add stdint.h fallback (vs 2008)
|
2016-04-28 22:10:43 +02:00 |
|
Matthew Honnibal
|
97b2bba249
|
* Merge updated/simplified Break approach
|
2016-04-25 19:44:42 +00:00 |
|
Matthew Honnibal
|
77609588b6
|
* Fix assignment of root label to words left as root implicitly, after parsing ends.
|
2016-04-25 19:41:59 +00:00 |
|
Matthew Honnibal
|
7c2d2deaa7
|
* Revise transition system so that the Break transition retains sole responsibility for setting sentence boundaries. Re Issue #322
|
2016-04-25 19:41:59 +00:00 |
|
Wolfgang Seeker
|
c2f76a4024
|
Merge branch 'master' into german_ner
|
2016-04-25 13:21:23 +02:00 |
|
Matthew Honnibal
|
feb65fcaa1
|
Merge pull request #346 from wbwseeker/sentbnd_bug
introduce sentence boundaries for additional root tokens
|
2016-04-25 20:31:27 +10:00 |
|
Wolfgang Seeker
|
1003e7ccec
|
remove debug output from tests
|
2016-04-25 12:12:40 +02:00 |
|
Wolfgang Seeker
|
f57f843e85
|
fix bug in updating tree structure when introducing additional roots
|
2016-04-25 12:01:19 +02:00 |
|
Matthew Honnibal
|
478a8d1829
|
* Register Chinese language in spacy/__init__.py
|
2016-04-24 18:45:16 +02:00 |
|
Matthew Honnibal
|
8569dbc2d0
|
* Add initial stuff for Chinese parsing
|
2016-04-24 18:44:24 +02:00 |
|
Wolfgang Seeker
|
4d7f393fae
|
don't require json-files to have syntactic annotation
|
2016-04-22 16:32:27 +02:00 |
|
Wolfgang Seeker
|
b6477fc4f4
|
adjusted tests to Travis Setup
|
2016-04-21 17:15:10 +02:00 |
|
Wolfgang Seeker
|
736ffcb9a2
|
remove whitespace
|
2016-04-21 16:55:55 +02:00 |
|
Wolfgang Seeker
|
6c7301cc6d
|
the parser now introduces sentence boundaries properly when predicting dependents with root labels
|
2016-04-21 16:50:53 +02:00 |
|
Wolfgang Seeker
|
12024b0b0a
|
bugfix: introducing multiple roots now updates original head's properties
adjust tests to rely less on statistical model
|
2016-04-20 16:42:41 +02:00 |
|
Henning Peters
|
c356251f45
|
Merge branch 'master' of github.com:spacy-io/spaCy
|
2016-04-19 19:50:55 +02:00 |
|
Henning Peters
|
bb3238bcdd
|
pin numpy to >=1.7, ship headers
|
2016-04-19 19:50:42 +02:00 |
|
Matthew Honnibal
|
67ce96c9c9
|
* Make patterns argument to Matcher class optional
|
2016-04-17 21:32:24 +02:00 |
|
Matthew Honnibal
|
8b4677d34d
|
* Add missing keyword arguments to spacy.load() function
|
2016-04-17 21:31:50 +02:00 |
|
Matthew Honnibal
|
2add5206aa
|
* Fix description of matcher test
|
2016-04-17 15:40:21 +02:00 |
|
Matthew Honnibal
|
2b419d5b8c
|
* Update test for Issue #242
|
2016-04-17 15:34:23 +02:00 |
|
Matthew Honnibal
|
f12b043308
|
* Add test for Issue #242: Overlapping matches not well recognised.
|
2016-04-17 15:19:17 +02:00 |
|
Wolfgang Seeker
|
b98cc3266d
|
bugfix: iterators now reset properly when called a second time
|
2016-04-15 17:49:16 +02:00 |
|
Wolfgang Seeker
|
e6945c4d0e
|
bugfix: uppercase attr values before looking them up
|
2016-04-15 15:46:31 +02:00 |
|
Matthew Honnibal
|
c0909afe22
|
Merge pull request #312 from wbwseeker/space_head_bug
add restrictions to L-arc and R-arc to prevent space heads
|
2016-04-15 20:36:03 +10:00 |
|
Wolfgang Seeker
|
289b10f441
|
remove some comments
|
2016-04-14 15:37:51 +02:00 |
|
Matthew Honnibal
|
fe9299a118
|
* Fix long-standing issue with coarse-grained tags: proper nouns weren't receiving the PROPN tag, and personal pronouns weren't receiving the PRON tag. This should fix Issue #191, and also Issue #325, which reported that proper nouns were being lemmatized using the common noun policies. This lemmatization will be prevented if the universal tag is PROPN, not NOUN, as no lemmatization rules are loaded for the PROPN tag.
|
2016-04-14 12:46:43 +02:00 |
|
Matthew Honnibal
|
6f82065761
|
* Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases.
|
2016-04-14 11:36:03 +02:00 |
|
Matthew Honnibal
|
0f957dd586
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy
|
2016-04-14 10:37:56 +02:00 |
|
Matthew Honnibal
|
108aca0e50
|
* Make Matcher use attrs from the attrs.pyx file, rather than having an incomplete function doing the mapping.
|
2016-04-14 10:37:39 +02:00 |
|
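Commit 508fd1f6dc above describes noun chunk iterators as plain functions that accept an object and yield (int start, int end, int label) triples, installed when the Doc is created but writable by users. The following is a minimal sketch of that contract only, not the actual spaCy implementation: `ToyDoc`, `simple_noun_chunks`, `NP_LABEL`, and the tag-based chunking rule are all hypothetical names and assumptions made for illustration.

```python
from typing import Callable, Iterator, List, Tuple

Triple = Tuple[int, int, int]  # (start, end, label), end exclusive

NP_LABEL = 1  # hypothetical integer ID standing in for a noun-phrase label


class ToyDoc:
    """Hypothetical stand-in for a Doc: a list of (text, coarse_tag) pairs."""

    def __init__(self, tokens: List[Tuple[str, str]],
                 noun_chunk_iterator: Callable[["ToyDoc"], Iterator[Triple]] = None):
        self.tokens = tokens
        # The iterator is installed at creation time, but stays a plain
        # attribute so callers can overwrite it later.
        self.noun_chunk_iterator = noun_chunk_iterator or simple_noun_chunks

    @property
    def noun_chunks(self) -> Iterator[List[Tuple[str, str]]]:
        # Turn the (start, end, label) triples into token slices.
        for start, end, _label in self.noun_chunk_iterator(self):
            yield self.tokens[start:end]


def simple_noun_chunks(doc: ToyDoc) -> Iterator[Triple]:
    """Assumed rule: a run of DET/ADJ/NOUN tokens that ends at the last NOUN."""
    start = None
    for i, (_text, tag) in enumerate(doc.tokens):
        if tag not in {"DET", "ADJ", "NOUN"}:
            start = None  # any other tag breaks the current run
            continue
        if start is None:
            start = i
        if tag == "NOUN":
            next_tag = doc.tokens[i + 1][1] if i + 1 < len(doc.tokens) else None
            if next_tag != "NOUN":
                yield (start, i + 1, NP_LABEL)
                start = None


def no_chunks(doc: ToyDoc) -> Iterator[Triple]:
    """A user-supplied replacement iterator that yields nothing."""
    return iter(())


doc = ToyDoc([("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
              ("sees", "VERB"), ("a", "DET"), ("dog", "NOUN")])
triples = list(doc.noun_chunk_iterator(doc))
chunk_texts = [[text for text, _ in chunk] for chunk in doc.noun_chunks]

# Overwriting the attribute swaps the chunking behaviour for this document.
doc.noun_chunk_iterator = no_chunks
chunks_after_swap = list(doc.noun_chunks)
```

Keeping the iterator a plain function over the document, rather than a method, is what makes the swap in the last lines a one-line assignment.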