Matthew Honnibal
|
bb4f201ad2
|
Pass morphological features from tag map into the lemmatizer.
|
2016-09-27 14:01:43 +02:00 |
|
Matthew Honnibal
|
40509e8bca
|
Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed.
|
2016-09-27 14:01:16 +02:00 |
|
Matthew Honnibal
|
9c8ac91d72
|
Add test for Issue #435
|
2016-09-27 13:52:38 +02:00 |
|
Matthew Honnibal
|
3cb4d455d2
|
Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435
|
2016-09-27 13:52:11 +02:00 |
|
Matthew Honnibal
|
e233328d38
|
Fix Issue #371: Lexeme objects were unhashable.
|
2016-09-27 13:22:30 +02:00 |
|
Matthew Honnibal
|
e382e48d9f
|
Temporarily patch handling of default templates for tagger. Need to move these to language_data.
|
2016-09-27 13:21:28 +02:00 |
|
Matthew Honnibal
|
a44763af0e
|
Fix Issue #469: Incorrectly cased root label in noun chunk iterator
|
2016-09-27 13:13:01 +02:00 |
|
Matthew Honnibal
|
b14b9b096b
|
Return None if /deps directory not present, instead of trying to load the parser.
|
2016-09-26 18:48:03 +02:00 |
|
Matthew Honnibal
|
e07b9665f7
|
Don't expect parser model
|
2016-09-26 18:09:33 +02:00 |
|
Matthew Honnibal
|
ee6fa106da
|
Fix parser features
|
2016-09-26 17:57:32 +02:00 |
|
Matthew Honnibal
|
e607e4b598
|
Fix parser loading
|
2016-09-26 17:51:11 +02:00 |
|
Matthew Honnibal
|
0b2d7ae9d6
|
Fix Entity creation
|
2016-09-26 15:41:22 +02:00 |
|
Matthew Honnibal
|
2debc4e0a2
|
Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class.
|
2016-09-26 11:57:54 +02:00 |
|
Matthew Honnibal
|
722199acb8
|
Add spacy.blank() method that doesn't load data. Don't try to load data if path is falsy.
|
2016-09-26 11:07:46 +02:00 |
|
Matthew Honnibal
|
e56653f848
|
Add language data for German
|
2016-09-25 15:44:45 +02:00 |
|
Matthew Honnibal
|
7db956133e
|
Move tokenizer data for German into spacy.de.language_data
|
2016-09-25 15:37:33 +02:00 |
|
Matthew Honnibal
|
95aaea0d3f
|
Refactor so that the tokenizer data is read from Python data, rather than from disk
|
2016-09-25 14:49:53 +02:00 |
|
Matthew Honnibal
|
d7e9acdcdf
|
Add English language data, so that the tokenizer doesn't require the data download
|
2016-09-25 14:49:00 +02:00 |
|
Matthew Honnibal
|
82b8cc5efb
|
Whitespace
|
2016-09-24 22:17:01 +02:00 |
|
Matthew Honnibal
|
fd58f7655a
|
Python 3 compatible basestring
|
2016-09-24 22:16:43 +02:00 |
|
Matthew Honnibal
|
082e95b19e
|
Python 3 compatible basestring
|
2016-09-24 22:09:21 +02:00 |
|
Matthew Honnibal
|
f19af6cb2c
|
Python 3 compatible basestring
|
2016-09-24 22:08:43 +02:00 |
|
Matthew Honnibal
|
3ed4cdfe32
|
Handle pathlib.Path objects in CFile
|
2016-09-24 22:01:46 +02:00 |
|
Matthew Honnibal
|
df88690177
|
Fix encoding of path variable
|
2016-09-24 21:13:15 +02:00 |
|
Matthew Honnibal
|
af847e07fc
|
Fix usage of pathlib for Python3 -- turning paths to strings.
|
2016-09-24 21:05:27 +02:00 |
|
Matthew Honnibal
|
453683aaf0
|
Fix spacy/vocab.pyx
|
2016-09-24 20:50:31 +02:00 |
|
Matthew Honnibal
|
fd65cf6cbb
|
Finish refactoring data loading
|
2016-09-24 20:26:17 +02:00 |
|
Matthew Honnibal
|
83e364188c
|
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
|
2016-09-24 15:42:01 +02:00 |
|
Matthew Honnibal
|
9dc8043a7e
|
Refactor Language to use new Defaults class, and work on revised data loading. We're getting rid of sputnik's weird file-system wrapper, and using pathlib.
|
2016-09-24 14:08:53 +02:00 |
|
Matthew Honnibal
|
b00f683a0c
|
Fix matcher test
|
2016-09-24 11:20:58 +02:00 |
|
Matthew Honnibal
|
eaf4065480
|
Expose the _patterns private member
|
2016-09-24 11:20:42 +02:00 |
|
Matthew Honnibal
|
15e42a1ba9
|
Allow entities to be set by Span, or by 4-tuple (with entity ID)
|
2016-09-24 01:17:43 +02:00 |
|
Matthew Honnibal
|
60fdf4d5f1
|
Remove commented out debugging code
|
2016-09-24 01:17:18 +02:00 |
|
Matthew Honnibal
|
939a791a52
|
Update tests
|
2016-09-24 01:17:03 +02:00 |
|
Matthew Honnibal
|
55f1f7edaf
|
Don't automatically write new entities into the Doc in the Matcher. This fixes a long-standing wart, but introduces a *backwards incompatibility.*
|
2016-09-24 01:16:45 +02:00 |
|
Matthew Honnibal
|
e48df859b5
|
Fix typedef import in span.pyx
|
2016-09-23 16:02:28 +02:00 |
|
Matthew Honnibal
|
4de13606fd
|
Fix token.pyx
|
2016-09-23 15:07:07 +02:00 |
|
Matthew Honnibal
|
b4de419e19
|
Import hash_t typedef in token.pyx
|
2016-09-23 14:22:06 +02:00 |
|
Matthew Honnibal
|
c1a2e96604
|
Clean up notes at end of token.pyx
|
2016-09-21 20:45:51 +02:00 |
|
Matthew Honnibal
|
f6e587b1c7
|
Fix matcher tests
|
2016-09-21 20:45:20 +02:00 |
|
Matthew Honnibal
|
58e83fe34b
|
Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.
|
2016-09-21 14:54:55 +02:00 |
|
Matthew Honnibal
|
2735b6247b
|
Fix orths_and_spaces in Doc.__init__
|
2016-09-21 14:52:05 +02:00 |
|
Matthew Honnibal
|
070af4af9d
|
Revert "* Working neural net, but features hacky. Switching to extractor."
This reverts commit 7c2f1a673b.
|
2016-09-21 12:26:14 +02:00 |
|
Matthew Honnibal
|
6b202ec43f
|
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
|
2016-09-21 12:08:25 +02:00 |
|
Mahmoud Lababidi
|
4c9ccc3b8b
|
Add parameter to download() for application to not exit if a Model exists. The default behavior is unchanged.
|
2016-09-14 10:04:09 -04:00 |
|
Adam Ever Hadani
|
f1c0762443
|
Exit with code 0 when downloading a model that was already downloaded
|
2016-07-13 16:22:14 -07:00 |
|
Matthew Honnibal
|
7c2f1a673b
|
* Working neural net, but features hacky. Switching to extractor.
|
2016-05-26 19:06:10 +02:00 |
|
Matthew Honnibal
|
cdc10e9a1c
|
* Fix Issue #375: noun phrase iteration results in index error if noun phrases are merged during the loop. Fix by accumulating the spans inside the noun_chunks property, allowing the Span index tricks to work.
|
2016-05-20 10:14:06 +02:00 |
|
Matthew Honnibal
|
13fad36e49
|
* Cosmetic change to english noun chunks iterator -- use enumerate instead of range loop
|
2016-05-20 10:11:05 +02:00 |
|
Matthew Honnibal
|
02276cc444
|
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
|
2016-05-17 16:56:22 +02:00 |
|
Matthew Honnibal
|
4d7f5468bb
|
* Change Language class to use a .pipeline attribute, instead of having the pipeline hard coded
|
2016-05-17 16:55:42 +02:00 |
|
Daylen Yang
|
5405e7dd73
|
Fix get_lang_class parsing (take 2)
|
2016-05-16 16:40:31 -07:00 |
|
Matthew Honnibal
|
b240104f40
|
Revert "Fix get_lang_class parsing"
|
2016-05-17 08:04:26 +10:00 |
|
Daylen Yang
|
1692c2df3c
|
Fix get_lang_class parsing
We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens.
|
2016-05-16 14:38:20 -07:00 |
|
Matthew Honnibal
|
17137f5c0c
|
* Fix issue #372: mistake in Lexeme rich comparison
|
2016-05-12 12:58:57 +02:00 |
|
Matthew Honnibal
|
cc8bf62208
|
* Fix Issue #360: Tokenizer failed when the infix regex matched the start of the string while trying to tokenize multi-infix tokens.
|
2016-05-09 13:23:47 +02:00 |
|
Matthew Honnibal
|
c61ee8f9fa
|
* Increment version
|
2016-05-09 13:20:00 +02:00 |
|
Matthew Honnibal
|
5d86c30f0b
|
* Fix Issue #367: Missing has_vector property on Doc and Span objects
|
2016-05-09 12:36:14 +02:00 |
|
Wolfgang Seeker
|
7b78239436
|
add fix for German noun chunk iterator (issue #365)
|
2016-05-06 01:41:26 +02:00 |
|
Matthew Honnibal
|
8c0888d6cb
|
* Fix error in span.sent
|
2016-05-06 00:28:05 +02:00 |
|
Matthew Honnibal
|
bb94022975
|
* Fix Issue #365: Error introduced during noun phrase chunking, due to use of corrected PRON/PROPN/etc tags.
|
2016-05-06 00:21:05 +02:00 |
|
Matthew Honnibal
|
41342ca79b
|
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
|
2016-05-06 00:17:58 +02:00 |
|
Matthew Honnibal
|
26095f9722
|
* Add span.sent property, re Issue #366
|
2016-05-06 00:17:38 +02:00 |
|
Wolfgang Seeker
|
dbf8f5f3ec
|
fix bug in StateC.set_break()
|
2016-05-05 15:15:34 +02:00 |
|
Wolfgang Seeker
|
3c44b5dc1a
|
call deprojectivization after parsing
|
2016-05-05 15:10:36 +02:00 |
|
Matthew Honnibal
|
472f576b82
|
* Deprojectivize German parses
|
2016-05-05 15:01:10 +02:00 |
|
Matthew Honnibal
|
9bbd6cf031
|
* Work on Chinese support
|
2016-05-05 11:39:12 +02:00 |
|
Matthew Honnibal
|
a6a25166ba
|
* Remove print from test
|
2016-05-05 11:10:59 +02:00 |
|
Matthew Honnibal
|
e31df66d26
|
* Fix Issue #361: Lexemes didn't have rich comparison.
|
2016-05-05 01:32:26 +02:00 |
|
Matthew Honnibal
|
7441ca30ee
|
* Add tests for Issue #361: Lexeme rich comparison
|
2016-05-05 01:31:58 +02:00 |
|
Matthew Honnibal
|
72564213e3
|
* Add test for Issue #309
|
2016-05-04 16:00:28 +02:00 |
|
Matthew Honnibal
|
76f1d871da
|
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
|
2016-05-04 15:54:00 +02:00 |
|
Matthew Honnibal
|
519366f677
|
* Fix Issue #351: Indices off when leading whitespace
|
2016-05-04 15:53:36 +02:00 |
|
Matthew Honnibal
|
b4bfc6ae55
|
* Add test for Issue #351: Indices off when leading whitespace
|
2016-05-04 15:53:17 +02:00 |
|
Matthew Honnibal
|
76021cb853
|
* Fix bug in Doc.text, introduced by a862edc
|
2016-05-04 11:02:16 +02:00 |
|
Wolfgang Seeker
|
e4ea2bea01
|
fix whitespace
|
2016-05-04 07:40:38 +02:00 |
|
Wolfgang Seeker
|
5bf2fd1f78
|
make the code less cryptic
|
2016-05-03 17:19:05 +02:00 |
|
Wolfgang Seeker
|
a06fca9fdf
|
German noun chunk iterator now doesn't return tokens more than once
|
2016-05-03 16:58:59 +02:00 |
|
Wolfgang Seeker
|
7825b75548
|
add tests for German noun chunker
|
2016-05-03 15:01:28 +02:00 |
|
Wolfgang Seeker
|
7b246c13cb
|
reformulate noun chunk tests for English
|
2016-05-03 14:24:35 +02:00 |
|
Wolfgang Seeker
|
1786331cd8
|
add model sanity test
|
2016-05-03 12:51:47 +02:00 |
|
Matthew Honnibal
|
1f1532142f
|
* Fix cost calculation on non-monotonic oracle
|
2016-05-03 00:21:08 +02:00 |
|
Matthew Honnibal
|
377a624046
|
Merge pull request #358 from wbwseeker/german_lemmatizer_dummy
German lemmatizer dummy
|
2016-05-03 07:38:26 +10:00 |
|
Wolfgang Seeker
|
92bfbebeec
|
remove unnecessary imports
|
2016-05-02 17:33:22 +02:00 |
|
Wolfgang Seeker
|
857454ffa0
|
fix indentation -.-
|
2016-05-02 17:10:41 +02:00 |
|
Matthew Honnibal
|
308a28c26c
|
* Whitespace
|
2016-05-02 16:08:11 +02:00 |
|
Matthew Honnibal
|
29a114e645
|
* Don't assign 0-valued tags in Doc.from_array
|
2016-05-02 16:07:50 +02:00 |
|
Matthew Honnibal
|
c1c11a8ae0
|
* Fix formatting on serializer tests
|
2016-05-02 16:07:21 +02:00 |
|
Wolfgang Seeker
|
dae6bc05eb
|
define German dummy lemmatizer until morphology is done
|
2016-05-02 16:04:53 +02:00 |
|
Matthew Honnibal
|
6e1f1c4b9e
|
Merge pull request #357 from wbwseeker/german_ner
German ner
|
2016-05-02 23:39:34 +10:00 |
|
Wolfgang Seeker
|
b6b96b233c
|
don't require read_json_file to expect particular annotations
|
2016-05-02 15:29:30 +02:00 |
|
Matthew Honnibal
|
902a389d85
|
* Fix merge conflict in test_parse
|
2016-05-02 15:28:07 +02:00 |
|
Matthew Honnibal
|
276fbe9996
|
* Fix assignment of iterator on Doc object
|
2016-05-02 15:26:24 +02:00 |
|
Matthew Honnibal
|
02c23cc1d0
|
* Fix sentence boundary test
|
2016-05-02 15:26:07 +02:00 |
|
Matthew Honnibal
|
d2f469b809
|
* Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct
|
2016-05-02 15:25:27 +02:00 |
|
Wolfgang Seeker
|
b11cbb06c6
|
remove old tests for sentence boundary detection
|
2016-05-02 14:36:35 +02:00 |
|
Matthew Honnibal
|
508fd1f6dc
|
* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.
|
2016-05-02 14:25:10 +02:00 |
|
Matthew Honnibal
|
e526be5602
|
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
|
2016-05-02 13:08:08 +02:00 |
|
Wolfgang Seeker
|
fa961ea694
|
add tests for serialization bug
|
2016-05-02 11:01:56 +02:00 |
|
Matthew Honnibal
|
97b2bba249
|
* Merge updated/simplified Break approach
|
2016-04-25 19:44:42 +00:00 |
|