Matthew Honnibal | 3ff09614e0 | Changes to matcher.pyx for new StringStore scheme | 2016-09-30 19:56:48 +02:00
Matthew Honnibal | eceeaefe53 | Fix defaults for Parser and Entity, adding a blank= argument. | 2016-09-30 19:56:06 +02:00
Matthew Honnibal | 8423e8627f | Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good. | 2016-09-30 10:14:47 +02:00
Matthew Honnibal | d3dc5718b2 | Fix syntax error in Doc | 2016-09-28 11:39:49 +02:00
Matthew Honnibal | 1b520e7bab | Improve docstrings for Doc object | 2016-09-28 11:15:13 +02:00
Matthew Honnibal | 81a47c01d8 | Fix test for empty sentence string. | 2016-09-27 19:21:22 +02:00
Matthew Honnibal | 4cbf0d3bb6 | Handle errors when no valid actions are available, pointing users to the issue tracker. | 2016-09-27 19:19:53 +02:00
Matthew Honnibal | 430473bd98 | Raise errors when no actions are available, re Issue #429 | 2016-09-27 19:09:37 +02:00
Matthew Honnibal | fc4a7ad794 | Test and fix Issue #411: IndexError when .sents property is used on empty string. | 2016-09-27 18:49:14 +02:00
Matthew Honnibal | 3d370b7d45 | Add test for Issue #445, fixed in 3cb4d455d, with improved lemmatizer logic | 2016-09-27 18:39:46 +02:00
Matthew Honnibal | a2f3510d6d | Fix lemmatizer | 2016-09-27 17:47:05 +02:00
Matthew Honnibal | 07776d8096 | Fix pos name conflict in lemmatize | 2016-09-27 17:35:58 +02:00
Matthew Honnibal | 35cd953f9e | Fix pos name conflict with morphology | 2016-09-27 14:16:22 +02:00
Matthew Honnibal | 8e7df3c4ca | Expect the parser data, if parser.load() is called. | 2016-09-27 14:02:12 +02:00
Matthew Honnibal | bb4f201ad2 | Pass morphological features from tag map into the lemmatizer. | 2016-09-27 14:01:43 +02:00
Matthew Honnibal | 40509e8bca | Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed. | 2016-09-27 14:01:16 +02:00
Matthew Honnibal | 9c8ac91d72 | Add test for Issue #435 | 2016-09-27 13:52:38 +02:00
Matthew Honnibal | 3cb4d455d2 | Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435 | 2016-09-27 13:52:11 +02:00
Matthew Honnibal | e233328d38 | Fix Issue #371: Lexeme objects were unhashable. | 2016-09-27 13:22:30 +02:00
Matthew Honnibal | e382e48d9f | Temporarily patch handling of default templates for tagger. Need to move these to language_data. | 2016-09-27 13:21:28 +02:00
Matthew Honnibal | a44763af0e | Fix Issue #469: Incorrectly cased root label in noun chunk iterator | 2016-09-27 13:13:01 +02:00
Matthew Honnibal | b14b9b096b | Return None if /deps directory not present, instead of trying to load the parser. | 2016-09-26 18:48:03 +02:00
Matthew Honnibal | e07b9665f7 | Don't expect parser model | 2016-09-26 18:09:33 +02:00
Matthew Honnibal | ee6fa106da | Fix parser features | 2016-09-26 17:57:32 +02:00
Matthew Honnibal | e607e4b598 | Fix parser loading | 2016-09-26 17:51:11 +02:00
Matthew Honnibal | 0b2d7ae9d6 | Fix Entity creation | 2016-09-26 15:41:22 +02:00
Matthew Honnibal | 2debc4e0a2 | Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class. | 2016-09-26 11:57:54 +02:00
Matthew Honnibal | 722199acb8 | Add spacy.blank() method, that doesn't load data. Don't try to load data if path is falsey | 2016-09-26 11:07:46 +02:00
Matthew Honnibal | e56653f848 | Add language data for German | 2016-09-25 15:44:45 +02:00
Matthew Honnibal | 7db956133e | Move tokenizer data for German into spacy.de.language_data | 2016-09-25 15:37:33 +02:00
Matthew Honnibal | 95aaea0d3f | Refactor so that the tokenizer data is read from Python data, rather than from disk | 2016-09-25 14:49:53 +02:00
Matthew Honnibal | d7e9acdcdf | Add English language data, so that the tokenizer doesn't require the data download | 2016-09-25 14:49:00 +02:00
Matthew Honnibal | 82b8cc5efb | Whitespace | 2016-09-24 22:17:01 +02:00
Matthew Honnibal | fd58f7655a | Python 3 compatible basestring | 2016-09-24 22:16:43 +02:00
Matthew Honnibal | 082e95b19e | Python 3 compatible basestring | 2016-09-24 22:09:21 +02:00
Matthew Honnibal | f19af6cb2c | Python 3 compatible basestring | 2016-09-24 22:08:43 +02:00
Matthew Honnibal | 3ed4cdfe32 | Handle pathlib.Path objects in CFile | 2016-09-24 22:01:46 +02:00
Matthew Honnibal | df88690177 | Fix encoding of path variable | 2016-09-24 21:13:15 +02:00
Matthew Honnibal | af847e07fc | Fix usage of pathlib for Python3 -- turning paths to strings. | 2016-09-24 21:05:27 +02:00
Matthew Honnibal | 453683aaf0 | Fix spacy/vocab.pyx | 2016-09-24 20:50:31 +02:00
Matthew Honnibal | fd65cf6cbb | Finish refactoring data loading | 2016-09-24 20:26:17 +02:00
Matthew Honnibal | 83e364188c | Mostly finished loading refactoring. Design is in place, but doesn't work yet. | 2016-09-24 15:42:01 +02:00
Matthew Honnibal | 9dc8043a7e | Refactor Language to use new Defaults class, and work on revised data loading. We're getting rid of sputnik's weird file-system wrapper, and using pathlib. | 2016-09-24 14:08:53 +02:00
Matthew Honnibal | b00f683a0c | Fix matcher test | 2016-09-24 11:20:58 +02:00
Matthew Honnibal | eaf4065480 | Expose the _patterns private member | 2016-09-24 11:20:42 +02:00
Matthew Honnibal | 15e42a1ba9 | Allow entities to be set by Span, or by 4-tuple (with entity ID) | 2016-09-24 01:17:43 +02:00
Matthew Honnibal | 60fdf4d5f1 | Remove commented out debugging code | 2016-09-24 01:17:18 +02:00
Matthew Honnibal | 939a791a52 | Update tests | 2016-09-24 01:17:03 +02:00
Matthew Honnibal | 55f1f7edaf | Don't automatically write new entities into the Doc in the Matcher. This fixes a long-standing wart, but introduces a *backwards incompatibility.* | 2016-09-24 01:16:45 +02:00
Matthew Honnibal | e48df859b5 | Fix typedef import in span.pyx | 2016-09-23 16:02:28 +02:00