spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-12 22:05:52 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	23b7244842	Make sure symbols are unicode strings	2016-09-30 20:02:19 +02:00
Matthew Honnibal	f5a6aac906	Changes to tagger for new string store scheme	2016-09-30 20:01:51 +02:00
Matthew Honnibal	717741b6cf	Changes to Lexeme for new string store scheme	2016-09-30 20:01:36 +02:00
Matthew Honnibal	a51149a717	Changes to vocab for new stringstore scheme	2016-09-30 20:01:19 +02:00
Matthew Honnibal	21e90d7d0b	Changes to test for new string-store	2016-09-30 20:00:58 +02:00
Matthew Honnibal	99de44d864	Changes to Doc and Token for new string store scheme	2016-09-30 20:00:21 +02:00
Matthew Honnibal	78f19baafa	Fix report of ParserStateError	2016-09-30 19:59:22 +02:00
Matthew Honnibal	0442e0ab1e	Changes to transition systems for new StringStore scheme	2016-09-30 19:58:51 +02:00
Matthew Honnibal	22d4752d64	Changes to strings.pyx for new StringStore scheme	2016-09-30 19:58:09 +02:00
Matthew Honnibal	4f794b215a	Changes to iterators.pyx for new StringStore scheme	2016-09-30 19:57:49 +02:00
Matthew Honnibal	95f8cfd745	Changes to morphology.pyx for new StringStore scheme	2016-09-30 19:57:10 +02:00
Matthew Honnibal	3ff09614e0	Changes to matcher.pyx for new StringStore scheme	2016-09-30 19:56:48 +02:00
Matthew Honnibal	eceeaefe53	Fix defaults for Parser and Entity, adding a blank= argument.	2016-09-30 19:56:06 +02:00
Matthew Honnibal	d61feffe24	Require new preshed	2016-09-30 18:41:01 +02:00
Ines Montani	7537b0d637	Update README.rst	2016-09-30 14:41:35 +02:00
Ines Montani	8039c1a92d	Update README.rst	2016-09-30 14:21:19 +02:00
Ines Montani	d6cc4d3dfe	Update README.rst	2016-09-30 14:17:23 +02:00
Matthew Honnibal	8423e8627f	Work on Issue #285 : intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good.	2016-09-30 10:14:47 +02:00
Matthew Honnibal	d3dc5718b2	Fix syntax error in Doc	2016-09-28 11:39:49 +02:00
Matthew Honnibal	1b520e7bab	Improve docstrings for Doc object	2016-09-28 11:15:13 +02:00
Matthew Honnibal	81a47c01d8	Fix test for empty sentence string.	2016-09-27 19:21:22 +02:00
Matthew Honnibal	4cbf0d3bb6	Handle errors when no valid actions are available, pointing users to the issue tracker.	2016-09-27 19:19:53 +02:00
Matthew Honnibal	430473bd98	Raise errors when no actions are available, re Issue #429	2016-09-27 19:09:37 +02:00
Matthew Honnibal	fc4a7ad794	Test and fix Issue #411 : IndexError when .sents property is used on empty string.	2016-09-27 18:49:14 +02:00
Matthew Honnibal	3d370b7d45	Add test for Issue #445 , fixed in `3cb4d455d`, with improved lemmatizer logic	2016-09-27 18:39:46 +02:00
Matthew Honnibal	a2f3510d6d	Fix lemmatizer	2016-09-27 17:47:05 +02:00
Matthew Honnibal	07776d8096	Fix pos name conflict in lemmatize	2016-09-27 17:35:58 +02:00
Matthew Honnibal	35cd953f9e	Fix pos name conflict with morphology	2016-09-27 14:16:22 +02:00
Matthew Honnibal	8e7df3c4ca	Expect the parser data, if parser.load() is called.	2016-09-27 14:02:12 +02:00
Matthew Honnibal	bb4f201ad2	Pass morphological features from tag map into the lemmatizer.	2016-09-27 14:01:43 +02:00
Matthew Honnibal	40509e8bca	Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed.	2016-09-27 14:01:16 +02:00
Matthew Honnibal	9c8ac91d72	Add test for Issue #435	2016-09-27 13:52:38 +02:00
Matthew Honnibal	3cb4d455d2	Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435	2016-09-27 13:52:11 +02:00
Matthew Honnibal	e233328d38	Fix Issue #371 : Lexeme objects were unhashable.	2016-09-27 13:22:30 +02:00
Matthew Honnibal	e382e48d9f	Temporarily patch handling of default templates for tagger. Need to move these to language_data.	2016-09-27 13:21:28 +02:00
Matthew Honnibal	a44763af0e	Fix Issue #469 : Incorrectly cased root label in noun chunk iterator	2016-09-27 13:13:01 +02:00
Matthew Honnibal	b14b9b096b	Return None if /deps directory not present, instead of trying to load the parser.	2016-09-26 18:48:03 +02:00
Matthew Honnibal	e07b9665f7	Don't expect parser model	2016-09-26 18:09:33 +02:00
Matthew Honnibal	ee6fa106da	Fix parser features	2016-09-26 17:57:32 +02:00
Matthew Honnibal	e607e4b598	Fix parser loading	2016-09-26 17:51:11 +02:00
Matthew Honnibal	0b2d7ae9d6	Fix Entity creation	2016-09-26 15:41:22 +02:00
Matthew Honnibal	2debc4e0a2	Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class.	2016-09-26 11:57:54 +02:00
Matthew Honnibal	722199acb8	Add spacy.blank() method, that doesn't load data. Don't try to load data if path is falsey	2016-09-26 11:07:46 +02:00
Matthew Honnibal	ae202e7a60	Fix init_model.py	2016-09-25 15:58:51 +02:00
Matthew Honnibal	e56653f848	Add language data for German	2016-09-25 15:44:45 +02:00
Matthew Honnibal	7db956133e	Move tokenizer data for German into spacy.de.language_data	2016-09-25 15:37:33 +02:00
Matthew Honnibal	95aaea0d3f	Refactor so that the tokenizer data is read from Python data, rather than from disk	2016-09-25 14:49:53 +02:00
Matthew Honnibal	d7e9acdcdf	Add English language data, so that the tokenizer doesn't require the data download	2016-09-25 14:49:00 +02:00
Matthew Honnibal	82b8cc5efb	Whitespace	2016-09-24 22:17:01 +02:00
Matthew Honnibal	fd58f7655a	Python 3 compatible basestring	2016-09-24 22:16:43 +02:00


5306 Commits