spaCy

Matthew Honnibal d563f1eadb Fix Issue #587 : Segfault in Matcher, due to simple error in the state machine.	2016-10-28 17:42:00 +02:00
de	Add LANG attribute to English and German	2016-10-18 18:52:48 +02:00
en	Try to fix weird install glitch.	2016-10-23 19:46:28 +02:00
fi	access model via sputnik	2015-12-07 06:01:28 +01:00
it	access model via sputnik	2015-12-07 06:01:28 +01:00
munge	* Fix Python3 problem in align_raw	2015-07-28 16:06:53 +02:00
serialize	Fix Issue #459 -- failed to deserialize empty doc.	2016-10-23 16:31:05 +02:00
syntax	Infer types in transition_system.pyx	2016-10-27 18:08:13 +02:00
tests	Improve test slightly	2016-10-28 17:41:16 +02:00
tokens	Fix clobbering of 'missing' named ent values after assigning ents.	2016-10-26 13:13:56 +02:00
zh	* Work on Chinese support	2016-05-05 11:39:12 +02:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Fix loading of GloVe vectors, to address Issue #541	2016-10-20 18:27:48 +02:00
about.py	Increment version	2016-10-23 20:22:53 +02:00
attrs.pxd	introduce lang field for LexemeC to hold language id	2016-03-10 13:01:34 +01:00
attrs.pyx	introduce lang field for LexemeC to hold language id	2016-03-10 13:01:34 +01:00
cfile.pxd	* Add cfile.pyx	2015-07-23 01:10:36 +02:00
cfile.pyx	Handle pathlib.Path objects in CFile	2016-09-24 22:01:46 +02:00
deprecated.py	Finish refactoring data loading	2016-09-24 20:26:17 +02:00
download.py	Make installation print data path.	2016-10-23 19:46:44 +02:00
gold.pxd	* Remove unused import	2015-07-25 18:11:16 +02:00
gold.pyx	Fix json loading, for Python 3.	2016-10-20 21:23:26 +02:00
language.py	Remove dead code	2016-10-26 13:11:07 +02:00
lemmatizer.py	Fix json loading, for Python 3.	2016-10-20 21:23:26 +02:00
lexeme.pxd	Remove stray .tensor attribute from Lexeme	2016-10-18 01:16:32 +02:00
lexeme.pyx	Fix vector_norm when vector is assigned to Lexeme.	2016-10-23 14:23:56 +02:00
matcher.pyx	Fix Issue #587 : Segfault in Matcher, due to simple error in the state machine.	2016-10-28 17:42:00 +02:00
morphology.pxd	Revert "Changes to morphology.pyx for new StringStore scheme"	2016-09-30 20:20:02 +02:00
morphology.pyx	Revert "Changes to morphology.pyx for new StringStore scheme"	2016-09-30 20:20:02 +02:00
multi_words.py	* Fix Issue #50 : Python 3 compatibility of v0.80	2015-04-13 05:59:43 +02:00
orth.pxd	remove text-unidecode dependency	2016-02-24 08:01:59 +01:00
orth.pyx	introduce lang field for LexemeC to hold language id	2016-03-10 13:01:34 +01:00
parts_of_speech.pxd	* Fix parts_of_speech now that symbols list has been reformed	2015-10-13 13:44:40 +11:00
parts_of_speech.pyx	* Fix NAMES list in spacy/parts_of_speech.pyx	2015-10-13 14:18:45 +11:00
pipeline.pxd	Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor.	2016-10-16 21:34:57 +02:00
pipeline.pyx	Fix issue #514 -- serializer fails when new entity type has been added. The fix here is quite ugly. It's best to add the entities ASAP after loading the NLP pipeline, to mitigate the brittleness.	2016-10-23 17:45:44 +02:00
scorer.py	Refactor training, with new spacy.train module. Defaults still a little awkward.	2016-10-09 12:24:24 +02:00
strings.pxd	Update strings.pxd	2016-10-24 14:00:35 +02:00
strings.pyx	Fix Python 3 basestring error	2016-10-24 14:22:51 +02:00
structs.pxd	Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.	2016-09-21 14:54:55 +02:00
symbols.pxd	German noun chunk iterator now doesn't return tokens more than once	2016-05-03 16:58:59 +02:00
symbols.pyx	Make sure symbols are unicode strings	2016-09-30 20:02:19 +02:00
tagger.pxd	Add cfg field to Tagger	2016-10-17 01:03:41 +02:00
tagger.pyx	Fix JSON in tagger	2016-10-21 01:44:10 +02:00
tokenizer.pxd	Finish refactoring data loading	2016-09-24 20:26:17 +02:00
tokenizer.pyx	Fix JSON in tokenizer	2016-10-21 01:44:20 +02:00
train.py	Fix training evaluate method	2016-10-27 18:02:19 +02:00
typedefs.pxd	Revert "Work on Issue #285 : intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."	2016-09-30 20:20:22 +02:00
typedefs.pyx	* Move POS tag definitions to parts_of_speech.pxd	2015-01-25 16:31:07 +11:00
util.py	Return None in match_best_version if not path exists.	2016-10-15 14:47:29 +02:00
vocab.pxd	Revert "Work on Issue #285 : intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."	2016-09-30 20:20:22 +02:00
vocab.pyx	Fix vector norm when loading lexemes.	2016-10-23 19:40:18 +02:00
