spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-15 22:27:12 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	0f9b8a00a5	Unbreak data download	2017-01-09 23:40:26 +01:00
Matthew Honnibal	d9a77ddf14	Return None for data path if it doesn't exist	2017-01-09 14:10:05 +01:00
Ines Montani	de5aa92bc2	Handle deprecated tokenizer prefix data	2017-01-08 20:33:28 +01:00
Ines Montani	6a60a61086	Move update_exc to global language data utils	2016-12-17 12:29:02 +01:00
Ines Montani	66c7348cda	Add update_exc util function	2016-12-08 13:58:12 +01:00
Ines Montani	8e977cc71c	Fix formatting	2016-12-08 13:56:17 +01:00
Matthew Honnibal	6b8b05ef83	Specify that spacy.util is encoded in utf8	2016-11-02 19:58:00 +01:00
Matthew Honnibal	9efe568177	Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596	2016-11-02 12:31:34 +01:00
Matthew Honnibal	5e923b9bfa	Return None in match_best_version if not path exists.	2016-10-15 14:47:29 +02:00
Matthew Honnibal	ea23b64cc8	Refactor training, with new spacy.train module. Defaults still a little awkward.	2016-10-09 12:24:24 +02:00
Matthew Honnibal	95aaea0d3f	Refactor so that the tokenizer data is read from Python data, rather than from disk	2016-09-25 14:49:53 +02:00
Matthew Honnibal	82b8cc5efb	Whitespace	2016-09-24 22:17:01 +02:00
Matthew Honnibal	f19af6cb2c	Python 3 compatible basestring	2016-09-24 22:08:43 +02:00
Matthew Honnibal	fd65cf6cbb	Finish refactoring data loading	2016-09-24 20:26:17 +02:00
Matthew Honnibal	83e364188c	Mostly finished loading refactoring. Design is in place, but doesn't work yet.	2016-09-24 15:42:01 +02:00
Daylen Yang	5405e7dd73	Fix get_lang_class parsing (take 2)	2016-05-16 16:40:31 -07:00
Matthew Honnibal	b240104f40	Revert "Fix get_lang_class parsing"	2016-05-17 08:04:26 +10:00
Daylen Yang	1692c2df3c	Fix get_lang_class parsing. We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens.	2016-05-16 14:38:20 -07:00
Henning Peters	ff690f76ba	fix loading non-german models	2016-04-12 16:00:56 +02:00
Henning Peters	c90d4a6f17	relative imports in __init__.py	2016-03-26 11:44:53 +01:00
Henning Peters	b8f63071eb	add lang registration facility	2016-03-25 18:54:45 +01:00
Henning Peters	a7d7ea3afa	first idea for supporting multiple langs in download script	2016-03-24 11:19:43 +01:00
Henning Peters	eb7ae61b1c	cleanup api	2016-03-08 12:59:18 +01:00
Henning Peters	9cc4f8d5b3	avoid shadowing __name__	2016-02-15 01:33:39 +01:00
Henning Peters	235f094534	untangle data_path/via	2016-01-16 12:23:45 +01:00
Henning Peters	6d1a3af343	cleanup unused	2016-01-16 10:05:04 +01:00
Henning Peters	846fa49b2a	distinct load() and from_package() methods	2016-01-16 10:00:57 +01:00
Henning Peters	211913d689	add about.py, adapt setup.py	2016-01-15 18:57:01 +01:00
Henning Peters	788f734513	refactored data_dir->via, add zip_safe, add spacy.load()	2016-01-15 18:01:02 +01:00
Henning Peters	d9471f684f	fix typo	2016-01-14 12:14:12 +01:00
Henning Peters	9b75d872b0	fix model download	2016-01-14 12:02:56 +01:00
Henning Peters	bc229790ac	integrate with sputnik	2016-01-13 19:46:17 +01:00
Matthew Honnibal	eaf2ad59f1	* Fix use of mock Package object	2015-12-31 04:13:15 +01:00
Matthew Honnibal	a2dfdec85d	* Clean up spacy.util	2015-12-29 18:06:09 +01:00
Matthew Honnibal	aec130af56	Use util.Package class for io. Previous Sputnik integration caused API change: Vocab, Tagger, etc were loaded via a from_package classmethod, that required a sputnik.Package instance. This forced users to first create a sputnik.Sputnik() instance, in order to acquire a Package via sp.pool(). Instead I've created a small file-system shim, util.Package, which allows classes to have a .load() classmethod, that accepts either util.Package objects, or strings. We can later gut the internals of this and make it a proxy for Sputnik if we need more functionality that should live in the Sputnik library. Sputnik is now only used to download and install the data, in spacy.en.download	2015-12-29 18:00:48 +01:00
Matthew Honnibal	4131e45543	* Add MockPackage class, to see whether we can proxy for Sputnik in a lightweight way	2015-12-29 16:55:03 +01:00
Henning Peters	d8d348bb55	allow to specify version constraint within model name	2015-12-18 19:12:08 +01:00
Henning Peters	cfa187aaf0	fix tests	2015-12-18 10:58:02 +01:00
Henning Peters	8359bd4d93	strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible	2015-12-18 09:52:55 +01:00
Henning Peters	9027cef3bc	access model via sputnik	2015-12-07 06:01:28 +01:00
Matthew Honnibal	dc393a5f1d	Merge pull request #126 from tomtung/master: Improve slicing support for both Doc and Span	2015-10-10 14:14:57 +11:00
Matthew Honnibal	83dccf0fd7	* Use io module insteads of deprecated codecs module	2015-10-10 14:13:01 +11:00
Yubing (Tom) Dong	3fd3bc79aa	Refactor to remove duplicate slicing logic	2015-10-07 01:25:35 -07:00
alvations	8199012d26	changing deprecated codecs.open to io.open =)	2015-09-30 20:10:15 +02:00
Matthew Honnibal	6ab1696b15	* Remove read_encoding_freqs from util.py	2015-07-23 01:17:32 +02:00
Matthew Honnibal	317cbbc015	* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.	2015-07-19 15:18:17 +02:00
Jordan Suchow	3a8d9b37a6	Remove trailing whitespace	2015-04-19 13:01:38 -07:00
Jordan Suchow	5f0f940a1f	Remove unused imports	2015-04-19 01:05:22 -07:00
Matthew Honnibal	3f1944d688	* Make PyPy work	2015-01-05 17:54:38 +11:00
Matthew Honnibal	f5d41028b5	* Move around data files for test release	2015-01-03 01:59:22 +11:00
Matthew Honnibal	e1c1a4b868	* Tmp	2014-12-21 05:36:29 +11:00
Matthew Honnibal	b962fe73d7	* Make suffixes file use full-power regex, so that we can handle periods properly	2014-12-09 19:04:27 +11:00
Matthew Honnibal	302e09018b	* Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas	2014-12-09 14:48:01 +11:00
Matthew Honnibal	ea8f1e7053	* Tighten interfaces	2014-10-30 18:14:42 +11:00
Matthew Honnibal	67c8c8019f	* Update lexeme serialization, using a binary file format	2014-10-30 01:01:00 +11:00
Matthew Honnibal	43d5964e13	* Add function to read detokenization rules	2014-10-22 12:54:59 +11:00
Matthew Honnibal	12742f4f83	* Add detokenize method and test	2014-10-18 18:07:29 +11:00
Matthew Honnibal	6fb42c4919	* Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang	2014-10-14 16:17:45 +11:00
Matthew Honnibal	e40caae51f	* Update Lexicon class to expect a list of lexeme dict descriptions	2014-10-09 14:51:35 +11:00
Matthew Honnibal	2e44fa7179	* Add util.py	2014-09-25 18:26:22 +02:00
Matthew Honnibal	e9a62b6eba	* Refactoring with Lexeme as a class now compiles. Basic design seems to work	2014-08-27 17:15:39 +02:00
Matthew Honnibal	d10993f41a	* More docs work	2014-08-21 16:37:13 +02:00
Matthew Honnibal	3379d7a571	* Reforming data model for lexemes	2014-08-19 02:40:37 +02:00
Matthew Honnibal	01469b0888	* Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word.	2014-08-18 19:14:00 +02:00
Matthew Honnibal	ff1869ff07	* Fixed major efficiency problem, from not quite grokking pass by reference in cython c++	2014-07-07 07:36:43 +02:00
Matthew Honnibal	25849fc926	* Generalize tokenization rules to capitals	2014-07-07 05:07:21 +02:00
Matthew Honnibal	4e79446dc2	* Reading in tokenization rules correctly. Passing tests.	2014-07-07 00:02:55 +02:00
Matthew Honnibal	556f6a18ca	* Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc.	2014-07-05 20:51:42 +02:00


468 Commits