spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-11 12:18:04 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	11810be33e	* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct	2016-02-04 13:04:16 +01:00
Matthew Honnibal	ab5aac5b2f	* Add .rank property to Token and Lexeme, for frequency rank	2015-11-08 16:18:25 +01:00
Matthew Honnibal	1e99fcd413	* Rename .repvec to .vector in C API	2015-11-03 23:47:59 +11:00
Matthew Honnibal	f7283a5067	* Fix vectors bugs for OOV words	2015-09-22 02:10:25 +02:00
Matthew Honnibal	44aecba701	* Fix Token.has_vector and Lexeme.has_vector	2015-09-22 01:43:16 +02:00
Matthew Honnibal	596fde8daa	* Add has_vector attribute to Token and Lexeme	2015-09-21 19:52:43 +10:00
Matthew Honnibal	f32927efbf	* Raise exceptions if attempt to access parse, but data is not installed. This partly but not fully addresses Issue #97. Still need exceptions on the various Token attributes that access the parse tree, e.g. token.head, token.lefts, token.rights, etc. Exceptions should be centralized, too.	2015-09-21 18:35:40 +10:00
Matthew Honnibal	191d593e03	* Fix vectors bug in lexeme	2015-09-15 19:05:11 +10:00
Matthew Honnibal	dd4d64b235	* Support setting of word vectors on Lexeme object.	2015-09-15 14:42:27 +10:00
Matthew Honnibal	193f127f81	* Fix ugly py_check_flag and py_set_flag functions in Lexeme	2015-09-15 13:06:18 +10:00
Matthew Honnibal	9561d88529	* Add is_stop to Python API	2015-09-14 18:25:40 +10:00
Matthew Honnibal	65dc0d1dfb	* Extend word vectors support, with .similarity() function, vector_norm property, and rename repvec to vector. Keep repvec name as well for now for backwards compatibility.	2015-09-14 17:49:58 +10:00
Matthew Honnibal	07c09a0e1b	* Fix attribute getters and setters in Lexeme	2015-09-09 14:29:22 +02:00
Matthew Honnibal	86c888667f	* Merge in changes from de branch	2015-09-06 19:49:28 +02:00
Matthew Honnibal	d2fc104a26	* Begin merge of Gazetteer and DE branches	2015-09-06 19:45:15 +02:00
Matthew Honnibal	7cc56ada6e	* Temporarily add py_set_flag attribute in Lexeme	2015-09-06 17:52:51 +02:00
Matthew Honnibal	3acf60df06	* Add missing properties in Lexeme class	2015-08-26 19:16:28 +02:00
Matthew Honnibal	6f1743692a	* Work on language-independent refactoring	2015-08-23 20:49:18 +02:00
Matthew Honnibal	cad0cca4e3	* Tmp	2015-08-22 22:04:34 +02:00
Matthew Honnibal	8e4c69ee8c	* Add is_oov property, and fix up handling of attributes	2015-07-27 01:50:06 +02:00
Matthew Honnibal	6bb96c122d	* Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects	2015-07-26 16:37:16 +02:00
Matthew Honnibal	3c270fc8ff	* Remove has_sense method from Lexeme	2015-07-08 19:28:29 +02:00
Matthew Honnibal	b64c843861	* Remove senses attr	2015-07-08 19:26:24 +02:00
Matthew Honnibal	e23d1582a2	* Add supersense data to Lexeme objects. Add simple has_sense method to check the flag.	2015-07-01 18:50:37 +02:00
Jordan Suchow	3a8d9b37a6	Remove trailing whitespace	2015-04-19 13:01:38 -07:00
Matthew Honnibal	51b618d646	* Add a has_repvec property to Lexeme, and a check function to check flags	2015-02-07 08:42:44 -05:00
Matthew Honnibal	6d1c08dafd	* Add docstring to Lexeme	2015-01-24 20:48:34 +11:00
Matthew Honnibal	fda94271af	* Rename NORM1 and NORM2 attrs to lower and norm	2015-01-24 06:17:03 +11:00
Matthew Honnibal	5ed8b2b98f	* Rename sic to orth	2015-01-23 02:08:25 +11:00
Matthew Honnibal	5e63c606ad	* Rename vec to repvec	2015-01-22 02:03:54 +11:00
Matthew Honnibal	6c7e44140b	* Work on word vectors, and other stuff	2015-01-17 16:21:17 +11:00
Matthew Honnibal	7d3c40de7d	* Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme	2015-01-15 00:33:16 +11:00
Matthew Honnibal	0930892fc1	* Tmp. Working on refactor. Compiles, must hook up lexical feats.	2015-01-14 00:03:48 +11:00
Matthew Honnibal	46da3d74d2	* Tmp. Refactoring, introducing a Lexeme PyObject.	2015-01-12 11:23:44 +11:00
Matthew Honnibal	ce2edd6312	* Tmp commit. Refactoring to create a Python Lexeme class.	2015-01-12 10:26:22 +11:00
Matthew Honnibal	90c143bd85	* Fix orth import	2015-01-05 18:49:19 +11:00
Matthew Honnibal	4c4aa2c5c9	* Work on train	2014-12-22 07:25:43 +11:00
Matthew Honnibal	ef4398b204	* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules	2014-12-07 23:52:41 +11:00
Matthew Honnibal	75b8dfb348	* Remove upper_pc from lexeme.pyx	2014-12-04 22:14:34 +11:00
Matthew Honnibal	e1b1f45cc9	* Add STEM attribute to lexeme	2014-12-04 20:46:20 +11:00
Matthew Honnibal	d70d31aa45	* Introduce first attempt at const-ness	2014-12-03 15:44:25 +11:00
Matthew Honnibal	b463a7eb86	* Make flag-setting a language-specific thing	2014-12-03 11:04:32 +11:00
Matthew Honnibal	70ea862703	* Remove vocab10k field, and add flags for gazetteers	2014-11-03 00:13:51 +11:00
Matthew Honnibal	8335706321	* Add LIKE_URL and LIKE_NUMBER flag features	2014-11-02 13:19:23 +11:00
Matthew Honnibal	6c807aa45f	* Restore id attribute to lexeme, and rename pos field to postype, to store clustered tag dictionaries	2014-10-31 17:43:00 +11:00
Matthew Honnibal	c6fcd03692	* Small efficiency tweak to lexeme init	2014-10-30 17:56:11 +11:00
Matthew Honnibal	87c2418a89	* Fiddle with data types on Lexeme, to compress them to a much smaller size.	2014-10-30 15:42:15 +11:00
Matthew Honnibal	e6b87766fe	* Remove lexemes vector from Lexicon, and the id and hash attributes from Lexeme	2014-10-30 15:21:38 +11:00
Matthew Honnibal	67c8c8019f	* Update lexeme serialization, using a binary file format	2014-10-30 01:01:00 +11:00
Matthew Honnibal	13909a2e24	* Rewriting Lexeme serialization.	2014-10-29 23:19:38 +11:00
Matthew Honnibal	08ce602243	* Large refactor, particularly to Python API	2014-10-24 00:59:17 +11:00
Matthew Honnibal	e5e951ae67	* Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding.	2014-10-23 01:57:59 +11:00
Matthew Honnibal	0a0e41f6c8	* Add prefix and suffix features	2014-10-22 12:56:09 +11:00
Matthew Honnibal	65d3ead4fd	* Rename LexStr_casefix to LexStr_norm and LexInt_i to LexInt_id	2014-10-14 15:19:07 +11:00
Matthew Honnibal	71ee921055	* Slight cleaning of tokenizer code	2014-10-10 19:17:22 +11:00
Matthew Honnibal	59b41a9fd3	* Switch to new data model, tests passing	2014-10-10 08:11:31 +11:00
Matthew Honnibal	1b0e01d3d8	* Revising data model of lexeme. Compiles.	2014-10-09 19:53:30 +11:00
Matthew Honnibal	51d75b244b	* Add serialize/deserialize functions for lexeme, transport to/from python dict.	2014-10-09 14:10:46 +11:00
Matthew Honnibal	d73d89a2de	* Add i attribute to lexeme, giving lexemes sequential IDs.	2014-10-09 13:50:05 +11:00
Matthew Honnibal	ac522e2553	* Switch from own memory class to cymem, in pip	2014-09-17 23:09:24 +02:00
Matthew Honnibal	6266cac593	* Switch to using a Python ref counted gateway to malloc/free, to prevent memory leaks	2014-09-17 20:02:26 +02:00
Matthew Honnibal	c396581a0b	* Fiddle with the way strings are interned in lexeme	2014-09-15 06:34:45 +02:00
Matthew Honnibal	f77b7098c0	* Upd Tokens to use vector, with bounds checking.	2014-09-15 03:22:40 +02:00
Matthew Honnibal	df24e3708c	* Move EnglishTokens stuff to Tokens	2014-09-15 01:31:44 +02:00
Matthew Honnibal	b488224c09	* Restoring Lexeme-as-struct	2014-09-10 20:41:37 +02:00
Matthew Honnibal	88095666dc	* Remove Lexeme struct, preparing to rename Word to Lexeme.	2014-08-24 19:24:42 +02:00
Matthew Honnibal	e289896603	* Fix ptb3 module	2014-08-22 16:36:17 +02:00
Matthew Honnibal	d10993f41a	* More docs work	2014-08-21 16:37:13 +02:00
Matthew Honnibal	a78ad4152d	* Broken version being refactored for docs	2014-08-20 13:39:39 +02:00
Matthew Honnibal	5fddb8d165	* Working refactor, with updated data model for Lexemes	2014-08-19 04:21:20 +02:00
Matthew Honnibal	3379d7a571	* Reforming data model for lexemes	2014-08-19 02:40:37 +02:00
Matthew Honnibal	01469b0888	* Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word.	2014-08-18 19:14:00 +02:00
Matthew Honnibal	6319ff0f22	* Add length property	2014-08-02 21:26:44 +01:00
Matthew Honnibal	571808a274	Group-by seems to be working	2014-07-07 20:27:02 +02:00
Matthew Honnibal	80b36f9f27	* 710k words per second for counts	2014-07-07 19:12:19 +02:00
Matthew Honnibal	057c21969b	* Refactor for string view features. Working on setting up flags and enums.	2014-07-07 16:58:48 +02:00
Matthew Honnibal	f1bcbd4c4e	* Reorganized code to accommodate Tokens class. Need string views before group_by and count_by can be done well.	2014-07-07 12:47:21 +02:00
Matthew Honnibal	ff1869ff07	* Fixed major efficiency problem, from not quite grokking pass by reference in cython c++	2014-07-07 07:36:43 +02:00
Matthew Honnibal	d5bef02c72	* Reorganized, moving language-independent stuff to spacy. The functions in spacy ask for the dictionaries and split function on input, but the language-specific modules are curried versions that use the globals	2014-07-07 04:21:06 +02:00
Matthew Honnibal	556f6a18ca	* Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc.	2014-07-05 20:51:42 +02:00

130 Commits