spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-09-21 19:39:13 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	ef4398b204	* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules	2014-12-07 23:52:41 +11:00
Matthew Honnibal	9f17467c2e	* Fix EMPTY_TOKEN	2014-12-07 22:07:41 +11:00
Matthew Honnibal	e27b912ef9	* Remove need for confusing _data pointer to be stored on Tokens	2014-12-05 16:31:30 +11:00
Matthew Honnibal	1c9253701d	* Introduce a TokenC struct, to handle token indices, pos tags and sense tags	2014-12-05 15:56:14 +11:00
Matthew Honnibal	564082e48e	* Hack Token class to take lex.dense inplace of the old lex.norm. This needs to be fixed...	2014-12-04 20:51:29 +11:00
Matthew Honnibal	69bb022204	* Add as_array and count_by method	2014-12-04 20:46:55 +11:00
Matthew Honnibal	d70d31aa45	* Introduce first attempt at const-ness	2014-12-03 15:44:25 +11:00
Matthew Honnibal	e170faf5b0	* Hack Tokens to work without tagger.pyx	2014-12-03 11:05:15 +11:00
Matthew Honnibal	522bb0346e	* Work on get_array method of Tokens	2014-12-02 23:48:05 +11:00
Matthew Honnibal	4ecbe8c893	* Complete refactor of Tagger features, to use a generic list of context names.	2014-11-05 20:45:29 +11:00
Matthew Honnibal	3733444101	* Generalize tagger code, in preparation for NER and supersense tagging.	2014-11-05 03:42:14 +11:00
Matthew Honnibal	954c970415	* Add __iter__ method to tokens	2014-11-04 01:07:08 +11:00
Matthew Honnibal	ae52f9f38c	* Remove vocab10k from tokens	2014-11-03 00:23:20 +11:00
Matthew Honnibal	b186a66bae	* Rename Token.lex_pos to Token.postype, and Token.lex_supersense to Token.sensetype	2014-10-31 17:44:39 +11:00
Matthew Honnibal	ac88893232	* Fix Token after lexeme changes	2014-10-30 15:30:52 +11:00
Matthew Honnibal	e6b87766fe	* Remove lexemes vector from Lexicon, and the id and hash attributes from Lexeme	2014-10-30 15:21:38 +11:00
Matthew Honnibal	13909a2e24	* Rewriting Lexeme serialization.	2014-10-29 23:19:38 +11:00
Matthew Honnibal	08ce602243	* Large refactor, particularly to Python API	2014-10-24 00:59:17 +11:00
Matthew Honnibal	7baef5b7ff	* Fix padding on tokens	2014-10-23 04:01:17 +11:00
Matthew Honnibal	e5e951ae67	* Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding.	2014-10-23 01:57:59 +11:00
Matthew Honnibal	7018b53d3a	* Improve array features in tokens	2014-10-22 12:55:42 +11:00
Matthew Honnibal	43743a5d63	* Work on efficiency	2014-10-14 18:22:41 +11:00
Matthew Honnibal	6fb42c4919	* Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang	2014-10-14 16:17:45 +11:00
Matthew Honnibal	2805068ca8	* Have tokens track tuples that record the start offset and pos tag as well as a lexeme pointer	2014-10-14 15:21:03 +11:00
Matthew Honnibal	71ee921055	* Slight cleaning of tokenizer code	2014-10-10 19:17:22 +11:00
Matthew Honnibal	59b41a9fd3	* Switch to new data model, tests passing	2014-10-10 08:11:31 +11:00
Matthew Honnibal	6266cac593	* Switch to using a Python ref counted gateway to malloc/free, to prevent memory leaks	2014-09-17 20:02:26 +02:00
Matthew Honnibal	08cef75ffd	* Switch to using a heap-allocated vector in tokens	2014-09-15 03:46:14 +02:00
Matthew Honnibal	f77b7098c0	* Upd Tokens to use vector, with bounds checking.	2014-09-15 03:22:40 +02:00
Matthew Honnibal	0f6bf2a2ee	* Fix niggling memory error, which was caused by bug in the way tokens resized their internal vector.	2014-09-15 02:08:39 +02:00
Matthew Honnibal	df24e3708c	* Move EnglishTokens stuff to Tokens	2014-09-15 01:31:44 +02:00
Matthew Honnibal	5aa591106b	* Fiddle with token features	2014-09-12 15:49:36 +02:00
Matthew Honnibal	073ee0de63	* Restore dense_hash_map for cache dictionary. Seems to double efficiency	2014-09-12 02:23:51 +02:00
Matthew Honnibal	2389bd1b10	* Improve cache mechanism by including a random element depending on the size of the cache.	2014-09-12 00:19:16 +02:00
Matthew Honnibal	563047e90f	* Switch to returning a Tokens object	2014-09-11 21:37:32 +02:00
Matthew Honnibal	1a3222af4b	* Moving tokens to use an array internally, instead of a list of Lexeme objects.	2014-09-11 16:57:08 +02:00
Matthew Honnibal	cf412adba8	* Refactoring to use Tokens object	2014-09-10 18:11:13 +02:00
Matthew Honnibal	68bae2fec6	* More refactoring	2014-08-25 16:42:22 +02:00
Matthew Honnibal	07ecf5d2f4	* Fixed group_by, removed idea of general attr_of function.	2014-08-22 00:02:37 +02:00
Matthew Honnibal	811b7a6b91	* Struggling with arbitrary attr access...	2014-08-21 23:49:14 +02:00
Matthew Honnibal	a78ad4152d	* Broken version being refactored for docs	2014-08-20 13:39:39 +02:00
Matthew Honnibal	365a2af756	* Restore happax. commit uncommited work	2014-08-02 21:27:03 +01:00
Matthew Honnibal	a895fe5ddb	* Upd from spacy	2014-07-23 17:35:18 +01:00
Matthew Honnibal	571808a274	Group-by seems to be working	2014-07-07 20:27:02 +02:00
Matthew Honnibal	80b36f9f27	* 710k words per second for counts	2014-07-07 19:12:19 +02:00
Matthew Honnibal	057c21969b	* Refactor for string view features. Working on setting up flags and enums.	2014-07-07 16:58:48 +02:00
Matthew Honnibal	f1bcbd4c4e	* Reorganized code to accomodate Tokens class. Need string views before group_by and count_by can be done well.	2014-07-07 12:47:21 +02:00

47 Commits