spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-16 11:12:25 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	a24321b63a	* Add downloader	2015-01-02 21:44:41 +11:00
Matthew Honnibal	5d9a096e2f	* Some minor clean-up after HastyModel	2014-12-31 19:46:04 +11:00
Matthew Honnibal	aafaf58cbe	* Refactor _ml.Model, and finish implementing HastyModel; so far not worthwhile.	2014-12-31 19:40:59 +11:00
Matthew Honnibal	bcd038e7b6	* Implement HastyModel	2014-12-31 01:16:47 +11:00
Matthew Honnibal	1a075f77ff	* Don't over-ride pre-loaded POS tags, if set by special-cases	2014-12-30 23:26:32 +11:00
Matthew Honnibal	785c7ba76a	* Embed signature on attrs	2014-12-30 23:25:31 +11:00
Matthew Honnibal	30e5805656	* Lazy-load tagger and parser	2014-12-30 23:25:09 +11:00
Matthew Honnibal	9976aa976e	* Messily fix morphology and POS tags on special tokens.	2014-12-30 23:24:37 +11:00
Matthew Honnibal	c1ef3febee	* Embedsignature in tokens.pyx	2014-12-30 21:22:00 +11:00
Matthew Honnibal	aac5028b6e	* Move tagger to _ml	2014-12-30 21:21:38 +11:00
Matthew Honnibal	1ffb0229ed	* Import tokens in parser.pxd	2014-12-30 21:21:17 +11:00
Matthew Honnibal	bb0b00f819	* Repurpose the Tagger class as a generic Model, wrapping thinc's interface	2014-12-30 21:20:15 +11:00
Matthew Honnibal	fe2a5e0370	* Work on docstrings	2014-12-27 21:46:04 +11:00
Matthew Honnibal	bb80937544	* Upd docstrings	2014-12-27 18:45:16 +11:00
Matthew Honnibal	b8b65903fc	* Tmp	2014-12-24 17:42:00 +11:00
Matthew Honnibal	ab61673edd	* Fix api of array method	2014-12-23 15:18:48 +11:00
Matthew Honnibal	7708d0e24a	* Move lemmatizer to en dir	2014-12-23 15:16:57 +11:00
Matthew Honnibal	98eb4c0426	* Fix path to parser model	2014-12-23 15:09:09 +11:00
Matthew Honnibal	b00bc01d8c	* All tests now passing for reorg	2014-12-23 13:18:59 +11:00
Matthew Honnibal	73f200436f	* Tests passing except for morphology/lemmatization stuff	2014-12-23 11:40:32 +11:00
Matthew Honnibal	cf8d26c3d2	* POS tagger training working after reorg	2014-12-22 08:54:47 +11:00
Matthew Honnibal	4c4aa2c5c9	* Work on train	2014-12-22 07:25:43 +11:00
Matthew Honnibal	61df50b598	* Add English-subclass POS tagger	2014-12-21 20:59:07 +11:00
Matthew Honnibal	9f3f07cab6	* Add attrs file for English	2014-12-21 11:29:11 +11:00
Matthew Honnibal	2a89d70429	* Add vocab.pyx to setup, and ensure we can import spacy.en.lang	2014-12-21 06:03:53 +11:00
Matthew Honnibal	b34a1325d3	* Everything compiling after reorg. About to start testing.	2014-12-21 05:42:23 +11:00
Matthew Honnibal	e1c1a4b868	* Tmp	2014-12-21 05:36:29 +11:00
Matthew Honnibal	d11c1edf8c	* Import slice_unicode from strings.pyx	2014-12-20 07:56:26 +11:00
Matthew Honnibal	be1bdcbd85	* Move lang.pyx to tokenizer.pyx	2014-12-20 07:55:40 +11:00
Matthew Honnibal	89a1cc1a48	* Move murmurhash to .pxd in strings file	2014-12-20 07:41:08 +11:00
Matthew Honnibal	d5a942c4a4	* Rename lang.pyx to tokenizer.pyx	2014-12-20 07:30:39 +11:00
Matthew Honnibal	a60ae261ae	* Move tokenizer to its own file, and refactor	2014-12-20 07:29:16 +11:00
Matthew Honnibal	867a4a000c	* Export set_morph_from_dict function	2014-12-20 07:28:27 +11:00
Matthew Honnibal	4e30195c6d	* Refactor morphology.pyx	2014-12-20 07:27:28 +11:00
Matthew Honnibal	4c6ce7ee84	* Update tokens.pyx as part of reorg	2014-12-20 07:03:26 +11:00
Matthew Honnibal	116f7f3bc1	* Rename Lexicon to Vocab, and move it to its own file	2014-12-20 06:54:03 +11:00
Matthew Honnibal	780cbd68b1	* Move all struct definitions to structs.pxd, to avoid circular dependencies	2014-12-20 06:51:33 +11:00
Matthew Honnibal	f6556d8e5d	* Refactor, move Lexeme struct to structs.pxd	2014-12-20 06:51:03 +11:00
Matthew Honnibal	7d48bba6c4	* Move StringStore class to its own file	2014-12-20 06:42:01 +11:00
Matthew Honnibal	b066102d2d	* Remove POS cache for now	2014-12-20 03:49:58 +11:00
Matthew Honnibal	ff252dd535	* Clean up 'guess_cache' idea, which didn't work well enough	2014-12-20 03:49:11 +11:00
Matthew Honnibal	9d3ca13909	* Start work on parse-tree iteration classes	2014-12-20 03:48:10 +11:00
Matthew Honnibal	bed680c632	* Remove commented-out features	2014-12-20 03:47:32 +11:00
Matthew Honnibal	3d178c03ae	* Prune the features a bit	2014-12-20 02:46:14 +11:00
Matthew Honnibal	a0408e1758	* Working DecisionMemory class	2014-12-20 01:43:26 +11:00
Matthew Honnibal	7920ea72b4	* Working parser with the decision memory idea. Disabling that for now, for simplicity	2014-12-20 01:43:15 +11:00
Matthew Honnibal	a2f2a48da9	* Add some extra features	2014-12-20 01:42:24 +11:00
Matthew Honnibal	8fd9762d91	* Start laying out parse tree iteration methods	2014-12-20 01:42:09 +11:00
Matthew Honnibal	53b8bc1f3c	* Work on implementing a trainable cache for the parser. So far, doesn't improve efficiency	2014-12-19 09:30:50 +11:00
Matthew Honnibal	033d6c9ac2	* Adapt POS tagger decision-memory for use in parser	2014-12-19 07:23:04 +11:00
Matthew Honnibal	809ddf7887	* Add index.pxd	2014-12-19 07:23:00 +11:00
Matthew Honnibal	1879abd16a	* Set const-correctness for tagger	2014-12-18 20:41:52 +11:00
Matthew Honnibal	f72243b156	* Set const-correctness for Feature* array	2014-12-18 20:41:32 +11:00
Matthew Honnibal	6ab7e40590	* Add non-monotonic parsing with cost-sensitive update. 92.26 on Y&M set	2014-12-18 11:33:25 +11:00
Matthew Honnibal	7e0c692daf	* Automatically push when the stack is empty	2014-12-18 09:16:10 +11:00
Matthew Honnibal	61142a8eff	* Tweak features	2014-12-18 09:15:03 +11:00
Matthew Honnibal	8446ebfbbb	* Work on parser. Up to 92 UAS on YM labels	2014-12-18 09:05:31 +11:00
Matthew Honnibal	55de747bfc	* Remove .cpp files	2014-12-18 02:43:13 +11:00
Matthew Honnibal	4448a840f7	* Work on greedy parsing. Scoring about 91.2	2014-12-18 02:42:55 +11:00
Matthew Honnibal	87e9487d76	* Work on parser	2014-12-17 21:10:12 +11:00
Matthew Honnibal	9d7d97978d	* Work on greedy parser	2014-12-17 21:09:29 +11:00
Matthew Honnibal	d524dd306a	* Work on greedy parser	2014-12-17 03:19:43 +11:00
Matthew Honnibal	95ccea03b2	* Work on greedy parser	2014-12-16 22:46:55 +11:00
Matthew Honnibal	a432862fde	* Add exception type to _arg_max_among in tagger	2014-12-16 09:44:19 +11:00
Matthew Honnibal	9e00798820	* Work on integrating a greedy dependency parser	2014-12-16 08:06:04 +11:00
Matthew Honnibal	792802b2b9	* POS tag memoisation working, with good speed-up	2014-12-12 14:33:51 +11:00
Matthew Honnibal	ca54d58638	* Merge setup.py	2014-12-10 15:21:27 +11:00
Matthew Honnibal	9959a64f7b	* Working morphology and lemmatisation. POS tagging quite fast.	2014-12-10 08:09:32 +11:00
Matthew Honnibal	df3be14987	* Add pos_type features to POS tagger	2014-12-10 08:08:55 +11:00
Matthew Honnibal	42973c4b37	* Improve efficiency of tagger, and improve morphological processing	2014-12-10 01:02:04 +11:00
Matthew Honnibal	6b34a2f34b	* Move morphological analysis into its own module, morphology.pyx	2014-12-09 21:16:17 +11:00
Matthew Honnibal	b962fe73d7	* Make suffixes file use full-power regex, so that we can handle periods properly	2014-12-09 19:04:27 +11:00
Matthew Honnibal	accdbe989b	* Remove Tokens.extend method	2014-12-09 17:09:23 +11:00
Matthew Honnibal	495e1c7366	* Use fused type in Tokens.push_back, simplifying the use of the cache	2014-12-09 16:50:01 +11:00
Matthew Honnibal	302e09018b	* Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas	2014-12-09 14:48:01 +11:00
Matthew Honnibal	99bbbb6feb	* Work on morphological processing	2014-12-08 21:12:15 +11:00
Matthew Honnibal	7b68f911cf	* Add WordNet lemmatizer	2014-12-08 01:39:13 +11:00
Matthew Honnibal	c20dd79748	* Fiddle with const correctness and comments	2014-12-08 00:03:55 +11:00
Matthew Honnibal	b031c7c430	* Remove language-general context module	2014-12-07 23:53:01 +11:00
Matthew Honnibal	ef4398b204	* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules	2014-12-07 23:52:41 +11:00
Matthew Honnibal	327383e38a	* Remove unused code in tagger.pyx	2014-12-07 22:16:17 +11:00
Matthew Honnibal	9f17467c2e	* Fix EMPTY_TOKEN	2014-12-07 22:07:41 +11:00
Matthew Honnibal	3819a88e1b	* Add support for tag dictionary, and fix error-code for predict method	2014-12-07 22:07:16 +11:00
Matthew Honnibal	f00afe12c4	* Load POS tagger in load() function if path exists	2014-12-07 22:05:57 +11:00
Matthew Honnibal	5fe5e6e66b	* Move context functions to header, inlining them.	2014-12-07 21:59:04 +11:00
Matthew Honnibal	5caabec789	* Link in tagger, to work on integrating POS tagging	2014-12-07 15:29:41 +11:00
Matthew Honnibal	0c7aeb9de7	* Begin revising tagger, focussing on POS tagging	2014-12-07 15:29:04 +11:00
Matthew Honnibal	f5c4f2eb52	* Revise context, focussing on POS tagging for now	2014-12-07 15:28:22 +11:00
Matthew Honnibal	e27b912ef9	* Remove need for confusing _data pointer to be stored on Tokens	2014-12-05 16:31:30 +11:00
Matthew Honnibal	1c9253701d	* Introduce a TokenC struct, to handle token indices, pos tags and sense tags	2014-12-05 15:56:14 +11:00
Matthew Honnibal	187372c7f3	* Allow the lexicon to create lexemes using an external memory pool, so that it can decide to make some lexemes temporary, rather than cached	2014-12-05 03:29:50 +11:00
Matthew Honnibal	75b8dfb348	* Remove upper_pc from lexeme.pyx	2014-12-04 22:14:34 +11:00
Matthew Honnibal	49f3780ff5	* Fiddle with lexeme attrs	2014-12-04 21:22:38 +11:00
Matthew Honnibal	564082e48e	* Hack Token class to take lex.dense in place of the old lex.norm. This needs to be fixed...	2014-12-04 20:51:29 +11:00
Matthew Honnibal	69bb022204	* Add as_array and count_by method	2014-12-04 20:46:55 +11:00
Matthew Honnibal	e1b1f45cc9	* Add STEM attribute to lexeme	2014-12-04 20:46:20 +11:00
Matthew Honnibal	d7952634ca	* Make the string-store serve const pointers to Utf8Str	2014-12-03 16:01:47 +11:00
Matthew Honnibal	7e04c22f8f	* const added to Lexicon interface. Seems to work.	2014-12-03 15:58:17 +11:00
Matthew Honnibal	d70d31aa45	* Introduce first attempt at const-ness	2014-12-03 15:44:25 +11:00
Matthew Honnibal	4560ada85b	* Add typedef for attr_t. Change flag_t to flags_t	2014-12-03 11:06:31 +11:00

383 Commits