spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-22 20:16:43 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	6405d2384c	* Add first draft of annotation standards doc	2015-07-14 12:50:13 +02:00
Matthew Honnibal	935ac53ee3	* Extend count_by method	2015-07-14 03:20:09 +02:00
Matthew Honnibal	39c93116eb	* Add get_freqs script	2015-07-14 02:31:32 +02:00
Matthew Honnibal	3b5baa660f	* Fix tokenizer	2015-07-14 00:10:51 +02:00
Matthew Honnibal	2ae0b439b2	* Fix space check in gold.pyx	2015-07-14 00:10:27 +02:00
Matthew Honnibal	81aa4e6dcc	* Go back to having token reference doc, instead of complicated gymnastics. Rename the attr 'doc', to expose it in the API	2015-07-14 00:10:11 +02:00
Matthew Honnibal	e1c702e498	* Upd tests after refactor	2015-07-14 00:08:50 +02:00
Matthew Honnibal	ba9a22ae0b	* Ignore cpp files in spacy/tokens	2015-07-13 22:30:15 +02:00
Matthew Honnibal	98382bd7a0	* Update tests after refactor	2015-07-13 22:30:01 +02:00
Matthew Honnibal	d87d71caf4	* Compile the new modules after refactor	2015-07-13 22:29:33 +02:00
Matthew Honnibal	24d6ce99ec	* Add comment to tokenizer, explaining the spacy attr	2015-07-13 22:29:13 +02:00
Matthew Honnibal	8214b74eec	* Restore _py_tokens cache, to handle orphan tokens.	2015-07-13 22:28:10 +02:00
Matthew Honnibal	67641f3b58	* Refactor tokenizer, to set the 'spacy' field on TokenC instead of passing a string	2015-07-13 21:46:02 +02:00
Matthew Honnibal	6eef0bf9ab	* Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx	2015-07-13 20:20:58 +02:00
Matthew Honnibal	3ea8756c24	* Add spacy/tokens/doc.pyx, for Doc class in its own file	2015-07-13 19:58:26 +02:00
Matthew Honnibal	c99387155f	* Refactor tokens, moving classes into a module instead of a single file	2015-07-13 19:49:55 +02:00
Matthew Honnibal	d27899658e	* Import classes in spacy.tokens.__init__	2015-07-13 19:48:55 +02:00
Matthew Honnibal	aa82caf8f5	* Add TokenC.spacy attr	2015-07-13 19:48:07 +02:00
Matthew Honnibal	dba6b47d4e	* Refactor monster tokens.pyx file, into a tokens/ subpackage. Try to break the cycle between Doc and Token, and remove the need to pass around a unicode string reference	2015-07-13 19:20:48 +02:00
Matthew Honnibal	5b0a7190c9	* Round-trip for serialization finally working. Needs a lot of optimization.	2015-07-13 18:39:38 +02:00
Matthew Honnibal	edd371246c	* Make huffman coder take BitArray in encode/decode. Add __iter__ method to BitArray.	2015-07-13 17:33:33 +02:00
Matthew Honnibal	af5cc926a4	* Add codec property to Vocab, to use the Huffman encoding	2015-07-13 13:55:14 +02:00
Matthew Honnibal	77385d5580	* Make .pxd file for huffman codec	2015-07-13 13:54:51 +02:00
Matthew Honnibal	0628e0e2a8	* Add tests for huffman encoding	2015-07-13 12:58:07 +02:00
Matthew Honnibal	083b6ea7ae	* Clean up encoder a bit. Now ready for integration into Vocab.	2015-07-13 12:57:22 +02:00
Matthew Honnibal	8d0f1d98da	* Draft docstring for HuffmanCache	2015-07-13 12:01:18 +02:00
Matthew Honnibal	281f1faefb	* Nearly finished huffman coder	2015-07-12 23:48:46 +02:00
Matthew Honnibal	e1a25fba32	* Work on huffman coder	2015-07-12 19:58:05 +02:00
Matthew Honnibal	3fb9de2d13	* Remove vector[bint], in favor of simple Code struct.	2015-07-12 17:58:27 +02:00
Matthew Honnibal	aa7bfd932b	* Work on compressor	2015-07-12 16:03:43 +02:00
Matthew Honnibal	14eafcab15	* Refactor to use vector[bint]	2015-07-12 05:27:47 +02:00
Matthew Honnibal	6a6e852a39	* Refactor huffman coding stuff into class	2015-07-12 05:06:36 +02:00
Matthew Honnibal	aad96fdb5c	* Improve efficiency of huffman coding	2015-07-12 01:31:37 +02:00
Matthew Honnibal	ff9ff6f3fa	* Ensure unseen words are given low log probability	2015-07-12 01:31:09 +02:00
Matthew Honnibal	9d3b0d83de	* Refactor huffman coding	2015-07-11 22:27:43 +02:00
Matthew Honnibal	8d29406cd6	* Rename span.right to span.rights	2015-07-11 22:15:04 +02:00
Matthew Honnibal	da9f358166	* Fix span getting	2015-07-11 21:41:41 +02:00
Matthew Honnibal	11e8f2ffb4	* Huffman codes working	2015-07-11 20:01:10 +02:00
Matthew Honnibal	cb6fc81909	* Work on huffman coding.	2015-07-11 15:23:35 +02:00
Matthew Honnibal	4c9b77fe95	* Begin working on serialization code	2015-07-11 10:57:30 +02:00
Matthew Honnibal	11a380e00f	* Draft v0.89 update notes	2015-07-10 19:41:42 +02:00
Matthew Honnibal	53d1f5b2eb	* Rename Span.head to Span.root.	2015-07-09 17:30:58 +02:00
Matthew Honnibal	c0255ed7d8	* Allow slice indexing in Doc.__getitem__, returning a Span object	2015-07-09 15:15:32 +02:00
Matthew Honnibal	7d2964f673	* Test that whitespace is not assigned a tag	2015-07-09 13:31:40 +02:00
Matthew Honnibal	b5223c4824	* Add whitespace to specials.json	2015-07-09 13:31:12 +02:00
Matthew Honnibal	89a91ad726	* Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity	2015-07-09 13:30:41 +02:00
Matthew Honnibal	f95da0bd52	* Allow tests to read model dir from SPACY_DATA environment variable	2015-07-09 12:18:02 +02:00
Matthew Honnibal	55f1042443	* Improve efficiency of L and R features, correcting the non-linear-in-length problem.	2015-07-09 12:17:26 +02:00
Matthew Honnibal	70d2acb579	* Fix edge features	2015-07-09 12:15:01 +02:00
Matthew Honnibal	8a7bbd5850	* Announce v0.88	2015-07-09 12:12:45 +02:00

1428 Commits