spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-08-07 05:40:20 +03:00

Author	SHA1	Message	Date
Adriane Boyd	b2a162361f	Rewrap stdsort with specific types	2019-09-27 11:20:44 +02:00
Adriane Boyd	5983b7b612	Rewrap sort as stdsort for OS X	2019-09-27 10:03:30 +02:00
Adriane Boyd	ccd94809fa	Merge remote-tracking branch 'upstream/master' into bugfix/tokenizer-special-cases-matcher	2019-09-27 09:32:15 +02:00
Adriane Boyd	0b7e52c797	Move more of special case retokenize to cdef nogil. Move as much of the special case retokenization to nogil as possible.	2019-09-27 09:26:20 +02:00
Adriane Boyd	72c2f98dc9	Switch special case reload threshold to variable. Refer to variable instead of hard-coded threshold.	2019-09-27 09:24:52 +02:00
Adriane Boyd	669bc1a314	Switch to local cdef functions for span filtering	2019-09-26 21:00:46 +02:00
Ines Montani	eb0649e38e	Fix tag [ci skip]	2019-09-26 16:22:33 +02:00
Ines Montani	da9a869d3f	Update vectors name docs [ci skip]	2019-09-26 16:21:32 +02:00
Adriane Boyd	ae348bee43	Switch to PhraseMatcher.find_matches	2019-09-26 14:43:22 +02:00
Adriane Boyd	63b014d09f	Merge branch 'feature/hashmatcher' into bugfix/tokenizer-special-cases-matcher	2019-09-26 14:34:09 +02:00
Adriane Boyd	3fdb22d832	Implement full remove(). Remove unnecessary trie paths and free unused maps. Parallel to Matcher, raise KeyError when attempting to remove a match ID that has not been added.	2019-09-26 11:31:03 +02:00
Matthew Honnibal	58533f01bf	Set version to v2.2.0.dev10	2019-09-26 03:03:50 +02:00
Matthew Honnibal	27ace84f4a	Support model name in init-model	2019-09-26 03:01:32 +02:00
Matthew Honnibal	d0b30bf8cd	Merge branch 'master' of https://github.com/explosion/spaCy	2019-09-25 21:14:30 +02:00
Matthew Honnibal	eced2f3211	Set version to v2.2.0.dev9	2019-09-25 21:14:07 +02:00
Em Zhan	aafa091541	Fix typo in documentation (#4322) * Fix typo 'probj' instead of 'pobj' * Add spaCy contributor agreement for zqianem	2019-09-25 19:42:18 +02:00
Matthew Honnibal	1251b57dbb	Fix vectors name arg to init-model	2019-09-25 14:21:27 +02:00
Matthew Honnibal	92ed4dc5e0	Allow vectors name to be set in init-model (#4321) * Allow vectors name to be specified in init-model * Document --vectors-name argument to init-model * Update website/docs/api/cli.md Co-Authored-By: Ines Montani <ines@ines.io>	2019-09-25 13:11:00 +02:00
Eric Semeniuc	09816f8323	update sense2vec version (#4320)	2019-09-25 12:17:54 +02:00
Adriane Boyd	230699e4fe	Merge branch 'feature/ud-script-update' into bugfix/tokenizer-special-cases-matcher	2019-09-25 11:10:30 +02:00
Adriane Boyd	7862a6eb01	Restructure imports to export find_matches	2019-09-25 11:03:58 +02:00
Adriane Boyd	3c6f1d7e3a	Switch from numpy array to Token.get_struct_attr. Access token attributes directly in Doc instead of making a copy of the relevant values in a numpy array. Add unsatisfactory warning for hash collision with reserved terminal hash key. (Ideally it would change the reserved terminal hash and redo the whole trie, but for now, I'm hoping there won't be collisions.)	2019-09-25 09:41:27 +02:00
Ines Montani	52904b7270	Raise if on_match is not callable or None	2019-09-24 23:06:24 +02:00
Adriane Boyd	d995a7849e	Switch from map_get_unless_missing to map_get	2019-09-24 16:20:24 +02:00
Adriane Boyd	34550ef662	Update fix for match ID vocab	2019-09-24 16:07:38 +02:00
Adriane Boyd	d4141302b6	Fix how match ID hash is stored/added	2019-09-24 15:36:26 +02:00
Adriane Boyd	39540ed1ce	Replace dict trie with MapStruct trie	2019-09-24 14:39:50 +02:00
Ines Montani	38de08c7a9	Update README.md [ci skip]	2019-09-24 14:31:09 +02:00
Sofie Van Landeghem	42340740e3	update neuralcoref example (#4317)	2019-09-24 10:47:17 +02:00
Adriane Boyd	a7e9c0fd3e	Remove cruft in matching loop for partial matches. There was a bit of unnecessary code left over from FlashText in the matching loop to handle partial token matches, which we don't have with PhraseMatcher.	2019-09-23 09:11:13 +02:00
Adriane Boyd	c38c330585	Add missing loop for match ID set in search loop	2019-09-21 15:57:38 +02:00
Ines Montani	16aa092fb5	Improve Morphology errors (#4314) * Improve Morphology errors * Also clean up some other errors * Update errors.py	2019-09-21 14:37:06 +02:00
Adriane Boyd	ede32c01e2	Update UD bin scripts * Update imports for `bin/` * Add all currently supported languages * Update subtok merger for new Matcher validation * Modify blinded check to look at tokens instead of lemmas (for corpora with tokens but not lemmas like Telugu)	2019-09-21 12:20:22 +02:00
Adriane Boyd	97327bd268	Remove final traces of UD script modifications	2019-09-21 12:13:31 +02:00
Adriane Boyd	046a62741a	Remove UD script modifications. Only used for timing/testing, should be a separate PR.	2019-09-21 11:09:00 +02:00
Adriane Boyd	d92e8c8ac8	Update error message number	2019-09-20 20:36:53 +02:00
Adriane Boyd	73ca0ce4f3	Merge remote-tracking branch 'upstream/master' into bugfix/tokenizer-special-cases-matcher	2019-09-20 16:44:33 +02:00
Adriane Boyd	d3990d080c	Improve efficiency of special cases handling * Use PhraseMatcher instead of Matcher * Improve efficiency of merging/splitting special cases in document * Process merge/splits in one pass without repeated token shifting * Merge in place if no splits	2019-09-20 16:39:30 +02:00
Adriane Boyd	e74963acd4	Add test for #4248, clean up test	2019-09-20 09:20:57 +02:00
Adriane Boyd	3a4e1f5ca7	Fix internal keyword add/remove for numpy arrays	2019-09-20 09:18:38 +02:00
Adriane Boyd	0d851db6d9	Restore support for pickling	2019-09-19 20:20:53 +02:00
Adriane Boyd	3931368ce8	Merge remote-tracking branch 'upstream/master' into feature/hashmatcher	2019-09-19 17:42:17 +02:00
Ines Montani	9bf69bfbb2	Remove test	2019-09-19 17:38:41 +02:00
Adriane Boyd	0d9740e826	Replace PhraseMatcher with Aho-Corasick. Replace PhraseMatcher with the Aho-Corasick algorithm over numpy arrays of the hash values for the relevant attribute. The implementation is based on FlashText. The speed should be similar to the previous PhraseMatcher. It is now possible to easily remove match IDs and matches don't go missing with large keyword lists / vocabularies. Fixes #4308.	2019-09-19 16:49:05 +02:00
Ines Montani	197406de1d	Update v2-2.md [ci skip]	2019-09-19 14:33:58 +02:00
Ines Montani	c1030b1ad2	Update README.md [ci skip]	2019-09-19 13:35:12 +02:00
Ines Montani	0f9e253a69	Update README.md [ci skip]	2019-09-19 13:34:37 +02:00
Ines Montani	f2d224756b	Update README.md [ci skip]	2019-09-19 12:52:26 +02:00
Ines Montani	80d554f2e2	Remove unsupported version [ci skip]	2019-09-19 01:14:42 +02:00
Ines Montani	8cd3763678	Update about.py [ci skip]	2019-09-19 01:02:25 +02:00

10831 Commits
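
Several commits in this log describe a trie-backed phrase matcher whose remove() prunes unused trie paths and raises KeyError for a match ID that was never added. The sketch below illustrates that idea only; it is not spaCy's code. The class name `TokenTrieMatcher`, its methods, and the sentinel `_TERMINAL` are hypothetical, plain Python dicts stand in for the hash maps mentioned in the log, and matching is a simple scan from every start position rather than Aho-Corasick with failure links.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Set, Tuple

# Hypothetical sentinel marking "a pattern ends here". Using a unique object
# (instead of a reserved hash value) means it can never equal a real token.
_TERMINAL = object()


class TokenTrieMatcher:
    """Illustrative token-level trie matcher (not spaCy's PhraseMatcher)."""

    def __init__(self) -> None:
        self._root: Dict = {}
        # match ID -> set of patterns (as tuples) registered under that ID
        self._patterns: Dict[Hashable, Set[Tuple]] = {}

    def add(self, key: Hashable, patterns: Iterable[Sequence]) -> None:
        for pattern in patterns:
            pattern = tuple(pattern)
            if not pattern:
                raise ValueError("empty patterns are not supported")
            node = self._root
            for token in pattern:
                node = node.setdefault(token, {})
            node.setdefault(_TERMINAL, set()).add(key)
            self._patterns.setdefault(key, set()).add(pattern)

    def remove(self, key: Hashable) -> None:
        # Raise KeyError for an unknown match ID, as the log describes.
        if key not in self._patterns:
            raise KeyError(key)
        for pattern in self._patterns.pop(key):
            # Walk down the trie, remembering every node on the path.
            path = [self._root]
            for token in pattern:
                path.append(path[-1][token])
            leaf = path[-1]
            leaf[_TERMINAL].discard(key)
            if not leaf[_TERMINAL]:
                del leaf[_TERMINAL]
            # Prune now-empty nodes from the leaf back toward the root.
            for i in range(len(pattern) - 1, -1, -1):
                if path[i + 1]:
                    break  # node still used by another pattern or match ID
                del path[i][pattern[i]]

    def find_matches(self, tokens: Sequence) -> List[Tuple[Hashable, int, int]]:
        """Return (match ID, start, end) triples with an exclusive end index."""
        matches = []
        for start in range(len(tokens)):
            node = self._root
            for end in range(start, len(tokens)):
                node = node.get(tokens[end])
                if node is None:
                    break
                for key in node.get(_TERMINAL, ()):
                    matches.append((key, start, end + 1))
        return matches
```

A short usage example under the same assumptions: after `m = TokenTrieMatcher()` and `m.add("CITY", [["new", "york"]])`, calling `m.find_matches("i love new york".split())` reports one match for `"CITY"` spanning tokens 2 to 4, and calling `m.remove("CITY")` a second time raises KeyError.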