Adriane Boyd
b2a162361f
Rewrap stdsort with specific types
2019-09-27 11:20:44 +02:00
Adriane Boyd
5983b7b612
Rewrap sort as stdsort for OS X
2019-09-27 10:03:30 +02:00
Adriane Boyd
ccd94809fa
Merge remote-tracking branch 'upstream/master' into bugfix/tokenizer-special-cases-matcher
2019-09-27 09:32:15 +02:00
Adriane Boyd
0b7e52c797
Move more of special case retokenize to cdef nogil
Move as much of the special case retokenization to nogil as possible.
2019-09-27 09:26:20 +02:00
Adriane Boyd
72c2f98dc9
Switch special case reload threshold to variable
Refer to variable instead of hard-coded threshold
2019-09-27 09:24:52 +02:00
Adriane Boyd
669bc1a314
Switch to local cdef functions for span filtering
2019-09-26 21:00:46 +02:00
Ines Montani
eb0649e38e
Fix tag [ci skip]
2019-09-26 16:22:33 +02:00
Ines Montani
da9a869d3f
Update vectors name docs [ci skip]
2019-09-26 16:21:32 +02:00
Adriane Boyd
ae348bee43
Switch to PhraseMatcher.find_matches
2019-09-26 14:43:22 +02:00
Adriane Boyd
63b014d09f
Merge branch 'feature/hashmatcher' into bugfix/tokenizer-special-cases-matcher
2019-09-26 14:34:09 +02:00
Adriane Boyd
3fdb22d832
Implement full remove()
Remove unnecessary trie paths and free unused maps.
Parallel to Matcher, raise KeyError when attempting to remove a match ID
that has not been added.
2019-09-26 11:31:03 +02:00
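A minimal sketch of the behaviour this commit message describes: removing a match ID prunes trie paths that no longer lead to any pattern, and removing an unknown ID raises `KeyError`. The `PhraseTrie` class, its attributes, and the `TERMINAL` sentinel are hypothetical names for illustration. It uses plain Python dicts and is not the actual Cython implementation.

```python
TERMINAL = object()  # hypothetical sentinel marking the end of a pattern


class PhraseTrie:
    """Illustrative token trie; not the real PhraseMatcher internals."""

    def __init__(self):
        self.root = {}
        self.patterns = {}  # match_id -> set of token tuples

    def add(self, match_id, tokens):
        node = self.root
        for token in tokens:
            node = node.setdefault(token, {})
        node.setdefault(TERMINAL, set()).add(match_id)
        self.patterns.setdefault(match_id, set()).add(tuple(tokens))

    def remove(self, match_id):
        # Parallel to the commit message: unknown IDs raise KeyError.
        if match_id not in self.patterns:
            raise KeyError(match_id)
        for tokens in self.patterns.pop(match_id):
            # Record every node along the pattern's path.
            path = [self.root]
            for token in tokens:
                path.append(path[-1][token])
            ids = path[-1][TERMINAL]
            ids.discard(match_id)
            if not ids:
                del path[-1][TERMINAL]
            # Walk back up and delete nodes that have become empty.
            for depth in range(len(tokens), 0, -1):
                if path[depth]:
                    break
                del path[depth - 1][tokens[depth - 1]]
```

Pruning stops at the first non-empty node, so a shorter pattern that shares a prefix with the removed one is left intact.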
Matthew Honnibal
58533f01bf
Set version to v2.2.0.dev10
2019-09-26 03:03:50 +02:00
Matthew Honnibal
27ace84f4a
Support model name in init-model
2019-09-26 03:01:32 +02:00
Matthew Honnibal
d0b30bf8cd
Merge branch 'master' of https://github.com/explosion/spaCy
2019-09-25 21:14:30 +02:00
Matthew Honnibal
eced2f3211
Set version to v2.2.0.dev9
2019-09-25 21:14:07 +02:00
Em Zhan
aafa091541
Fix typo in documentation (#4322)
* Fix typo 'probj' instead of 'pobj'
* Add spaCy contributor agreement for zqianem
2019-09-25 19:42:18 +02:00
Matthew Honnibal
1251b57dbb
Fix vectors name arg to init-model
2019-09-25 14:21:27 +02:00
Matthew Honnibal
92ed4dc5e0
Allow vectors name to be set in init-model (#4321)
* Allow vectors name to be specified in init-model
* Document --vectors-name argument to init-model
* Update website/docs/api/cli.md
Co-Authored-By: Ines Montani <ines@ines.io>
2019-09-25 13:11:00 +02:00
Eric Semeniuc
09816f8323
update sense2vec version (#4320)
2019-09-25 12:17:54 +02:00
Adriane Boyd
230699e4fe
Merge branch 'feature/ud-script-update' into bugfix/tokenizer-special-cases-matcher
2019-09-25 11:10:30 +02:00
Adriane Boyd
7862a6eb01
Restructure imports to export find_matches
2019-09-25 11:03:58 +02:00
Adriane Boyd
3c6f1d7e3a
Switch from numpy array to Token.get_struct_attr
Access token attributes directly in Doc instead of making a copy of the
relevant values in a numpy array.
Add unsatisfactory warning for hash collision with reserved terminal
hash key. (Ideally it would change the reserved terminal hash and redo
the whole trie, but for now, I'm hoping there won't be collisions.)
2019-09-25 09:41:27 +02:00
Ines Montani
52904b7270
Raise if on_match is not callable or None
2019-09-24 23:06:24 +02:00
Adriane Boyd
d995a7849e
Switch from map_get_unless_missing to map_get
2019-09-24 16:20:24 +02:00
Adriane Boyd
34550ef662
Update fix for match ID vocab
2019-09-24 16:07:38 +02:00
Adriane Boyd
d4141302b6
Fix how match ID hash is stored/added
2019-09-24 15:36:26 +02:00
Adriane Boyd
39540ed1ce
Replace dict trie with MapStruct trie
2019-09-24 14:39:50 +02:00
Ines Montani
38de08c7a9
Update README.md [ci skip]
2019-09-24 14:31:09 +02:00
Sofie Van Landeghem
42340740e3
update neuralcoref example (#4317)
2019-09-24 10:47:17 +02:00
Adriane Boyd
a7e9c0fd3e
Remove cruft in matching loop for partial matches
There was a bit of unnecessary code left over from FlashText in the
matching loop to handle partial token matches, which we don't have with
PhraseMatcher.
2019-09-23 09:11:13 +02:00
Adriane Boyd
c38c330585
Add missing loop for match ID set in search loop
2019-09-21 15:57:38 +02:00
Ines Montani
16aa092fb5
Improve Morphology errors (#4314)
* Improve Morphology errors
* Also clean up some other errors
* Update errors.py
2019-09-21 14:37:06 +02:00
Adriane Boyd
ede32c01e2
Update UD bin scripts
* Update imports for `bin/`
* Add all currently supported languages
* Update subtok merger for new Matcher validation
* Modify blinded check to look at tokens instead of lemmas (for corpora
with tokens but not lemmas like Telugu)
2019-09-21 12:20:22 +02:00
Adriane Boyd
97327bd268
Remove final traces of UD script modifications
2019-09-21 12:13:31 +02:00
Adriane Boyd
046a62741a
Remove UD script modifications
Only used for timing/testing, should be a separate PR
2019-09-21 11:09:00 +02:00
Adriane Boyd
d92e8c8ac8
Update error message number
2019-09-20 20:36:53 +02:00
Adriane Boyd
73ca0ce4f3
Merge remote-tracking branch 'upstream/master' into bugfix/tokenizer-special-cases-matcher
2019-09-20 16:44:33 +02:00
Adriane Boyd
d3990d080c
Improve efficiency of special cases handling
* Use PhraseMatcher instead of Matcher
* Improve efficiency of merging/splitting special cases in document
* Process merge/splits in one pass without repeated token shifting
* Merge in place if no splits
2019-09-20 16:39:30 +02:00
Adriane Boyd
e74963acd4
Add test for #4248, clean up test
2019-09-20 09:20:57 +02:00
Adriane Boyd
3a4e1f5ca7
Fix internal keyword add/remove for numpy arrays
2019-09-20 09:18:38 +02:00
Adriane Boyd
0d851db6d9
Restore support for pickling
2019-09-19 20:20:53 +02:00
Adriane Boyd
3931368ce8
Merge remote-tracking branch 'upstream/master' into feature/hashmatcher
2019-09-19 17:42:17 +02:00
Ines Montani
9bf69bfbb2
Remove test
2019-09-19 17:38:41 +02:00
Adriane Boyd
0d9740e826
Replace PhraseMatcher with Aho-Corasick
Replace PhraseMatcher with the Aho-Corasick algorithm over numpy arrays
of the hash values for the relevant attribute. The implementation is
based on FlashText.
The speed should be similar to the previous PhraseMatcher. It is now
possible to easily remove match IDs and matches don't go missing with
large keyword lists / vocabularies.
Fixes #4308.
2019-09-19 16:49:05 +02:00
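A rough sketch of the idea in this commit message: phrases are stored as paths in a trie keyed by the hash of each token's attribute value, and a reserved terminal key holds the match IDs. This is a simplification under stated assumptions. It restarts a plain trie scan from every start position and does not implement Aho-Corasick failure links. It uses Python's built-in `hash` and nested dicts rather than numpy arrays. `build_trie`, `find_matches`, and `TERMINAL` are hypothetical names for illustration, not the library's API.

```python
from typing import Dict, List, Tuple

TERMINAL = "_terminal_"  # hypothetical reserved key; token keys are ints


def build_trie(phrases: Dict[str, List[List[str]]]) -> dict:
    """Build a nested-dict trie keyed by the hash of each token."""
    trie: dict = {}
    for match_id, patterns in phrases.items():
        for tokens in patterns:
            node = trie
            for token in tokens:
                node = node.setdefault(hash(token), {})
            # Several match IDs may end at the same node.
            node.setdefault(TERMINAL, set()).add(match_id)
    return trie


def find_matches(trie: dict, tokens: List[str]) -> List[Tuple[str, int, int]]:
    """Return (match_id, start, end) for every phrase found in tokens."""
    hashes = [hash(token) for token in tokens]
    matches = []
    for start in range(len(hashes)):
        node = trie
        for end in range(start, len(hashes)):
            node = node.get(hashes[end])
            if node is None:
                break  # no phrase continues with this token
            for match_id in sorted(node.get(TERMINAL, ())):
                matches.append((match_id, start, end + 1))
    return matches
```

Because the terminal key is a string and the token keys are integer hashes, the two cannot collide in this sketch. A design that stores both as hashes in the same map has to guard against that case.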
Ines Montani
197406de1d
Update v2-2.md [ci skip]
2019-09-19 14:33:58 +02:00
Ines Montani
c1030b1ad2
Update README.md [ci skip]
2019-09-19 13:35:12 +02:00
Ines Montani
0f9e253a69
Update README.md [ci skip]
2019-09-19 13:34:37 +02:00
Ines Montani
f2d224756b
Update README.md [ci skip]
2019-09-19 12:52:26 +02:00
Ines Montani
80d554f2e2
Remove unsupported version [ci skip]
2019-09-19 01:14:42 +02:00
Ines Montani
8cd3763678
Update about.py [ci skip]
2019-09-19 01:02:25 +02:00