Jens Dahl Møllerhøj
e5055e3cf6
Add Danish lemmatizer (#2184)
...
* add danish lemmatizer
* fill contributor agreement
2018-04-07 19:07:28 +02:00
ines
10462816bc
Fix tests for Python 2
2018-04-03 18:51:31 +02:00
ines
62b4b527d7
Don't raise error if set_extension has getter and setter (closes #2177)
...
Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests.
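The validation rules described above can be sketched in plain Python. This is an illustrative sketch only, not the code from this commit: the function name `check_extension_args` and the error messages are hypothetical, and only the `_unset` sentinel idea is taken from the message.

```python
# Hypothetical sketch; only the `_unset` sentinel idea comes from the commit message.
_unset = object()  # sentinel: distinguishes "argument not passed" from an explicit None


def check_extension_args(default=_unset, getter=None, setter=None, method=None):
    """Validate an extension definition and return its parts as a tuple."""
    # A setter on its own is an error; a getter together with a setter is allowed.
    if setter is not None and getter is None:
        raise ValueError("setter specified without a getter")
    # Exactly one of default, getter or method must be given.
    # Comparing against `_unset` (not None) lets default=None count as a real default.
    options = (default is not _unset, getter is not None, method is not None)
    if sum(options) != 1:
        raise ValueError("specify exactly one of default, getter or method")
    return default, method, getter, setter
```

With this sketch, `check_extension_args(default=None)` is accepted, while `check_extension_args(setter=some_func)` raises a `ValueError`.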
2018-04-03 18:30:17 +02:00
Suraj Rajan
1cdbb7c97c
[2032] - Changed python set to cpp stl set (#2170)
...
Changed python set to cpp stl set #2032
## Description
Changed the Python set to a C++ STL set. The C++ `std::set` keeps its elements ordered, so the minimum can be read in constant time, whereas finding the minimum of a Python set requires a linear scan. Operations such as find, count, insert and delete run in logarithmic time, which makes the C++ set a better option for managing vectors.
Reference : http://www.cplusplus.com/reference/set/set/
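Plain Python has no direct equivalent of `std::set`, so the difference is illustrated here with a heap from the standard `heapq` module as a rough analogue. This is an assumption made for illustration, not the Cython code of this change, and the variable names are hypothetical.

```python
import heapq

# Hypothetical set of row indices that are still free.
free_rows = {7, 3, 9, 1}

# With a plain Python set, finding the smallest element scans every item.
lowest_linear = min(free_rows)

# An ordered structure exposes the smallest element directly.
# Building the heap is linear, but it is done only once.
heap = list(free_rows)
heapq.heapify(heap)
lowest_const = heap[0]         # constant-time peek at the minimum
claimed = heapq.heappop(heap)  # logarithmic-time removal of the minimum
```

After the pop, `heap[0]` is the next-smallest free row, again available without a scan.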
### Types of change
Enhancement to `Vectors` for faster initialisation of word vectors (fastText)
2018-03-31 13:28:25 +02:00
Ines Montani
0de599b16b
Merge pull request #2159 from explosion/feature/fix-merged-entity-iob (resolves #1554, resolves #1752)
...
💫 Fix token.ent_iob after doc.merge(), and ensure consistency in doc.ents
2018-03-28 23:10:00 +02:00
Ines Montani
98e9cda677
Merge pull request #2158 from explosion/feature/fix-multiple-vectors (resolves #1660)
...
💫 Fix loading of multiple vector models
2018-03-28 23:08:24 +02:00
ines
3eb67bbe4b
Allow entity types with dashes (resolves #1967)
2018-03-28 20:51:26 +02:00
Matthew Honnibal
cf5fcf0546
Update serialization test
2018-03-28 20:12:53 +02:00
Matthew Honnibal
95fa89c4b8
Update doc.ents test
2018-03-28 18:39:03 +02:00
Matthew Honnibal
cbd2794be0
Add test for ent_iob during span merge
2018-03-28 18:36:53 +02:00
Matthew Honnibal
fd9e259414
Add test for #1660
2018-03-28 18:22:51 +02:00
Matthew Honnibal
95a9615221
Fix loading of multiple pre-trained vectors
...
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.
The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']_nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.
In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
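The naming scheme and the compatibility check can be sketched as follows. This is a minimal sketch under stated assumptions, not the patch itself: the helper names `default_vectors_name` and `upgrade_cfg` are hypothetical, and the name format is one reading of the template above (language, underscore, model name, then `.vectors`).

```python
def default_vectors_name(meta):
    """Build the default vectors name from a model's meta dict (assumed format)."""
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_cfg(cfg, vectors_name):
    """Return a copy of cfg with pretrained_vectors filled in for older models."""
    cfg = dict(cfg)  # copy, so the caller's dict is not modified
    # Older models only carry pretrained_dims; add the new key if it is missing.
    if cfg.get("pretrained_dims") and not cfg.get("pretrained_vectors"):
        cfg["pretrained_vectors"] = vectors_name
    return cfg
```

For example, a legacy cfg of `{"pretrained_dims": 300}` gains a `pretrained_vectors` entry, while a cfg without `pretrained_dims` is returned unchanged.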
2018-03-28 16:02:59 +02:00
ines
6d2c85f428
Drop six and related hacks as a dependency
2018-03-28 10:45:25 +02:00
ines
f3f8bfc367
Add built-in factories for merge_entities and merge_noun_chunks
...
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 17:16:54 +01:00
Matthew Honnibal
f9f46e5a07
Revert matcher fixes from GregDubbin
2018-02-18 10:59:28 +01:00
Aaron Marquez
f0d3672e17
Changed loading EN model
2018-02-15 14:28:38 -08:00
Aaron Marquez
7ba4111554
Add test for issue-1959
2018-02-15 12:46:22 -08:00
Matthew Honnibal
4cb861e080
Merge pull request #1968 from DuyguA/is_currency
...
New lexical feature is_currency
2018-02-15 12:13:36 +01:00
Claudiu-Vlad Ursache
e28de12cbd
Ensure files opened in from_disk are closed
...
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
2018-02-13 20:49:43 +01:00
4altinok
471d3c9e23
added lex test for is_currency
2018-02-11 18:50:50 +01:00
Matthew Honnibal
fd9fd275c5
Make test for #1945 more precise
2018-02-07 02:06:11 +01:00
Matthew Honnibal
c087a14380
Merge branch 'master' of https://github.com/explosion/spaCy
2018-02-07 01:29:39 +01:00
Matthew Honnibal
76d89b2180
Add test for #1945: PhraseMatcher regression
2018-02-07 01:29:23 +01:00
Matthew Honnibal
2e7391e627
Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp
...
Bug/fix not passing in model cfg in nlp
2018-02-05 01:19:40 +01:00
Matthew Honnibal
f74a802d09
Test and fix #1919: Error resuming training
2018-02-02 02:32:40 +01:00
Motoki Wu
54062b7326
added tests for issue #1915
2018-01-30 18:30:19 -08:00
ines
8901814248
Improve error handling if pipeline component is not callable (resolves #1911)
...
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
2018-01-30 15:43:03 +01:00
Matthew Honnibal
512e6adb08
Merge pull request #1896 from thomasopsomer/fix-sent
...
Fix sentence boundaries serialization (issue #1834)
2018-01-28 21:18:51 +01:00
Matthew Honnibal
f5b1ad4100
Limit parser model size, to hopefully reduce memory during CI tests
2018-01-28 21:00:32 +01:00
Thomas Opsomer
45d62561f7
add test for the issue
2018-01-28 19:49:56 +01:00
Matthew Honnibal
6a8cb905aa
Merge pull request #1876 from GregDubbin/master
...
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal
edb71a280e
Add test for #1883: Unpickling Matcher
2018-01-24 15:42:33 +01:00
Matthew Honnibal
42a18ef903
Add test for #1868: Vocab.__contains__ with ints
2018-01-23 23:27:05 +01:00
greg
85ab99e692
Correct test examples
2018-01-23 15:00:14 -05:00
Matthew Honnibal
91e916cb67
Add comment to new test
2018-01-23 19:11:53 +01:00
Matthew Honnibal
fd187d71ad
Add test for #1727
2018-01-23 19:11:01 +01:00
Matthew Honnibal
7e6dc283db
Fix unicode import in test
2018-01-22 23:55:44 +01:00
greg
686735b94e
Fix matcher import
2018-01-22 16:53:05 -05:00
Matthew Honnibal
4ce7d24fd5
Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses.
2018-01-22 20:18:38 +01:00
greg
7072b395c9
Add greedy matcher tests
2018-01-16 15:46:13 -05:00
Matthew Honnibal
ccb51a9f36
Make .similarity() return 1.0 if all orth attrs match
2018-01-15 16:29:48 +01:00
Matthew Honnibal
82135d85b7
Fix test
2018-01-15 15:55:15 +01:00
Matthew Honnibal
4b09616b58
Add test for #1757: Comparison against None
2018-01-15 15:55:01 +01:00
Matthew Honnibal
9e413449f6
Fix unicode error in new test
2018-01-15 15:39:00 +01:00
Matthew Honnibal
6b215d2dd3
Add test for Issue #1537
2018-01-15 15:20:56 +01:00
ines
5babb7d6f6
Merge branch 'master' of https://github.com/explosion/spaCy
2018-01-14 17:31:09 +01:00
ines
793890cb4d
Remove test for removed deprecation warning
2018-01-14 17:31:06 +01:00
Matthew Honnibal
1a1cca6052
Fix vectors.resize() on Py3. Closes #1539
2018-01-14 14:48:51 +01:00
Matthew Honnibal
0153220304
Make set_vector add word to vocab. Fixes #1807
2018-01-14 13:57:57 +01:00
Ines Montani
55754f0cee
Merge pull request #1836 from fucking-signup/master
...
Add tests for issue #1769
2018-01-13 00:23:35 +00:00