Commit Graph

969 Commits

Author SHA1 Message Date
Ines Montani
0de599b16b
Merge pull request #2159 from explosion/feature/fix-merged-entity-iob (resolves #1554, resolves #1752)
💫 Fix token.ent_iob after doc.merge(), and ensure consistency in doc.ents
2018-03-28 23:10:00 +02:00
Ines Montani
98e9cda677
Merge pull request #2158 from explosion/feature/fix-multiple-vectors (resolves #1660)
💫 Fix loading of multiple vector models
2018-03-28 23:08:24 +02:00
ines
3eb67bbe4b Allow entity types with dashes (resolves #1967) 2018-03-28 20:51:26 +02:00
Matthew Honnibal
cf5fcf0546 Update serialization test 2018-03-28 20:12:53 +02:00
Matthew Honnibal
95fa89c4b8 Update doc.ents test 2018-03-28 18:39:03 +02:00
Matthew Honnibal
cbd2794be0 Add test for ent_iob during span merge 2018-03-28 18:36:53 +02:00
Matthew Honnibal
fd9e259414 Add test for #1660 2018-03-28 18:22:51 +02:00
Matthew Honnibal
95a9615221 Fix loading of multiple pre-trained vectors
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.

The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']_nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.

In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
2018-03-28 16:02:59 +02:00
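
The backwards-compatibility step described in this commit message can be sketched roughly as follows. This is a minimal illustration, not spaCy's actual code: `default_vectors_name` and `upgrade_cfg` are hypothetical helpers, the `meta` values are made up, and the name format assumes the default reads as `<lang>_<name>.vectors`.

```python
def default_vectors_name(meta):
    """Build a default vectors name of the form '<lang>_<name>.vectors'.

    Assumption: this mirrors the default described in the commit message.
    """
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_cfg(cfg, meta):
    """Hypothetical helper: return a copy of cfg in which the legacy
    'pretrained_dims' key also yields a 'pretrained_vectors' entry."""
    cfg = dict(cfg)  # do not mutate the caller's dict
    if "pretrained_dims" in cfg and "pretrained_vectors" not in cfg:
        cfg["pretrained_vectors"] = default_vectors_name(meta)
    return cfg


# Illustrative values only.
meta = {"lang": "en", "name": "core_web_md"}
legacy_cfg = {"pretrained_dims": 300}
print(upgrade_cfg(legacy_cfg, meta))
```

A cfg that already carries `pretrained_vectors` is left untouched, so loading a newer model would not overwrite its vectors name.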
ines
6d2c85f428 Drop six and related hacks as a dependency 2018-03-28 10:45:25 +02:00
ines
f3f8bfc367 Add built-in factories for merge_entities and merge_noun_chunks
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 17:16:54 +01:00
Matthew Honnibal
f9f46e5a07 Revert matcher fixes from GregDubbin 2018-02-18 10:59:28 +01:00
Aaron Marquez
f0d3672e17 Changed loading EN model 2018-02-15 14:28:38 -08:00
Aaron Marquez
7ba4111554 Add test for issue-1959 2018-02-15 12:46:22 -08:00
Matthew Honnibal
4cb861e080
Merge pull request #1968 from DuyguA/is_currency
New lexical feature is_currency
2018-02-15 12:13:36 +01:00
Claudiu-Vlad Ursache
e28de12cbd
Ensure files opened in from_disk are closed
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
2018-02-13 20:49:43 +01:00
4altinok
471d3c9e23 added lex test for is_currency 2018-02-11 18:50:50 +01:00
Matthew Honnibal
fd9fd275c5 Make test for #1945 more precise 2018-02-07 02:06:11 +01:00
Matthew Honnibal
c087a14380 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-07 01:29:39 +01:00
Matthew Honnibal
76d89b2180 Add test for #1945: PhraseMatcher regression 2018-02-07 01:29:23 +01:00
Matthew Honnibal
2e7391e627
Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp
Bug/fix not passing in model cfg in nlp
2018-02-05 01:19:40 +01:00
Matthew Honnibal
f74a802d09 Test and fix #1919: Error resuming training 2018-02-02 02:32:40 +01:00
Motoki Wu
54062b7326 added tests for issue #1915 2018-01-30 18:30:19 -08:00
ines
8901814248 Improve error handling if pipeline component is not callable (resolves #1911)
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
2018-01-30 15:43:03 +01:00
Matthew Honnibal
512e6adb08
Merge pull request #1896 from thomasopsomer/fix-sent
Fix sentence boundaries serialization (issue #1834)
2018-01-28 21:18:51 +01:00
Matthew Honnibal
f5b1ad4100 Limit parser model size, to hopefully reduce memory during CI tests 2018-01-28 21:00:32 +01:00
Thomas Opsomer
45d62561f7 add test for the issue 2018-01-28 19:49:56 +01:00
Matthew Honnibal
6a8cb905aa
Merge pull request #1876 from GregDubbin/master
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal
edb71a280e Add test for #1883: Unpickling Matcher 2018-01-24 15:42:33 +01:00
Matthew Honnibal
42a18ef903 Add test for #1868: Vocab.__contains__ with ints 2018-01-23 23:27:05 +01:00
greg
85ab99e692 Correct test examples 2018-01-23 15:00:14 -05:00
Matthew Honnibal
91e916cb67 Add comment to new test 2018-01-23 19:11:53 +01:00
Matthew Honnibal
fd187d71ad Add test for #1727 2018-01-23 19:11:01 +01:00
Matthew Honnibal
7e6dc283db Fix unicode import in test 2018-01-22 23:55:44 +01:00
greg
686735b94e Fix matcher import 2018-01-22 16:53:05 -05:00
Matthew Honnibal
4ce7d24fd5 Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses. 2018-01-22 20:18:38 +01:00
greg
7072b395c9 Add greedy matcher tests 2018-01-16 15:46:13 -05:00
Matthew Honnibal
ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal
82135d85b7 Fix test 2018-01-15 15:55:15 +01:00
Matthew Honnibal
4b09616b58 Add test for #1757: Comparison against None 2018-01-15 15:55:01 +01:00
Matthew Honnibal
9e413449f6 Fix unicode error in new test 2018-01-15 15:39:00 +01:00
Matthew Honnibal
6b215d2dd3 Add test for Issue #1537 2018-01-15 15:20:56 +01:00
ines
5babb7d6f6 Merge branch 'master' of https://github.com/explosion/spaCy 2018-01-14 17:31:09 +01:00
ines
793890cb4d Remove test for removed deprecation warning 2018-01-14 17:31:06 +01:00
Matthew Honnibal
1a1cca6052 Fix vectors.resize() on Py3. Closes #1539 2018-01-14 14:48:51 +01:00
Matthew Honnibal
0153220304 Make set_vector add word to vocab. Fixes #1807 2018-01-14 13:57:57 +01:00
Ines Montani
55754f0cee
Merge pull request #1836 from fucking-signup/master
Add tests for issue #1769
2018-01-13 00:23:35 +00:00
Kit
4ee97f20a0
Mark like_num tests as slow 2018-01-13 00:44:15 +01:00
Kit
855531537e
Rewrite tests for issue #1769 2018-01-12 23:49:51 +01:00
Kit
5b541cb5ec
Simplify tests for issue #1769 2018-01-12 23:34:27 +01:00
Kit
7a2adc4633
Remove some tests to see build status changes 2018-01-12 22:49:16 +01:00
Kit
0e62809a43
Rewrite tests for issue #1769 2018-01-12 22:26:06 +01:00
Ines Montani
36f426fe0a
Merge pull request #1808 from fucking-signup/master
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit
76f4eeca44
Remove tests to see build changes on Windows (Python 2.7) 2018-01-12 20:30:51 +01:00
Kit
7ec0956e8d
Add regression test (issue #1769) 2018-01-08 03:42:04 +01:00
Søren Lind Kristiansen
62de5da1ff Remove unused dummy variable 2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen
10dab8eef8 Remove dummy variable from function calls 2018-01-05 09:37:05 +01:00
Kevin Humphreys
597df5bf83 add test 2018-01-03 13:00:05 -08:00
Ines Montani
ff9fc945ab
Merge pull request #1749 from sorenlind/da_ud_tokenization
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc Fix missing import 2017-12-22 16:21:44 +01:00
ines
8dc1c27841 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-22 16:01:00 +01:00
ines
b10ba848b8 xfail test that causes MemoryError on Python 2 on Windows
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Ines Montani
a3dd167d7f
Merge branch 'master' into da_ud_tokenization 2017-12-20 21:05:34 +00:00
Ines Montani
d682a8803e
Merge pull request #1672 from cbilgili/master
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Søren Lind Kristiansen
15d13efafd Tune Danish tokenizer to more closely match tokenization in Universal Dependencies. 2017-12-20 17:36:52 +01:00
Ines Montani
9c1ee65268
Add regression test for #1698 2017-12-12 10:36:11 +01:00
Isaac Sijaranamual
38021fbb00 Switch from python 3 only TemporaryDirectory to pytest's tmpdir 2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
568130ce7c Adds regression test_issue1622 2017-12-10 23:00:48 +01:00
Matthew Honnibal
36b47e3fa6 Fix (and test) vector pickling 2017-12-07 09:53:30 +01:00
Canbey Bilgili
abe098b255 Adds Turkish Lemmatization 2017-12-01 17:04:32 +03:00
Vadim Mazaev
4ba7ddf651 Bugfixes 2017-11-30 12:29:38 +03:00
Matthew Honnibal
6bc0f4d29f
Merge pull request #1611 from fsonntag/master
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal
f9ed9ea529
Merge pull request #1624 from GreenRiverRUS/russian
Add support for Russian
2017-11-29 23:10:01 +01:00
ines
a31506e060 Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654) 2017-11-28 20:37:55 +01:00
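
For context, the kind of index arithmetic involved in an "insert after a named component" operation can be sketched as below. This is an illustration only: `add_after` is a hypothetical helper working on a plain list of `(name, component)` pairs, not the `nlp.add_pipe` implementation, and it does not claim to reproduce the original bug.

```python
def add_after(pipeline, component, after):
    """Hypothetical helper: insert a (name, func) pair directly after the
    component called `after` in a list of (name, func) pairs."""
    names = [name for name, _ in pipeline]
    if after not in names:
        raise ValueError("No component named {!r} in pipeline".format(after))
    # names.index(after) is the anchor's own position; inserting there would
    # place the new component *before* it, so the target index is one higher.
    pipeline.insert(names.index(after) + 1, component)
    return pipeline


pipeline = [("tagger", None), ("parser", None), ("ner", None)]
add_after(pipeline, ("custom", None), after="tagger")
print([name for name, _ in pipeline])  # ['tagger', 'custom', 'parser', 'ner']
```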
ines
b62739fbfe Add regression test for #1654 2017-11-28 20:27:54 +01:00
ines
2e50dbb9d7 Simplify test 2017-11-28 20:27:27 +01:00
Felix Sonntag
724ae7dc55 Fixed issue of infix capturing prefixes 2017-11-28 17:17:12 +01:00
Søren Lind Kristiansen
0ffd27b0f6 Add several Danish alternative spellings 2017-11-27 13:35:41 +01:00
Vadim Mazaev
53e7c38637 Fixed tests that depend on pymorphy2 2017-11-26 21:04:44 +03:00
Vadim Mazaev
cacd859dcd Added tag map, fixed test failures, added more exceptions 2017-11-26 20:54:48 +03:00
Ines Montani
a7bb8f1b42
Merge pull request #1637 from sorenlind/da_tokenization
Improve Danish tokenization
2017-11-26 15:41:38 +00:00
ines
c699aec089 Add offsets_from_biluo_tags helper and tests (see #1626) 2017-11-26 16:38:01 +01:00
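
To illustrate what converting BILUO tags to character offsets involves, here is a minimal standalone sketch. It does not use spaCy and is not the helper added in this commit: `offsets_from_biluo` is a hypothetical function that takes explicit `(text, start_char)` token pairs instead of a `Doc`, and the sentence and tags are made-up sample data.

```python
def offsets_from_biluo(tokens, tags):
    """Convert per-token BILUO tags to (start_char, end_char, label) tuples.

    tokens: list of (text, start_char) pairs (hypothetical input format).
    tags:   list of BILUO tags such as 'B-GPE', 'I-GPE', 'L-GPE', 'U-GPE', 'O'.
    """
    entities = []
    start = None  # character offset where the currently open entity began
    for (text, idx), tag in zip(tokens, tags):
        end = idx + len(text)
        if tag.startswith("U-"):
            # Single-token entity: starts and ends on this token.
            entities.append((idx, end, tag[2:]))
        elif tag.startswith("B-"):
            # First token of a multi-token entity.
            start = idx
        elif tag.startswith("L-") and start is not None:
            # Last token of a multi-token entity: close the span.
            entities.append((start, end, tag[2:]))
            start = None
        # 'I-' and 'O' tags need no action in this sketch.
    return entities


# "I like New York and London"
tokens = [("I", 0), ("like", 2), ("New", 7), ("York", 11), ("and", 16), ("London", 20)]
tags = ["O", "O", "B-GPE", "L-GPE", "O", "U-GPE"]
print(offsets_from_biluo(tokens, tags))  # [(7, 15, 'GPE'), (20, 26, 'GPE')]
```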
Søren Lind Kristiansen
6aa241bcec Add day of month tokenizer exceptions for Danish. 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen
0c276ed020 Add weekday abbreviations and remove ambiguous month abbreviations for Danish. 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen
056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen
8dc265ac0c Add test for tokenization of 'i.' for Danish. 2017-11-24 11:29:37 +01:00
Matthew Honnibal
30ba81f881
Merge pull request #1576 from ligser/master
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
ines
c90fe92e15 Fix displaCy test 2017-11-22 05:04:39 +01:00
ines
a6f33ac27d Fix displaCy test 2017-11-22 04:19:28 +01:00
Vadim Mazaev
81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and
tokenizer tests
2017-11-21 22:23:59 +03:00
Burton DeWilde
635792997c Add regression test for #1612 2017-11-20 12:05:35 -06:00
ines
d70a64d78b Fix syntax error and formatting in test (see #1617) 2017-11-20 14:01:25 +01:00
ines
17849dee4b Fix French test (see #1617) 2017-11-20 13:59:59 +01:00
Felix Sonntag
8be3392302 Added regression test for 1494 2017-11-19 16:30:35 +01:00
Motoki Wu
b818afaa0e Added failing test for Issue #1207.
The noun chunk iterator should work for `Doc` but not for `Span`.
2017-11-17 17:04:27 -08:00
ines
a3d4dd1a5d Test adding of lots of pipeline components (see #1585)
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Roman Domrachev
505c6a2f2f Completely cleanup tokenizer cache
The tokenizer cache can have keys other than strings.

This modification can slow down the tokenizer, and the impact needs to be measured.
2017-11-15 17:55:48 +03:00
Roman Domrachev
3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev
4e378dc4a4 Remove all obsolete code and test only initial problem 2017-11-14 20:45:04 +03:00
Roman
47ce2347b0
Create test that fails when actual cleanup caused 2017-11-14 20:28:13 +03:00
Roman Domrachev
3d247d2bb8 Get back previous testcase 2017-11-14 18:01:37 +03:00