Commit Graph

997 Commits

Author SHA1 Message Date
Matthew Honnibal
512e6adb08 Merge pull request #1896 from thomasopsomer/fix-sent
Fix sentence boundaries serialization (issue #1834)
2018-01-28 21:18:51 +01:00
Matthew Honnibal
f5b1ad4100 Limit parser model size, to hopefully reduce memory during CI tests 2018-01-28 21:00:32 +01:00
Thomas Opsomer
45d62561f7 add test for the issue 2018-01-28 19:49:56 +01:00
Kit
52ef51f36e Add test for issue #1889 2018-01-25 22:56:48 +01:00
Matthew Honnibal
6a8cb905aa Merge pull request #1876 from GregDubbin/master
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal
edb71a280e Add test for #1883: Unpickling Matcher 2018-01-24 15:42:33 +01:00
Matthew Honnibal
42a18ef903 Add test for #1868: Vocab.__contains__ with ints 2018-01-23 23:27:05 +01:00
greg
85ab99e692 Correct test examples 2018-01-23 15:00:14 -05:00
Matthew Honnibal
91e916cb67 Add comment to new test 2018-01-23 19:11:53 +01:00
Matthew Honnibal
fd187d71ad Add test for #1727 2018-01-23 19:11:01 +01:00
Matthew Honnibal
7e6dc283db Fix unicode import in test 2018-01-22 23:55:44 +01:00
greg
686735b94e Fix matcher import 2018-01-22 16:53:05 -05:00
Matthew Honnibal
4ce7d24fd5 Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses. 2018-01-22 20:18:38 +01:00
greg
7072b395c9 Add greedy matcher tests 2018-01-16 15:46:13 -05:00
Matthew Honnibal
ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal
82135d85b7 Fix test 2018-01-15 15:55:15 +01:00
Matthew Honnibal
4b09616b58 Add test for #1757: Comparison against None 2018-01-15 15:55:01 +01:00
Matthew Honnibal
9e413449f6 Fix unicode error in new test 2018-01-15 15:39:00 +01:00
Matthew Honnibal
6b215d2dd3 Add test for Issue #1537 2018-01-15 15:20:56 +01:00
ines
5babb7d6f6 Merge branch 'master' of https://github.com/explosion/spaCy 2018-01-14 17:31:09 +01:00
ines
793890cb4d Remove test for removed deprecation warning 2018-01-14 17:31:06 +01:00
Matthew Honnibal
1a1cca6052 Fix vectors.resize() on Py3. Closes #1539 2018-01-14 14:48:51 +01:00
Matthew Honnibal
0153220304 Make set_vector add word to vocab. Fixes #1807 2018-01-14 13:57:57 +01:00
Ines Montani
55754f0cee Merge pull request #1836 from fucking-signup/master
Add tests for issue #1769
2018-01-13 00:23:35 +00:00
Kit
4ee97f20a0 Mark like_num tests as slow 2018-01-13 00:44:15 +01:00
Kit
855531537e Rewrite tests for issue #1769 2018-01-12 23:49:51 +01:00
Kit
5b541cb5ec Simplify tests for issue #1769 2018-01-12 23:34:27 +01:00
Kit
7a2adc4633 Remove some tests to see build status changes 2018-01-12 22:49:16 +01:00
Kit
0e62809a43 Rewrite tests for issue #1769 2018-01-12 22:26:06 +01:00
Ines Montani
36f426fe0a Merge pull request #1808 from fucking-signup/master
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit
76f4eeca44 Remove tests to see build changes on Windows (Python 2.7) 2018-01-12 20:30:51 +01:00
Kit
7ec0956e8d Add regression test (issue #1769) 2018-01-08 03:42:04 +01:00
Søren Lind Kristiansen
62de5da1ff Remove unused dummy variable 2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen
10dab8eef8 Remove dummy variable from function calls 2018-01-05 09:37:05 +01:00
Kevin Humphreys
597df5bf83 add test 2018-01-03 13:00:05 -08:00
Ines Montani
ff9fc945ab Merge pull request #1749 from sorenlind/da_ud_tokenization
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc Fix missing import 2017-12-22 16:21:44 +01:00
ines
8dc1c27841 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-22 16:01:00 +01:00
ines
b10ba848b8 xfail test that causes MemoryError on Python 2 on Windows
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Ines Montani
a3dd167d7f Merge branch 'master' into da_ud_tokenization 2017-12-20 21:05:34 +00:00
Ines Montani
d682a8803e Merge pull request #1672 from cbilgili/master
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Søren Lind Kristiansen
15d13efafd Tune Danish tokenizer to more closely match tokenization in Universal Dependencies. 2017-12-20 17:36:52 +01:00
Ines Montani
9c1ee65268 Add regression test for #1698 2017-12-12 10:36:11 +01:00
Isaac Sijaranamual
38021fbb00 Switch from python 3 only TemporaryDirectory to pytest's tmpdir 2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
568130ce7c Adds regression test_issue1622 2017-12-10 23:00:48 +01:00
Matthew Honnibal
36b47e3fa6 Fix (and test) vector pickling 2017-12-07 09:53:30 +01:00
Canbey Bilgili
abe098b255 Adds Turkish Lemmatization 2017-12-01 17:04:32 +03:00
Vadim Mazaev
4ba7ddf651 Bugfixes 2017-11-30 12:29:38 +03:00
Matthew Honnibal
6bc0f4d29f Merge pull request #1611 from fsonntag/master
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal
f9ed9ea529 Merge pull request #1624 from GreenRiverRUS/russian
Add support for Russian
2017-11-29 23:10:01 +01:00
ines
a31506e060 Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654) 2017-11-28 20:37:55 +01:00
ines
b62739fbfe Add regression test for #1654 2017-11-28 20:27:54 +01:00
ines
2e50dbb9d7 Simplify test 2017-11-28 20:27:27 +01:00
Felix Sonntag
724ae7dc55 Fixed issue of infix capturing prefixes 2017-11-28 17:17:12 +01:00
Søren Lind Kristiansen
0ffd27b0f6 Add several Danish alternative spellings 2017-11-27 13:35:41 +01:00
Vadim Mazaev
53e7c38637 Fixed tests that depend on pymorphy2 2017-11-26 21:04:44 +03:00
Vadim Mazaev
cacd859dcd Added tag map, fixed test failures, added more exceptions 2017-11-26 20:54:48 +03:00
Ines Montani
a7bb8f1b42 Merge pull request #1637 from sorenlind/da_tokenization
Improve Danish tokenization
2017-11-26 15:41:38 +00:00
ines
c699aec089 Add offsets_from_biluo_tags helper and tests (see #1626) 2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen
6aa241bcec Add day of month tokenizer exceptions for Danish. 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen
0c276ed020 Add weekday abbreviations and remove ambiguous month abbreviations for Danish. 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen
056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen
8dc265ac0c Add test for tokenization of 'i.' for Danish. 2017-11-24 11:29:37 +01:00
Matthew Honnibal
30ba81f881 Merge pull request #1576 from ligser/master
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
ines
c90fe92e15 Fix displaCy test 2017-11-22 05:04:39 +01:00
ines
a6f33ac27d Fix displaCy test 2017-11-22 04:19:28 +01:00
Vadim Mazaev
81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and
tokenizer tests
2017-11-21 22:23:59 +03:00
Burton DeWilde
635792997c Add regression test for #1612 2017-11-20 12:05:35 -06:00
ines
d70a64d78b Fix syntax error and formatting in test (see #1617) 2017-11-20 14:01:25 +01:00
ines
17849dee4b Fix French test (see #1617) 2017-11-20 13:59:59 +01:00
Felix Sonntag
8be3392302 Added regression test for 1494 2017-11-19 16:30:35 +01:00
Motoki Wu
b818afaa0e Added failing test for Issue #1207.
The noun chunk iterator should work for `Doc` but not for `Span`.
2017-11-17 17:04:27 -08:00
ines
a3d4dd1a5d Test adding of lots of pipeline components (see #1585)
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Roman Domrachev
505c6a2f2f Completely cleanup tokenizer cache
Tokenizer cache can have keys other than strings.
This modification can slow down the tokenizer and needs to be measured.
2017-11-15 17:55:48 +03:00
Roman Domrachev
3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev
4e378dc4a4 Remove all obsolete code and test only initial problem 2017-11-14 20:45:04 +03:00
Roman
47ce2347b0 Create test that fails when actual cleanup caused 2017-11-14 20:28:13 +03:00
Roman Domrachev
3d247d2bb8 Get back previous testcase 2017-11-14 18:01:37 +03:00
Roman Domrachev
a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Roman Domrachev
ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev
3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4 Add regression test for #1547 2017-11-11 00:14:03 +01:00
ines
2df27db671 Add unicode declaration 2017-11-11 00:13:56 +01:00
ines
1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves #1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
a5ea0fdf5a Fix #1518: vocab.vectors.resize() didn't work 2017-11-08 22:18:37 +01:00
Matthew Honnibal
4194bc5744 Xfail flakey serialization test 2017-11-08 13:55:13 +01:00
ines
42a0fbf291 Fix textcat simple train example 2017-11-07 01:25:54 +01:00
ines
5f43953536 Move test 2017-11-06 23:14:10 +01:00
Matthew Honnibal
1831dbd065 Add test of simple textcat workflow 2017-11-06 22:04:29 +01:00
Matthew Honnibal
2f7e9f390d Make test less flakey 2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e Make test less flakey 2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933 Fix lemma ordering in test 2017-11-06 17:02:17 +01:00
Matthew Honnibal
63c6ae4191 Fix lemmatizer test 2017-11-06 11:57:06 +01:00
Matthew Honnibal
00435d8f0c Add extra beam parsing test 2017-11-05 14:39:57 +01:00
ines
5e7d98f72a Remove test for #1491 2017-11-03 22:10:57 +01:00
ines
718f1c50fb Add regression test for #1491 2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal
d6e831bf89 Fix lemmatizer tests 2017-11-03 19:46:34 +01:00
ines
eef930c73e Assert instead of print 2017-11-03 18:50:57 +01:00
ines
f0986df94b Add test for #1488 (passes on v2.0.0a18?) 2017-11-03 14:44:36 +01:00
Matthew Honnibal
711278b667 Make test less flakey 2017-11-03 14:36:08 +01:00
Matthew Honnibal
0a534ae96a Fix test for backprop d_pad 2017-11-03 14:04:16 +01:00
Matthew Honnibal
a22f96c3f1 Add test for backpropagating padding 2017-11-03 00:48:54 +01:00
ines
3af281a334 Update test model name 2017-11-01 23:02:00 +01:00
ines
8c2260e18c Move span tests to /doc 2017-11-01 16:56:35 +01:00
ines
260cb37224 Catch deprecation warning 2017-11-01 16:49:18 +01:00
ines
5914faafbb Fix .merge tests to not use deprecated API 2017-11-01 16:49:11 +01:00
Matthew Honnibal
9e0ebee81c Add Token.is_sent_start property, so can deprecate Token.sent_start 2017-11-01 13:27:14 +01:00
Matthew Honnibal
c047498f87 Fix vectors test 2017-11-01 13:24:47 +01:00
Matthew Honnibal
86eba61fae Fix token.vector when vectors are missing 2017-11-01 00:47:35 +01:00
Ines Montani
d11659463b Merge pull request #1152 from jimregan/develop-irish
[WIP] attempt a port from #1147
2017-11-01 00:23:43 +01:00
Jim O'Regan
08b0bfd153 merge 2017-10-31 22:55:59 +00:00
Jim O'Regan
00ecfa5417 Ó, not O 2017-10-31 22:54:42 +00:00
Ines Montani
25b1d6cd91 Fix syntax error 2017-10-31 22:36:03 +01:00
Matthew Honnibal
92dc127569 Fix test for Python 3 2017-10-31 22:21:55 +01:00
Jim O'Regan
fe4b10346a replace example sentence until I get around to adding a punctuation.py 2017-10-31 20:24:53 +00:00
Matthew Honnibal
77d8f5de9a Revise and simplify Vectors class 2017-10-31 18:25:08 +01:00
Jim O'Regan
d4a8160c36 change quotes 2017-10-31 15:15:44 +00:00
Jim O'Regan
34ca59691b no idea what is wrong here 2017-10-31 14:50:13 +00:00
Jim O'Regan
41dd29e48e merge 2017-10-31 14:07:45 +00:00
Matthew Honnibal
cb5217012f Fix vector remapping 2017-10-31 11:40:46 +01:00
Matthew Honnibal
9c11ee4a1c WIP on vectors fixes 2017-10-31 11:22:56 +01:00
Matthew Honnibal
368fdb389a WIP on refactoring and fixing vectors 2017-10-31 02:00:26 +01:00
Explosion Bot
72aea8f105 Update vectors.add() to allow setting keys to rows 2017-10-30 10:03:08 +01:00
Matthew Honnibal
64e4ff7c4b Merge 'tidy-up' changes into branch. Resolve conflicts 2017-10-28 13:16:06 +02:00
Ines Montani
4033e70c71 Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
Matthew Honnibal
b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
ines
de1e5f35d5 Merge branch 'develop' into feature/disable-pipes 2017-10-25 16:33:12 +02:00
ines
c0b55ebdac Fix PhraseMatcher.__contains__ and add more tests 2017-10-25 16:31:11 +02:00
ines
657a4d91bc Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:19:05 +02:00
ines
1a722dac31 Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:18:18 +02:00
Matthew Honnibal
b5de768852 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-25 14:44:16 +02:00
Matthew Honnibal
094512fd47 Fix model-mark on regression test. 2017-10-25 14:44:00 +02:00
Matthew Honnibal
e70f80f29e Add Language.disable_pipes() 2017-10-25 13:46:41 +02:00
Ines Montani
d3bf488e16 Merge pull request #1171 from mollerhoj/support-danish
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal
908809d488 Update tests 2017-10-24 17:05:15 +02:00
Matthew Honnibal
30e67fa808 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 16:08:23 +02:00
Matthew Honnibal
63f0bde749 Add test for #1250: Tokenizer cache clobbered special-case attrs 2017-10-24 16:07:18 +02:00
ines
090aed940a Add test for currently failing span.as_doc case 2017-10-24 16:00:56 +02:00
ines
4ef81a9ebc Fix whitespace 2017-10-24 16:00:56 +02:00
Matthew Honnibal
4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal
391d5ef0d1 Normalize imports in regression test 2017-10-24 14:25:49 +02:00
Matthew Honnibal
b66b8f028b Fix #1375 -- out-of-bounds on token.nbor() 2017-10-24 12:10:39 +02:00
Matthew Honnibal
a68d89a4f3 Add failing test for bug #1375 -- no out-of-bounds error for token.nbor() 2017-10-24 12:05:25 +02:00
Ines Montani
facf77e541 Merge branch 'develop' into support-danish 2017-10-24 11:53:19 +02:00
Matthew Honnibal
ccd2ab1a62 Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal
ef3e5a361b Merge pull request #1442 from explosion/feature/fix-sp
💫Fix SP tag, tweak Vectors.__init__, fix Morphology
2017-10-24 10:24:07 +02:00
Matthew Honnibal
fdf25d10ba Merge pull request #1440 from ramananbalakrishnan/develop
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
Matthew Honnibal
490ad3eaf0 Check that empty strings are handled. Closes #1242 2017-10-21 00:52:14 +02:00
Ramanan Balakrishnan
d2fe56a577 Add LCA matrix for spans and docs 2017-10-20 23:58:00 +05:30