Commit Graph

1139 Commits

Author SHA1 Message Date
ines
f3f8bfc367 Add built-in factories for merge_entities and merge_noun_chunks
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 17:16:54 +01:00
ines
d854f69fe3 Add built-in factories for merge_entities and merge_noun_chunks
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 00:18:51 +01:00
Matthew Honnibal
c2f4759257
Fix test for Python 2 2018-03-12 23:03:05 +01:00
Matthew Honnibal
53b3249e06 Add tests for arc eager oracle 2018-03-10 23:42:56 +01:00
Matthew Honnibal
5cc3bd1c1d Update alignment tests 2018-02-24 16:03:58 +01:00
Matthew Honnibal
7865746574 Support many-to-one alignment 2018-02-24 02:09:53 +01:00
Matthew Honnibal
458710b831 Poke matcher test for appveyor 2018-02-23 23:53:48 +01:00
Matthew Honnibal
2c9c8b8d72 Try comming out emoji test in matcher 2018-02-23 23:34:35 +01:00
Matthew Honnibal
980ad68cbe Try to find test that fails on appveyor 2018-02-23 21:27:53 +01:00
Matthew Honnibal
39de8cd4d3 Try to find test failing on appveyor 2018-02-23 20:59:21 +01:00
Matthew Honnibal
7b575a119e Try to reduce memory usage of test_matcher 2018-02-23 15:34:37 +01:00
Matthew Honnibal
875411b875 Set unicode types in _align.pyx and test 2018-02-23 14:35:38 +01:00
Matthew Honnibal
51d9679aa3 Fix broken span.as_doc test 2018-02-23 14:22:24 +01:00
Matthew Honnibal
3e6c1111b7 Remove obsolete test 2018-02-23 03:22:07 +01:00
Matthew Honnibal
c0734ba526 Make alignment work with strings 2018-02-20 17:51:49 +01:00
Matthew Honnibal
8180c84a98 Add tests for new Levenshtein alignment 2018-02-20 17:32:25 +01:00
Matthew Honnibal
2bccad8815 Fix incorrect matcher test 2018-02-18 14:56:12 +01:00
Matthew Honnibal
530172d57a Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher 2018-02-18 14:40:42 +01:00
Matthew Honnibal
1e5aeb4eec
Merge pull request #1987 from thomasopsomer/span-sent
Make span.sent work when only manual / custom sbd
2018-02-18 14:05:37 +01:00
Matthew Honnibal
eb3040ce46
Merge pull request #1891 from fucking-signup/master
Fix issue #1889
2018-02-18 13:47:47 +01:00
Matthew Honnibal
3d7285870b Update matcher branch with v2.0.8 master 2018-02-18 13:42:58 +01:00
ines
6bba1db4cc Drop six and related hacks as a dependency 2018-02-18 13:29:56 +01:00
Matthew Honnibal
f9f46e5a07 Revert matcher fixes from GregDubbin 2018-02-18 10:59:28 +01:00
Matthew Honnibal
f7dc64d2a3 Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher 2018-02-17 16:47:35 +01:00
Aaron Marquez
f0d3672e17 Changed loading EN model 2018-02-15 14:28:38 -08:00
Aaron Marquez
7ba4111554 Add test for issue-1959 2018-02-15 12:46:22 -08:00
Thomas Opsomer
5d24a81c0b add test for span.sent when doc not parsed 2018-02-15 16:59:16 +01:00
Matthew Honnibal
4533c7408d Update matcher tests 2018-02-15 15:39:47 +01:00
Matthew Honnibal
4cb861e080
Merge pull request #1968 from DuyguA/is_currency
New lexical feature is_currency
2018-02-15 12:13:36 +01:00
Matthew Honnibal
00261eea27 Make tests refer to matcher2 2018-02-14 12:10:51 +01:00
Claudiu-Vlad Ursache
e28de12cbd
Ensure files opened in from_disk are closed
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
2018-02-13 20:49:43 +01:00
Matthew Honnibal
dcd8d89aef Update test for 850, making it work with matcher2 2018-02-13 16:35:20 +01:00
Matthew Honnibal
9bdfa5cd4f Remove re comparisons tests, as matcher behaves differently 2018-02-13 16:28:52 +01:00
Matthew Honnibal
6d7986b0f1 Fix matcher test 2018-02-13 16:28:06 +01:00
4altinok
471d3c9e23 added lex test for is_currency 2018-02-11 18:50:50 +01:00
Matthew Honnibal
fd9fd275c5 Make test for #1945 more precise 2018-02-07 02:06:11 +01:00
Matthew Honnibal
c087a14380 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-07 01:29:39 +01:00
Matthew Honnibal
76d89b2180 Add test for #1945: PhraseMatcher regression 2018-02-07 01:29:23 +01:00
Matthew Honnibal
2e7391e627
Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp
Bug/fix not passing in model cfg in nlp
2018-02-05 01:19:40 +01:00
Matthew Honnibal
f74a802d09 Test and fix #1919: Error resuming training 2018-02-02 02:32:40 +01:00
Motoki Wu
54062b7326 added tests for issue #1915 2018-01-30 18:30:19 -08:00
ines
8901814248 Improve error handling if pipeline component is not callable (resolves #1911)
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
2018-01-30 15:43:03 +01:00
Matthew Honnibal
512e6adb08
Merge pull request #1896 from thomasopsomer/fix-sent
Fix sentence boundaries serialization (issue #1834)
2018-01-28 21:18:51 +01:00
Matthew Honnibal
f5b1ad4100 Limit parser model size, to hopefully reduce memory during CI tests 2018-01-28 21:00:32 +01:00
Thomas Opsomer
45d62561f7 add test for the issue 2018-01-28 19:49:56 +01:00
Kit
52ef51f36e
Add test for issue #1889 2018-01-25 22:56:48 +01:00
Matthew Honnibal
6a8cb905aa
Merge pull request #1876 from GregDubbin/master
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal
edb71a280e Add test for #1883: Unpickling Matcher 2018-01-24 15:42:33 +01:00
Matthew Honnibal
42a18ef903 Add test for #1868: Vocab.__contains__ with ints 2018-01-23 23:27:05 +01:00
greg
85ab99e692 Correct test examples 2018-01-23 15:00:14 -05:00
Matthew Honnibal
91e916cb67 Add comment to new test 2018-01-23 19:11:53 +01:00
Matthew Honnibal
fd187d71ad Add test for #1727 2018-01-23 19:11:01 +01:00
Matthew Honnibal
7e6dc283db Fix unicode import in test 2018-01-22 23:55:44 +01:00
greg
686735b94e Fix matcher import 2018-01-22 16:53:05 -05:00
Matthew Honnibal
4ce7d24fd5 Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses. 2018-01-22 20:18:38 +01:00
greg
7072b395c9 Add greedy matcher tests 2018-01-16 15:46:13 -05:00
Matthew Honnibal
ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal
82135d85b7 Fix test 2018-01-15 15:55:15 +01:00
Matthew Honnibal
4b09616b58 Add test for #1757: Comparison against None 2018-01-15 15:55:01 +01:00
Matthew Honnibal
9e413449f6 Fix unicode error in new test 2018-01-15 15:39:00 +01:00
Matthew Honnibal
6b215d2dd3 Add test for Issue #1537 2018-01-15 15:20:56 +01:00
ines
5babb7d6f6 Merge branch 'master' of https://github.com/explosion/spaCy 2018-01-14 17:31:09 +01:00
ines
793890cb4d Remove test for removed deprecation warning 2018-01-14 17:31:06 +01:00
Matthew Honnibal
1a1cca6052 Fix vectors.resize() on Py3. Closes #1539 2018-01-14 14:48:51 +01:00
Matthew Honnibal
0153220304 Make set_vector add word to vocab. Fixes #1807 2018-01-14 13:57:57 +01:00
Ines Montani
55754f0cee
Merge pull request #1836 from fucking-signup/master
Add tests for issue #1769
2018-01-13 00:23:35 +00:00
Kit
4ee97f20a0
Mark like_num tests as slow 2018-01-13 00:44:15 +01:00
Kit
855531537e
Rewrite tests for issue #1769 2018-01-12 23:49:51 +01:00
Kit
5b541cb5ec
Simplify tests for issue #1769 2018-01-12 23:34:27 +01:00
Kit
7a2adc4633
Remove some tests to see build status changes 2018-01-12 22:49:16 +01:00
Kit
0e62809a43
Rewrite tests for issue #1769 2018-01-12 22:26:06 +01:00
Ines Montani
36f426fe0a
Merge pull request #1808 from fucking-signup/master
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit
76f4eeca44
Remove tests to see build changes on Windows (Python 2.7) 2018-01-12 20:30:51 +01:00
Kit
7ec0956e8d
Add regression test (issue #1769) 2018-01-08 03:42:04 +01:00
Søren Lind Kristiansen
62de5da1ff Remove unsused dummy variable 2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen
10dab8eef8 Remove dummy variable from function calls 2018-01-05 09:37:05 +01:00
Kevin Humphreys
597df5bf83 add test 2018-01-03 13:00:05 -08:00
Ines Montani
ff9fc945ab
Merge pull request #1749 from sorenlind/da_ud_tokenization
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc Fix missing import 2017-12-22 16:21:44 +01:00
ines
8dc1c27841 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-22 16:01:00 +01:00
ines
b10ba848b8 xfail test that causes MemoryError on Python 2 on Windows
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Ines Montani
a3dd167d7f
Merge branch 'master' into da_ud_tokenization 2017-12-20 21:05:34 +00:00
Ines Montani
d682a8803e
Merge pull request #1672 from cbilgili/master
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Søren Lind Kristiansen
15d13efafd Tune Danish tokenizer to more closely match tokenization in Universal Dependencies. 2017-12-20 17:36:52 +01:00
Ines Montani
9c1ee65268
Add regression test for #1698 2017-12-12 10:36:11 +01:00
Isaac Sijaranamual
38021fbb00 Switch from python 3 only TemporaryDirectory to pytest's tmpdir 2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
568130ce7c Adds regression test_issue1622 2017-12-10 23:00:48 +01:00
Matthew Honnibal
36b47e3fa6 Fix (and test) vector pickling 2017-12-07 09:53:30 +01:00
Canbey Bilgili
abe098b255 Adds Turkish Lemmatization 2017-12-01 17:04:32 +03:00
Vadim Mazaev
4ba7ddf651 Bugfixies 2017-11-30 12:29:38 +03:00
Matthew Honnibal
6bc0f4d29f
Merge pull request #1611 from fsonntag/master
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal
f9ed9ea529
Merge pull request #1624 from GreenRiverRUS/russian
Add support for Russian
2017-11-29 23:10:01 +01:00
ines
a31506e060 Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654) 2017-11-28 20:37:55 +01:00
ines
b62739fbfe Add regression test for #1654 2017-11-28 20:27:54 +01:00
ines
2e50dbb9d7 Simplify test 2017-11-28 20:27:27 +01:00
Felix Sonntag
724ae7dc55 Fixed issue of infix capturing prefixes 2017-11-28 17:17:12 +01:00
Søren Lind Kristiansen
0ffd27b0f6 Add several Danish alternative spellings 2017-11-27 13:35:41 +01:00
Vadim Mazaev
53e7c38637 Fixed tests depends on pymorphy2 2017-11-26 21:04:44 +03:00
Vadim Mazaev
cacd859dcd Added tag map, fixed tests fails, added more exceptions 2017-11-26 20:54:48 +03:00
Ines Montani
a7bb8f1b42
Merge pull request #1637 from sorenlind/da_tokenization
Improve Danish tokenization
2017-11-26 15:41:38 +00:00
ines
c699aec089 Add offsets_from_biluo_tags helper and tests (see #1626) 2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen
6aa241bcec Add day of month tokenizer exceptions for Danish. 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen
0c276ed020 Add weekday abbreviations and remove abiguous month abbreviations for Danish. 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen
056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen
8dc265ac0c Add test for tokenization of 'i.' for Danish. 2017-11-24 11:29:37 +01:00
Matthew Honnibal
30ba81f881
Merge pull request #1576 from ligser/master
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
ines
c90fe92e15 Fix displaCy test 2017-11-22 05:04:39 +01:00
ines
a6f33ac27d Fix displaCy test 2017-11-22 04:19:28 +01:00
Vadim Mazaev
81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and
tokenizer tests
2017-11-21 22:23:59 +03:00
Burton DeWilde
635792997c Add regression test for #1612 2017-11-20 12:05:35 -06:00
ines
d70a64d78b Fix syntax error and formatting in test (see #1617) 2017-11-20 14:01:25 +01:00
ines
17849dee4b Fix French test (see #1617) 2017-11-20 13:59:59 +01:00
Felix Sonntag
8be3392302 Added regression text for 1494 2017-11-19 16:30:35 +01:00
Motoki Wu
b818afaa0e Added failing test for Issue #1207.
The noun chunk iterator should work for `Doc` but not for `Span`.
2017-11-17 17:04:27 -08:00
ines
a3d4dd1a5d Test adding of lots of pipeline components (see #1585)
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Roman Domrachev
505c6a2f2f Completely cleanup tokenizer cache
Tokenizer cache can have be different keys than string

That modification can slow down tokenizer and need to be measured
2017-11-15 17:55:48 +03:00
Roman Domrachev
3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev
4e378dc4a4 Remove all obsolete code and test only initial problem 2017-11-14 20:45:04 +03:00
Roman
47ce2347b0
Create test that fails when actual cleanup caused 2017-11-14 20:28:13 +03:00
Roman Domrachev
3d247d2bb8 Get back previous testcase 2017-11-14 18:01:37 +03:00
Roman Domrachev
a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Roman Domrachev
ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev
3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4 Add regression test for #1547 2017-11-11 00:14:03 +01:00
ines
2df27db671 Add unicode declaration 2017-11-11 00:13:56 +01:00
ines
1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves ##1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
a5ea0fdf5a Fix #1518: vocab.vectors.resize() didn't work 2017-11-08 22:18:37 +01:00
Matthew Honnibal
4194bc5744 Xfail flakey serialization test 2017-11-08 13:55:13 +01:00
ines
42a0fbf291 Fix textcat simple train example 2017-11-07 01:25:54 +01:00
ines
5f43953536 Move test 2017-11-06 23:14:10 +01:00
Matthew Honnibal
1831dbd065 Add test of simple textcat workflow 2017-11-06 22:04:29 +01:00
Matthew Honnibal
2f7e9f390d Make test less flakey 2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e Make test less flakey 2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933 Fix lemma ordering in test 2017-11-06 17:02:17 +01:00
Matthew Honnibal
63c6ae4191 Fix lemmatizer test 2017-11-06 11:57:06 +01:00
Matthew Honnibal
00435d8f0c Add extra beam parsing test 2017-11-05 14:39:57 +01:00
ines
5e7d98f72a Remove test for #1491 2017-11-03 22:10:57 +01:00
ines
718f1c50fb Add regression test for #1491 2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal
d6e831bf89 Fix lemmatizer tests 2017-11-03 19:46:34 +01:00
ines
eef930c73e Assert instead of print 2017-11-03 18:50:57 +01:00
ines
f0986df94b Add test for #1488 (passes on v2.0.0a18?) 2017-11-03 14:44:36 +01:00
Matthew Honnibal
711278b667 Make test less flakey 2017-11-03 14:36:08 +01:00
Matthew Honnibal
0a534ae96a Fix test for backprop d_pad 2017-11-03 14:04:16 +01:00
Matthew Honnibal
a22f96c3f1 Add test for backpropagating padding 2017-11-03 00:48:54 +01:00
ines
3af281a334 Update test model name 2017-11-01 23:02:00 +01:00
ines
8c2260e18c Move span tests to /doc 2017-11-01 16:56:35 +01:00
ines
260cb37224 Catch deprecation warning 2017-11-01 16:49:18 +01:00
ines
5914faafbb Fix .merge tests to not use deprecated API 2017-11-01 16:49:11 +01:00
Matthew Honnibal
9e0ebee81c Add Token.is_sent_start property, so can deprecate Token.sent_start 2017-11-01 13:27:14 +01:00
Matthew Honnibal
c047498f87 Fix vectors test 2017-11-01 13:24:47 +01:00
Matthew Honnibal
86eba61fae Fix token.vector when vectors are missing 2017-11-01 00:47:35 +01:00
Ines Montani
d11659463b
Merge pull request #1152 from jimregan/develop-irish
[WIP] attempt a port from #1147
2017-11-01 00:23:43 +01:00
Jim O'Regan
08b0bfd153 merge 2017-10-31 22:55:59 +00:00
Jim O'Regan
00ecfa5417 Ó, not O 2017-10-31 22:54:42 +00:00
Ines Montani
25b1d6cd91
Fix syntax error 2017-10-31 22:36:03 +01:00
Matthew Honnibal
92dc127569 Fix test for Python 3 2017-10-31 22:21:55 +01:00
Jim O'Regan
fe4b10346a replace example sentence until I get around to adding a punctuation.py 2017-10-31 20:24:53 +00:00
Matthew Honnibal
77d8f5de9a Revise and simplify Vectors class 2017-10-31 18:25:08 +01:00
Jim O'Regan
d4a8160c36 change quotes 2017-10-31 15:15:44 +00:00
Jim O'Regan
34ca59691b no idea what is wrong here 2017-10-31 14:50:13 +00:00
Jim O'Regan
41dd29e48e merge 2017-10-31 14:07:45 +00:00
Matthew Honnibal
cb5217012f Fix vector remapping 2017-10-31 11:40:46 +01:00
Matthew Honnibal
9c11ee4a1c WIP on vectors fixes 2017-10-31 11:22:56 +01:00
Matthew Honnibal
368fdb389a WIP on refactoring and fixing vectors 2017-10-31 02:00:26 +01:00
Explosion Bot
72aea8f105 Update vectors.add() to allow setting keys to rows 2017-10-30 10:03:08 +01:00
Matthew Honnibal
64e4ff7c4b Merge 'tidy-up' changes into branch. Resolve conflicts 2017-10-28 13:16:06 +02:00
Ines Montani
4033e70c71 Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
Matthew Honnibal
b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
ines
de1e5f35d5 Merge branch 'develop' into feature/disable-pipes 2017-10-25 16:33:12 +02:00
ines
c0b55ebdac Fix PhraseMatcher.__contains__ and add more tests 2017-10-25 16:31:11 +02:00
ines
657a4d91bc Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:19:05 +02:00
ines
1a722dac31 Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:18:18 +02:00
Matthew Honnibal
b5de768852 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-25 14:44:16 +02:00
Matthew Honnibal
094512fd47 Fix model-mark on regression test. 2017-10-25 14:44:00 +02:00
Matthew Honnibal
e70f80f29e Add Language.disable_pipes() 2017-10-25 13:46:41 +02:00
Ines Montani
d3bf488e16 Merge pull request #1171 from mollerhoj/support-danish
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal
908809d488 Update tests 2017-10-24 17:05:15 +02:00
Matthew Honnibal
30e67fa808 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 16:08:23 +02:00
Matthew Honnibal
63f0bde749 Add test for #1250: Tokenizer cache clobbered special-case attrs 2017-10-24 16:07:18 +02:00
ines
090aed940a Add test for currently failing span.as_doc case 2017-10-24 16:00:56 +02:00
ines
4ef81a9ebc Fix whitespace 2017-10-24 16:00:56 +02:00
Matthew Honnibal
4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal
391d5ef0d1 Normalize imports in regression test 2017-10-24 14:25:49 +02:00
Matthew Honnibal
b66b8f028b Fix #1375 -- out-of-bounds on token.nbor() 2017-10-24 12:10:39 +02:00
Matthew Honnibal
a68d89a4f3 Add failing test for bug #1375 -- no out-of-bounds error for token.nbor() 2017-10-24 12:05:25 +02:00
Ines Montani
facf77e541 Merge branch 'develop' into support-danish 2017-10-24 11:53:19 +02:00
Matthew Honnibal
ccd2ab1a62 Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal
ef3e5a361b Merge pull request #1442 from explosion/feature/fix-sp
💫Fix SP tag, tweak Vectors.__init__, fix Morphology
2017-10-24 10:24:07 +02:00
Matthew Honnibal
fdf25d10ba Merge pull request #1440 from ramananbalakrishnan/develop
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
Matthew Honnibal
490ad3eaf0 Check that empty strings are handled. Closes #1242 2017-10-21 00:52:14 +02:00
Ramanan Balakrishnan
d2fe56a577
Add LCA matrix for spans and docs 2017-10-20 23:58:00 +05:30
Matthew Honnibal
d8391b1c4d Fix #1434: Matcher failed on ending ? if no token 2017-10-20 16:49:36 +02:00
Matthew Honnibal
f111b228e0 Fix re-parsing of previously parsed text
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:

1) The parse was only being partially erased
2) The RightArc action was able to create a 1-cycle.

This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.

Closes #1253.
2017-10-20 16:27:36 +02:00
Matthew Honnibal
ebecaddb76 Make 'data_or_width' two keyword args in Vectors.__init__
Previously the data and width options were one argument in Vectors,
which meant you couldn't say vectors = Vectors(strings, width=300).
It's better to have two keywords.
2017-10-20 14:17:15 +02:00
Ramanan Balakrishnan
b3ab124fc5
Support strings for attribute list in doc.to_array 2017-10-20 11:46:57 +05:30
ines
bf415fd778 Add test for serializing extension attrs (see #1085) 2017-10-19 00:53:08 +02:00
Matthew Honnibal
fe844148f6 Test pickling hooks 2017-10-17 19:43:52 +02:00
Matthew Honnibal
374819edf8 Test user_data deserialization, re #1085 2017-10-17 19:28:54 +02:00
Matthew Honnibal
8ca97f32a3 Fix doc pickling test 2017-10-17 18:19:57 +02:00
Matthew Honnibal
45d1dd90b1 Add tests for pickling doc 2017-10-17 17:20:58 +02:00
Matthew Honnibal
4174477161 Fix equality check in test 2017-10-16 19:50:35 +02:00
Matthew Honnibal
010a7309ff Merge pull request #1402 from explosion/feature/fix-matcher-operators
💫 Fix Matcher variable-length operators
2017-10-16 17:53:19 +02:00
Matthew Honnibal
c29927d2e7 Fix matcher test 2017-10-16 17:22:18 +02:00
Matthew Honnibal
a928ae2f35 Merge branch 'develop' into feature/fix-matcher-operators 2017-10-16 13:38:36 +02:00
Matthew Honnibal
748d525801 Add more matcher operator tests 2017-10-16 13:38:01 +02:00
ines
3516aa0cea Port over changes from #1389 2017-10-14 13:32:55 +02:00
ines
cd6a29dce7 Port over changes from #1294 2017-10-14 13:28:46 +02:00
ines
38c756fd85 Port over changes from #1287 2017-10-14 13:16:21 +02:00
ines
612224c10d Port over changes from #1157 2017-10-14 13:11:39 +02:00
ines
9b3f8f9ec3 Fix formatting and add comment on languages 2017-10-14 13:11:18 +02:00
ines
a4d974d97b Port over URL pattern changes from #1411 2017-10-14 12:58:07 +02:00
Matthew Honnibal
cf6da9301a Update lemmatizer test 2017-10-12 22:50:52 +02:00
Matthew Honnibal
462caf835a Fix SBD test 2017-10-12 21:18:22 +02:00
Ines Montani
37aa523a8e Merge pull request #1408 from explosion/feature/dot-underscore
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
ines
51519251c2 Fix underscore method test 2017-10-11 13:34:19 +02:00
ines
c6ae49e8bf Fix formatting 2017-10-11 13:34:11 +02:00
ines
453c47ca24 Add German lemmatizer tests 2017-10-11 13:27:26 +02:00
ines
15fe0fd82d Fix tests 2017-10-11 13:27:18 +02:00
ines
e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00
Matthew Honnibal
fd47f8e89f Fix failing test 2017-10-11 08:38:34 +02:00
Matthew Honnibal
462b2e26b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 08:23:04 +02:00
Matthew Honnibal
2c118ab3a6 Add tests for Doc creation 2017-10-11 03:21:23 +02:00
Matthew Honnibal
d84136b4a9 Update add label test 2017-10-10 22:57:41 +02:00
Matthew Honnibal
e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
Matthew Honnibal
09d61ada5e Merge pull request #1396 from explosion/feature/pipeline-management
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal
f0f2739ae3 Add test for serialization issue raised in #1105 2017-10-10 03:57:58 +02:00
ines
de374dc72a Merge branch 'feature/pipeline-management' into feature/dot-underscore 2017-10-09 14:37:51 +02:00
Matthew Honnibal
2534cd57d7 Add bandaid solution to the 'shadowing' problem in #864 2017-10-09 08:59:35 +02:00
Matthew Honnibal
d8a2506023 Merge pull request #1401 from explosion/feature/add-parser-action
💫 Allow labels to be added to pre-trained parser and NER modes
2017-10-09 04:57:51 +02:00
Matthew Honnibal
689349e32f Merge pull request #1400 from explosion/feature/sentence-parsing
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal
fad2b8315f Merge branch 'develop' into feature/add-parser-action 2017-10-09 04:13:04 +02:00
Matthew Honnibal
6c79841c0d Fix tests for history features 2017-10-09 04:12:24 +02:00
Matthew Honnibal
dde87e6b0d Add tests for adding parser actions 2017-10-09 03:42:35 +02:00
Matthew Honnibal
81a64119db Fix string-to-unicode problem 2017-10-09 00:59:49 +02:00
Matthew Honnibal
02c2af7119 Fix test 2017-10-09 00:29:37 +02:00
Matthew Honnibal
5a67efeccc Add tests for sentence segmentation presetting 2017-10-09 00:02:23 +02:00
Matthew Honnibal
9bd8191739 Add tests for Underscore 2017-10-07 18:56:19 +02:00
Matthew Honnibal
3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
ines
0adadcb3f0 Fix beam parse model test 2017-10-07 02:15:15 +02:00
ines
b38a8f4a94 Fix and update pipe methods tests 2017-10-07 02:06:23 +02:00
Matthew Honnibal
3a65a0c970 Start adding tests for new pipeline management 2017-10-07 01:48:23 +02:00
ines
61a503a611 Fix parser test 2017-10-07 00:38:51 +02:00
Matthew Honnibal
c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal
fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal
40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
Matthew Honnibal
db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal
4a59f6358c Fix thinc imports 2017-10-03 19:21:26 +02:00
Ines Montani
959c46eabe Merge pull request #1365 from wannaphongcom/develop
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4 fix thai test 2017-09-26 23:54:15 +07:00