Kit
5b541cb5ec
Simplify tests for issue #1769
2018-01-12 23:34:27 +01:00
Kit
7a2adc4633
Remove some tests to see build status changes
2018-01-12 22:49:16 +01:00
Kit
0e62809a43
Rewrite tests for issue #1769
2018-01-12 22:26:06 +01:00
Ines Montani
36f426fe0a
Merge pull request #1808 from fucking-signup/master
...
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit
76f4eeca44
Remove tests to see build changes on Windows (Python 2.7)
2018-01-12 20:30:51 +01:00
Kit
7ec0956e8d
Add regression test (issue #1769 )
2018-01-08 03:42:04 +01:00
Søren Lind Kristiansen
62de5da1ff
Remove unsused dummy variable
2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen
10dab8eef8
Remove dummy variable from function calls
2018-01-05 09:37:05 +01:00
Kevin Humphreys
597df5bf83
add test
2018-01-03 13:00:05 -08:00
Ines Montani
ff9fc945ab
Merge pull request #1749 from sorenlind/da_ud_tokenization
...
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc
Fix missing import
2017-12-22 16:21:44 +01:00
ines
8dc1c27841
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-22 16:01:00 +01:00
ines
b10ba848b8
xfail test that causes MemoryError on Python 2 on Windows
...
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Ines Montani
a3dd167d7f
Merge branch 'master' into da_ud_tokenization
2017-12-20 21:05:34 +00:00
Ines Montani
d682a8803e
Merge pull request #1672 from cbilgili/master
...
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Søren Lind Kristiansen
15d13efafd
Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.
2017-12-20 17:36:52 +01:00
Ines Montani
9c1ee65268
Add regression test for #1698
2017-12-12 10:36:11 +01:00
Isaac Sijaranamual
38021fbb00
Switch from python 3 only TemporaryDirectory to pytest's tmpdir
2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
568130ce7c
Adds regression test_issue1622
2017-12-10 23:00:48 +01:00
Matthew Honnibal
36b47e3fa6
Fix (and test) vector pickling
2017-12-07 09:53:30 +01:00
Canbey Bilgili
abe098b255
Adds Turkish Lemmatization
2017-12-01 17:04:32 +03:00
Vadim Mazaev
4ba7ddf651
Bugfixies
2017-11-30 12:29:38 +03:00
Matthew Honnibal
6bc0f4d29f
Merge pull request #1611 from fsonntag/master
...
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal
f9ed9ea529
Merge pull request #1624 from GreenRiverRUS/russian
...
Add support for Russian
2017-11-29 23:10:01 +01:00
ines
a31506e060
Fix off-by-one error in nlp.add_pipe(after=name) ( fixes #1654 )
2017-11-28 20:37:55 +01:00
ines
b62739fbfe
Add regression test for #1654
2017-11-28 20:27:54 +01:00
ines
2e50dbb9d7
Simplify test
2017-11-28 20:27:27 +01:00
Felix Sonntag
724ae7dc55
Fixed issue of infix capturing prefixes
2017-11-28 17:17:12 +01:00
Søren Lind Kristiansen
0ffd27b0f6
Add several Danish alternative spellings
2017-11-27 13:35:41 +01:00
Vadim Mazaev
53e7c38637
Fixed tests depends on pymorphy2
2017-11-26 21:04:44 +03:00
Vadim Mazaev
cacd859dcd
Added tag map, fixed tests fails, added more exceptions
2017-11-26 20:54:48 +03:00
Ines Montani
a7bb8f1b42
Merge pull request #1637 from sorenlind/da_tokenization
...
Improve Danish tokenization
2017-11-26 15:41:38 +00:00
ines
c699aec089
Add offsets_from_biluo_tags helper and tests (see #1626 )
2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen
6aa241bcec
Add day of month tokenizer exceptions for Danish.
2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen
0c276ed020
Add weekday abbreviations and remove abiguous month abbreviations for Danish.
2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen
056547e989
Add multiple tokenizer exceptions for Danish.
2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen
8dc265ac0c
Add test for tokenization of 'i.' for Danish.
2017-11-24 11:29:37 +01:00
Matthew Honnibal
30ba81f881
Merge pull request #1576 from ligser/master
...
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
ines
c90fe92e15
Fix displaCy test
2017-11-22 05:04:39 +01:00
ines
a6f33ac27d
Fix displaCy test
2017-11-22 04:19:28 +01:00
Vadim Mazaev
81314f8659
Fixed tokenizer: added char classes; added first lemmatizer and
...
tokenizer tests
2017-11-21 22:23:59 +03:00
Burton DeWilde
635792997c
Add regression test for #1612
2017-11-20 12:05:35 -06:00
ines
d70a64d78b
Fix syntax error and formatting in test (see #1617 )
2017-11-20 14:01:25 +01:00
ines
17849dee4b
Fix French test (see #1617 )
2017-11-20 13:59:59 +01:00
Felix Sonntag
8be3392302
Added regression text for 1494
2017-11-19 16:30:35 +01:00
Motoki Wu
b818afaa0e
Added failing test for Issue #1207 .
...
The noun chunk iterator should work for `Doc` but not for `Span`.
2017-11-17 17:04:27 -08:00
ines
a3d4dd1a5d
Test adding of lots of pipeline components (see #1585 )
...
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Roman Domrachev
505c6a2f2f
Completely cleanup tokenizer cache
...
Tokenizer cache can have be different keys than string
That modification can slow down tokenizer and need to be measured
2017-11-15 17:55:48 +03:00
Roman Domrachev
3e21680814
Use safer method to get string without hit
2017-11-14 22:58:46 +03:00
Roman Domrachev
4e378dc4a4
Remove all obsolete code and test only initial problem
2017-11-14 20:45:04 +03:00
Roman
47ce2347b0
Create test that fails when actual cleanup caused
2017-11-14 20:28:13 +03:00
Roman Domrachev
3d247d2bb8
Get back previous testcase
2017-11-14 18:01:37 +03:00
Roman Domrachev
a2745b0e84
StringStore now actually cleaned
...
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Roman Domrachev
ee60a52ee7
Fix test imports and last batch cleanup
2017-11-11 11:32:16 +03:00
Roman Domrachev
3c600adf23
Try to fix StringStore clean up (see #1506 )
2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4
Add regression test for #1547
2017-11-11 00:14:03 +01:00
ines
2df27db671
Add unicode declaration
2017-11-11 00:13:56 +01:00
ines
1c218397f6
Ensure path in Doc.to_disk/from_disk (resolves ##1521)
...
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
a5ea0fdf5a
Fix #1518 : vocab.vectors.resize() didn't work
2017-11-08 22:18:37 +01:00
Matthew Honnibal
4194bc5744
Xfail flakey serialization test
2017-11-08 13:55:13 +01:00
ines
42a0fbf291
Fix textcat simple train example
2017-11-07 01:25:54 +01:00
ines
5f43953536
Move test
2017-11-06 23:14:10 +01:00
Matthew Honnibal
1831dbd065
Add test of simple textcat workflow
2017-11-06 22:04:29 +01:00
Matthew Honnibal
2f7e9f390d
Make test less flakey
2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e
Make test less flakey
2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933
Fix lemma ordering in test
2017-11-06 17:02:17 +01:00
Matthew Honnibal
63c6ae4191
Fix lemmatizer test
2017-11-06 11:57:06 +01:00
Matthew Honnibal
00435d8f0c
Add extra beam parsing test
2017-11-05 14:39:57 +01:00
ines
5e7d98f72a
Remove test for #1491
2017-11-03 22:10:57 +01:00
ines
718f1c50fb
Add regression test for #1491
2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5
Back-off to tensor for similarity if no vectors
2017-11-03 20:56:33 +01:00
Matthew Honnibal
d6e831bf89
Fix lemmatizer tests
2017-11-03 19:46:34 +01:00
ines
eef930c73e
Assert instead of print
2017-11-03 18:50:57 +01:00
ines
f0986df94b
Add test for #1488 (passes on v2.0.0a18?)
2017-11-03 14:44:36 +01:00
Matthew Honnibal
711278b667
Make test less flakey
2017-11-03 14:36:08 +01:00
Matthew Honnibal
0a534ae96a
Fix test for backprop d_pad
2017-11-03 14:04:16 +01:00
Matthew Honnibal
a22f96c3f1
Add test for backpropagating padding
2017-11-03 00:48:54 +01:00
ines
3af281a334
Update test model name
2017-11-01 23:02:00 +01:00
ines
8c2260e18c
Move span tests to /doc
2017-11-01 16:56:35 +01:00
ines
260cb37224
Catch deprecation warning
2017-11-01 16:49:18 +01:00
ines
5914faafbb
Fix .merge tests to not use deprecated API
2017-11-01 16:49:11 +01:00
Matthew Honnibal
9e0ebee81c
Add Token.is_sent_start property, so can deprecate Token.sent_start
2017-11-01 13:27:14 +01:00
Matthew Honnibal
c047498f87
Fix vectors test
2017-11-01 13:24:47 +01:00
Matthew Honnibal
86eba61fae
Fix token.vector when vectors are missing
2017-11-01 00:47:35 +01:00
Ines Montani
d11659463b
Merge pull request #1152 from jimregan/develop-irish
...
[WIP] attempt a port from #1147
2017-11-01 00:23:43 +01:00
Jim O'Regan
08b0bfd153
merge
2017-10-31 22:55:59 +00:00
Jim O'Regan
00ecfa5417
Ó, not O
2017-10-31 22:54:42 +00:00
Ines Montani
25b1d6cd91
Fix syntax error
2017-10-31 22:36:03 +01:00
Matthew Honnibal
92dc127569
Fix test for Python 3
2017-10-31 22:21:55 +01:00
Jim O'Regan
fe4b10346a
replace example sentence until I get around to adding a punctuation.py
2017-10-31 20:24:53 +00:00
Matthew Honnibal
77d8f5de9a
Revise and simplify Vectors class
2017-10-31 18:25:08 +01:00
Jim O'Regan
d4a8160c36
change quotes
2017-10-31 15:15:44 +00:00
Jim O'Regan
34ca59691b
no idea what is wrong here
2017-10-31 14:50:13 +00:00
Jim O'Regan
41dd29e48e
merge
2017-10-31 14:07:45 +00:00
Matthew Honnibal
cb5217012f
Fix vector remapping
2017-10-31 11:40:46 +01:00
Matthew Honnibal
9c11ee4a1c
WIP on vectors fixes
2017-10-31 11:22:56 +01:00
Matthew Honnibal
368fdb389a
WIP on refactoring and fixing vectors
2017-10-31 02:00:26 +01:00
Explosion Bot
72aea8f105
Update vectors.add() to allow setting keys to rows
2017-10-30 10:03:08 +01:00
Matthew Honnibal
64e4ff7c4b
Merge 'tidy-up' changes into branch. Resolve conflicts
2017-10-28 13:16:06 +02:00
Ines Montani
4033e70c71
Merge pull request #1461 from explosion/feature/disable-pipes
...
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
Matthew Honnibal
b0f3ea2200
Fix names of pipeline components
...
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder --> Tensorizer
NeuralLabeller --> MultitaskObjective
2017-10-26 12:38:23 +02:00
ines
de1e5f35d5
Merge branch 'develop' into feature/disable-pipes
2017-10-25 16:33:12 +02:00
ines
c0b55ebdac
Fix PhraseMatcher.__contains__ and add more tests
2017-10-25 16:31:11 +02:00
ines
657a4d91bc
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:19:05 +02:00
ines
1a722dac31
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:18:18 +02:00
Matthew Honnibal
b5de768852
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-25 14:44:16 +02:00
Matthew Honnibal
094512fd47
Fix model-mark on regression test.
2017-10-25 14:44:00 +02:00
Matthew Honnibal
e70f80f29e
Add Language.disable_pipes()
2017-10-25 13:46:41 +02:00
Ines Montani
d3bf488e16
Merge pull request #1171 from mollerhoj/support-danish
...
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal
908809d488
Update tests
2017-10-24 17:05:15 +02:00
Matthew Honnibal
30e67fa808
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-24 16:08:23 +02:00
Matthew Honnibal
63f0bde749
Add test for #1250 : Tokenizer cache clobbered special-case attrs
2017-10-24 16:07:18 +02:00
ines
090aed940a
Add test for currently failing span.as_doc case
2017-10-24 16:00:56 +02:00
ines
4ef81a9ebc
Fix whitespace
2017-10-24 16:00:56 +02:00
Matthew Honnibal
4bea65a1a8
Fix Issue #1450 : Off-by-1 in * and ? matches
...
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal
391d5ef0d1
Normalize imports in regression test
2017-10-24 14:25:49 +02:00
Matthew Honnibal
b66b8f028b
Fix #1375 -- out-of-bounds on token.nbor()
2017-10-24 12:10:39 +02:00
Matthew Honnibal
a68d89a4f3
Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()
2017-10-24 12:05:25 +02:00
Ines Montani
facf77e541
Merge branch 'develop' into support-danish
2017-10-24 11:53:19 +02:00
Matthew Honnibal
ccd2ab1a62
Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
...
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal
ef3e5a361b
Merge pull request #1442 from explosion/feature/fix-sp
...
💫 Fix SP tag, tweak Vectors.__init__, fix Morphology
2017-10-24 10:24:07 +02:00
Matthew Honnibal
fdf25d10ba
Merge pull request #1440 from ramananbalakrishnan/develop
...
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
Matthew Honnibal
490ad3eaf0
Check that empty strings are handled. Closes #1242
2017-10-21 00:52:14 +02:00
Ramanan Balakrishnan
d2fe56a577
Add LCA matrix for spans and docs
2017-10-20 23:58:00 +05:30
Matthew Honnibal
d8391b1c4d
Fix #1434 : Matcher failed on ending ? if no token
2017-10-20 16:49:36 +02:00
Matthew Honnibal
f111b228e0
Fix re-parsing of previously parsed text
...
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:
1) The parse was only being partially erased
2) The RightArc action was able to create a 1-cycle.
This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.
Closes #1253 .
2017-10-20 16:27:36 +02:00
Matthew Honnibal
ebecaddb76
Make 'data_or_width' two keyword args in Vectors.__init__
...
Previously the data and width options were one argument in Vectors,
which meant you couldn't say vectors = Vectors(strings, width=300).
It's better to have two keywords.
2017-10-20 14:17:15 +02:00
Ramanan Balakrishnan
b3ab124fc5
Support strings for attribute list in doc.to_array
2017-10-20 11:46:57 +05:30
ines
bf415fd778
Add test for serializing extension attrs (see #1085 )
2017-10-19 00:53:08 +02:00
Matthew Honnibal
fe844148f6
Test pickling hooks
2017-10-17 19:43:52 +02:00
Matthew Honnibal
374819edf8
Test user_data deserialization, re #1085
2017-10-17 19:28:54 +02:00
Matthew Honnibal
8ca97f32a3
Fix doc pickling test
2017-10-17 18:19:57 +02:00
Matthew Honnibal
45d1dd90b1
Add tests for pickling doc
2017-10-17 17:20:58 +02:00
Matthew Honnibal
4174477161
Fix equality check in test
2017-10-16 19:50:35 +02:00
Matthew Honnibal
010a7309ff
Merge pull request #1402 from explosion/feature/fix-matcher-operators
...
💫 Fix Matcher variable-length operators
2017-10-16 17:53:19 +02:00
Matthew Honnibal
c29927d2e7
Fix matcher test
2017-10-16 17:22:18 +02:00
Matthew Honnibal
a928ae2f35
Merge branch 'develop' into feature/fix-matcher-operators
2017-10-16 13:38:36 +02:00
Matthew Honnibal
748d525801
Add more matcher operator tests
2017-10-16 13:38:01 +02:00
ines
3516aa0cea
Port over changes from #1389
2017-10-14 13:32:55 +02:00
ines
cd6a29dce7
Port over changes from #1294
2017-10-14 13:28:46 +02:00
ines
38c756fd85
Port over changes from #1287
2017-10-14 13:16:21 +02:00
ines
612224c10d
Port over changes from #1157
2017-10-14 13:11:39 +02:00
ines
9b3f8f9ec3
Fix formatting and add comment on languages
2017-10-14 13:11:18 +02:00
ines
a4d974d97b
Port over URL pattern changes from #1411
2017-10-14 12:58:07 +02:00
Matthew Honnibal
cf6da9301a
Update lemmatizer test
2017-10-12 22:50:52 +02:00
Matthew Honnibal
462caf835a
Fix SBD test
2017-10-12 21:18:22 +02:00
Ines Montani
37aa523a8e
Merge pull request #1408 from explosion/feature/dot-underscore
...
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
ines
51519251c2
Fix underscore method test
2017-10-11 13:34:19 +02:00
ines
c6ae49e8bf
Fix formatting
2017-10-11 13:34:11 +02:00
ines
453c47ca24
Add German lemmatizer tests
2017-10-11 13:27:26 +02:00