Matthew Honnibal
9c11ee4a1c
WIP on vectors fixes
2017-10-31 11:22:56 +01:00
Matthew Honnibal
368fdb389a
WIP on refactoring and fixing vectors
2017-10-31 02:00:26 +01:00
Explosion Bot
72aea8f105
Update vectors.add() to allow setting keys to rows
2017-10-30 10:03:08 +01:00
Matthew Honnibal
64e4ff7c4b
Merge 'tidy-up' changes into branch. Resolve conflicts
2017-10-28 13:16:06 +02:00
Ines Montani
4033e70c71
Merge pull request #1461 from explosion/feature/disable-pipes
...
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
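The idea of temporarily editing a pipeline can be sketched with a context manager. This is a toy illustration only, not spaCy's implementation: `ToyPipeline` and its attributes are hypothetical names, and the components are plain callables.

```python
from contextlib import contextmanager


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: an ordered list of (name, callable)."""

    def __init__(self, components):
        self.pipeline = list(components)

    @property
    def pipe_names(self):
        return [name for name, _ in self.pipeline]

    @contextmanager
    def disable_pipes(self, *names):
        # Remember the full pipeline, drop the named components,
        # and restore everything when the block exits (even on error).
        original = list(self.pipeline)
        self.pipeline = [(n, f) for n, f in original if n not in names]
        try:
            yield self
        finally:
            self.pipeline = original


toy = ToyPipeline([("tagger", str.lower), ("parser", str.upper)])
with toy.disable_pipes("parser"):
    names_inside = toy.pipe_names
names_after = toy.pipe_names
```

Restoring in a `finally` clause means the original components come back even if the code inside the block raises.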
Matthew Honnibal
b0f3ea2200
Fix names of pipeline components
...
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder --> Tensorizer
NeuralLabeller --> MultitaskObjective
2017-10-26 12:38:23 +02:00
ines
de1e5f35d5
Merge branch 'develop' into feature/disable-pipes
2017-10-25 16:33:12 +02:00
ines
c0b55ebdac
Fix PhraseMatcher.__contains__ and add more tests
2017-10-25 16:31:11 +02:00
ines
657a4d91bc
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:19:05 +02:00
ines
1a722dac31
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:18:18 +02:00
Matthew Honnibal
b5de768852
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-25 14:44:16 +02:00
Matthew Honnibal
094512fd47
Fix model-mark on regression test.
2017-10-25 14:44:00 +02:00
Matthew Honnibal
e70f80f29e
Add Language.disable_pipes()
2017-10-25 13:46:41 +02:00
Ines Montani
d3bf488e16
Merge pull request #1171 from mollerhoj/support-danish
...
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal
908809d488
Update tests
2017-10-24 17:05:15 +02:00
Matthew Honnibal
30e67fa808
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-24 16:08:23 +02:00
Matthew Honnibal
63f0bde749
Add test for #1250: Tokenizer cache clobbered special-case attrs
2017-10-24 16:07:18 +02:00
ines
090aed940a
Add test for currently failing span.as_doc case
2017-10-24 16:00:56 +02:00
ines
4ef81a9ebc
Fix whitespace
2017-10-24 16:00:56 +02:00
Matthew Honnibal
4bea65a1a8
Fix Issue #1450: Off-by-1 in * and ? matches
...
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
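The behaviour described in this fix can be illustrated with a small pure-Python matcher. It is a sketch under stated assumptions, not the Matcher's actual code: patterns are hypothetical `(predicate, operator)` pairs, and the operators are `"1"` (exactly one), `"?"` (zero or one) and `"*"` (zero or more).

```python
def match_end(tokens, pattern, start):
    """Return the exclusive end index of the greedy match at `start`, or None."""

    def step(ti, pi):
        if pi == len(pattern):
            return ti  # pattern exhausted: the match ends before tokens[ti]
        pred, op = pattern[pi]
        here = ti < len(tokens) and pred(tokens[ti])
        if op == "1":
            return step(ti + 1, pi + 1) if here else None
        if op == "?":
            if here:
                end = step(ti + 1, pi + 1)
                if end is not None:
                    return end
            return step(ti, pi + 1)  # skip the optional token, consume nothing
        if op == "*":
            if here:
                end = step(ti + 1, pi)
                if end is not None:
                    return end
            return step(ti, pi + 1)  # stop repeating, consume nothing
        raise ValueError("unknown operator: %r" % op)

    return step(start, 0)


def is_a(t):
    return t == "a"


def is_b(t):
    return t == "b"


tokens = ["a", "b", "b", "c"]
end_star = match_end(tokens, [(is_a, "1"), (is_b, "*")], 0)
end_optional = match_end(["a", "c"], [(is_a, "1"), (is_b, "?")], 0)
```

In both cases the token at which the variable-length operator fails (`"c"`) stays outside the match, which is the behaviour the commit message describes as correct.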
Matthew Honnibal
391d5ef0d1
Normalize imports in regression test
2017-10-24 14:25:49 +02:00
Matthew Honnibal
b66b8f028b
Fix #1375 -- out-of-bounds on token.nbor()
2017-10-24 12:10:39 +02:00
Matthew Honnibal
a68d89a4f3
Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()
2017-10-24 12:05:25 +02:00
Ines Montani
facf77e541
Merge branch 'develop' into support-danish
2017-10-24 11:53:19 +02:00
Matthew Honnibal
ccd2ab1a62
Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
...
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal
ef3e5a361b
Merge pull request #1442 from explosion/feature/fix-sp
...
💫 Fix SP tag, tweak Vectors.__init__, fix Morphology
2017-10-24 10:24:07 +02:00
Matthew Honnibal
fdf25d10ba
Merge pull request #1440 from ramananbalakrishnan/develop
...
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
Matthew Honnibal
490ad3eaf0
Check that empty strings are handled. Closes #1242
2017-10-21 00:52:14 +02:00
Ramanan Balakrishnan
d2fe56a577
Add LCA matrix for spans and docs
2017-10-20 23:58:00 +05:30
Matthew Honnibal
d8391b1c4d
Fix #1434: Matcher failed on ending ? if no token
2017-10-20 16:49:36 +02:00
Matthew Honnibal
f111b228e0
Fix re-parsing of previously parsed text
...
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:
1) The parse was only being partially erased
2) The RightArc action was able to create a 1-cycle.
This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.
Closes #1253.
2017-10-20 16:27:36 +02:00
Matthew Honnibal
ebecaddb76
Make 'data_or_width' two keyword args in Vectors.__init__
...
Previously the data and width options were one argument in Vectors,
which meant you couldn't say vectors = Vectors(strings, width=300).
It's better to have two keywords.
2017-10-20 14:17:15 +02:00
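The difference between one overloaded argument and two keyword arguments can be sketched as follows. `ToyVectors` is a hypothetical stand-in that uses plain lists instead of arrays; it is not the real `Vectors` class, and the argument order shown is an assumption.

```python
class ToyVectors:
    """Hypothetical table of vectors keyed by string, one row per key."""

    def __init__(self, strings, width=0, data=None):
        self.strings = list(strings)
        if data is None:
            # Only a width was given: allocate one zero row per string.
            data = [[0.0] * width for _ in self.strings]
        else:
            # Explicit data was given: copy it, the width follows from the rows.
            data = [list(row) for row in data]
        self.data = data

    @property
    def shape(self):
        n_cols = len(self.data[0]) if self.data else 0
        return (len(self.data), n_cols)


by_width = ToyVectors(["apple", "pear"], width=300)
by_data = ToyVectors(["apple"], data=[[1.0, 2.0]])
```

With separate keywords, the call site states whether it is passing a size or the values themselves, instead of relying on the type of a single argument.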
Ramanan Balakrishnan
b3ab124fc5
Support strings for attribute list in doc.to_array
2017-10-20 11:46:57 +05:30
ines
bf415fd778
Add test for serializing extension attrs (see #1085)
2017-10-19 00:53:08 +02:00
Matthew Honnibal
fe844148f6
Test pickling hooks
2017-10-17 19:43:52 +02:00
Matthew Honnibal
374819edf8
Test user_data deserialization, re #1085
2017-10-17 19:28:54 +02:00
Matthew Honnibal
8ca97f32a3
Fix doc pickling test
2017-10-17 18:19:57 +02:00
Matthew Honnibal
45d1dd90b1
Add tests for pickling doc
2017-10-17 17:20:58 +02:00
Matthew Honnibal
4174477161
Fix equality check in test
2017-10-16 19:50:35 +02:00
Matthew Honnibal
010a7309ff
Merge pull request #1402 from explosion/feature/fix-matcher-operators
...
💫 Fix Matcher variable-length operators
2017-10-16 17:53:19 +02:00
Matthew Honnibal
c29927d2e7
Fix matcher test
2017-10-16 17:22:18 +02:00
Matthew Honnibal
a928ae2f35
Merge branch 'develop' into feature/fix-matcher-operators
2017-10-16 13:38:36 +02:00
Matthew Honnibal
748d525801
Add more matcher operator tests
2017-10-16 13:38:01 +02:00
ines
3516aa0cea
Port over changes from #1389
2017-10-14 13:32:55 +02:00
ines
cd6a29dce7
Port over changes from #1294
2017-10-14 13:28:46 +02:00
ines
38c756fd85
Port over changes from #1287
2017-10-14 13:16:21 +02:00
ines
612224c10d
Port over changes from #1157
2017-10-14 13:11:39 +02:00
ines
9b3f8f9ec3
Fix formatting and add comment on languages
2017-10-14 13:11:18 +02:00
ines
a4d974d97b
Port over URL pattern changes from #1411
2017-10-14 12:58:07 +02:00
Matthew Honnibal
cf6da9301a
Update lemmatizer test
2017-10-12 22:50:52 +02:00
Matthew Honnibal
462caf835a
Fix SBD test
2017-10-12 21:18:22 +02:00
Ines Montani
37aa523a8e
Merge pull request #1408 from explosion/feature/dot-underscore
...
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
ines
51519251c2
Fix underscore method test
2017-10-11 13:34:19 +02:00
ines
c6ae49e8bf
Fix formatting
2017-10-11 13:34:11 +02:00
ines
453c47ca24
Add German lemmatizer tests
2017-10-11 13:27:26 +02:00
ines
15fe0fd82d
Fix tests
2017-10-11 13:27:18 +02:00
ines
e0ff145a8b
Merge branch 'develop' into feature/dot-underscore
2017-10-11 11:57:05 +02:00
Matthew Honnibal
fd47f8e89f
Fix failing test
2017-10-11 08:38:34 +02:00
Matthew Honnibal
462b2e26b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-11 08:23:04 +02:00
Matthew Honnibal
2c118ab3a6
Add tests for Doc creation
2017-10-11 03:21:23 +02:00
Matthew Honnibal
d84136b4a9
Update add label test
2017-10-10 22:57:41 +02:00
Matthew Honnibal
e0a9b02b67
Merge Span._ and Span.as_doc methods
2017-10-09 22:00:15 -05:00
Matthew Honnibal
09d61ada5e
Merge pull request #1396 from explosion/feature/pipeline-management
...
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal
f0f2739ae3
Add test for serialization issue raised in #1105
2017-10-10 03:57:58 +02:00
ines
de374dc72a
Merge branch 'feature/pipeline-management' into feature/dot-underscore
2017-10-09 14:37:51 +02:00
Matthew Honnibal
2534cd57d7
Add bandaid solution to the 'shadowing' problem in #864
2017-10-09 08:59:35 +02:00
Matthew Honnibal
d8a2506023
Merge pull request #1401 from explosion/feature/add-parser-action
...
💫 Allow labels to be added to pre-trained parser and NER models
2017-10-09 04:57:51 +02:00
Matthew Honnibal
689349e32f
Merge pull request #1400 from explosion/feature/sentence-parsing
...
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal
fad2b8315f
Merge branch 'develop' into feature/add-parser-action
2017-10-09 04:13:04 +02:00
Matthew Honnibal
6c79841c0d
Fix tests for history features
2017-10-09 04:12:24 +02:00
Matthew Honnibal
dde87e6b0d
Add tests for adding parser actions
2017-10-09 03:42:35 +02:00
Matthew Honnibal
81a64119db
Fix string-to-unicode problem
2017-10-09 00:59:49 +02:00
Matthew Honnibal
02c2af7119
Fix test
2017-10-09 00:29:37 +02:00
Matthew Honnibal
5a67efeccc
Add tests for sentence segmentation presetting
2017-10-09 00:02:23 +02:00
Matthew Honnibal
9bd8191739
Add tests for Underscore
2017-10-07 18:56:19 +02:00
Matthew Honnibal
3b67eabfea
Allow empty dictionaries to match any token in Matcher
...
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.
The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.
This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
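The trick described above can be sketched in plain Python. This is a toy model rather than the Matcher's implementation: tokens are dicts, `compile_spec`, `get_attr` and `find_matches` are hypothetical helpers, and only the `NULL_ATTR == 0` idea is taken from the commit message.

```python
NULL_ATTR = 0  # attribute ID that reads as 0 on every token in this sketch


def compile_spec(spec):
    """Turn a token-spec dict into a non-empty list of (attr, value) constraints."""
    constraints = list(spec.items())
    if not constraints:
        # A zero-length list would be mistaken for the end-of-pattern signal,
        # so substitute a constraint that every token satisfies.
        constraints = [(NULL_ATTR, 0)]
    return constraints


def get_attr(token, attr):
    if attr == NULL_ATTR:
        return 0
    return token.get(attr)


def token_matches(token, constraints):
    return all(get_attr(token, attr) == value for attr, value in constraints)


def find_matches(tokens, pattern):
    """Return (start, end) spans where every spec matches consecutive tokens."""
    compiled = [compile_spec(spec) for spec in pattern]
    n = len(compiled)
    return [
        (start, start + n)
        for start in range(len(tokens) - n + 1)
        if all(token_matches(tokens[start + i], compiled[i]) for i in range(n))
    ]


tokens = [{"LOWER": "hello"}, {"LOWER": ","}, {"LOWER": "world"}]
pattern = [{"LOWER": "hello"}, {}, {"LOWER": "world"}]
spans = find_matches(tokens, pattern)
```

Every compiled spec has at least one constraint, so a length of zero remains free to serve as a sentinel while `{}` still matches any token.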
ines
0adadcb3f0
Fix beam parse model test
2017-10-07 02:15:15 +02:00
ines
b38a8f4a94
Fix and update pipe methods tests
2017-10-07 02:06:23 +02:00
Matthew Honnibal
3a65a0c970
Start adding tests for new pipeline management
2017-10-07 01:48:23 +02:00
ines
61a503a611
Fix parser test
2017-10-07 00:38:51 +02:00
Matthew Honnibal
c6cd81f192
Wrap try/except around model saving
2017-10-05 08:14:24 -05:00
Matthew Honnibal
fd4baff475
Update tests
2017-10-05 08:12:27 -05:00
Matthew Honnibal
40edb65ee7
Make test work for Python 2.7
2017-10-04 16:36:50 +02:00
Matthew Honnibal
db05d4d582
Add test for #1380. Passes without fix?
2017-10-04 14:56:31 +02:00
Matthew Honnibal
4a59f6358c
Fix thinc imports
2017-10-03 19:21:26 +02:00
Ines Montani
959c46eabe
Merge pull request #1365 from wannaphongcom/develop
...
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4
fix thai test
2017-09-26 23:54:15 +07:00
Matthew Honnibal
41cc5c4c17
Merge branch 'develop' into feature/phrasematcher
2017-09-26 09:59:17 -05:00
Wannaphong Phatthiyaphaibun
5cba67146c
add thai in spacy2
2017-09-26 21:36:27 +07:00
Matthew Honnibal
74f08e1ad5
Update test
2017-09-26 06:45:56 -05:00
Matthew Honnibal
20193371f5
Don't share CNN, to reduce complexities
2017-09-21 14:59:48 +02:00
Matthew Honnibal
cc408fc189
Make PhraseMatcher API like Matcher API
2017-09-20 22:20:35 +02:00
Matthew Honnibal
43ad250dd5
Update matcher tests
2017-09-20 21:54:49 +02:00
Matthew Honnibal
c013e5996f
Fix parser test
2017-09-17 13:13:20 -05:00
ines
ece30c28a8
Don't split hyphenated words in German
...
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
Matthew Honnibal
ebf8942564
Fix test for Python3
2017-09-16 16:22:38 +02:00
Matthew Honnibal
8c945310fb
Excuse emoji failure on narrow unicode builds
2017-09-16 16:21:13 +02:00
Matthew Honnibal
3fa5b40b5c
Add test for hash consistency
2017-09-16 11:21:35 +02:00
Jim O'Regan
7de709483b
missed adding here
2017-09-11 10:51:21 +01:00
Jim O'Regan
b1b6123867
add ga_tokenizer
2017-09-11 10:31:41 +01:00
Jim O'Regan
187be6d372
copy/paste error
2017-09-11 09:33:17 +01:00
Jim O'Regan
c283e9edfe
first stab at test
2017-09-11 08:57:48 +01:00
Matthew Honnibal
456bb8a74c
Unxfail and close #1305
2017-09-06 19:14:17 +02:00
Matthew Honnibal
99e44fbdbb
Update regression test
2017-09-06 19:13:51 +02:00
Matthew Honnibal
497a9308a8
Xfail new lemmatizer test
2017-09-06 18:41:22 +02:00
Matthew Honnibal
5384fff5ce
Add test for 1305: Incorrect lemmatization of VBZ for English
2017-09-06 18:40:18 +02:00
Matthew Honnibal
d5fbf27335
Fix test
2017-09-04 16:45:11 +02:00
Matthew Honnibal
cb4839033c
Fix loader for EN tests
2017-09-04 15:19:18 +02:00
Matthew Honnibal
644d6c9e1a
Improve lemmatization tests, re #1296
2017-09-04 15:17:44 +02:00
Jim Geovedi
fbc62a09c7
added {pre,suf,in}fix tests
2017-08-20 13:43:00 +07:00
Jim Geovedi
713d7c0aa0
added indonesian lang test
2017-08-20 12:17:14 +07:00
Jim Geovedi
fa544e6c9a
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-20 11:49:40 +07:00
Matthew Honnibal
41c2218c53
Fix test for vectors
2017-08-19 22:09:12 +02:00
Matthew Honnibal
ef87562741
Restore vectors test utils
2017-08-19 20:35:16 +02:00
Matthew Honnibal
1391f9da37
Restore vectors tests
2017-08-19 20:34:58 +02:00
Matthew Honnibal
d55d6e1cfa
Fix comparison of Token from different docs. Closes #1257
2017-08-19 16:39:32 +02:00
Matthew Honnibal
4fda02c7e6
Add test for new Span.to_array method
2017-08-19 16:24:38 +02:00
Matthew Honnibal
c606b4a42c
Add test for Doc.char_span
2017-08-19 16:18:23 +02:00
Matthew Honnibal
42d47c1e5c
Fix tagger serialization
2017-08-19 04:16:32 +02:00
Matthew Honnibal
2da96a0ec7
Fix beam test
2017-08-19 04:15:46 +02:00
Matthew Honnibal
a7309a217d
Update tagger serialization
2017-08-18 23:12:05 +02:00
Matthew Honnibal
de7e8703e3
Restore tests for beam parser
2017-08-18 22:27:42 +02:00
Matthew Honnibal
52c180ecf5
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
...
This reverts commit ea8de11ad5, reversing changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
92ebab6073
Update beam-update tests
2017-08-13 08:56:02 +02:00
Matthew Honnibal
24b45b45c6
Add test for beam update
2017-08-12 17:15:28 -05:00
Matthew Honnibal
b353e4d843
Work on parser beam training
2017-08-12 14:47:45 -05:00
Jim Geovedi
cc4772cac2
reworks
2017-08-03 13:08:38 +07:00
Jim Geovedi
783f7d8b86
added test set for Indonesian language
2017-07-29 18:21:07 +07:00
Matthew Honnibal
d6a5c2c85a
Add test for NER
2017-07-22 01:48:58 +02:00
Matthew Honnibal
28244df4da
Add test for beam parsing
2017-07-22 01:48:35 +02:00
Matthew Honnibal
2424493970
Remove unnecessary import of Mock
2017-07-22 01:13:54 +02:00
Matthew Honnibal
289f23df51
Test beam parsing
2017-07-20 15:03:10 +02:00
Matthew Honnibal
f014138c11
Fix parser tests
2017-07-20 00:16:52 +02:00
mollerhoj
e840077601
Add some basic tests for Danish
2017-07-03 15:49:51 +02:00
ines
34a2eecb17
Add simple "naughty strings" test (see #1107)
2017-06-06 17:43:51 +02:00
ines
cc9c5dc7a3
Fix noun chunks test
2017-06-05 16:39:04 +02:00
Matthew Honnibal
b4cdd05466
Add vectors.pyx in setup
2017-06-05 12:45:29 +02:00
Matthew Honnibal
30369d580f
Start testing Vectors class
2017-06-05 12:32:49 +02:00
ines
51d7414e94
Make sure sents are a list
2017-06-05 12:30:13 +02:00
ines
a0f4592f0a
Update tests
2017-06-05 02:26:13 +02:00
ines
3e105bcd36
Update tests
2017-06-05 02:09:27 +02:00
ines
078232932c
Fix tokenizer fixture scope
2017-06-05 01:06:34 +02:00
Matthew Honnibal
58be0e1f6f
Update tests
2017-06-04 16:35:06 -05:00
Matthew Honnibal
bb98d45a63
Fix tests
2017-06-04 16:00:44 -05:00
Matthew Honnibal
55d0621532
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 15:53:25 -05:00
Matthew Honnibal
5b9f116aca
Update tests
2017-06-04 15:53:17 -05:00
ines
8a29308d0b
Remove unused imports
2017-06-04 22:39:29 +02:00
Ines Montani
112c5787eb
Merge pull request #1101 from oroszgy/hu_tokenizer_fix
...
More robust Hungarian tokenizer.
2017-06-04 22:37:51 +02:00
ines
96867a24ae
Fix typo
2017-06-04 22:36:40 +02:00
ines
f432bb4b48
Fix fixture scopes
2017-06-04 22:34:31 +02:00