ines
4eb5bd02e7
Update textcat pre-processing after to_array change
2017-10-27 00:32:12 +02:00
ines
2d6ec99884
Set 'model' as default model name to prevent meta.json errors
2017-10-26 16:12:23 +02:00
ines
9e372913e0
Remove old 'SP' condition in tag map
2017-10-26 16:11:57 +02:00
Matthew Honnibal
c52671420c
Remove old cfile import
2017-10-26 13:28:19 +02:00
Matthew Honnibal
ea03f1ef64
Remove obsolete cfile code
2017-10-26 13:23:36 +02:00
Matthew Honnibal
90d1d9b230
Remove obsolete parser code
2017-10-26 13:22:45 +02:00
ines
6f78e29bed
Add LAW entity label to glossary
2017-10-26 13:04:35 +02:00
ines
9bf78d5fb3
Update spacy.explain docs
2017-10-26 13:04:25 +02:00
Matthew Honnibal
33f8c58782
Remove obsolete parser.pyx
2017-10-26 12:42:05 +02:00
Matthew Honnibal
a8abc47811
Rename BaseThincComponent --> Pipe
2017-10-26 12:40:40 +02:00
Matthew Honnibal
b0f3ea2200
Fix names of pipeline components
...
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder --> Tensorizer
NeuralLabeller --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal
b6b4f1aaf7
Merge pull request #1462 from explosion/feature/vector-meta-data
...
💫 Add vector meta data to model meta.json on train/package and show in docs
2017-10-26 11:39:41 +02:00
Matthew Honnibal
35977bdbb9
Update better-parser branch with develop
2017-10-26 00:55:53 +00:00
Ines Montani
090bd00369
Merge pull request #1464 from mayukh18/develop_bengali_pronouns
...
added the bengali pronouns for v2.0
2017-10-25 21:55:25 +02:00
mayukh18
1bc07758fa
added few bengali pronouns
2017-10-25 22:24:40 +05:30
ines
de1e5f35d5
Merge branch 'develop' into feature/disable-pipes
2017-10-25 16:33:12 +02:00
ines
728b609bf9
Merge branch 'develop' into feature/vector-meta-data
2017-10-25 16:32:22 +02:00
ines
c0b55ebdac
Fix PhraseMatcher.__contains__ and add more tests
2017-10-25 16:31:11 +02:00
ines
91beacf5e3
Fix Matcher.__contains__
2017-10-25 16:19:38 +02:00
ines
11e3f19764
Fix vectors data added after training (see #1457)
2017-10-25 16:08:26 +02:00
ines
057954695b
Read pipeline and vector data off model in --generate-meta
2017-10-25 16:03:26 +02:00
ines
273e638183
Add vector data to model meta after training (see #1457)
2017-10-25 16:03:05 +02:00
ines
18aae423fb
Remove import of non-existing function
2017-10-25 15:54:10 +02:00
ines
5117a7d24d
Fix whitespace
2017-10-25 15:54:02 +02:00
ines
657a4d91bc
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:19:05 +02:00
ines
1a722dac31
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:18:18 +02:00
ines
6a00de4f77
Fix check of unexpected pipe names in restore()
2017-10-25 14:56:35 +02:00
ines
7f03932477
Return self on __enter__
2017-10-25 14:56:16 +02:00
Matthew Honnibal
b5de768852
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-25 14:44:16 +02:00
Matthew Honnibal
094512fd47
Fix model-mark on regression test.
2017-10-25 14:44:00 +02:00
Matthew Honnibal
e70f80f29e
Add Language.disable_pipes()
2017-10-25 13:46:41 +02:00
Matthew Honnibal
075e8118ea
Update from develop
2017-10-25 12:45:21 +02:00
ines
72497c8cb2
Remove comments and add TODO
2017-10-25 12:15:43 +02:00
ines
4d97efc3b5
Add missing docstrings
2017-10-25 12:10:16 +02:00
ines
1262aa0bf9
Implement PhraseMatcher.__contains__
2017-10-25 12:10:04 +02:00
ines
9c733a8849
Implement PhraseMatcher.__len__
2017-10-25 12:09:56 +02:00
ines
7eebeeaf85
Fix Matcher.__contains__
2017-10-25 12:09:47 +02:00
ines
7bcec57462
Remove unused attribute
2017-10-25 12:08:54 +02:00
ines
0b1dcbac14
Remove unused function
2017-10-25 12:08:46 +02:00
ines
3484174e48
Add Language.path
2017-10-25 11:57:43 +02:00
Ines Montani
d3bf488e16
Merge pull request #1171 from mollerhoj/support-danish
...
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal
d9bb1e5de8
Increment version
2017-10-24 17:06:19 +02:00
Matthew Honnibal
908809d488
Update tests
2017-10-24 17:05:15 +02:00
Matthew Honnibal
66766c1454
Restore SP tag to English tag_map, until models migrate
2017-10-24 17:05:00 +02:00
Matthew Honnibal
30e67fa808
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-24 16:08:23 +02:00
Matthew Honnibal
b0f6fd3f1d
Disable tokenizer cache for special-cases. Fixes #1250
2017-10-24 16:08:05 +02:00
Matthew Honnibal
63f0bde749
Add test for #1250: Tokenizer cache clobbered special-case attrs
2017-10-24 16:07:18 +02:00
ines
8492d5be6d
Always make lemmatizer return a list of lemmas, not a set
2017-10-24 16:00:56 +02:00
ines
95f866f99f
Add lookup argument to Lemmatizer.load
2017-10-24 16:00:56 +02:00
ines
95f6174516
Remove tensorizer from model pipeline example in spacy package
2017-10-24 16:00:56 +02:00
ines
090aed940a
Add test for currently failing span.as_doc case
2017-10-24 16:00:56 +02:00
ines
4ef81a9ebc
Fix whitespace
2017-10-24 16:00:56 +02:00
Matthew Honnibal
18f1c1d0ba
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-24 14:29:43 +02:00
Matthew Honnibal
4bea65a1a8
Fix Issue #1450: Off-by-1 in * and ? matches
...
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
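The end-of-match behaviour described in 4bea65a1a8 can be illustrated with a toy matcher. This is only a sketch under stated assumptions: `match_end` and its `(word, op)` pattern format are hypothetical, it matches greedily without backtracking, and it is not spaCy's Matcher implementation. It shows where a pattern ending in `*` or `?` should stop: at the first token that fails, which is left out of the match.

```python
def match_end(tokens, pattern, start=0):
    """Return the exclusive end index of a greedy match, or None.

    Hypothetical toy format: pattern is a list of (word, op) pairs,
    where op is "1" (exactly once), "?" (zero or one), "*" (zero or more).
    """
    i = start
    for word, op in pattern:
        if op == "1":
            # Required token: fail if it is missing or different.
            if i >= len(tokens) or tokens[i] != word:
                return None
            i += 1
        elif op == "?":
            # Optional token: consume it only if it is actually there.
            if i < len(tokens) and tokens[i] == word:
                i += 1
        elif op == "*":
            # Repeated token: stop at the first token that does not match.
            while i < len(tokens) and tokens[i] == word:
                i += 1
        else:
            raise ValueError("unknown operator: %r" % op)
    return i


tokens = ["a", "b", "b", "c"]
end = match_end(tokens, [("a", "1"), ("b", "*")])
matched = tokens[0:end]  # "c" is where the pattern failed, so it is excluded
```

The same bounds check covers a trailing `?` with no token left to consume: `match_end(["a"], [("a", "1"), ("b", "?")])` stops at index 1 rather than reading past the end.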
Matthew Honnibal
391d5ef0d1
Normalize imports in regression test
2017-10-24 14:25:49 +02:00
ines
c55db0a4a1
Add example sentences for Japanese and Chinese (see #1107)
2017-10-24 13:02:24 +02:00
ines
66f8f9d4a0
Fix Japanese tokenizer
...
JapaneseTokenizer now returns a Doc, not individual words
2017-10-24 13:02:19 +02:00
Matthew Honnibal
dd5b2d8fa3
Check for out-of-memory when calling calloc. Closes #1446
2017-10-24 12:40:47 +02:00
Matthew Honnibal
b66b8f028b
Fix #1375 -- out-of-bounds on token.nbor()
2017-10-24 12:10:39 +02:00
Matthew Honnibal
a68d89a4f3
Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()
2017-10-24 12:05:25 +02:00
Ines Montani
facf77e541
Merge branch 'develop' into support-danish
2017-10-24 11:53:19 +02:00
Matthew Honnibal
ccd2ab1a62
Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
...
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal
ef3e5a361b
Merge pull request #1442 from explosion/feature/fix-sp
...
💫 Fix SP tag, tweak Vectors.__init__, fix Morphology
2017-10-24 10:24:07 +02:00
Matthew Honnibal
fdf25d10ba
Merge pull request #1440 from ramananbalakrishnan/develop
...
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
Matthew Honnibal
e7556ff048
Fix non-maxout parser
2017-10-23 18:16:23 +02:00
ines
a31f048b4d
Fix formatting
2017-10-23 10:38:06 +02:00
Matthew Honnibal
490ad3eaf0
Check that empty strings are handled. Closes #1242
2017-10-21 00:52:14 +02:00
Matthew Honnibal
8f8bccecb9
Patch deserialisation for invalid loads, to avoid model failure
2017-10-21 00:51:42 +02:00
Ramanan Balakrishnan
d2fe56a577
Add LCA matrix for spans and docs
2017-10-20 23:58:00 +05:30
Matthew Honnibal
d8391b1c4d
Fix #1434: Matcher failed on ending ? if no token
2017-10-20 16:49:36 +02:00
Matthew Honnibal
fec53f09f7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-20 16:28:34 +02:00
Matthew Honnibal
f111b228e0
Fix re-parsing of previously parsed text
...
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:
1) The parse was only being partially erased
2) The RightArc action was able to create a 1-cycle.
This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.
Closes #1253.
2017-10-20 16:27:36 +02:00
Matthew Honnibal
1036798155
Make parser consistent if maxout==1
2017-10-20 16:24:16 +02:00
Matthew Honnibal
3faf9189a2
Make parser hidden shape consistent even if maxout==1
2017-10-20 16:23:31 +02:00
Matthew Honnibal
9010a1a060
Create vectors correctly
2017-10-20 14:19:46 +02:00
Matthew Honnibal
33229b1c9e
Remove print statement
2017-10-20 14:19:29 +02:00
Matthew Honnibal
cfae54c507
Make change to Vectors.__init__
2017-10-20 14:19:04 +02:00
Matthew Honnibal
ebecaddb76
Make 'data_or_width' two keyword args in Vectors.__init__
...
Previously the data and width options were one argument in Vectors,
which meant you couldn't say vectors = Vectors(strings, width=300).
It's better to have two keywords.
2017-10-20 14:17:15 +02:00
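The keyword split described in ebecaddb76 might look roughly like the sketch below. `ToyVectors` is a hypothetical stand-in, not spaCy's `Vectors` class. The `width` and `data` names are taken from the commit message; the defaults and the zero-filled table are assumptions made for illustration.

```python
class ToyVectors:
    """Hypothetical stand-in showing separate `width` and `data` keywords."""

    def __init__(self, strings, width=0, data=None):
        self.strings = list(strings)
        if data is None:
            # No table supplied: allocate one zero row of `width` per string.
            data = [[0.0] * width for _ in self.strings]
        self.data = data
        # Infer the width from the table when there is one.
        self.width = len(data[0]) if data else width


# Either keyword can now be passed on its own.
by_width = ToyVectors(["apple", "pear"], width=300)
by_data = ToyVectors(["apple"], data=[[0.1, 0.2, 0.3]])
```

With a single combined argument, the caller could not name which of the two they meant; two keywords make a call such as `Vectors(strings, width=300)` unambiguous.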
Matthew Honnibal
49895fbef6
Rename 'SP' special tag to '_SP'
...
Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
2017-10-20 14:01:12 +02:00
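The reasoning in 49895fbef6 can be sketched as follows, assuming class IDs are assigned by enumerating the tag names in sorted order. That ordering is an assumption here: the commit only says that the sequence of tags determines the tag-to-ID mapping. The three-tag list and the `tag_ids` helper are hypothetical.

```python
def tag_ids(tags):
    """Assign class IDs by sorted tag name (assumed ordering, see above)."""
    return {tag: i for i, tag in enumerate(sorted(tags))}


base = ["NN", "RB", "VERB"]          # hypothetical, uppercase-only tag set

before = tag_ids(base)
with_sp = tag_ids(base + ["SP"])     # 'SP' sorts before 'VERB'
with_usp = tag_ids(base + ["_SP"])   # '_' sorts after 'A'-'Z' in ASCII

verb_before = before["VERB"]
verb_with_sp = with_sp["VERB"]       # shifted by the inserted tag
verb_with_usp = with_usp["VERB"]     # unchanged; '_SP' lands at the end
```

Under this assumption, the underscore places the new tag after every uppercase tag name, so the existing IDs that a trained model relies on stay where they were.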
Matthew Honnibal
506cf2eb13
Remove cpdef enum, to avoid too much code generation
2017-10-20 14:00:23 +02:00
Matthew Honnibal
6218af0105
Remove cpdef enum, to avoid too much code generation
2017-10-20 13:59:57 +02:00
Matthew Honnibal
92ac9316b5
Fix initialization of vectors, to address serialization problem
2017-10-20 13:59:24 +02:00
Ramanan Balakrishnan
0726946563
cleanup to_array implementation using fixes on master
2017-10-20 17:09:37 +05:30
ines
108f1f786e
Update symbols and document missing token attributes (see #1439)
2017-10-20 13:08:44 +02:00
ines
4acab77a8a
Add missing symbol for LAW entities (resolves #1427)
2017-10-20 13:07:57 +02:00
Matthew Honnibal
b101736555
Fix precomputed layer
2017-10-20 12:14:52 +02:00
Ramanan Balakrishnan
b3ab124fc5
Support strings for attribute list in doc.to_array
2017-10-20 11:46:57 +05:30
Matthew Honnibal
64658e02e5
Implement fancier initialisation for precomputed layer
2017-10-20 03:07:45 +02:00
Matthew Honnibal
827cd8a883
Fix support of maxout pieces in parser
2017-10-20 03:07:17 +02:00
Matthew Honnibal
a8850b4282
Remove redundant PrecomputableMaxouts class
2017-10-19 20:27:34 +02:00
Matthew Honnibal
a17a1b60c7
Clean up redundant PrecomputableMaxouts class
2017-10-19 20:26:37 +02:00
Matthew Honnibal
b00d0a2c97
Fix bias in parser
2017-10-19 18:42:11 +02:00
Matthew Honnibal
b54b4b8a97
Make parser_maxout_pieces hyper-param work
2017-10-19 13:45:18 +02:00
Matthew Honnibal
03a215c5fd
Make PrecomputableAffines work
2017-10-19 13:44:49 +02:00
Ramanan Balakrishnan
7b9b1be44c
Support single value for attribute list in doc.to_array
2017-10-19 17:00:41 +05:30
Matthew Honnibal
61bc203f3f
Merge pull request #1438 from explosion/feature/fast-parser
...
💫 Improve runtime CPU efficiency of parser/NER
2017-10-19 02:42:21 +02:00
Matthew Honnibal
15e5a04a8d
Clean up more depth=0 conditional code
2017-10-19 01:48:43 +02:00
Matthew Honnibal
906c50ac59
Fix loop typing, that caused error on windows
2017-10-19 01:48:39 +02:00
ines
24512420b1
Show error if data_path does not exist or is None (see #1102)
2017-10-19 00:53:49 +02:00
ines
bf415fd778
Add test for serializing extension attrs (see #1085)
2017-10-19 00:53:08 +02:00