Jim O'Regan
34ca59691b
no idea what is wrong here
2017-10-31 14:50:13 +00:00
Jim O'Regan
41dd29e48e
merge
2017-10-31 14:07:45 +00:00
Matthew Honnibal
cb5217012f
Fix vector remapping
2017-10-31 11:40:46 +01:00
Matthew Honnibal
9c11ee4a1c
WIP on vectors fixes
2017-10-31 11:22:56 +01:00
Matthew Honnibal
ce876c551e
Fix GPU usage
2017-10-31 02:33:34 +01:00
Matthew Honnibal
7698903617
Fix GPU usage
2017-10-31 02:33:16 +01:00
Matthew Honnibal
368fdb389a
WIP on refactoring and fixing vectors
2017-10-31 02:00:26 +01:00
Matthew Honnibal
4e3006cec7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 19:44:58 +01:00
Matthew Honnibal
4112a991ec
Fix vector pruning
2017-10-30 19:44:40 +01:00
ines
ec657c1ddc
Update vocab docs and document Vocab.prune_vectors
2017-10-30 19:35:41 +01:00
ines
803e41bc66
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 18:39:51 +01:00
ines
8e02294241
Add vectors to Language.meta
2017-10-30 18:39:48 +01:00
ines
abf8aa05d3
Populate --create-meta defaults from file if available
...
If meta.json is found in directory and user chooses to overwrite it, show existing data as defaults.
2017-10-30 18:39:38 +01:00
ines
ce98fa7934
Fix formatting
2017-10-30 18:38:55 +01:00
ines
98c35d2585
Fix spacy vocab command
2017-10-30 18:38:41 +01:00
Matthew Honnibal
e98451b5f7
Add -prune-vectors argument to spacy.cli.train
2017-10-30 18:00:10 +01:00
Matthew Honnibal
e026b29ea9
Add prune_vectors method to Vocab
2017-10-30 17:59:43 +01:00
Explosion Bot
d0cf12c8c7
Fix off-by-one error in vectors
2017-10-30 16:22:03 +01:00
Explosion Bot
05a1dd570e
Fix vocab script
2017-10-30 16:19:22 +01:00
Explosion Bot
b46bdce8d2
Add missing import
2017-10-30 16:18:10 +01:00
Explosion Bot
2d2cc294b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 16:15:05 +01:00
Explosion Bot
0fc1209421
Wire up new vocab command
2017-10-30 16:14:50 +01:00
Explosion Bot
aa64031751
Fix clear_vectors() method on Vocab
2017-10-30 16:09:04 +01:00
Explosion Bot
7b56b2f04b
Add Vocab.cfg attr, to hold stuff like oov probs
2017-10-30 16:08:50 +01:00
Explosion Bot
ab5d5ed880
Fix vectors.add()
2017-10-30 16:08:09 +01:00
Explosion Bot
41d0f1665a
Fix add_attrs for cluster
2017-10-30 16:07:50 +01:00
ines
5453821a9f
Update NER annotation scheme
...
Add note on training data sources and include coarse-grained Wikipedia scheme
2017-10-30 13:53:49 +01:00
Explosion Bot
5ede7cec9b
Improve Lexeme.set_attrs method
2017-10-30 11:49:11 +01:00
Explosion Bot
72aea8f105
Update vectors.add() to allow setting keys to rows
2017-10-30 10:03:08 +01:00
Matthew Honnibal
c43cc5361d
Merge pull request #1467 from explosion/feature/better-parser
...
💫 Bug fixes to parser model (requires retraining)
2017-10-29 02:05:22 +02:00
ines
6c2d8d3b2a
Use shortcuts-nightly.json to resolve model shortcuts
2017-10-29 01:28:31 +02:00
Matthew Honnibal
a0c7dabb72
Fix bug in 8-token parser features
2017-10-28 23:01:35 +00:00
Matthew Honnibal
b713d10d97
Switch to 13 features in parser
2017-10-28 23:01:14 +00:00
Matthew Honnibal
3b91097321
Whitespace
2017-10-28 17:05:11 +00:00
Matthew Honnibal
6ef72864fa
Improve initialization for hidden layers
2017-10-28 17:05:01 +00:00
Matthew Honnibal
5414e2f14b
Use missing features in parser
2017-10-28 16:45:54 +00:00
Matthew Honnibal
df4803cc6d
Add learned missing values for parser
2017-10-28 16:45:14 +00:00
Matthew Honnibal
64e4ff7c4b
Merge 'tidy-up' changes into branch. Resolve conflicts
2017-10-28 13:16:06 +02:00
Explosion Bot
fb0c96f39a
Fix optimizer loading
2017-10-28 11:58:16 +02:00
Explosion Bot
b22e42af7f
Merge changes to parser and _ml
2017-10-28 11:52:10 +02:00
ines
d96e72f656
Tidy up rest
2017-10-27 21:07:59 +02:00
ines
a8e10f94e4
Tidy up Lexeme and update docs
2017-10-27 21:07:50 +02:00
ines
ba5e646219
Tidy up pipeline
2017-10-27 20:29:08 +02:00
ines
b4d226a3f1
Tidy up syntax
2017-10-27 19:45:57 +02:00
ines
5167a0cce2
Tidy up Vectors and docs
2017-10-27 19:45:19 +02:00
ines
7946464742
Remove spacy.tagger (now in pipeline)
2017-10-27 19:45:04 +02:00
ines
9c89e2cdef
Remove unused syntax iterators (now in language data)
2017-10-27 18:09:53 +02:00
ines
d2df81d907
Fix not implemented Span getters
2017-10-27 18:09:28 +02:00
ines
544a407b93
Tidy up Doc, Token and Span and add missing docs
2017-10-27 17:07:26 +02:00
ines
a6135336f5
Tidy up gold
2017-10-27 17:02:55 +02:00
ines
6a0483b7aa
Tidy up and document Doc, Token and Span
2017-10-27 15:41:45 +02:00
ines
1a559d4c95
Remove old, unused file
2017-10-27 15:34:35 +02:00
ines
91899d337b
Tidy up language, lemmatizer and scorer
2017-10-27 14:40:14 +02:00
ines
778212efea
Tidy up init and main
2017-10-27 14:39:51 +02:00
ines
e33b7e0b3c
Tidy up parser and ML
2017-10-27 14:39:30 +02:00
ines
e3265998c0
Tidy up displaCy
2017-10-27 14:39:19 +02:00
ines
ea4a41c8fb
Tidy up util and helpers
2017-10-27 14:39:09 +02:00
ines
d941fc3667
Tidy up CLI
2017-10-27 14:38:39 +02:00
Matthew Honnibal
531142a933
Merge remote-tracking branch 'origin/develop' into feature/better-parser
2017-10-27 12:34:48 +00:00
Matthew Honnibal
19a2b9bf27
Fix import of Optimizer
2017-10-27 12:33:42 +00:00
Matthew Honnibal
4d048e94d3
Add compat for thinc.neural.optimizers.Optimizer
2017-10-27 10:23:49 +00:00
Ines Montani
4033e70c71
Merge pull request #1461 from explosion/feature/disable-pipes
...
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
Matthew Honnibal
75a637fa43
Remove redundant imports from _ml
2017-10-27 10:19:56 +00:00
Matthew Honnibal
c9987cf131
Avoid use of numpy.tensordot
2017-10-27 10:18:36 +00:00
Matthew Honnibal
f6fef30adc
Remove dead code from spacy._ml
2017-10-27 10:16:41 +00:00
Matthew Honnibal
b9616419e1
Add try/except around bz2 import
2017-10-27 01:18:05 +00:00
Matthew Honnibal
783c0c8795
Remove unnecessary bz2 import
2017-10-27 01:17:54 +00:00
Matthew Honnibal
bb25bdcd92
Adjust call to scatter_add for the new version
2017-10-27 01:16:55 +00:00
Ines Montani
287a3ca256
Merge pull request #1466 from explosion/feature/rename-pipeline
...
💫 Clean up dead linear model code
2017-10-27 02:03:28 +02:00
ines
4eb5bd02e7
Update textcat pre-processing after to_array change
2017-10-27 00:32:12 +02:00
ines
2d6ec99884
Set 'model' as default model name to prevent meta.json errors
2017-10-26 16:12:23 +02:00
ines
9e372913e0
Remove old 'SP' condition in tag map
2017-10-26 16:11:57 +02:00
Matthew Honnibal
c52671420c
Remove old cfile import
2017-10-26 13:28:19 +02:00
Matthew Honnibal
ea03f1ef64
Remove obsolete cfile code
2017-10-26 13:23:36 +02:00
Matthew Honnibal
90d1d9b230
Remove obsolete parser code
2017-10-26 13:22:45 +02:00
ines
6f78e29bed
Add LAW entity label to glossary
2017-10-26 13:04:35 +02:00
ines
9bf78d5fb3
Update spacy.explain docs
2017-10-26 13:04:25 +02:00
Matthew Honnibal
33f8c58782
Remove obsolete parser.pyx
2017-10-26 12:42:05 +02:00
Matthew Honnibal
a8abc47811
Rename BaseThincComponent --> Pipe
2017-10-26 12:40:40 +02:00
Matthew Honnibal
b0f3ea2200
Fix names of pipeline components
...
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder --> Tensorizer
NeuralLabeller --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal
b6b4f1aaf7
Merge pull request #1462 from explosion/feature/vector-meta-data
...
💫 Add vector meta data to model meta.json on train/package and show in docs
2017-10-26 11:39:41 +02:00
Matthew Honnibal
35977bdbb9
Update better-parser branch with develop
2017-10-26 00:55:53 +00:00
Ines Montani
090bd00369
Merge pull request #1464 from mayukh18/develop_bengali_pronouns
...
added the bengali pronouns for v2.0
2017-10-25 21:55:25 +02:00
mayukh18
1bc07758fa
added few bengali pronouns
2017-10-25 22:24:40 +05:30
ines
de1e5f35d5
Merge branch 'develop' into feature/disable-pipes
2017-10-25 16:33:12 +02:00
ines
728b609bf9
Merge branch 'develop' into feature/vector-meta-data
2017-10-25 16:32:22 +02:00
ines
c0b55ebdac
Fix PhraseMatcher.__contains__ and add more tests
2017-10-25 16:31:11 +02:00
ines
91beacf5e3
Fix Matcher.__contains__
2017-10-25 16:19:38 +02:00
ines
11e3f19764
Fix vectors data added after training (see #1457)
2017-10-25 16:08:26 +02:00
ines
057954695b
Read pipeline and vector data off model in --generate-meta
2017-10-25 16:03:26 +02:00
ines
273e638183
Add vector data to model meta after training (see #1457)
2017-10-25 16:03:05 +02:00
ines
18aae423fb
Remove import of non-existing function
2017-10-25 15:54:10 +02:00
ines
5117a7d24d
Fix whitespace
2017-10-25 15:54:02 +02:00
ines
657a4d91bc
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:19:05 +02:00
ines
1a722dac31
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:18:18 +02:00
ines
6a00de4f77
Fix check of unexpected pipe names in restore()
2017-10-25 14:56:35 +02:00
ines
7f03932477
Return self on __enter__
2017-10-25 14:56:16 +02:00
Matthew Honnibal
b5de768852
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-25 14:44:16 +02:00
Matthew Honnibal
094512fd47
Fix model-mark on regression test.
2017-10-25 14:44:00 +02:00
Matthew Honnibal
e70f80f29e
Add Language.disable_pipes()
2017-10-25 13:46:41 +02:00
Matthew Honnibal
075e8118ea
Update from develop
2017-10-25 12:45:21 +02:00
ines
72497c8cb2
Remove comments and add TODO
2017-10-25 12:15:43 +02:00
ines
4d97efc3b5
Add missing docstrings
2017-10-25 12:10:16 +02:00
ines
1262aa0bf9
Implement PhraseMatcher.__contains__
2017-10-25 12:10:04 +02:00
ines
9c733a8849
Implement PhraseMatcher.__len__
2017-10-25 12:09:56 +02:00
ines
7eebeeaf85
Fix Matcher.__contains__
2017-10-25 12:09:47 +02:00
ines
7bcec57462
Remove unused attribute
2017-10-25 12:08:54 +02:00
ines
0b1dcbac14
Remove unused function
2017-10-25 12:08:46 +02:00
ines
3484174e48
Add Language.path
2017-10-25 11:57:43 +02:00
Ines Montani
d3bf488e16
Merge pull request #1171 from mollerhoj/support-danish
...
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal
d9bb1e5de8
Increment version
2017-10-24 17:06:19 +02:00
Matthew Honnibal
908809d488
Update tests
2017-10-24 17:05:15 +02:00
Matthew Honnibal
66766c1454
Restore SP tag to English tag_map, until models migrate
2017-10-24 17:05:00 +02:00
Matthew Honnibal
30e67fa808
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-24 16:08:23 +02:00
Matthew Honnibal
b0f6fd3f1d
Disable tokenizer cache for special-cases. Fixes #1250
2017-10-24 16:08:05 +02:00
Matthew Honnibal
63f0bde749
Add test for #1250: Tokenizer cache clobbered special-case attrs
2017-10-24 16:07:18 +02:00
ines
8492d5be6d
Always make lemmatizer return a list of lemmas, not a set
2017-10-24 16:00:56 +02:00
ines
95f866f99f
Add lookup argument to Lemmatizer.load
2017-10-24 16:00:56 +02:00
ines
95f6174516
Remove tensorizer from model pipeline example in spacy package
2017-10-24 16:00:56 +02:00
ines
090aed940a
Add test for currently failing span.as_doc case
2017-10-24 16:00:56 +02:00
ines
4ef81a9ebc
Fix whitespace
2017-10-24 16:00:56 +02:00
Matthew Honnibal
18f1c1d0ba
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-24 14:29:43 +02:00
Matthew Honnibal
4bea65a1a8
Fix Issue #1450: Off-by-1 in * and ? matches
...
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal
391d5ef0d1
Normalize imports in regression test
2017-10-24 14:25:49 +02:00
ines
c55db0a4a1
Add example sentences for Japanese and Chinese (see #1107)
2017-10-24 13:02:24 +02:00
ines
66f8f9d4a0
Fix Japanese tokenizer
...
JapaneseTokenizer now returns a Doc, not individual words
2017-10-24 13:02:19 +02:00
Matthew Honnibal
dd5b2d8fa3
Check for out-of-memory when calling calloc. Closes #1446
2017-10-24 12:40:47 +02:00
Matthew Honnibal
b66b8f028b
Fix #1375 -- out-of-bounds on token.nbor()
2017-10-24 12:10:39 +02:00
Matthew Honnibal
a68d89a4f3
Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()
2017-10-24 12:05:25 +02:00
Ines Montani
facf77e541
Merge branch 'develop' into support-danish
2017-10-24 11:53:19 +02:00
Matthew Honnibal
ccd2ab1a62
Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
...
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal
ef3e5a361b
Merge pull request #1442 from explosion/feature/fix-sp
...
💫 Fix SP tag, tweak Vectors.__init__, fix Morphology
2017-10-24 10:24:07 +02:00
Matthew Honnibal
fdf25d10ba
Merge pull request #1440 from ramananbalakrishnan/develop
...
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
Matthew Honnibal
e7556ff048
Fix non-maxout parser
2017-10-23 18:16:23 +02:00
ines
a31f048b4d
Fix formatting
2017-10-23 10:38:06 +02:00
Matthew Honnibal
490ad3eaf0
Check that empty strings are handled. Closes #1242
2017-10-21 00:52:14 +02:00
Matthew Honnibal
8f8bccecb9
Patch deserialisation for invalid loads, to avoid model failure
2017-10-21 00:51:42 +02:00
Ramanan Balakrishnan
d2fe56a577
Add LCA matrix for spans and docs
2017-10-20 23:58:00 +05:30
Matthew Honnibal
d8391b1c4d
Fix #1434: Matcher failed on ending ? if no token
2017-10-20 16:49:36 +02:00
Matthew Honnibal
fec53f09f7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-20 16:28:34 +02:00
Matthew Honnibal
f111b228e0
Fix re-parsing of previously parsed text
...
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:
1) The parse was only being partially erased
2) The RightArc action was able to create a 1-cycle.
This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.
Closes #1253.
2017-10-20 16:27:36 +02:00
Matthew Honnibal
1036798155
Make parser consistent if maxout==1
2017-10-20 16:24:16 +02:00
Matthew Honnibal
3faf9189a2
Make parser hidden shape consistent even if maxout==1
2017-10-20 16:23:31 +02:00
Matthew Honnibal
9010a1a060
Create vectors correctly
2017-10-20 14:19:46 +02:00
Matthew Honnibal
33229b1c9e
Remove print statement
2017-10-20 14:19:29 +02:00
Matthew Honnibal
cfae54c507
Make change to Vectors.__init__
2017-10-20 14:19:04 +02:00
Matthew Honnibal
ebecaddb76
Make 'data_or_width' two keyword args in Vectors.__init__
...
Previously the data and width options were one argument in Vectors,
which meant you couldn't say vectors = Vectors(strings, width=300).
It's better to have two keywords.
2017-10-20 14:17:15 +02:00
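The keyword split described in commit ebecaddb76 can be sketched as follows. This is only an illustration: ToyVectors is a hypothetical stand-in built on plain lists, not spaCy's actual Vectors class, and its zero-filled default table is an assumption made for the example.

```python
class ToyVectors:
    """Hypothetical stand-in for illustration; not spaCy's Vectors class."""

    def __init__(self, strings, width=0, data=None):
        # 'width' and 'data' are separate keywords, so a caller can give
        # either a size (an empty table is allocated) or an existing table.
        self.strings = list(strings)
        if data is None:
            # Assumption for the sketch: one zero row per string.
            data = [[0.0] * width for _ in self.strings]
        self.data = data
        # When data is supplied, the width is read off its first row.
        self.width = len(data[0]) if data else width


sized = ToyVectors(["apple", "pear"], width=300)
loaded = ToyVectors(["apple"], data=[[0.1, 0.2, 0.3]])
```

With a single combined 'data_or_width' argument, the first call could not be written with an explicit width= keyword, which is the limitation the commit message points out.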
Matthew Honnibal
49895fbef6
Rename 'SP' special tag to '_SP'
...
Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
2017-10-20 14:01:12 +02:00
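The reasoning in commit 49895fbef6 can be sketched as follows, assuming (for illustration only) that class IDs are assigned by enumerating the tag names in sorted order. The helper tag_ids and the three-tag list are hypothetical and not taken from spaCy.

```python
def tag_ids(tags):
    # Assumption for the sketch: IDs follow the sorted order of tag names.
    return {tag: i for i, tag in enumerate(sorted(tags))}


base = ["ADJ", "NOUN", "VERB"]

before = tag_ids(base)
with_sp = tag_ids(base + ["SP"])           # 'SP' sorts between NOUN and VERB
with_underscore = tag_ids(base + ["_SP"])  # '_' sorts after 'A'..'Z' in ASCII

print(before["VERB"], with_sp["VERB"], with_underscore["VERB"])
```

Under that assumption, adding 'SP' moves "VERB" to a new ID, while '_SP' lands after every upper-case tag and leaves the existing IDs where they were.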
Matthew Honnibal
506cf2eb13
Remove cpdef enum, to avoid too much code generation
2017-10-20 14:00:23 +02:00
Matthew Honnibal
6218af0105
Remove cpdef enum, to avoid too much code generation
2017-10-20 13:59:57 +02:00
Matthew Honnibal
92ac9316b5
Fix initialization of vectors, to address serialization problem
2017-10-20 13:59:24 +02:00
Ramanan Balakrishnan
0726946563
cleanup to_array implementation using fixes on master
2017-10-20 17:09:37 +05:30
ines
108f1f786e
Update symbols and document missing token attributes (see #1439)
2017-10-20 13:08:44 +02:00
ines
4acab77a8a
Add missing symbol for LAW entities (resolves #1427)
2017-10-20 13:07:57 +02:00
Matthew Honnibal
b101736555
Fix precomputed layer
2017-10-20 12:14:52 +02:00
Ramanan Balakrishnan
b3ab124fc5
Support strings for attribute list in doc.to_array
2017-10-20 11:46:57 +05:30
Matthew Honnibal
64658e02e5
Implement fancier initialisation for precomputed layer
2017-10-20 03:07:45 +02:00
Matthew Honnibal
827cd8a883
Fix support of maxout pieces in parser
2017-10-20 03:07:17 +02:00
Matthew Honnibal
a8850b4282
Remove redundant PrecomputableMaxouts class
2017-10-19 20:27:34 +02:00
Matthew Honnibal
a17a1b60c7
Clean up redundant PrecomputableMaxouts class
2017-10-19 20:26:37 +02:00
Matthew Honnibal
b00d0a2c97
Fix bias in parser
2017-10-19 18:42:11 +02:00
Matthew Honnibal
b54b4b8a97
Make parser_maxout_pieces hyper-param work
2017-10-19 13:45:18 +02:00
Matthew Honnibal
03a215c5fd
Make PrecomputableAffines work
2017-10-19 13:44:49 +02:00
Ramanan Balakrishnan
7b9b1be44c
Support single value for attribute list in doc.to_array
2017-10-19 17:00:41 +05:30
Matthew Honnibal
61bc203f3f
Merge pull request #1438 from explosion/feature/fast-parser
...
💫 Improve runtime CPU efficiency of parser/NER
2017-10-19 02:42:21 +02:00
Matthew Honnibal
15e5a04a8d
Clean up more depth=0 conditional code
2017-10-19 01:48:43 +02:00
Matthew Honnibal
906c50ac59
Fix loop typing that caused an error on Windows
2017-10-19 01:48:39 +02:00
ines
24512420b1
Show error if data_path does not exist or is None (see #1102)
2017-10-19 00:53:49 +02:00
ines
bf415fd778
Add test for serializing extension attrs (see #1085)
2017-10-19 00:53:08 +02:00
Matthew Honnibal
960788aaa2
Eliminate dead code in parser, and raise errors for obsolete options
2017-10-19 00:42:34 +02:00
Matthew Honnibal
bbfd7d8d5d
Clean up parser multi-threading
2017-10-19 00:25:21 +02:00
Matthew Honnibal
f018f2030c
Try optimized parser forward loop
2017-10-18 21:48:00 +02:00
Matthew Honnibal
65bf5e85bd
Improve piping in language.pipe
2017-10-18 21:46:12 +02:00
Matthew Honnibal
633a75c7e0
Break parser batches into sub-batches, sorted by length.
2017-10-18 21:45:01 +02:00
Ines Montani
f0d577e460
Merge pull request #1425 from explosion/feature/hindi-tokenizer
...
💫 Basic Hindi tokenization support
2017-10-18 13:34:52 +02:00
Matthew Honnibal
394633efce
Make doc pickling support hooks
2017-10-17 19:44:09 +02:00
Matthew Honnibal
fe844148f6
Test pickling hooks
2017-10-17 19:43:52 +02:00
Matthew Honnibal
cdb0c426d8
Improve deserialization of user_data, esp. for Underscore
2017-10-17 19:29:20 +02:00
Matthew Honnibal
374819edf8
Test user_data deserialization, re #1085
2017-10-17 19:28:54 +02:00
Matthew Honnibal
e35a83d142
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-17 18:22:06 +02:00
Matthew Honnibal
f45973848c
Rename 'tokens' variable 'doc' in tokenizer
2017-10-17 18:21:41 +02:00
Matthew Honnibal
839de87ca9
Make lambda func a named function, for pickling
2017-10-17 18:21:20 +02:00
Matthew Honnibal
9baa8fe7ec
Convert closure to functools.partial, to promote pickling
2017-10-17 18:20:52 +02:00
Matthew Honnibal
32a8564c79
Fix doc pickling
2017-10-17 18:20:24 +02:00
Matthew Honnibal
8ca97f32a3
Fix doc pickling test
2017-10-17 18:19:57 +02:00
Matthew Honnibal
9ce7d6af87
Make lex attr functions top-level functions, to promote pickling
2017-10-17 18:19:18 +02:00
Matthew Honnibal
1cc85a89ef
Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes().
2017-10-17 18:18:49 +02:00
Matthew Honnibal
0d57b9748a
Serialize lex_attr_getters with dill, for better pickle support
2017-10-17 18:17:45 +02:00
Matthew Honnibal
45d1dd90b1
Add tests for pickling doc
2017-10-17 17:20:58 +02:00
Ines Montani
afa67de7ee
Merge pull request #1428 from roanuz/develop
...
Fix trailing whitespace and Language.from_disk overwrites
2017-10-17 16:29:15 +02:00
Matthew Honnibal
92c1eb2d6f
Fix Doc pickling. This also removes need for Binder class
2017-10-17 16:11:13 +02:00
Matthew Honnibal
ed8da9b11f
Add missing return statement in SentenceSegmenter
2017-10-17 15:32:56 +02:00
Ines Montani
aab299c8ae
Merge pull request #1429 from vishnunekkanti/develop
...
fix syntax error in zh
2017-10-17 14:45:02 +02:00
Anto Binish Kaspar
534240648e
Fix trailing whitespace on morphology features
2017-10-17 17:15:58 +05:30
Anto Binish Kaspar
8f5b60c168
Fix Language.from_disk overwriting the meta.json file.
2017-10-17 17:15:32 +05:30
ines
8ca344712d
Add Language.has_pipe method
2017-10-17 11:20:07 +02:00
ines
485c4f6df5
Add Hungarian examples (see #1107)
2017-10-17 02:37:45 +02:00
Matthew Honnibal
19531bad4c
Merge branch 'develop' into feature/streaming-data-memory-growth
2017-10-16 21:44:11 +02:00
Matthew Honnibal
df488274b1
Fix deserialization of vectors
2017-10-16 20:55:00 +02:00
Matthew Honnibal
4018486d31
Merge remote-tracking branch 'origin/develop' into feature/streaming-data-memory-growth
2017-10-16 20:49:48 +02:00
Matthew Honnibal
4174477161
Fix equality check in test
2017-10-16 19:50:35 +02:00
Matthew Honnibal
2bc06e4b22
Bump rolling buffer size to 10k
2017-10-16 19:38:29 +02:00
Matthew Honnibal
66e2eb8f39
Clean up remnant of frozen in StringStore
2017-10-16 19:34:41 +02:00
Matthew Honnibal
a002264fec
Remove caching of Token in Doc, as it caused a cycle.
2017-10-16 19:34:21 +02:00
Matthew Honnibal
3e037054c8
Remove obsolete is_frozen functionality from StringStore
2017-10-16 19:23:10 +02:00
Matthew Honnibal
5c14f3f033
Create a rolling buffer for the StringStore in Language.pipe()
2017-10-16 19:22:40 +02:00
Matthew Honnibal
59c216196c
Allow weakrefs on Doc objects
2017-10-16 19:22:11 +02:00
ines
d5418553eb
Fix whitespace
2017-10-16 18:30:04 +02:00
ines
6ceadcdb5c
Make sure from_disk passes string to numpy (see #1421)
...
If path is a WindowsPath, numpy does not recognise it as a path and as
a result, doesn't open the file.
https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L369
2017-10-16 18:29:56 +02:00
Matthew Honnibal
010a7309ff
Merge pull request #1402 from explosion/feature/fix-matcher-operators
...
💫 Fix Matcher variable-length operators
2017-10-16 17:53:19 +02:00
Matthew Honnibal
c29927d2e7
Fix matcher test
2017-10-16 17:22:18 +02:00
Vishnu Kumar Nekkanti
d3c54cf39a
fixed SyntaxError while checking for jieba
2017-10-16 18:51:33 +05:30
Matthew Honnibal
a928ae2f35
Merge branch 'develop' into feature/fix-matcher-operators
2017-10-16 13:38:36 +02:00
Matthew Honnibal
56aa42cc5d
Fix and document matcher operator 'shadowing' behaviour
2017-10-16 13:38:20 +02:00
Matthew Honnibal
748d525801
Add more matcher operator tests
2017-10-16 13:38:01 +02:00
Matthew Honnibal
0433181658
Document operator semantics in Matcher docstring
2017-10-16 12:06:33 +02:00
ines
266e7180a7
Add Language class, stop words and basic stemmer that sets NORM
2017-10-14 14:59:52 +02:00
ines
e85e1d571b
Update base punctuation
2017-10-14 14:59:23 +02:00
ines
9d6c8eaa49
Update base norm exceptions with more unicode characters
...
e.g. unicode variations of punctuation used in Chinese
2017-10-14 14:58:52 +02:00
ines
3516aa0cea
Port over changes from #1389
2017-10-14 13:32:55 +02:00
ines
cd6a29dce7
Port over changes from #1294
2017-10-14 13:28:46 +02:00
ines
38c756fd85
Port over changes from #1287
2017-10-14 13:16:21 +02:00
ines
612224c10d
Port over changes from #1157
2017-10-14 13:11:39 +02:00
ines
9b3f8f9ec3
Fix formatting and add comment on languages
2017-10-14 13:11:18 +02:00
ines
a4d974d97b
Port over URL pattern changes from #1411
2017-10-14 12:58:07 +02:00
ines
09aed58140
Port over changes from #1333 and add comments
2017-10-14 12:52:59 +02:00
Matthew Honnibal
cf6da9301a
Update lemmatizer test
2017-10-12 22:50:52 +02:00
Matthew Honnibal
9b90d235d1
Fix tag check in lemmatizer
2017-10-12 22:50:43 +02:00
Matthew Honnibal
dc01acd821
Escape encoding in validate function
2017-10-12 22:23:21 +02:00
Matthew Honnibal
27b927259a
Add locale_escape compat function
2017-10-12 22:22:04 +02:00
ines
9c6de3dcfa
Merge branch 'develop' into feature/cli-validate
2017-10-12 21:44:28 +02:00
Matthew Honnibal
462caf835a
Fix SBD test
2017-10-12 21:18:22 +02:00
ines
fff1028391
Add validate CLI command
2017-10-12 20:05:06 +02:00
Matthew Honnibal
908f44c3fe
Disable history features by default
2017-10-12 14:56:11 +02:00
Matthew Honnibal
a955843684
Increase default number of epochs
2017-10-12 13:13:01 +02:00
Matthew Honnibal
cecfcc7711
Set default hyper params back to 'slow' settings
2017-10-12 13:12:26 +02:00
Ines Montani
37aa523a8e
Merge pull request #1408 from explosion/feature/dot-underscore
...
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
ines
8ce6f96180
Don't make copies of language data components
2017-10-11 15:34:55 +02:00
ines
51519251c2
Fix underscore method test
2017-10-11 13:34:19 +02:00
ines
c6ae49e8bf
Fix formatting
2017-10-11 13:34:11 +02:00
ines
453c47ca24
Add German lemmatizer tests
2017-10-11 13:27:26 +02:00
ines
15fe0fd82d
Fix tests
2017-10-11 13:27:18 +02:00
ines
6dd14dc342
Add lookup lemmas to tokens without POS tags
2017-10-11 13:27:10 +02:00
ines
9620c1a640
Add lemma_lookup to Language defaults
2017-10-11 13:26:05 +02:00
ines
9fd471372a
Add lookup lemmatizer to lemmatizer as lookup() method
2017-10-11 13:25:51 +02:00
ines
e0ff145a8b
Merge branch 'develop' into feature/dot-underscore
2017-10-11 11:57:05 +02:00
ines
c1d6d43c83
Merge branch 'develop' into feature/lemmatizer
2017-10-11 11:56:35 +02:00
Matthew Honnibal
17c467e0ab
Avoid clobbering existing lemmas
2017-10-11 03:33:06 -05:00
Matthew Honnibal
807e109f2b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-11 02:47:59 -05:00
Matthew Honnibal
6e552c9d83
Prune number of non-projective labels more aggressively
2017-10-11 02:46:44 -05:00