Commit Graph

4305 Commits

Author SHA1 Message Date
Explosion Bot
7b56b2f04b Add Vocab.cfg attr, to hold stuff like oov probs 2017-10-30 16:08:50 +01:00
Explosion Bot
ab5d5ed880 Fix vectors.add() 2017-10-30 16:08:09 +01:00
Explosion Bot
41d0f1665a Fix add_attrs for cluster 2017-10-30 16:07:50 +01:00
ines
5453821a9f Update NER annotation scheme
Add note on training data sources and include coarse-grained Wikipedia scheme
2017-10-30 13:53:49 +01:00
Explosion Bot
5ede7cec9b Improve Lexeme.set_attrs method 2017-10-30 11:49:11 +01:00
Explosion Bot
72aea8f105 Update vectors.add() to allow setting keys to rows 2017-10-30 10:03:08 +01:00
Matthew Honnibal
c43cc5361d
Merge pull request #1467 from explosion/feature/better-parser
💫 Bug fixes to parser model (requires retraining)
2017-10-29 02:05:22 +02:00
ines
6c2d8d3b2a Use shortcuts-nightly.json to resolve model shortcuts 2017-10-29 01:28:31 +02:00
Matthew Honnibal
a0c7dabb72 Fix bug in 8-token parser features 2017-10-28 23:01:35 +00:00
Matthew Honnibal
b713d10d97 Switch to 13 features in parser 2017-10-28 23:01:14 +00:00
Matthew Honnibal
3b91097321 Whitespace 2017-10-28 17:05:11 +00:00
Matthew Honnibal
6ef72864fa Improve initialization for hidden layers 2017-10-28 17:05:01 +00:00
Matthew Honnibal
5414e2f14b Use missing features in parser 2017-10-28 16:45:54 +00:00
Matthew Honnibal
df4803cc6d Add learned missing values for parser 2017-10-28 16:45:14 +00:00
Matthew Honnibal
64e4ff7c4b Merge 'tidy-up' changes into branch. Resolve conflicts 2017-10-28 13:16:06 +02:00
Explosion Bot
fb0c96f39a Fix optimizer loading 2017-10-28 11:58:16 +02:00
Explosion Bot
b22e42af7f Merge changes to parser and _ml 2017-10-28 11:52:10 +02:00
ines
d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines
a8e10f94e4 Tidy up Lexeme and update docs 2017-10-27 21:07:50 +02:00
ines
ba5e646219 Tidy up pipeline 2017-10-27 20:29:08 +02:00
ines
b4d226a3f1 Tidy up syntax 2017-10-27 19:45:57 +02:00
ines
5167a0cce2 Tidy up Vectors and docs 2017-10-27 19:45:19 +02:00
ines
7946464742 Remove spacy.tagger (now in pipeline) 2017-10-27 19:45:04 +02:00
ines
9c89e2cdef Remove unused syntax iterators (now in language data) 2017-10-27 18:09:53 +02:00
ines
d2df81d907 Fix not implemented Span getters 2017-10-27 18:09:28 +02:00
ines
544a407b93 Tidy up Doc, Token and Span and add missing docs 2017-10-27 17:07:26 +02:00
ines
a6135336f5 Tidy up gold 2017-10-27 17:02:55 +02:00
ines
6a0483b7aa Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
ines
1a559d4c95 Remove old, unused file 2017-10-27 15:34:35 +02:00
ines
91899d337b Tidy up language, lemmatizer and scorer 2017-10-27 14:40:14 +02:00
ines
778212efea Tidy up init and main 2017-10-27 14:39:51 +02:00
ines
e33b7e0b3c Tidy up parser and ML 2017-10-27 14:39:30 +02:00
ines
e3265998c0 Tidy up displaCy 2017-10-27 14:39:19 +02:00
ines
ea4a41c8fb Tidy up util and helpers 2017-10-27 14:39:09 +02:00
ines
d941fc3667 Tidy up CLI 2017-10-27 14:38:39 +02:00
Matthew Honnibal
531142a933 Merge remote-tracking branch 'origin/develop' into feature/better-parser 2017-10-27 12:34:48 +00:00
Matthew Honnibal
19a2b9bf27 Fix import of Optimizer 2017-10-27 12:33:42 +00:00
Matthew Honnibal
4d048e94d3 Add compat for thinc.neural.optimizers.Optimizer 2017-10-27 10:23:49 +00:00
Ines Montani
4033e70c71 Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
Matthew Honnibal
75a637fa43 Remove redundant imports from _ml 2017-10-27 10:19:56 +00:00
Matthew Honnibal
c9987cf131 Avoid use of numpy.tensordot 2017-10-27 10:18:36 +00:00
Matthew Honnibal
f6fef30adc Remove dead code from spacy._ml 2017-10-27 10:16:41 +00:00
Matthew Honnibal
b9616419e1 Add try/except around bz2 import 2017-10-27 01:18:05 +00:00
Matthew Honnibal
783c0c8795 Remove unnecessary bz2 import 2017-10-27 01:17:54 +00:00
Matthew Honnibal
bb25bdcd92 Adjust call to scatter_add for the new version 2017-10-27 01:16:55 +00:00
Ines Montani
287a3ca256 Merge pull request #1466 from explosion/feature/rename-pipeline
💫 Clean up dead linear model code
2017-10-27 02:03:28 +02:00
ines
4eb5bd02e7 Update textcat pre-processing after to_array change 2017-10-27 00:32:12 +02:00
ines
2d6ec99884 Set 'model' as default model name to prevent meta.json errors 2017-10-26 16:12:23 +02:00
ines
9e372913e0 Remove old 'SP' condition in tag map 2017-10-26 16:11:57 +02:00
Matthew Honnibal
c52671420c Remove old cfile import 2017-10-26 13:28:19 +02:00
Matthew Honnibal
ea03f1ef64 Remove obsolete cfile code 2017-10-26 13:23:36 +02:00
Matthew Honnibal
90d1d9b230 Remove obsolete parser code 2017-10-26 13:22:45 +02:00
ines
6f78e29bed Add LAW entity label to glossary 2017-10-26 13:04:35 +02:00
ines
9bf78d5fb3 Update spacy.explain docs 2017-10-26 13:04:25 +02:00
Matthew Honnibal
33f8c58782 Remove obsolete parser.pyx 2017-10-26 12:42:05 +02:00
Matthew Honnibal
a8abc47811 Rename BaseThincComponent --> Pipe 2017-10-26 12:40:40 +02:00
Matthew Honnibal
b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal
b6b4f1aaf7 Merge pull request #1462 from explosion/feature/vector-meta-data
💫 Add vector meta data to model meta.json on train/package and show in docs
2017-10-26 11:39:41 +02:00
Matthew Honnibal
35977bdbb9 Update better-parser branch with develop 2017-10-26 00:55:53 +00:00
Ines Montani
090bd00369 Merge pull request #1464 from mayukh18/develop_bengali_pronouns
added the bengali pronouns for v2.0
2017-10-25 21:55:25 +02:00
mayukh18
1bc07758fa added few bengali pronouns 2017-10-25 22:24:40 +05:30
ines
de1e5f35d5 Merge branch 'develop' into feature/disable-pipes 2017-10-25 16:33:12 +02:00
ines
728b609bf9 Merge branch 'develop' into feature/vector-meta-data 2017-10-25 16:32:22 +02:00
ines
c0b55ebdac Fix PhraseMatcher.__contains__ and add more tests 2017-10-25 16:31:11 +02:00
ines
91beacf5e3 Fix Matcher.__contains__ 2017-10-25 16:19:38 +02:00
ines
11e3f19764 Fix vectors data added after training (see #1457) 2017-10-25 16:08:26 +02:00
ines
057954695b Read pipeline and vector data off model in --generate-meta 2017-10-25 16:03:26 +02:00
ines
273e638183 Add vector data to model meta after training (see #1457) 2017-10-25 16:03:05 +02:00
ines
18aae423fb Remove import of non-existing function 2017-10-25 15:54:10 +02:00
ines
5117a7d24d Fix whitespace 2017-10-25 15:54:02 +02:00
ines
657a4d91bc Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:19:05 +02:00
ines
1a722dac31 Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:18:18 +02:00
ines
6a00de4f77 Fix check of unexpected pipe names in restore() 2017-10-25 14:56:35 +02:00
ines
7f03932477 Return self on __enter__ 2017-10-25 14:56:16 +02:00
Matthew Honnibal
b5de768852 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-25 14:44:16 +02:00
Matthew Honnibal
094512fd47 Fix model-mark on regression test. 2017-10-25 14:44:00 +02:00
Matthew Honnibal
e70f80f29e Add Language.disable_pipes() 2017-10-25 13:46:41 +02:00
Matthew Honnibal
075e8118ea Update from develop 2017-10-25 12:45:21 +02:00
ines
72497c8cb2 Remove comments and add TODO 2017-10-25 12:15:43 +02:00
ines
4d97efc3b5 Add missing docstrings 2017-10-25 12:10:16 +02:00
ines
1262aa0bf9 Implement PhraseMatcher.__contains__ 2017-10-25 12:10:04 +02:00
ines
9c733a8849 Implement PhraseMatcher.__len__ 2017-10-25 12:09:56 +02:00
ines
7eebeeaf85 Fix Matcher.__contains__ 2017-10-25 12:09:47 +02:00
ines
7bcec57462 Remove unused attribute 2017-10-25 12:08:54 +02:00
ines
0b1dcbac14 Remove unused function 2017-10-25 12:08:46 +02:00
ines
3484174e48 Add Language.path 2017-10-25 11:57:43 +02:00
Ines Montani
d3bf488e16 Merge pull request #1171 from mollerhoj/support-danish
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal
d9bb1e5de8 Increment version 2017-10-24 17:06:19 +02:00
Matthew Honnibal
908809d488 Update tests 2017-10-24 17:05:15 +02:00
Matthew Honnibal
66766c1454 Restore SP tag to English tag_map, until models migrate 2017-10-24 17:05:00 +02:00
Matthew Honnibal
30e67fa808 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 16:08:23 +02:00
Matthew Honnibal
b0f6fd3f1d Disable tokenizer cache for special-cases. Fixes #1250 2017-10-24 16:08:05 +02:00
Matthew Honnibal
63f0bde749 Add test for #1250: Tokenizer cache clobbered special-case attrs 2017-10-24 16:07:18 +02:00
ines
8492d5be6d Always make lemmatizer return a list of lemmas, not a set 2017-10-24 16:00:56 +02:00
ines
95f866f99f Add lookup argument to Lemmatizer.load 2017-10-24 16:00:56 +02:00
ines
95f6174516 Remove tensorizer from model pipeline example in spacy package 2017-10-24 16:00:56 +02:00
ines
090aed940a Add test for currently failing span.as_doc case 2017-10-24 16:00:56 +02:00
ines
4ef81a9ebc Fix whitespace 2017-10-24 16:00:56 +02:00
Matthew Honnibal
18f1c1d0ba Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 14:29:43 +02:00
Matthew Honnibal
4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00