3633 Commits

Author SHA1 Message Date
Matthew Honnibal
a1ec41298c Restore CFile loader 2017-08-18 20:46:16 +02:00
Matthew Honnibal
ed4fb991dc Work on vectors loading 2017-08-18 20:45:48 +02:00
Matthew Honnibal
426f84937f Resolve conflicts when merging new beam parsing stuff 2017-08-18 13:38:32 -05:00
Matthew Honnibal
5181e8bedb Fix merge conflict in _ml 2017-08-18 13:35:51 -05:00
Matthew Honnibal
f75420ae79 Unhack beam parsing, moving it under options instead of global flags 2017-08-18 13:31:15 -05:00
Jim Geovedi
7ae45bffcf Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-18 10:14:46 +07:00
Dan O'Huiginn
ebf5a3ce59 Allow loading with python < 3.6
Don't rely on recent python features to load models

Fixes Issue #1271
2017-08-17 15:15:47 +00:00
Matthew Honnibal
0209a06b4e Update beam parser 2017-08-16 18:25:49 -05:00
Matthew Honnibal
4b1e7bd6d8 Improve tensorizer model 2017-08-16 18:25:20 -05:00
Matthew Honnibal
a6d8d7c82e Add is_gold_parse method to transition system 2017-08-16 18:24:09 -05:00
Matthew Honnibal
3533bb61cb Add option of 8 feature parse state 2017-08-16 18:23:27 -05:00
Matthew Honnibal
1cb2f15d65 Clean up unused predict_confidences function 2017-08-16 18:22:26 -05:00
Matthew Honnibal
210f6d5175 Fix efficiency error in batch parse 2017-08-15 03:19:03 -05:00
Matthew Honnibal
23537a011d Tweaks to beam parser 2017-08-15 03:15:28 -05:00
Matthew Honnibal
500e92553d Fix memory error when copying scores in beam 2017-08-15 03:15:04 -05:00
Matthew Honnibal
a8e4064dd8 Fix tensor gradient in parser 2017-08-15 03:14:36 -05:00
Matthew Honnibal
e420e0366c Remove use of hash function in beam parser 2017-08-15 03:13:57 -05:00
Matthew Honnibal
6259490347 Fix mixture weights in fine_tune 2017-08-14 17:55:18 -05:00
Matthew Honnibal
335fa8b05c Fix gradient in fine_tune 2017-08-14 14:55:47 -05:00
Matthew Honnibal
d9f82f6b50 Increment version 2017-08-14 14:55:26 +02:00
ines
a29f132ffd Change python -m spacy to spacy
Reflects latest change to entry point or auto-alias
2017-08-14 13:04:48 +02:00
ines
65bf80302c Increment version 2017-08-14 13:04:30 +02:00
Matthew Honnibal
52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
dbbfe595a5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-14 12:09:28 +02:00
Matthew Honnibal
ac6c25f762 Check SGD is not None in update 2017-08-14 12:09:18 +02:00
Matthew Honnibal
0ae045256d Fix beam training 2017-08-13 18:02:05 -05:00
Matthew Honnibal
6a42cc16ff Fix beam parser, improve efficiency of non-beam 2017-08-13 12:37:26 +02:00
Matthew Honnibal
4363b4aa4a Fix redundant tokvecs updates during update 2017-08-13 12:36:55 +02:00
Matthew Honnibal
12de263813 Bug fixes to beam parsing. Learns small sample 2017-08-13 09:33:39 +02:00
Matthew Honnibal
4ae0d5e1e6 Set defaults for convert command 2017-08-13 09:03:38 +02:00
Matthew Honnibal
92ebab6073 Update beam-update tests 2017-08-13 08:56:02 +02:00
Matthew Honnibal
17874fe491 Disable beam parsing 2017-08-12 19:35:40 -05:00
Matthew Honnibal
69f21867b5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-12 19:25:56 -05:00
Matthew Honnibal
3e30712b62 Improve defaults 2017-08-12 19:24:17 -05:00
Matthew Honnibal
28e930aae0 Fixes for beam parsing. Not working 2017-08-12 19:22:52 -05:00
Matthew Honnibal
c96d769836 Fix beam parse. Not sure if working 2017-08-12 18:21:54 -05:00
Matthew Honnibal
24b45b45c6 Add test for beam update 2017-08-12 17:15:28 -05:00
Matthew Honnibal
4638f4b869 Fix beam update 2017-08-12 17:15:16 -05:00
Matthew Honnibal
d4308d2363 Initialize State offset to 0 2017-08-12 17:14:39 -05:00
Matthew Honnibal
b353e4d843 Work on parser beam training 2017-08-12 14:47:45 -05:00
ines
d4f2baf7dd Add create_meta option to package command
Re-create meta.json in model directory, even if it exists. Especially
useful when updating existing spaCy models or training with Prodigy.
Ensures user won't end up with multiple "en_core_web_sm" models, and
offers easy way to change the model's name and settings without having
to edit the meta.json file.
2017-08-12 21:44:18 +02:00
Matthew Honnibal
4ab0c8c8e9 Try different drop_layer structure in Tok2Vec 2017-08-12 08:56:57 -05:00
Matthew Honnibal
cd5ecedf6a Try drop_layer in parser 2017-08-12 08:56:33 -05:00
Matthew Honnibal
8870d491f1 Remove redundant pickling during training 2017-08-12 08:55:53 -05:00
Matthew Honnibal
680043ebca Improve efficiency of tagger.set_annotations for GPU 2017-08-12 08:54:21 -05:00
Matthew Honnibal
ebe0f7f641 Pass embed size correctly in tagger, and cache embeddings for efficiency 2017-08-12 05:45:20 -05:00
Matthew Honnibal
1a59db1c86 Fix dropout and learn rate in parser 2017-08-12 05:44:39 -05:00
Matthew Honnibal
d01dc3704a Adjust parser model 2017-08-09 20:06:33 -05:00
Matthew Honnibal
f37528ef58 Pass embed size for parser fine-tune. Use SELU 2017-08-09 17:52:53 -05:00
Matthew Honnibal
f93f2bed58 Revert use of layer normalization in Tok2Vec 2017-08-09 17:47:03 -05:00
Matthew Honnibal
20944dd8aa Fix conflict in parser fine-tuning 2017-08-09 16:43:05 -05:00
Matthew Honnibal
ac2de6dced Switch to ReLu layers in Tok2Vec 2017-08-09 16:41:25 -05:00
Matthew Honnibal
bbace204be Gate parser fine-tuning behind feature flag 2017-08-09 16:40:42 -05:00
Matthew Honnibal
a59a1deac4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-09 16:23:19 -05:00
Matthew Honnibal
bcce6f7de0 Fix parser fine tuning 2017-08-09 16:23:12 -05:00
ines
28e2fec23b Fix autolinking failure on fresh model install (resolves #1138)
On fresh install via subprocess, pip.get_installed_distributions()
won't show new model, so is_package check in link command fails.
Solution for now is to get model package path explicitly and pass it to
link command.
2017-08-09 11:52:38 +02:00
Jim Geovedi
c62b49b7cc Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-09 09:17:46 +07:00
Matthew Honnibal
dbdd8afc4b Fix parser fine-tune training 2017-08-08 15:46:07 -05:00
Matthew Honnibal
88bf1cf87c Update parser for fine tuning 2017-08-08 15:34:17 -05:00
Matthew Honnibal
5d837c3776 Add mix weights on fine_tune 2017-08-07 06:32:59 -05:00
Matthew Honnibal
42bd26f6f3 Give parser its own tok2vec weights 2017-08-06 18:33:46 +02:00
Matthew Honnibal
3ed203de25 Use LayerNorm and SELU in Tok2Vec 2017-08-06 18:33:18 +02:00
Matthew Honnibal
78498a072d Return Transition for missing actions in lookup_action 2017-08-06 14:16:36 +02:00
Matthew Honnibal
4a5cc89138 Fix tagger 'fine_tune', to keep private CNN weights 2017-08-06 14:15:48 +02:00
Matthew Honnibal
3cb8f06881 Fix NeuralLabeller 2017-08-06 14:15:14 +02:00
Matthew Honnibal
0acce0521b Fix Language.update for pipeline 2017-08-06 14:13:03 +02:00
Matthew Honnibal
bfffdeabb2 Fix parser batch-size bug introduced during cleanup 2017-08-06 14:10:48 +02:00
Matthew Honnibal
0eec7c9e9b Fix Language.evaluate 2017-08-06 02:18:31 +02:00
Matthew Honnibal
0a566dc320 Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:18:12 +02:00
Matthew Honnibal
cc19ea0e7c Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:17:10 +02:00
Matthew Honnibal
4cfb7a54e7 Fix tagger 2017-08-06 01:53:31 +02:00
Matthew Honnibal
e9ab800e15 Fix tagging model 2017-08-06 01:50:08 +02:00
Matthew Honnibal
468c138ab3 WIP: Add fine-tuning logic to tagger model, re #1182 2017-08-06 01:13:23 +02:00
Matthew Honnibal
7f876a7a82 Clean up some unused code in parser 2017-08-06 00:00:21 +02:00
Matthew Honnibal
ae1ad81069 Increment version 2017-08-05 18:09:32 +02:00
Jim Geovedi
cc4772cac2 reworks 2017-08-03 13:08:38 +07:00
Jim Geovedi
37f19f5ed2 added more currencies based on corpus data 2017-08-03 13:03:25 +07:00
Jim Geovedi
30fd068d42 hashtag prefix should be handled somewhere else 2017-08-03 13:03:02 +07:00
Jim Geovedi
4705ae19ba Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-03 12:40:19 +07:00
Jim Geovedi
ba07e23c87 added USD in currency rules 2017-08-02 22:42:47 +07:00
Matthew Honnibal
5c323daa1a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-01 22:10:37 +02:00
Matthew Honnibal
2e00361522 Fix update when 0 docs 2017-08-01 22:10:17 +02:00
Matthew Honnibal
8fce187de4 Fix ArcEager for missing values 2017-08-01 22:10:05 +02:00
ines
78e262140f Add workaround for displaCy server on Python 2/3 (resolves #1227)
Make sure status and headers are bytes on Python 2 and strings on
Python 3
2017-08-01 01:11:35 +02:00
Jim Geovedi
2572a9ddf0 Merge remote-tracking branch 'upstream/develop' into indonesian 2017-07-30 21:24:16 +07:00
Jim Geovedi
bb08d696f9 added hashtag rule and fixed currency rules 2017-07-30 21:23:28 +07:00
Jim Geovedi
e9af79a803 added u-\d+ rules (sports team) 2017-07-30 21:23:01 +07:00
Matthew Honnibal
27abc56e98 Add method to get beam entities 2017-07-29 21:59:02 +02:00
Matthew Honnibal
ec63f4fe7b Add option to control how missing entities are handled when getting NER tags 2017-07-29 21:58:37 +02:00
Jim Geovedi
e5adc26c72 simplified rules 2017-07-29 18:21:32 +07:00
Jim Geovedi
783f7d8b86 added test set for Indonesian language 2017-07-29 18:21:07 +07:00
Jim Geovedi
4d04898dea updated regexp 2017-07-29 17:44:57 +07:00
Jim Geovedi
7d96d477ea updated like_num 2017-07-29 17:44:46 +07:00
Jim Geovedi
3cca4ed798 added lex attrs rules 2017-07-29 17:22:21 +07:00
Jim Geovedi
8b814c63f1 more exceptions 2017-07-27 19:46:30 +07:00
Jim Geovedi
6c725e8dcf updated lemma 2017-07-27 19:46:21 +07:00
Jim Geovedi
c194f7ae26 Merge remote-tracking branch 'upstream/develop' into indonesian 2017-07-27 10:55:34 +07:00
Jim Geovedi
547973b92a wip syntax iterators 2017-07-27 10:51:34 +07:00
Jim Geovedi
bbc75da38d enable syntax iterator and lemma lookup 2017-07-27 10:51:15 +07:00
Jim Geovedi
24a8c8bf28 added wip lemma dict 2017-07-26 21:39:54 +07:00