Commit Graph

3448 Commits

Author SHA1 Message Date
Matthew Honnibal
307d615c5f Fix serialization for tagger when tag_map has changed 2017-06-01 12:18:36 -05:00
Matthew Honnibal
1d18cedae8 Fiddle with msgpack bytes vs unicode 2017-06-01 10:48:43 -05:00
ines
7a2380f617 Rename "nn_tagger" to "tagger" 2017-06-01 17:37:53 +02:00
ines
e5ae6ccf4e Fix typo 2017-06-01 16:46:15 +02:00
ines
a3e4f91f4a Only load vocab if it exists 2017-06-01 14:38:35 +02:00
Matthew Honnibal
d310b0aab3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-01 04:58:03 -05:00
Matthew Honnibal
3ff7d7fcef Merge for updated requirements 2017-06-01 04:57:47 -05:00
Matthew Honnibal
5eae3b9a1e Fix to/from disk in tagger 2017-06-01 04:55:49 -05:00
ines
d5c8d2f5fd Update about.py and increment version 2017-06-01 11:52:24 +02:00
Matthew Honnibal
4c97371051 Fixes for thinc 6.7 2017-06-01 04:22:16 -05:00
Matthew Honnibal
53d00a0371 Move weight serialization to Thinc 2017-06-01 03:04:36 -05:00
Matthew Honnibal
ae8010b526 Move weight serialization to Thinc 2017-06-01 02:56:12 -05:00
Gyorgy Orosz
f0c3b09242 More robust Hungarian tokenizer. 2017-05-31 22:28:40 +02:00
Matthew Honnibal
c8a58cfcf8 Fix Python2/3 load bug 2017-05-31 15:21:44 -05:00
Matthew Honnibal
99982684b0 Fix normalize_string_keys function 2017-05-31 14:08:16 -05:00
Matthew Honnibal
67ade63fc4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 08:28:42 -05:00
Matthew Honnibal
490b38e6bb Fix reference to thinc copy_array util 2017-05-31 08:25:21 -05:00
Matthew Honnibal
9805e0e369 Fix vocab pickling 2017-05-31 08:25:01 -05:00
Matthew Honnibal
6c51cd77b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 15:06:56 +02:00
Matthew Honnibal
8dfb9546f0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 07:21:14 -05:00
Matthew Honnibal
480ef8bfc8 Add compat function to normalize dict keys 2017-05-31 07:14:29 -05:00
Matthew Honnibal
92f9e5cc9a Silence env_opt, and fix serialization for GPU 2017-05-31 07:14:11 -05:00
Matthew Honnibal
0561df2a9d Fix tokenizer serialization 2017-05-31 14:12:38 +02:00
Matthew Honnibal
4a398c15b7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 13:44:16 +02:00
Matthew Honnibal
097ab9c6e4 Fix transition system to/from disk 2017-05-31 13:44:00 +02:00
Matthew Honnibal
b1469d3360 Fix string serialisation 2017-05-31 13:43:44 +02:00
Matthew Honnibal
e9419072e7 Fix tokenizer serialisation 2017-05-31 13:43:31 +02:00
Matthew Honnibal
33e5ec737f Fix to/from disk methods 2017-05-31 13:43:10 +02:00
ines
5e1c361270 Update tests README with info on model tests 2017-05-31 12:22:58 +02:00
Matthew Honnibal
fe28602f2e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 11:43:56 +02:00
Matthew Honnibal
66af019d5d Fix serialization of tokenizer 2017-05-31 11:43:40 +02:00
Ines Montani
e6cf3c7e1c Merge pull request #1093 from oroszgy/hu_emoji_fix
Fixed emoji handling for Hungarian
2017-05-31 11:33:24 +02:00
Matthew Honnibal
e98eff275d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 10:29:15 +02:00
Matthew Honnibal
53a3824334 Fix mistake in ner feature 2017-05-31 03:01:02 +02:00
Matthew Honnibal
8a693c2605 Write binary file during training 2017-05-31 02:59:18 +02:00
Matthew Honnibal
498ad85309 Try using tensor for vector/similarity methods 2017-05-30 23:35:17 +02:00
Matthew Honnibal
a131981f3b Work on vectors 2017-05-30 23:34:50 +02:00
Matthew Honnibal
6937e311a4 Update doc tests 2017-05-30 23:34:23 +02:00
Matthew Honnibal
cc911feab2 Fix bug in NER state 2017-05-30 22:12:19 +02:00
Gyorgy Orosz
8c0b4b850e Fixed emoji handling for Hungarian 2017-05-30 21:34:46 +02:00
Matthew Honnibal
be4a640f0c Fix arc eager label costs for uint64 2017-05-30 20:37:58 +02:00
Matthew Honnibal
b127645afc Fix test_misc merge conflict 2017-05-29 18:31:44 -05:00
Matthew Honnibal
e0e8eae7c7 Tweak package test 2017-05-29 18:30:42 -05:00
Matthew Honnibal
11840ff5dd Store tag map before normalizing props 2017-05-29 17:53:48 -05:00
Matthew Honnibal
b92a89f87b Make it easier to reference embedding tables 2017-05-29 17:53:29 -05:00
Matthew Honnibal
293d1b425b Serialize in consistent order 2017-05-29 17:53:06 -05:00
Matthew Honnibal
9bf22a94aa Fix tag set serialisation 2017-05-29 17:52:36 -05:00
Matthew Honnibal
2a061e2777 Fix serialisation, for reals this time 2017-05-29 17:52:08 -05:00
ines
20a7003c0d Update model fixtures and reorganise tests 2017-05-29 22:14:31 +02:00
ines
795fe43a4d Add load_test_model function with importorskip()
Loads model only if it can be imported, i.e. if it's installed as a
package.
2017-05-29 22:11:31 +02:00
ines
ad3c8b3ad9 Fix formatting 2017-05-29 22:10:50 +02:00
ines
6e3937efc5 Check for arguments of model markers to specify models to test
Lets user set --models --en for only English models
2017-05-29 22:10:16 +02:00
Matthew Honnibal
35d981241f Fix model deserialization 2017-05-29 14:46:31 -05:00
Matthew Honnibal
5b29f227ae Fix serialization 2017-05-29 14:35:53 -05:00
Matthew Honnibal
1e6df0a2a1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-29 14:30:12 -05:00
ines
08382f21e3 Pass model meta to nlp object in load_model 2017-05-29 20:44:11 +02:00
ines
6145fe6a93 Catch all kwargs on Language 2017-05-29 20:43:48 +02:00
ines
0d7d50fe22 Add __version__ to __init__.py 2017-05-29 20:43:24 +02:00
Matthew Honnibal
6522ea6c8b More serialization fixes. Still broken 2017-05-29 13:23:47 -05:00
Matthew Honnibal
9c9ee24411 Fix broken lambda scoping in Python 2 2017-05-29 13:23:28 -05:00
Matthew Honnibal
f1acdaab55 Fix serialization of weight offsets 2017-05-29 13:23:11 -05:00
Matthew Honnibal
c044e9c21c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-29 08:41:02 -05:00
Matthew Honnibal
aa4c33914b Work on serialization 2017-05-29 08:40:45 -05:00
ines
9e83a17e95 Use new model templates 2017-05-29 15:27:24 +02:00
ines
567485a818 Fix and document model loading with pipeline and overrides 2017-05-29 14:10:10 +02:00
Matthew Honnibal
deac7eb01c Fix for serialization 2017-05-29 13:54:18 +02:00
Matthew Honnibal
04c32aa091 Fix for serialization 2017-05-29 13:53:32 +02:00
Matthew Honnibal
a1960c2d09 Fix for serialization 2017-05-29 13:47:42 +02:00
Matthew Honnibal
7b06bb896e Fix for serialization 2017-05-29 13:42:55 +02:00
Matthew Honnibal
74235587ef Fix to serialization 2017-05-29 13:40:31 +02:00
Matthew Honnibal
59f355d525 Fixes for serialization 2017-05-29 13:38:20 +02:00
Matthew Honnibal
920887f4e4 Specify order of vocab deserialization 2017-05-29 13:04:40 +02:00
Matthew Honnibal
f4aafca222 Merge changes to test_misc 2017-05-29 12:26:02 +02:00
Matthew Honnibal
a318f0cae1 Add to/from disk/bytes methods for tokenizer 2017-05-29 12:24:41 +02:00
Matthew Honnibal
ff26aa6c37 Work on to/from bytes/disk serialization methods 2017-05-29 11:45:45 +02:00
ines
df920ba0e7 Add tests for displaCy and util functions and fix util typo 2017-05-29 10:51:19 +02:00
ines
c5714d4fb2 xfail matcher test for now until setting norm via Span.merge works 2017-05-29 10:51:02 +02:00
Matthew Honnibal
6b019b0540 Update to/from bytes methods 2017-05-29 10:14:20 +02:00
Matthew Honnibal
c91b121aeb Move serialization functions to util 2017-05-29 10:13:42 +02:00
Matthew Honnibal
1fa2bfb600 Add model_to_bytes and model_from_bytes helpers. Probably belong in thinc. 2017-05-29 09:27:04 +02:00
Matthew Honnibal
6dad4117ad Work on serialization for models 2017-05-29 01:37:57 +02:00
ines
7b1ddcc04d Add test for vocab serialization 2017-05-29 01:09:52 +02:00
ines
00b2094dc3 Fix typos, long integers and tests 2017-05-29 01:09:52 +02:00
ines
804dbb8d25 Add StringStore test for API docs 2017-05-29 01:09:52 +02:00
Matthew Honnibal
6cd5730ee7 Fix lex struct setters for strings 2017-05-29 01:05:09 +02:00
Matthew Honnibal
2edd96ce47 Draft Vocab to/from disk/bytes 2017-05-28 23:34:12 +02:00
Matthew Honnibal
4ddff020c3 Fix compile error 2017-05-28 23:30:40 +02:00
Matthew Honnibal
6d3caeadd2 Fix type check for long 2017-05-28 23:22:45 +02:00
Matthew Honnibal
92dbf28c1e Hack a fixture in the vectors tests, for xfail 2017-05-28 20:28:32 +02:00
Matthew Honnibal
9239f06ed3 Fix german noun chunks iterator 2017-05-28 20:13:03 +02:00
Matthew Honnibal
fd9b6722a9 Fix noun chunks iterator for new stringstore 2017-05-28 20:12:10 +02:00
ines
414193e9ba Update docs to reflect StringStore changes 2017-05-28 18:19:11 +02:00
Matthew Honnibal
7996d21717 Fixes for new StringStore 2017-05-28 11:09:27 -05:00
Matthew Honnibal
8a24c60c1e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 08:12:05 -05:00
Matthew Honnibal
bc97bc292c Fix __call__ method 2017-05-28 08:11:58 -05:00
Matthew Honnibal
5cf47b847b Handle iob with no tag in converter 2017-05-28 08:11:39 -05:00
Matthew Honnibal
fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal
b007a2b0d3 Update stringstore tests 2017-05-28 14:08:09 +02:00
Matthew Honnibal
84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
Matthew Honnibal
fe4a746300 Accommodate symbols in new string scheme 2017-05-28 13:03:16 +02:00
Matthew Honnibal
f51e6a6c16 Adjust lexeme sizing for attr_t being 64 bit 2017-05-28 12:51:09 +02:00
Matthew Honnibal
a5606c3eda Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
Matthew Honnibal
39293ab2ee Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 11:46:57 +02:00
Matthew Honnibal
dd052572d4 Update arc eager for SBD changes 2017-05-28 11:46:51 +02:00
Matthew Honnibal
3ea98e2043 Remove vector member from lexeme 2017-05-28 11:46:24 +02:00
Matthew Honnibal
2445707f3c Re-delegate vectors to vocab 2017-05-28 11:46:10 +02:00
Matthew Honnibal
6863d01361 Remove vectors from lexeme 2017-05-28 11:45:48 +02:00
Matthew Honnibal
15f6efc127 Remove vectors from vocab 2017-05-28 11:45:32 +02:00
Matthew Honnibal
c1263a844b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 18:32:57 -05:00
Matthew Honnibal
9e711c3476 Divide d_loss by batch size 2017-05-27 18:32:46 -05:00
Matthew Honnibal
b082f76494 Randomize pipeline order during training 2017-05-27 18:32:21 -05:00
Matthew Honnibal
a1d4c97fb7 Improve correctness of minibatching 2017-05-27 17:59:00 -05:00
ines
84189c1cab Add 'xx' language ID for multi-language support
Allows models to specify their language ID as 'xx'.
2017-05-28 00:58:59 +02:00
ines
33e332e67c Remove unused export 2017-05-28 00:57:59 +02:00
ines
c1983621fb Update util functions for model loading 2017-05-28 00:22:40 +02:00
ines
c8543c8237 Fix formatting and docstrings and remove deprecated function 2017-05-28 00:22:40 +02:00
Matthew Honnibal
49235017bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 16:34:28 -05:00
Matthew Honnibal
7ebd26b8aa Use ordered dict to specify transitions 2017-05-27 15:52:20 -05:00
Matthew Honnibal
3eea5383a1 Add move_names property to parser 2017-05-27 15:51:55 -05:00
Matthew Honnibal
8de9829f09 Don't overwrite model in initialization, when loading 2017-05-27 15:50:40 -05:00
Matthew Honnibal
99316fa631 Use ordered dict to specify actions 2017-05-27 15:50:21 -05:00
Matthew Honnibal
655ca58c16 Clarifying change to StateC.clone 2017-05-27 15:49:37 -05:00
Matthew Honnibal
5e4312feed Evaluate loaded class, to ensure save/load works 2017-05-27 15:47:02 -05:00
Matthew Honnibal
34bbad8e0e Add __reduce__ methods on parser subclasses. Fixes pickling. 2017-05-27 15:46:06 -05:00
Matthew Honnibal
7cc9c3e9a6 Fix convert CLI 2017-05-27 15:44:42 -05:00
ines
1203959625 Add pipeline setting to meta.json generator 2017-05-27 20:02:01 +02:00
ines
086a06e7d7 Fix CLI docstrings and add command as first argument
Workaround for Plac
2017-05-27 20:01:46 +02:00
ines
a8e58e04ef Add symbols class to punctuation rules to handle emoji (see #1088)
Currently doesn't work for Hungarian, because of conflicts with the
custom punctuation rules. Also doesn't take multi-character emoji like
👩🏽‍💻 into account.
2017-05-27 17:57:10 +02:00
Matthew Honnibal
dc07d72d80 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 08:20:40 -05:00
Matthew Honnibal
de13fe0305 Remove length cap on sentences 2017-05-27 08:20:32 -05:00
Matthew Honnibal
73a643d32a Don't randomise pipeline for training, and don't update if no gradient 2017-05-27 08:20:13 -05:00
Matthew Honnibal
3d22fcaf0b Return None from parser if there are no annotations 2017-05-26 14:02:59 -05:00
Matthew Honnibal
d06f235fc9 Fix conflict on convert.py 2017-05-26 11:33:29 -05:00
Matthew Honnibal
2e587c6417 Export iob_to_biluo utility 2017-05-26 11:32:55 -05:00
Matthew Honnibal
2b3b937a04 Fix converter CLI 2017-05-26 11:32:41 -05:00
Matthew Honnibal
5a87bcf35f Fix converters 2017-05-26 11:32:34 -05:00
Matthew Honnibal
8af3100143 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-26 11:31:41 -05:00
Matthew Honnibal
3d5a536eaa Improve efficiency of parser batching 2017-05-26 11:31:23 -05:00
Matthew Honnibal
daac3e3573 Always shuffle gold data, and support length cap 2017-05-26 11:30:52 -05:00
Matthew Honnibal
d65f99a720 Improve model saving in train script 2017-05-26 05:52:09 -05:00
ines
51882c4984 Fix formatting 2017-05-26 12:37:45 +02:00
ines
353f0ef8d7 Use disable argument (list) for serialization 2017-05-26 12:33:54 +02:00
Matthew Honnibal
22d7b448a5 Fix convert command 2017-05-25 19:47:12 -05:00
Matthew Honnibal
dbf2a4cf57 Update all models on each epoch 2017-05-25 19:46:56 -05:00
Matthew Honnibal
faff1c23fb Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-25 17:16:10 -05:00
Matthew Honnibal
82b11b0320 Remove print statement 2017-05-25 17:15:59 -05:00
Matthew Honnibal
80cf42e33b Fix compounding and decaying utils 2017-05-25 17:15:39 -05:00
Matthew Honnibal
df8015f05d Tweaks to train script 2017-05-25 17:15:24 -05:00
Matthew Honnibal
3a6e59cc53 Add minibatch function in spacy.gold 2017-05-25 17:15:09 -05:00
Matthew Honnibal
702fe74a4d Clean up spacy.cli.train 2017-05-25 16:16:30 -05:00
Matthew Honnibal
b9cea9cd93 Add compounding and decaying functions 2017-05-25 16:16:10 -05:00
Matthew Honnibal
2cb7cc2db7 Remove commented code from parser 2017-05-25 14:55:09 -05:00
Matthew Honnibal
f403c2cd5f Add env opts for optimizer 2017-05-25 11:19:26 -05:00
Matthew Honnibal
c245ff6b27 Rebatch parser inputs, with mid-sentence states 2017-05-25 11:18:59 -05:00
Matthew Honnibal
679efe79c8 Make parser update less hacky 2017-05-25 06:49:00 -05:00
Matthew Honnibal
8500d9b1da Only train one task per iter, holding grads 2017-05-25 06:47:42 -05:00
Matthew Honnibal
b27c587800 Fix pieces argument to PrecomputedMaxout 2017-05-25 06:46:59 -05:00
Matthew Honnibal
e1cb5be0c7 Adjust dropout, depth and multi-task in parser 2017-05-24 20:11:41 -05:00
Matthew Honnibal
e6cc927ab1 Rearrange multi-task learning 2017-05-24 20:10:54 -05:00
Matthew Honnibal
135a13790c Disable gold preprocessing 2017-05-24 20:10:20 -05:00
Matthew Honnibal
467bbeadb8 Add hidden layers for tagger 2017-05-24 20:09:51 -05:00
ines
66088851dc Add Doc.to_disk() and Doc.from_disk() methods 2017-05-24 11:58:17 +02:00
Matthew Honnibal
620df0414f Fix dropout in parser 2017-05-23 15:20:45 -05:00
Matthew Honnibal
5b67bcbee0 Increase default embed size to 7500 2017-05-23 15:20:16 -05:00
Matthew Honnibal
48eef94f92 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 18:47:32 +02:00
Matthew Honnibal
d44b1eafc4 Fix conflict artefacts 2017-05-23 18:47:11 +02:00
Matthew Honnibal
01e59e4e6e * Add Token.sent_start property, re Issue #235 2017-05-23 18:41:11 +02:00
Matthew Honnibal
4917cbb484 Include sent_start test 2017-05-23 18:40:37 +02:00
Matthew Honnibal
d68dd1f251 Add SENT_START attribute, for custom sentence boundary detection 2017-05-23 18:37:58 +02:00
Matthew Honnibal
8026c183d0 Add hacky logic to accelerate depth=0 case in parser 2017-05-23 11:06:49 -05:00
Matthew Honnibal
e7d3159d91 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 05:58:17 -05:00
Matthew Honnibal
a8b6d11c5b Support optional maxout layer 2017-05-23 05:58:07 -05:00
Matthew Honnibal
c55b8fa7c5 Fix bugs in parse_batch 2017-05-23 05:57:52 -05:00
ines
fb0ff0272f xfail neural parser tests for now and remove test for deprecated method 2017-05-23 12:40:37 +02:00
Matthew Honnibal
964707d795 Restore support for deeper networks in parser 2017-05-23 05:31:13 -05:00
Matthew Honnibal
e27262f431 Go back to previous matcher signature, with on_match positional 2017-05-23 04:37:40 -05:00
Matthew Honnibal
5418bcf5d7 Resolve conflict on test 2017-05-23 04:37:16 -05:00
ines
e6acd3bbf2 Fix matcher tests and matcher docs 2017-05-23 11:36:02 +02:00
ines
d0c6d4f76d Fix formatting 2017-05-23 11:32:00 +02:00
Matthew Honnibal
f0bcc0bd8d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 04:29:28 -05:00
Matthew Honnibal
9adfe9e8fc Don't hold gradient updates in language -- let the parser decide how to batch the updates. 2017-05-23 04:29:10 -05:00
Matthew Honnibal
6b918cc58e Support making updates periodically during training 2017-05-23 04:23:29 -05:00
Matthew Honnibal
3f725ff7b3 Roll back changes to parser update 2017-05-23 04:23:05 -05:00
Matthew Honnibal
3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal
532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal
bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal
8a9e318deb Put the parsing loop in a nogil prange block 2017-05-22 17:58:12 -05:00
ines
a23f487b06 Tidy up displaCy and add "manual" option
Also don't require title in EntityRenderer
2017-05-22 18:48:20 +02:00
Matthew Honnibal
0264447c4d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 10:41:56 -05:00
Matthew Honnibal
6e8dce2c05 Fix train command line args 2017-05-22 10:41:39 -05:00
Matthew Honnibal
a7ee63c0ac Fix labeller loss for unseen labels 2017-05-22 10:41:20 -05:00
Matthew Honnibal
c9760b2104 Support sentence limits in GoldCorpus 2017-05-22 10:40:46 -05:00
Matthew Honnibal
e2136232f9 Exclude states with no matching gold annotations from parsing 2017-05-22 10:30:12 -05:00
Matthew Honnibal
83ffd16474 Fix offset calculation for other negative values 2017-05-22 08:00:53 -05:00
ines
b3c7ee0148 Fix tests and use the new Matcher API 2017-05-22 13:54:20 +02:00
Matthew Honnibal
f00f821496 Fix pseudoprojectivity->nonproj 2017-05-22 06:14:42 -05:00
Matthew Honnibal
ae8cf70dc1 Fix CLI train signature 2017-05-22 06:13:39 -05:00
Matthew Honnibal
187f370734 Update tests for matcher changes 2017-05-22 12:59:50 +02:00
Matthew Honnibal
5d59e74cf6 PseudoProjectivity->nonproj 2017-05-22 05:49:53 -05:00
Matthew Honnibal
7e2cdc0c81 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 12:39:34 +02:00