Matthew Honnibal
53a3824334
Fix mistake in ner feature
2017-05-31 03:01:02 +02:00
Matthew Honnibal
8a693c2605
Write binary file during training
2017-05-31 02:59:18 +02:00
Matthew Honnibal
498ad85309
Try using tensor for vector/similarity methods
2017-05-30 23:35:17 +02:00
Matthew Honnibal
a131981f3b
Work on vectors
2017-05-30 23:34:50 +02:00
Matthew Honnibal
6937e311a4
Update doc tests
2017-05-30 23:34:23 +02:00
Matthew Honnibal
cc911feab2
Fix bug in NER state
2017-05-30 22:12:19 +02:00
Gyorgy Orosz
8c0b4b850e
Fixed emoji handling for Hungarian
2017-05-30 21:34:46 +02:00
Matthew Honnibal
be4a640f0c
Fix arc eager label costs for uint64
2017-05-30 20:37:58 +02:00
Matthew Honnibal
b127645afc
Fix test_misc merge conflict
2017-05-29 18:31:44 -05:00
Matthew Honnibal
e0e8eae7c7
Tweak package test
2017-05-29 18:30:42 -05:00
Matthew Honnibal
11840ff5dd
Store tag map before normalizing props
2017-05-29 17:53:48 -05:00
Matthew Honnibal
b92a89f87b
Make it easier to reference embedding tables
2017-05-29 17:53:29 -05:00
Matthew Honnibal
293d1b425b
Serialize in consistent order
2017-05-29 17:53:06 -05:00
Matthew Honnibal
9bf22a94aa
Fix tag set serialisation
2017-05-29 17:52:36 -05:00
Matthew Honnibal
2a061e2777
Fix serialisation, for reals this time
2017-05-29 17:52:08 -05:00
ines
20a7003c0d
Update model fixtures and reorganise tests
2017-05-29 22:14:31 +02:00
ines
795fe43a4d
Add load_test_model function with importorskip()
Loads model only if it can be imported, i.e. if it's installed as a
package.
2017-05-29 22:11:31 +02:00
ines
ad3c8b3ad9
Fix formatting
2017-05-29 22:10:50 +02:00
ines
6e3937efc5
Check for arguments of model markers to specify models to test
Lets user set --models --en for only English models
2017-05-29 22:10:16 +02:00
Matthew Honnibal
35d981241f
Fix model deserialization
2017-05-29 14:46:31 -05:00
Matthew Honnibal
5b29f227ae
Fix serialization
2017-05-29 14:35:53 -05:00
Matthew Honnibal
1e6df0a2a1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-29 14:30:12 -05:00
ines
08382f21e3
Pass model meta to nlp object in load_model
2017-05-29 20:44:11 +02:00
ines
6145fe6a93
Catch all kwargs on Language
2017-05-29 20:43:48 +02:00
ines
0d7d50fe22
Add __version__ to __init__.py
2017-05-29 20:43:24 +02:00
Matthew Honnibal
6522ea6c8b
More serialization fixes. Still broken
2017-05-29 13:23:47 -05:00
Matthew Honnibal
9c9ee24411
Fix broken lambda scoping in Python 2
2017-05-29 13:23:28 -05:00
Matthew Honnibal
f1acdaab55
Fix serialization of weight offsets
2017-05-29 13:23:11 -05:00
Matthew Honnibal
c044e9c21c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-29 08:41:02 -05:00
Matthew Honnibal
aa4c33914b
Work on serialization
2017-05-29 08:40:45 -05:00
ines
9e83a17e95
Use new model templates
2017-05-29 15:27:24 +02:00
ines
567485a818
Fix and document model loading with pipeline and overrides
2017-05-29 14:10:10 +02:00
Matthew Honnibal
deac7eb01c
Fix for serialization
2017-05-29 13:54:18 +02:00
Matthew Honnibal
04c32aa091
Fix for serialization
2017-05-29 13:53:32 +02:00
Matthew Honnibal
a1960c2d09
Fix for serialization
2017-05-29 13:47:42 +02:00
Matthew Honnibal
7b06bb896e
Fix for serialization
2017-05-29 13:42:55 +02:00
Matthew Honnibal
74235587ef
Fix to serialization
2017-05-29 13:40:31 +02:00
Matthew Honnibal
59f355d525
Fixes for serialization
2017-05-29 13:38:20 +02:00
Matthew Honnibal
920887f4e4
Specify order of vocab deserialization
2017-05-29 13:04:40 +02:00
Matthew Honnibal
f4aafca222
Merge changes to test_misc
2017-05-29 12:26:02 +02:00
Matthew Honnibal
a318f0cae1
Add to/from disk/bytes methods for tokenizer
2017-05-29 12:24:41 +02:00
Matthew Honnibal
ff26aa6c37
Work on to/from bytes/disk serialization methods
2017-05-29 11:45:45 +02:00
ines
df920ba0e7
Add tests for displaCy and util functions and fix util typo
2017-05-29 10:51:19 +02:00
ines
c5714d4fb2
xfail matcher test for now until setting norm via Span.merge works
2017-05-29 10:51:02 +02:00
Matthew Honnibal
6b019b0540
Update to/from bytes methods
2017-05-29 10:14:20 +02:00
Matthew Honnibal
c91b121aeb
Move serialization functions to util
2017-05-29 10:13:42 +02:00
Matthew Honnibal
1fa2bfb600
Add model_to_bytes and model_from_bytes helpers. Probably belong in thinc.
2017-05-29 09:27:04 +02:00
Matthew Honnibal
6dad4117ad
Work on serialization for models
2017-05-29 01:37:57 +02:00
ines
7b1ddcc04d
Add test for vocab serialization
2017-05-29 01:09:52 +02:00
ines
00b2094dc3
Fix typos, long integers and tests
2017-05-29 01:09:52 +02:00
ines
804dbb8d25
Add StringStore test for API docs
2017-05-29 01:09:52 +02:00
Matthew Honnibal
6cd5730ee7
Fix lex struct setters for strings
2017-05-29 01:05:09 +02:00
Matthew Honnibal
2edd96ce47
Draft Vocab to/from disk/bytes
2017-05-28 23:34:12 +02:00
Matthew Honnibal
4ddff020c3
Fix compile error
2017-05-28 23:30:40 +02:00
Matthew Honnibal
6d3caeadd2
Fix type check for long
2017-05-28 23:22:45 +02:00
Matthew Honnibal
92dbf28c1e
Hack a fixture in the vectors tests, for xfail
2017-05-28 20:28:32 +02:00
Matthew Honnibal
9239f06ed3
Fix german noun chunks iterator
2017-05-28 20:13:03 +02:00
Matthew Honnibal
fd9b6722a9
Fix noun chunks iterator for new stringstore
2017-05-28 20:12:10 +02:00
ines
414193e9ba
Update docs to reflect StringStore changes
2017-05-28 18:19:11 +02:00
Matthew Honnibal
7996d21717
Fixes for new StringStore
2017-05-28 11:09:27 -05:00
Matthew Honnibal
8a24c60c1e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-28 08:12:05 -05:00
Matthew Honnibal
bc97bc292c
Fix __call__ method
2017-05-28 08:11:58 -05:00
Matthew Honnibal
5cf47b847b
Handle iob with no tag in converter
2017-05-28 08:11:39 -05:00
Matthew Honnibal
fe11564b8e
Finish stringstore change. Also xfail vectors tests
2017-05-28 15:10:22 +02:00
Matthew Honnibal
b007a2b0d3
Update stringstore tests
2017-05-28 14:08:09 +02:00
Matthew Honnibal
84e66ca6d4
WIP on stringstore change. 27 failures
2017-05-28 14:06:40 +02:00
Matthew Honnibal
fe4a746300
Accommodate symbols in new string scheme
2017-05-28 13:03:16 +02:00
Matthew Honnibal
f51e6a6c16
Adjust lexeme sizing for attr_t being 64 bit
2017-05-28 12:51:09 +02:00
Matthew Honnibal
a5606c3eda
Work on changing StringStore to return hashes.
2017-05-28 12:36:27 +02:00
Matthew Honnibal
39293ab2ee
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-28 11:46:57 +02:00
Matthew Honnibal
dd052572d4
Update arc eager for SBD changes
2017-05-28 11:46:51 +02:00
Matthew Honnibal
3ea98e2043
Remove vector member from lexeme
2017-05-28 11:46:24 +02:00
Matthew Honnibal
2445707f3c
Re-delegate vectors to vocab
2017-05-28 11:46:10 +02:00
Matthew Honnibal
6863d01361
Remove vectors from lexeme
2017-05-28 11:45:48 +02:00
Matthew Honnibal
15f6efc127
Remove vectors from vocab
2017-05-28 11:45:32 +02:00
Matthew Honnibal
c1263a844b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-27 18:32:57 -05:00
Matthew Honnibal
9e711c3476
Divide d_loss by batch size
2017-05-27 18:32:46 -05:00
Matthew Honnibal
b082f76494
Randomize pipeline order during training
2017-05-27 18:32:21 -05:00
Matthew Honnibal
a1d4c97fb7
Improve correctness of minibatching
2017-05-27 17:59:00 -05:00
ines
84189c1cab
Add 'xx' language ID for multi-language support
Allows models to specify their language ID as 'xx'.
2017-05-28 00:58:59 +02:00
ines
33e332e67c
Remove unused export
2017-05-28 00:57:59 +02:00
ines
c1983621fb
Update util functions for model loading
2017-05-28 00:22:40 +02:00
ines
c8543c8237
Fix formatting and docstrings and remove deprecated function
2017-05-28 00:22:40 +02:00
Matthew Honnibal
49235017bf
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-27 16:34:28 -05:00
Matthew Honnibal
7ebd26b8aa
Use ordered dict to specify transitions
2017-05-27 15:52:20 -05:00
Matthew Honnibal
3eea5383a1
Add move_names property to parser
2017-05-27 15:51:55 -05:00
Matthew Honnibal
8de9829f09
Don't overwrite model in initialization, when loading
2017-05-27 15:50:40 -05:00
Matthew Honnibal
99316fa631
Use ordered dict to specify actions
2017-05-27 15:50:21 -05:00
Matthew Honnibal
655ca58c16
Clarifying change to StateC.clone
2017-05-27 15:49:37 -05:00
Matthew Honnibal
5e4312feed
Evaluate loaded class, to ensure save/load works
2017-05-27 15:47:02 -05:00
Matthew Honnibal
34bbad8e0e
Add __reduce__ methods on parser subclasses. Fixes pickling.
2017-05-27 15:46:06 -05:00
Matthew Honnibal
7cc9c3e9a6
Fix convert CLI
2017-05-27 15:44:42 -05:00
ines
1203959625
Add pipeline setting to meta.json generator
2017-05-27 20:02:01 +02:00
ines
086a06e7d7
Fix CLI docstrings and add command as first argument
Workaround for Plac
2017-05-27 20:01:46 +02:00
ines
a8e58e04ef
Add symbols class to punctuation rules to handle emoji (see #1088)
Currently doesn't work for Hungarian, because of conflicts with the
custom punctuation rules. Also doesn't take multi-character emoji like
👩🏽💻 into account.
2017-05-27 17:57:10 +02:00
Matthew Honnibal
dc07d72d80
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-27 08:20:40 -05:00
Matthew Honnibal
de13fe0305
Remove length cap on sentences
2017-05-27 08:20:32 -05:00
Matthew Honnibal
73a643d32a
Don't randomise pipeline for training, and don't update if no gradient
2017-05-27 08:20:13 -05:00
Matthew Honnibal
3d22fcaf0b
Return None from parser if there are no annotations
2017-05-26 14:02:59 -05:00
Matthew Honnibal
d06f235fc9
Fix conflict on convert.py
2017-05-26 11:33:29 -05:00
Matthew Honnibal
2e587c6417
Export iob_to_biluo utility
2017-05-26 11:32:55 -05:00
Matthew Honnibal
2b3b937a04
Fix converter CLI
2017-05-26 11:32:41 -05:00
Matthew Honnibal
5a87bcf35f
Fix converters
2017-05-26 11:32:34 -05:00
Matthew Honnibal
8af3100143
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-26 11:31:41 -05:00
Matthew Honnibal
3d5a536eaa
Improve efficiency of parser batching
2017-05-26 11:31:23 -05:00
Matthew Honnibal
daac3e3573
Always shuffle gold data, and support length cap
2017-05-26 11:30:52 -05:00
Matthew Honnibal
d65f99a720
Improve model saving in train script
2017-05-26 05:52:09 -05:00
ines
51882c4984
Fix formatting
2017-05-26 12:37:45 +02:00
ines
353f0ef8d7
Use disable argument (list) for serialization
2017-05-26 12:33:54 +02:00
Matthew Honnibal
22d7b448a5
Fix convert command
2017-05-25 19:47:12 -05:00
Matthew Honnibal
dbf2a4cf57
Update all models on each epoch
2017-05-25 19:46:56 -05:00
Matthew Honnibal
faff1c23fb
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-25 17:16:10 -05:00
Matthew Honnibal
82b11b0320
Remove print statement
2017-05-25 17:15:59 -05:00
Matthew Honnibal
80cf42e33b
Fix compounding and decaying utils
2017-05-25 17:15:39 -05:00
Matthew Honnibal
df8015f05d
Tweaks to train script
2017-05-25 17:15:24 -05:00
Matthew Honnibal
3a6e59cc53
Add minibatch function in spacy.gold
2017-05-25 17:15:09 -05:00
Matthew Honnibal
702fe74a4d
Clean up spacy.cli.train
2017-05-25 16:16:30 -05:00
Matthew Honnibal
b9cea9cd93
Add compounding and decaying functions
2017-05-25 16:16:10 -05:00
Matthew Honnibal
2cb7cc2db7
Remove commented code from parser
2017-05-25 14:55:09 -05:00
Matthew Honnibal
f403c2cd5f
Add env opts for optimizer
2017-05-25 11:19:26 -05:00
Matthew Honnibal
c245ff6b27
Rebatch parser inputs, with mid-sentence states
2017-05-25 11:18:59 -05:00
Matthew Honnibal
679efe79c8
Make parser update less hacky
2017-05-25 06:49:00 -05:00
Matthew Honnibal
8500d9b1da
Only train one task per iter, holding grads
2017-05-25 06:47:42 -05:00
Matthew Honnibal
b27c587800
Fix pieces argument to PrecomputedMaxout
2017-05-25 06:46:59 -05:00
Matthew Honnibal
e1cb5be0c7
Adjust dropout, depth and multi-task in parser
2017-05-24 20:11:41 -05:00
Matthew Honnibal
e6cc927ab1
Rearrange multi-task learning
2017-05-24 20:10:54 -05:00
Matthew Honnibal
135a13790c
Disable gold preprocessing
2017-05-24 20:10:20 -05:00
Matthew Honnibal
467bbeadb8
Add hidden layers for tagger
2017-05-24 20:09:51 -05:00
ines
66088851dc
Add Doc.to_disk() and Doc.from_disk() methods
2017-05-24 11:58:17 +02:00
Matthew Honnibal
620df0414f
Fix dropout in parser
2017-05-23 15:20:45 -05:00
Matthew Honnibal
5b67bcbee0
Increase default embed size to 7500
2017-05-23 15:20:16 -05:00
Matthew Honnibal
48eef94f92
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-23 18:47:32 +02:00
Matthew Honnibal
d44b1eafc4
Fix conflict artefacts
2017-05-23 18:47:11 +02:00
Matthew Honnibal
01e59e4e6e
* Add Token.sent_start property, re Issue #235
2017-05-23 18:41:11 +02:00
Matthew Honnibal
4917cbb484
Include sent_start test
2017-05-23 18:40:37 +02:00
Matthew Honnibal
d68dd1f251
Add SENT_START attribute, for custom sentence boundary detection
2017-05-23 18:37:58 +02:00
Matthew Honnibal
8026c183d0
Add hacky logic to accelerate depth=0 case in parser
2017-05-23 11:06:49 -05:00
Matthew Honnibal
e7d3159d91
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-23 05:58:17 -05:00
Matthew Honnibal
a8b6d11c5b
Support optional maxout layer
2017-05-23 05:58:07 -05:00
Matthew Honnibal
c55b8fa7c5
Fix bugs in parse_batch
2017-05-23 05:57:52 -05:00
ines
fb0ff0272f
xfail neural parser tests for now and remove test for deprecated method
2017-05-23 12:40:37 +02:00
Matthew Honnibal
964707d795
Restore support for deeper networks in parser
2017-05-23 05:31:13 -05:00
Matthew Honnibal
e27262f431
Go back to previous matcher signature, with on_match positional
2017-05-23 04:37:40 -05:00
Matthew Honnibal
5418bcf5d7
Resolve conflict on test
2017-05-23 04:37:16 -05:00
ines
e6acd3bbf2
Fix matcher tests and matcher docs
2017-05-23 11:36:02 +02:00
ines
d0c6d4f76d
Fix formatting
2017-05-23 11:32:00 +02:00
Matthew Honnibal
f0bcc0bd8d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-23 04:29:28 -05:00
Matthew Honnibal
9adfe9e8fc
Don't hold gradient updates in language -- let the parser decide how to batch the updates.
2017-05-23 04:29:10 -05:00
Matthew Honnibal
6b918cc58e
Support making updates periodically during training
2017-05-23 04:23:29 -05:00
Matthew Honnibal
3f725ff7b3
Roll back changes to parser update
2017-05-23 04:23:05 -05:00
Matthew Honnibal
3959d778ac
Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal
532afef4a8
Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal
bdaac7ab44
WIP on improving parser efficiency
2017-05-23 02:59:31 -05:00
Matthew Honnibal
8a9e318deb
Put the parsing loop in a nogil prange block
2017-05-22 17:58:12 -05:00
ines
a23f487b06
Tidy up displaCy and add "manual" option
Also don't require title in EntityRenderer
2017-05-22 18:48:20 +02:00
Matthew Honnibal
0264447c4d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-22 10:41:56 -05:00
Matthew Honnibal
6e8dce2c05
Fix train command line args
2017-05-22 10:41:39 -05:00
Matthew Honnibal
a7ee63c0ac
Fix labeller loss for unseen labels
2017-05-22 10:41:20 -05:00
Matthew Honnibal
c9760b2104
Support sentence limits in GoldCorpus
2017-05-22 10:40:46 -05:00
Matthew Honnibal
e2136232f9
Exclude states with no matching gold annotations from parsing
2017-05-22 10:30:12 -05:00
Matthew Honnibal
83ffd16474
Fix offset calculation for other negative values
2017-05-22 08:00:53 -05:00
ines
b3c7ee0148
Fix tests and use the new Matcher API
2017-05-22 13:54:20 +02:00
Matthew Honnibal
f00f821496
Fix pseudoprojectivity->nonproj
2017-05-22 06:14:42 -05:00
Matthew Honnibal
ae8cf70dc1
Fix CLI train signature
2017-05-22 06:13:39 -05:00
Matthew Honnibal
187f370734
Update tests for matcher changes
2017-05-22 12:59:50 +02:00
Matthew Honnibal
5d59e74cf6
PseudoProjectivity->nonproj
2017-05-22 05:49:53 -05:00
Matthew Honnibal
7e2cdc0c81
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-22 12:39:34 +02:00
Matthew Honnibal
70a8c531cd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-22 05:39:18 -05:00
Matthew Honnibal
2f78413a02
PseudoProjectivity->nonproj
2017-05-22 05:39:03 -05:00
Matthew Honnibal
89ebc5c3cd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-22 12:38:15 +02:00
Matthew Honnibal
d8bb5bb959
Implement StringStore serialization, and update tests
2017-05-22 12:38:00 +02:00
ines
54f04a9fe0
Update API docs with changes in spacy.gold and spacy.language
2017-05-22 12:29:30 +02:00
ines
b5fb43fdd8
Allow sys.exit status as exits keyword arg in util.prints()
2017-05-22 12:29:15 +02:00
ines
fc3ec733ea
Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
Matthew Honnibal
b45b4aa392
PseudoProjectivity --> nonproj
2017-05-22 05:17:44 -05:00
Matthew Honnibal
aae97f00e9
Fix nonproj import
2017-05-22 05:15:06 -05:00
Matthew Honnibal
9262fc4829
Fix syntax error
2017-05-22 05:14:59 -05:00
Matthew Honnibal
93a042253b
Make GoldParse attributes writeable
2017-05-22 04:51:08 -05:00
Matthew Honnibal
2a5eb9f61e
Make nonproj methods top-level functions, instead of class methods
2017-05-22 04:51:08 -05:00
Matthew Honnibal
c998776c25
Make single array for features, to reduce GPU copies
2017-05-22 04:51:08 -05:00
Matthew Honnibal
bc2294d7f1
Add support for fiddly hyper-parameters to train func
2017-05-22 04:51:08 -05:00
Matthew Honnibal
80e19a2399
Simplify CLI implementation for subcommands. Remove model command.
2017-05-22 04:51:08 -05:00
Matthew Honnibal
33e2222839
Remove unused code in deprojectivize
2017-05-22 04:51:08 -05:00
Matthew Honnibal
4e0988605a
Pass through non-projective=True
2017-05-22 04:51:08 -05:00
Matthew Honnibal
025d9bbc37
Fix handling of non-projective deps
2017-05-22 04:51:08 -05:00
Matthew Honnibal
5738d373d5
Add deprojectivize to pipeline
2017-05-22 04:51:08 -05:00
Matthew Honnibal
1b5fa68996
Do pseudo-projective pre-processing for parser
2017-05-22 04:51:08 -05:00
Matthew Honnibal
1d5d9838a2
Fix action collection for parser
2017-05-22 04:51:08 -05:00
Matthew Honnibal
8d1e64be69
Add experimental NeuralLabeller
2017-05-22 04:51:08 -05:00
Matthew Honnibal
9b1b0742fd
Fix prediction for tok2vec
2017-05-22 04:51:08 -05:00
Matthew Honnibal
f13d6c7359
Support gold preprocessing and single gold files
2017-05-22 04:51:08 -05:00
Matthew Honnibal
e14533757b
Use averaged params for evaluation
2017-05-22 04:51:08 -05:00
Matthew Honnibal
7811d97339
Refactor CLI
2017-05-22 04:51:08 -05:00
Matthew Honnibal
5db89053aa
Merge docstrings
2017-05-21 13:46:23 -05:00
Matthew Honnibal
432b3499b3
Fix memory leak
2017-05-21 13:38:46 -05:00
Matthew Honnibal
59fbfb3829
Remove train.py -- functions now in GoldCorpus and Language
2017-05-21 09:08:27 -05:00
Matthew Honnibal
8904814c0e
Add missing import
2017-05-21 09:07:56 -05:00
Matthew Honnibal
baf3ef0ddc
Remove import of removed train_config script
2017-05-21 09:07:34 -05:00
Matthew Honnibal
4c9202249d
Refactor training, to fix memory leak
2017-05-21 09:07:06 -05:00
Matthew Honnibal
4803b3b69e
Add GoldCorpus class, to manage data streaming
2017-05-21 09:06:17 -05:00