Matthew Honnibal
|
3d22fcaf0b
|
Return None from parser if there are no annotations
|
2017-05-26 14:02:59 -05:00 |
|
Matthew Honnibal
|
d06f235fc9
|
Fix conflict on convert.py
|
2017-05-26 11:33:29 -05:00 |
|
Matthew Honnibal
|
2e587c6417
|
Export iob_to_biluo utility
|
2017-05-26 11:32:55 -05:00 |
|
Matthew Honnibal
|
2b3b937a04
|
Fix converter CLI
|
2017-05-26 11:32:41 -05:00 |
|
Matthew Honnibal
|
5a87bcf35f
|
Fix converters
|
2017-05-26 11:32:34 -05:00 |
|
Matthew Honnibal
|
8af3100143
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-26 11:31:41 -05:00 |
|
Matthew Honnibal
|
3d5a536eaa
|
Improve efficiency of parser batching
|
2017-05-26 11:31:23 -05:00 |
|
Matthew Honnibal
|
daac3e3573
|
Always shuffle gold data, and support length cap
|
2017-05-26 11:30:52 -05:00 |
|
Matthew Honnibal
|
d65f99a720
|
Improve model saving in train script
|
2017-05-26 05:52:09 -05:00 |
|
ines
|
51882c4984
|
Fix formatting
|
2017-05-26 12:37:45 +02:00 |
|
ines
|
353f0ef8d7
|
Use disable argument (list) for serialization
|
2017-05-26 12:33:54 +02:00 |
|
Matthew Honnibal
|
22d7b448a5
|
Fix convert command
|
2017-05-25 19:47:12 -05:00 |
|
Matthew Honnibal
|
dbf2a4cf57
|
Update all models on each epoch
|
2017-05-25 19:46:56 -05:00 |
|
Matthew Honnibal
|
faff1c23fb
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-25 17:16:10 -05:00 |
|
Matthew Honnibal
|
82b11b0320
|
Remove print statement
|
2017-05-25 17:15:59 -05:00 |
|
Matthew Honnibal
|
80cf42e33b
|
Fix compounding and decaying utils
|
2017-05-25 17:15:39 -05:00 |
|
Matthew Honnibal
|
df8015f05d
|
Tweaks to train script
|
2017-05-25 17:15:24 -05:00 |
|
Matthew Honnibal
|
3a6e59cc53
|
Add minibatch function in spacy.gold
|
2017-05-25 17:15:09 -05:00 |
|
Matthew Honnibal
|
702fe74a4d
|
Clean up spacy.cli.train
|
2017-05-25 16:16:30 -05:00 |
|
Matthew Honnibal
|
b9cea9cd93
|
Add compounding and decaying functions
|
2017-05-25 16:16:10 -05:00 |
|
Matthew Honnibal
|
2cb7cc2db7
|
Remove commented code from parser
|
2017-05-25 14:55:09 -05:00 |
|
Matthew Honnibal
|
f403c2cd5f
|
Add env opts for optimizer
|
2017-05-25 11:19:26 -05:00 |
|
Matthew Honnibal
|
c245ff6b27
|
Rebatch parser inputs, with mid-sentence states
|
2017-05-25 11:18:59 -05:00 |
|
Matthew Honnibal
|
679efe79c8
|
Make parser update less hacky
|
2017-05-25 06:49:00 -05:00 |
|
Matthew Honnibal
|
8500d9b1da
|
Only train one task per iter, holding grads
|
2017-05-25 06:47:42 -05:00 |
|
Matthew Honnibal
|
b27c587800
|
Fix pieces argument to PrecomputedMaxout
|
2017-05-25 06:46:59 -05:00 |
|
Matthew Honnibal
|
e1cb5be0c7
|
Adjust dropout, depth and multi-task in parser
|
2017-05-24 20:11:41 -05:00 |
|
Matthew Honnibal
|
e6cc927ab1
|
Rearrange multi-task learning
|
2017-05-24 20:10:54 -05:00 |
|
Matthew Honnibal
|
135a13790c
|
Disable gold preprocessing
|
2017-05-24 20:10:20 -05:00 |
|
Matthew Honnibal
|
467bbeadb8
|
Add hidden layers for tagger
|
2017-05-24 20:09:51 -05:00 |
|
ines
|
66088851dc
|
Add Doc.to_disk() and Doc.from_disk() methods
|
2017-05-24 11:58:17 +02:00 |
|
Matthew Honnibal
|
620df0414f
|
Fix dropout in parser
|
2017-05-23 15:20:45 -05:00 |
|
Matthew Honnibal
|
5b67bcbee0
|
Increase default embed size to 7500
|
2017-05-23 15:20:16 -05:00 |
|
Matthew Honnibal
|
48eef94f92
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-23 18:47:32 +02:00 |
|
Matthew Honnibal
|
d44b1eafc4
|
Fix conflict artefacts
|
2017-05-23 18:47:11 +02:00 |
|
Matthew Honnibal
|
01e59e4e6e
|
Add Token.sent_start property, re Issue #235
|
2017-05-23 18:41:11 +02:00 |
|
Matthew Honnibal
|
4917cbb484
|
Include sent_start test
|
2017-05-23 18:40:37 +02:00 |
|
Matthew Honnibal
|
d68dd1f251
|
Add SENT_START attribute, for custom sentence boundary detection
|
2017-05-23 18:37:58 +02:00 |
|
Matthew Honnibal
|
8026c183d0
|
Add hacky logic to accelerate depth=0 case in parser
|
2017-05-23 11:06:49 -05:00 |
|
Matthew Honnibal
|
e7d3159d91
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-23 05:58:17 -05:00 |
|
Matthew Honnibal
|
a8b6d11c5b
|
Support optional maxout layer
|
2017-05-23 05:58:07 -05:00 |
|
Matthew Honnibal
|
c55b8fa7c5
|
Fix bugs in parse_batch
|
2017-05-23 05:57:52 -05:00 |
|
ines
|
fb0ff0272f
|
xfail neural parser tests for now and remove test for deprecated method
|
2017-05-23 12:40:37 +02:00 |
|
Matthew Honnibal
|
964707d795
|
Restore support for deeper networks in parser
|
2017-05-23 05:31:13 -05:00 |
|
Matthew Honnibal
|
e27262f431
|
Go back to previous matcher signature, with on_match positional
|
2017-05-23 04:37:40 -05:00 |
|
Matthew Honnibal
|
5418bcf5d7
|
Resolve conflict on test
|
2017-05-23 04:37:16 -05:00 |
|
ines
|
e6acd3bbf2
|
Fix matcher tests and matcher docs
|
2017-05-23 11:36:02 +02:00 |
|
ines
|
d0c6d4f76d
|
Fix formatting
|
2017-05-23 11:32:00 +02:00 |
|
Matthew Honnibal
|
f0bcc0bd8d
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-23 04:29:28 -05:00 |
|
Matthew Honnibal
|
9adfe9e8fc
|
Don't hold gradient updates in language -- let the parser decide how to batch the updates.
|
2017-05-23 04:29:10 -05:00 |
|
Matthew Honnibal
|
6b918cc58e
|
Support making updates periodically during training
|
2017-05-23 04:23:29 -05:00 |
|
Matthew Honnibal
|
3f725ff7b3
|
Roll back changes to parser update
|
2017-05-23 04:23:05 -05:00 |
|
Matthew Honnibal
|
3959d778ac
|
Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
|
2017-05-23 03:06:53 -05:00 |
|
Matthew Honnibal
|
532afef4a8
|
Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
|
2017-05-23 03:05:25 -05:00 |
|
Matthew Honnibal
|
bdaac7ab44
|
WIP on improving parser efficiency
|
2017-05-23 02:59:31 -05:00 |
|
Matthew Honnibal
|
8a9e318deb
|
Put the parsing loop in a nogil prange block
|
2017-05-22 17:58:12 -05:00 |
|
ines
|
a23f487b06
|
Tidy up displaCy and add "manual" option
Also don't require title in EntityRenderer
|
2017-05-22 18:48:20 +02:00 |
|
Matthew Honnibal
|
0264447c4d
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-22 10:41:56 -05:00 |
|
Matthew Honnibal
|
6e8dce2c05
|
Fix train command line args
|
2017-05-22 10:41:39 -05:00 |
|
Matthew Honnibal
|
a7ee63c0ac
|
Fix labeller loss for unseen labels
|
2017-05-22 10:41:20 -05:00 |
|
Matthew Honnibal
|
c9760b2104
|
Support sentence limits in GoldCorpus
|
2017-05-22 10:40:46 -05:00 |
|
Matthew Honnibal
|
e2136232f9
|
Exclude states with no matching gold annotations from parsing
|
2017-05-22 10:30:12 -05:00 |
|
Matthew Honnibal
|
83ffd16474
|
Fix offset calculation for other negative values
|
2017-05-22 08:00:53 -05:00 |
|
ines
|
b3c7ee0148
|
Fix tests and use the new Matcher API
|
2017-05-22 13:54:20 +02:00 |
|
Matthew Honnibal
|
f00f821496
|
Fix pseudoprojectivity->nonproj
|
2017-05-22 06:14:42 -05:00 |
|
Matthew Honnibal
|
ae8cf70dc1
|
Fix CLI train signature
|
2017-05-22 06:13:39 -05:00 |
|
Matthew Honnibal
|
187f370734
|
Update tests for matcher changes
|
2017-05-22 12:59:50 +02:00 |
|
Matthew Honnibal
|
5d59e74cf6
|
PseudoProjectivity->nonproj
|
2017-05-22 05:49:53 -05:00 |
|
Matthew Honnibal
|
7e2cdc0c81
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-22 12:39:34 +02:00 |
|
Matthew Honnibal
|
70a8c531cd
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-22 05:39:18 -05:00 |
|
Matthew Honnibal
|
2f78413a02
|
PseudoProjectivity->nonproj
|
2017-05-22 05:39:03 -05:00 |
|
Matthew Honnibal
|
89ebc5c3cd
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-05-22 12:38:15 +02:00 |
|
Matthew Honnibal
|
d8bb5bb959
|
Implement StringStore serialization, and update tests
|
2017-05-22 12:38:00 +02:00 |
|
ines
|
54f04a9fe0
|
Update API docs with changes in spacy.gold and spacy.language
|
2017-05-22 12:29:30 +02:00 |
|
ines
|
b5fb43fdd8
|
Allow sys.exit status as exits keyword arg in util.prints()
|
2017-05-22 12:29:15 +02:00 |
|
ines
|
fc3ec733ea
|
Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli files
|
2017-05-22 12:28:58 +02:00 |
|
Matthew Honnibal
|
b45b4aa392
|
PseudoProjectivity --> nonproj
|
2017-05-22 05:17:44 -05:00 |
|
Matthew Honnibal
|
aae97f00e9
|
Fix nonproj import
|
2017-05-22 05:15:06 -05:00 |
|
Matthew Honnibal
|
9262fc4829
|
Fix syntax error
|
2017-05-22 05:14:59 -05:00 |
|
Matthew Honnibal
|
93a042253b
|
Make GoldParse attributes writeable
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
2a5eb9f61e
|
Make nonproj methods top-level functions, instead of class methods
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
c998776c25
|
Make single array for features, to reduce GPU copies
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
bc2294d7f1
|
Add support for fiddly hyper-parameters to train func
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
80e19a2399
|
Simplify CLI implementation for subcommands. Remove model command.
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
33e2222839
|
Remove unused code in deprojectivize
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
4e0988605a
|
Pass through non-projective=True
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
025d9bbc37
|
Fix handling of non-projective deps
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
5738d373d5
|
Add deprojectivize to pipeline
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
1b5fa68996
|
Do pseudo-projective pre-processing for parser
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
1d5d9838a2
|
Fix action collection for parser
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
8d1e64be69
|
Add experimental NeuralLabeller
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
9b1b0742fd
|
Fix prediction for tok2vec
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
f13d6c7359
|
Support gold preprocessing and single gold files
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
e14533757b
|
Use averaged params for evaluation
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
7811d97339
|
Refactor CLI
|
2017-05-22 04:51:08 -05:00 |
|
Matthew Honnibal
|
5db89053aa
|
Merge docstrings
|
2017-05-21 13:46:23 -05:00 |
|
Matthew Honnibal
|
432b3499b3
|
Fix memory leak
|
2017-05-21 13:38:46 -05:00 |
|
Matthew Honnibal
|
59fbfb3829
|
Remove train.py -- functions now in GoldCorpus and Language
|
2017-05-21 09:08:27 -05:00 |
|
Matthew Honnibal
|
8904814c0e
|
Add missing import
|
2017-05-21 09:07:56 -05:00 |
|
Matthew Honnibal
|
baf3ef0ddc
|
Remove import of removed train_config script
|
2017-05-21 09:07:34 -05:00 |
|
Matthew Honnibal
|
4c9202249d
|
Refactor training, to fix memory leak
|
2017-05-21 09:07:06 -05:00 |
|
Matthew Honnibal
|
4803b3b69e
|
Add GoldCorpus class, to manage data streaming
|
2017-05-21 09:06:17 -05:00 |
|
Matthew Honnibal
|
180e5afede
|
Fix tokvecs flattening in pipeline
|
2017-05-21 09:05:34 -05:00 |
|
Matthew Honnibal
|
0731971bfc
|
Add itershuffle utility function. Maybe belongs in thinc
|
2017-05-21 09:05:05 -05:00 |
|
ines
|
2c5cfe8bbf
|
Update docstrings and API docs for StringStore
|
2017-05-21 14:18:58 +02:00 |
|
ines
|
251346b59f
|
Fix typos and formatting
|
2017-05-21 14:18:46 +02:00 |
|
ines
|
075f5ff87a
|
Update docstrings and API docs for GoldParse
|
2017-05-21 13:53:46 +02:00 |
|
ines
|
99b631617d
|
Reformat docstrings
|
2017-05-21 13:32:15 +02:00 |
|
ines
|
885e82c9b0
|
Update docstrings and remove deprecated load classmethod
|
2017-05-21 13:27:52 +02:00 |
|
ines
|
c5a653fa48
|
Update docstrings and API docs for Tokenizer
|
2017-05-21 13:18:14 +02:00 |
|
ines
|
f216422ac5
|
Remove deprecated load classmethod
|
2017-05-21 13:18:01 +02:00 |
|
ines
|
d82ae9a585
|
Change "function" to "callable" in docs
|
2017-05-21 13:17:40 +02:00 |
|
ines
|
3871157d84
|
Update spacy.util documentation
|
2017-05-21 01:12:09 +02:00 |
|
ines
|
0c6c65aa3c
|
Improve messaging if model linking fails after download
|
2017-05-21 00:28:37 +02:00 |
|
Matthew Honnibal
|
3b7c108246
|
Pass tokvecs through as a list, instead of concatenated. Also fix padding
|
2017-05-20 13:23:32 -05:00 |
|
ines
|
924e8506de
|
Move Defaults subclass to module scope (necessary for pickling)
|
2017-05-20 19:02:27 +02:00 |
|
Matthew Honnibal
|
d52b65aec2
|
Revert "Move to contiguous buffer for token_ids and d_vectors"
This reverts commit 3ff8c35a79.
|
2017-05-20 11:26:23 -05:00 |
|
ines
|
27de0834b2
|
Update docstrings and API docs for Lexeme
|
2017-05-20 15:13:42 +02:00 |
|
ines
|
7ed8a92ed1
|
Update docstrings and API docs for Token
|
2017-05-20 15:13:33 +02:00 |
|
ines
|
4ed6a36622
|
Update docstrings and API docs for Matcher
|
2017-05-20 14:43:10 +02:00 |
|
ines
|
39f36539f6
|
Update docstrings and API docs for Matcher
|
2017-05-20 14:32:34 +02:00 |
|
ines
|
c00ff257be
|
Update docstrings and API docs for Matcher
|
2017-05-20 14:26:10 +02:00 |
|
ines
|
790435e51c
|
Update docstrings
|
2017-05-20 14:05:07 +02:00 |
|
ines
|
f0cc642bb9
|
Update docstrings and API docs for Vocab
|
2017-05-20 14:00:41 +02:00 |
|
Matthew Honnibal
|
ce9234f593
|
Update Matcher API
|
2017-05-20 13:54:53 +02:00 |
|
Matthew Honnibal
|
b272890a8c
|
Try to move parser to simpler PrecomputedAffine class. Currently broken -- maybe the previous change
|
2017-05-20 06:40:10 -05:00 |
|
ines
|
e39ad78267
|
Resolve model name properly in cli.info
Use util.resolve_model_path() to also allow package names and paths.
|
2017-05-20 12:24:40 +02:00 |
|
Matthew Honnibal
|
3ff8c35a79
|
Move to contiguous buffer for token_ids and d_vectors
|
2017-05-20 04:17:30 -05:00 |
|
Matthew Honnibal
|
8b04b0af9f
|
Remove freqs from transition_system
|
2017-05-20 02:20:48 -05:00 |
|
Matthew Honnibal
|
61fe55efba
|
Move EnglishDefaults class out of English
|
2017-05-20 02:18:19 -05:00 |
|
Matthew Honnibal
|
a1ba20e2b1
|
Fix over-run on parse_batch
|
2017-05-19 18:57:30 -05:00 |
|
ines
|
1d4d3d0ecd
|
Add TODO
|
2017-05-20 01:38:04 +02:00 |
|
Matthew Honnibal
|
7ee1827af0
|
Disable data caching in parser
|
2017-05-19 18:17:11 -05:00 |
|
Matthew Honnibal
|
e84de028b5
|
Remove 'rebatch' op, and remove min-batch cap
|
2017-05-19 18:16:36 -05:00 |
|
Matthew Honnibal
|
3376d4d6e8
|
Update the train script, fixing GPU memory leak
|
2017-05-19 18:15:50 -05:00 |
|
Matthew Honnibal
|
836fe1d880
|
Update neural net tests
|
2017-05-19 18:11:29 -05:00 |
|
ines
|
fe5d8819ea
|
Update Matcher docstrings and API docs
|
2017-05-19 21:47:06 +02:00 |
|
Matthew Honnibal
|
08766240c3
|
Add incomplete iob converter
|
2017-05-19 13:27:51 -05:00 |
|
Matthew Honnibal
|
c12ab47a56
|
Remove state argument in pipeline. Other changes
|
2017-05-19 13:26:36 -05:00 |
|
Matthew Honnibal
|
66ea9aebe7
|
Remove the state argument from Language
|
2017-05-19 13:25:42 -05:00 |
|
Matthew Honnibal
|
09a877886b
|
WIP on iob converter
|
2017-05-19 13:24:39 -05:00 |
|
ines
|
a804045597
|
Use is_ancestor instead of deprecated is_ancestor_of
|
2017-05-19 20:23:40 +02:00 |
|
Matthew Honnibal
|
8d5e6d9f4f
|
Rename no_ner arg to no_entities
|
2017-05-19 13:23:11 -05:00 |
|
ines
|
e9e62b01b0
|
Update docstrings and API docs for Token
|
2017-05-19 18:47:56 +02:00 |
|
ines
|
62ceec4fc6
|
Update docstrings and API docs for Span
|
2017-05-19 18:47:46 +02:00 |
|
ines
|
23f9a3ccc8
|
Update docstrings and API docs for Doc
|
2017-05-19 18:47:39 +02:00 |
|
ines
|
2c8c9dc0c9
|
Update docstrings and API docs for Language
|
2017-05-19 18:47:24 +02:00 |
|
ines
|
0791f0aae6
|
Update docstrings and API docs for Span class
|
2017-05-19 00:31:31 +02:00 |
|
ines
|
8455cb1327
|
Update docstring for Doc.__getitem__
|
2017-05-19 00:30:51 +02:00 |
|
ines
|
0fc05e54e4
|
Document TokenVectorEncoder
|
2017-05-19 00:00:02 +02:00 |
|