Commit Graph

3184 Commits

Author SHA1 Message Date
Matthew Honnibal
48eef94f92 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 18:47:32 +02:00
Matthew Honnibal
d44b1eafc4 Fix conflict artefacts 2017-05-23 18:47:11 +02:00
Matthew Honnibal
01e59e4e6e * Add Token.sent_start property, re Issue #235 2017-05-23 18:41:11 +02:00
Matthew Honnibal
4917cbb484 Include sent_start test 2017-05-23 18:40:37 +02:00
Matthew Honnibal
d68dd1f251 Add SENT_START attribute, for custom sentence boundary detection 2017-05-23 18:37:58 +02:00
Matthew Honnibal
8026c183d0 Add hacky logic to accelerate depth=0 case in parser 2017-05-23 11:06:49 -05:00
Matthew Honnibal
e7d3159d91 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 05:58:17 -05:00
Matthew Honnibal
a8b6d11c5b Support optional maxout layer 2017-05-23 05:58:07 -05:00
Matthew Honnibal
c55b8fa7c5 Fix bugs in parse_batch 2017-05-23 05:57:52 -05:00
ines
fb0ff0272f xfail neural parser tests for now and remove test for deprecated method 2017-05-23 12:40:37 +02:00
Matthew Honnibal
964707d795 Restore support for deeper networks in parser 2017-05-23 05:31:13 -05:00
Matthew Honnibal
e27262f431 Go back to previous matcher signature, with on_match positional 2017-05-23 04:37:40 -05:00
Matthew Honnibal
5418bcf5d7 Resolve conflict on test 2017-05-23 04:37:16 -05:00
ines
e6acd3bbf2 Fix matcher tests and matcher docs 2017-05-23 11:36:02 +02:00
ines
d0c6d4f76d Fix formatting 2017-05-23 11:32:00 +02:00
Matthew Honnibal
f0bcc0bd8d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 04:29:28 -05:00
Matthew Honnibal
9adfe9e8fc Don't hold gradient updates in language -- let the parser decide how to batch the updates. 2017-05-23 04:29:10 -05:00
Matthew Honnibal
6b918cc58e Support making updates periodically during training 2017-05-23 04:23:29 -05:00
Matthew Honnibal
3f725ff7b3 Roll back changes to parser update 2017-05-23 04:23:05 -05:00
Matthew Honnibal
3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal
532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal
bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal
8a9e318deb Put the parsing loop in a nogil prange block 2017-05-22 17:58:12 -05:00
ines
a23f487b06 Tidy up displaCy and add "manual" option
Also don't require title in EntityRenderer
2017-05-22 18:48:20 +02:00
Matthew Honnibal
0264447c4d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 10:41:56 -05:00
Matthew Honnibal
6e8dce2c05 Fix train command line args 2017-05-22 10:41:39 -05:00
Matthew Honnibal
a7ee63c0ac Fix labeller loss for unseen labels 2017-05-22 10:41:20 -05:00
Matthew Honnibal
c9760b2104 Support sentence limits in GoldCorpus 2017-05-22 10:40:46 -05:00
Matthew Honnibal
e2136232f9 Exclude states with no matching gold annotations from parsing 2017-05-22 10:30:12 -05:00
Matthew Honnibal
83ffd16474 Fix offset calculation for other negative values 2017-05-22 08:00:53 -05:00
ines
b3c7ee0148 Fix tests and use the new Matcher API 2017-05-22 13:54:20 +02:00
Matthew Honnibal
f00f821496 Fix pseudoprojectivity->nonproj 2017-05-22 06:14:42 -05:00
Matthew Honnibal
ae8cf70dc1 Fix CLI train signature 2017-05-22 06:13:39 -05:00
Matthew Honnibal
187f370734 Update tests for matcher changes 2017-05-22 12:59:50 +02:00
Matthew Honnibal
5d59e74cf6 PseudoProjectivity->nonproj 2017-05-22 05:49:53 -05:00
Matthew Honnibal
7e2cdc0c81 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 12:39:34 +02:00
Matthew Honnibal
70a8c531cd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 05:39:18 -05:00
Matthew Honnibal
2f78413a02 PseudoProjectivity->nonproj 2017-05-22 05:39:03 -05:00
Matthew Honnibal
89ebc5c3cd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 12:38:15 +02:00
Matthew Honnibal
d8bb5bb959 Implement StringStore serialization, and update tests 2017-05-22 12:38:00 +02:00
ines
54f04a9fe0 Update API docs with changes in spacy.gold and spacy.language 2017-05-22 12:29:30 +02:00
ines
b5fb43fdd8 Allow sys.exit status as exits keyword arg in util.prints() 2017-05-22 12:29:15 +02:00
ines
fc3ec733ea Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
Matthew Honnibal
b45b4aa392 PseudoProjectivity --> nonproj 2017-05-22 05:17:44 -05:00
Matthew Honnibal
aae97f00e9 Fix nonproj import 2017-05-22 05:15:06 -05:00
Matthew Honnibal
9262fc4829 Fix syntax error 2017-05-22 05:14:59 -05:00
Matthew Honnibal
93a042253b Make GoldParse attributes writeable 2017-05-22 04:51:08 -05:00
Matthew Honnibal
2a5eb9f61e Make nonproj methods top-level functions, instead of class methods 2017-05-22 04:51:08 -05:00
Matthew Honnibal
c998776c25 Make single array for features, to reduce GPU copies 2017-05-22 04:51:08 -05:00
Matthew Honnibal
bc2294d7f1 Add support for fiddly hyper-parameters to train func 2017-05-22 04:51:08 -05:00
Matthew Honnibal
80e19a2399 Simplify CLI implementation for subcommands. Remove model command. 2017-05-22 04:51:08 -05:00
Matthew Honnibal
33e2222839 Remove unused code in deprojectivize 2017-05-22 04:51:08 -05:00
Matthew Honnibal
4e0988605a Pass through non-projective=True 2017-05-22 04:51:08 -05:00
Matthew Honnibal
025d9bbc37 Fix handling of non-projective deps 2017-05-22 04:51:08 -05:00
Matthew Honnibal
5738d373d5 Add deprojectivize to pipeline 2017-05-22 04:51:08 -05:00
Matthew Honnibal
1b5fa68996 Do pseudo-projective pre-processing for parser 2017-05-22 04:51:08 -05:00
Matthew Honnibal
1d5d9838a2 Fix action collection for parser 2017-05-22 04:51:08 -05:00
Matthew Honnibal
8d1e64be69 Add experimental NeuralLabeller 2017-05-22 04:51:08 -05:00
Matthew Honnibal
9b1b0742fd Fix prediction for tok2vec 2017-05-22 04:51:08 -05:00
Matthew Honnibal
f13d6c7359 Support gold preprocessing and single gold files 2017-05-22 04:51:08 -05:00
Matthew Honnibal
e14533757b Use averaged params for evaluation 2017-05-22 04:51:08 -05:00
Matthew Honnibal
7811d97339 Refactor CLI 2017-05-22 04:51:08 -05:00
Matthew Honnibal
5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
Matthew Honnibal
432b3499b3 Fix memory leak 2017-05-21 13:38:46 -05:00
Matthew Honnibal
59fbfb3829 Remove train.py -- functions now in GoldCorpus and Language 2017-05-21 09:08:27 -05:00
Matthew Honnibal
8904814c0e Add missing import 2017-05-21 09:07:56 -05:00
Matthew Honnibal
baf3ef0ddc Remove import of removed train_config script 2017-05-21 09:07:34 -05:00
Matthew Honnibal
4c9202249d Refactor training, to fix memory leak 2017-05-21 09:07:06 -05:00
Matthew Honnibal
4803b3b69e Add GoldCorpus class, to manage data streaming 2017-05-21 09:06:17 -05:00
Matthew Honnibal
180e5afede Fix tokvecs flattening in pipeline 2017-05-21 09:05:34 -05:00
Matthew Honnibal
0731971bfc Add itershuffle utility function. Maybe belongs in thinc 2017-05-21 09:05:05 -05:00
ines
2c5cfe8bbf Update docstrings and API docs for StringStore 2017-05-21 14:18:58 +02:00
ines
251346b59f Fix typos and formatting 2017-05-21 14:18:46 +02:00
ines
075f5ff87a Update docstrings and API docs for GoldParse 2017-05-21 13:53:46 +02:00
ines
99b631617d Reformat docstrings 2017-05-21 13:32:15 +02:00
ines
885e82c9b0 Update docstrings and remove deprecated load classmethod 2017-05-21 13:27:52 +02:00
ines
c5a653fa48 Update docstrings and API docs for Tokenizer 2017-05-21 13:18:14 +02:00
ines
f216422ac5 Remove deprecated load classmethod 2017-05-21 13:18:01 +02:00
ines
d82ae9a585 Change "function" to "callable" in docs 2017-05-21 13:17:40 +02:00
ines
3871157d84 Update spacy.util documentation 2017-05-21 01:12:09 +02:00
ines
0c6c65aa3c Improve messaging if model linking fails after download 2017-05-21 00:28:37 +02:00
Matthew Honnibal
3b7c108246 Pass tokvecs through as a list, instead of concatenated. Also fix padding 2017-05-20 13:23:32 -05:00
ines
924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
Matthew Honnibal
d52b65aec2 Revert "Move to contiguous buffer for token_ids and d_vectors"
This reverts commit 3ff8c35a79.
2017-05-20 11:26:23 -05:00
ines
27de0834b2 Update docstrings and API docs for Lexeme 2017-05-20 15:13:42 +02:00
ines
7ed8a92ed1 Update docstrings and API docs for Token 2017-05-20 15:13:33 +02:00
ines
4ed6a36622 Update docstrings and API docs for Matcher 2017-05-20 14:43:10 +02:00
ines
39f36539f6 Update docstrings and API docs for Matcher 2017-05-20 14:32:34 +02:00
ines
c00ff257be Update docstrings and API docs for Matcher 2017-05-20 14:26:10 +02:00
ines
790435e51c Update docstrings 2017-05-20 14:05:07 +02:00
ines
f0cc642bb9 Update docstrings and API docs for Vocab 2017-05-20 14:00:41 +02:00
Matthew Honnibal
ce9234f593 Update Matcher API 2017-05-20 13:54:53 +02:00
Matthew Honnibal
b272890a8c Try to move parser to simpler PrecomputedAffine class. Currently broken -- maybe the previous change 2017-05-20 06:40:10 -05:00
ines
e39ad78267 Resolve model name properly in cli.info
Use util.resolve_model_path() to also allow package names and paths.
2017-05-20 12:24:40 +02:00
Matthew Honnibal
3ff8c35a79 Move to contiguous buffer for token_ids and d_vectors 2017-05-20 04:17:30 -05:00
Matthew Honnibal
8b04b0af9f Remove freqs from transition_system 2017-05-20 02:20:48 -05:00
Matthew Honnibal
61fe55efba Move EnglishDefaults class out of English 2017-05-20 02:18:19 -05:00
Matthew Honnibal
a1ba20e2b1 Fix over-run on parse_batch 2017-05-19 18:57:30 -05:00
ines
1d4d3d0ecd Add TODO 2017-05-20 01:38:04 +02:00
Matthew Honnibal
7ee1827af0 Disable data caching in parser 2017-05-19 18:17:11 -05:00