ines
f3f8bfc367
Add built-in factories for merge_entities and merge_noun_chunks
...
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 17:16:54 +01:00
ines
d854f69fe3
Add built-in factories for merge_entities and merge_noun_chunks
...
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 00:18:51 +01:00
Aaron Marquez
3765d84d57
Fix issue #1959
2018-02-15 12:51:49 -08:00
Claudiu-Vlad Ursache
e28de12cbd
Ensure files opened in from_disk
are closed
...
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706 ).
2018-02-13 20:49:43 +01:00
Motoki Wu
f4a7d1a423
make to sure pass in **cfg to each component when training
2018-01-30 18:29:54 -08:00
ines
4046823699
Only check component in factories if string (see #1911 )
2018-01-30 16:29:07 +01:00
ines
ce10d320c4
Fix component check in self.factories (see #1911 )
2018-01-30 16:09:37 +01:00
ines
8901814248
Improve error handling if pipeline component is not callable ( resolves #1911 )
...
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
2018-01-30 15:43:03 +01:00
ines
a31506e060
Fix off-by-one error in nlp.add_pipe(after=name) ( fixes #1654 )
2017-11-28 20:37:55 +01:00
Ines Montani
6362024cf8
Merge pull request #1645 from GreenRiverRUS/fix_default_meta
...
Fixed spaCy version string in default meta
2017-11-27 11:58:02 +00:00
Vadim Mazaev
59f03ab1d7
Fixed spacy version string in default meta
2017-11-26 23:02:07 +03:00
Matthew Honnibal
8fec7268eb
Move string cleanup under a setting flag
2017-11-23 12:19:18 +00:00
Matthew Honnibal
5949777b12
Fix misleading multi-threading docstring
2017-11-23 12:18:59 +00:00
Roman Domrachev
61d28d03e4
Try again to do selective remove cache
2017-11-15 19:11:12 +03:00
Roman Domrachev
505c6a2f2f
Completely cleanup tokenizer cache
...
Tokenizer cache can have be different keys than string
That modification can slow down tokenizer and need to be measured
2017-11-15 17:55:48 +03:00
Roman Domrachev
a33d5a068d
Try to hold origin data instead of restore it
2017-11-14 22:40:03 +03:00
Roman Domrachev
91e2fa6561
Clean all caches
2017-11-14 21:15:04 +03:00
Roman Domrachev
86ca434c93
Merge github.com:explosion/spaCy
2017-11-14 17:46:22 +03:00
Roman Domrachev
a2745b0e84
StringStore now actually cleaned
...
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal
dd1678eab3
Edit comment
2017-11-11 18:37:08 +01:00
Roman Domrachev
ee60a52ee7
Fix test imports and last batch cleanup
2017-11-11 11:32:16 +03:00
Roman Domrachev
4a6b094e09
Remove unused import
2017-11-11 03:13:05 +03:00
Roman Domrachev
3c600adf23
Try to fix StringStore clean up (see #1506 )
2017-11-11 03:11:27 +03:00
Matthew Honnibal
45e0617e61
Allow Language.update to take unicode text and dict objects
2017-11-06 22:07:38 +01:00
Matthew Honnibal
5c85bf3791
Fix missing import
2017-11-06 15:06:27 +01:00
Matthew Honnibal
465adfee94
Remove unused resume_training method, and pass optimizer through
2017-11-06 14:26:00 +01:00
Matthew Honnibal
38109a0e4a
Register SentenceSegmenter in Language.factories
2017-11-05 18:45:57 +01:00
Matthew Honnibal
d185927998
Undo harmful pickling hacks on Language class
2017-11-04 23:07:03 +01:00
Matthew Honnibal
2bf21cbe29
Update model after optimising it instead of waiting
2017-11-03 20:20:01 +01:00
ines
5f661a1b3a
Remove tensorizer from pre-set pipe_names
2017-11-01 19:48:33 +01:00
ines
bfe17b7df1
Fix begin_training if get_gold_tuples is None
2017-11-01 13:14:31 +01:00
ines
37e62ab0e2
Update vector meta in meta.json
2017-11-01 01:25:09 +01:00
ines
8e02294241
Add vectors to Language.meta
2017-10-30 18:39:48 +01:00
ines
d96e72f656
Tidy up rest
2017-10-27 21:07:59 +02:00
ines
91899d337b
Tidy up language, lemmatizer and scorer
2017-10-27 14:40:14 +02:00
Ines Montani
4033e70c71
Merge pull request #1461 from explosion/feature/disable-pipes
...
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
ines
2d6ec99884
Set 'model' as default model name to prevent meta.json errors
2017-10-26 16:12:23 +02:00
Matthew Honnibal
90d1d9b230
Remove obsolete parser code
2017-10-26 13:22:45 +02:00
Matthew Honnibal
b0f3ea2200
Fix names of pipeline components
...
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder --> Tensorizer
NeuralLabeller --> MultitaskObjective
2017-10-26 12:38:23 +02:00
ines
1a722dac31
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:18:18 +02:00
ines
6a00de4f77
Fix check of unexpected pipe names in restore()
2017-10-25 14:56:35 +02:00
ines
7f03932477
Return self on __enter__
2017-10-25 14:56:16 +02:00
Matthew Honnibal
e70f80f29e
Add Language.disable_pipes()
2017-10-25 13:46:41 +02:00
ines
3484174e48
Add Language.path
2017-10-25 11:57:43 +02:00
Matthew Honnibal
65bf5e85bd
Improve piping in language.pipe
2017-10-18 21:46:12 +02:00
Matthew Honnibal
e35a83d142
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-17 18:22:06 +02:00
Matthew Honnibal
1cc85a89ef
Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes().
2017-10-17 18:18:49 +02:00
Ines Montani
afa67de7ee
Merge pull request #1428 from roanuz/develop
...
Fix trailing whitespace and Language.from_disk overwrites
2017-10-17 16:29:15 +02:00
Anto Binish Kaspar
8f5b60c168
Fix Language.from_disk overwrites the meta.json file.
2017-10-17 17:15:32 +05:30
ines
8ca344712d
Add Language.has_pipe method
2017-10-17 11:20:07 +02:00
Matthew Honnibal
2bc06e4b22
Bump rolling buffer size to 10k
2017-10-16 19:38:29 +02:00
Matthew Honnibal
5c14f3f033
Create a rolling buffer for the StringStore in Language.pipe()
2017-10-16 19:22:40 +02:00
Ines Montani
37aa523a8e
Merge pull request #1408 from explosion/feature/dot-underscore
...
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
ines
9620c1a640
Add lemma_lookup to Language defaults
2017-10-11 13:26:05 +02:00
ines
67350fa496
Use better logic for auto-generating component name
...
Instances don't have __name__, so we try __class__.__name__ as well,
before giving up and defaulting to repr(component).
2017-10-10 04:23:05 +02:00
Matthew Honnibal
0384f08218
Trigger nonproj.deprojectivize as a postprocess
2017-10-07 02:00:47 +02:00
ines
e43530269c
Update docstrings
2017-10-07 01:04:50 +02:00
ines
2586b61b15
Fix formatting, tidy up and remove unused imports
2017-10-07 00:26:05 +02:00
ines
212c8f0711
Implement new Language methods and pipeline API
2017-10-07 00:25:54 +02:00
Matthew Honnibal
96da86b3e5
Add support for verbose flag to Language
2017-10-03 09:14:57 -05:00
Matthew Honnibal
4ae9ea7684
Remove unused argument in Language
2017-09-26 05:41:35 -05:00
Matthew Honnibal
8716ffe57d
Serialize vocab last
2017-09-24 05:01:45 -05:00
Matthew Honnibal
5a7fd0fd36
Fix vector linkage
2017-09-22 20:11:52 -05:00
Matthew Honnibal
4348c479fc
Merge pre-trained vectors and noshare patches
2017-09-22 20:07:28 -05:00
Matthew Honnibal
7dc61b3f43
Whitespace
2017-09-22 20:00:50 -05:00
Matthew Honnibal
20193371f5
Don't share CNN, to reduce complexities
2017-09-21 14:59:48 +02:00
Matthew Honnibal
b832f89ff8
Add resume_training function
2017-09-20 19:15:20 -05:00
Matthew Honnibal
c858927271
Copy vectors to GPU on begin training
2017-09-18 18:04:16 -05:00
Matthew Honnibal
43210abacc
Resolve fine-tuning conflict
2017-09-17 05:30:04 -05:00
Matthew Honnibal
e37a50a436
Pass documents to tensorizer, not 'features'
2017-09-16 12:46:36 -05:00
Matthew Honnibal
70da88a3a7
Update comment on Language.begin_training
2017-09-14 16:18:30 +02:00
Matthew Honnibal
78a5f842e9
Fix update when update_shared=False
2017-08-20 15:58:34 -05:00
Matthew Honnibal
8875590081
Add optimizer in Language.update if sgd=None
2017-08-20 14:42:07 +02:00
Matthew Honnibal
a3c51a0355
Fix creation of pipeline
2017-08-19 21:58:57 +02:00
Matthew Honnibal
97aabafb5f
Document as_tuples keyword arg of Language.pipe
2017-08-19 12:21:33 +02:00
Matthew Honnibal
11c31d285c
Restore changes from nn-beam-parser
2017-08-18 22:26:12 +02:00
Matthew Honnibal
52c180ecf5
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
...
This reverts commit ea8de11ad5
, reversing
changes made to 08e443e083
.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
4363b4aa4a
Fix redundant tokvecs updates during update
2017-08-13 12:36:55 +02:00
Matthew Honnibal
0acce0521b
Fix Language.update for pipeline
2017-08-06 14:13:03 +02:00
Matthew Honnibal
0eec7c9e9b
Fix Language.evaluate
2017-08-06 02:18:31 +02:00
Matthew Honnibal
cc19ea0e7c
Add update_tensors flag to Language.update. Experimental, re #1182
2017-08-06 02:17:10 +02:00
Matthew Honnibal
2e00361522
Fix update when 0 docs
2017-08-01 22:10:17 +02:00
Matthew Honnibal
523b0df2c9
Update text classification model
2017-07-25 18:57:59 +02:00
Matthew Honnibal
d8aa721664
Compute Language.meta with a property
2017-07-23 00:50:18 +02:00
Matthew Honnibal
baa3d81c35
Add text categorizer to Language
2017-07-22 01:13:36 +02:00
Matthew Honnibal
836bfa2d0f
Add factory for experimental SimilarityHook component
2017-06-05 15:40:22 +02:00
Matthew Honnibal
2479cde446
Support disable keyword in Language.__init__
2017-06-05 13:13:07 +02:00
Matthew Honnibal
8f8f90b46b
Disable labeller if not parsing
2017-06-04 20:18:54 -05:00
Matthew Honnibal
939e8ed567
Add lookup properties for components in Language
2017-06-04 15:52:09 -05:00
Matthew Honnibal
92ae36f84e
Improve way noun chunks iterator is looked up
2017-06-04 21:53:39 +02:00
Matthew Honnibal
21eef90dbc
Support specifying which GPU
2017-06-03 16:10:23 -05:00
Matthew Honnibal
fea1144e6d
Set max batch size in evaluate
2017-06-03 13:31:33 -05:00
ines
a3e4f91f4a
Only load vocab if it exists
2017-06-01 14:38:35 +02:00
Matthew Honnibal
33e5ec737f
Fix to/from disk methods
2017-05-31 13:43:10 +02:00
Matthew Honnibal
1e6df0a2a1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-29 14:30:12 -05:00
ines
6145fe6a93
Catch all kwargs on Language
2017-05-29 20:43:48 +02:00
Matthew Honnibal
9c9ee24411
Fix broken lambda scoping in Python 2
2017-05-29 13:23:28 -05:00
Matthew Honnibal
aa4c33914b
Work on serialization
2017-05-29 08:40:45 -05:00
Matthew Honnibal
7b06bb896e
Fix for serialization
2017-05-29 13:42:55 +02:00
Matthew Honnibal
74235587ef
Fix to serialization
2017-05-29 13:40:31 +02:00