spaCy/spacy/ml/models/parser.py

from pydantic import StrictInt
from thinc.api import Model, chain, list2array, Linear, zero_init, use_ops

from ...util import registry
from .._layers import PrecomputableAffine
from ...syntax._parser_model import ParserModel


@registry.architectures.register("spacy.TransitionBasedParser.v1")
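def build_tb_parser_model(
    tok2vec: Model,
    nr_feature_tokens: StrictInt,
    hidden_width: StrictInt,
    maxout_pieces: StrictInt,
    nO=None,
):
    """Build a transition-based parser model on top of a tok2vec layer.

    tok2vec: Layer that outputs one vector per token.
    nr_feature_tokens: Number of context tokens used to build each state vector.
    hidden_width: Width of the hidden (lower) layer.
    maxout_pieces: Number of maxout pieces in the hidden layer.
    nO: Number of output classes (transition actions). None leaves it unset here.
    """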
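    # Record the token vector width before wrapping, then set it explicitly
    # on the chained model. list2array() concatenates the per-document arrays
    # into a single array.
    token_vector_width = tok2vec.get_dim("nO")
    tok2vec = chain(tok2vec, list2array())
    tok2vec.set_dim("nO", token_vector_width)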

    lower = PrecomputableAffine(
        nO=hidden_width,
        nF=nr_feature_tokens,
        nI=tok2vec.get_dim("nO"),
        nP=maxout_pieces,
    )
    lower.set_dim("nP", maxout_pieces)
    # Create the upper (output) layer with NumPy ops, whatever ops are
    # currently active.
    with use_ops("numpy"):
        # Initialize weights at zero, as it's a classification layer.
        upper = Linear(nO=nO, init_W=zero_init)
    model = ParserModel(tok2vec, lower, upper)
    return model


# Blame history for this file:
#   2020-02-27 20:42:27 +03:00  Default settings to configurations (#4995)
#   2020-02-28 13:57:41 +03:00  Tidy up and auto-format (relative import block)
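
The `lower` layer above is built with `PrecomputableAffine`, using the dimensions `nO`, `nF`, `nI` and `nP`. The sketch below illustrates the general idea behind a precomputable affine layer in plain Python. It is not spaCy's implementation: the function names `precompute`, `state_hidden` and `state_hidden_direct` are hypothetical, the weight layout `W[f][o][p][i]` is an assumption made for readability, and details such as padding for missing context tokens are left out. The per-token, per-feature-slot products are computed once per document, so each parser state only needs lookups, a sum and a maxout.

```python
import math
import random


def precompute(tokvecs, W):
    """Return cache[t][f][o][p] = dot(W[f][o][p], tokvecs[t]) for every token t."""
    return [
        [[[sum(w * x for w, x in zip(row, vec)) for row in unit] for unit in slot]
         for slot in W]
        for vec in tokvecs
    ]


def state_hidden(cache, bias, token_ids):
    """Hidden vector for one parser state, using only cache lookups."""
    hidden = []
    for o, unit_bias in enumerate(bias):
        pieces = [
            b + sum(cache[t][f][o][p] for f, t in enumerate(token_ids))
            for p, b in enumerate(unit_bias)
        ]
        hidden.append(max(pieces))  # maxout: keep the largest of the nP pieces
    return hidden


def state_hidden_direct(tokvecs, W, bias, token_ids):
    """Reference computation: the same affine + maxout without any cache."""
    hidden = []
    for o, unit_bias in enumerate(bias):
        pieces = [
            b + sum(
                w * x
                for f, t in enumerate(token_ids)
                for w, x in zip(W[f][o][p], tokvecs[t])
            )
            for p, b in enumerate(unit_bias)
        ]
        hidden.append(max(pieces))
    return hidden


# Hypothetical sizes, chosen only for the demonstration.
rng = random.Random(0)
nF, nO, nP, nI, n_tokens = 3, 4, 2, 5, 6
W = [[[[rng.uniform(-1, 1) for _ in range(nI)] for _ in range(nP)]
      for _ in range(nO)] for _ in range(nF)]
bias = [[rng.uniform(-1, 1) for _ in range(nP)] for _ in range(nO)]
tokvecs = [[rng.uniform(-1, 1) for _ in range(nI)] for _ in range(n_tokens)]

cache = precompute(tokvecs, W)                 # once per document
fast = state_hidden(cache, bias, [0, 2, 5])    # once per parser state
slow = state_hidden_direct(tokvecs, W, bias, [0, 2, 5])
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fast, slow)))
```

Here `token_ids` lists one token index per feature slot, so its length plays the role of `nF` (`nr_feature_tokens` in the builder above). The cached and direct paths produce the same hidden vector up to floating-point rounding. The cached path reuses the per-token products across every state that refers to the same tokens.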