# spaCy/spacy/lang/nl/__init__.py
# coding: utf8
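"""Dutch ("nl") language data and Language subclass for spaCy.

Usage sketch, assuming a spaCy v2.x installation that ships this module
(illustrative only; outputs are deliberately not listed):

    from spacy.lang.nl import Dutch

    nlp = Dutch()  # blank Dutch pipeline built from DutchDefaults
    doc = nlp("Ik heb twee katten.")
    tokens = [t.text for t in doc]  # split using TOKENIZER_INFIXES/SUFFIXES
    numeric = [t.like_num for t in doc]  # Dutch like_num getter from LEX_ATTRS
    stops = [t.is_stop for t in doc]  # membership in STOP_WORDS
"""
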
from __future__ import unicode_literals

from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
from .lemmatizer import DutchLemmatizer
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups, get_lemma_tables


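# Defined at module scope (rather than nested inside Dutch) so that it can be
# pickled.
class DutchDefaults(Language.Defaults):
    # Copy the base getters so the shared Language.Defaults dict is not mutated,
    # then add the Dutch-specific lexical attributes (e.g. like_num).
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda text: "nl"
    # Return the BASE_NORMS entry if the word has one; otherwise fall back to
    # the default NORM getter.
    lex_attr_getters[NORM] = add_lookups(
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
    )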
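    # Merge the shared base exceptions with the Dutch ones; Dutch entries win
    # on conflicts.
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
    # Dutch-specific infix and suffix rules; prefixes are inherited from
    # Language.Defaults.
    infixes = TOKENIZER_INFIXES
    suffixes = TOKENIZER_SUFFIXES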
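    # Lemmatizer tables as JSON files, with paths relative to this language
    # package (spacy/lang/nl/).
    resources = {
        "lemma_rules": "lemmatizer/lemma_rules.json",
        "lemma_index": "lemmatizer/lemma_index.json",
        "lemma_exc": "lemmatizer/lemma_exc.json",
        "lemma_lookup": "lemmatizer/lemma_lookup.json",
    }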

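    @classmethod
    def create_lemmatizer(cls, nlp=None, lookups=None):
        # get_lemma_tables returns (rules, index, exc, lookup), while
        # DutchLemmatizer takes (index, exc, rules, lookup): note the reordering.
        lemma_rules, lemma_index, lemma_exc, lemma_lookup = get_lemma_tables(lookups)
        return DutchLemmatizer(lemma_index, lemma_exc, lemma_rules, lemma_lookup)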


class Dutch(Language):
    lang = "nl"
    Defaults = DutchDefaults


__all__ = ["Dutch"]