spaCy/spacy/lang/tl/__init__.py

# coding: utf8
from __future__ import unicode_literals

from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups


def _return_tl(_):
    return "tl"


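class TagalogDefaults(Language.Defaults):
    # Copy the shared getters so the overrides below do not mutate Language.Defaults.
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = _return_tl
    # Consult BASE_NORMS first, then fall back to the default NORM getter.
    lex_attr_getters[NORM] = add_lookups(
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
    )
    lex_attr_getters.update(LEX_ATTRS)
    # Merge the Tagalog-specific tokenizer exceptions into the shared base set.
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    # Name of the JSON file that holds the lemma lookup table.
    resources = {"lemma_lookup": "lemma_lookup.json"}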


class Tagalog(Language):
    lang = "tl"
    Defaults = TagalogDefaults


__all__ = ["Tagalog"]
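
The `lex_attr_getters` setup in `TagalogDefaults` follows a copy-then-override pattern, with `add_lookups` wrapping the default `NORM` getter so that a lookup table is checked first. The sketch below illustrates that pattern in plain Python, without importing spaCy. Every name in it (`add_lookups_sketch`, `BASE_GETTERS`, `DEMO_NORMS`, the string keys `"lang"` and `"norm"`) is a hypothetical stand-in chosen for illustration, and the helper is a simplified approximation, not spaCy's own implementation.

```python
import functools


def _lookup_or_default(default_func, lookups, string):
    # Return the first table entry found for `string`, else use the default getter.
    for table in lookups:
        if string in table:
            return table[string]
    return default_func(string)


def add_lookups_sketch(default_func, *lookups):
    # Hypothetical stand-in for a lookup-with-fallback helper.
    # functools.partial is used instead of a closure so the result can be pickled.
    return functools.partial(_lookup_or_default, default_func, lookups)


def default_norm(string):
    # Assumed default behaviour for this demo: lowercase the string.
    return string.lower()


def return_empty(_):
    return ""


def return_tl(_):
    return "tl"


# Hypothetical shared defaults and a tiny demo table (not spaCy's BASE_NORMS).
BASE_GETTERS = {"lang": return_empty, "norm": default_norm}
DEMO_NORMS = {"`": "'"}

# Copy first, so overriding entries does not change the shared defaults.
getters = dict(BASE_GETTERS)
getters["lang"] = return_tl
getters["norm"] = add_lookups_sketch(BASE_GETTERS["norm"], DEMO_NORMS)

print(getters["lang"]("kumusta"))
print(getters["norm"]("`"))
print(getters["norm"]("Salamat"))
```

Because the dictionary is copied before it is modified, `BASE_GETTERS` keeps its original getters, which is presumably why the class body calls `dict(...)` on the inherited mapping rather than assigning into it directly.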
Added alpha support for Tagalog language (#3062) 2018-12-18 15:08:38 +03:00
I have added alpha support for the Tagalog language from the Philippines. It is the basis for the country's national language, Filipino. I have heavily based the format on the EN and ES languages. I have provided several words in the lemmatizer lookup table, added stop words from a source, translated numeric words to their Tagalog counterparts, added some tokenizer exceptions, and kept the tag map the same as for the English language. While the alpha language passed the preliminary testing that you provided, I think it needs more data to be useful for most cases.
* Added alpha support for Tagalog language
* Edited contributor template
* Included SCA; Reverted templates
* Fixed SCA template
* Fixed changes in SCA template
Lines last touched: the imports, the `class TagalogDefaults` and `class Tagalog` headers, the initial `lex_attr_getters` copy, `lex_attr_getters.update(LEX_ATTRS)`, `tokenizer_exceptions`, and `stop_words`.

Tidy up and fix small bugs and typos 2019-02-08 16:14:49 +03:00
Lines last touched: `_return_tl`, the `LANG` and `NORM` getter assignments, `lang = "tl"`, `Defaults = TagalogDefaults`, and `__all__`.

💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167) 2019-08-22 15:21:32 +03:00
* Improve load_language_data helper
* WIP: Add Lookups implementation
* Start moving lemma data over to JSON
* WIP: move data over for more languages
* Convert more languages
* Fix lemmatizer fixtures in tests
* Finish conversion
* Auto-format JSON files
* Fix test for now
* Make sure tables are stored on instance
Lines last touched: `resources = {"lemma_lookup": "lemma_lookup.json"}`.
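
The tokenizer exceptions mentioned in the first commit are combined with the shared ones through `update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)`. A minimal sketch of what such a merge could look like is given below. It assumes that each exception maps a surface string to a list of token dictionaries, and that the pieces must spell out the original string. The function `update_exc_sketch`, the plain `"ORTH"` string key, and the demo entries are all hypothetical and only approximate the behaviour of the real helper.

```python
def update_exc_sketch(base_exceptions, *addition_dicts):
    # Hypothetical stand-in: merge exception dicts without mutating the base.
    merged = dict(base_exceptions)
    for additions in addition_dicts:
        for orth, token_attrs in additions.items():
            # Assumed sanity check: the token pieces must concatenate to the key.
            described = "".join(attr["ORTH"] for attr in token_attrs)
            if described != orth:
                raise ValueError(
                    "pieces {!r} do not spell {!r}".format(described, orth)
                )
        merged.update(additions)
    return merged


# Demo data for illustration only, not copied from spaCy's tables.
DEMO_BASE = {":)": [{"ORTH": ":)"}]}
DEMO_TL = {"tayo'y": [{"ORTH": "tayo"}, {"ORTH": "'y"}]}

merged = update_exc_sketch(DEMO_BASE, DEMO_TL)
print(sorted(merged))
```

Later dictionaries take precedence over earlier ones in this sketch, so a language-specific entry would replace a shared entry with the same key.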