spaCy/spacy/lang/pt/norm_exceptions.py

# coding: utf8
from __future__ import unicode_literals

# These exceptions are used to add NORM values based on a token's ORTH value.
# Individual languages can also add their own exceptions and overwrite them -
# for example, British vs. American spelling in English.

# Norms are only set if no alternative is provided in the tokenizer exceptions.
# Note that this does not change any other token attributes. Its main purpose
# is to normalise the word representations so that equivalent tokens receive
# similar representations. For example: $ and € are very different, but they're
# both currency symbols. By normalising currency symbols to $, all symbols are
# seen as similar, no matter how common they are in the training data.


NORM_EXCEPTIONS = {
    "R$": "$",    # Real
    "r$": "$",    # Real
    "Cz$": "$",   # Cruzado
    "cz$": "$",   # Cruzado
    "NCz$": "$",  # Cruzado Novo
    "ncz$": "$"   # Cruzado Novo
}
Update Portuguese Language (#2790) * Add words to portuguese language _num_words * Add words to portuguese language _num_words * Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols * Extended punctuation and norm_exceptions in the Portuguese language 2018-09-29 10:51:45 +03:00			`# coding: utf8`
			`from __future__ import unicode_literals`

			`# These exceptions are used to add NORM values based on a token's ORTH value.`
			`# Individual languages can also add their own exceptions and overwrite them -`
			`# for example, British vs. American spelling in English.`

			`# Norms are only set if no alternative is provided in the tokenizer exceptions.`
			`# Note that this does not change any other token attributes. Its main purpose`
			`# is to normalise the word representations so that equivalent tokens receive`
			`# similar representations. For example: $ and € are very different, but they're`
			`# both currency symbols. By normalising currency symbols to $, all symbols are`
			`# seen as similar, no matter how common they are in the training data.`


			`NORM_EXCEPTIONS = {`
			`"R$": "$", # Real`
			`"r$": "$", # Real`
			`"Cz$": "$", # Cruzado`
			`"cz$": "$", # Cruzado`
			`"NCz$": "$", # Cruzado Novo`
			`"ncz$": "$" # Cruzado Novo`
			`}`