I have added alpha support for Tagalog, a language of the Philippines and the basis for the country's national language, Filipino. I based the format heavily on the EN and ES languages.
I have provided several words in the lemmatizer lookup table, added stop words from a source, translated the numeric words into their Tagalog counterparts, added some tokenizer exceptions, and kept the tag map the same as for English.
While the alpha language passed the preliminary tests that you provided, I think it needs more data to be useful in most cases.
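For illustration, the translated numeric words can feed a `like_num`-style lexical attribute. The following is a minimal sketch and not the code from this patch: the word list covers only one to ten, and the function body is an assumption modelled on how a number-like check is commonly written.

```python
# Hypothetical, abbreviated list of Tagalog number words (one to ten).
# The list shipped with the patch may be longer or spelled differently.
_num_words = [
    "isa", "dalawa", "tatlo", "apat", "lima",
    "anim", "pito", "walo", "siyam", "sampu",
]


def like_num(text):
    """Return True if the token text looks like a number."""
    # Ignore thousands separators and decimal points: "1,000" -> "1000".
    stripped = text.replace(",", "").replace(".", "")
    if stripped.isdigit():
        return True
    # Accept simple fractions such as "3/4".
    if stripped.count("/") == 1:
        num, denom = stripped.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Fall back to the number-word list, case-insensitively.
    return stripped.lower() in _num_words
```

A check like this treats `"tatlo"` and `"1,000"` as number-like and rejects ordinary words.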
* Added alpha support for Tagalog language
* Edited contributor template
* Included SCA; Reverted templates
* Fixed SCA template
* Fixed changes in SCA template
* Add note that Unidic is required for Japanese
This addresses #3001. -POLM
* Add extras_require for mecab with old version
Related to issue #3018.
* mecab → ja
Co-Authored-By: polm <polm@dampfkraft.com>
* Update Unidic link for latest version in document
This patch improves #3017. The Unidic link pointed to an old version, so it now points to the latest version.
* Add contributor agreement
* Use more specific link for unidic-cwj
* modifying FR lemmatization for nouns
* modifying FR lemmatization for nouns
* adding contributor agreement for amperinet
* adding rules for words with inclusive parentheses wrongly tokenized
* adding contributor agreement for amperinet
* adding a missing comma
* updating rules and vocabulary for French lemmatization of verbs
* updating the file with French auxiliary verb
* updating rules and vocabulary for French lemmatization of verbs
* adding contributor agreement for amperinet
* adding rules for words with inclusive parentheses wrongly tokenized
* Updated wordforms for Norwegian lemmatizer
Upload of updated lists of wordforms for the Norwegian lemmatizer (nouns, verbs, adverbs, adjectives and lookup).
* Add spaCy contributor agreement for user beatesi
* Updated wordforms for Norwegian lemmatizer
* additional unit test for new entr word not in other lists
* bugfix - unit test works
* use _latin_lower instead of alpha_lower for french
* revert back to ALPHA_LOWER (following the code for languages)
* contributor agreement
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
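As a sketch of the first item: pytest 4.0 deprecates applying a mark directly to a value inside `parametrize`, and the replacement is `pytest.param(..., marks=...)`. The test name and data below are hypothetical and are not taken from the spaCy test suite.

```python
import pytest


# Old style, deprecated for pytest 4.0:
#     pytest.mark.xfail(("a-b", 2))
# New style: wrap the values in pytest.param and pass the mark via `marks`.
@pytest.mark.parametrize(
    "text,expected",
    [
        ("a b", 2),
        pytest.param(
            "a-b", 2, marks=pytest.mark.xfail(reason="hyphen is not split")
        ),
    ],
)
def test_token_count(text, expected):
    # Whitespace splitting stands in for a real tokenizer here.
    assert len(text.split()) == expected
```

Un-xfailing a test that now passes then amounts to replacing the `pytest.param(...)` wrapper with the plain tuple again.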
### Types of change
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* fixes symbolic link on py3 and windows
during setup of spaCy using the command
`python -m spacy link en_core_web_sm en`
closes #2948
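The commit message does not spell out the fix, so the following is only a sketch of one common way to create a directory link portably. It assumes that falling back to the `mklink` shell builtin on Windows is acceptable, and the helper name `symlink_to` and its argument order are illustrative.

```python
import subprocess
import sys
from pathlib import Path


def symlink_to(link_path, target):
    """Create `link_path` as a link pointing at the directory `target`."""
    link_path, target = Path(link_path), Path(target)
    if sys.platform.startswith("win"):
        # Assumption: use the cmd.exe builtin `mklink /d`, which needs
        # shell=True because it is not a standalone executable.
        subprocess.check_call(
            ["mklink", "/d", str(link_path), str(target)], shell=True
        )
    else:
        # On POSIX systems pathlib can create the link directly.
        link_path.symlink_to(target, target_is_directory=True)
```

Whether the link can actually be created on Windows still depends on the privileges of the user running the command.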
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
Resolves #2924.
## Description
Fixes a problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. The renderer now generates a random ID prefix, so for consistency even identical parses won't receive the same IDs (even if the effect of an ID clash isn't noticeable in that case).
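A minimal sketch of the idea, assuming the prefix comes from `uuid` and that IDs follow an `arrow-<prefix>-<index>` pattern. Both the helper name and the ID format are illustrative, not necessarily what the renderer uses.

```python
import uuid


def make_arc_ids(n_arcs):
    """Return one element ID per arc, unique across render calls."""
    # A fresh random prefix per call keeps IDs from two visualizations
    # in the same notebook page from colliding, even for identical parses.
    prefix = uuid.uuid4().hex
    return ["arrow-{}-{}".format(prefix, i) for i in range(n_arcs)]
```

Calling the helper twice with the same number of arcs yields two disjoint sets of IDs, so labels that reference an arc by ID attach to the arc from their own visualization.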
### Types of change
bug fix
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.