* Work on refactoring greedy parser
* Compile updated parser
* Fix refactored parser
* Update test
* Fix refactored parser
* Fix refactored parser
* Readd beam search after refactor
* Fix beam search after refactor
* Fix parser
* Fix beam parsing
* Support oracle segmentation in ud-train CLI command
* Avoid relying on final gold check in beam search
* Add a keyword argument sink to GoldParse
* Bug fixes to beam search after refactor
* Avoid importing fused token symbol in ud-run-test, untl that's added
* Avoid importing fused token symbol in ud-run-test, untl that's added
* Don't modify Token in global scope
* Fix error in beam gradient calculation
* Default to beam_update_prob 1
* Set a more aggressive threshold on the max violn update
* Disable some tests to figure out why CI fails
* Disable some tests to figure out why CI fails
* Add some diagnostics to travis.yml to try to figure out why build fails
* Tell Thinc to link against system blas on Travis
* Point thinc to libblas on Travis
* Try running sudo=true for travis
* Unhack travis.sh
* Restore beam_density argument for parser beam
* Require thinc 6.11.1.dev16
* Revert hacks to tests
* Revert hacks to travis.yml
* Update thinc requirement
* Fix parser model loading
* Fix size limits in training data
* Add missing name attribute for parser
* Fix appveyor for Windows
* Add Romanian lemmatizer lookup table.
Adapted from http://www.lexiconista.com/datasets/lemmatization/
by replacing cedillas with commas (ș and ț).
The original dataset is licensed under the Open Database License.
* Fix one blatant issue in the Romanian lemmatizer
* Romanian examples file
* Add ro_tokenizer in conftest
* Add Romanian lemmatizer test
* Fix the code for FACILITIY entities
As far as I can tell, the default models all use "FAC" rather than "FACILITY"
* Added my Contributor Agreement
* Rename vishnumenon to vishnumenon.md
* Update lex_attrs.py
Fixed spelling mistakes of some numbers (according to Brazilian Portuguese).
* Update lex_attrs.py
As requested, I've included the correct spelling for both Brazilian Portuguese and Portuguese Portuguese.
I will advise however, that the two are separated in the future. Brazilian Portuguese is a very different language from the original one, although most of the writing is unified, the way people talk in both countries is radically different. Keeping both languages as one may lead to bigger issues in the future, especially when it comes to spell checking.
* Add contraction forms of some common stopwords
All the stopwords added contain the apostrophe" ' "or " ’ ".
* Adds contributor agreement mauryaland
* Update mauryaland.md