Commit Graph

8 Commits

Author SHA1 Message Date
Björn Lennartsson
b892b446cc Updates to Swedish Language (#3164)
* Added the same punctuation rules as danish language.

* Added abbreviations and also the possibility to have capitalized abbreviations on some. Added a few specific cases too

* Added test for long texts in swedish

* Added morph rules, infixes and suffixes to __init__.py for swedish

* Added some tests for prefixes, infixes and suffixes

* Added tests for lemma

* Renamed files to follow convention

* [sv] Removed ambigious abbreviations

* Added more tests for tokenizer exceptions

* Added test for problem with punctuation in issue #2578

* Contributor agreement

* Removed faulty lemmatization of 'jag' ('I') as it was lemmatized to 'jaga' ('hunt')
2019-01-16 13:45:50 +01:00
ines
8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
ines
417d45f5d0 Add lemmatizer data as variable on language data
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines
0c2343d73a Tidy up language data 2017-10-11 02:22:49 +02:00
ines
4c643d74c5 Add norm exceptions to other Language classes 2017-06-03 22:29:21 +02:00
ines
924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
ines
73b577cb01 Fix relative imports 2017-05-08 22:29:04 +02:00
ines
f46ffe3e89 Move language data to /lang module 2017-05-08 20:00:40 +02:00