Commit Graph

21 Commits

Author SHA1 Message Date
Ines Montani
cb57b35bb8 Also include lowercase norm exceptions 2018-10-13 15:37:30 +02:00
Stefan Schweter
3dfc7f86be lemmatizer: correct lemma for Rang (#2537)
<!--- Provide a general summary of your changes in the title. -->

## Description

This PR corrects the German lemma form for the word "Rang". Initially, the lemma form was "ringen", which is not correct, because it refers to the verb ("ringen") and not to the noun ("Rang").

### Types of change

The lemma form for "Rang" is corrected to "Rang", see also the [Duden](https://www.duden.de/rechtschreibung/Rang) entry.

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-10 13:11:19 +02:00
Duygu Altinok
00b9a58558 German lemmatizer additions (#2529)
* lemma of was-> was

* added new pairs issue @2486

* added article tests
2018-07-09 11:10:15 +02:00
ines
acb9bdb852 Fix PRON_LEMMA imports 2017-11-06 17:41:53 +01:00
ines
819e30a26e Tidy up tokenizer exceptions 2017-11-01 23:02:45 +01:00
ines
9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines
7e424a1804 Don't copy exception dicts if not necessary and tidy up 2017-10-31 21:05:29 +01:00
Matthew Honnibal
49895fbef6 Rename 'SP' special tag to '_SP'
Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
2017-10-20 14:01:12 +02:00
ines
8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
ines
417d45f5d0 Add lemmatizer data as variable on language data
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines
0c2343d73a Tidy up language data 2017-10-11 02:22:49 +02:00
ines
ece30c28a8 Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
ines
1fe5e1a4d1 Add language example sentences (see #1107)
da, de, en, es, fr, he, it, nb, pl, pt, sv
2017-08-19 12:22:29 +02:00
Matthew Honnibal
e28f90b672 Fix syntax iterators 2017-06-04 15:51:50 -05:00
ines
fa7e576c57 Change order of exception dicts 2017-06-03 21:52:06 +02:00
ines
e47eef5e03 Update German tokenizer exceptions and tests 2017-06-03 21:07:44 +02:00
ines
0d6fa8b241 Add German norm exceptions 2017-06-03 20:54:18 +02:00
ines
924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
ines
1a05078c79 Add language-specific syntax iterators to en and de 2017-05-17 12:04:03 +02:00
ines
73b577cb01 Fix relative imports 2017-05-08 22:29:04 +02:00
ines
f46ffe3e89 Move language data to /lang module 2017-05-08 20:00:40 +02:00