spaCy/spacy/lang/nl/lemmatizer/_numbers_irreg.py
Yves Peirsman 951825532c Improved Dutch language resources and Dutch lemmatization (#3409)
* Improved Dutch language resources and Dutch lemmatization

* Fix conftest

* Update punctuation.py

* Auto-format

* Format and fix tests

* Remove unused test file

* Re-add deleted test

* removed redundant infix regex pattern for ','; note: brackets + simple hyphen remains

* Cleaner lemmatization files
2019-04-03 14:13:26 +02:00


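# coding: utf8
from __future__ import unicode_literals

# Irregular Dutch numeral forms (plurals, diminutives, derived forms),
# each mapped to a tuple holding its lemma.
NUMBERS_IRREG = {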
    'achten': ('acht',),
    'biljoenen': ('biljoen',),
    'drieën': ('drie',),
    'duizenden': ('duizend',),
    'eentjes': ('één',),
    'elven': ('elf',),
    'miljoenen': ('miljoen',),
    'negenen': ('negen',),
    'negentiger': ('negentig',),
    'tienduizenden': ('tienduizend',),
    'tienen': ('tien',),
    'tientjes': ('tien',),
    'twaalven': ('twaalf',),
    'tweeën': ('twee',),
    'twintiger': ('twintig',),
    'twintigsten': ('twintig',),
    'vieren': ('vier',),
    'vijftiger': ('vijftig',),
    'vijven': ('vijf',),
    'zessen': ('zes',),
    'zestiger': ('zestig',),
    'zevenen': ('zeven',),
    'zeventiger': ('zeventig',),
    'zovele': ('zoveel',),
    'zovelen': ('zoveel',),
}
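
The table above maps an irregular surface form to a tuple of lemmas. A minimal sketch of how such a table could be consulted is shown below. It is not spaCy's lemmatizer API: the helper `lookup_irregular_number` and the three-entry `NUMBERS_IRREG_SAMPLE` subset are hypothetical names introduced only for illustration. The example assumes Python 3 and falls back to the input form when no entry exists.

```python
# Illustrative subset of the table; entries are copied from NUMBERS_IRREG.
# The name NUMBERS_IRREG_SAMPLE is hypothetical and used only in this sketch.
NUMBERS_IRREG_SAMPLE = {
    "tweeën": ("twee",),
    "tientjes": ("tien",),
    "zovelen": ("zoveel",),
}


def lookup_irregular_number(form, table=NUMBERS_IRREG_SAMPLE):
    """Hypothetical helper: return the lemma tuple for `form`.

    The lookup is case-insensitive. If the form is not in the table,
    a one-element tuple containing the original form is returned.
    """
    return table.get(form.lower(), (form,))


print(lookup_irregular_number("Tweeën"))   # ('twee',)
print(lookup_irregular_number("honderd"))  # ('honderd',)
```

Storing each lemma in a tuple keeps the value type uniform, so a caller can iterate over the result whether a form has one lemma or several.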