Author | Commit | Message | Date
Ole Henrik Skogstrøm | 1107e89fcf | Updated doc string on nb tag_map module | 2018-01-25 11:08:28 +01:00
Ole Henrik Skogstrøm | 4058a7d579 | Fix æøå characters in lemmatizer | 2018-01-24 14:03:14 +01:00
Ole Henrik Skogstrøm | 42248f423f | Updated tag map | 2018-01-24 13:50:33 +01:00
Ole Henrik Skogstrøm | 74b430b49a | Correct Lemmatizer | 2018-01-24 13:26:33 +01:00
Ole Henrik Skogstrøm | b9b3a40c78 | Add Norwegian lemmatizer and tag_map | 2018-01-24 12:28:29 +01:00
Kit | 701e7cc6aa | Rename variable to keep code consistent | 2018-01-08 03:38:44 +01:00
Kit | ed0db95183 | Find lowercased forms of ordinal words, where possible | 2018-01-08 03:28:50 +01:00
Kit | 9bc524982e | Find lowercased forms of numeric words | 2018-01-08 03:25:08 +01:00
Kevin Humphreys | 7918fa4ef9 | handle would've | 2018-01-03 12:25:48 -08:00
zqhZY | f27859fa99 | add ChineseDefaults class for pickling | 2017-12-28 17:13:58 +08:00
Søren Lind Kristiansen | bef735aef7 | Fix Danish abbreviation 'm.h.t.' | 2017-12-21 09:24:31 +01:00
Ines Montani | a3dd167d7f | Merge branch 'master' into da_ud_tokenization | 2017-12-20 21:05:34 +00:00
Ines Montani | 97f100f69f | Merge pull request #1742 from kimfalk/master; Two corrections in the da lan. | 2017-12-20 21:02:00 +00:00
Ines Montani | d682a8803e | Merge pull request #1672 from cbilgili/master; Adds Turkish Lemmatization | 2017-12-20 21:01:00 +00:00
Benjamin Peterson | 9452134cd1 | remove no-break spaces from Hindi example (fixes #1750) | 2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen | 7a2f2f6f94 | Fix formatting. | 2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen | 15d13efafd | Tune Danish tokenizer to more closely match tokenization in Universal Dependencies. | 2017-12-20 17:36:52 +01:00
Kim FalkJørgensen | 648dc60755 | Remove the incorrect exception 'm.h.t' | 2017-12-20 10:02:39 +01:00
Kim FalkJørgensen | 9c9f4ef84a | Fixing a translation error in examples.py; Adding an exception in the tokenizer_exceptions.py | 2017-12-19 15:26:50 +01:00
ines | 22dc744b48 | Fix check for '@' in like_url (see #1715) | 2017-12-16 13:48:43 +01:00
Ines Montani | 6455b574fc | Check for email address first | 2017-12-12 10:25:13 +01:00
Bri-Will | d77361d76c | Update lex_attrs.py. Fix like_url from matching on e-mail | 2017-12-11 14:13:28 -08:00
Matthew Honnibal | 2ab0f2d186 | Merge pull request #1664 from jimregan/italian-lemmatizer; BOM in Italian lemmatiser | 2017-12-06 11:09:04 +01:00
Matthew Honnibal | 3f247119d3 | Merge pull request #1668 from sorenlind/da_morph; Add more Danish morph rules and clean up existing ones | 2017-12-06 11:08:09 +01:00
ines | f2ea6d4713 | Add Dutch example sentences (see #1107) | 2017-12-01 23:36:05 +01:00
Canbey Bilgili | abe098b255 | Adds Turkish Lemmatization | 2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen | d86b537a38 | Enable morph rules for Danish | 2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen | 13a988adc3 | Remove 'Number[psor]' | 2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen | dd6fde18a9 | Add more Danish morph rules and clean up existing ones | 2017-11-30 11:17:19 +01:00
Vadim Mazaev | 4ba7ddf651 | Bug fixes | 2017-11-30 12:29:38 +03:00
Matthew Honnibal | f9ed9ea529 | Merge pull request #1624 from GreenRiverRUS/russian; Add support for Russian | 2017-11-29 23:10:01 +01:00
Jim O'Regan | ba6a23fd11 | BOM in Italian lemmatiser | 2017-11-29 17:40:07 +00:00
Ines Montani | 9052643e2c | Merge pull request #1653 from sorenlind/da_example_typo; Fix typo | 2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen | 5fe58b885b | Fix typo | 2017-11-27 15:36:18 +01:00
Ines Montani | d52b1ab245 | Add unicode_literals (hopefully fixes test failure on Python 2) | 2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen | 0ffd27b0f6 | Add several Danish alternative spellings | 2017-11-27 13:35:41 +01:00
Vadim Mazaev | cacd859dcd | Added tag map, fixed test failures, added more exceptions | 2017-11-26 20:54:48 +03:00
Søren Lind Kristiansen | ef03e9ea53 | Remove unused import. | 2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen | 6aa241bcec | Add day of month tokenizer exceptions for Danish. | 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen | 0c276ed020 | Add weekday abbreviations and remove ambiguous month abbreviations for Danish. | 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen | 056547e989 | Add multiple tokenizer exceptions for Danish. | 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen | ac8116510d | Fix tokenization of 'i.' for Danish. | 2017-11-24 11:16:53 +01:00
Vadim Mazaev | 81314f8659 | Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests | 2017-11-21 22:23:59 +03:00
Vadim Mazaev | 52ee1f9bf9 | Updated Russian Language, added lemmatizer, norm exceptions and lex attrs | 2017-11-21 11:44:46 +03:00
Vadim Mazaev | a0739a06d4 | Restored Russian support from v1.10 branch | 2017-11-17 17:06:15 +03:00
ines | c9d72de0fb | Add dummy serialization methods for Japanese and missing lang getter (resolves #1557) | 2017-11-15 12:44:02 +01:00
Mathias Deschamps | c0691b2ab4 | Add tokenizer exceptions for ing verbs; Extend list of tokenizing exceptions introduced in 123810b | 2017-11-13 17:46:05 +01:00
Mathias Deschamps | 288298ead9 | Add norm exception for ing verbs; Some -ing verbs are sometimes written as -in or -in'. Make the NORM form correct | 2017-11-13 17:46:05 +01:00
Abhinav Sharma | 59f5740ede | improved upon the list of included stop_words | 2017-11-13 17:13:49 +05:30
ines | 123810b6de | Add "lovin'" to tokenizer exceptions (see #1248) | 2017-11-09 17:09:30 +01:00