DuyguA
f708d7443b
added contractions to stopwords #2020
2018-03-19 14:06:39 +01:00
DuyguA
be4f6da16b
maybe not a good idea to remove "also"
2018-03-14 14:47:24 +01:00
DuyguA
1a513f71e3
removed "also" from lookup
2018-03-14 11:57:15 +01:00
DuyguA
cca66abf1e
quick typo fix
2018-03-14 11:34:22 +01:00
Kit
9bc524982e
Find lowercased forms of numeric words
2018-01-08 03:25:08 +01:00
Kevin Humphreys
7918fa4ef9
handle would've
2018-01-03 12:25:48 -08:00
Mathias Deschamps
c0691b2ab4
Add tokenizer exceptions for ing verbs
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps
288298ead9
Add norm exception for ing verbs
Some -ing verbs are sometimes written with an -in or -in' ending. Make the NORM form correct.
2017-11-13 17:46:05 +01:00
ines
123810b6de
Add "lovin'" to tokenizer exceptions (see #1248)
2017-11-09 17:09:30 +01:00
ines
acb9bdb852
Fix PRON_LEMMA imports
2017-11-06 17:41:53 +01:00
ines
819e30a26e
Tidy up tokenizer exceptions
2017-11-01 23:02:45 +01:00
ines
9659391944
Update deprecated methods and add warnings
2017-11-01 16:49:42 +01:00
ines
7e424a1804
Don't copy exception dicts if not necessary and tidy up
2017-10-31 21:05:29 +01:00
Ines Montani
d3bf488e16
Merge pull request #1171 from mollerhoj/support-danish
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal
66766c1454
Restore SP tag to English tag_map, until models migrate
2017-10-24 17:05:00 +02:00
Ines Montani
facf77e541
Merge branch 'develop' into support-danish
2017-10-24 11:53:19 +02:00
Matthew Honnibal
49895fbef6
Rename 'SP' special tag to '_SP'
Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
2017-10-20 14:01:12 +02:00
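The tag-ordering concern described in commit 49895fbef6 can be illustrated with a small sketch. It assumes, purely for illustration, that class IDs are assigned by position in the sorted list of tag names; the commit message does not state how the mapping is built, and the helper `tag_ids` and the three-tag list are hypothetical.

```python
def tag_ids(tags):
    # Assumption: a tag's class ID is its index in the sorted tag list.
    return {tag: i for i, tag in enumerate(sorted(tags))}


# Hypothetical, shortened tag set standing in for the real tag map.
base_tags = ["ADJ", "NOUN", "VERB"]

before = tag_ids(base_tags)
with_sp = tag_ids(base_tags + ["SP"])
with_underscore = tag_ids(base_tags + ["_SP"])

# "SP" sorts between "NOUN" and "VERB", so "VERB" moves to a new ID.
print(before["VERB"], with_sp["VERB"])

# "_" sorts after the uppercase ASCII letters, so "_SP" lands at the end
# and the existing tags keep their IDs.
print(before["VERB"], with_underscore["VERB"])
```

Under that assumption, a model trained against the original IDs keeps working when `_SP` is added, but not when `SP` is.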
Matthew Honnibal
839de87ca9
Make lambda func a named function, for pickling
2017-10-17 18:21:20 +02:00
ines
38c756fd85
Port over changes from #1287
2017-10-14 13:16:21 +02:00
ines
8ce6f96180
Don't make copies of language data components
2017-10-11 15:34:55 +02:00
ines
417d45f5d0
Add lemmatizer data as variable on language data
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines
0c2343d73a
Tidy up language data
2017-10-11 02:22:49 +02:00
Matthew Honnibal
b29e6bff46
Improve lemmatization rule for am|VBP
2017-09-04 15:18:10 +02:00
ines
a68dc891ea
Port over changes from #1281
2017-08-21 23:19:18 +02:00
ines
1fe5e1a4d1
Add language example sentences (see #1107)
da, de, en, es, fr, he, it, nb, pl, pt, sv
2017-08-19 12:22:29 +02:00
mollerhoj
23025d3b05
Clean up a couple of strange English stopwords
2017-07-03 15:41:59 +02:00
Matthew Honnibal
e28f90b672
Fix syntax iterators
2017-06-04 15:51:50 -05:00
Matthew Honnibal
3f5c85d8de
Reorder setting of lex attrs, to avoid clobbering
2017-06-03 14:47:55 -05:00
Matthew Honnibal
de3954843e
Populate norm exceptions with lower-case
2017-06-03 14:47:12 -05:00
ines
5bd311c77e
Fix update of norm exceptions
2017-06-03 20:54:09 +02:00
ines
746653880c
Add English norm exceptions to lex_attrs
2017-06-03 20:27:28 +02:00
ines
095eeeb12f
Update English tokenizer exceptions and add norms
2017-06-03 20:27:16 +02:00
ines
33e332e67c
Remove unused export
2017-05-28 00:57:59 +02:00
Matthew Honnibal
5db89053aa
Merge docstrings
2017-05-21 13:46:23 -05:00
ines
924e8506de
Move Defaults subclass to module scope (necessary for pickling)
2017-05-20 19:02:27 +02:00
Matthew Honnibal
61fe55efba
Move EnglishDefaults class out of English
2017-05-20 02:18:19 -05:00
ines
1a05078c79
Add language-specific syntax iterators to en and de
2017-05-17 12:04:03 +02:00
ines
2f870123bf
Fix formatting
2017-05-12 15:37:20 +02:00
ines
12c3d5fbba
Fix formatting
2017-05-09 01:15:28 +02:00
ines
88adeee548
Add English lex_attrs overrides
2017-05-09 01:09:52 +02:00
ines
73b577cb01
Fix relative imports
2017-05-08 22:29:04 +02:00
ines
f46ffe3e89
Move language data to /lang module
2017-05-08 20:00:40 +02:00