Commit Graph

5137 Commits

Author SHA1 Message Date
Matthew Honnibal
f9327343ce Start updating serializer test 2017-05-09 18:12:03 +02:00
Matthew Honnibal
1166b0c491 Implement Doc.to_bytes and Doc.from_bytes methods 2017-05-09 18:11:34 +02:00
Matthew Honnibal
9e167b7bb6 Strip serializer from code 2017-05-09 17:28:50 +02:00
Matthew Honnibal
825c6403d8 Remove serializer 2017-05-09 17:28:30 +02:00
Matthew Honnibal
b53f7dfdc3 Remove spacy.serialize 2017-05-09 17:22:06 +02:00
Matthew Honnibal
62ecdea9f2 Add binder class for document serialization 2017-05-09 17:21:00 +02:00
ines
a0b00624bb Make sure like_email returns bool 2017-05-09 11:37:29 +02:00
ines
ea60932e1b Fix formatting 2017-05-09 11:08:14 +02:00
ines
2c3bdd09b1 Add English test for like_num 2017-05-09 11:06:34 +02:00
ines
22375eafb0 Fix and merge attrs and lex_attrs tests 2017-05-09 11:06:25 +02:00
ines
02d0ac5cab Remove redundant function and fix formatting 2017-05-09 11:06:04 +02:00
ines
b5ca50607e Reorganise entity rules 2017-05-09 01:37:10 +02:00
ines
564939391a Remove spacy.orth 2017-05-09 01:21:47 +02:00
ines
12c3d5fbba Fix formatting 2017-05-09 01:15:28 +02:00
ines
2829a024ef Re-add basic like_num check to global lex_attrs 2017-05-09 01:15:23 +02:00
ines
88adeee548 Add English lex_attrs overrides 2017-05-09 01:09:52 +02:00
ines
8f3fbbb147 Fix typos 2017-05-09 01:09:37 +02:00
ines
ea5fa46475 Import LEX_ATTRS from lang.lex_attrs 2017-05-09 00:58:10 +02:00
ines
2216e5f326 Reorganise lex_attrs and add dict 2017-05-09 00:57:54 +02:00
ines
e666f14d20 Add global lex_attrs 2017-05-09 00:41:53 +02:00
ines
41972c43fe Use consistent regex imports 2017-05-09 00:34:31 +02:00
ines
7b83977020 Remove unused munge package 2017-05-09 00:16:16 +02:00
ines
c714841cc8 Move language-specific tests to tests/lang 2017-05-09 00:02:37 +02:00
ines
bd57b611cc Update conftest to lazy load languages 2017-05-09 00:02:21 +02:00
ines
9f0fd5963f Reorganise Hungarian punctuation rules 2017-05-09 00:01:59 +02:00
ines
fc0d793360 Reorganise Bengali punctuation rules 2017-05-09 00:01:52 +02:00
ines
e895d1afd7 Reorganise French punctuation rules 2017-05-09 00:00:54 +02:00
ines
014bda0ae3 Reorganise global punctuation rules 2017-05-09 00:00:46 +02:00
ines
a91278cb32 Rename _URL_PATTERN to URL_PATTERN 2017-05-09 00:00:00 +02:00
ines
604f299cf6 Add char classes to global language data 2017-05-08 23:59:33 +02:00
ines
f6f5d78cb9 Fix formatting 2017-05-08 23:59:17 +02:00
ines
6eb6306843 Fix language data imports 2017-05-08 23:58:31 +02:00
ines
3c0f85de8e Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
ines
86d9c29f30 Reorder util functions 2017-05-08 23:51:15 +02:00
ines
9a0d2fdef1 Add load_lang_class() util function 2017-05-08 23:50:45 +02:00
ines
614aa09582 Tidy up Bengali tokenizer exceptions 2017-05-08 22:29:49 +02:00
ines
73b577cb01 Fix relative imports 2017-05-08 22:29:04 +02:00
ines
ae99990f63 Fix formatting 2017-05-08 22:23:48 +02:00
ines
f46ffe3e89 Move language data to /lang module 2017-05-08 20:00:40 +02:00
ines
41a322c733 Fix LEMMA in exceptions and morph rules 2017-05-08 19:57:36 +02:00
ines
2edc0aee12 Update warning message 2017-05-08 19:53:36 +02:00
ines
6025cdb992 Fix string interpolation in times 2017-05-08 16:38:16 +02:00
ines
b9ba58ba5c Add function to resolve load name. Warn if old 'path' keyword argument is used. 2017-05-08 16:33:37 +02:00
ines
e6f1a5d0a1 Add unicode declaration 2017-05-08 16:22:17 +02:00
ines
be5541bd16 Fix import and tokenizer exceptions 2017-05-08 16:20:14 +02:00
ines
2324788970 Remove bad tests 2017-05-08 16:15:27 +02:00
ines
b88c4193e7 Add missing symbol 2017-05-08 16:15:20 +02:00
ines
9a5b2bdd4c Don't set morph rules without tag map 2017-05-08 16:15:12 +02:00
ines
4930f0fa8f Explicitly import TOKEN_MATCH 2017-05-08 16:11:54 +02:00
ines
50b7ec03ca Fix typo 2017-05-08 16:11:45 +02:00