Matthew Honnibal
|
48c712f1c1
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2017-01-16 13:18:06 +01:00 |
|
Matthew Honnibal
|
7ccf490c73
|
Increment version
|
2017-01-16 13:17:58 +01:00 |
|
Ines Montani
|
50878ef598
|
Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744)
|
2017-01-16 13:10:38 +01:00 |
|
Ines Montani
|
e053c7693b
|
Fix formatting
|
2017-01-16 13:09:52 +01:00 |
|
Ines Montani
|
116c675c3c
|
Merge pull request #742 from oroszgy/hu_tokenizer_fix
Improved Hungarian tokenizer
|
2017-01-14 23:52:44 +01:00 |
|
Gyorgy Orosz
|
92345b6a41
|
Further numeric test.
|
2017-01-14 22:44:19 +01:00 |
|
Gyorgy Orosz
|
b4df202bfa
|
Better error handling
|
2017-01-14 22:24:58 +01:00 |
|
Gyorgy Orosz
|
b03a46792c
|
Better error handling
|
2017-01-14 22:09:29 +01:00 |
|
Gyorgy Orosz
|
a45f22913f
|
Added further abbreviations present in the Szeged corpus
|
2017-01-14 22:08:55 +01:00 |
|
Ines Montani
|
332ce2d758
|
Update README.md
|
2017-01-14 21:12:11 +01:00 |
|
Gyorgy Orosz
|
9505c6a72b
|
Passing all old tests.
|
2017-01-14 20:39:21 +01:00 |
|
Gyorgy Orosz
|
63037e79af
|
Fixed hyphen handling in the Hungarian tokenizer.
|
2017-01-14 16:30:11 +01:00 |
|
Gyorgy Orosz
|
f77c0284d6
|
Maintaining compatibility with other spacy tokenizers.
|
2017-01-14 16:19:15 +01:00 |
|
Gyorgy Orosz
|
be7a7aeb1a
|
Reversed accidental changes.
|
2017-01-14 15:59:36 +01:00 |
|
Gyorgy Orosz
|
1be5da1ac6
|
Fixed Hungarian tokenizer for numbers
|
2017-01-14 15:51:59 +01:00 |
|
Ines Montani
|
a89e269a5a
|
Fix test formatting and consistency
|
2017-01-14 13:41:19 +01:00 |
|
Ines Montani
|
3424e3a7e5
|
Update README.md
|
2017-01-13 15:54:54 +01:00 |
|
Ines Montani
|
49186b34a1
|
Mark lemmatizer tests as models since they use installed data
|
2017-01-13 15:12:07 +01:00 |
|
Ines Montani
|
138deb80a1
|
Modernise vector tests, use add_vecs_to_vocab and don't depend on models
|
2017-01-13 15:12:07 +01:00 |
|
Ines Montani
|
96f0caa28a
|
Fix test name for consistency
|
2017-01-13 15:12:07 +01:00 |
|
Ines Montani
|
dc2bb1259f
|
Add util function to add vectors to vocab
|
2017-01-13 15:12:07 +01:00 |
|
Ines Montani
|
db9b25663d
|
Reformat assert_docs_equal and add docstring
|
2017-01-13 15:12:07 +01:00 |
|
Ines Montani
|
62ce0a0073
|
Add README.md to tests to explain organisation and conventions
|
2017-01-13 15:11:18 +01:00 |
|
Ines Montani
|
38d60f6b90
|
Modernise serializer I/O tests and don't depend on models where possible
|
2017-01-13 02:24:56 +01:00 |
|
Ines Montani
|
4bb5b89ee4
|
Add text_file_b fixture using BytesIO
|
2017-01-13 02:23:50 +01:00 |
|
Ines Montani
|
49febd8c62
|
Modernise noun chunks tests and don't depend on models
|
2017-01-13 02:01:00 +01:00 |
|
Ines Montani
|
3ee97b5686
|
Rename test_parser to test_noun_chunks
|
2017-01-13 01:36:33 +01:00 |
|
Ines Montani
|
a308703f47
|
Remove old tests
|
2017-01-13 01:34:48 +01:00 |
|
Ines Montani
|
12eb8edf26
|
Move parser tests from unit to parser
|
2017-01-13 01:34:38 +01:00 |
|
Ines Montani
|
138c53ff2e
|
Merge tokenizer tests
|
2017-01-13 01:34:14 +01:00 |
|
Ines Montani
|
01f36ca3ff
|
Move attrs tests from unit to root and modernise
|
2017-01-13 01:33:50 +01:00 |
|
Ines Montani
|
3610d27967
|
Move alignment tests from munge to gold and modernise
|
2017-01-13 01:33:31 +01:00 |
|
Ines Montani
|
094ff7396a
|
Reformat and rename Pragmatic Segmenter tests and mark xfails
|
2017-01-13 01:30:20 +01:00 |
|
Ines Montani
|
affcf1b19d
|
Modernise lemmatizer tests
|
2017-01-12 23:41:17 +01:00 |
|
Ines Montani
|
33d9cf87f9
|
Modernise tagger tests and fix xpassing test
|
2017-01-12 23:40:52 +01:00 |
|
Ines Montani
|
33e5f8dc2e
|
Create basic and extended test set for URLs
|
2017-01-12 23:40:02 +01:00 |
|
Ines Montani
|
5e4f5ebfc8
|
Modernise BILUO tests
|
2017-01-12 23:39:18 +01:00 |
|
Ines Montani
|
09acfbca01
|
Add Lemmatizer fixture
|
2017-01-12 23:38:55 +01:00 |
|
Ines Montani
|
514bfa2597
|
Add path fixture for spaCy data path
|
2017-01-12 23:38:47 +01:00 |
|
Ines Montani
|
0894b8c0ef
|
Don't split tokens with digits and "/" infixes (resolves #740)
|
2017-01-12 22:58:26 +01:00 |
|
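This is a minimal sketch of the rule this commit describes. It assumes a regex-based infix pass in which "/" counts as an infix only when it sits between two letters. Tokens with a digit on either side of the slash, such as dates or fractions, are therefore left whole. `SLASH_INFIX` and `split_slash_infix` are hypothetical names and are not spaCy's actual infix definitions.

```python
import re

# Hypothetical infix pattern: "/" counts as an infix only when it sits
# between two ASCII letters; a digit on either side blocks the split.
SLASH_INFIX = re.compile(r"(?<=[A-Za-z])/(?=[A-Za-z])")


def split_slash_infix(token):
    """Split a token on letter/letter slashes, keeping each slash as its own piece."""
    pieces = []
    start = 0
    for match in SLASH_INFIX.finditer(token):
        pieces.append(token[start:match.start()])
        pieces.append(match.group())
        start = match.end()
    pieces.append(token[start:])
    return pieces


print(split_slash_infix("and/or"))      # ['and', '/', 'or']
print(split_slash_infix("12/31/2016"))  # ['12/31/2016']
```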
Ines Montani
|
e9e99a5670
|
Add regression test for #740
|
2017-01-12 22:57:38 +01:00 |
|
Ines Montani
|
6935d55409
|
Fix formatting
|
2017-01-12 22:56:20 +01:00 |
|
Ines Montani
|
5f0d196a31
|
Modernise and merge matcher tests
|
2017-01-12 22:23:11 +01:00 |
|
Ines Montani
|
d5d774413a
|
Update comments on EN and DE fixtures
|
2017-01-12 22:03:07 +01:00 |
|
Ines Montani
|
9b4bea1df9
|
Tidy up and rename regression tests and remove unnecessary imports
|
2017-01-12 22:00:37 +01:00 |
|
Ines Montani
|
5e1b6178e3
|
Fix formatting and consistency
|
2017-01-12 22:00:06 +01:00 |
|
Ines Montani
|
a3fd32455e
|
Remove redundant language loading integration tests
|
2017-01-12 21:59:48 +01:00 |
|
Ines Montani
|
61f1ca09c2
|
Modernise serializer codecs tests
|
2017-01-12 21:58:55 +01:00 |
|
Ines Montani
|
5dbc6e59f6
|
Modernise Huffman tests
|
2017-01-12 21:58:40 +01:00 |
|
Ines Montani
|
edeeeccea5
|
Modernise packer tests and don't depend on models where possible
|
2017-01-12 21:58:07 +01:00 |
|
Ines Montani
|
d084676cd0
|
Modernise and merge serialization tests
|
2017-01-12 21:57:19 +01:00 |
|
Ines Montani
|
442237787c
|
Add assert_docs_equal util to compare two docs
|
2017-01-12 21:56:52 +01:00 |
|
Ines Montani
|
eac3f700fb
|
Add fixture for entity recognizer
|
2017-01-12 21:56:32 +01:00 |
|
Ines Montani
|
b438cfddbc
|
Modernise matcher tests and split into two files
|
2017-01-12 17:51:46 +01:00 |
|
Ines Montani
|
27482ebed8
|
Move matcher tests for #188 and #242 to regression tests
Modernise tests and remove unnecessary imports
|
2017-01-12 17:33:57 +01:00 |
|
Ines Montani
|
0a4dc632bd
|
Update test to not create redundant Doc object
|
2017-01-12 17:33:18 +01:00 |
|
Ines Montani
|
a2526e66d8
|
Fix formatting, naming and unicode declaration
|
2017-01-12 16:51:13 +01:00 |
|
Ines Montani
|
052cdff07d
|
Modernise vector similarity tests
|
2017-01-12 16:51:13 +01:00 |
|
Ines Montani
|
bd20ec0a6a
|
Add get_cosine util function
|
2017-01-12 16:51:13 +01:00 |
|
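The commit only names the helper. As a hedged illustration, the sketch below computes the standard cosine similarity, cos(u, v) = u·v / (|u| |v|), using only the standard library. The zero-vector handling is an assumption. The repository's own get_cosine may differ, for example by taking NumPy arrays.

```python
import math


def get_cosine(vec1, vec2):
    """Cosine similarity of two equal-length vectors (stdlib-only sketch)."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0 or norm2 == 0:
        # Assumption: report 0.0 for a zero vector instead of dividing by zero
        return 0.0
    return dot / (norm1 * norm2)


print(get_cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0
```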
Ines Montani
|
51ef75f629
|
Fix regression test for #615 and remove unnecessary imports
|
2017-01-12 16:51:12 +01:00 |
|
Ines Montani
|
aeb747e10c
|
Adjust formatting
|
2017-01-12 16:51:12 +01:00 |
|
Ines Montani
|
8e3e58a7e6
|
Modernise and merge lexeme vocab tests
|
2017-01-12 16:51:12 +01:00 |
|
Ines Montani
|
c3d4516fc2
|
Move test for #361 to regression tests
|
2017-01-12 16:51:12 +01:00 |
|
Daniel Hershcovich
|
99eb494a82
|
Fix #737: support loading word vectors with " " as a word
|
2017-01-12 17:00:14 +02:00 |
|
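This sketch shows why a space-only word breaks naive parsing. It assumes vectors are stored one per line as a word followed by space-separated floats. `line.split()` discards such a word entirely, while splitting from the right a known number of times keeps it. `parse_vector_line` and its `n_dims` parameter are hypothetical, and the commit does not show how the fix was actually implemented.

```python
def parse_vector_line(line, n_dims):
    """Parse 'word v1 ... vn' where the word itself may be a space."""
    # Split from the right exactly n_dims times, so everything before the
    # first value (even a lone " ") is kept as the word.
    pieces = line.rstrip("\n").rsplit(" ", n_dims)
    if len(pieces) != n_dims + 1:
        raise ValueError("expected a word followed by %d values" % n_dims)
    word = pieces[0]
    values = [float(v) for v in pieces[1:]]
    return word, values


line = "  0.5 -1.0\n"  # the word is " ", followed by a separator space
print(line.split())                # ['0.5', '-1.0']  (the word is lost)
print(parse_vector_line(line, 2))  # (' ', [0.5, -1.0])
```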
Ines Montani
|
7cb3d74426
|
Modernise span tests and don't depend on models
|
2017-01-12 15:30:49 +01:00 |
|
Ines Montani
|
92e3d8b3ee
|
Modernise vocab API tests and remove old xfailing tests
|
2017-01-12 15:27:46 +01:00 |
|
Ines Montani
|
7ea87684cd
|
Rename test_vocab.py to test_vocab_api.py
|
2017-01-12 15:12:21 +01:00 |
|
Ines Montani
|
0da2ee5c68
|
Merge flag features tests into orth tests in tests root
|
2017-01-12 15:12:00 +01:00 |
|
Ines Montani
|
03c136cfd3
|
Remove StringStore tests from vocab tests
|
2017-01-12 15:11:15 +01:00 |
|
Ines Montani
|
d7bd57abdf
|
Modernise add vectors vocab test
|
2017-01-12 15:09:49 +01:00 |
|
Ines Montani
|
89525ef345
|
Use consistent test names
|
2017-01-12 15:09:21 +01:00 |
|
Ines Montani
|
f8803808ce
|
Remove old unused tests and conftest files
|
2017-01-12 15:09:05 +01:00 |
|
Ines Montani
|
4d0bfebcd9
|
Move Pragmatic Segmenter test cases (currently unused) to parser tests
|
2017-01-12 15:08:02 +01:00 |
|
Ines Montani
|
26d018d874
|
Add tests for StringStore
|
2017-01-12 15:07:31 +01:00 |
|
Ines Montani
|
9b6784bab5
|
Add fixture for StringStore
|
2017-01-12 15:05:40 +01:00 |
|
Ines Montani
|
99d66d613a
|
Modernise tests for merging spans and don't depend on models
|
2017-01-12 12:26:26 +01:00 |
|
Ines Montani
|
fa8f67596d
|
Remove unused old test
|
2017-01-12 12:26:08 +01:00 |
|
Ines Montani
|
359f73a96b
|
Move test for #54 to regression tests
|
2017-01-12 12:25:51 +01:00 |
|
Ines Montani
|
3f3a46722c
|
Remove unused conftest
|
2017-01-12 12:25:24 +01:00 |
|
Ines Montani
|
c2406e92bc
|
Allow setting ents in get_doc
|
2017-01-12 12:25:10 +01:00 |
|
Ines Montani
|
c5914c6fe5
|
Fix and pass regression test for #736
|
2017-01-12 11:48:56 +01:00 |
|
Matthew Honnibal
|
4e48862fa8
|
Remove print statement
|
2017-01-12 11:25:39 +01:00 |
|
Matthew Honnibal
|
d1d8214767
|
Increment version
|
2017-01-12 11:21:57 +01:00 |
|
Matthew Honnibal
|
fba67fa342
|
Fix Issue #736: Times were being tokenized with incorrect string values.
|
2017-01-12 11:21:01 +01:00 |
|
Ines Montani
|
a6790b6694
|
Rename tags to pos in get_doc and allow adding tags to tokens
|
2017-01-12 11:18:36 +01:00 |
|
Ines Montani
|
1add8ace67
|
Merge lemmatizer tests
|
2017-01-12 11:16:53 +01:00 |
|
Ines Montani
|
3bc082abdf
|
Modernise morph exceptions test and don't depend on models
|
2017-01-12 11:14:29 +01:00 |
|
Ines Montani
|
ec7739b76e
|
Add regression test for #736
|
2017-01-12 11:12:44 +01:00 |
|
Ines Montani
|
6c1c564891
|
Move language-specific tests out of redundant tokenizer directories
|
2017-01-12 02:17:18 +01:00 |
|
Ines Montani
|
8fecedac3a
|
Tidy up
|
2017-01-12 02:16:37 +01:00 |
|
Ines Montani
|
ae7edd30e7
|
Move text file back to tokenizer tests directory
|
2017-01-12 02:10:23 +01:00 |
|
Ines Montani
|
ffcaba9017
|
Remove old and/or redundant tests
|
2017-01-12 02:10:18 +01:00 |
|
Ines Montani
|
19c4132097
|
Modernise space attachment parser tests and don't depend on models
|
2017-01-12 01:54:44 +01:00 |
|
Ines Montani
|
69778924c8
|
Modernise and merge parser tests and don't depend on models
|
2017-01-12 01:07:29 +01:00 |
|
Ines Montani
|
178c147612
|
Modernise nonprojectivity tests and don't depend on models
|
2017-01-12 01:06:36 +01:00 |
|
Ines Montani
|
1a3984742c
|
Modernise sentence boundary detection tests and don't depend on models (where possible)
|
2017-01-11 23:53:08 +01:00 |
|
Ines Montani
|
0cdb6ea61d
|
Remove old unused pickle test
|
2017-01-11 23:52:28 +01:00 |
|
Ines Montani
|
c9671329dc
|
Move test for #309 to regression tests
|
2017-01-11 23:52:13 +01:00 |
|
Ines Montani
|
d0e37b5670
|
Modernise parser tests and don't depend on models
|
2017-01-11 21:30:27 +01:00 |
|
Ines Montani
|
342cb41782
|
Add apply_transition_sequence util function to utils
|
2017-01-11 21:30:14 +01:00 |
|
Ines Montani
|
09807addff
|
Add en_parser fixture
|
2017-01-11 21:29:59 +01:00 |
|
Ines Montani
|
55d151aa61
|
Modernise Doc parse tree navigation tests and don't depend on models
|
2017-01-11 21:14:15 +01:00 |
|
Ines Montani
|
7262421bb2
|
Use consistent test names
|
2017-01-11 19:00:52 +01:00 |
|
Ines Montani
|
33800c9367
|
Rename "tokens" tests to "doc"
|
2017-01-11 18:59:01 +01:00 |
|
Ines Montani
|
3a9c6a9563
|
Remove old unused files
|
2017-01-11 18:58:38 +01:00 |
|
Ines Montani
|
8e962de39f
|
Remove old word vector tests
|
2017-01-11 18:55:08 +01:00 |
|
Ines Montani
|
e027936920
|
Modernise Doc noun chunks tests
|
2017-01-11 18:54:56 +01:00 |
|
Ines Montani
|
439f396acd
|
Modernise Doc array tests and don't depend on models
|
2017-01-11 18:54:46 +01:00 |
|
Ines Montani
|
05447be884
|
Modernise test for adding entities
|
2017-01-11 18:54:24 +01:00 |
|
Ines Montani
|
6e883f4c00
|
Modernise Doc API tests and don't depend on models
|
2017-01-11 18:05:36 +01:00 |
|
Ines Montani
|
8bf3bb5c44
|
Make words optional for get_doc
|
2017-01-11 18:05:10 +01:00 |
|
Ines Montani
|
928db7e419
|
Fix StringIO import for Python 3
|
2017-01-11 14:07:48 +01:00 |
|
Ines Montani
|
69998f216b
|
Rename test_tokens_api.py to test_doc_api.py
|
2017-01-11 13:58:56 +01:00 |
|
Ines Montani
|
d94dea1b18
|
Merge token tests into token API tests
|
2017-01-11 13:57:02 +01:00 |
|
Ines Montani
|
eb23424ab0
|
Modernise token API tests and don't depend on loading models
|
2017-01-11 13:56:54 +01:00 |
|
Ines Montani
|
c682b8ca90
|
Merge conftests into one cohesive file
|
2017-01-11 13:56:32 +01:00 |
|
Ines Montani
|
909f24d7df
|
Add test utils and get_doc helper function
Create Doc object from given vocab, words and annotations to allow
tests not to depend on loading the models.
|
2017-01-11 13:55:33 +01:00 |
|
Matthew Honnibal
|
e12c90e03f
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2017-01-11 13:03:51 +01:00 |
|
Matthew Honnibal
|
12cd27b821
|
Amend 8ae8b443f: Handle comparison with None tokens.
|
2017-01-11 13:03:32 +01:00 |
|
Daniel Hershcovich
|
8e603cc917
|
Avoid "True if ... else False"
|
2017-01-11 11:18:22 +02:00 |
|
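The idiom this commit removes is shown here on hypothetical helpers; the commit does not say which code it touched. A conditional expression that only yields `True` or `False` is redundant when the condition is already a boolean. When the condition is merely truthy, `bool(...)` does the same job.

```python
def has_vector_verbose(vector):
    # Redundant: wraps a truthiness test in a conditional expression
    return True if vector else False


def has_vector(vector):
    # Equivalent and clearer: convert the truthiness test directly
    return bool(vector)


def is_long(word, limit=5):
    # A comparison already produces a bool, so no conversion is needed
    return len(word) > limit
```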
Matthew Honnibal
|
44e2b0100d
|
Support TAG attribute in doc.from_array
|
2017-01-10 22:47:07 +01:00 |
|
Ines Montani
|
3e6e1f0251
|
Tidy up regression tests
|
2017-01-10 19:24:10 +01:00 |
|
Magnus Burton
|
aad23ab0b4
|
Supplemented with capitalized Swedish exceptions
|
2017-01-10 16:07:20 +01:00 |
|
Ines Montani
|
869963c3c4
|
Mark extensive prefix/suffix tests as slow
|
2017-01-10 15:57:35 +01:00 |
|
Ines Montani
|
487e020ebe
|
Add simple test for surrounding brackets
|
2017-01-10 15:57:26 +01:00 |
|
Ines Montani
|
0ba5cf51d2
|
Assert length first
|
2017-01-10 15:57:00 +01:00 |
|
Ines Montani
|
2185d31907
|
Adjust names and formatting
|
2017-01-10 15:56:35 +01:00 |
|
Ines Montani
|
e10d4ca964
|
Remove semi-redundant URLs and punctuation for faster testing
|
2017-01-10 15:54:25 +01:00 |
|
Ines Montani
|
3a3cb2c90c
|
Add unicode declaration
|
2017-01-10 15:53:15 +01:00 |
|
Matthew Honnibal
|
0f9b8a00a5
|
Unbreak data download
|
2017-01-09 23:40:26 +01:00 |
|
Matthew Honnibal
|
8ae8b443f1
|
Add richcmp method to Token. Closes #631
|
2017-01-09 19:30:31 +01:00 |
|
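Cython extension types have traditionally expressed all six comparisons through one `__richcmp__(self, other, op)` method that dispatches on `op`. Plain Python spreads the same behaviour over `__eq__`, `__lt__` and related methods. The sketch below assumes tokens are ordered by their character offset (`idx`) within a single document. In the spirit of the later "Handle comparison with None tokens" amendment in this log, it returns `False` for equality with `None` instead of raising. `SketchToken` is a hypothetical stand-in, not spaCy's `Token`.

```python
import functools


@functools.total_ordering
class SketchToken:
    """Hypothetical token stand-in, ordered by character offset."""

    def __init__(self, text, idx):
        self.text = text
        self.idx = idx  # character offset of the token in its document

    def __eq__(self, other):
        if other is None:
            return False  # a missing token is never equal to a real one
        if not isinstance(other, SketchToken):
            return NotImplemented
        return self.idx == other.idx

    def __lt__(self, other):
        if not isinstance(other, SketchToken):
            return NotImplemented
        return self.idx < other.idx

    def __hash__(self):
        # Keep hashing consistent with __eq__
        return hash(self.idx)


hello, world = SketchToken("Hello", 0), SketchToken("world", 6)
print(hello < world, hello == world, hello == None)  # True False False
```

`functools.total_ordering` derives `__le__`, `__gt__` and `__ge__` from `__lt__` and `__eq__`. This mirrors how a single `__richcmp__` covers every operator.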
Matthew Honnibal
|
64f747cb65
|
Token comparison test
|
2017-01-09 19:12:00 +01:00 |
|
Matthew Honnibal
|
18c3c2d05c
|
Add tests for token comparison, re Issue #631
|
2017-01-09 19:09:59 +01:00 |
|
Matthew Honnibal
|
97a1286129
|
Revert changes to tagger and parser for thinc 6
|
2017-01-09 10:08:34 -06:00 |
|
Matthew Honnibal
|
95a52005df
|
Revert "Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class."
This reverts commit 40e71586d6.
|
2017-01-09 09:55:55 -06:00 |
|
Ines Montani
|
363f09e68c
|
Merge pull request #726 from magnusburton/master
Added Swedish abbreviations as token exceptions
|
2017-01-09 14:58:15 +01:00 |
|
Matthew Honnibal
|
42cd598f57
|
Use correct fixtures in URL tokenizer
|
2017-01-09 14:10:40 +01:00 |
|
Matthew Honnibal
|
d9a77ddf14
|
Return None for data path if it doesn't exist
|
2017-01-09 14:10:05 +01:00 |
|
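This is a minimal sketch of the behaviour the commit names, assuming a pathlib-based helper. Callers receive `None` rather than a path to a missing directory, so they can test for it explicitly. `resolve_data_path` is a hypothetical name, and spaCy's own utility may take different arguments.

```python
from pathlib import Path


def resolve_data_path(path):
    """Return the path if it exists on disk, otherwise None (hypothetical helper)."""
    path = Path(path)
    return path if path.exists() else None
```

Callers then check `if resolve_data_path(p) is None:` before trying to load anything from it.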
Matthew Honnibal
|
e4862d1dab
|
Merge branch 'develop'
|
2017-01-09 13:36:01 +01:00 |
|
Ines Montani
|
aa876884f0
|
Revert "Revert "Merge remote-tracking branch 'origin/master'""
This reverts commit fb9d3bb022.
|
2017-01-09 13:28:13 +01:00 |
|
Ines Montani
|
d5c72c40eb
|
Remove old tests for old website example code
|
2017-01-08 22:28:53 +01:00 |
|
Ines Montani
|
eef94e3ee2
|
Split off period after two or more uppercase letters (fixes #483)
|
2017-01-08 22:28:25 +01:00 |
|
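This sketch illustrates the suffix rule the commit describes, assuming a regex applied to the end of each token. A final period is split off only when the two characters before it are uppercase letters, so "USA." splits while "U.S." and "Dr." stay whole. The commit does not show the actual pattern. The ASCII character class here is a simplification, and a real tokenizer would need a Unicode-aware one.

```python
import re

# Hypothetical suffix rule: a trailing "." preceded by two uppercase letters.
UPPER_PERIOD_SUFFIX = re.compile(r"(?<=[A-Z][A-Z])\.$")


def split_upper_period(token):
    """Split a trailing period off tokens like 'USA.'; leave others untouched."""
    match = UPPER_PERIOD_SUFFIX.search(token)
    if match is None:
        return [token]
    return [token[:match.start()], token[match.start():]]


for token in ["USA.", "U.S.", "Dr.", "NASA"]:
    print(token, split_upper_period(token))
```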
Ines Montani
|
a89a6000e5
|
Remove unused import
|
2017-01-08 22:17:37 +01:00 |
|
Ines Montani
|
5d28664fc5
|
Don't test Hungarian for numbers and hyphens for now
Reinvestigate behaviour of case affixes given reorganised tokenizer
patterns.
|
2017-01-08 20:45:40 +01:00 |
|
Ines Montani
|
53362b6b93
|
Reorganise Hungarian prefixes/suffixes/infixes
Use global prefixes and suffixes for non-language-specific rules,
import list of alpha unicode characters and adjust regexes.
|
2017-01-08 20:40:33 +01:00 |
|
Ines Montani
|
347c4a2d06
|
Reorganise and reformat global tokenizer prefixes, suffixes and infixes
|
2017-01-08 20:37:39 +01:00 |
|
Ines Montani
|
0dec90e9f7
|
Use global abbreviation data languages and remove duplicates
|
2017-01-08 20:36:00 +01:00 |
|
Ines Montani
|
7c3cb2a652
|
Add global abbreviations data
|
2017-01-08 20:34:03 +01:00 |
|
Ines Montani
|
de5aa92bc2
|
Handle deprecated tokenizer prefix data
|
2017-01-08 20:33:28 +01:00 |
|
Ines Montani
|
abb09782f9
|
Move sun.txt to original location and fix path to not break parser tests
|
2017-01-08 20:32:54 +01:00 |
|
Ines Montani
|
cab39c59c5
|
Add missing contractions to English tokenizer exceptions
Inspired by
https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py
|
2017-01-05 19:59:06 +01:00 |
|
Ines Montani
|
a23504fe07
|
Move abbreviations below other exceptions
|
2017-01-05 19:58:07 +01:00 |
|
Ines Montani
|
7d2cf934b9
|
Generate he/she/it correctly with 's instead of 've
|
2017-01-05 19:57:00 +01:00 |
|
Ines Montani
|
8328925e1f
|
Add newlines to long German text
|
2017-01-05 18:13:30 +01:00 |
|
Ines Montani
|
55b46d7cf6
|
Add tokenizer tests for German
|
2017-01-05 18:11:25 +01:00 |
|
Ines Montani
|
5bb4081f52
|
Remove redundant test_tokenizer.py for English
|
2017-01-05 18:11:11 +01:00 |
|
Ines Montani
|
8216ba599b
|
Add tests for longer and mixed English texts
|
2017-01-05 18:11:04 +01:00 |
|
Ines Montani
|
65f937d5c6
|
Move basic contraction tests to test_contractions.py
|
2017-01-05 18:09:53 +01:00 |
|
Ines Montani
|
bbe7cab3a1
|
Move non-English-specific tests back to general tokenizer tests
|
2017-01-05 18:09:29 +01:00 |
|
Ines Montani
|
038002d616
|
Reformat HU tokenizer tests and adapt to general style
Improve readability of test cases and add conftest.py with fixture
|
2017-01-05 18:06:44 +01:00 |
|
Ines Montani
|
bc911322b3
|
Move ") to emoticons (see Tweebo challenge test)
|
2017-01-05 18:05:38 +01:00 |
|
Ines Montani
|
637f785036
|
Add general sanity tests for all tokenizers
|
2017-01-05 16:25:38 +01:00 |
|
Ines Montani
|
c5f2dc15de
|
Move English tokenizer tests to directory /en
|
2017-01-05 16:25:04 +01:00 |
|
Ines Montani
|
8b45363b4d
|
Modernize and merge general tokenizer tests
|
2017-01-05 13:17:05 +01:00 |
|
Ines Montani
|
02cfda48c9
|
Modernize and merge tokenizer tests for string loading
|
2017-01-05 13:16:55 +01:00 |
|
Ines Montani
|
a11f684822
|
Modernize and merge tokenizer tests for whitespace
|
2017-01-05 13:16:33 +01:00 |
|
Ines Montani
|
8b284fc6f1
|
Modernize and merge tokenizer tests for text from file
|
2017-01-05 13:15:52 +01:00 |
|
Ines Montani
|
2c2e878653
|
Modernize and merge tokenizer tests for punctuation
|
2017-01-05 13:14:16 +01:00 |
|
Ines Montani
|
8a74129cdf
|
Modernize and merge tokenizer tests for prefixes/suffixes/infixes
|
2017-01-05 13:13:12 +01:00 |
|
Ines Montani
|
0e65dca9a5
|
Modernize and merge tokenizer tests for exception and emoticons
|
2017-01-05 13:11:31 +01:00 |
|
Ines Montani
|
34c47bb20d
|
Fix formatting
|
2017-01-05 13:10:51 +01:00 |
|
Ines Montani
|
2e72683baa
|
Add missing docstrings
|
2017-01-05 13:10:21 +01:00 |
|
Ines Montani
|
da10a049a6
|
Add unicode declarations
|
2017-01-05 13:09:48 +01:00 |
|
Ines Montani
|
58adae8774
|
Remove unused file
|
2017-01-05 13:09:22 +01:00 |
|
Ines Montani
|
c6e5a5349d
|
Move regression test for #360 into own file
|
2017-01-04 00:49:31 +01:00 |
|
Ines Montani
|
8279993a6f
|
Modernize and merge tokenizer tests for punctuation
|
2017-01-04 00:49:20 +01:00 |
|
Ines Montani
|
550630df73
|
Update tokenizer tests for contractions
|
2017-01-04 00:48:42 +01:00 |
|
Ines Montani
|
109f202e8f
|
Update conftest fixture
|
2017-01-04 00:48:21 +01:00 |
|
Ines Montani
|
ee6b49b293
|
Modernize tokenizer tests for emoticons
|
2017-01-04 00:47:59 +01:00 |
|
Ines Montani
|
f09b5a5dfd
|
Modernize tokenizer tests for infixes
|
2017-01-04 00:47:42 +01:00 |
|
Ines Montani
|
59059fed27
|
Move regression test for #351 to own file
|
2017-01-04 00:47:11 +01:00 |
|
Ines Montani
|
667051375d
|
Modernize tokenizer tests for whitespace
|
2017-01-04 00:46:35 +01:00 |
|
Ines Montani
|
aafc894285
|
Modernize tokenizer tests for contractions
Use @pytest.mark.parametrize.
|
2017-01-03 23:02:21 +01:00 |
|
Ines Montani
|
1d237664af
|
Add lowercase lemma to tokenizer exceptions
|
2017-01-03 23:02:21 +01:00 |
|
Ines Montani
|
84a87951eb
|
Fix typos
|
2017-01-03 18:27:43 +01:00 |
|
Ines Montani
|
35b39f53c3
|
Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
|
2017-01-03 18:26:09 +01:00 |
|
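The commit describes generating exceptions for regular patterns instead of listing each one by hand, and adds an explicit exclusion list. The sketch below assumes a mapping from each surface form to the token texts it splits into. It makes three choices:

- Pronoun/auxiliary pairs are expanded with capitalised and apostrophe-less variants.
- "he", "she" and "it" take "'s" rather than "'ve" (compare the later fix in this log).
- Generated forms that collide with ordinary words, such as "were" (#744) or "its", are removed.

The pronoun table and `generate_exceptions` are illustrative only. They are not the contents of spacy/en/tokenizer_exceptions.py.

```python
# Hypothetical table: pronoun -> auxiliaries it contracts with.
PRONOUN_AUX = {
    "we": ["'ve", "'re"],
    "you": ["'ve", "'re"],
    "they": ["'ve", "'re"],
    "he": ["'s"],  # he/she/it take "'s", never "'ve"
    "she": ["'s"],
    "it": ["'s"],
}


def generate_exceptions(exclude=()):
    """Map each contracted surface form to the token texts it splits into."""
    exceptions = {}
    for pron, auxes in PRONOUN_AUX.items():
        for form in (pron, pron.capitalize()):
            for aux in auxes:
                exceptions[form + aux] = [form, aux]
                # Apostrophe-less variant, e.g. "weve" -> ["we", "ve"]
                bare = aux.lstrip("'")
                exceptions[form + bare] = [form, bare]
    # Drop generated forms that collide with ordinary words
    for orth in exclude:
        exceptions.pop(orth, None)
    return exceptions


EXCLUDE = {"were", "Were", "its", "Its"}
exc = generate_exceptions(exclude=EXCLUDE)
print(exc["She's"], "were" in exc)  # ['She', "'s"] False
```

Without the exclusion set, the generator would turn "were" into "we" + "re" and "its" into "it" + "s". This is why the explicit exclusions are needed.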
Ines Montani
|
fb9d3bb022
|
Revert "Merge remote-tracking branch 'origin/master'"
This reverts commit d3b181cdf1, reversing
changes made to b19cfcc144.
|
2017-01-03 18:21:36 +01:00 |
|
Ines Montani
|
461cbb99d8
|
Revert "Reorganise English tokenizer exceptions (as discussed in #718)"
This reverts commit b19cfcc144.
|
2017-01-03 18:21:29 +01:00 |
|
Ines Montani
|
d3b181cdf1
|
Merge remote-tracking branch 'origin/master'
# Conflicts:
# spacy/en/tokenizer_exceptions.py
|
2017-01-03 18:20:01 +01:00 |
|
Ines Montani
|
b19cfcc144
|
Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
|
2017-01-03 18:17:57 +01:00 |
|
Ines Montani
|
1bd53bbf89
|
Fix typos (resolves #718)
|
2017-01-03 11:26:21 +01:00 |
|
Matthew Honnibal
|
fde53be3b4
|
Move whole token match inside _split_affixes.
|
2016-12-30 17:11:50 -06:00 |
|
Matthew Honnibal
|
3ba7c167a8
|
Fix URL tests
|
2016-12-30 17:10:08 -06:00 |
|
Matthew Honnibal
|
9936a1b9b5
|
Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns
|
2016-12-30 14:53:40 -06:00 |
|
Magnus Burton
|
56e2219b65
|
Added Swedish city abbreviations
|
2016-12-30 21:17:34 +01:00 |
|
Magnus Burton
|
e935c950d8
|
Added months and days as abbreviations for Swedish
|
2016-12-30 21:08:44 +01:00 |
|
Matthew Honnibal
|
3e8d9c772e
|
Test interaction of token_match and punctuation
Check that the new token_match function applies after punctuation is split off.
|
2016-12-31 00:52:17 +11:00 |
|
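This is a toy version of the interaction the test checks. It assumes a simplified loop that strips one prefix or suffix at a time and re-checks `token_match` on what remains. A URL wrapped in punctuation only matches once the brackets are removed, and the match then protects it from infix splitting. The regexes, `URL_MATCH` and `tokenize_chunk` are hypothetical and much simpler than spaCy's Tokenizer.

```python
import re

PREFIX_RE = re.compile(r"^[(\[\"']")
SUFFIX_RE = re.compile(r"[)\]\"'.,]$")
INFIX_SPLIT_RE = re.compile(r"((?<=[a-z])\.(?=[a-z]))")  # "." between letters
URL_MATCH = re.compile(r"^https?://[\w./-]*\w$").match


def tokenize_chunk(chunk, token_match=URL_MATCH):
    """Split one whitespace-delimited chunk into prefixes, a core and suffixes."""
    prefixes, suffixes = [], []
    matched = False
    while chunk:
        # Re-check token_match after every prefix or suffix that is removed
        if token_match is not None and token_match(chunk):
            matched = True
            break
        m = PREFIX_RE.search(chunk)
        if m:
            prefixes.append(m.group())
            chunk = chunk[m.end():]
            continue
        m = SUFFIX_RE.search(chunk)
        if m:
            suffixes.insert(0, m.group())
            chunk = chunk[:m.start()]
            continue
        break
    if not chunk:
        core = []
    elif matched:
        core = [chunk]  # a token_match hit is kept whole
    else:
        core = INFIX_SPLIT_RE.split(chunk)
    return prefixes + core + suffixes


print(tokenize_chunk("(http://example.com)."))
print(tokenize_chunk("(http://example.com).", token_match=None))
```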
Matthew Honnibal
|
74b921f394
|
Merge branch 'master' of ssh://github.com/explosion/spaCy into develop
|
2016-12-30 14:38:27 +01:00 |
|
Matthew Honnibal
|
623d94e14f
|
Whitespace
|
2016-12-31 00:30:28 +11:00 |
|
Matthew Honnibal
|
af81ac8bb0
|
Use thinc 6.0
|
2016-12-29 11:58:42 +01:00 |
|