Matthew Honnibal
7d4a52a4d0
Set version to v2.1.0a7
2019-02-16 17:48:34 +01:00
Matthew Honnibal
07617b6b7f
Set version to v2.1.0a7.dev12
2019-02-16 17:30:29 +01:00
Matthew Honnibal
808ae7521b
Require thinc 7.0.1
2019-02-16 17:29:57 +01:00
Matthew Honnibal
1dc314bada
Set version to v2.1.0a7.dev11
2019-02-16 17:02:49 +01:00
Matthew Honnibal
eea3001b98
Depend on thinc 7.0.1.dev2
2019-02-16 17:02:30 +01:00
Matthew Honnibal
2ef227c313
Set version to v2.1.0a7.dev1
2019-02-16 16:22:46 +01:00
Matthew Honnibal
f456b673d4
Require thinc 7.0.1.dev1
2019-02-16 16:22:26 +01:00
Matthew Honnibal
22923b9cb1
Set version to v2.1.0a7.dev9
2019-02-16 15:47:19 +01:00
Matthew Honnibal
11e826ac3b
Require thinc v7.0.1.dev0
2019-02-16 15:47:02 +01:00
Matthew Honnibal
e0c91a4c8d
Set version to 2.1.0a7
2019-02-16 14:43:38 +01:00
Matthew Honnibal
92b6bd2977
Refinements to retokenize.split() function (#3282)
* Change retokenize.split() API for heads
* Pass lists as values for attrs in split
* Fix test_doc_split filename
* Add error for mismatched tokens after split
* Raise error if new tokens don't match text
* Fix doc test
* Fix error
* Move deps under attrs
* Fix split tests
* Fix retokenize.split
2019-02-15 17:32:31 +01:00
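
The "Raise error if new tokens don't match text" item in the commit above can be illustrated with a small, library-free sketch. The function name `check_split_orths` and the error wording are hypothetical and are not taken from spaCy; the sketch only shows the kind of check the commit message describes, namely that the pieces of a split token must concatenate back to the original text.

```python
def check_split_orths(token_text, orths):
    """Return the pieces unchanged if they spell out token_text exactly.

    Hypothetical helper for illustration only; not spaCy's implementation.
    """
    joined = "".join(orths)
    if joined != token_text:
        # Refuse a split that would change the underlying document text.
        raise ValueError(
            "Can't split token: pieces {!r} join to {!r}, expected {!r}".format(
                list(orths), joined, token_text
            )
        )
    return list(orths)


if __name__ == "__main__":
    print(check_split_orths("NewYork", ["New", "York"]))
```

Checking up front keeps a bad split from silently corrupting character offsets that are derived from the original text.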
Matthew Honnibal
2dbc61bc26
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2019-02-15 14:03:54 +01:00
Ines Montani
1aa57690dc
Add xfailing test for orth mismatch in retokenizer.split
2019-02-15 13:55:04 +01:00
Ines Montani
819768483f
Add xfailing test for out-of-bounds heads
2019-02-15 13:09:07 +01:00
Ines Montani
d8051e89ca
Tidy up tests
2019-02-15 12:56:51 +01:00
Matthew Honnibal
58aac58631
Set version to v2.1.0a7.dev8
2019-02-15 12:39:26 +01:00
Matthew Honnibal
4c49f5f7b0
Update Thinc dependency
2019-02-15 12:39:08 +01:00
Matthew Honnibal
5f1abe2cc7
Set version to v2.1.0a7.dev7
2019-02-15 10:30:53 +01:00
Matthew Honnibal
a66e8e0c8a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2019-02-15 10:30:22 +01:00
Ines Montani
c31a9dabd5
💫 Add en/em dash to prefixes and suffixes (#3281)
* Auto-format
* Add en/em dash to prefixes and suffixes
2019-02-15 10:29:59 +01:00
Ines Montani
5651a0d052
💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280)
* Add deprecation warning to Doc.merge and Span.merge
* Replace {Doc,Span}.merge with Doc.retokenize
2019-02-15 10:29:44 +01:00
Matthew Honnibal
dcf79c5ef3
Set version to v2.1.0a7.dev6
2019-02-14 20:12:02 +01:00
Matthew Honnibal
0371ac23e7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2019-02-14 20:09:10 +01:00
Ines Montani
f146121092
💫 Make handling of [Pipe].labels consistent (#3273)
* Make handling of [Pipe].labels consistent
* Un-xfail passing test
* Update spacy/pipeline/pipes.pyx
Co-Authored-By: ines <ines@ines.io>
* Update spacy/pipeline/pipes.pyx
Co-Authored-By: ines <ines@ines.io>
* Update spacy/tests/pipeline/test_pipe_methods.py
Co-Authored-By: ines <ines@ines.io>
* Update spacy/pipeline/pipes.pyx
Co-Authored-By: ines <ines@ines.io>
* Move error message to spacy.errors
* Fix textcat labels and test
* Make EntityRuler.labels return tuple as well
2019-02-15 06:03:19 +11:00
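
As a rough illustration of the "return tuple" item in the commit above, the sketch below exposes labels through a read-only property that returns a tuple. The class `LabelledComponent` and its `add_label` method are hypothetical stand-ins, not spaCy's pipeline classes, and the motivation given in the comments (callers cannot mutate internal state through the returned value) is an assumption rather than something the commit message states.

```python
class LabelledComponent:
    """Hypothetical component, for illustration only."""

    def __init__(self):
        self._labels = []

    @property
    def labels(self):
        # Return an immutable snapshot so every component exposes the same type
        # and callers cannot change internal state via the returned value.
        return tuple(self._labels)

    def add_label(self, label):
        # Return 1 if the label was added, 0 if it was already present.
        if label in self._labels:
            return 0
        self._labels.append(label)
        return 1


if __name__ == "__main__":
    component = LabelledComponent()
    component.add_label("PERSON")
    print(component.labels)
```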
Ines Montani
3d577b77c6
Auto-formatting
2019-02-14 19:56:38 +01:00
Ines Montani
2569339a98
Formatting and whitespace [ci skip]
2019-02-14 18:05:07 +01:00
Matthew Honnibal
aebf71bc72
Set version to v2.1.0a7.dev5
2019-02-14 15:51:42 +01:00
Matthew Honnibal
6ccd67c682
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2019-02-14 15:51:12 +01:00
Ines Montani
e104e47c21
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2019-02-14 15:35:34 +01:00
Ines Montani
0cd01a8c5e
Merge branch 'master' into develop
2019-02-14 15:35:20 +01:00
Ines Montani
2e31921d0a
💫 Add base Language classes for more languages (#3276)
* Add base classes for more languages
* Add test for language class initialization
Make sure each language can be initialized. Otherwise, serious errors are difficult to catch in the test suite, because languages are lazy-loaded.
2019-02-15 01:31:19 +11:00
Grivaz
39815513e2
Add split one token into several (resolves #2838) (#3253)
* Add split one token into several (resolves #2838)
* Improve error message for token splitting
* Make retokenizer.split() tests use a Token object
Change retokenizer.split() to use a Token object, instead of an index.
* Pass Token into retokenize.split()
Tweak retokenize.split() API so that we pass the `Token` object, not the index.
* Fix token.idx in retokenize.split()
* Test that token.idx is correct after split
* Fix token.idx for split tokens
* Fix retokenize.split()
* Fix retokenize.split
* Fix retokenize.split() test
2019-02-15 01:27:13 +11:00
Ines Montani
743ecf728c
Tidy up conftest
2019-02-14 13:27:13 +01:00
Ines Montani
106d95b01a
Fix typo
2019-02-14 12:26:56 +01:00
Ines Montani
11d6b874db
Update stop_words.py
2019-02-14 12:25:19 +01:00
Ines Montani
60c2a3bb65
Also raise original error message in util.get_lang_class
Otherwise, the true error raised inside a Language subclass is swallowed: because the class is imported lazily, any failure surfaces only as an ImportError.
2019-02-13 16:52:25 +01:00
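
The idea in the commit above, keeping the original error visible when a lazy import fails, can be sketched with the standard library alone. The function name `load_lang_module` and the package name `hypothetical_langs` are assumptions made for this example; this is not the code of `util.get_lang_class`.

```python
import importlib


def load_lang_module(lang, package="hypothetical_langs"):
    """Lazily import `<package>.<lang>`, keeping the original error message.

    Hypothetical helper for illustration only.
    """
    module_name = "{}.{}".format(package, lang)
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Include the original message and chain the exception, so a failure
        # inside the language module is not hidden behind a generic error.
        raise ImportError(
            "Can't import language {!r}: {}".format(lang, err)
        ) from err


if __name__ == "__main__":
    try:
        load_lang_module("xx")
    except ImportError as err:
        print(err)
```

Chaining with `from err` also preserves the original traceback, which helps when the real failure happens deep inside the imported module.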
Ines Montani
4d2438f985
Tidy up and auto-format
2019-02-13 15:29:08 +01:00
Ines Montani
fbf9f1edf1
Also raise error in Span.__reduce__
2019-02-13 13:22:05 +01:00
Matthew Honnibal
1831e1423d
Set version to v2.1.0a7.dev4
2019-02-13 23:08:40 +11:00
Matthew Honnibal
bed956c698
Drop regex dependency
2019-02-13 23:08:22 +11:00
Matthew Honnibal
63dc4234a3
Set version to v2.1.0a7.dev3
2019-02-13 22:53:10 +11:00
Matthew Honnibal
b7ea39564f
Set version to v2.1.0a7.dev2
2019-02-13 22:52:43 +11:00
Ines Montani
2d0c3c73f4
Raise better error if token is pickled (resolves #2833) (#3267)
2019-02-13 11:27:04 +01:00
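
One common way to produce an explicit error on pickling is to override `__reduce__`, which `pickle` consults when serializing an object. The sketch below uses a hypothetical `TokenView` class and hypothetical error wording; it shows the general technique under those assumptions, not spaCy's actual `Token` code.

```python
import pickle


class TokenView:
    """Hypothetical lightweight view into a parent document's words."""

    def __init__(self, words, i):
        self.words = words
        self.i = i

    def __reduce__(self):
        # A view is only meaningful together with its parent document, so
        # fail loudly with an explanation instead of pickling a partial object.
        raise NotImplementedError(
            "Can't pickle a view on its own; pickle the parent document instead."
        )


if __name__ == "__main__":
    try:
        pickle.dumps(TokenView(["Hello", "world"], 0))
    except NotImplementedError as err:
        print(err)
```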
Ines Montani
2f45bd94c0
Auto-formatting
2019-02-12 18:30:11 +01:00
Ines Montani
0184a95340
Merge branch 'master' into develop
2019-02-12 18:29:24 +01:00
Akhilesh
a78db10941
add kannada support (#3264)
* add kannada support
* add few more stop words
* add support for Kannada Language
2019-02-12 18:28:39 +01:00
Ines Montani
b589b945db
Fix PhraseMatcher pickling and length (resolves #3248) (#3252)
2019-02-12 18:27:54 +01:00
Ines Montani
5dd39d8697
Update universe.json
2019-02-12 18:05:51 +01:00
Abhijit Balaji
75a40f56fc
added spacy-langdetect to universe.json (#3266)
2019-02-12 18:04:38 +01:00
Ines Montani
483dddc9bc
💫 Add token match pattern validation via JSON schemas (#3244)
* Add custom MatchPatternError
* Improve validators and add validation option to Matcher
* Adjust formatting
* Never validate in Matcher within PhraseMatcher
If we do decide to make validate default to True, the PhraseMatcher's Matcher still shouldn't ever validate. Its patterns are created automatically anyway, and it's currently unclear whether validation affects performance at a very large scale.
2019-02-13 01:47:26 +11:00