Commit Graph

5663 Commits

Author SHA1 Message Date
Matthew Honnibal
6b0008afc6 Clean up TextCategorizer slightly 2019-02-23 12:28:06 +01:00
Matthew Honnibal
d13b9373bf Improve initialization for mutually textcat 2019-02-23 12:27:45 +01:00
Matthew Honnibal
e9dd5943b9 Support exclusive_classes setting for textcat models 2019-02-23 11:57:16 +01:00
Matthew Honnibal
ce1e4eace2 Default to former TextCategorizer model
* Keep TextCategorizer default model same as v2.0
* Add option 'architecture' that allows "simple_cnn" to switch to
simpler model.
* Add option exclusive_classes, defaulting to False. If set to True,
the model treats classes as mutually exclusive, i.e. only one class can
be true per instance.
2019-02-23 11:55:16 +01:00
Matthew Honnibal
829c9091a4 Set version to v2.1.0a9.dev0 2019-02-21 17:13:34 +01:00
Matthew Honnibal
d396a69c7b More fixes for issue #3112 2019-02-21 17:12:23 +01:00
Ines Montani
80bdcb99c5 Fix escaping of HTML in displacy ENT (closes #2728) 2019-02-21 14:30:39 +01:00
Matthew Honnibal
7d529ebdfb Set version to v2.1.0a8 2019-02-21 12:09:34 +01:00
Matthew Honnibal
f75be6e7be Set version to v2.1.0a8.dev1 2019-02-21 11:57:06 +01:00
Matthew Honnibal
c5f947f194 Fix regex deprecation warnings 2019-02-21 11:56:47 +01:00
Matthew Honnibal
7f02464494 Set version to v2.1.0a8.dev0 2019-02-21 11:42:23 +01:00
Matthew Honnibal
f31dbec528 More fixes for #3112 2019-02-21 11:10:10 +01:00
Matthew Honnibal
80195bc2d1
Fix issue #3288 (#3308) 2019-02-21 09:48:53 +01:00
Matthew Honnibal
a137e8b418 Fix Pipe.to_bytes() when model uninitialized
Closes #3289
2019-02-21 09:42:02 +01:00
Matthew Honnibal
6574e4f2d3 Fix issue #3112 part 1 2019-02-21 09:27:38 +01:00
Matthew Honnibal
b21481eeca Load token_match regex with .match, not .search 2019-02-21 09:09:03 +01:00
Sofie
9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue #2179 fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
Matthew Honnibal
0d1ca15b13 💫 Fix bugs in matcher extensions. Closes #1971 (#3301)
* Fix matching on extension attrs and predicates

* Fix detection of match_id when using extension attributes. The match
ID is stored as the last entry in the pattern. We were checking for this
with nr_attr == 0, which didn't account for extension attributes.

* Fix handling of predicates. The wrong count was being passed through,
so even patterns that didn't have a predicate were being checked.

* Fix regex pattern

* Fix matcher set value test
2019-02-20 21:30:39 +01:00
Ines Montani
3b667787a9 Add xfailing test for #3289 2019-02-18 16:45:04 +01:00
Ines Montani
91f260f2c4 Add another test for #1971 2019-02-18 13:36:20 +01:00
Ines Montani
f30aac324c Update test_issue1971.py 2019-02-18 13:36:15 +01:00
Ines Montani
8fa26ca97e Fix tensor shape in test for #3288 2019-02-18 11:01:54 +01:00
Ines Montani
c32290557f Add xfailing test for #3288 2019-02-18 10:59:31 +01:00
Ines Montani
3fdcdec6a0 Merge branch 'master' into develop 2019-02-18 10:03:32 +01:00
Roshni Biswas
e09f1347fa updates for Bengali language (#3286)
* Update morph_rules.py

* contributor agreement for roshni-b

* created example sentences
2019-02-18 10:02:28 +01:00
Ines Montani
043e8186f3 Merge branch 'master' into develop 2019-02-17 17:51:17 +01:00
Marc Puig
51268e9f21 Typo error fixed (#3284) 2019-02-17 17:51:02 +01:00
Ines Montani
3af0b2dd1c Add xfailing test for #1971 [ci skip] 2019-02-17 13:04:47 +01:00
Ines Montani
19a002bfd3 Merge branch 'master' into develop 2019-02-17 12:22:54 +01:00
Ines Montani
1e252b129c Auto-format 2019-02-17 12:22:07 +01:00
Roshni Biswas
e26d923726 Update morph_rules.py (#3283) 2019-02-17 12:21:47 +01:00
Matthew Honnibal
7d4a52a4d0 Set version to v2.1.0a7 2019-02-16 17:48:34 +01:00
Matthew Honnibal
07617b6b7f Set version to v2.1.0a7.dev12 2019-02-16 17:30:29 +01:00
Matthew Honnibal
1dc314bada Set version to v2.1.0a7.dev11 2019-02-16 17:02:49 +01:00
Matthew Honnibal
2ef227c313 Set version to v2.1.0a7.dev1 2019-02-16 16:22:46 +01:00
Matthew Honnibal
22923b9cb1 Set version to v2.1.0a7.dev9 2019-02-16 15:47:19 +01:00
Matthew Honnibal
e0c91a4c8d Set version to 2.1.0a7 2019-02-16 14:43:38 +01:00
Matthew Honnibal
92b6bd2977
Refinements to retokenize.split() function (#3282)
* Change retokenize.split() API for heads

* Pass lists as values for attrs in split

* Fix test_doc_split filename

* Add error for mismatched tokens after split

* Raise error if new tokens don't match text

* Fix doc test

* Fix error

* Move deps under attrs

* Fix split tests

* Fix retokenize.split
2019-02-15 17:32:31 +01:00
Matthew Honnibal
2dbc61bc26 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-15 14:03:54 +01:00
Ines Montani
1aa57690dc Add xfailing test for orth mismatch in retokenizer.split 2019-02-15 13:55:04 +01:00
Ines Montani
819768483f Add xfailing test for out-of-bounds heads 2019-02-15 13:09:07 +01:00
Ines Montani
d8051e89ca Tidy up tests 2019-02-15 12:56:51 +01:00
Matthew Honnibal
58aac58631 Set version to v2.1.0a7.dev8 2019-02-15 12:39:26 +01:00
Matthew Honnibal
5f1abe2cc7 Set version to v2.1.0a7.dev7 2019-02-15 10:30:53 +01:00
Matthew Honnibal
a66e8e0c8a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-15 10:30:22 +01:00
Ines Montani
c31a9dabd5 💫 Add en/em dash to prefixes and suffixes (#3281)
* Auto-format

* Add en/em dash to prefixes and suffixes
2019-02-15 10:29:59 +01:00
Ines Montani
5651a0d052 💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280)
* Add deprecation warning to Doc.merge and Span.merge

* Replace {Doc,Span}.merge with Doc.retokenize
2019-02-15 10:29:44 +01:00
Matthew Honnibal
dcf79c5ef3 Set version to v2.1.0a7.dev6 2019-02-14 20:12:02 +01:00
Matthew Honnibal
0371ac23e7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-14 20:09:10 +01:00
Ines Montani
f146121092 💫 Make handling of [Pipe].labels consistent (#3273)
* Make handling of [Pipe].labels consistent

* Un-xfail passing test

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: ines <ines@ines.io>

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: ines <ines@ines.io>

* Update spacy/tests/pipeline/test_pipe_methods.py

Co-Authored-By: ines <ines@ines.io>

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: ines <ines@ines.io>

* Move error message to spacy.errors

* Fix textcat labels and test

* Make EntityRuler.labels return tuple as well
2019-02-15 06:03:19 +11:00