9608 Commits

Author SHA1 Message Date
Ines Montani
723e27cb8c Tidy up tests 2019-02-24 14:11:23 +01:00
Ines Montani
2982f82934 Auto-format 2019-02-24 14:09:15 +01:00
Ines Montani
09bf08b3c3 Update redirects [ci skip] 2019-02-24 13:37:50 +01:00
Ines Montani
dceca3264d Tidy up package.json [ci skip] 2019-02-24 13:37:41 +01:00
Ines Montani
3ef4da3503 Update and auto-format README [ci skip] 2019-02-24 13:12:13 +01:00
Ines Montani
46ec5cdccc Update TextCategorizer docs 2019-02-24 13:11:57 +01:00
Ines Montani
c03cb1cc63 Improve built-in component API docs 2019-02-24 13:11:49 +01:00
Ines Montani
235a0e948e Tidy up CI config 2019-02-24 12:07:33 +01:00
Ines Montani
b570a1e203 Exclude website branch from CI 2019-02-24 11:52:16 +01:00
Ines Montani
383e2e1f12 Update Python versions [ci skip] 2019-02-24 11:49:45 +01:00
Ines Montani
b624cb4b89 Update v2-1.md 2019-02-24 11:49:27 +01:00
Matthew Honnibal
909a9d9932 Set version to v2.1.0a9.dev1 2019-02-23 13:10:42 +01:00
Matthew Honnibal
55bb3cc482 Require thinc 7.0.2 2019-02-23 13:10:09 +01:00
Matthew Honnibal
981cb89194 Fix f-score calculation if zero 2019-02-23 12:45:41 +01:00
Matthew Honnibal
6b0008afc6 Clean up TextCategorizer slightly 2019-02-23 12:28:06 +01:00
Matthew Honnibal
d13b9373bf Improve initialization for mutually textcat 2019-02-23 12:27:45 +01:00
Matthew Honnibal
5063d999e5 Set architecture in textcat example 2019-02-23 11:57:59 +01:00
Matthew Honnibal
e9dd5943b9 Support exclusive_classes setting for textcat models 2019-02-23 11:57:16 +01:00
Matthew Honnibal
ce1e4eace2 Default to former TextCategorizer model
* Keep TextCategorizer default model same as v2.0
* Add option 'architecture' that allows "simple_cnn" to switch to
simpler model.
* Add option exclusive_classes, defaulting to False. If set to True,
the model treats classes as mutually exclusive, i.e. only one class can
be true per instance.
2019-02-23 11:55:16 +01:00
Matthew Honnibal
829c9091a4 Set version to v2.1.0a9.dev0 2019-02-21 17:13:34 +01:00
Matthew Honnibal
d396a69c7b More fixes for issue #3112 2019-02-21 17:12:23 +01:00
Ines Montani
80bdcb99c5 Fix escaping of HTML in displacy ENT (closes #2728) 2019-02-21 14:30:39 +01:00
Ines Montani
250e88ef55 Fix docs example (see #2728) 2019-02-21 14:22:06 +01:00
Ines Montani
0fc908d7a5 Add note on merging speed in v2.1 (see #3300) [ci skip] 2019-02-21 12:34:18 +01:00
Ines Montani
236aa94ded Update v2-1.md 2019-02-21 12:33:56 +01:00
Matthew Honnibal
7d529ebdfb Set version to v2.1.0a8 2019-02-21 12:09:34 +01:00
Matthew Honnibal
7cbdcaddf3 Ensure new setuptools before building sdist 2019-02-21 12:08:41 +01:00
Matthew Honnibal
f75be6e7be Set version to v2.1.0a8.dev1 2019-02-21 11:57:06 +01:00
Matthew Honnibal
c5f947f194 Fix regex deprecation warnings 2019-02-21 11:56:47 +01:00
Matthew Honnibal
7f02464494 Set version to v2.1.0a8.dev0 2019-02-21 11:42:23 +01:00
Matthew Honnibal
f31dbec528 More fixes for #3112 2019-02-21 11:10:10 +01:00
Matthew Honnibal
e485241003 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-21 10:33:35 +01:00
Matthew Honnibal
582be8746c Update multi_processing example 2019-02-21 10:33:16 +01:00
Matthew Honnibal
80195bc2d1 Fix issue #3288 (#3308) 2019-02-21 09:48:53 +01:00
Matthew Honnibal
a137e8b418 Fix Pipe.to_bytes() when model uninitialized
Closes #3289
2019-02-21 09:42:02 +01:00
Matthew Honnibal
6574e4f2d3 Fix issue #3112 part 1 2019-02-21 09:27:38 +01:00
Matthew Honnibal
b21481eeca Load token_match regex with .match, not .search 2019-02-21 09:09:03 +01:00
Sofie
9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue #2179 fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
Ines Montani
9696cf16c1 Merge branch 'master' into develop 2019-02-20 21:31:27 +01:00
Matthew Honnibal
0d1ca15b13 💫 Fix bugs in matcher extensions. Closes #1971 (#3301)
* Fix matching on extension attrs and predicates

* Fix detection of match_id when using extension attributes. The match
ID is stored as the last entry in the pattern. We were checking for this
with nr_attr == 0, which didn't account for extension attributes.

* Fix handling of predicates. The wrong count was being passed through,
so even patterns that didn't have a predicate were being checked.

* Fix regex pattern

* Fix matcher set value test
2019-02-20 21:30:39 +01:00
Ines Montani
f73d01aa32 Update netlify.toml [ci skip] 2019-02-20 14:33:32 +01:00
Ines Montani
da5edbe434 Tidy up 2019-02-20 14:33:23 +01:00
Michael Liberman
386cec1979 - Json fix in comment (#3294) 2019-02-19 18:01:35 +01:00
Ines Montani
3b667787a9 Add xfailing test for #3289 2019-02-18 16:45:04 +01:00
Ines Montani
57ae71ea95 Add docs on serializing the pipeline (see #3289) [ci skip] 2019-02-18 14:13:29 +01:00
Ines Montani
91f260f2c4 Add another test for #1971 2019-02-18 13:36:20 +01:00
Ines Montani
f30aac324c Update test_issue1971.py 2019-02-18 13:36:15 +01:00
Ines Montani
38e4422c0d Improve matcher example (resolves #3287) 2019-02-18 13:26:37 +01:00
Ines Montani
660cfe44c5 Fix formatting 2019-02-18 13:26:22 +01:00
Ines Montani
8fa26ca97e Fix tensor shape in test for #3288 2019-02-18 11:01:54 +01:00