Commit Graph

14453 Commits

Author SHA1 Message Date
Matthew Honnibal
427c0693c8 Fix missing comma in init-model command 2018-12-06 22:48:31 +01:00
Amandine Périnet
0b44ea23bd Lemmatization of Nouns - French : adding rules and vocabulary (#2992)
* modifying FR lemmatization for nouns

* modifying FR lemmatization for nouns

* adding contributor agreement for amperinet

* adding rules for words with inclusive parentheses wrongly tokenized

* adding contributor agreement for amperinet

* adding a missing comma
2018-12-06 22:42:18 +01:00
Matthew Honnibal
d896fbca62 Fix batch size in parser.pipe 2018-12-06 21:45:56 +01:00
Matthew Honnibal
bb3304a4f1 Fix pickle tests 2018-12-06 20:46:36 +01:00
Matthew Honnibal
e619f45287 Fix pickle tests 2018-12-06 20:43:47 +01:00
Matthew Honnibal
0a60726215 Remove cytoolz usage in CLI 2018-12-06 20:37:00 +01:00
Matthew Honnibal
c0af627f32 Fix dill usage in vocab 2018-12-06 18:53:16 +01:00
Matthew Honnibal
9520489225 Fix removal of dill (for srsly) 2018-12-06 18:46:09 +01:00
Ines Montani
27905a7b14 Remove reference to cuda10 in docs (closes #2894) [ci skip] 2018-12-06 16:05:37 +01:00
Matthew Honnibal
711f108532 Fix cytoolz import 2018-12-06 16:04:12 +01:00
Gavriel Loria
9c8c4287bf Accept iob2 and allow generic whitespace (#2999)
* accept non-pipe whitespace as delimiter; allow iob2 filename

* added small documentation note for IOB2 allowance

* added contributor agreement
2018-12-06 15:50:25 +01:00
Amandine Périnet
2457318b7a Lemmatization of Verbs - French : adding rules and vocabulary (#3006)
* updating rules and vocabulary for French lemmatization of verbs

* updating the file with French auxiliary verb

* updating rules and vocabulary for French lemmatization of verbs

* adding contributor agreement for amperinet

* adding rules for words with inclusive parentheses wrongly tokenized
2018-12-06 15:49:28 +01:00
Beate Sildnes
f0d7e206ec Updated wordforms for Norwegian lemmatizer (#3007)
* Updated wordforms for Norwegian lemmatizer

Upload of updated lists of wordforms for the Norwegian lemmatizer (nouns, verbs, adverbs, adjectives and lookup).

* Add spaCy contributor agreement for user beatesi

* Updated wordforms for Norwegian lemmatizer
2018-12-06 15:46:18 +01:00
Paul O'Leary McCann
b36f6eabfb Add note that Unidic is required for Japanese (#3017)
This addresses #3001. -POLM
2018-12-06 15:14:10 +01:00
Matthew Honnibal
cabaadd793 Fix build error from bad import
Thinc v7.0.0.dev6 moved FeatureExtracter around and didn't add a compatibility import.
2018-12-06 15:12:39 +01:00
Matthew Honnibal
8f6555df4e Update requirements 2018-12-04 00:07:28 +01:00
Matthew Honnibal
378ca4b46d Fix OSX build problem 2018-12-04 00:06:42 +01:00
Matthew Honnibal
ea00dbaaa4 Remove usage of itertools.islice 2018-12-03 02:43:03 +01:00
Matthew Honnibal
3df26d820f Sort requirements 2018-12-03 02:41:05 +01:00
Matthew Honnibal
5ed19fbee2 Remove cytoolz dependency 2018-12-03 02:37:22 +01:00
Matthew Honnibal
db75c70550 Remove dill dependency 2018-12-03 02:31:19 +01:00
Matthew Honnibal
c7b33b24f1 Fix conflict 2018-12-03 02:20:20 +01:00
Matthew Honnibal
2402ef498b Remove unused import 2018-12-03 02:19:23 +01:00
Matthew Honnibal
1c71fdb805 Remove cytoolz usage from spaCy 2018-12-03 02:19:12 +01:00
Ines Montani
5b2741f751 Remove unused cytoolz / itertools imports 2018-12-03 02:12:07 +01:00
Ines Montani
ee4733b48c Update srsly version pin 2018-12-03 02:10:37 +01:00
Matthew Honnibal
a7b085ae46 Set version back to 2.1.0a4 2018-12-03 02:03:26 +01:00
Matthew Honnibal
8e9a4d2f5e Increment version to 2.1.0a5 2018-12-03 01:59:50 +01:00
Gavriel Loria
ae5601beae Initialize trues to 0.0 in training example (#3004)
* added contributor agreement

* if there are no true positives, precision should be 0.0
2018-12-03 01:33:22 +01:00
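Note on ae5601beae: the message says precision should be 0.0 when there are no true positives. The following is a minimal sketch of that behaviour, not the code from the commit. The function name `precision` and its boolean-list arguments are hypothetical, and the explicit zero-denominator guard is an assumption about one way to keep the result well defined.

```python
def precision(gold, predicted):
    """Share of positive predictions that are correct (hypothetical helper)."""
    # Start the counters at exactly 0.0 so that an evaluation with no
    # true positives cannot report a small non-zero precision.
    true_pos = 0.0
    false_pos = 0.0
    for is_gold, is_predicted in zip(gold, predicted):
        if is_predicted and is_gold:
            true_pos += 1.0
        elif is_predicted and not is_gold:
            false_pos += 1.0
    # Assumption: return 0.0 rather than dividing by zero when nothing
    # was predicted as positive.
    if true_pos + false_pos == 0.0:
        return 0.0
    return true_pos / (true_pos + false_pos)
```

With one wrong positive prediction and no correct ones, the sketch returns 0.0 instead of a value inflated by a non-zero starting counter.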
Ines Montani
f37863093a 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003)
Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉

See here: https://github.com/explosion/srsly

    Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place.

    At the same time, we noticed that having a lot of small dependencies was making maintenance harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel.

    srsly currently includes forks of the following packages:

        ujson
        msgpack
        msgpack-numpy
        cloudpickle



* WIP: replace json/ujson with srsly

* Replace ujson in examples

Use regular json instead of srsly to make code easier to read and follow

* Update requirements

* Fix imports

* Fix typos

* Replace msgpack with srsly

* Fix warning
2018-12-03 01:28:22 +01:00
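Note on f37863093a: the message describes small serialization wrappers that ended up duplicated across codebases. The sketch below shows what one such wrapper might look like, using only the standard-library `json` module. It is not srsly's API and not code from this commit; the names `dump_json_file`, `load_json_file` and `roundtrip` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def dump_json_file(path, data):
    """Write data as UTF-8 JSON, keeping non-ASCII characters readable."""
    text = json.dumps(data, ensure_ascii=False, indent=2)
    # Pin the encoding so the result does not depend on the system locale.
    Path(path).write_text(text, encoding="utf8")


def load_json_file(path):
    """Read a UTF-8 JSON file back into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf8"))


def roundtrip(data):
    """Write data to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "data.json"
        dump_json_file(path, data)
        return load_json_file(path)
```

Fixing the encoding in a single helper is the kind of detail (encodings, locales) the message cites as a reason to keep these utilities in one place.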
Justin DuJardin
33fca8672f fix issue compiling the latest spacy on MacOS 10.3.6 (#2998) 2018-12-02 05:51:11 +01:00
Ines Montani
40b57ea4ac Format example 2018-12-02 04:28:34 +01:00
Ines Montani
45798cc53e Auto-format examples 2018-12-02 04:26:26 +01:00
Ines Montani
6f2d3c863a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-12-02 04:22:25 +01:00
Ines Montani
db7d250924 Update README.md 2018-12-02 04:22:23 +01:00
Matthew Honnibal
b47bd6a27f Update thinc version 2018-12-02 03:57:19 +01:00
Matthew Honnibal
512ba48217 Revert "Allow binary deps when building pex"
This reverts commit 2d0c366101.
2018-12-01 17:37:27 +01:00
Matthew Honnibal
2d0c366101 Allow binary deps when building pex 2018-12-01 15:51:57 +01:00
Matthew Honnibal
fa617997de Fix Thinc pin 2018-12-01 15:27:44 +01:00
Matthew Honnibal
78afc696b2 Fix push-tag script 2018-12-01 14:48:02 +01:00
Matthew Honnibal
40a273245c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-12-01 14:43:29 +01:00
Matthew Honnibal
d9d339186b Fix dropout and batch-size defaults 2018-12-01 13:42:35 +00:00
Matthew Honnibal
9536ee787c Add comma deletion to data noising 2018-12-01 13:42:18 +00:00
Matthew Honnibal
21ee1c7a17 Improve parser multi-task objective 2018-12-01 13:41:24 +00:00
Matthew Honnibal
fe7d6f36b1 Fix parser default 2018-12-01 13:41:04 +00:00
Matthew Honnibal
a31d557f2d Set version to v2.1.0a4 2018-12-01 14:40:03 +01:00
Ines Montani
5c966d0874 Simplify function 2018-12-01 04:59:12 +01:00
Ines Montani
ce7eec846b Move CLI-specific Markdown helper to CLI 2018-12-01 04:55:48 +01:00
Ines Montani
40ae499f32 Remove unused helper function
Now imported from wasabi
2018-12-01 04:54:46 +01:00
Ines Montani
e4f8bed3d2 Change order of requirements [ci skip] 2018-12-01 04:28:51 +01:00