Commit Graph

8760 Commits

Author SHA1 Message Date
ines
a685fff875 Merge branch 'master' of https://github.com/explosion/spaCy 2018-05-07 18:58:57 +02:00
ines
e2241c797c Add lock-threads configuration [ci skip] 2018-05-07 18:54:22 +02:00
B!
414f5270b3 B Cavello's signed Contributor Agreement v2 (#2302)
This time hopefully created in the right spot. (Sorry about that!)
2018-05-07 17:48:54 +02:00
Matt Upson
9a1d3b63fb Add missing default to .set_extension (#2297)
Failing to set a default, method, or getter results in a ValueError:

ValueError: [E083] Error setting extension: only one of `default`, `method`, or `getter` (plus optional `setter`) is allowed. Got: 0
2018-05-04 18:47:01 +02:00
ines
929a01139a Order issue templates 2018-05-04 03:04:41 +02:00
Ines Montani
7f39c8896b
Update issue templates (#2295)
* Update issue templates

* Update templates
2018-05-04 03:02:26 +02:00
Douglas Knox
9b49a40f4e Test and fix for Issue #2219 (#2272)
Test and fix for Issue #2219: Token.similarity() failed if single letter
2018-05-03 18:40:46 +02:00
Paul O'Leary McCann
bd72fbf09c Port Japanese mecab tokenizer from v1 (#2036)
* Port Japanese mecab tokenizer from v1

This brings the Mecab-based Japanese tokenization introduced in #1246 to
spaCy v2. There isn't a JapaneseTagger implementation yet, but POS tag
information from Mecab is stored in a token extension. A tag map is also
included.

As a reminder, Mecab is required because Universal Dependencies are
based on Unidic tags, and Janome doesn't support Unidic.

Things to check:

1. Is this the right way to use a token extension?

2. What's the right way to implement a JapaneseTagger? The approach in
 #1246 relied on `tag_from_strings` which is just gone now. I guess the
best thing is to just try training spaCy's default Tagger?

-POLM

* Add tagging/make_doc and tests
2018-05-03 18:38:26 +02:00
G.Pruvost
cc8e804648 #2211 - Support for ssl certs config on download command (#2212)
* Add support for SSL/Certs customization on download CLI

* Add a note on SSL options for the 'download' CLI in the README

* Add contributor agreement
2018-05-03 18:37:02 +02:00
Jens Dahl Møllerhøj
b9290397fb rename SP to _SP (#2289) 2018-05-03 18:33:49 +02:00
ines
c9547b7b8b Update Juniper (see #2293) 2018-05-03 15:36:02 +02:00
Alex Villarreal
647f2544c5 Fix code sample for span.set_extension (#2286) 2018-05-03 00:39:22 +02:00
Alex Villarreal
13d562e1a4 Fix code sample for Doc.set_extension (#2282)
* Fix code sample for `set_extension`

The previous sample code for `set_extension` fails the assertion at the end, because `city_getter` it checked if the whole document text matches any of the city names. Now it checks if any of the city names is contained in the document text.

* Contributor agreement
2018-05-02 10:16:05 +02:00
Mr Roboto
6f5ccda19c Addresses Issue #2228 - Deserialization fails when using tensor=False or sentiment=False (#2230)
* Fixes issue #2228

* Adds a new contributor
2018-05-01 13:40:22 +02:00
Shirish Kadam
d98a90440f Added Adam project to spaCy Universe (#2275)
* Added 5hirish to contributors

* Added Adam Qas Project to spaCy Universe

* Remove $ from code example
2018-04-30 22:25:01 +02:00
ines
56e7faf16b Fix spacing 2018-04-30 22:24:40 +02:00
ines
6efb4cdf88 Use Juniper and tidy up 2018-04-30 18:48:35 +02:00
ines
45bb8d75a5 Fix overflow issues on small screens [ci skip] 2018-04-29 03:17:36 +02:00
Ines Montani
49cee4af92
💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)
* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label
2018-04-29 02:06:46 +02:00
ines
3c80f69ff5 Return data in cli.info and add silent option (resolves #2196) 2018-04-29 01:59:44 +02:00
ines
1c6d77610c Add remove_extension method on Doc, Token and Span (closes #2242) 2018-04-28 23:33:09 +02:00
ines
a512fa60ef Remove upcoming option from docs for now 2018-04-28 23:32:18 +02:00
ines
abdb853ebf Simplify underscore tests 2018-04-28 23:30:33 +02:00
ines
6fb6371670 Add collapse_phrases option to displacy (closes #2266) 2018-04-28 23:06:50 +02:00
Matt Upson
87cc6b3599 Add missing comma to NN example in docs (#2255)
Also add a completed contributor agreement.
2018-04-28 14:56:00 +02:00
Robin Linderborg
1f9904ef12 fixes #2238 (#2241)
* Remove erroneous lemma lookup år > åra in Swedish

* Add contributors agreement

* Add contrib agreement to correct directory

* Revert change to CONTRIBUTOR_AGREEMENT
2018-04-28 14:55:22 +02:00
Robin Linderborg
d01f503b54 Remove incorrect lemma lookup gäng->gänga (#2252)
* Remove incorrect lemma lookup gäng->gänga
In modern Swedish, "gäng" is mostly associated with "gang" or "group of people". The removed lemma lookup lemmatized it to the verb "thread".

* Add contrib agreement to correct directory

* Revert change to CONTRIBUTOR_AGREEMENT
2018-04-28 14:54:41 +02:00
ines
4a3bea00c7 Update resources [ci skip] 2018-04-26 22:10:34 +02:00
ines
686225eadd Fix Spanish noun_chunks (resolves #2210)
Make sure 'NP' label is added to StringStore and move noun_bounds helper into a closure to allow reusing label sets
2018-04-18 18:44:01 -04:00
ines
9632595fb4 Use correct, non-deprecated merge syntax (resolves #2226) 2018-04-18 18:28:28 -04:00
Suraj Rajan
5957f15227 Fixed typos for #2222,#2223 (#2233) (closes #2222, closes #2223) 2018-04-18 14:55:26 -07:00
Pradeep Kumar Tippa
df389e5b74 spacy-101 vocab doc giving valid variable names (#2236) 2018-04-18 14:54:26 -07:00
Ines Montani
b4d35c7dfb
Update README.rst 2018-04-10 23:05:49 +02:00
Matthew Honnibal
97851d2c4e Increment version to v2.0.12.dev0 2018-04-10 22:20:16 +02:00
Matthew Honnibal
ed39c75a92 Merge branch 'master' of https://github.com/explosion/spaCy 2018-04-10 22:19:40 +02:00
Matthew Honnibal
3836199a83 Fix loading of models when custom vectors are added 2018-04-10 22:19:20 +02:00
ines
0299d5fac8 Update argument annotations and formatting 2018-04-10 21:45:11 +02:00
ines
49b1e48bf5 Fix syntax error 2018-04-10 21:44:59 +02:00
ines
ce63f8997b Update init-model docs 2018-04-10 21:42:54 +02:00
ines
70052e46e9 Fix formatting [ci skip] 2018-04-10 21:42:46 +02:00
Matthew Honnibal
0ddb152be0 Improve error message when reading vectors 2018-04-10 21:26:50 +02:00
Matthew Honnibal
db50ac524e Support zipped vector files in init-model 2018-04-10 21:21:00 +02:00
ines
270fcfd925 Fix typo in package command message (closes #2200) 2018-04-10 19:14:31 +02:00
ines
24d8bf348d Revert "Add support for .zip to init_model"
This reverts commit 7ee880a0ad.
2018-04-10 19:08:06 +02:00
Matthew Honnibal
7ee880a0ad Add support for .zip to init_model 2018-04-10 14:30:04 +00:00
ines
5ecb274764 Fix indentation error and set Doc.is_tagged correctly 2018-04-10 16:14:52 +02:00
ines
0e847d7fe5 Fix typo 2018-04-09 14:51:14 +02:00
ines
987ee27af7 Return Doc if noun chunks merger component if Doc is not parsed 2018-04-09 14:51:02 +02:00
Xiaoquan Kong
e2f13ec722 bugfix: Doc.noun_chunks call Doc.noun_chunks_iterator without checking (closes #2194) 2018-04-08 23:44:05 +02:00
Jens Dahl Møllerhøj
e5055e3cf6 Add Danish lemmatizer (#2184)
* add danish lemmatizer

* fill contributor agreement
2018-04-07 19:07:28 +02:00