Commit Graph

1254 Commits

Author SHA1 Message Date
Matthew Honnibal
63b7accd74
💫 Make span.as_doc() return a copy, not a view. Closes #1537 (#3107)
Initially span.as_doc() was designed to return a view of the span's contents, as a Doc object. This was a nice idea, but it fails due to the token.idx property, which refers to the character offset within the string. In a span, the idx of the first token might not be 0. Because this data is different, we can't have a view --- it'll be inconsistent.

This patch changes span.as_doc() to instead return a copy. The docs are updated accordingly. Closes #1537

* Update test for span.as_doc()

* Make span.as_doc() return a copy. Closes #1537

* Document change to Span.as_doc()
2018-12-30 15:17:46 +01:00
Sofie
b7916fffcf Fixing few typos in the documentation (#3103)
* few typos / small grammatical errors corrected in documentation

* one more typo

* one last typo
2018-12-28 15:52:26 +01:00
Ines Montani
2dc6c52ccc Update displayed Binder version (see #3077) [ci skip] 2018-12-20 17:36:19 +01:00
Ines Montani
ca244f5f84
Small fixes to displaCy (#3076)
## Description
- [x] fix auto-detection of Jupyter notebooks (even if `jupyter=True` isn't set)
- [x] add `displacy.set_render_wrapper` method to define a custom function called around the HTML markup generated in all calls to `displacy.render` (can be used to allow custom integrations, callbacks and page formatting)
- [x] add option to customise host for web server
- [x] show warning if `displacy.serve` is called from within Jupyter notebooks
- [x] move error message to `spacy.errors.Errors`.

### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-12-20 17:32:04 +01:00
Ines Montani
61d09c481b Merge branch 'master' into develop 2018-12-18 13:48:10 +01:00
Ines Montani
8c0f0f50bc Use nlp.make_doc instead of nlp for patterns [ci skip] 2018-12-08 11:56:01 +01:00
Aki Ariga
7fcd6419ff Upadate the document for Unidic link with latest version URL (#3022)
* Upadate Unidic link for latest version in document

This patch improves #3017 . The link for Unidic was old version one, so will the lates version.

* Add contributor agreement

* Use more specific link for unidic-cwj
2018-12-07 17:24:48 +01:00
Ines Montani
27905a7b14 Remove reference to cuda10 in docs (closes #2894) [ci skip] 2018-12-06 16:05:37 +01:00
Gavriel Loria
9c8c4287bf Accept iob2 and allow generic whitespace (#2999)
* accept non-pipe whitespace as delimiter; allow iob2 filename

* added small documentation note for IOB2 allowance

* added contributor agreement
2018-12-06 15:50:25 +01:00
Paul O'Leary McCann
b36f6eabfb Add note that Unidic is required for Japanese (#3017)
This addresses #3001. -POLM
2018-12-06 15:14:10 +01:00
Ines Montani
f37863093a 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003)
Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉

See here: https://github.com/explosion/srsly

    Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place.

    At the same time, we noticed that having a lot of small dependencies was making maintainence harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel.

    srsly currently includes forks of the following packages:

        ujson
        msgpack
        msgpack-numpy
        cloudpickle



* WIP: replace json/ujson with srsly

* Replace ujson in examples

Use regular json instead of srsly to make code easier to read and follow

* Update requirements

* Fix imports

* Fix typos

* Replace msgpack with srsly

* Fix warning
2018-12-03 01:28:22 +01:00
Gavriel Loria
919729d38c replace user-facing references to "sbd" with "sentencizer" (#2985)
## Description
Fixes #2693
Previously, the tokens `sbd` and `sentencizer` would create the same nlp pipe. Internally, both would be called `sbd`. This setup became problematic because it was hard for a user relying on the `sentencizer` pipe name to realize that their pipe's name would be `sbd` for all functions other than creating a pipe. This PR intends to change the API and API documentation to fully support `sentencizer` and drop any user-facing references to `sbd`.

### Types of change
end-user API bug

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-30 21:22:40 +01:00
Ines Montani
add6469225 Add "new in v2.0.12" note to Span.ents (closes #2986) 2018-11-30 20:50:55 +01:00
Ines Montani
37c7c85a86 💫 New JSON helpers, training data internals & CLI rewrite (#2932)
* Support nowrap setting in util.prints

* Tidy up and fix whitespace

* Simplify script and use read_jsonl helper

* Add JSON schemas (see #2928)

* Deprecate Doc.print_tree

Will be replaced with Doc.to_json, which will produce a unified format

* Add Doc.to_json() method (see #2928)

Converts Doc objects to JSON using the same unified format as the training data. Method also supports serializing selected custom attributes in the doc._. space.

* Remove outdated test

* Add write_json and write_jsonl helpers

* WIP: Update spacy train

* Tidy up spacy train

* WIP: Use wasabi for formatting

* Add GoldParse helpers for JSON format

* WIP: add debug-data command

* Fix typo

* Add missing import

* Update wasabi pin

* Add missing import

* 💫 Refactor CLI (#2943)

To be merged into #2932.

## Description
- [x] refactor CLI To use [`wasabi`](https://github.com/ines/wasabi)
- [x] use [`black`](https://github.com/ambv/black) for auto-formatting
- [x] add `flake8` config
- [x] move all messy UD-related scripts to `cli.ud`
- [x] make converters function that take the opened file and return the converted data (instead of having them handle the IO)

### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Update wasabi pin

* Delete old test

* Update errors

* Fix typo

* Tidy up and format remaining code

* Fix formatting

* Improve formatting of messages

* Auto-format remaining code

* Add tok2vec stuff to spacy.train

* Fix typo

* Update wasabi pin

* Fix path checks for when train() is called as function

* Reformat and tidy up pretrain script

* Update argument annotations

* Raise error if model language doesn't match lang

* Document new train command
2018-11-30 20:16:14 +01:00
wxv
06820ef6e7 Fix is_ascii documentation and create contributor file (#2988)
Proposed in #2933
2018-11-30 15:57:58 +01:00
Ben Batorsky
658f7e0dc8 OntoNotes url fix (#2981)
The website for OntoNotes 5 is: https://catalog.ldc.upenn.edu/LDC2013T19, currently the named entity section has it as https://catalog.ldc.upenn.edu/ldc2013T19.
2018-11-29 19:34:30 +01:00
Ines Montani
d33953037e
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)

* Update _install.jade (#2688)

Typo fix: "models" -> "model"

* Add FAC to spacy.explain (resolves #2706)

* Remove docstrings for deprecated arguments (see #2703)

* When calling getoption() in conftest.py, pass a default option (#2709)

* When calling getoption() in conftest.py, pass a default option

This is necessary to allow testing an installed spacy by running:

  pytest --pyargs spacy

* Add contributor agreement

* update bengali token rules for hyphen and digits (#2731)

* Less norm computations in token similarity (#2730)

* Less norm computations in token similarity

* Contributor agreement

* Remove ')' for clarity (#2737)

Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intention than please let me know.

* added contributor agreement for mbkupfer (#2738)

* Basic support for Telugu language (#2751)

* Lex _attrs for polish language (#2750)

* Signed spaCy contributor agreement

* Added polish version of english lex_attrs

* Introduces a bulk merge function, in order to solve issue #653 (#2696)

* Fix comment

* Introduce bulk merge to increase performance on many span merges

* Sign contributor agreement

* Implement pull request suggestions

* Describe converters more explicitly (see #2643)

* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]

* Fix formatting

* Fix dependency scheme docs (closes #2705) [ci skip]

* Don't set stop word in example (closes #2657) [ci skip]

* Add words to portuguese language _num_words (#2759)

* Add words to portuguese language _num_words

* Add words to portuguese language _num_words

* Update Indonesian model (#2752)

* adding e-KTP in tokenizer exceptions list

* add exception token

* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception

* add tokenizer exceptions list

* combining base_norms with norm_exceptions

* adding norm_exception

* fix double key in lemmatizer

* remove unused import on punctuation.py

* reformat stop_words to reduce number of lines, improve readibility

* updating tokenizer exception

* implement is_currency for lang/id

* adding orth_first_upper in tokenizer_exceptions

* update the norm_exception list

* remove bunch of abbreviations

* adding contributors file

* Fixed spaCy+Keras example (#2763)

* bug fixes in keras example

* created contributor agreement

* Adding French hyphenated first name (#2786)

* Fix typo (closes #2784)

* Fix typo (#2795) [ci skip]

Fixed typo on line 6 "regcognizer --> recognizer"

* Adding basic support for Sinhala language. (#2788)

* adding Sinhala language package, stop words, examples and lex_attrs.

* Adding contributor agreement

* Updating contributor agreement

* Also include lowercase norm exceptions

* Fix error (#2802)

* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way.  Use the resize function

* added spaCy Contributor Agreement

* Add charlax's contributor agreement (#2805)

* agreement of contributor, may I introduce a tiny pl languge contribution (#2799)

* Contributors agreement

* Contributors agreement

* Contributors agreement

* Add jupyter=True to displacy.render in documentation (#2806)

* Revert "Also include lowercase norm exceptions"

This reverts commit 70f4e8adf3.

* Remove deprecated encoding argument to msgpack

* Set up dependency tree pattern matching skeleton (#2732)

* Fix bug when too many entity types. Fixes #2800

* Fix Python 2 test failure

* Require older msgpack-numpy

* Restore encoding arg on msgpack-numpy

* Try to fix version pin for msgpack-numpy

* Update Portuguese Language (#2790)

* Add words to portuguese language _num_words

* Add words to portuguese language _num_words

* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols

* Extended punctuation and norm_exceptions in the Portuguese language

* Correct error in spacy universe docs concerning spacy-lookup (#2814)

* Update Keras Example for (Parikh et al, 2016) implementation  (#2803)

* bug fixes in keras example

* created contributor agreement

* baseline for Parikh model

* initial version of parikh 2016 implemented

* tested asymmetric models

* fixed grevious error in normalization

* use standard SNLI test file

* begin to rework parikh example

* initial version of running example

* start to document the new version

* start to document the new version

* Update Decompositional Attention.ipynb

* fixed calls to similarity

* updated the README

* import sys package duh

* simplified indexing on mapping word to IDs

* stupid python indent error

* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround

* Fix typo (closes #2815) [ci skip]

* Update regex version dependency

* Set version to 2.0.13.dev3

* Skip seemingly problematic test

* Remove problematic test

* Try previous version of regex

* Revert "Remove problematic test"

This reverts commit bdebbef455.

* Unskip test

* Try older version of regex

* 💫 Update training examples and use minibatching (#2830)

<!--- Provide a general summary of your changes in the title. -->

## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results.

### Types of change
enhancements

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Visual C++ link updated (#2842) (closes #2841) [ci skip]

* New landing page

* Add contribution agreement

* Correcting lang/ru/examples.py (#2845)

* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement

* Correct some grammatical inaccuracies in lang\ru\examples.py

* Move contributor agreement to separate file

* Set version to 2.0.13.dev4

* Add Persian(Farsi) language support (#2797)

* Also include lowercase norm exceptions

* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors

* Rule-based French Lemmatizer (#2818)

<!--- Provide a general summary of your changes in the title. -->

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech 
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentionned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Set version to 2.0.13

* Fix formatting and consistency

* Update docs for new version [ci skip]

* Increment version [ci skip]

* Add info on wheels [ci skip]

* Adding "This is a sentence" example to Sinhala (#2846)

* Add wheels badge

* Update badge [ci skip]

* Update README.rst [ci skip]

* Update murmurhash pin

* Increment version to 2.0.14.dev0

* Update GPU docs for v2.0.14

* Add wheel to setup_requires

* Import prefer_gpu and require_gpu functions from Thinc

* Add tests for prefer_gpu() and require_gpu()

* Update requirements and setup.py

* Workaround bug in thinc require_gpu

* Set version to v2.0.14

* Update push-tag script

* Unhack prefer_gpu

* Require thinc 6.10.6

* Update prefer_gpu and require_gpu docs [ci skip]

* Fix specifiers for GPU

* Set version to 2.0.14.dev1

* Set version to 2.0.14

* Update Thinc version pin

* Increment version

* Fix msgpack-numpy version pin

* Increment version

* Update version to 2.0.16

* Update version [ci skip]

* Redundant ')' in the Stop words' example (#2856)

<!--- Provide a general summary of your changes in the title. -->

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Documentation improvement regarding joblib and SO (#2867)

Some documentation improvements

## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)

### Types of change
Documentation

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* raise error when setting overlapping entities as doc.ents (#2880)

* Fix out-of-bounds access in NER training

The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!

This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so much be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.

* Change PyThaiNLP Url (#2876)

* Fix missing comma

* Add example showing a fix-up rule for space entities

* Set version to 2.0.17.dev0

* Update regex version

* Revert "Update regex version"

This reverts commit 62358dd867.

* Try setting older regex version, to align with conda

* Set version to 2.0.17

* Add spacy-js to universe [ci-skip]

* Add spacy-raspberry to universe (closes #2889)

* Add script to validate universe json [ci skip]

* Removed space in docs + added contributor indo (#2909)

* - removed unneeded space in documentation

* - added contributor info

* Allow input text of length up to max_length, inclusive (#2922)

* Include universe spec for spacy-wordnet component (#2919)

* feat: include universe spec for spacy-wordnet component

* chore: include spaCy contributor agreement

* Minor formatting changes [ci skip]

* Fix image [ci skip]

Twitter URL doesn't work on live site

* Check if the word is in one of the regular lists specific to each POS (#2886)

* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)

Resolves #2924.

## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticable here.)

### Types of change
bug fix

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Fix typo [ci skip]

* fixes symbolic link on py3 and windows (#2949)

* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948

* Update spacy/compat.py

Co-Authored-By: cicorias <cicorias@users.noreply.github.com>

* Fix formatting

* Update universe [ci skip]

* Catalan Language Support (#2940)

* Catalan language Support

* Ddding Catalan to documentation

* Sort languages alphabetically [ci skip]

* Update tests for pytest 4.x (#2965)

<!--- Provide a general summary of your changes in the title. -->

## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Fix regex pin to harmonize with conda (#2964)

* Update README.rst

* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)

Fixes #2976

* Fix typo

* Fix typo

* Remove duplicate file

* Require thinc 7.0.0.dev2

Fixes bug in gpu_ops that would use cupy instead of numpy on CPU

* Add missing import

* Fix error IDs

* Fix tests
2018-11-29 16:30:29 +01:00
Ines Montani
c80c20e1ec Sort languages alphabetically [ci skip] 2018-11-26 15:37:53 +01:00
Marc Puig
98fe1ab259 Catalan Language Support (#2940)
* Catalan language Support

* Ddding Catalan to documentation
2018-11-26 15:25:47 +01:00
Ines Montani
1844bc238a Update universe [ci skip] 2018-11-26 14:16:22 +01:00
Ines Montani
696acb0f92 Fix typo [ci skip] 2018-11-24 15:20:57 +01:00
Ines Montani
dfcc8f02af Fix image [ci skip]
Twitter URL doesn't work on live site
2018-11-14 01:01:33 +01:00
Ines Montani
1aa91e926f Minor formatting changes [ci skip] 2018-11-13 23:59:59 +01:00
Francisco Aranda
be99f1cac5 Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component

* chore: include spaCy contributor agreement
2018-11-13 23:54:46 +01:00
mikelibg
75e7d503b7 Removed space in docs + added contributor indo (#2909)
* - removed unneeded space in documentation

* - added contributor info
2018-11-08 14:18:25 +01:00
Ines Montani
11db4d2f27 Add script to validate universe json [ci skip] 2018-11-06 12:50:41 +01:00
Ines Montani
a9fda638a9 Add spacy-raspberry to universe (closes #2889) 2018-11-06 12:45:50 +01:00
Ines Montani
c235ddf44f Add spacy-js to universe [ci-skip] 2018-11-06 12:45:03 +01:00
Bram Vanroy
071789467e Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements

## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)

### Types of change
Documentation

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-24 15:19:17 +02:00
Roman
5766d09a5b Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-18 10:21:16 +02:00
Ines Montani
c6a320cad4 Update version [ci skip] 2018-10-15 16:42:35 +02:00
Ines Montani
f02bb08f39 Update prefer_gpu and require_gpu docs [ci skip] 2018-10-14 23:30:44 +02:00
Ines Montani
5a4c5b78a8 Update GPU docs for v2.0.14 2018-10-14 16:38:12 +02:00
Ines Montani
ac4cadd31d Add info on wheels [ci skip] 2018-10-14 00:04:37 +02:00
Ines Montani
30aa7f8b20 Increment version [ci skip] 2018-10-13 23:55:50 +02:00
Ines Montani
23d5b4ff5b Update docs for new version [ci skip] 2018-10-13 23:53:33 +02:00
Ines Montani
f0e7da6478 Fix formatting and consistency 2018-10-13 23:53:26 +02:00
Jacopo Farina
42c42376a3 Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page

* Add contribution agreement
2018-10-12 14:59:45 +02:00
Ines Montani
7806deceb4 Fix typo (closes #2815) [ci skip] 2018-10-01 10:49:29 +02:00
Ioannis Daras
405a826436 Correct error in spacy universe docs concerning spacy-lookup (#2814) 2018-10-01 10:24:50 +02:00
Charles-Axel Dein
014dd47c70 Add jupyter=True to displacy.render in documentation (#2806) 2018-09-27 12:28:04 +02:00
Pranshu Jethmalani
9fd27d777e Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
2018-09-25 12:12:40 +02:00
Ines Montani
3c4e3ade30 Fix typo (closes #2784) 2018-09-21 10:45:11 +02:00
Ines Montani
5001d31be6 Don't set stop word in example (closes #2657) [ci skip] 2018-09-12 15:36:51 +02:00
Ines Montani
4e89cfaae1 Fix dependency scheme docs (closes #2705) [ci skip] 2018-09-12 15:32:26 +02:00
Ines Montani
0729d1edca Fix formatting 2018-09-12 15:32:08 +02:00
Ines Montani
907df53904 Add multi-threading note to Language.pipe (resolves #2582) [ci skip] 2018-09-12 15:03:30 +02:00
Ines Montani
885691a7ab
Describe converters more explicitly (see #2643) 2018-09-12 14:53:03 +02:00
Steve Sharp
ca747f58a4 Update _install.jade (#2688)
Typo fix: "models" -> "model"
2018-08-22 13:16:04 +02:00
Ines Montani
aeb49eb625 Update version [ci skip] 2018-08-16 16:56:02 +02:00
Ines Montani
a0eacd3293 Merge branch 'master' into develop 2018-08-16 16:55:05 +02:00
Ines Montani
c0fa9903f4 Update model directory JS [ci skip]
Prevent the default release URL from being overwritten and add license type
2018-08-16 16:54:50 +02:00
Ines Montani
03f661fefb Add Greek to models directory [ci skip] 2018-08-16 16:51:56 +02:00
Ines Montani
fd9d175a53 Update live code [ci skip] 2018-08-15 15:28:48 +02:00
Matthew Honnibal
4336397ecb Update develop from master 2018-08-14 03:04:28 +02:00
Wojciech Łukasiewicz
3953e967a0 User correct variable name in the examples (#2664)
* correct naming

* add contributor agreement
2018-08-13 22:21:24 +02:00
Ines Montani
71723cece1 Add note on visualizing long texts ans sentences (see #2636) [ci skip] 2018-08-08 15:28:21 +02:00
Ines Montani
6147bd3eb4 Fix link target (closes #2645) [ci skip] 2018-08-08 15:03:52 +02:00
Ines Montani
8c47da1f19 Update Language serialization docs (see #2628) [ci skip]
Add note on using from_disk and from_bytes via subclasses and add example
2018-08-07 14:17:57 +02:00
Matthew Honnibal
664cfc29bc Merge branch 'master' of https://github.com/explosion/spaCy 2018-08-07 10:49:39 +02:00
Matthew Honnibal
2278c9734e Fix spelling error #2640 2018-08-07 10:49:21 +02:00
Xiaoquan Kong
f0c9652ed1 New Feature: display more detail when Error E067 (#2639)
* Fix off-by-one error

* Add verbose option

* Update verbose option

* Update documents for verbose option
2018-08-07 10:45:29 +02:00
Ines Montani
6a4360e425 Update universe [ci skip] 2018-08-02 17:33:08 +02:00
Sami
dbc993f5b3 Updating description and code snippet spacy-lefff (#2623)
* updating description and code snippet spacy-lefff

* contributors agreement
2018-08-02 17:25:27 +02:00
Vikas Kumar Yadav
d3e21aad64 Update _benchmarks.jade (#2618) 2018-08-02 00:28:28 +02:00
Brian Phillips
8227de0099 Update language.jade (#2616) 2018-07-31 12:34:42 +02:00
Ioannis Daras
055cc0de44 Bug fix to pseudocode for tokenizer customization (#2604) 2018-07-27 11:04:12 +02:00
Andriy Mulyar
e9ef51137d Fixed typo (#2596)
Changed 'The index of the first character after the span.' to The index of the last character after the span' in description of doc.char_span
2018-07-25 22:17:15 +02:00
Ines Montani
75f3234404
💫 Refactor test suite (#2568)
## Description

Related issues: #2379 (should be fixed by separating model tests)

* **total execution time down from > 300 seconds to under 60 seconds** 🎉
* removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure
* changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version)
* merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways)
* tidied up and rewrote existing tests wherever possible

### Todo

- [ ] move tests to `/tests` and adjust CI commands accordingly
- [x] move model test suite from internal repo to `spacy-models`
- [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~
- [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted
- [ ] update documentation on how to run tests


### Types of change
enhancement, tests

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-24 23:38:44 +02:00
kororo
b1ec827ee0 Fix typo (#2579)
Update slogan, desc and code snippet to latest version
2018-07-24 22:47:33 +02:00
ines
cd687091fb Remove nl examples from widget for now [ci skip]
Restore for next spaCy version when path to example sentences is fixed
2018-07-24 22:41:20 +02:00
ines
2d8ffb8bcd Fix formatting 2018-07-24 22:40:49 +02:00
ines
1b3da8d2ae Update website for v2.0.12 [ci skip] 2018-07-24 21:04:22 +02:00
ines
ae5ed2d698 Update docs for v2.0.12 [ci skip] 2018-07-21 15:51:44 +02:00
ines
d517dd4297 Document remove_extension methods 2018-07-21 15:51:28 +02:00
ines
153f41a5cc Use better examples for Doc extension methods 2018-07-21 15:51:11 +02:00
ines
3c30d1763c Merge branch 'master' into develop 2018-07-21 15:34:18 +02:00
kororo
2784babef9 Add ExcelCy into Universe list (#2572)
Hi guys,

This is my first spaCy extension. I am excited to able to do this. Please do let me know if there is any suggestions or modifications I need to do. Feel free to use/contribute the repo that I made.

## Description
ExcelCy is a SpaCy toolkit to help improve the data training experiences. It provides easy annotation using Excel file format. It has helper to pre-train entity annotation with phrase and regex matcher pipe.

### Types of change
Update to Universe list in website.

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-19 19:28:33 +02:00
ines
80e7485630 Merge branch 'master' into develop 2018-07-18 17:28:47 +02:00
Xiang Ji
19a5ef1c58 Fix venv command examples (#2560) [ci skip]
* Fix venv command examples

The documentation refers to `venv`, which is native to Python3.
However, the command examples are as if they were still `virtualenv`,
which is a package independent of `venv`:

- It doesn't need to be installed via `pip`. In fact `pip install venv` would
return an error.
- The correct way to invoke `venv` is `python3 -m venv`, not `venv`, which would
return command not found.

See https://docs.python.org/3/library/venv.html

I suspect the documentation simply replaced all occurrences of `virtualenv` with
`venv`. However they are different modules and are used differently.

* Update comment [ci skip]
2018-07-18 10:31:24 +02:00
ines
50c367ee96 Update meta [ci skip] 2018-07-10 13:51:45 +02:00
ines
3a321e79ac Merge branch 'master' into develop 2018-07-10 13:49:08 +02:00
ines
71bfc92913 Exclude models for non-stable versions [ci skip] 2018-07-10 13:44:55 +02:00
ines
b5200962c0 Adjust formatting [ci skip] 2018-07-09 18:35:46 +02:00
Alex Villarreal
bd35bf7f09 Guidance to handle binary files in git in Windows (#2526)
Adds guidance on what to do if users encounter the error described in [1634](https://github.com/explosion/spaCy/issues/1634), which probably only happens in Windows environments.
2018-07-09 18:31:37 +02:00
ines
f575b01595 Update language and license meta [ci skip] 2018-07-04 15:09:36 +02:00
ines
63666af328 Merge branch 'master' into develop 2018-07-04 14:52:25 +02:00
Matthew Honnibal
a85620a731 Note CoreNLP tokenizer correction on website 2018-07-02 11:35:31 +02:00
ines
06c6dc6fbc Update Juniper [ci skip] 2018-06-28 11:48:17 +02:00
Nipun Sadvilkar
741ba80bd5 Train model command n_iteration 20 -> 30 (#2454)
In source code `train.py` default Number of iterations  is 30
2018-06-18 11:57:08 +02:00
ines
53a2bc8c8d Only scroll sidebar item into view if needed [ci skip] 2018-06-12 10:58:50 +02:00
ines
65713a6593 Increment versions [ci skip] 2018-06-12 10:49:50 +02:00
Ines Montani
968f6f0bda
💫 Document Cython API (#2433)
## Description

This PR adds the most relevant documentation of spaCy's Cython API.

(Todo for when we publish this: rewrite `/api/#section-cython` and `/api/#cython` to `/api/cython#conventions`.)

### Types of change
docs

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-06-11 17:47:46 +02:00
GolanLevy
72d7e80f94 adding a missing apostrophe (#2436) 2018-06-11 17:47:24 +02:00
ines
778e5f4da3 Merge branch 'master' into develop 2018-06-11 00:38:04 +02:00
himkt
57311d5d47 replace janome with mecab in the documentation and the test (#2415)
* Add links to Reddit data (see #2401)

* replace janome with mecab in the documentation and the test

* add the assignment
2018-06-11 00:33:13 +02:00
ines
effb55d591 Adjust formatting [ci skip] 2018-06-11 00:29:13 +02:00
Nathan Breit
ba6d2cf393 Add EpiTator to Universe (#2429) 2018-06-11 00:24:13 +02:00
himkt
1a568f2e08 fix wrong documentations (#2423) 2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi
d66292f767 fix UD data file extensions (#2425)
* fix UD data files extension

* add contributor agreement for msklvsk
2018-06-08 14:26:11 +02:00
ines
a0017e4909 Merge branch 'master' into develop 2018-05-30 14:10:47 +02:00
ines
0baaf836cf Update formatting [ci skip] 2018-05-30 13:32:49 +02:00
ines
3913e18201 Add self-attentive-parser to universe (see #59) 2018-05-30 13:31:28 +02:00
ines
4a62486340 Merge branch 'master' into develop 2018-05-30 13:01:01 +02:00
ines
605c663a4c Fix HTML merger examples (see #2390) 2018-05-30 12:22:32 +02:00
ines
d0b16aa014 Update list of languages 2018-05-26 18:56:26 +02:00
Samuel Pouyt
5f988b8e9c Update _custom.jade (#2372)
It seems based on the doc and trying out that the `en` or `[lang]` is missing from the `spacy model-init`
2018-05-26 18:17:12 +02:00
ines
d84a830d79 Merge branch 'master' of https://github.com/explosion/spaCy 2018-05-26 17:57:05 +02:00
ines
fb923b31ea Fix bad HTML example (see #2376) and turn it into section on matcher + components
Avoid problems caused by merging while matching (e.g. index errors). Creating a Matcher component also better reflects the recommended best practices.
2018-05-26 17:57:02 +02:00
Shantam Raj
592834183a corrected spelling (#2359)
changed **interpretted** to **interpreted**
2018-05-24 13:29:52 +02:00
ines
8adb967e0c Fix from source quickstart instructions for Windows
See: https://stackoverflow.com/a/50478036/6400719
2018-05-24 12:42:16 +02:00
Shantam Raj
1a4682dd0b Update _training.jade (#2340)
* Update _training.jade

Correcting grammar. Replacing "The" with "To".

* Create armsp.md

* Update armsp.md
2018-05-21 11:09:33 +02:00
ines
ff1082d8e4 Add version tag in CLI docs [ci skip] 2018-05-21 01:17:49 +02:00
Ines Montani
d4cc736b7c 💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346)
* Go back to using requests instead of urllib (closes #2320)

Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.

* Only download model if not installed (see #1456)

Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.

* Pass additional options to pip when installing model (resolves #1456)

Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example:

python -m spacy download en --user

* Add CLI option to enable installing model package dependencies

* Revert "Add CLI option to enable installing model package dependencies"

This reverts commit 9336ffe695.

* Update documentation
2018-05-20 20:26:56 +02:00
vishnumenon
ae3719ece5 Fix the code for FACILITIY entities (#2324)
* Fix the code for FACILITIY entities

As far as I can tell, the default models all use "FAC" rather than "FACILITY"

* Added my Contributor Agreement

* Rename vishnumenon to vishnumenon.md
2018-05-12 15:19:17 +02:00
ines
ac25bc4016 Add docs section on sentence segmentation [ci skip] 2018-05-07 21:25:20 +02:00
ines
14148cd147 Fix formatting and wording 2018-05-07 21:24:35 +02:00
ines
f803da609f Add scattertext [ci skip] 2018-05-07 19:10:23 +02:00
ines
c9547b7b8b Update Juniper (see #2293) 2018-05-03 15:36:02 +02:00
Alex Villarreal
647f2544c5 Fix code sample for span.set_extension (#2286) 2018-05-03 00:39:22 +02:00
Alex Villarreal
13d562e1a4 Fix code sample for Doc.set_extension (#2282)
* Fix code sample for `set_extension`

The previous sample code for `set_extension` fails the assertion at the end, because `city_getter` it checked if the whole document text matches any of the city names. Now it checks if any of the city names is contained in the document text.

* Contributor agreement
2018-05-02 10:16:05 +02:00
Shirish Kadam
d98a90440f Added Adam project to spaCy Universe (#2275)
* Added 5hirish to contributors

* Added Adam Qas Project to spaCy Universe

* Remove $ from code example
2018-04-30 22:25:01 +02:00
ines
56e7faf16b Fix spacing 2018-04-30 22:24:40 +02:00
ines
6efb4cdf88 Use Juniper and tidy up 2018-04-30 18:48:35 +02:00
ines
45bb8d75a5 Fix overflow issues on small screens [ci skip] 2018-04-29 03:17:36 +02:00
Ines Montani
49cee4af92
💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)
* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label
2018-04-29 02:06:46 +02:00
ines
a512fa60ef Remove upcoming option from docs for now 2018-04-28 23:32:18 +02:00
ines
6fb6371670 Add collapse_phrases option to displacy (closes #2266) 2018-04-28 23:06:50 +02:00
Matt Upson
87cc6b3599 Add missing comma to NN example in docs (#2255)
Also add a completed contributor agreement.
2018-04-28 14:56:00 +02:00
ines
4a3bea00c7 Update resources [ci skip] 2018-04-26 22:10:34 +02:00
Pradeep Kumar Tippa
df389e5b74 spacy-101 vocab doc giving valid variable names (#2236) 2018-04-18 14:54:26 -07:00
ines
ce63f8997b Update init-model docs 2018-04-10 21:42:54 +02:00
ines
0e847d7fe5 Fix typo 2018-04-09 14:51:14 +02:00
ines
de137fba84 Add TensorBoard examples to examples overview [ci skip] 2018-04-03 16:01:52 +02:00
ines
6d87b28f15 Add Vietnamese to language overview [ci skip] 2018-04-03 16:01:36 +02:00
ines
9615ed5ed7 Update emoji/hashtag matcher example (resolves #2156) [ci skip] 2018-03-28 18:41:28 +02:00
ines
ce6071ca89 Remove ftfy dependency and update docs 2018-03-28 12:09:42 +02:00
ines
5ecc60cf3b Add book to resources [ci skip] 2018-03-24 17:12:56 +01:00
ines
53680642af Port over docs changes [ci skip] 2018-03-24 17:12:48 +01:00
Matthew Honnibal
f9f46e5a07 Revert matcher fixes from GregDubbin 2018-02-18 10:59:28 +01:00
ines
612c79a4f5 Update first matcher example and match_id (resolves #1989) 2018-02-17 11:57:38 +01:00
ines
ca56fb53d1 Add user survey to navigation [ci skip] 2018-02-15 12:14:30 +01:00
ines
cab5b775e7 Document ENT_TYPE matcher attribute [ci skip] 2018-02-15 12:14:19 +01:00
Pradeep Kumar Tippa
416cd021ce Added TAG from spacy symbols which used below 2018-02-09 19:16:59 +05:30
Pradeep Kumar Tippa
01cc9cd9c0 assert statement syntax fix in doc 2018-02-09 19:16:25 +05:30
Pradeep Kumar Tippa
a78062e466 Merge remote-tracking branch 'upstream/master' into web-doc-patches 2018-02-09 19:13:19 +05:30
ines
ab33e274f5 Add more details on symlink error & Windows solution (resolves #1941) [ci skip] 2018-02-09 10:43:33 +01:00
ines
8eaa934382 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-09 10:23:36 +01:00
ines
e9f67be04d Fix regex flag matcher example (resolves #1950) 2018-02-09 10:23:33 +01:00
ines
fc4ae04c55 Document LENGTH attribute in matcher 2018-02-09 10:23:03 +01:00