Commit Graph

1385 Commits

Author SHA1 Message Date
Nirant
638caba9b5 Add multiple packages to universe.json (#3809) [ci skip]
* Add multiple packages to universe.json

Added following packages: NLPArchitect, NLPRe, Chatterbot, alibi, NeuroNER

* Auto-format

* Update slogan (probably just copy-paste mistake)

* Adjust formatting

* Update tags / categories
2019-06-02 12:35:52 +02:00
Nirant
d4d1eab5e1 Add Baderlab/saber to universe.json (#3806) 2019-06-01 17:36:40 +02:00
Ines Montani
6be7d07315
Update UNIVERSE.md 2019-06-01 16:37:06 +02:00
Ines Montani
0c74506c9c Fix typos in docs (closes #3802) [ci skip] 2019-06-01 11:35:01 +02:00
Nipun Sadvilkar
1f13005751 Incorrect Token attribute ent_iob_ description (#3800)
* Incorrect Token attribute ent_iob_ description

* Add spaCy contributor agreement
2019-05-31 16:50:45 +02:00
Ramanan Balakrishnan
26c37c5a4d fix all references to BILUO annotation format (#3797) 2019-05-31 12:19:19 +02:00
mak
89379a7fa4 Corrected example model URL in requirements.txt (#3786)
The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).
2019-05-29 10:51:55 +02:00
Ines Montani
7634812172 Document Language.evaluate 2019-05-24 14:06:36 +02:00
Ines Montani
45e6855550 Update Language.update docs 2019-05-24 14:06:26 +02:00
Ines Montani
b78a8dc1d2 Update Scorer and add API docs 2019-05-24 14:06:04 +02:00
Ines Montani
321c9f5acc Fix lex_id docs (closes #3743) 2019-05-16 23:15:58 +02:00
Ines Montani
f96af8526a Merge branch 'spacy.io' [ci skip] 2019-05-11 23:03:56 +02:00
Ines Montani
7534f7cb44 Fix return value of Language.update (closes #3692) 2019-05-11 18:40:19 +02:00
Ines Montani
503b8c85f1 Add TWiML podcast to universe [ci skip] 2019-05-11 17:48:22 +02:00
Ines Montani
0daf2422a3 Auto-format 2019-05-11 17:48:07 +02:00
devforfu
21af12eb53 Make "text" key in JSONL format optional when "tokens" key is provided (#3721)
* Fix issue with forcing text key when it is not required

* Extending the docs to reflect the new behavior
2019-05-11 15:41:29 +02:00
Ines Montani
6cfa1e1f47 Fix DependencyParser.predict docs (resolves #3561) 2019-05-11 15:37:54 +02:00
Ines Montani
25f5592d57 Improve Token.prob and Lexeme.prob docs (resolves #3701) 2019-05-11 15:23:41 +02:00
Aaron Kub
719a15f23d fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:23:52 +02:00
Ines Montani
65b55f1aaa Add version tag to --base-model argument (closes #3720) 2019-05-10 14:06:47 +02:00
richardpaulhudson
a1e07f0d14 Request to include Holmes in spaCy Universe (#3685)
* Request to add Holmes to spaCy Universe

Dear spaCy team, I would be grateful if you would consider my Python library Holmes for inclusion in the spaCy Universe. Holmes transforms the syntactic structures delivered by spaCy into semantic structures that, together with various other techniques including ontological matching and word embeddings, serve as the basis for information extraction. Holmes supports several use cases including chatbot, structured search, topic matching and supervised document classification. I had the basic idea for Holmes around 15 years ago and now spaCy has made it possible to build an implementation that is stable and fast enough to actually be of use - thank you! At present Holmes supports English and German (I am based in Munich) but could easily be extended to support any other language with a spaCy model.

* Added
2019-05-08 02:42:03 +02:00
Ines Montani
505c9e0e19 Add util.filter_spans helper (#3686) 2019-05-08 02:33:40 +02:00
Bram Vanroy
8e6f8deaf6 Re-added Universe readme (#3688) (closes #3680) 2019-05-06 21:08:01 +02:00
Ines Montani
b4d142e3c4 Adjust wording and formatting [ci skip] 2019-05-03 12:00:31 +02:00
d5555
ba4bcbf285 Update universe.json (#3653) [ci skip]
* Update universe.json

* Update universe.json
2019-05-03 11:50:12 +02:00
张晓飞
ba1ff00370 update response after calling add_pipe (#3661)
* update response after calling add_pipe

component:print_info is appened in the last, so need show it at the end of  pipeline

* Create henry860916.md
2019-05-01 12:02:18 +02:00
Ramiro Gómez
8ee4100f8f Remove dangling M (#3657)
I assume this is a typo. Sorry if it has a meaning that I'm not aware of.
2019-04-29 19:44:43 +02:00
Amit Chaudhary
167d63af31 Fix broken link to Dive Into Python 3 website (#3656)
* Fix broken link to Dive Into Python 3 website

* Sign spaCy Contributor Agreement
2019-04-29 19:44:00 +02:00
Brad Jascob
6fcafcc564 Doc changes for local website setup (#3651) 2019-04-27 13:28:23 +02:00
Ivan Tham
fa94f83697 Improve redundant variable name (#3643)
* Improve redundant variable name

* Apply suggestions from code review

Co-Authored-By: pickfire <pickfire@riseup.net>
2019-04-26 16:50:14 +02:00
Ines Montani
dc87fb805d Merge branch 'master' of https://github.com/explosion/spaCy 2019-04-26 13:17:57 +02:00
Ines Montani
62060ae9c6 Merge branch 'spacy.io' 2019-04-26 13:17:52 +02:00
Brad Jascob
9afa0d6723 Update Universe Website for pyInflect (#3641) 2019-04-26 13:17:36 +02:00
Ines Montani
db7c0dbfd6 Update seo.js 2019-04-23 18:39:30 +02:00
Ines Montani
ec0d840ab5 Document early stopping 2019-04-22 14:31:32 +02:00
Ines Montani
1d567913f9 Update spacy evaluate example 2019-04-22 14:28:42 +02:00
Ines Montani
7917ce2f73 Make flag shortcut consistent and document 2019-04-22 14:23:44 +02:00
Ines Montani
52658c80d5 Allow jupyter=False to override Jupyter mode (closes #3598) 2019-04-22 14:18:32 +02:00
Motoki Wu
8e2cef49f3 Add save after --save-every batches for spacy pretrain (#3510)
<!--- Provide a general summary of your changes in the title. -->

When using `spacy pretrain`, the model is saved only after every epoch. But each epoch can be very big since `pretrain` is used for language modeling tasks. So I added a `--save-every` option in the CLI to save after every `--save-every` batches.

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

To test...

Save this file to `sample_sents.jsonl`

```
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
```

Then run `--save-every 2` when pretraining.

```bash
spacy pretrain sample_sents.jsonl en_core_web_md here -nw 1 -bs 1 -i 10 --save-every 2
```

And it should save the model to the `here/` folder after every 2 batches. The models that are saved during an epoch will have a `.temp` appended to the save name.

At the end the training, you should see these files (`ls here/`):

```bash
config.json     model2.bin      model5.bin      model8.bin
log.jsonl       model2.temp.bin model5.temp.bin model8.temp.bin
model0.bin      model3.bin      model6.bin      model9.bin
model0.temp.bin model3.temp.bin model6.temp.bin model9.temp.bin
model1.bin      model4.bin      model7.bin
model1.temp.bin model4.temp.bin model7.temp.bin
```

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

This is a new feature to `spacy pretrain`.

🌵 **Unfortunately, I haven't been able to test this because compiling from source is not working (cythonize error).** 

```
Processing matcher.pyx
[Errno 2] No such file or directory: '/Users/mwu/github/spaCy/spacy/matcher.pyx'
Traceback (most recent call last):
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 169, in <module>
    run(args.root)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 158, in run
    process(base, filename, db)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 124, in process
    preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 87, in preserve_cwd
    func(*args)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 63, in process_pyx
    raise Exception("Cython failed")
Exception: Cython failed
Traceback (most recent call last):
  File "setup.py", line 276, in <module>
    setup_package()
  File "setup.py", line 209, in setup_package
    generate_cython(root, "spacy")
  File "setup.py", line 132, in generate_cython
    raise RuntimeError("Running cythonize failed")
RuntimeError: Running cythonize failed
```

Edit: Fixed! after deleting all `.cpp` files: `find spacy -name "*.cpp" | xargs rm`

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-22 14:10:16 +02:00
Ines Montani
7937109ed9 Update link [ci skip] 2019-04-19 16:01:41 +02:00
Ines Montani
0dce4585b1 Add course to 101 2019-04-19 15:59:51 +02:00
Ines Montani
2efc87c382 Remove unused image 2019-04-19 15:48:12 +02:00
Ines Montani
38395d9518 Merge branch 'spacy.io' 2019-04-19 15:26:20 +02:00
Ines Montani
7ac5bb0a7b Update landing and feature overview 2019-04-19 15:23:08 +02:00
fizban99
f2f2df6e78 entity types for colors should be in uppercase (#3599)
although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.
2019-04-17 11:22:56 +02:00
Ines Montani
5289dd1356 Fix formatting 2019-04-13 17:58:26 +02:00
Ines Montani
9e7deeaf48 Remove Datacamp 2019-04-13 17:46:32 +02:00
oterrier
2854724e69 Added project gracyql to Universe (#3570) (resolves #3568)
As discussed with Ines in https://github.com/explosion/spaCy/issues/3568 , adding a new project proposal for the community in SpaCy Universe website

GracyQL a tiny graphql wrapper aroung spacy using graphene and starlette.

## Description
Change only in universe.json file to add a new project

### Types of change
New project reference in Universe

## Checklist
- [x ] I have submitted the spaCy Contributor Agreement.
- [x ] I ran the tests, and all new and existing tests passed.
- [ x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-10 17:54:42 +02:00
Santiago Castro
86e4b68aa9 Fix website docs for Vectors.from_glove (#3565)
* Fix website docs for Vectors.from_glove

* Add myself as a contributor
2019-04-10 15:23:27 +02:00
Bharat Raghunathan
72820896d4 Fix typo in web docs cli.md (#3559) 2019-04-09 11:40:03 +02:00
pierremonico
0d26bfe677 Removes duplicate in table (#3550)
* Removes duplicate in table

Just fixing typos.

* Remove newline


Co-authored-by: Ines Montani <ines@ines.io>
2019-04-08 10:30:42 +02:00
Piero Molino
5198aa4ae6 Added Ludwig among the projects (#3548) [ci skip]
* Added Ludwig among the projects

* Create w4nderlust.md

* Add Uber to logo wall
2019-04-07 13:01:26 +02:00
Ines Montani
2f0f439c54 Remove non-existent example (closes #3533) 2019-04-03 09:59:17 +02:00
Ines Montani
b070e0caf7 Update landing.js 2019-03-30 22:26:46 +01:00
Ines Montani
9d1221943b Merge branch 'master' into spacy.io 2019-03-30 20:32:14 +01:00
Ines Montani
037ffdfd3f Add spaCy IRL to landing [ci skip] 2019-03-30 20:32:03 +01:00
Ines Montani
730f759b4f Merge branch 'master' into spacy.io 2019-03-28 15:26:17 +01:00
Ines Montani
7d033a7b89 Fix met a description in universe projects [ci skip] 2019-03-28 15:26:01 +01:00
Ines Montani
fe2cb642ac Merge branch 'master' into spacy.io 2019-03-28 15:13:39 +01:00
David
74e738dd4d adds textpipe to universe (#3500) [ci skip]
* Adds textpipe to universe

* signed contributor agreement

* Adjust formatting, code style and use "standalone" category
2019-03-28 15:13:19 +01:00
Ines Montani
04a9fb1a02 Merge branch 'master' into spacy.io 2019-03-28 13:34:46 +01:00
Samuel Kane
06a1846379 fix(util): fix decaying function output (#3495)
* fix(util): fix decaying function output

* fix(util): better test and adhere to code standards

* fix(util): correct variable name, pytestify test, update website text
2019-03-28 13:24:47 +01:00
Bharat Raghunathan
1db3e47509 DOC: Update tokenizer docs to include default value for batch_size in pipe (#3492) 2019-03-28 12:48:02 +01:00
Ines Montani
2ed16d82bf Fix social image 2019-03-26 18:27:40 +01:00
Ines Montani
9e14b2b69f Add Estonian to docs [ci skip] (closes #3482) 2019-03-25 18:01:54 +01:00
Ines Montani
21ade53ef7 Merge branch 'master' into spacy.io 2019-03-25 13:05:00 +01:00
Ines Montani
db938ab0e3 Update favicon (closes #3475) [ci skip] 2019-03-25 13:04:47 +01:00
Ines Montani
c8c1baaea8 Update binderVersion 2019-03-25 12:17:03 +01:00
Ines Montani
200d8bdb3c Merge branch 'spacy.io' [ci skip] 2019-03-23 16:46:34 +01:00
Ines Montani
1e5b917d75 Fix formatting [ci skip] 2019-03-23 16:45:50 +01:00
Matthew Honnibal
6c783f8045 Bug fixes and options for TextCategorizer (#3472)
* Fix code for bag-of-words feature extraction

The _ml.py module had a redundant copy of a function to extract unigram
bag-of-words features, except one had a bug that set values to 0.
Another function allowed extraction of bigram features. Replace all three
with a new function that supports arbitrary ngram sizes and also allows
control of which attribute is used (e.g. ORTH, LOWER, etc).

* Support 'bow' architecture for TextCategorizer

This allows efficient ngram bag-of-words models, which are better when
the classifier needs to run quickly, especially when the texts are long.
Pass architecture="bow" to use it. The extra arguments ngram_size and
attr are also available, e.g. ngram_size=2 means unigram and bigram
features will be extracted.

* Fix size limits in train_textcat example

* Explain architectures better in docs
2019-03-23 16:44:44 +01:00
Ines Montani
5944cf10c7 Add blog post to v2.1 page 2019-03-23 16:34:23 +01:00
Ines Montani
ffebdad08d Add cheat sheet to spaCy 101 2019-03-23 16:32:55 +01:00
Ines Montani
06bf130890 💫 Add better and serializable sentencizer (#3471)
* Add better serializable sentencizer component

* Replace default factory

* Add tests

* Tidy up

* Pass test

* Update docs
2019-03-23 15:45:02 +01:00
Ines Montani
dcd6e06c47 Improve landing example [ci skip] 2019-03-22 19:02:15 +01:00
Ines Montani
a841324034 Update landing example [ci skip] 2019-03-22 18:50:00 +01:00
Ines Montani
b532386a60 Fix typo [ci skip] 2019-03-22 18:36:17 +01:00
Ines Montani
d8533f0149 Update Binder [ci skip] 2019-03-22 18:16:46 +01:00
Christos Aridas
9cee3f702a Add missing space in landing page (#3462) [ci skip] 2019-03-22 15:17:35 +01:00
Ines Montani
5073ce63fd Merge branch 'spacy.io' [ci skip] 2019-03-22 15:17:11 +01:00
Ines Montani
0712efc6b3 Update version requirements [ci skip] 2019-03-21 10:23:54 +01:00
Ines Montani
764359c952 Merge branch 'master' into spacy.io 2019-03-20 17:24:28 +01:00
Ines Montani
dac8f8ff99 Update Span.__init__ docs (see #3445) [ci skip] 2019-03-20 17:24:17 +01:00
Ines Montani
f7b5ff7907 Move netlify.toml to root 2019-03-19 14:40:14 +01:00
Ines Montani
c6ee030721 Fix docsearch 2019-03-19 14:38:49 +01:00
Ines Montani
0155083e01 Update netlify.toml 2019-03-19 14:07:00 +01:00
Ines Montani
d4eed4a84f Add note on unicode build to troubleshooting guide (see #3421) [ci skip] 2019-03-19 10:27:02 +01:00
Ines Montani
42d4b818e4 Redirect Netlify URL 2019-03-19 10:17:56 +01:00
Ines Montani
1ee97bc282 Add page title fallback, just in case 2019-03-18 18:58:55 +01:00
Ines Montani
728ae7651b Fix universe page titles if no separate title is set 2019-03-18 18:58:46 +01:00
Ines Montani
a20d3772fd FIx responsive landing 2019-03-18 16:24:52 +01:00
Ines Montani
08284f3a11
💫 v2.1.0 launch updates (only merge on launch!) (#3414)
* Update README.md

* Use production docsearch [ci skip]

* Add option to exclude pages from search
2019-03-18 16:07:26 +01:00
Ines Montani
a611b32fbf Update model docs [ci skip] 2019-03-17 11:48:18 +01:00
Matthew Honnibal
62afa64a8d Expose batch size and length caps on CLI for pretrain (#3417)
Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`.

Also improve CLI output.

Closes #3216 

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-16 21:38:45 +01:00
Ines Montani
2c5dd4d602 Update Vectors.find docs [ci skip] 2019-03-16 17:10:57 +01:00
Ines Montani
fa0f501165 Use dev DocSearch index 2019-03-15 14:48:38 +01:00
Ines Montani
8af7d01382 Fix general-purpose IDs 2019-03-15 14:48:26 +01:00
Ines Montani
cbcba699dd Fix missing ids 2019-03-14 17:56:53 +01:00
Ines Montani
cffe63ea24 Fix :target padding for ids 2019-03-14 17:41:02 +01:00
Ines Montani
51b7b88acf Generate active sidebar heading (h0) at compile time 2019-03-14 17:20:51 +01:00