1
1
mirror of https://github.com/explosion/spaCy.git synced 2025-03-27 21:34:12 +03:00
Commit Graph

10051 Commits

Author SHA1 Message Date
Amit Chaudhary
167d63af31 Fix broken link to Dive Into Python 3 website ()
* Fix broken link to Dive Into Python 3 website

* Sign spaCy Contributor Agreement
2019-04-29 19:44:00 +02:00
Ramiro Gómez
e7e5999ddc Create yaph.md so I can contribute () 2019-04-29 19:43:06 +02:00
Brad Jascob
6fcafcc564 Doc changes for local website setup () 2019-04-27 13:28:23 +02:00
Ivan Tham
fa94f83697 Improve redundant variable name ()
* Improve redundant variable name

* Apply suggestions from code review

Co-Authored-By: pickfire <pickfire@riseup.net>
2019-04-26 16:50:14 +02:00
Ines Montani
dc87fb805d Merge branch 'master' of https://github.com/explosion/spaCy 2019-04-26 13:17:57 +02:00
Ines Montani
62060ae9c6 Merge branch 'spacy.io' 2019-04-26 13:17:52 +02:00
Brad Jascob
9afa0d6723 Update Universe Website for pyInflect () 2019-04-26 13:17:36 +02:00
Ines Montani
db7c0dbfd6 Update seo.js 2019-04-23 18:39:30 +02:00
Dobita21
721e1fc86c update norm_exceptions ()
* test sPacy commit to git fri 04052019 10:54

* change Data format from my format to master format

* ทัทั้งนี้ ---> ทั้งนี้

* delete stop_word translate from Eng

* Adjust formatting and readability

* add Thai norm_exception

* Add Dobita21 SCA

* editรึ : หรือ,

* Update Dobita21.md

* Auto-format

* Integrate norms into language defaults

* add acronym and some norm exception words
2019-04-23 12:48:03 +02:00
Ines Montani
ec0d840ab5 Document early stopping 2019-04-22 14:31:32 +02:00
Ines Montani
e0f487f904 Rename early_stopping_iter to n_early_stopping 2019-04-22 14:31:25 +02:00
Ines Montani
9767427669 Auto-format 2019-04-22 14:31:11 +02:00
Ines Montani
1d567913f9 Update spacy evaluate example 2019-04-22 14:28:42 +02:00
Ines Montani
7917ce2f73 Make flag shortcut consistent and document 2019-04-22 14:23:44 +02:00
Ines Montani
52658c80d5 Allow jupyter=False to override Jupyter mode (closes ) 2019-04-22 14:18:32 +02:00
Motoki Wu
8e2cef49f3 Add save after --save-every batches for spacy pretrain ()
<!--- Provide a general summary of your changes in the title. -->

When using `spacy pretrain`, the model is saved only after every epoch. But each epoch can be very big since `pretrain` is used for language modeling tasks. So I added a `--save-every` option in the CLI to save after every `--save-every` batches.

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

To test...

Save this file to `sample_sents.jsonl`

```
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
```

Then run `--save-every 2` when pretraining.

```bash
spacy pretrain sample_sents.jsonl en_core_web_md here -nw 1 -bs 1 -i 10 --save-every 2
```

And it should save the model to the `here/` folder after every 2 batches. The models that are saved during an epoch will have a `.temp` appended to the save name.

At the end the training, you should see these files (`ls here/`):

```bash
config.json     model2.bin      model5.bin      model8.bin
log.jsonl       model2.temp.bin model5.temp.bin model8.temp.bin
model0.bin      model3.bin      model6.bin      model9.bin
model0.temp.bin model3.temp.bin model6.temp.bin model9.temp.bin
model1.bin      model4.bin      model7.bin
model1.temp.bin model4.temp.bin model7.temp.bin
```

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

This is a new feature to `spacy pretrain`.

🌵 **Unfortunately, I haven't been able to test this because compiling from source is not working (cythonize error).** 

```
Processing matcher.pyx
[Errno 2] No such file or directory: '/Users/mwu/github/spaCy/spacy/matcher.pyx'
Traceback (most recent call last):
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 169, in <module>
    run(args.root)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 158, in run
    process(base, filename, db)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 124, in process
    preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 87, in preserve_cwd
    func(*args)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 63, in process_pyx
    raise Exception("Cython failed")
Exception: Cython failed
Traceback (most recent call last):
  File "setup.py", line 276, in <module>
    setup_package()
  File "setup.py", line 209, in setup_package
    generate_cython(root, "spacy")
  File "setup.py", line 132, in generate_cython
    raise RuntimeError("Running cythonize failed")
RuntimeError: Running cythonize failed
```

Edit: Fixed! after deleting all `.cpp` files: `find spacy -name "*.cpp" | xargs rm`

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-22 14:10:16 +02:00
Dobita21
189c90743c Add Thai norm_exceptions ()
* test sPacy commit to git fri 04052019 10:54

* change Data format from my format to master format

* ทัทั้งนี้ ---> ทั้งนี้

* delete stop_word translate from Eng

* Adjust formatting and readability

* add Thai norm_exception

* Add Dobita21 SCA

* editรึ : หรือ,

* Update Dobita21.md

* Auto-format

* Integrate norms into language defaults
2019-04-20 12:16:03 +02:00
Ines Montani
7937109ed9 Update link [ci skip] 2019-04-19 16:01:41 +02:00
Ines Montani
0dce4585b1 Add course to 101 2019-04-19 15:59:51 +02:00
Ines Montani
2efc87c382 Remove unused image 2019-04-19 15:48:12 +02:00
Ines Montani
38395d9518 Merge branch 'spacy.io' 2019-04-19 15:26:20 +02:00
Ines Montani
7ac5bb0a7b Update landing and feature overview 2019-04-19 15:23:08 +02:00
Dobita21
d86848cf1f Create Dobita21.md ()
<!--- Provide a general summary of your changes in the title. -->

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-18 12:51:54 +02:00
fizban99
f2f2df6e78 entity types for colors should be in uppercase ()
although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.
2019-04-17 11:22:56 +02:00
fizban99
57d4a8bf3d Create fizban99.md () 2019-04-17 11:22:19 +02:00
Matthew Honnibal
83511972d3 Set version to v2.1.4.dev0 2019-04-16 14:17:26 +02:00
Matthew Honnibal
8b5ae0733e Merge branch 'master' of https://github.com/explosion/spaCy 2019-04-16 12:29:46 +02:00
Matthew Honnibal
d59b2e8a0c Fix issue : Upper case lemmas
If the Morphology class tries to lemmatize a word that's not in the
string store, it's forced to just return it as-is. While loading
exceptions, the class could hit a case where these strings weren't in
the string store yet. The resulting lemmas could then be cached, leading
to some words receiving upper-case lemmas. Closes .
2019-04-16 12:27:15 +02:00
BreakBB
5b8dbe4975 Fix symlink creation to show error message on failure () (resolves ))
* Fix symlink creation to show error message on failure. Update tests to reflect those changes.

* Fix test to succeed on non windows systems.
2019-04-16 11:58:31 +02:00
Krzysztof Kowalczyk
cc1516ec26 Improved training and evaluation ()
* Add early stopping

* Add return_score option to evaluate

* Fix missing str to path conversion

* Fix import + old python compatibility

* Fix bad beam_width setting during cpu evaluation in spacy train with gpu option turned on
2019-04-15 12:04:36 +02:00
Ines Montani
5289dd1356 Fix formatting 2019-04-13 17:58:26 +02:00
Ines Montani
9e7deeaf48 Remove Datacamp 2019-04-13 17:46:32 +02:00
Shikhar Chauhan
bbf6f9f764 Change default output format from jsonl to json for cli convert () (closes )
* Changing default ouput format from jsonl to json for cli convert

* Adding Contributor Agreement
2019-04-12 11:31:23 +02:00
Omer Celik
531c0869b2 Added Turkish Lira symbol(₺) ()
Added Turkish Lira symbol(₺) 
https://en.wikipedia.org/wiki/Turkish_lira
2019-04-11 11:32:28 +02:00
Omer Celik
034a1f458b Signed agreement () 2019-04-11 11:31:27 +02:00
Ivan Tham
71710e2454 Add myself to contributors () 2019-04-11 11:31:04 +02:00
oterrier
2854724e69 Added project gracyql to Universe () (resolves )
As discussed with Ines in https://github.com/explosion/spaCy/issues/3568 , adding a new project proposal for the community in SpaCy Universe website

GracyQL a tiny graphql wrapper aroung spacy using graphene and starlette.

## Description
Change only in universe.json file to add a new project

### Types of change
New project reference in Universe

## Checklist
- [x ] I have submitted the spaCy Contributor Agreement.
- [x ] I ran the tests, and all new and existing tests passed.
- [ x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-10 17:54:42 +02:00
Santiago Castro
86e4b68aa9 Fix website docs for Vectors.from_glove ()
* Fix website docs for Vectors.from_glove

* Add myself as a contributor
2019-04-10 15:23:27 +02:00
Ines Montani
4d198a7e92 Ensure match pattern error isn't raised on empty errors (closes ) 2019-04-09 12:50:43 +02:00
Ines Montani
3ddb799f27 Merge branch 'master' of https://github.com/explosion/spaCy 2019-04-09 11:40:28 +02:00
Ines Montani
145c0b7e88 Tidy up and auto-format 2019-04-09 11:40:19 +02:00
Bharat Raghunathan
72820896d4 Fix typo in web docs cli.md () 2019-04-09 11:40:03 +02:00
Ines Montani
5f005adf61 Add xfailing test for 2019-04-09 11:07:14 +02:00
Ines Montani
6ae3b5699e Make sure path is string (resolves ) 2019-04-08 12:53:41 +02:00
Ines Montani
d0f5e015cb Auto-format 2019-04-08 12:53:16 +02:00
pierremonico
0d26bfe677 Removes duplicate in table ()
* Removes duplicate in table

Just fixing typos.

* Remove newline


Co-authored-by: Ines Montani <ines@ines.io>
2019-04-08 10:30:42 +02:00
Piero Molino
5198aa4ae6 Added Ludwig among the projects () [ci skip]
* Added Ludwig among the projects

* Create w4nderlust.md

* Add Uber to logo wall
2019-04-07 13:01:26 +02:00
Dobita21
8bf6967eb7 Update Thai stop words ()
* test sPacy commit to git fri 04052019 10:54

* change Data format from my format to master format

* ทัทั้งนี้ ---> ทั้งนี้

* delete stop_word translate from Eng

* Adjust formatting and readability
2019-04-05 12:06:38 +02:00
jeannefukumaru
f67d881b30 fix typos in tag_map flagged by python -m debug-data ()
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.


Co-authored-by: Ines Montani <ines@ines.io>
2019-04-05 12:06:09 +02:00
Ines Montani
cd21778bef
Merge pull request from jeannefukumaru/master
Added tags previously missing from Indonesian `tag_map.py`
2019-04-04 11:57:03 +02:00