Commit Graph

9418 Commits

Author SHA1 Message Date
Wannaphong Phatthiyaphaibun
2d2765fd8a Change PyThaiNLP Url (#2876) 2018-10-27 14:46:07 +02:00
Matthew Honnibal
817e1fc5e5 Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!

This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so much be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
2018-10-27 01:12:50 +02:00
Matthew Honnibal
9447739027 Merge branch 'master' of https://github.com/explosion/spaCy 2018-10-27 00:50:48 +02:00
Matthew Honnibal
ad068f51be Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!

This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so much be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
2018-10-27 00:46:30 +02:00
Grivaz
57f274b693 raise error when setting overlapping entities as doc.ents (#2880) 2018-10-26 23:29:16 +02:00
Bram Vanroy
071789467e Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements

## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)

### Types of change
Documentation

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-24 15:19:17 +02:00
Roman
5766d09a5b Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-18 10:21:16 +02:00
Ines Montani
c6a320cad4 Update version [ci skip] 2018-10-15 16:42:35 +02:00
Ines Montani
48b1bc44d3 Update version to 2.0.16 2018-10-15 14:39:25 +02:00
Ines Montani
a0f6647160 Increment version 2018-10-15 14:20:55 +02:00
Ines Montani
fd750ec3bf Fix msgpack-numpy version pin 2018-10-15 14:18:38 +02:00
Ines Montani
7bc7fa8f1e Increment version 2018-10-15 01:40:44 +02:00
Ines Montani
051a6b73eb Update Thinc version pin 2018-10-15 01:40:28 +02:00
Matthew Honnibal
8612b75890 Set version to 2.0.14 2018-10-15 00:10:04 +02:00
Matthew Honnibal
d6e9cf8b09 Set version to 2.0.14.dev1 2018-10-15 00:09:02 +02:00
Matthew Honnibal
7202abdfa9 Fix specifiers for GPU 2018-10-15 00:08:44 +02:00
Ines Montani
f02bb08f39 Update prefer_gpu and require_gpu docs [ci skip] 2018-10-14 23:30:44 +02:00
Matthew Honnibal
b305b24c24 Require thinc 6.10.6 2018-10-14 23:28:41 +02:00
Matthew Honnibal
8ccfa52d19 Unhack prefer_gpu 2018-10-14 23:27:09 +02:00
Matthew Honnibal
2ad3a4ea32 Update push-tag script 2018-10-14 23:16:08 +02:00
Matthew Honnibal
41adf3572b Set version to v2.0.14 2018-10-14 23:15:34 +02:00
Matthew Honnibal
38aa835ada Workaround bug in thinc require_gpu 2018-10-14 23:15:08 +02:00
Matthew Honnibal
6e6f6be3f5 Update requirements and setup.py 2018-10-14 23:06:46 +02:00
Matthew Honnibal
91593b7378 Add tests for prefer_gpu() and require_gpu() 2018-10-14 23:05:22 +02:00
Matthew Honnibal
62c70b3163 Import prefer_gpu and require_gpu functions from Thinc 2018-10-14 23:03:06 +02:00
Ines Montani
9ebe607f82 Add wheel to setup_requires 2018-10-14 16:38:48 +02:00
Ines Montani
5a4c5b78a8 Update GPU docs for v2.0.14 2018-10-14 16:38:12 +02:00
Ines Montani
295da0f11b Increment version to 2.0.14.dev0 2018-10-14 16:37:46 +02:00
Ines Montani
2e675d9523 Update murmurhash pin 2018-10-14 16:37:38 +02:00
Matthew Honnibal
7de0dcb91f Merge branch 'master' of https://github.com/explosion/spaCy 2018-10-14 16:12:23 +02:00
Ines Montani
76c43380e4
Update README.rst [ci skip] 2018-10-14 01:00:55 +02:00
Ines Montani
3decf44dd3 Update badge [ci skip] 2018-10-14 00:54:19 +02:00
Ines Montani
8f393b1dcf Add wheels badge 2018-10-14 00:48:04 +02:00
Keshan
cb075c8e72 Adding "This is a sentence" example to Sinhala (#2846) 2018-10-14 00:06:40 +02:00
Ines Montani
ac4cadd31d Add info on wheels [ci skip] 2018-10-14 00:04:37 +02:00
Ines Montani
30aa7f8b20 Increment version [ci skip] 2018-10-13 23:55:50 +02:00
Ines Montani
23d5b4ff5b Update docs for new version [ci skip] 2018-10-13 23:53:33 +02:00
Ines Montani
f0e7da6478 Fix formatting and consistency 2018-10-13 23:53:26 +02:00
Matthew Honnibal
9cfab5933a Set version to 2.0.13 2018-10-13 19:42:16 +02:00
Matthew Honnibal
6a6ae5b0af Merge branch 'master' of https://github.com/explosion/spaCy 2018-10-13 19:41:00 +02:00
mauryaland
36514b5762 Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech 
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentionned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-13 16:38:21 +02:00
Matthew Honnibal
de46286107 Merge branch 'master' of https://github.com/explosion/spaCy 2018-10-13 16:11:16 +02:00
Ines Montani
fa23be0f3c Remove in favour of https://github.com/explosion/spaCy/graphs/contributors 2018-10-13 15:46:57 +02:00
Ines Montani
cb57b35bb8 Also include lowercase norm exceptions 2018-10-13 15:37:30 +02:00
JKhakpour
74a30d883c Add Persian(Farsi) language support (#2797) 2018-10-13 15:31:49 +02:00
Matthew Honnibal
c3ddf98b1e Set version to 2.0.13.dev4 2018-10-13 15:20:59 +02:00
Marina Lysyuk
b76fe08308 Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement

* Correct some grammatical inaccuracies in lang\ru\examples.py

* Move contributor agreement to separate file
2018-10-13 15:19:43 +02:00
Jacopo Farina
42c42376a3 Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page

* Add contribution agreement
2018-10-12 14:59:45 +02:00
Ines Montani
4cd9ec0f00
💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->

## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results.

### Types of change
enhancements

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-10 01:40:29 +02:00
Matthew Honnibal
f784e42ffe Try older version of regex 2018-10-03 00:23:40 +02:00