9609 Commits

Author SHA1 Message Date
Matthew Honnibal
09a0227656 Temporarily add a script to load reddit 2018-11-15 23:18:35 +00:00
Matthew Honnibal
f8afaa0c1c Fix pretrain 2018-11-15 22:46:53 +00:00
Matthew Honnibal
6af6950e46 Fix pretrain 2018-11-15 22:45:36 +00:00
Matthew Honnibal
3e7b214e57 Make pretrain script work with stream from stdin 2018-11-15 22:44:07 +00:00
Matthew Honnibal
8fdb9bc278 💫 Add experimental ULMFit/BERT/Elmo-like pretraining (#2931)
* Add 'spacy pretrain' command

* Fix pretrain command for Python 2

* Fix pretrain command

* Fix pretrain command
2018-11-15 22:17:16 +01:00
Ines Montani
02fc73ca53 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.

## Description
Fixes a problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generates a random ID prefix so that even identical parses won't receive the same IDs, for consistency (even if the effect of an ID clash isn't noticeable there).

### Types of change
bug fix

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-15 11:40:10 +01:00
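The ID-clash fix described in commit 02fc73ca53 can be illustrated with a minimal sketch. This is not the displaCy implementation: the function names, the SVG markup, and the use of `uuid.uuid4()` for the prefix are all assumptions made for illustration. It only shows the idea that each render gets its own random prefix, so two renders of the same parse never share element IDs.

```python
import uuid


def render_arc(label, index, render_id):
    """Return a toy SVG snippet whose element ID carries a per-render prefix."""
    arc_id = "arrow-{}-{}".format(render_id, index)
    return (
        '<path id="{id}" d="M0,0 C0,-50 100,-50 100,0"/>'
        '<text><textPath xlink:href="#{id}">{label}</textPath></text>'
    ).format(id=arc_id, label=label)


def render_parse(labels):
    """Render all arcs of one parse under a fresh random ID prefix."""
    render_id = uuid.uuid4().hex  # hypothetical choice of random prefix
    return [render_arc(label, i, render_id) for i, label in enumerate(labels)]


first = render_parse(["nsubj", "dobj"])
second = render_parse(["nsubj", "dobj"])  # identical parse, different IDs
```

Because the prefix is drawn per render, the `textPath` references in one notebook cell cannot resolve to a path drawn by another cell.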
Ines Montani
e89708c3eb 💫 Allow matching non-ORTH attributes in PhraseMatcher (#2925)
* Allow matching non-orth attributes in PhraseMatcher (see #1971)

Usage: PhraseMatcher(nlp.vocab, attr='POS')

* Allow attr argument to be int

* Fix formatting

* Fix typo
2018-11-15 03:00:58 +01:00
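The idea behind commit e89708c3eb, matching phrases on an attribute other than the verbatim text, can be sketched in plain Python. This toy does not use spaCy and is not how `PhraseMatcher` is implemented; `find_phrases` and the dictionary-based tokens are hypothetical stand-ins chosen so the example runs on its own.

```python
def find_phrases(tokens, pattern, attr="ORTH"):
    """Return (start, end) spans where the chosen attribute matches pattern."""
    values = [token[attr] for token in tokens]
    n = len(pattern)
    if n == 0:
        return []
    return [
        (start, start + n)
        for start in range(len(values) - n + 1)
        if values[start:start + n] == pattern
    ]


# Hypothetical tokens; a real pipeline would supply these attributes.
tokens = [
    {"ORTH": "I", "POS": "PRON"},
    {"ORTH": "like", "POS": "VERB"},
    {"ORTH": "green", "POS": "ADJ"},
    {"ORTH": "apples", "POS": "NOUN"},
    {"ORTH": "and", "POS": "CCONJ"},
    {"ORTH": "red", "POS": "ADJ"},
    {"ORTH": "pears", "POS": "NOUN"},
]

by_text = find_phrases(tokens, ["green", "apples"])         # exact words only
by_pos = find_phrases(tokens, ["ADJ", "NOUN"], attr="POS")  # any ADJ + NOUN
```

Matching on `ORTH` finds only the literal phrase, while matching on `POS` finds every adjective-noun pair, which is the kind of generalization the `attr` argument is meant to allow.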
Matthew Honnibal
7ed9124a45 Fix Python2 error on example 2018-11-14 19:35:17 +01:00
Ines Montani
0d5b142c78 Fix typos and whitespace 2018-11-14 19:12:34 +01:00
Ines Montani
bd1b0e396a Add deprecation warning for PhraseMatcher max_length 2018-11-14 19:10:46 +01:00
Ines Montani
64257bf3a7 Fix formatting 2018-11-14 19:10:21 +01:00
Ines Montani
b3cadd5b81 Delete _matcher2_notes.py 2018-11-14 16:19:12 +01:00
mauryaland
87ce435aff Check if the word is in one of the regular lists specific to each POS (#2886) 2018-11-14 15:58:43 +01:00
Ines Montani
dfcc8f02af Fix image [ci skip]
Twitter URL doesn't work on live site
2018-11-14 01:01:33 +01:00
Ines Montani
1aa91e926f Minor formatting changes [ci skip] 2018-11-13 23:59:59 +01:00
Francisco Aranda
be99f1cac5 Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component

* chore: include spaCy contributor agreement
2018-11-13 23:54:46 +01:00
Daniel Hershcovich
d3d419ecc0 Allow input text of length up to max_length, inclusive (#2922) 2018-11-13 16:46:29 +01:00
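The boundary change named in commit d3d419ecc0 amounts to an inclusive rather than exclusive length check. A minimal sketch, assuming a hypothetical `check_length` helper (not spaCy's actual code or error message):

```python
def check_length(text, max_length):
    """Accept text whose length is at most max_length, inclusive."""
    # Using ">" rather than ">=" lets a text of exactly max_length pass.
    if len(text) > max_length:
        raise ValueError(
            "Text of length {} exceeds maximum of {}".format(len(text), max_length)
        )
    return text
```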
mikelibg
75e7d503b7 Removed space in docs + added contributor info (#2909)
* - removed unneeded space in documentation

* - added contributor info
2018-11-08 14:18:25 +01:00
Matthew Honnibal
5fc98ade04 Set version to 2.1.0a2 2018-11-08 09:56:56 +01:00
Ines Montani
11db4d2f27 Add script to validate universe json [ci skip] 2018-11-06 12:50:41 +01:00
Ines Montani
a9fda638a9 Add spacy-raspberry to universe (closes #2889) 2018-11-06 12:45:50 +01:00
Ines Montani
c235ddf44f Add spacy-js to universe [ci-skip] 2018-11-06 12:45:03 +01:00
Matthew Honnibal
09aa616182 Make pretraining script work without GPU 2018-11-04 17:09:52 +01:00
Matthew Honnibal
bc8cda818c Improve pretrain textcat example 2018-11-04 00:17:09 +00:00
Matthew Honnibal
3e7a96f99d Improve pretrain textcat example 2018-11-03 17:44:12 +00:00
Matthew Honnibal
c87c50af62 Rename new example 2018-11-03 13:09:46 +00:00
Matthew Honnibal
8e8ccc0f92 Work on pretraining script 2018-11-03 12:53:25 +00:00
Matthew Honnibal
ad44982f01 Fix dropout in tensorizer, update comment 2018-11-03 12:46:58 +00:00
Matthew Honnibal
0127f10ba3 Improve train tensorizer script 2018-11-03 10:54:20 +00:00
Matthew Honnibal
ba365ae1c9 Normalize gradient by number of words in tensorizer 2018-11-03 10:53:22 +00:00
Matthew Honnibal
dac3f1b280 Improve Tensorizer 2018-11-03 10:52:50 +00:00
Matthew Honnibal
baf7feae68 Add tensorizer training example 2018-11-02 23:30:06 +00:00
Matthew Honnibal
2527ba68e5 Fix tensorizer 2018-11-02 23:29:54 +00:00
Matthew Honnibal
db08b168a3 Set version to 2.0.17 2018-10-29 23:22:18 +01:00
Suraj Rajan
0bf14082a4 Added more constructs for dependency tree matcher (#2836) 2018-10-29 23:21:39 +01:00
Matthew Honnibal
e2ae25d6f5 Try setting older regex version, to align with conda 2018-10-29 13:39:00 +01:00
Matthew Honnibal
a2745d310e Revert "Update regex version"
This reverts commit 62358dd867.
2018-10-28 16:38:56 +01:00
Matthew Honnibal
62358dd867 Update regex version 2018-10-28 16:27:50 +01:00
Matthew Honnibal
d4fa9af56f Set version to 2.0.17.dev0 2018-10-28 16:15:26 +01:00
Matthew Honnibal
5a4aeb96b7 Add example showing a fix-up rule for space entities 2018-10-28 16:06:00 +01:00
Matthew Honnibal
b2e2bba8b0 Fix missing comma 2018-10-28 00:09:16 +02:00
Wannaphong Phatthiyaphaibun
2d2765fd8a Change PyThaiNLP Url (#2876) 2018-10-27 14:46:07 +02:00
Matthew Honnibal
817e1fc5e5 Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!

This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
2018-10-27 01:12:50 +02:00
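The class of bug described in commit 817e1fc5e5, a sentinel index of -1 used directly as an array index, can be sketched as follows. `ToyState`, `EMPTY_TOKEN`, and the indexing convention of `B` here are hypothetical simplifications, not spaCy's parser state. Note that Python silently wraps a -1 index to the last element, whereas the equivalent access in C or Cython reads out of bounds.

```python
EMPTY_TOKEN = {"text": "", "ent_iob": 0}  # hypothetical placeholder token


class ToyState:
    def __init__(self, tokens):
        self.tokens = tokens
        self.position = 0  # index where the toy buffer starts

    def B(self, i):
        """Index of the buffer token at offset i, or -1 if there is none."""
        index = self.position + i
        return index if index < len(self.tokens) else -1

    def safe_get(self, index):
        """Return the token at index, or an empty token for invalid indices."""
        if index < 0 or index >= len(self.tokens):
            return EMPTY_TOKEN
        return self.tokens[index]


state = ToyState([{"text": "Paris", "ent_iob": 3}])
index = state.B(1)             # no such token, so the sentinel is returned
unsafe = state.tokens[index]   # wrong token in Python; out of bounds in C
safe = state.safe_get(index)   # guarded access returns the empty token
```

Routing every sentinel-producing index through a guarded accessor is what keeps the missing-token case from turning into a stray memory read.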
Matthew Honnibal
9447739027 Merge branch 'master' of https://github.com/explosion/spaCy 2018-10-27 00:50:48 +02:00
Matthew Honnibal
ad068f51be Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!

This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
2018-10-27 00:46:30 +02:00
Grivaz
57f274b693 raise error when setting overlapping entities as doc.ents (#2880) 2018-10-26 23:29:16 +02:00
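The check named in commit 57f274b693 can be sketched as a sort followed by a comparison of neighbouring spans. This is an assumption about one reasonable way to detect overlaps, not the code from that commit; `check_no_overlap` and the end-exclusive `(start, end)` tuples are hypothetical.

```python
def check_no_overlap(spans):
    """Raise ValueError if any two end-exclusive (start, end) spans overlap."""
    ordered = sorted(spans)
    # After sorting by start, any overlap shows up between neighbours.
    for (start_a, end_a), (start_b, end_b) in zip(ordered, ordered[1:]):
        if start_b < end_a:
            raise ValueError(
                "Overlapping spans: ({}, {}) and ({}, {})".format(
                    start_a, end_a, start_b, end_b
                )
            )
    return ordered
```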
Bram Vanroy
071789467e Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements

## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)

### Types of change
Documentation

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-24 15:19:17 +02:00
Roman
5766d09a5b Redundant ')' in the Stop words' example (#2856)
2018-10-18 10:21:16 +02:00
Ines Montani
c6a320cad4 Update version [ci skip] 2018-10-15 16:42:35 +02:00
Ines Montani
48b1bc44d3 Update version to 2.0.16 2018-10-15 14:39:25 +02:00