Commit Graph

8719 Commits

Author SHA1 Message Date
Alex Villarreal
bd35bf7f09 Guidance to handle binary files in git in Windows (#2526)
Adds guidance on what to do if users encounter the error described in [1634](https://github.com/explosion/spaCy/issues/1634), which probably only happens in Windows environments.
2018-07-09 18:31:37 +02:00
Duygu Altinok
00b9a58558 German lemmatizer additions (#2529)
* lemma of was-> was

* added new pairs issue @2486

* added article tests
2018-07-09 11:10:15 +02:00
Ole Henrik Skogstrøm
c21efea9bb Add sent property to token (#2521)
* Add sent property to token

* Refactored and cleaned up copy paste errors.
2018-07-06 15:54:15 +02:00
kleinay
a82c3153ad fix issue #2452 - displacy arrow direction is always forward (#2506) (closes #2452)
<!--- Provide a general summary of your changes in the title. -->
Referring #2452, fixing displacy arrow directions to match the input. 

## Description
The fix is simply replacing `direction is 'left'` with `direction == 'left'` to include the case `direction` is a `str` and not a `unicode`.

### Types of change
bug fix

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-04 14:12:08 +02:00
Bùi Trung Chí
9af46b4f1b Fix loading tokenizer with custom prefix search (#2495)
* Add contributor agreement

* Fix loading tokenizer with cutom prefix search
2018-07-04 12:56:07 +02:00
Matthew Honnibal
a85620a731 Note CoreNLP tokenizer correction on website 2018-07-02 11:35:31 +02:00
ines
06c6dc6fbc Update Juniper [ci skip] 2018-06-28 11:48:17 +02:00
Ole Henrik Skogstrøm
d16cb6bee6 Accept Span to displacy render (#2478) (closes #2477)
* Add Span to displacy render

* Fix span support, errors and add tests
2018-06-25 14:55:16 +02:00
Muhammad Irfan
f33c703066 Add Urdu Language Support (#2430)
* added Urdu language support.

* added Urdu language tests.

* modified conftest.py for Urdu language support.

* added spacy contributor agreement.
2018-06-22 11:14:03 +02:00
himkt
14d9007efd fix wrong indexing (#2416)
* fix wrong indexing

* add agreement
2018-06-19 10:20:57 +02:00
Aliia E
428bae66b5 Add Tatar Language Support (#2444)
* add Tatar lang support

* add Tatar letters

* add Tatar tests

* sign contributor agreement

* sign contributor agreement [x]

* remove comments from Language class

* remove all template comments
2018-06-19 10:17:53 +02:00
ines
f684d1806e Use pytest>=3.6.0
Hopefully prevents issue with Travis and timeout plugin compatibility
2018-06-18 18:39:37 +02:00
Cory Hurst
446f5ec41b Silent keyword in info function in init (#2459)
* Pass through "silent" kwarg to the wrapper in the spacy module init.
reference issue  #2196

* Pass through "silent" kwarg to the wrapper in the spacy module init.
reference issue  #2196

* contributor agreement
2018-06-18 12:24:21 +02:00
Nipun Sadvilkar
741ba80bd5 Train model command n_iteration 20 -> 30 (#2454)
In source code `train.py` default Number of iterations  is 30
2018-06-18 11:57:08 +02:00
ines
53a2bc8c8d Only scroll sidebar item into view if needed [ci skip] 2018-06-12 10:58:50 +02:00
ines
65713a6593 Increment versions [ci skip] 2018-06-12 10:49:50 +02:00
Ines Montani
968f6f0bda
💫 Document Cython API (#2433)
## Description

This PR adds the most relevant documentation of spaCy's Cython API.

(Todo for when we publish this: rewrite `/api/#section-cython` and `/api/#cython` to `/api/cython#conventions`.)

### Types of change
docs

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-06-11 17:47:46 +02:00
GolanLevy
72d7e80f94 adding a missing apostrophe (#2436) 2018-06-11 17:47:24 +02:00
ines
effb55d591 Adjust formatting [ci skip] 2018-06-11 00:29:13 +02:00
Nathan Breit
ba6d2cf393 Add EpiTator to Universe (#2429) 2018-06-11 00:24:13 +02:00
Daniel Ruf
d6d688914f chore: cache dependencies (#2418)
* chore: cache dependencies

* chore: add CLA
2018-06-11 00:22:41 +02:00
himkt
1a568f2e08 fix wrong documentations (#2423) 2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi
d66292f767 fix UD data file extensions (#2425)
* fix UD data files extension

* add contributor agreement for msklvsk
2018-06-08 14:26:11 +02:00
Nour Shalabi
a169b79092 Additions to Arabic stop words. (#2422)
* Additions to Arabic stop words.

* Create nourshalabi.md
2018-06-08 02:33:23 +02:00
Ines Montani
3f2e3cbd27
Add links to Reddit data (see #2401) 2018-05-31 16:22:43 +02:00
ines
b8ef9c1000 Fix model names in conftest (see #2379) 2018-05-30 14:10:20 +02:00
ines
0baaf836cf Update formatting [ci skip] 2018-05-30 13:32:49 +02:00
ines
3913e18201 Add self-attentive-parser to universe (see #59) 2018-05-30 13:31:28 +02:00
Maciej
c7d53348d7 Fix bug in CLI iob and ner converter (#2392) (fixes #2385)
* issue_2385 add tests for iob_to_biluo converter function

* issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter

* issue_2385 add test to fix b char bug

* add contributor agreement

* fill contributor agreement
2018-05-30 12:28:44 +02:00
ines
605c663a4c Fix HTML merger examples (see #2390) 2018-05-30 12:22:32 +02:00
ansgar-t
9732988951 escape html in displacy.render (#2378) (closes #2361)
## Description
Fix for issue #2361 :
replace &, <, >, " with &amp;amp; , &amp;lt; , &amp;gt; , &amp;quot; in before rendering svg

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
(As discussed in the comments to #2361)
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-05-28 18:36:41 +02:00
Samuel Pouyt
d85494bfae Added agrement (#2374) 2018-05-26 18:19:08 +02:00
Samuel Pouyt
5f988b8e9c Update _custom.jade (#2372)
It seems based on the doc and trying out that the `en` or `[lang]` is missing from the `spacy model-init`
2018-05-26 18:17:12 +02:00
ines
d84a830d79 Merge branch 'master' of https://github.com/explosion/spaCy 2018-05-26 17:57:05 +02:00
ines
fb923b31ea Fix bad HTML example (see #2376) and turn it into section on matcher + components
Avoid problems caused by merging while matching (e.g. index errors). Creating a Matcher component also better reflects the recommended best practices.
2018-05-26 17:57:02 +02:00
James Messinger
4515e96e90 Better formatting for spacy train CLI (#2357)
* Better formatting for `spacy train` CLI

Changed to use fixed-spaces rather than tabs to align table headers and data.

### Before:
```
Itn.    P.Loss  N.Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %
0       4618.857        2910.004        76.172  79.645  67.987  88.732  88.261  100.000 4436.9  6376.4
1       4671.972        3764.812        74.481  78.046  62.374  82.680  88.377  100.000 4672.2  6227.1
2       4742.756        3673.473        71.994  77.380  63.966  84.494  90.620  100.000 4298.0  5983.9
```

### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```

* Added contributor file
2018-05-25 13:08:45 +02:00
Shantam Raj
592834183a corrected spelling (#2359)
changed **interpretted** to **interpreted**
2018-05-24 13:29:52 +02:00
ines
8adb967e0c Fix from source quickstart instructions for Windows
See: https://stackoverflow.com/a/50478036/6400719
2018-05-24 12:42:16 +02:00
Aristo Rinjuang
432ede04af adding more words and rephrasing (#2351)
* adding more words and rephrasing

* adding a contributor

* tokenizer bugs solved
2018-05-24 11:40:57 +02:00
Jani Monoses
ec62cadf4c Updates to Romanian support (#2354)
* Add back Romanian in conftest

* Romanian lex_attr

* More tokenizer exceptions for Romanian

* Add tests for some Romanian tokenizer exceptions
2018-05-24 11:40:00 +02:00
Shantam Raj
1a4682dd0b Update _training.jade (#2340)
* Update _training.jade

Correcting grammar. Replacing "The" with "To".

* Create armsp.md

* Update armsp.md
2018-05-21 11:09:33 +02:00
cclauss
f7dcaa1f6b Simplify is_config() and normalize_string_keys() (#2305)
* Simplify is_config() and normalize_string_keys()

* Use __in__ to avoid the nested _ands_ and _ors_.
* Dict comprehension directly tracks with the doc string

* Keep more basic loop in normalize_string_keys

* Whitespace
2018-05-21 01:54:35 +02:00
ines
ff1082d8e4 Add version tag in CLI docs [ci skip] 2018-05-21 01:17:49 +02:00
Ines Montani
d4cc736b7c 💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346)
* Go back to using requests instead of urllib (closes #2320)

Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.

* Only download model if not installed (see #1456)

Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.

* Pass additional options to pip when installing model (resolves #1456)

Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example:

python -m spacy download en --user

* Add CLI option to enable installing model package dependencies

* Revert "Add CLI option to enable installing model package dependencies"

This reverts commit 9336ffe695.

* Update documentation
2018-05-20 20:26:56 +02:00
ines
b59e3b157f Don't require attrs argument in Doc.retokenize and allow both ints and unicode (resolves #2304) 2018-05-20 15:15:37 +02:00
ines
5768df4f09 Add SimpleFrozenDict util to use as default function argument 2018-05-20 15:13:37 +02:00
Matthew Honnibal
581d318971 Fix conftest 2018-05-15 00:54:45 +02:00
Tahar Zanouda
00417794d3 Add Arabic language (#2314)
* added support for Arabic lang

* added Arabic language support

* updated conftest
2018-05-15 00:27:19 +02:00
Jani Monoses
0e08e49e87 Lemmatizer ro (#2319)
* Add Romanian lemmatizer lookup table.

Adapted from http://www.lexiconista.com/datasets/lemmatization/
by replacing cedillas with commas (ș and ț).

The original dataset is licensed under the Open Database License.

* Fix one blatant issue in the Romanian lemmatizer

* Romanian examples file

* Add ro_tokenizer in conftest

* Add Romanian lemmatizer test
2018-05-12 15:20:04 +02:00
vishnumenon
ae3719ece5 Fix the code for FACILITIY entities (#2324)
* Fix the code for FACILITIY entities

As far as I can tell, the default models all use "FAC" rather than "FACILITY"

* Added my Contributor Agreement

* Rename vishnumenon to vishnumenon.md
2018-05-12 15:19:17 +02:00