mirror of https://github.com/explosion/spaCy.git synced 2025-01-30 03:04:07 +03:00
12100 Commits

Author SHA1 Message Date
adrianeboyd
f415e9b7d1 Set extensions when write_conllu() is called in UD train script ()
* Set extensions when write_conllu() is called

`run_eval.py` uses the `write_conllu()` function from `ud_train.py` by
itself, so it needs to set the token extensions if necessary.

* Switch from try to if
2019-11-11 16:25:03 +01:00
adrianeboyd
0b9a5f4074 Rework Chinese language initialization and tokenization ()
* Rework Chinese language initialization

* Create a `ChineseTokenizer` class
  * Modify jieba post-processing to handle whitespace correctly
  * Modify non-jieba character tokenization to handle whitespace correctly

* Add a `create_tokenizer()` method to `ChineseDefaults`

* Load lexical attributes

* Update Chinese tag_map for UD v2

* Add very basic Chinese tests

* Test tokenization with and without jieba

* Test `like_num` attribute

* Fix try_jieba_import()

* Fix zh code formatting
2019-11-11 14:23:21 +01:00
adrianeboyd
4d85f67eee Minor updates to language example sentences ()
* Add punctuation to Spanish example sentences

* Combine multilanguage examples for lang xx

* Add punctuation to nb examples
2019-11-07 22:34:58 +01:00
Priscilla de Abreu Lopes
39e79fcc86 Bugfix/dep matcher issue 4590 ()
* add contributor agreement for prilopes

* add test for issue 

* fix on_match params for DependencyMatcher ()
2019-11-07 12:01:06 +01:00
Ines Montani
09cec3e41b Replace function registries with catalogue ()
* Replace function registries with catalogue

* Update __init__.py

* Fix test

* Revert unrelated flag [ci skip]
2019-11-07 11:45:22 +01:00
adrianeboyd
0f8678c0b1 Fix DocBin.merge() example () 2019-11-07 11:26:48 +01:00
Ines Montani
b6534d7875 Merge branch 'master' into spacy.io 2019-11-06 23:07:25 +01:00
walterhenry
5563c42ef5 Fixed typo: Added space between "recognize" and "various" () 2019-11-06 23:06:36 +01:00
Ines Montani
e5c319a051 Merge branch 'master' into spacy.io 2019-11-05 18:30:46 +01:00
Ines Montani
828ef27a32 Add warnings about 3.8 (resolves ) [ci skip] 2019-11-05 18:30:11 +01:00
Ines Montani
fed53b1552 Update README.md 2019-11-05 18:26:47 +01:00
Ines Montani
83381018d3 Add load_from_docbin example [ci skip]
TODO: upload the file somewhere
2019-11-05 11:52:43 +01:00
Sofie Van Landeghem
4ec7623288 Fix conllu script ()
* force extensions to avoid clash between example scripts

* fix arg order and default file encoding

* add example config for conllu script

* newline

* move extension definitions to main function

* few more encodings fixes
2019-11-04 20:31:26 +01:00
Matthew Honnibal
4e43c0ba93 Fix multiprocessing for as_tuples=True () 2019-11-04 20:29:03 +01:00
Ines Montani
d7a94edba6 Merge branch 'master' into spacy.io 2019-11-04 13:56:11 +01:00
Ines Montani
4b95587ad4 Update universe.json [ci skip] 2019-11-04 13:55:55 +01:00
Yash Patadia
0c396aeed4 add dframcy to universe.json () 2019-11-04 13:53:23 +01:00
Ines Montani
3ec231f7e1 Reorganise install_requires 2019-11-04 02:39:28 +01:00
Ines Montani
cf4ec88b38 Use latest wasabi 2019-11-04 02:38:45 +01:00
Ines Montani
d82630d7c1 Revert "Update azure-pipelines.yml"
This reverts commit ed1060cf59.
2019-11-03 17:48:54 +01:00
Ines Montani
ed1060cf59 Update azure-pipelines.yml 2019-11-03 17:48:26 +01:00
Ines Montani
6ec119d976 Add error in debug-data if no dev docs are available (see ) 2019-11-02 16:08:11 +01:00
adrianeboyd
56ad3a3988 Add LAS per dependency to Scorer () 2019-10-31 21:18:16 +01:00
Ines Montani
07ba9b4aa2 Merge branch 'master' into spacy.io 2019-10-31 17:30:42 +01:00
Matthew Honnibal
de98d66f87 Set version to v2.2.2 2019-10-31 15:53:31 +01:00
Matthew Honnibal
55f2241d72 Merge branch 'master' of https://github.com/explosion/spaCy 2019-10-31 15:37:52 +01:00
Ines Montani
df4c9ae3dc Fix formatting [ci skip] 2019-10-31 15:10:25 +01:00
Ines Montani
59358d9b71 Remove box-decoration-break from entities in displacy () 2019-10-31 15:09:43 +01:00
Matthew Honnibal
8b9954d1b7 Set version to v2.2.2.dev5 2019-10-31 15:06:19 +01:00
Ines Montani
2c107f02a4 Auto-format [ci skip] 2019-10-31 15:01:56 +01:00
Matthew Honnibal
e82306937e Put Tok2Vec refactor behind feature flag ()
* Add back pre-2.2.2 tok2vec

* Add simple tok2vec tests

* Reformat

* Fix CharacterEmbed in new tok2vec

* Fix legacy tok2vec

* Resolve circular imports

* Fix test for Python 2
2019-10-31 15:01:15 +01:00
Ines Montani
828108a57f Update README.md [ci skip] 2019-10-31 13:23:25 +01:00
Ines Montani
5e9849b60f Auto-format [ci skip] 2019-10-30 19:27:18 +01:00
Ines Montani
afe4a428f7 Fix pipeline analysis on remove pipe ()
Validate *after* component is removed, not before
2019-10-30 19:04:17 +01:00
Matthew Honnibal
6b874ef096 Set version to v2.2.2.dev4 2019-10-30 17:36:20 +01:00
Ines Montani
85f2b04c45 Support span._. in component decorator attrs ()
* Support span._. in component decorator attrs

* Adjust error [ci skip]
2019-10-30 17:19:36 +01:00
Ines Montani
86c3185f34 Update syntax iterators [ci skip] 2019-10-30 14:32:50 +01:00
Ines Montani
4e1de85e43 Update syntax iterators [ci skip] 2019-10-30 14:31:40 +01:00
Ines Montani
d8c2365b04 Update universe.json [ci skip] 2019-10-30 13:29:15 +01:00
Ines Montani
726c5dd306 Update universe.json [ci skip] 2019-10-30 13:29:00 +01:00
Neel Kamath
4cbc172cc6 Add "spaCy Server" to spaCy Universe ()
* Add "spaCy Server" to spaCy Universe

* Accept the spaCy Contributor Agreement
2019-10-30 13:21:25 +01:00
Neel Kamath
6c036ab57d Add "spaCy Server" to spaCy Universe ()
* Add "spaCy Server" to spaCy Universe

* Accept the spaCy Contributor Agreement
2019-10-30 13:20:46 +01:00
Nipun Sadvilkar
6316243941 project: pySBD - Python Sentence Boundary Disambiguation ()
* project: pySBD - Python Sentence Boundary Disambiguation

* 📝  Update links and description

* 🐛  Fix missing comma

* Update universe.json

pysbd as a spacy component through entrypoints

* 🚨  Fix universe.json

* 📝  Update code_example
2019-10-30 12:14:49 +01:00
Nipun Sadvilkar
2a5e71232b project: pySBD - Python Sentence Boundary Disambiguation ()
* project: pySBD - Python Sentence Boundary Disambiguation

* 📝  Update links and description

* 🐛  Fix missing comma

* Update universe.json

pysbd as a spacy component through entrypoints

* 🚨  Fix universe.json

* 📝  Update code_example
2019-10-30 12:13:29 +01:00
Matthew Honnibal
c2f5f9f572 Set version to v2.2.2.dev3 2019-10-29 16:37:58 +01:00
Sofie Van Landeghem
33ba9ff464 set encodings explicitly to utf8 () 2019-10-29 13:16:55 +01:00
Matthew Honnibal
9e210fa7fd Fix tok2vec structure after model registry refactor ()
The model registry refactor of the Tok2Vec function broke loading models
trained with the previous function, because the model tree was slightly
different. Specifically, the new function built the embedding layer
with:

    concatenate(norm, prefix, suffix, shape)

In the previous implementation, I had used the operator overloading
shortcut:

    ( norm | prefix | suffix | shape )

Each `|` is a binary operation, so this actually gets mapped to nested
two-argument calls, giving something like:

    concatenate(norm, concatenate(prefix, concatenate(suffix, shape)))

This is a different tree, so the layers iterate in a different order
and the weights were loaded incorrectly.
2019-10-28 23:59:03 +01:00
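
The tree mismatch described in the commit message above can be reproduced with a small toy model. This is only a sketch: the `Layer` class, its `walk()` method and the `structure()` helper are hypothetical stand-ins written for illustration, not thinc's or spaCy's real classes. Note that Python evaluates `a | b | c` left to right, so in this toy the nesting leans left rather than right as in the "something like" form in the message; either way, the result is no longer the flat four-child node.

```python
class Layer:
    """Toy stand-in for a model layer (hypothetical, not thinc's class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __or__(self, other):
        # A binary operator can only ever combine two operands at a time,
        # so each `|` creates a new two-child concatenate node.
        return concatenate(self, other)

    def walk(self):
        """Yield nodes depth-first, i.e. the order weights would be visited."""
        yield self
        for child in self.children:
            yield from child.walk()


def concatenate(*layers):
    # Variadic call: a single node with every layer as a direct child.
    return Layer("concatenate", layers)


def structure(layer):
    """Render the tree as a nested call string."""
    if not layer.children:
        return layer.name
    inner = ", ".join(structure(child) for child in layer.children)
    return f"{layer.name}({inner})"


norm, prefix, suffix, shape = (
    Layer(name) for name in ("norm", "prefix", "suffix", "shape")
)

flat = concatenate(norm, prefix, suffix, shape)
nested = norm | prefix | suffix | shape

print(structure(flat))
print(structure(nested))
# The two trees contain a different number of nodes, so a loader that
# assigns saved weights by walking the tree would pair them up differently.
print(len(list(flat.walk())), len(list(nested.walk())))
```

In this toy, the flat call yields one concatenate node while the operator form yields three, which is the kind of structural difference that makes weights saved from one tree line up with the wrong layers in the other.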
Matthew Honnibal
bade60fe64 Set version to v2.2.2.dev1 2019-10-28 19:09:34 +01:00
Matthew Honnibal
b1505380ff Fix training with vectors 2019-10-28 18:06:38 +01:00
Matthew Honnibal
a927b3a21e Put new alignment behind flag for v2.2.2 release ()
* Xfail new tokenization test

* Put new alignment behind feature flag

* Move USE_ALIGN to top of the file [ci skip]


Co-authored-by: Ines Montani <ines@ines.io>
2019-10-28 16:12:32 +01:00