Commit Graph

15158 Commits

Author SHA1 Message Date
adrianeboyd
d67b0f196a Fix initialization of token mappings in new align (#4640)
Initialize all values in `a2b` and `b2a` since `numpy.empty()` otherwise
result unspecified integers.
2019-11-13 21:22:18 +01:00
adrianeboyd
3ac4e8eb7a Fix minor issues in debug-data (#4636)
* Add error in debug-data if no dev docs are available (see #4575)

* Update debug-data for GoldCorpus / Example

* Ignore None label in misaligned NER data
2019-11-13 15:25:03 +01:00
f11r
877971860e Fix assert in sentencizer documentation. (#4639) 2019-11-13 15:24:14 +01:00
Sofie Van Landeghem
e48a09df4e Example class for training data (#4543)
* OrigAnnot class instead of gold.orig_annot list of zipped tuples

* from_orig to replace from_annot_tuples

* rename to RawAnnot

* some unit tests for GoldParse creation and internal format

* removing orig_annot and switching to lists instead of tuple

* rewriting tuples to use RawAnnot (+ debug statements, WIP)

* fix pop() changing the data

* small fixes

* pop-append fixes

* return RawAnnot for existing GoldParse to have uniform interface

* clean up imports

* fix merge_sents

* add unit test for 4402 with new structure (not working yet)

* introduce DocAnnot

* typo fixes

* add unit test for merge_sents

* rename from_orig to from_raw

* fixing unit tests

* fix nn parser

* read_annots to produce text, doc_annot pairs

* _make_golds fix

* rename golds_to_gold_annots

* small fixes

* fix encoding

* have golds_to_gold_annots use DocAnnot

* missed a spot

* merge_sents as function in DocAnnot

* allow specifying only part of the token-level annotations

* refactor with Example class + underlying dicts

* pipeline components to work with Example objects (wip)

* input checking

* fix yielding

* fix calls to update

* small fixes

* fix scorer unit test with new format

* fix kwargs order

* fixes for ud and conllu scripts

* fix reading data for conllu script

* add in proper errors (not fixed numbering yet to avoid merge conflicts)

* fixing few more small bugs

* fix EL script
2019-11-11 17:35:27 +01:00
Ines Montani
71f5a5daa1 Merge branch 'master' into spacy.io 2019-11-11 17:12:19 +01:00
Ines Montani
9d5ff177c4 Work around Markdown rendering issue surfaced in #4600 [ci skip] 2019-11-11 17:12:08 +01:00
adrianeboyd
91f89f9693 Fix realloc in retokenizer.split() (#4606)
Always realloc to a size larger than `doc.max_length` in
`retokenizer.split()` (or cymem will throw errors).
2019-11-11 16:26:46 +01:00
adrianeboyd
f415e9b7d1 Set extensions when write_conllu() is called in UD train script (#4618)
* Set extensions when write_conllu() is called

`run_eval.py` uses the `write_conllu()` function from `ud_train.py` by
itself, so it needs to set the token extensions if necessary.

* Switch from try to if
2019-11-11 16:25:03 +01:00
adrianeboyd
0b9a5f4074 Rework Chinese language initialization and tokenization (#4619)
* Rework Chinese language initialization

* Create a `ChineseTokenizer` class
  * Modify jieba post-processing to handle whitespace correctly
  * Modify non-jieba character tokenization to handle whitespace correctly

* Add a `create_tokenizer()` method to `ChineseDefaults`

* Load lexical attributes

* Update Chinese tag_map for UD v2

* Add very basic Chinese tests

* Test tokenization with and without jieba

* Test `like_num` attribute

* Fix try_jieba_import()

* Fix zh code formatting
2019-11-11 14:23:21 +01:00
adrianeboyd
4d85f67eee Minor updates to language example sentences (#4608)
* Add punctuation to Spanish example sentences

* Combine multilanguage examples for lang xx

* Add punctuation to nb examples
2019-11-07 22:34:58 +01:00
Priscilla de Abreu Lopes
39e79fcc86 Bugfix/dep matcher issue 4590 (#4601)
* add contributor agreement for prilopes

* add test for issue #4590

* fix on_match params for DependencyMacther (#4590)
2019-11-07 12:01:06 +01:00
Ines Montani
09cec3e41b
Replace function registries with catalogue (#4584)
* Replace functions registries with catalogue

* Update __init__.py

* Fix test

* Revert unrelated flag [ci skip]
2019-11-07 11:45:22 +01:00
adrianeboyd
0f8678c0b1 Fix DocBin.merge() example (#4599) 2019-11-07 11:26:48 +01:00
Ines Montani
b6534d7875 Merge branch 'master' into spacy.io 2019-11-06 23:07:25 +01:00
walterhenry
5563c42ef5 Fixed typo: Added space between "recognize" and "various" (#4600) 2019-11-06 23:06:36 +01:00
Ines Montani
e5c319a051 Merge branch 'master' into spacy.io 2019-11-05 18:30:46 +01:00
Ines Montani
828ef27a32 Add warnings about 3.8 (resolves #4593) [ci skip] 2019-11-05 18:30:11 +01:00
Ines Montani
fed53b1552 Update README.md 2019-11-05 18:26:47 +01:00
Ines Montani
83381018d3 Add load_from_docbin example [ci skip]
TODO: upload the file somewhere
2019-11-05 11:52:43 +01:00
Sofie Van Landeghem
4ec7623288 Fix conllu script (#4579)
* force extensions to avoid clash between example scripts

* fix arg order and default file encoding

* add example config for conllu script

* newline

* move extension definitions to main function

* few more encodings fixes
2019-11-04 20:31:26 +01:00
Matthew Honnibal
4e43c0ba93 Fix multiprocessing for as_tuples=True (#4582) 2019-11-04 20:29:03 +01:00
Ines Montani
d7a94edba6 Merge branch 'master' into spacy.io 2019-11-04 13:56:11 +01:00
Ines Montani
4b95587ad4 Update universe.json [ci skip] 2019-11-04 13:55:55 +01:00
Yash Patadia
0c396aeed4 add dframcy to universe.json (#4580) 2019-11-04 13:53:23 +01:00
Ines Montani
3ec231f7e1 Reorganise install_requires 2019-11-04 02:39:28 +01:00
Ines Montani
cf4ec88b38 Use latest wasabi 2019-11-04 02:38:45 +01:00
Ines Montani
d82630d7c1 Revert "Update azure-pipelines.yml"
This reverts commit ed1060cf59.
2019-11-03 17:48:54 +01:00
Ines Montani
ed1060cf59 Update azure-pipelines.yml 2019-11-03 17:48:26 +01:00
Ines Montani
6ec119d976 Add error in debug-data if no dev docs are available (see #4575) 2019-11-02 16:08:11 +01:00
adrianeboyd
56ad3a3988 Add LAS per dependency to Scorer (#4560) 2019-10-31 21:18:16 +01:00
Ines Montani
07ba9b4aa2 Merge branch 'master' into spacy.io 2019-10-31 17:30:42 +01:00
Matthew Honnibal
de98d66f87 Set version to v2.2.2 2019-10-31 15:53:31 +01:00
Matthw Honnibal
55f2241d72 Merge branch 'master' of https://github.com/explosion/spaCy 2019-10-31 15:37:52 +01:00
Ines Montani
df4c9ae3dc Fix formatting [ci skip] 2019-10-31 15:10:25 +01:00
Ines Montani
59358d9b71
Remove box-decoration-break from entities in displacy (#4564) 2019-10-31 15:09:43 +01:00
Matthw Honnibal
8b9954d1b7 Set version to v2.2.2.dev5 2019-10-31 15:06:19 +01:00
Ines Montani
2c107f02a4 Auto-format [ci skip] 2019-10-31 15:01:56 +01:00
Matthew Honnibal
e82306937e Put Tok2Vec refactor behind feature flag (#4563)
* Add back pre-2.2.2 tok2vec

* Add simple tok2vec tests

* Add simple tok2vec tests

* Reformat

* Fix CharacterEmbed in new tok2vec

* Fix legacy tok2vec

* Resolve circular imports

* Fix test for Python 2
2019-10-31 15:01:15 +01:00
Ines Montani
828108a57f Update README.md [ci skip] 2019-10-31 13:23:25 +01:00
Ines Montani
5e9849b60f Auto-format [ci skip] 2019-10-30 19:27:18 +01:00
Ines Montani
afe4a428f7
Fix pipeline analysis on remove pipe (#4557)
Validate *after* component is removed, not before
2019-10-30 19:04:17 +01:00
Matthew Honnibal
6b874ef096 Set version to v2.2.2.dev4 2019-10-30 17:36:20 +01:00
Ines Montani
85f2b04c45
Support span._. in component decorator attrs (#4555)
* Support span._. in component decorator attrs

* Adjust error [ci skip]
2019-10-30 17:19:36 +01:00
Ines Montani
86c3185f34 Update syntax iterators [ci skip] 2019-10-30 14:32:50 +01:00
Ines Montani
4e1de85e43 Update syntax iterators [ci skip] 2019-10-30 14:31:40 +01:00
Ines Montani
d8c2365b04 Update universe.json [ci skip] 2019-10-30 13:29:15 +01:00
Ines Montani
726c5dd306 Update universe.json [ci skip] 2019-10-30 13:29:00 +01:00
Neel Kamath
4cbc172cc6 Add "spaCy Server" to spaCy Universe (#4553)
* Add "spaCy Server" to spaCy Universe

* Accept the spaCy Contributor Agreement
2019-10-30 13:21:25 +01:00
Neel Kamath
6c036ab57d Add "spaCy Server" to spaCy Universe (#4553)
* Add "spaCy Server" to spaCy Universe

* Accept the spaCy Contributor Agreement
2019-10-30 13:20:46 +01:00
Nipun Sadvilkar
6316243941 project: pySBD - Python Sentence Boundary Disambiguation (#4455)
*   project: pySBD - Python Sentence Boundary Disambiguation

* 📝  Update links and description

* 🐛  Fix missing comma

* Update universe.json

pysbd as a spacy component through entrypoints

* 🚨  Fix universe.json

* 📝  Update code_example
2019-10-30 12:14:49 +01:00