2017-07-24 10:10:16 +03:00
|
|
|
# coding: utf8
|
|
|
|
from __future__ import unicode_literals
|
|
|
|
|
💫 Tidy up and auto-format .py files (#2983)
2018-11-30 19:03:03 +03:00
|
|
|
ID_BASE_EXCEPTIONS = set(
|
|
|
|
"""
|
2017-07-24 10:10:16 +03:00
|
|
|
aba-aba
|
|
|
|
abah-abah
|
2017-07-26 15:12:52 +03:00
|
|
|
abal-abal
|
|
|
|
abang-abang
|
2017-07-27 15:46:30 +03:00
|
|
|
abar-abar
|
2017-07-24 10:10:16 +03:00
|
|
|
abong-abong
|
|
|
|
abrit-abrit
|
2017-07-27 15:46:30 +03:00
|
|
|
abrit-abritan
|
2017-07-24 10:10:16 +03:00
|
|
|
abu-abu
|
2017-07-27 15:46:30 +03:00
|
|
|
abuh-abuhan
|
2017-07-24 10:10:16 +03:00
|
|
|
abuk-abuk
|
|
|
|
abun-abun
|
2017-07-26 15:12:52 +03:00
|
|
|
acak-acak
|
2017-07-27 15:46:30 +03:00
|
|
|
acak-acakan
|
2017-07-24 10:10:16 +03:00
|
|
|
acang-acang
|
2017-07-27 15:46:30 +03:00
|
|
|
acap-acap
|
2017-07-24 10:10:16 +03:00
|
|
|
aci-aci
|
2017-07-27 15:46:30 +03:00
|
|
|
aci-acian
|
|
|
|
aci-acinya
|
|
|
|
aco-acoan
|
2017-07-26 15:12:52 +03:00
|
|
|
ad-blocker
|
|
|
|
ad-interim
|
💫 Port master changes over to develop (#2979)
2018-11-29 18:30:29 +03:00
|
|
|
ada-ada
|
2017-07-27 15:46:30 +03:00
|
|
|
ada-adanya
|
|
|
|
ada-adanyakah
|
2017-07-24 10:10:16 +03:00
|
|
|
adang-adang
|
2017-07-27 15:46:30 +03:00
|
|
|
adap-adapan
|
2017-07-26 15:12:52 +03:00
|
|
|
add-on
|
|
|
|
add-ons
|
|
|
|
adik-adik
|
2017-07-27 15:46:30 +03:00
|
|
|
adik-beradik
|
|
|
|
aduk-adukan
|
2017-07-26 15:12:52 +03:00
|
|
|
after-sales
|
2017-07-27 15:46:30 +03:00
|
|
|
agak-agak
|
2017-07-24 10:10:16 +03:00
|
|
|
agak-agih
|
2017-07-26 15:12:52 +03:00
|
|
|
agama-agama
|
2017-07-24 10:10:16 +03:00
|
|
|
agar-agar
|
2017-07-26 15:12:52 +03:00
|
|
|
age-related
|
2017-07-24 10:10:16 +03:00
|
|
|
agut-agut
|
2017-07-26 15:12:52 +03:00
|
|
|
air-air
|
|
|
|
air-cooled
|
|
|
|
air-to-air
|
|
|
|
ajak-ajak
|
2017-07-24 10:10:16 +03:00
|
|
|
ajar-ajar
|
|
|
|
aji-aji
|
2017-07-27 15:46:30 +03:00
|
|
|
akal-akal
|
2017-07-26 15:12:52 +03:00
|
|
|
akal-akalan
|
2017-07-27 15:46:30 +03:00
|
|
|
akan-akan
|
2017-07-26 15:12:52 +03:00
|
|
|
akar-akar
|
2017-07-27 15:46:30 +03:00
|
|
|
akar-akaran
|
2017-07-26 15:12:52 +03:00
|
|
|
akhir-akhir
|
2017-07-27 15:46:30 +03:00
|
|
|
akhir-akhirnya
|
|
|
|
aki-aki
|
2017-07-26 15:12:52 +03:00
|
|
|
aksi-aksi
|
2017-07-27 15:46:30 +03:00
|
|
|
alah-mengalahi
|
2017-07-24 10:10:16 +03:00
|
|
|
alai-belai
|
|
|
|
alan-alan
|
|
|
|
alang-alang
|
2017-07-27 15:46:30 +03:00
|
|
|
alang-alangan
|
2017-07-24 10:10:16 +03:00
|
|
|
alap-alap
|
2017-07-26 15:12:52 +03:00
|
|
|
alat-alat
|
2017-07-24 10:10:16 +03:00
|
|
|
ali-ali
|
2017-07-27 15:46:30 +03:00
|
|
|
alif-alifan
|
2017-07-24 10:10:16 +03:00
|
|
|
alih-alih
|
2017-07-27 15:46:30 +03:00
|
|
|
aling-aling
|
|
|
|
aling-alingan
|
|
|
|
alip-alipan
|
2017-07-26 15:12:52 +03:00
|
|
|
all-electric
|
|
|
|
all-in-one
|
|
|
|
all-out
|
|
|
|
all-time
|
|
|
|
alon-alon
|
|
|
|
alt-right
|
|
|
|
alt-text
|
2017-07-24 10:10:16 +03:00
|
|
|
alu-alu
|
|
|
|
alu-aluan
|
|
|
|
alun-alun
|
|
|
|
alur-alur
|
2017-07-27 15:46:30 +03:00
|
|
|
alur-aluran
|
2017-07-26 15:12:52 +03:00
|
|
|
always-on
|
2017-07-27 15:46:30 +03:00
|
|
|
amai-amai
|
|
|
|
amatir-amatiran
|
2017-07-24 10:10:16 +03:00
|
|
|
ambah-ambah
|
|
|
|
ambai-ambai
|
2017-07-27 15:46:30 +03:00
|
|
|
ambil-mengambil
|
2017-07-24 10:10:16 +03:00
|
|
|
ambreng-ambrengan
|
2017-07-27 15:46:30 +03:00
|
|
|
ambring-ambringan
|
2017-07-24 10:10:16 +03:00
|
|
|
ambu-ambu
|
|
|
|
ambung-ambung
|
|
|
|
amin-amin
|
|
|
|
amit-amit
|
|
|
|
ampai-ampai
|
2017-07-27 15:46:30 +03:00
|
|
|
amprung-amprungan
|
|
|
|
amung-amung
|
2017-07-24 10:10:16 +03:00
|
|
|
anai-anai
|
2017-07-26 15:12:52 +03:00
|
|
|
anak-anak
|
2017-07-27 15:46:30 +03:00
|
|
|
anak-anakan
|
|
|
|
anak-beranak
|
2017-07-26 15:12:52 +03:00
|
|
|
anak-cucu
|
|
|
|
anak-istri
|
2017-07-24 10:10:16 +03:00
|
|
|
ancak-ancak
|
2017-07-26 15:12:52 +03:00
|
|
|
ancang-ancang
|
2017-07-24 10:10:16 +03:00
|
|
|
ancar-ancar
|
|
|
|
andang-andang
|
|
|
|
andeng-andeng
|
2017-07-27 15:46:30 +03:00
|
|
|
aneh-aneh
|
|
|
|
angan-angan
|
|
|
|
anggar-anggar
|
2017-07-26 15:12:52 +03:00
|
|
|
anggaran-red
|
|
|
|
anggota-anggota
|
2017-07-24 10:10:16 +03:00
|
|
|
anggung-anggip
|
2017-07-26 15:12:52 +03:00
|
|
|
angin-angin
|
|
|
|
angin-anginan
|
2017-07-24 10:10:16 +03:00
|
|
|
angkal-angkal
|
|
|
|
angkul-angkul
|
2017-07-27 15:46:30 +03:00
|
|
|
angkup-angkup
|
2017-07-24 10:10:16 +03:00
|
|
|
angkut-angkut
|
|
|
|
ani-ani
|
|
|
|
aning-aning
|
|
|
|
anjang-anjang
|
|
|
|
anjing-anjing
|
2017-07-27 15:46:30 +03:00
|
|
|
anjung-anjung
|
|
|
|
anjung-anjungan
|
2017-07-24 10:10:16 +03:00
|
|
|
antah-berantah
|
|
|
|
antar-antar
|
2017-07-27 15:46:30 +03:00
|
|
|
antar-mengantar
|
2017-07-26 15:12:52 +03:00
|
|
|
ante-mortem
|
|
|
|
antek-antek
|
2017-07-24 10:10:16 +03:00
|
|
|
anter-anter
|
2017-07-26 15:12:52 +03:00
|
|
|
antihuru-hara
|
2017-07-27 15:46:30 +03:00
|
|
|
anting-anting
|
2017-07-24 10:10:16 +03:00
|
|
|
antung-antung
|
2017-07-27 15:46:30 +03:00
|
|
|
anyam-menganyam
|
2017-07-24 10:10:16 +03:00
|
|
|
anyang-anyang
|
2017-07-26 15:12:52 +03:00
|
|
|
apa-apa
|
2017-07-27 15:46:30 +03:00
|
|
|
apa-apaan
|
2017-07-26 15:12:52 +03:00
|
|
|
apel-apel
|
2017-07-24 10:10:16 +03:00
|
|
|
api-api
|
|
|
|
apit-apit
|
2017-07-26 15:12:52 +03:00
|
|
|
aplikasi-aplikasi
|
|
|
|
apotek-apotek
|
2017-07-27 15:46:30 +03:00
|
|
|
aprit-apritan
|
2017-07-24 10:10:16 +03:00
|
|
|
apu-apu
|
2017-07-27 15:46:30 +03:00
|
|
|
apung-apung
|
|
|
|
arah-arah
|
2017-07-26 15:12:52 +03:00
|
|
|
arak-arak
|
|
|
|
arak-arakan
|
2017-07-27 15:46:30 +03:00
|
|
|
aram-aram
|
2017-07-26 15:12:52 +03:00
|
|
|
arek-arek
|
2017-07-24 10:10:16 +03:00
|
|
|
arem-arem
|
|
|
|
ari-ari
|
2017-07-26 15:12:52 +03:00
|
|
|
artis-artis
|
2017-07-27 15:46:30 +03:00
|
|
|
aru-aru
|
|
|
|
arung-arungan
|
|
|
|
asa-asaan
|
2017-07-26 15:12:52 +03:00
|
|
|
asal-asalan
|
|
|
|
asal-muasal
|
|
|
|
asal-usul
|
2017-07-27 15:46:30 +03:00
|
|
|
asam-asaman
|
2017-07-26 15:12:52 +03:00
|
|
|
asas-asas
|
|
|
|
aset-aset
|
|
|
|
asmaul-husna
|
|
|
|
asosiasi-asosiasi
|
2017-07-24 10:10:16 +03:00
|
|
|
asuh-asuh
|
2017-07-27 15:46:30 +03:00
|
|
|
asyik-asyiknya
|
|
|
|
atas-mengatasi
|
|
|
|
ati-ati
|
|
|
|
atung-atung
|
2017-07-26 15:12:52 +03:00
|
|
|
aturan-aturan
|
|
|
|
audio-video
|
|
|
|
audio-visual
|
|
|
|
auto-brightness
|
|
|
|
auto-complete
|
|
|
|
auto-focus
|
|
|
|
auto-play
|
|
|
|
auto-update
|
|
|
|
avant-garde
|
|
|
|
awan-awan
|
2017-07-27 15:46:30 +03:00
|
|
|
awan-berawan
|
2017-07-26 15:12:52 +03:00
|
|
|
awang-awang
|
2017-07-27 15:46:30 +03:00
|
|
|
awang-gemawang
|
2017-07-24 10:10:16 +03:00
|
|
|
awar-awar
|
|
|
|
awat-awat
|
|
|
|
awik-awik
|
2017-07-27 15:46:30 +03:00
|
|
|
awut-awutan
|
2017-07-26 15:12:52 +03:00
|
|
|
ayah-anak
|
2017-07-24 10:10:16 +03:00
|
|
|
ayak-ayak
|
2017-07-26 15:12:52 +03:00
|
|
|
ayam-ayam
|
2017-07-27 15:46:30 +03:00
|
|
|
ayam-ayaman
|
2017-07-24 10:10:16 +03:00
|
|
|
ayang-ayang
|
2017-07-26 15:12:52 +03:00
|
|
|
ayat-ayat
|
2017-07-27 15:46:30 +03:00
|
|
|
ayeng-ayengan
|
|
|
|
ayun-temayun
|
|
|
|
ayut-ayutan
|
2017-07-24 10:10:16 +03:00
|
|
|
ba-bi-bu
|
2017-07-26 15:12:52 +03:00
|
|
|
back-to-back
|
|
|
|
back-up
|
|
|
|
badan-badan
|
2017-07-27 15:46:30 +03:00
|
|
|
bade-bade
|
2017-07-26 15:12:52 +03:00
|
|
|
badut-badut
|
|
|
|
bagi-bagi
|
|
|
|
bahan-bahan
|
|
|
|
bahu-membahu
|
|
|
|
baik-baik
|
|
|
|
bail-out
|
2017-07-24 10:10:16 +03:00
|
|
|
bajang-bajang
|
|
|
|
baji-baji
|
|
|
|
balai-balai
|
2017-07-27 15:46:30 +03:00
|
|
|
balam-balam
|
|
|
|
balas-berbalas
|
|
|
|
balas-membalas
|
2017-07-26 15:12:52 +03:00
|
|
|
bale-bale
|
|
|
|
baling-baling
|
|
|
|
ball-playing
|
|
|
|
balon-balon
|
2017-07-27 15:46:30 +03:00
|
|
|
balut-balut
|
2017-07-26 15:12:52 +03:00
|
|
|
band-band
|
|
|
|
bandara-bandara
|
2017-07-27 15:46:30 +03:00
|
|
|
bangsa-bangsa
|
2017-07-24 10:10:16 +03:00
|
|
|
bangun-bangun
|
2017-07-26 15:12:52 +03:00
|
|
|
bangunan-bangunan
|
|
|
|
bank-bank
|
2017-07-27 15:46:30 +03:00
|
|
|
bantah-bantah
|
2017-07-26 15:12:52 +03:00
|
|
|
bantahan-bantahan
|
2017-07-27 15:46:30 +03:00
|
|
|
bantal-bantal
|
2017-07-26 15:12:52 +03:00
|
|
|
banyak-banyak
|
|
|
|
bapak-anak
|
|
|
|
bapak-bapak
|
|
|
|
bapak-ibu
|
|
|
|
bapak-ibunya
|
|
|
|
barang-barang
|
2017-07-24 10:10:16 +03:00
|
|
|
barat-barat
|
2017-07-26 15:12:52 +03:00
|
|
|
barat-daya
|
|
|
|
barat-laut
|
2017-07-24 10:10:16 +03:00
|
|
|
barau-barau
|
|
|
|
bare-bare
|
2017-07-26 15:12:52 +03:00
|
|
|
bareng-bareng
|
2017-07-24 10:10:16 +03:00
|
|
|
bari-bari
|
2017-07-27 15:46:30 +03:00
|
|
|
barik-barik
|
|
|
|
baris-berbaris
|
2017-07-26 15:12:52 +03:00
|
|
|
baru-baru
|
|
|
|
baru-batu
|
2017-07-24 10:10:16 +03:00
|
|
|
barung-barung
|
|
|
|
basa-basi
|
|
|
|
bata-bata
|
2017-07-26 15:12:52 +03:00
|
|
|
batalyon-batalyon
|
|
|
|
batang-batang
|
|
|
|
batas-batas
|
2017-07-24 10:10:16 +03:00
|
|
|
batir-batir
|
2017-07-26 15:12:52 +03:00
|
|
|
batu-batu
|
2017-07-27 15:46:30 +03:00
|
|
|
batuk-batuk
|
|
|
|
batung-batung
|
|
|
|
bau-bauan
|
2017-07-26 15:12:52 +03:00
|
|
|
bawa-bawa
|
2017-07-24 10:10:16 +03:00
|
|
|
bayan-bayan
|
2017-07-26 15:12:52 +03:00
|
|
|
bayang-bayang
|
|
|
|
bayi-bayi
|
|
|
|
bea-cukai
|
|
|
|
bedeng-bedeng
|
2017-07-27 15:46:30 +03:00
|
|
|
bedil-bedal
|
|
|
|
bedil-bedilan
|
2017-07-24 10:10:16 +03:00
|
|
|
begana-begini
|
2017-07-26 15:12:52 +03:00
|
|
|
bek-bek
|
2017-07-27 15:46:30 +03:00
|
|
|
bekal-bekalan
|
|
|
|
bekerdom-kerdom
|
|
|
|
bekertak-kertak
|
2017-07-26 15:12:52 +03:00
|
|
|
belang-belang
|
2017-07-24 10:10:16 +03:00
|
|
|
belat-belit
|
2017-07-26 15:12:52 +03:00
|
|
|
beliau-beliau
|
2017-07-24 10:10:16 +03:00
|
|
|
belu-belai
|
2017-07-27 15:46:30 +03:00
|
|
|
belum-belum
|
2017-07-26 15:12:52 +03:00
|
|
|
benar-benar
|
|
|
|
benda-benda
|
2017-07-24 10:10:16 +03:00
|
|
|
bengang-bengut
|
|
|
|
benggal-benggil
|
|
|
|
bengkal-bengkil
|
|
|
|
bengkang-bengkok
|
|
|
|
bengkang-bengkong
|
|
|
|
bengkang-bengkung
|
2017-07-26 15:12:52 +03:00
|
|
|
benteng-benteng
|
|
|
|
bentuk-bentuk
|
|
|
|
benua-benua
|
|
|
|
ber-selfie
|
2017-07-27 15:46:30 +03:00
|
|
|
berabad-abad
|
|
|
|
berabun-rabun
|
|
|
|
beracah-acah
|
|
|
|
berada-ada
|
|
|
|
beradik-berkakak
|
|
|
|
beragah-agah
|
|
|
|
beragak-agak
|
|
|
|
beragam-ragam
|
|
|
|
beraja-raja
|
|
|
|
berakit-rakit
|
|
|
|
beraku-akuan
|
|
|
|
beralu-aluan
|
|
|
|
beralun-alun
|
|
|
|
beramah-ramah
|
|
|
|
beramah-ramahan
|
|
|
|
beramah-tamah
|
2017-07-26 15:12:52 +03:00
|
|
|
beramai-ramai
|
2017-07-27 15:46:30 +03:00
|
|
|
berambai-ambai
|
|
|
|
berambal-ambalan
|
|
|
|
berambil-ambil
|
|
|
|
beramuk-amuk
|
|
|
|
beramuk-amukan
|
|
|
|
berandai-andai
|
|
|
|
berandai-randai
|
|
|
|
beraneh-aneh
|
2017-07-24 10:10:16 +03:00
|
|
|
berang-berang
|
2017-07-27 15:46:30 +03:00
|
|
|
berangan-angan
|
|
|
|
beranggap-anggapan
|
|
|
|
berangguk-angguk
|
|
|
|
berangin-angin
|
|
|
|
berangka-angka
|
|
|
|
berangka-angkaan
|
|
|
|
berangkai-rangkai
|
|
|
|
berangkap-rangkapan
|
|
|
|
berani-berani
|
|
|
|
beranja-anja
|
|
|
|
berantai-rantai
|
|
|
|
berapi-api
|
|
|
|
berapung-apung
|
|
|
|
berarak-arakan
|
2017-07-24 10:10:16 +03:00
|
|
|
beras-beras
|
2017-07-27 15:46:30 +03:00
|
|
|
berasak-asak
|
|
|
|
berasak-asakan
|
|
|
|
berasap-asap
|
|
|
|
berasing-asingan
|
|
|
|
beratus-ratus
|
|
|
|
berawa-rawa
|
|
|
|
berawas-awas
|
|
|
|
berayal-ayalan
|
|
|
|
berayun-ayun
|
|
|
|
berbagai-bagai
|
|
|
|
berbahas-bahasan
|
|
|
|
berbahasa-bahasa
|
|
|
|
berbaik-baikan
|
|
|
|
berbait-bait
|
|
|
|
berbala-bala
|
|
|
|
berbalas-balasan
|
|
|
|
berbalik-balik
|
|
|
|
berbalun-balun
|
|
|
|
berbanjar-banjar
|
|
|
|
berbantah-bantah
|
|
|
|
berbanyak-banyak
|
|
|
|
berbarik-barik
|
|
|
|
berbasa-basi
|
|
|
|
berbasah-basah
|
|
|
|
berbatu-batu
|
|
|
|
berbayang-bayang
|
|
|
|
berbecak-becak
|
2017-07-26 15:12:52 +03:00
|
|
|
berbeda-beda
|
2017-07-27 15:46:30 +03:00
|
|
|
berbedil-bedilan
|
|
|
|
berbega-bega
|
|
|
|
berbeka-beka
|
|
|
|
berbelah-belah
|
|
|
|
berbelakang-belakangan
|
|
|
|
berbelang-belang
|
|
|
|
berbelau-belauan
|
|
|
|
berbeli-beli
|
|
|
|
berbeli-belian
|
2017-07-26 15:12:52 +03:00
|
|
|
berbelit-belit
|
2017-07-27 15:46:30 +03:00
|
|
|
berbelok-belok
|
|
|
|
berbenang-benang
|
|
|
|
berbenar-benar
|
|
|
|
berbencah-bencah
|
|
|
|
berbencol-bencol
|
|
|
|
berbenggil-benggil
|
|
|
|
berbentol-bentol
|
|
|
|
berbentong-bentong
|
|
|
|
berberani-berani
|
|
|
|
berbesar-besar
|
|
|
|
berbidai-bidai
|
|
|
|
berbiduk-biduk
|
|
|
|
berbiku-biku
|
|
|
|
berbilik-bilik
|
|
|
|
berbinar-binar
|
|
|
|
berbincang-bincang
|
|
|
|
berbingkah-bingkah
|
|
|
|
berbintang-bintang
|
|
|
|
berbintik-bintik
|
|
|
|
berbintil-bintil
|
|
|
|
berbisik-bisik
|
|
|
|
berbolak-balik
|
|
|
|
berbolong-bolong
|
|
|
|
berbondong-bondong
|
|
|
|
berbongkah-bongkah
|
|
|
|
berbuai-buai
|
|
|
|
berbual-bual
|
|
|
|
berbudak-budak
|
|
|
|
berbukit-bukit
|
|
|
|
berbulan-bulan
|
|
|
|
berbunga-bunga
|
|
|
|
berbuntut-buntut
|
|
|
|
berbunuh-bunuhan
|
|
|
|
berburu-buru
|
|
|
|
berburuk-buruk
|
|
|
|
berbutir-butir
|
|
|
|
bercabang-cabang
|
|
|
|
bercaci-cacian
|
|
|
|
bercakap-cakap
|
|
|
|
bercakar-cakaran
|
|
|
|
bercamping-camping
|
|
|
|
bercantik-cantik
|
|
|
|
bercari-cari
|
|
|
|
bercari-carian
|
|
|
|
bercarik-carik
|
|
|
|
bercarut-carut
|
|
|
|
bercebar-cebur
|
|
|
|
bercepat-cepat
|
|
|
|
bercerai-berai
|
|
|
|
bercerai-cerai
|
|
|
|
bercetai-cetai
|
|
|
|
berciap-ciap
|
|
|
|
bercikun-cikun
|
|
|
|
bercinta-cintaan
|
2017-07-26 15:12:52 +03:00
|
|
|
bercita-cita
|
2017-07-27 15:46:30 +03:00
|
|
|
berciut-ciut
|
|
|
|
bercompang-camping
|
|
|
|
berconteng-conteng
|
|
|
|
bercoreng-coreng
|
|
|
|
bercoreng-moreng
|
|
|
|
bercuang-caing
|
|
|
|
bercuit-cuit
|
|
|
|
bercumbu-cumbu
|
|
|
|
bercumbu-cumbuan
|
|
|
|
bercura-bura
|
|
|
|
bercura-cura
|
|
|
|
berdada-dadaan
|
|
|
|
berdahulu-dahuluan
|
|
|
|
berdalam-dalam
|
|
|
|
berdalih-dalih
|
|
|
|
berdampung-dampung
|
|
|
|
berdebar-debar
|
|
|
|
berdecak-decak
|
|
|
|
berdecap-decap
|
|
|
|
berdecup-decup
|
|
|
|
berdecut-decut
|
|
|
|
berdedai-dedai
|
|
|
|
berdegap-degap
|
|
|
|
berdegar-degar
|
|
|
|
berdeham-deham
|
|
|
|
berdekah-dekah
|
|
|
|
berdekak-dekak
|
|
|
|
berdekap-dekapan
|
|
|
|
berdekat-dekat
|
|
|
|
berdelat-delat
|
|
|
|
berdembai-dembai
|
|
|
|
berdembun-dembun
|
|
|
|
berdempang-dempang
|
|
|
|
berdempet-dempet
|
|
|
|
berdencing-dencing
|
|
|
|
berdendam-dendaman
|
|
|
|
berdengkang-dengkang
|
|
|
|
berdengut-dengut
|
|
|
|
berdentang-dentang
|
|
|
|
berdentum-dentum
|
|
|
|
berdentung-dentung
|
|
|
|
berdenyar-denyar
|
|
|
|
berdenyut-denyut
|
|
|
|
berdepak-depak
|
|
|
|
berdepan-depan
|
|
|
|
berderai-derai
|
|
|
|
berderak-derak
|
|
|
|
berderam-deram
|
|
|
|
berderau-derau
|
|
|
|
berderik-derik
|
|
|
|
berdering-dering
|
|
|
|
berderung-derung
|
|
|
|
berderus-derus
|
|
|
|
berdesak-desakan
|
|
|
|
berdesik-desik
|
|
|
|
berdesing-desing
|
|
|
|
berdesus-desus
|
|
|
|
berdikit-dikit
|
|
|
|
berdingkit-dingkit
|
|
|
|
berdua-dua
|
|
|
|
berduri-duri
|
|
|
|
berduru-duru
|
|
|
|
berduyun-duyun
|
|
|
|
berebut-rebut
|
|
|
|
berebut-rebutan
|
|
|
|
beregang-regang
|
2017-07-24 10:10:16 +03:00
|
|
|
berek-berek
|
2017-07-27 15:46:30 +03:00
|
|
|
berembut-rembut
|
|
|
|
berempat-empat
|
|
|
|
berenak-enak
|
|
|
|
berencel-encel
|
2017-07-24 10:10:16 +03:00
|
|
|
bereng-bereng
|
2017-07-27 15:46:30 +03:00
|
|
|
berenggan-enggan
|
|
|
|
berenteng-renteng
|
|
|
|
beresa-esaan
|
|
|
|
beresah-resah
|
|
|
|
berfoya-foya
|
|
|
|
bergagah-gagahan
|
|
|
|
bergagap-gagap
|
|
|
|
bergagau-gagau
|
|
|
|
bergalur-galur
|
|
|
|
berganda-ganda
|
|
|
|
berganjur-ganjur
|
2017-07-26 15:12:52 +03:00
|
|
|
berganti-ganti
|
2017-07-27 15:46:30 +03:00
|
|
|
bergarah-garah
|
|
|
|
bergaruk-garuk
|
|
|
|
bergaya-gaya
|
|
|
|
bergegas-gegas
|
|
|
|
bergelang-gelang
|
|
|
|
bergelap-gelap
|
|
|
|
bergelas-gelasan
|
|
|
|
bergeleng-geleng
|
|
|
|
bergemal-gemal
|
|
|
|
bergembar-gembor
|
|
|
|
bergembut-gembut
|
|
|
|
bergepok-gepok
|
|
|
|
bergerek-gerek
|
|
|
|
bergesa-gesa
|
|
|
|
bergilir-gilir
|
|
|
|
bergolak-golak
|
|
|
|
bergolek-golek
|
|
|
|
bergolong-golong
|
|
|
|
bergores-gores
|
2017-07-26 15:12:52 +03:00
|
|
|
bergotong-royong
|
2017-07-27 15:46:30 +03:00
|
|
|
bergoyang-goyang
|
|
|
|
bergugus-gugus
|
|
|
|
bergulung-gulung
|
|
|
|
bergulut-gulut
|
|
|
|
bergumpal-gumpal
|
|
|
|
bergunduk-gunduk
|
|
|
|
bergunung-gunung
|
|
|
|
berhadap-hadapan
|
|
|
|
berhamun-hamun
|
|
|
|
berhandai-handai
|
|
|
|
berhanyut-hanyut
|
2017-07-26 15:12:52 +03:00
|
|
|
berhari-hari
|
|
|
|
berhati-hati
|
|
|
|
berhati-hatilah
|
2017-07-27 15:46:30 +03:00
|
|
|
berhektare-hektare
|
|
|
|
berhilau-hilau
|
|
|
|
berhormat-hormat
|
|
|
|
berhujan-hujan
|
|
|
|
berhura-hura
|
2017-07-24 10:10:16 +03:00
|
|
|
beri-beri
|
2017-07-27 15:46:30 +03:00
|
|
|
beri-memberi
|
|
|
|
beria-ia
|
|
|
|
beria-ria
|
|
|
|
beriak-riak
|
|
|
|
beriba-iba
|
|
|
|
beribu-ribu
|
|
|
|
berigi-rigi
|
|
|
|
berimpit-impit
|
|
|
|
berindap-indap
|
2017-07-24 10:10:16 +03:00
|
|
|
bering-bering
|
2017-07-27 15:46:30 +03:00
|
|
|
beringat-ingat
|
|
|
|
beringgit-ringgit
|
|
|
|
berintik-rintik
|
|
|
|
beriring-iring
|
|
|
|
beriring-iringan
|
2017-07-26 15:12:52 +03:00
|
|
|
berita-berita
|
2017-07-27 15:46:30 +03:00
|
|
|
berjabir-jabir
|
|
|
|
berjaga-jaga
|
|
|
|
berjagung-jagung
|
2017-07-26 15:12:52 +03:00
|
|
|
berjalan-jalan
|
2017-07-27 15:46:30 +03:00
|
|
|
berjalar-jalar
|
|
|
|
berjalin-jalin
|
|
|
|
berjalur-jalur
|
|
|
|
berjam-jam
|
|
|
|
berjari-jari
|
|
|
|
berjauh-jauhan
|
|
|
|
berjegal-jegalan
|
|
|
|
berjejal-jejal
|
|
|
|
berjela-jela
|
|
|
|
berjengkek-jengkek
|
|
|
|
berjenis-jenis
|
|
|
|
berjenjang-jenjang
|
|
|
|
berjilid-jilid
|
|
|
|
berjinak-jinak
|
|
|
|
berjingkat-jingkat
|
|
|
|
berjingkik-jingkik
|
|
|
|
berjingkrak-jingkrak
|
|
|
|
berjongkok-jongkok
|
|
|
|
berjubel-jubel
|
|
|
|
berjujut-jujutan
|
|
|
|
berjulai-julai
|
|
|
|
berjumbai-jumbai
|
|
|
|
berjumbul-jumbul
|
|
|
|
berjuntai-juntai
|
|
|
|
berjurai-jurai
|
|
|
|
berjurus-jurus
|
|
|
|
berjuta-juta
|
|
|
|
berka-li-kali
|
|
|
|
berkabu-kabu
|
|
|
|
berkaca-kaca
|
|
|
|
berkaing-kaing
|
|
|
|
berkait-kaitan
|
|
|
|
berkala-kala
|
2017-07-24 10:10:16 +03:00
|
|
|
berkali-kali
|
2017-07-27 15:46:30 +03:00
|
|
|
berkamit-kamit
|
|
|
|
berkanjar-kanjar
|
|
|
|
berkaok-kaok
|
|
|
|
berkarung-karung
|
|
|
|
berkasak-kusuk
|
|
|
|
berkasih-kasihan
|
|
|
|
berkata-kata
|
|
|
|
berkatak-katak
|
|
|
|
berkecai-kecai
|
|
|
|
berkecek-kecek
|
|
|
|
berkecil-kecil
|
|
|
|
berkecil-kecilan
|
|
|
|
berkedip-kedip
|
|
|
|
berkejang-kejang
|
|
|
|
berkejap-kejap
|
|
|
|
berkejar-kejaran
|
|
|
|
berkelar-kelar
|
|
|
|
berkelepai-kelepai
|
|
|
|
berkelip-kelip
|
|
|
|
berkelit-kelit
|
|
|
|
berkelok-kelok
|
|
|
|
berkelompok-kelompok
|
|
|
|
berkelun-kelun
|
|
|
|
berkembur-kembur
|
|
|
|
berkempul-kempul
|
|
|
|
berkena-kenaan
|
|
|
|
berkenal-kenalan
|
|
|
|
berkendur-kendur
|
|
|
|
berkeok-keok
|
|
|
|
berkepak-kepak
|
|
|
|
berkepal-kepal
|
|
|
|
berkeping-keping
|
|
|
|
berkepul-kepul
|
|
|
|
berkeras-kerasan
|
|
|
|
berkering-kering
|
|
|
|
berkeritik-keritik
|
|
|
|
berkeruit-keruit
|
|
|
|
berkerut-kerut
|
|
|
|
berketai-ketai
|
|
|
|
berketak-ketak
|
|
|
|
berketak-ketik
|
|
|
|
berketap-ketap
|
|
|
|
berketap-ketip
|
|
|
|
berketar-ketar
|
|
|
|
berketi-keti
|
|
|
|
berketil-ketil
|
|
|
|
berketuk-ketak
|
|
|
|
berketul-ketul
|
|
|
|
berkial-kial
|
|
|
|
berkian-kian
|
|
|
|
berkias-kias
|
|
|
|
berkias-kiasan
|
|
|
|
berkibar-kibar
|
|
|
|
berkilah-kilah
|
|
|
|
berkilap-kilap
|
|
|
|
berkilat-kilat
|
|
|
|
berkilau-kilauan
|
|
|
|
berkilo-kilo
|
|
|
|
berkimbang-kimbang
|
|
|
|
berkinja-kinja
|
|
|
|
berkipas-kipas
|
|
|
|
berkira-kira
|
|
|
|
berkirim-kiriman
|
|
|
|
berkisar-kisar
|
|
|
|
berkoak-koak
|
|
|
|
berkoar-koar
|
|
|
|
berkobar-kobar
|
|
|
|
berkobok-kobok
|
|
|
|
berkocak-kocak
|
|
|
|
berkodi-kodi
|
|
|
|
berkolek-kolek
|
|
|
|
berkomat-kamit
|
|
|
|
berkopah-kopah
|
|
|
|
berkoper-koper
|
|
|
|
berkotak-kotak
|
|
|
|
berkuat-kuat
|
|
|
|
berkuat-kuatan
|
|
|
|
berkumur-kumur
|
|
|
|
berkunang-kunang
|
|
|
|
berkunar-kunar
|
|
|
|
berkunjung-kunjungan
|
|
|
|
berkurik-kurik
|
|
|
|
berkurun-kurun
|
|
|
|
berkusau-kusau
|
|
|
|
berkusu-kusu
|
|
|
|
berkusut-kusut
|
|
|
|
berkuting-kuting
|
|
|
|
berkutu-kutuan
|
|
|
|
berlabun-labun
|
|
|
|
berlain-lainan
|
|
|
|
berlaju-laju
|
|
|
|
berlalai-lalai
|
|
|
|
berlama-lama
|
|
|
|
berlambai-lambai
|
|
|
|
berlambak-lambak
|
|
|
|
berlampang-lampang
|
|
|
|
berlanggar-langgar
|
|
|
|
berlapang-lapang
|
|
|
|
berlapis-lapis
|
|
|
|
berlapuk-lapuk
|
|
|
|
berlarah-larah
|
|
|
|
berlarat-larat
|
2017-07-26 15:12:52 +03:00
|
|
|
berlari-lari
|
2017-07-27 15:46:30 +03:00
|
|
|
berlari-larian
|
|
|
|
berlarih-larih
|
|
|
|
berlarik-larik
|
|
|
|
berlarut-larut
|
|
|
|
berlawak-lawak
|
|
|
|
berlayap-layapan
|
|
|
|
berlebih-lebih
|
|
|
|
berlebih-lebihan
|
|
|
|
berleha-leha
|
|
|
|
berlekas-lekas
|
|
|
|
berlekas-lekasan
|
|
|
|
berlekat-lekat
|
|
|
|
berlekuk-lekuk
|
|
|
|
berlempar-lemparan
|
|
|
|
berlena-lena
|
|
|
|
berlengah-lengah
|
|
|
|
berlenggak-lenggok
|
|
|
|
berlenggek-lenggek
|
|
|
|
berlenggok-lenggok
|
|
|
|
berleret-leret
|
|
|
|
berletih-letih
|
|
|
|
berliang-liuk
|
|
|
|
berlibat-libat
|
|
|
|
berligar-ligar
|
2017-07-26 15:12:52 +03:00
|
|
|
berliku-liku
|
2017-07-27 15:46:30 +03:00
|
|
|
berlikur-likur
|
|
|
|
berlimbak-limbak
|
|
|
|
berlimpah-limpah
|
|
|
|
berlimpap-limpap
|
|
|
|
berlimpit-limpit
|
|
|
|
berlinang-linang
|
|
|
|
berlindak-lindak
|
|
|
|
berlipat-lipat
|
|
|
|
berlomba-lomba
|
|
|
|
berlompok-lompok
|
|
|
|
berloncat-loncatan
|
|
|
|
berlopak-lopak
|
|
|
|
berlubang-lubang
|
|
|
|
berlusin-lusin
|
|
|
|
bermaaf-maafan
|
|
|
|
bermabuk-mabukan
|
2017-07-24 10:10:16 +03:00
|
|
|
bermacam-macam
|
2017-07-27 15:46:30 +03:00
|
|
|
bermain-main
|
|
|
|
bermalam-malam
|
|
|
|
bermalas-malas
|
2017-07-26 15:12:52 +03:00
|
|
|
bermalas-malasan
|
2017-07-27 15:46:30 +03:00
|
|
|
bermanik-manik
|
|
|
|
bermanis-manis
|
|
|
|
bermanja-manja
|
|
|
|
bermasak-masak
|
|
|
|
bermati-mati
|
|
|
|
bermegah-megah
|
|
|
|
bermemek-memek
|
|
|
|
bermenung-menung
|
|
|
|
bermesra-mesraan
|
|
|
|
bermewah-mewah
|
|
|
|
bermewah-mewahan
|
|
|
|
berminggu-minggu
|
|
|
|
berminta-minta
|
|
|
|
berminyak-minyak
|
|
|
|
bermuda-muda
|
|
|
|
bermudah-mudah
|
|
|
|
bermuka-muka
|
|
|
|
bermula-mula
|
|
|
|
bermuluk-muluk
|
|
|
|
bermulut-mulut
|
|
|
|
bernafsi-nafsi
|
|
|
|
bernaka-naka
|
|
|
|
bernala-nala
|
|
|
|
bernanti-nanti
|
|
|
|
berniat-niat
|
|
|
|
bernyala-nyala
|
|
|
|
berogak-ogak
|
|
|
|
beroleng-oleng
|
|
|
|
berolok-olok
|
|
|
|
beromong-omong
|
|
|
|
beroncet-roncet
|
|
|
|
beronggok-onggok
|
|
|
|
berorang-orang
|
|
|
|
beroyal-royal
|
|
|
|
berpada-pada
|
|
|
|
berpadu-padu
|
|
|
|
berpahit-pahit
|
|
|
|
berpair-pair
|
|
|
|
berpal-pal
|
|
|
|
berpalu-palu
|
|
|
|
berpalu-paluan
|
|
|
|
berpalun-palun
|
|
|
|
berpanas-panas
|
|
|
|
berpandai-pandai
|
|
|
|
berpandang-pandangan
|
|
|
|
berpangkat-pangkat
|
|
|
|
berpanjang-panjang
|
|
|
|
berpantun-pantun
|
|
|
|
berpasang-pasang
|
|
|
|
berpasang-pasangan
|
|
|
|
berpasuk-pasuk
|
|
|
|
berpayah-payah
|
|
|
|
berpeluh-peluh
|
|
|
|
berpeluk-pelukan
|
|
|
|
berpenat-penat
|
|
|
|
berpencar-pencar
|
|
|
|
berpendar-pendar
|
|
|
|
berpenggal-penggal
|
|
|
|
berperai-perai
|
|
|
|
berperang-perangan
|
|
|
|
berpesai-pesai
|
|
|
|
berpesta-pesta
|
|
|
|
berpesuk-pesuk
|
|
|
|
berpetak-petak
|
|
|
|
berpeti-peti
|
|
|
|
berpihak-pihak
|
|
|
|
berpijar-pijar
|
|
|
|
berpikir-pikir
|
|
|
|
berpikul-pikul
|
|
|
|
berpilih-pilih
|
|
|
|
berpilin-pilin
|
2017-07-26 15:12:52 +03:00
|
|
|
berpindah-pindah
|
2017-07-27 15:46:30 +03:00
|
|
|
berpintal-pintal
|
|
|
|
berpirau-pirau
|
|
|
|
berpisah-pisah
|
|
|
|
berpolah-polah
|
|
|
|
berpolok-polok
|
|
|
|
berpongah-pongah
|
|
|
|
berpontang-panting
|
|
|
|
berporah-porah
|
|
|
|
berpotong-potong
|
|
|
|
berpotong-potongan
|
|
|
|
berpuak-puak
|
|
|
|
berpual-pual
|
|
|
|
berpugak-pugak
|
|
|
|
berpuing-puing
|
|
|
|
berpukas-pukas
|
|
|
|
berpuluh-puluh
|
|
|
|
berpulun-pulun
|
|
|
|
berpuntal-puntal
|
2017-07-26 15:12:52 +03:00
|
|
|
berpura-pura
|
2017-07-27 15:46:30 +03:00
|
|
|
berpusar-pusar
|
|
|
|
berpusing-pusing
|
|
|
|
berpusu-pusu
|
|
|
|
berputar-putar
|
|
|
|
berrumpun-rumpun
|
|
|
|
bersaf-saf
|
|
|
|
bersahut-sahutan
|
|
|
|
bersakit-sakit
|
|
|
|
bersalah-salahan
|
|
|
|
bersalam-salaman
|
|
|
|
bersalin-salin
|
|
|
|
bersalip-salipan
|
2017-07-24 10:10:16 +03:00
|
|
|
bersama-sama
|
2017-07-27 15:46:30 +03:00
|
|
|
bersambar-sambaran
|
|
|
|
bersambut-sambutan
|
|
|
|
bersampan-sampan
|
|
|
|
bersantai-santai
|
|
|
|
bersapa-sapaan
|
|
|
|
bersarang-sarang
|
|
|
|
bersedan-sedan
|
|
|
|
bersedia-sedia
|
|
|
|
bersedu-sedu
|
|
|
|
bersejuk-sejuk
|
|
|
|
bersekat-sekat
|
|
|
|
berselang-selang
|
|
|
|
berselang-seli
|
|
|
|
berselang-seling
|
|
|
|
berselang-tenggang
|
|
|
|
berselit-selit
|
|
|
|
berseluk-beluk
|
|
|
|
bersembunyi-sembunyi
|
|
|
|
bersembunyi-sembunyian
|
|
|
|
bersembur-semburan
|
|
|
|
bersempit-sempit
|
2017-07-26 15:12:52 +03:00
|
|
|
bersenang-senang
|
2017-07-27 15:46:30 +03:00
|
|
|
bersenang-senangkan
|
|
|
|
bersenda-senda
|
|
|
|
bersendi-sendi
|
|
|
|
bersenggang-senggang
|
|
|
|
bersenggau-senggau
|
|
|
|
bersepah-sepah
|
|
|
|
bersepak-sepakan
|
|
|
|
bersepi-sepi
|
|
|
|
berserak-serak
|
|
|
|
berseri-seri
|
|
|
|
berseru-seru
|
|
|
|
bersesak-sesak
|
|
|
|
bersetai-setai
|
|
|
|
bersia-sia
|
2017-07-24 10:10:16 +03:00
|
|
|
bersiap-siap
|
2017-07-27 15:46:30 +03:00
|
|
|
bersiar-siar
|
2017-07-26 15:12:52 +03:00
|
|
|
bersih-bersih
|
2017-07-27 15:46:30 +03:00
|
|
|
bersikut-sikutan
|
|
|
|
bersilir-silir
|
|
|
|
bersimbur-simburan
|
|
|
|
bersinau-sinau
|
|
|
|
bersopan-sopan
|
|
|
|
bersorak-sorai
|
|
|
|
bersuap-suapan
|
|
|
|
bersudah-sudah
|
|
|
|
bersuka-suka
|
|
|
|
bersuka-sukaan
|
|
|
|
bersuku-suku
|
|
|
|
bersulang-sulang
|
|
|
|
bersumpah-sumpahan
|
|
|
|
bersungguh-sungguh
|
|
|
|
bersungut-sungut
|
|
|
|
bersunyi-sunyi
|
|
|
|
bersuruk-surukan
|
|
|
|
bersusah-susah
|
|
|
|
bersusuk-susuk
|
|
|
|
bersusuk-susukan
|
|
|
|
bersutan-sutan
|
|
|
|
bertabur-tabur
|
|
|
|
bertahan-tahan
|
|
|
|
bertahu-tahu
|
2017-07-26 15:12:52 +03:00
|
|
|
bertahun-tahun
|
2017-07-27 15:46:30 +03:00
|
|
|
bertajuk-tajuk
|
|
|
|
bertakik-takik
|
|
|
|
bertala-tala
|
|
|
|
bertalah-talah
|
|
|
|
bertali-tali
|
|
|
|
bertalu-talu
|
|
|
|
bertalun-talun
|
|
|
|
bertambah-tambah
|
|
|
|
bertanda-tandaan
|
|
|
|
bertangis-tangisan
|
|
|
|
bertangkil-tangkil
|
2017-07-24 10:10:16 +03:00
|
|
|
bertanya-tanya
|
2017-07-27 15:46:30 +03:00
|
|
|
bertarik-tarikan
|
|
|
|
bertatai-tatai
|
|
|
|
bertatap-tatapan
|
|
|
|
bertatih-tatih
|
|
|
|
bertawan-tawan
|
|
|
|
bertawar-tawaran
|
|
|
|
bertebu-tebu
|
|
|
|
bertebu-tebukan
|
|
|
|
berteguh-teguh
|
|
|
|
berteguh-teguhan
|
|
|
|
berteka-teki
|
|
|
|
bertelang-telang
|
|
|
|
bertelau-telau
|
|
|
|
bertele-tele
|
|
|
|
bertembuk-tembuk
|
|
|
|
bertempat-tempat
|
|
|
|
bertempuh-tempuh
|
|
|
|
bertenang-tenang
|
|
|
|
bertenggang-tenggangan
|
|
|
|
bertentu-tentu
|
|
|
|
bertepek-tepek
|
|
|
|
berterang-terang
|
|
|
|
berterang-terangan
|
2017-07-26 15:12:52 +03:00
|
|
|
berteriak-teriak
|
2017-07-27 15:46:30 +03:00
|
|
|
bertikam-tikaman
|
|
|
|
bertimbal-timbalan
|
|
|
|
bertimbun-timbun
|
|
|
|
bertimpa-timpa
|
|
|
|
bertimpas-timpas
|
|
|
|
bertingkah-tingkah
|
|
|
|
bertingkat-tingkat
|
|
|
|
bertinjau-tinjauan
|
|
|
|
bertiras-tiras
|
|
|
|
bertitar-titar
|
|
|
|
bertitik-titik
|
|
|
|
bertoboh-toboh
|
|
|
|
bertolak-tolak
|
|
|
|
bertolak-tolakan
|
|
|
|
bertolong-tolongan
|
|
|
|
bertonjol-tonjol
|
|
|
|
bertruk-truk
|
|
|
|
bertua-tua
|
|
|
|
bertua-tuaan
|
|
|
|
bertual-tual
|
2017-07-26 15:12:52 +03:00
|
|
|
bertubi-tubi
|
2017-07-27 15:46:30 +03:00
|
|
|
bertukar-tukar
|
|
|
|
bertukar-tukaran
|
|
|
|
bertukas-tukas
|
|
|
|
bertumpak-tumpak
|
|
|
|
bertumpang-tindih
|
|
|
|
bertumpuk-tumpuk
|
|
|
|
bertunda-tunda
|
|
|
|
bertunjuk-tunjukan
|
|
|
|
bertura-tura
|
2017-07-24 10:10:16 +03:00
|
|
|
berturut-turut
|
2017-07-27 15:46:30 +03:00
|
|
|
bertutur-tutur
|
|
|
|
beruas-ruas
|
2017-07-26 15:12:52 +03:00
|
|
|
berubah-ubah
|
2017-07-27 15:46:30 +03:00
|
|
|
berulang-alik
|
2017-07-26 15:12:52 +03:00
|
|
|
berulang-ulang
|
2017-07-27 15:46:30 +03:00
|
|
|
berumbai-rumbai
|
|
|
|
berundak-undak
|
|
|
|
berundan-undan
|
|
|
|
berundung-undung
|
|
|
|
berunggas-runggas
|
|
|
|
berunggun-unggun
|
|
|
|
berunggut-unggut
|
|
|
|
berungkur-ungkuran
|
|
|
|
beruntai-untai
|
|
|
|
beruntun-runtun
|
|
|
|
beruntung-untung
|
|
|
|
berunyai-unyai
|
|
|
|
berupa-rupa
|
|
|
|
berura-ura
|
|
|
|
beruris-uris
|
|
|
|
berurut-urutan
|
|
|
|
berwarna-warna
|
|
|
|
berwarna-warni
|
|
|
|
berwindu-windu
|
|
|
|
berwiru-wiru
|
|
|
|
beryang-yang
|
2017-07-26 15:12:52 +03:00
|
|
|
besar-besar
|
|
|
|
besar-besaran
|
2017-07-24 10:10:16 +03:00
|
|
|
betak-betak
|
|
|
|
beti-beti
|
2017-07-27 15:46:30 +03:00
|
|
|
betik-betik
|
|
|
|
betul-betul
|
|
|
|
biang-biang
|
2017-07-24 10:10:16 +03:00
|
|
|
biar-biar
|
2017-07-26 15:12:52 +03:00
|
|
|
biaya-biaya
|
2017-07-27 15:46:30 +03:00
|
|
|
bicu-bicu
|
2017-07-26 15:12:52 +03:00
|
|
|
bidadari-bidadari
|
|
|
|
bidang-bidang
|
|
|
|
bijak-bijaklah
|
|
|
|
biji-bijian
|
2017-07-27 15:46:30 +03:00
|
|
|
bila-bila
|
2017-07-24 10:10:16 +03:00
|
|
|
bilang-bilang
|
2017-07-26 15:12:52 +03:00
|
|
|
bincang-bincang
|
2017-07-24 10:10:16 +03:00
|
|
|
bincang-bincut
|
2017-07-27 15:46:30 +03:00
|
|
|
bingkah-bingkah
|
|
|
|
bini-binian
|
2017-07-26 15:12:52 +03:00
|
|
|
bintang-bintang
|
|
|
|
bintik-bintik
|
|
|
|
bio-oil
|
2017-07-24 10:10:16 +03:00
|
|
|
biri-biri
|
2017-07-26 15:12:52 +03:00
|
|
|
biru-biru
|
|
|
|
biru-hitam
|
|
|
|
biru-kuning
|
|
|
|
bisik-bisik
|
2017-07-27 15:46:30 +03:00
|
|
|
biti-biti
|
2017-07-26 15:12:52 +03:00
|
|
|
blak-blakan
|
|
|
|
blok-blok
|
|
|
|
bocah-bocah
|
|
|
|
bohong-bohong
|
2017-07-27 15:46:30 +03:00
|
|
|
bohong-bohongan
|
2017-07-26 15:12:52 +03:00
|
|
|
bola-bola
|
2017-07-24 10:10:16 +03:00
|
|
|
bolak-balik
|
|
|
|
bolang-baling
|
2017-07-26 15:12:52 +03:00
|
|
|
boleh-boleh
|
|
|
|
bom-bom
|
|
|
|
bomber-bomber
|
|
|
|
bonek-bonek
|
2017-07-24 10:10:16 +03:00
|
|
|
bongkar-bangkir
|
2017-07-27 15:46:30 +03:00
|
|
|
bongkar-membongkar
|
2017-07-26 15:12:52 +03:00
|
|
|
bongkar-pasang
|
2017-07-24 10:10:16 +03:00
|
|
|
boro-boro
|
2017-07-26 15:12:52 +03:00
|
|
|
bos-bos
|
|
|
|
bottom-up
|
|
|
|
box-to-box
|
2017-07-24 10:10:16 +03:00
|
|
|
boyo-boyo
|
2017-07-26 15:12:52 +03:00
|
|
|
buah-buahan
|
|
|
|
buang-buang
|
2017-07-27 15:46:30 +03:00
|
|
|
buat-buatan
|
2017-07-24 10:10:16 +03:00
|
|
|
buaya-buaya
|
2017-07-27 15:46:30 +03:00
|
|
|
bubun-bubun
|
2017-07-24 10:10:16 +03:00
|
|
|
bugi-bugi
|
2017-07-26 15:12:52 +03:00
|
|
|
build-up
|
|
|
|
built-in
|
|
|
|
built-up
|
|
|
|
buka-buka
|
|
|
|
buka-bukaan
|
|
|
|
buka-tutup
|
2017-07-27 15:46:30 +03:00
|
|
|
bukan-bukan
|
2017-07-26 15:12:52 +03:00
|
|
|
bukti-bukti
|
|
|
|
buku-buku
|
|
|
|
bulan-bulan
|
2017-07-27 15:46:30 +03:00
|
|
|
bulan-bulanan
|
2017-07-24 10:10:16 +03:00
|
|
|
bulang-baling
|
2017-07-27 15:46:30 +03:00
|
|
|
bulang-bulang
|
2017-07-26 15:12:52 +03:00
|
|
|
bulat-bulat
|
2017-07-24 10:10:16 +03:00
|
|
|
buli-buli
|
|
|
|
bulu-bulu
|
2017-07-27 15:46:30 +03:00
|
|
|
buluh-buluh
|
2017-07-24 10:10:16 +03:00
|
|
|
bulus-bulus
|
2017-07-26 15:12:52 +03:00
|
|
|
bunga-bunga
|
2017-07-27 15:46:30 +03:00
|
|
|
bunga-bungaan
|
|
|
|
bunuh-membunuh
|
|
|
|
bunyi-bunyian
|
2017-07-26 15:12:52 +03:00
|
|
|
bupati-bupati
|
|
|
|
bupati-wakil
|
|
|
|
buru-buru
|
|
|
|
burung-burung
|
2017-07-27 15:46:30 +03:00
|
|
|
burung-burungan
|
2017-07-26 15:12:52 +03:00
|
|
|
bus-bus
|
|
|
|
business-to-business
|
2017-07-27 15:46:30 +03:00
|
|
|
busur-busur
|
2017-07-26 15:12:52 +03:00
|
|
|
butir-butir
|
|
|
|
by-pass
|
|
|
|
bye-bye
|
|
|
|
cabang-cabang
|
2017-07-27 15:46:30 +03:00
|
|
|
cabik-cabik
|
|
|
|
cabik-mencabik
|
2017-07-26 15:12:52 +03:00
|
|
|
cabup-cawabup
|
2017-07-24 10:10:16 +03:00
|
|
|
caci-maki
|
2017-07-26 15:12:52 +03:00
|
|
|
cagub-cawagub
|
2017-07-27 15:46:30 +03:00
|
|
|
caing-caing
|
|
|
|
cakar-mencakar
|
|
|
|
cakup-mencakup
|
|
|
|
calak-calak
|
|
|
|
calar-balar
|
2017-07-26 15:12:52 +03:00
|
|
|
caleg-caleg
|
|
|
|
calo-calo
|
|
|
|
calon-calon
|
2017-07-27 15:46:30 +03:00
|
|
|
campang-camping
|
2017-07-26 15:12:52 +03:00
|
|
|
campur-campur
|
|
|
|
capres-cawapres
|
|
|
|
cara-cara
|
|
|
|
cari-cari
|
2017-07-27 15:46:30 +03:00
|
|
|
cari-carian
|
2017-07-26 15:12:52 +03:00
|
|
|
carut-marut
|
|
|
|
catch-up
|
|
|
|
cawali-cawawali
|
2017-07-24 10:10:16 +03:00
|
|
|
cawe-cawe
|
2017-07-27 15:46:30 +03:00
|
|
|
cawi-cawi
|
|
|
|
cebar-cebur
|
2017-07-26 15:12:52 +03:00
|
|
|
celah-celah
|
2017-07-24 10:10:16 +03:00
|
|
|
celam-celum
|
|
|
|
celangak-celinguk
|
|
|
|
celas-celus
|
|
|
|
celedang-celedok
|
|
|
|
celengkak-celengkok
|
2017-07-27 15:46:30 +03:00
|
|
|
celingak-celinguk
|
|
|
|
celung-celung
|
|
|
|
cemas-cemas
|
2017-07-24 10:10:16 +03:00
|
|
|
cenal-cenil
|
|
|
|
cengar-cengir
|
|
|
|
cengir-cengir
|
|
|
|
cengis-cengis
|
2017-07-27 15:46:30 +03:00
|
|
|
cengking-mengking
|
|
|
|
centang-perenang
|
2017-07-26 15:12:52 +03:00
|
|
|
cepat-cepat
|
2017-07-24 10:10:16 +03:00
|
|
|
ceplas-ceplos
|
|
|
|
cerai-berai
|
2017-07-26 15:12:52 +03:00
|
|
|
cerita-cerita
|
2017-07-27 15:46:30 +03:00
|
|
|
ceruk-menceruk
|
|
|
|
ceruk-meruk
|
2017-07-26 15:12:52 +03:00
|
|
|
cetak-biru
|
2017-07-27 15:46:30 +03:00
|
|
|
cetak-mencetak
|
|
|
|
cetar-ceter
|
2017-07-26 15:12:52 +03:00
|
|
|
check-in
|
|
|
|
check-ins
|
|
|
|
check-up
|
|
|
|
chit-chat
|
|
|
|
choki-choki
|
2017-07-27 15:46:30 +03:00
|
|
|
cingak-cinguk
|
2017-07-26 15:12:52 +03:00
|
|
|
cipika-cipiki
|
|
|
|
ciri-ciri
|
|
|
|
ciri-cirinya
|
2017-07-27 15:46:30 +03:00
|
|
|
cirit-birit
|
2017-07-26 15:12:52 +03:00
|
|
|
cita-cita
|
|
|
|
cita-citaku
|
|
|
|
close-up
|
|
|
|
closed-circuit
|
|
|
|
coba-coba
|
2017-07-24 10:10:16 +03:00
|
|
|
cobak-cabik
|
|
|
|
cobar-cabir
|
|
|
|
cola-cala
|
|
|
|
colang-caling
|
|
|
|
comat-comot
|
2017-07-27 15:46:30 +03:00
|
|
|
comot-comot
|
2017-07-24 10:10:16 +03:00
|
|
|
compang-camping
|
2017-07-26 15:12:52 +03:00
|
|
|
computer-aided
|
|
|
|
computer-generated
|
2017-07-27 15:46:30 +03:00
|
|
|
condong-mondong
|
|
|
|
congak-cangit
|
2017-07-24 10:10:16 +03:00
|
|
|
conggah-canggih
|
|
|
|
congkah-cangkih
|
|
|
|
congkah-mangkih
|
|
|
|
copak-capik
|
2017-07-26 15:12:52 +03:00
|
|
|
copy-paste
|
2017-07-27 15:46:30 +03:00
|
|
|
corak-carik
|
2017-07-26 15:12:52 +03:00
|
|
|
corat-coret
|
2017-07-27 15:46:30 +03:00
|
|
|
coreng-moreng
|
|
|
|
coret-coret
|
2017-07-24 10:10:16 +03:00
|
|
|
crat-crit
|
2017-07-26 15:12:52 +03:00
|
|
|
cross-border
|
|
|
|
cross-dressing
|
|
|
|
crypto-ransomware
|
2017-07-27 15:46:30 +03:00
|
|
|
cuang-caing
|
2017-07-26 15:12:52 +03:00
|
|
|
cublak-cublak
|
2017-07-27 15:46:30 +03:00
|
|
|
cubung-cubung
|
|
|
|
culik-culik
|
2017-07-26 15:12:52 +03:00
|
|
|
cuma-cuma
|
2017-07-24 10:10:16 +03:00
|
|
|
cumi-cumi
|
2017-07-27 15:46:30 +03:00
|
|
|
cungap-cangip
|
|
|
|
cupu-cupu
|
2017-07-26 15:12:52 +03:00
|
|
|
dabu-dabu
|
|
|
|
daerah-daerah
|
2017-07-24 10:10:16 +03:00
|
|
|
dag-dag
|
|
|
|
dag-dig-dug
|
2017-07-27 15:46:30 +03:00
|
|
|
daging-dagingan
|
|
|
|
dahulu-mendahului
|
2017-07-26 15:12:52 +03:00
|
|
|
dalam-dalam
|
2017-07-24 10:10:16 +03:00
|
|
|
dali-dali
|
2017-07-27 15:46:30 +03:00
|
|
|
dam-dam
|
2017-07-26 15:12:52 +03:00
|
|
|
danau-danau
|
2017-07-27 15:46:30 +03:00
|
|
|
dansa-dansi
|
2017-07-26 15:12:52 +03:00
|
|
|
dapil-dapil
|
2017-07-24 10:10:16 +03:00
|
|
|
dapur-dapur
|
|
|
|
dari-dari
|
|
|
|
daru-daru
|
2017-07-26 15:12:52 +03:00
|
|
|
dasar-dasar
|
2017-07-27 15:46:30 +03:00
|
|
|
datang-datang
|
|
|
|
datang-mendatangi
|
2017-07-26 15:12:52 +03:00
|
|
|
daun-daun
|
2017-07-27 15:46:30 +03:00
|
|
|
daun-daunan
|
2017-07-24 10:10:16 +03:00
|
|
|
dawai-dawai
|
2017-07-27 15:46:30 +03:00
|
|
|
dayang-dayang
|
|
|
|
dayung-mayung
|
|
|
|
debak-debuk
|
2017-07-26 15:12:52 +03:00
|
|
|
debu-debu
|
|
|
|
deca-core
|
|
|
|
decision-making
|
|
|
|
deep-lying
|
|
|
|
deg-degan
|
2017-07-27 15:46:30 +03:00
|
|
|
degap-degap
|
2017-07-24 10:10:16 +03:00
|
|
|
dekak-dekak
|
2017-07-27 15:46:30 +03:00
|
|
|
dekat-dekat
|
|
|
|
dengar-dengaran
|
|
|
|
dengking-mendengking
|
2017-07-26 15:12:52 +03:00
|
|
|
departemen-departemen
|
|
|
|
depo-depo
|
|
|
|
deputi-deputi
|
|
|
|
desa-desa
|
|
|
|
desa-kota
|
2017-07-24 10:10:16 +03:00
|
|
|
desas-desus
|
2017-07-26 15:12:52 +03:00
|
|
|
detik-detik
|
|
|
|
dewa-dewa
|
|
|
|
dewa-dewi
|
|
|
|
dewan-dewan
|
2017-07-24 10:10:16 +03:00
|
|
|
dewi-dewi
|
2017-07-26 15:12:52 +03:00
|
|
|
dial-up
|
|
|
|
diam-diam
|
|
|
|
dibayang-bayangi
|
|
|
|
dibuat-buat
|
|
|
|
diiming-imingi
|
|
|
|
dilebih-lebihkan
|
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688)
Typo fix: "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option
This is necessary to allow testing an installed spacy by running:
pytest --pyargs spacy
* Add contributor agreement
* update bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Less norm computations in token similarity
* Contributor agreement
* Remove ')' for clarity (#2737)
Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intentional, then please let me know.
* added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement
* Added polish version of english lex_attrs
* Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list
* add exception token
* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception
* add tokenizer exceptions list
* combining base_norms with norm_exceptions
* adding norm_exception
* fix double key in lemmatizer
* remove unused import on punctuation.py
* reformat stop_words to reduce number of lines, improve readability
* updating tokenizer exception
* implement is_currency for lang/id
* adding orth_first_upper in tokenizer_exceptions
* update the norm_exception list
* remove bunch of abbreviations
* adding contributors file
* Fixed spaCy+Keras example (#2763)
* bug fixes in keras example
* created contributor agreement
* Adding French hyphenated first name (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
* Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.
* Adding contributor agreement
* Updating contributor agreement
* Also include lowercase norm exceptions
* Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
* added spaCy Contributor Agreement
* Add charlax's contributor agreement (#2805)
* agreement of contributor, may I introduce a tiny pl language contribution (#2799)
* Contributors agreement
* Contributors agreement
* Contributors agreement
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf37cfcfab60be2b97d6deae949b30e9e.
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types. Fixes #2800
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese Language (#2790)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols
* Extended punctuation and norm_exceptions in the Portuguese language
* Correct error in spacy universe docs concerning spacy-lookup (#2814)
* Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example
* created contributor agreement
* baseline for Parikh model
* initial version of parikh 2016 implemented
* tested asymmetric models
* fixed grievous error in normalization
* use standard SNLI test file
* begin to rework parikh example
* initial version of running example
* start to document the new version
* start to document the new version
* Update Decompositional Attention.ipynb
* fixed calls to similarity
* updated the README
* import sys package duh
* simplified indexing on mapping word to IDs
* stupid python indent error
* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test"
This reverts commit bdebbef45552d698d390aa430b527ee27830f11b.
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->
## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experience slow and unsatisfying results.
### Types of change
enhancements
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement
* Correct some grammatical inaccuracies in lang\ru\examples.py
* Move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian(Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
Add a rule-based French Lemmatizer following the English one and the excellent PR for [Greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e. PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentioned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor indo (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if the effect of an ID clash isn't noticeable here).
### Types of change
bug fix
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Adding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
<!--- Provide a general summary of your changes in the title. -->
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
|
|
|
dimana-mana
|
2017-07-26 15:12:52 +03:00
|
|
|
dimata-matai
|
|
|
|
dinas-dinas
|
2017-07-24 10:10:16 +03:00
|
|
|
dinul-Islam
|
2017-07-26 15:12:52 +03:00
|
|
|
diobok-obok
|
|
|
|
diolok-olok
|
|
|
|
direksi-direksi
|
|
|
|
direktorat-direktorat
|
|
|
|
dirjen-dirjen
|
|
|
|
dirut-dirut
|
|
|
|
ditunggu-tunggu
|
|
|
|
divisi-divisi
|
|
|
|
do-it-yourself
|
|
|
|
doa-doa
|
2017-07-24 10:10:16 +03:00
|
|
|
dog-dog
|
2017-07-26 15:12:52 +03:00
|
|
|
doggy-style
|
2017-07-24 10:10:16 +03:00
|
|
|
dokok-dokok
|
|
|
|
dolak-dalik
|
2017-07-27 15:46:30 +03:00
|
|
|
dor-doran
|
2017-07-26 15:12:52 +03:00
|
|
|
dorong-mendorong
|
|
|
|
dosa-dosa
|
|
|
|
dress-up
|
|
|
|
drive-in
|
2017-07-27 15:46:30 +03:00
|
|
|
dua-dua
|
|
|
|
dua-duaan
|
2017-07-26 15:12:52 +03:00
|
|
|
dua-duanya
|
|
|
|
dubes-dubes
|
|
|
|
duduk-duduk
|
|
|
|
dugaan-dugaan
|
2017-07-24 10:10:16 +03:00
|
|
|
dulang-dulang
|
2017-07-26 15:12:52 +03:00
|
|
|
duri-duri
|
|
|
|
duta-duta
|
|
|
|
dwi-kewarganegaraan
|
💫 Port master changes over to develop (#2979)
2018-11-29 18:30:29 +03:00
|
|
|
e-arena
|
|
|
|
e-billing
|
|
|
|
e-budgeting
|
|
|
|
e-cctv
|
|
|
|
e-class
|
|
|
|
e-commerce
|
|
|
|
e-counting
|
|
|
|
e-elektronik
|
|
|
|
e-entertainment
|
|
|
|
e-evolution
|
|
|
|
e-faktur
|
|
|
|
e-filing
|
|
|
|
e-fin
|
|
|
|
e-form
|
|
|
|
e-government
|
|
|
|
e-govt
|
|
|
|
e-hakcipta
|
|
|
|
e-id
|
|
|
|
e-info
|
|
|
|
e-katalog
|
|
|
|
e-ktp
|
|
|
|
e-leadership
|
|
|
|
e-lhkpn
|
|
|
|
e-library
|
|
|
|
e-loket
|
|
|
|
e-m1
|
|
|
|
e-money
|
|
|
|
e-news
|
|
|
|
e-nisn
|
|
|
|
e-npwp
|
|
|
|
e-paspor
|
|
|
|
e-paten
|
|
|
|
e-pay
|
|
|
|
e-perda
|
|
|
|
e-perizinan
|
|
|
|
e-planning
|
|
|
|
e-polisi
|
|
|
|
e-power
|
|
|
|
e-punten
|
|
|
|
e-retribusi
|
|
|
|
e-samsat
|
|
|
|
e-sport
|
|
|
|
e-store
|
|
|
|
e-tax
|
|
|
|
e-ticketing
|
|
|
|
e-tilang
|
|
|
|
e-toll
|
|
|
|
e-visa
|
|
|
|
e-voting
|
|
|
|
e-wallet
|
|
|
|
e-warong
|
2017-07-27 15:46:30 +03:00
|
|
|
ecek-ecek
|
2017-07-26 15:12:52 +03:00
|
|
|
eco-friendly
|
|
|
|
eco-park
|
2017-07-27 15:46:30 +03:00
|
|
|
edan-edanan
|
2017-07-26 15:12:52 +03:00
|
|
|
editor-editor
|
|
|
|
editor-in-chief
|
|
|
|
efek-efek
|
|
|
|
ekonomi-ekonomi
|
|
|
|
eksekutif-legislatif
|
|
|
|
ekspor-impor
|
|
|
|
elang-elang
|
|
|
|
elemen-elemen
|
|
|
|
emak-emak
|
2017-07-27 15:46:30 +03:00
|
|
|
embuh-embuhan
|
|
|
|
empat-empat
|
2017-07-26 15:12:52 +03:00
|
|
|
empek-empek
|
2017-07-27 15:46:30 +03:00
|
|
|
empet-empetan
|
|
|
|
empok-empok
|
|
|
|
empot-empotan
|
2017-07-26 15:12:52 +03:00
|
|
|
enak-enak
|
2017-07-27 15:46:30 +03:00
|
|
|
encal-encal
|
2017-07-26 15:12:52 +03:00
|
|
|
end-to-end
|
|
|
|
end-user
|
2017-07-27 15:46:30 +03:00
|
|
|
endap-endap
|
|
|
|
endut-endut
|
|
|
|
endut-endutan
|
|
|
|
engah-engah
|
2017-07-24 10:10:16 +03:00
|
|
|
engap-engap
|
2017-07-27 15:46:30 +03:00
|
|
|
enggan-enggan
|
|
|
|
engkah-engkah
|
2017-07-24 10:10:16 +03:00
|
|
|
engket-engket
|
2017-07-27 15:46:30 +03:00
|
|
|
entah-berentah
|
|
|
|
enten-enten
|
2017-07-26 15:12:52 +03:00
|
|
|
entry-level
|
|
|
|
equity-linked
|
2017-07-27 15:46:30 +03:00
|
|
|
erang-erot
|
|
|
|
erat-erat
|
2017-07-24 10:10:16 +03:00
|
|
|
erek-erek
|
2017-07-27 15:46:30 +03:00
|
|
|
ereng-ereng
|
|
|
|
erong-erong
|
2017-07-26 15:12:52 +03:00
|
|
|
esek-esek
|
|
|
|
ex-officio
|
|
|
|
exchange-traded
|
|
|
|
exercise-induced
|
|
|
|
extra-time
|
|
|
|
face-down
|
|
|
|
face-to-face
|
|
|
|
fair-play
|
|
|
|
fakta-fakta
|
|
|
|
faktor-faktor
|
|
|
|
fakultas-fakultas
|
|
|
|
fase-fase
|
|
|
|
fast-food
|
|
|
|
feed-in
|
|
|
|
fifty-fifty
|
|
|
|
file-file
|
|
|
|
first-leg
|
|
|
|
first-team
|
|
|
|
fitur-fitur
|
|
|
|
fitur-fiturnya
|
|
|
|
fixed-income
|
|
|
|
flip-flop
|
2017-07-24 10:10:16 +03:00
|
|
|
flip-plop
|
2017-07-26 15:12:52 +03:00
|
|
|
fly-in
|
|
|
|
follow-up
|
|
|
|
foto-foto
|
|
|
|
foya-foya
|
|
|
|
fraksi-fraksi
|
|
|
|
free-to-play
|
|
|
|
front-end
|
|
|
|
fungsi-fungsi
|
2017-07-24 10:10:16 +03:00
|
|
|
gaba-gaba
|
2017-07-27 15:46:30 +03:00
|
|
|
gabai-gabai
|
2017-07-24 10:10:16 +03:00
|
|
|
gada-gada
|
|
|
|
gading-gading
|
2017-07-26 15:12:52 +03:00
|
|
|
gadis-gadis
|
2017-07-24 10:10:16 +03:00
|
|
|
gado-gado
|
2017-07-27 15:46:30 +03:00
|
|
|
gail-gail
|
2017-07-26 15:12:52 +03:00
|
|
|
gajah-gajah
|
2017-07-27 15:46:30 +03:00
|
|
|
gajah-gajahan
|
2017-07-24 10:10:16 +03:00
|
|
|
gala-gala
|
2017-07-26 15:12:52 +03:00
|
|
|
galeri-galeri
|
|
|
|
gali-gali
|
2017-07-27 15:46:30 +03:00
|
|
|
gali-galian
|
2017-07-24 10:10:16 +03:00
|
|
|
galing-galing
|
|
|
|
galu-galu
|
2017-07-27 15:46:30 +03:00
|
|
|
gamak-gamak
|
2017-07-26 15:12:52 +03:00
|
|
|
gambar-gambar
|
2017-07-27 15:46:30 +03:00
|
|
|
gambar-menggambar
|
|
|
|
gamit-gamitan
|
|
|
|
gampang-gampangan
|
2017-07-24 10:10:16 +03:00
|
|
|
gana-gini
|
2017-07-27 15:46:30 +03:00
|
|
|
ganal-ganal
|
|
|
|
ganda-berganda
|
|
|
|
ganjal-mengganjal
|
2017-07-26 15:12:52 +03:00
|
|
|
ganjil-genap
|
|
|
|
ganteng-ganteng
|
|
|
|
gantung-gantung
|
2017-07-24 10:10:16 +03:00
|
|
|
gapah-gopoh
|
|
|
|
gara-gara
|
2017-07-27 15:46:30 +03:00
|
|
|
garah-garah
|
2017-07-26 15:12:52 +03:00
|
|
|
garis-garis
|
2017-07-27 15:46:30 +03:00
|
|
|
gasak-gasakan
|
2017-07-26 15:12:52 +03:00
|
|
|
gatal-gatal
|
|
|
|
gaun-gaun
|
2017-07-27 15:46:30 +03:00
|
|
|
gawar-gawar
|
|
|
|
gaya-gayanya
|
2017-07-24 10:10:16 +03:00
|
|
|
gayang-gayang
|
2017-07-26 15:12:52 +03:00
|
|
|
ge-er
|
2017-07-24 10:10:16 +03:00
|
|
|
gebyah-uyah
|
2017-07-27 15:46:30 +03:00
|
|
|
gebyar-gebyar
|
2017-07-24 10:10:16 +03:00
|
|
|
gedana-gedini
|
|
|
|
gedebak-gedebuk
|
|
|
|
gedebar-gedebur
|
2017-07-26 15:12:52 +03:00
|
|
|
gedung-gedung
|
|
|
|
gelang-gelang
|
2017-07-27 15:46:30 +03:00
|
|
|
gelap-gelapan
|
2017-07-26 15:12:52 +03:00
|
|
|
gelar-gelar
|
|
|
|
gelas-gelas
|
2017-07-27 15:46:30 +03:00
|
|
|
gelembung-gelembungan
|
2017-07-26 15:12:52 +03:00
|
|
|
geleng-geleng
|
2017-07-24 10:10:16 +03:00
|
|
|
geli-geli
|
2017-07-27 15:46:30 +03:00
|
|
|
geliang-geliut
|
|
|
|
geliat-geliut
|
2017-07-24 10:10:16 +03:00
|
|
|
gembar-gembor
|
|
|
|
gembrang-gembreng
|
|
|
|
gempul-gempul
|
2017-07-27 15:46:30 +03:00
|
|
|
gempur-menggempur
|
|
|
|
gendang-gendang
|
|
|
|
gengsi-gengsian
|
2017-07-24 10:10:16 +03:00
|
|
|
genjang-genjot
|
2017-07-27 15:46:30 +03:00
|
|
|
genjot-genjotan
|
|
|
|
genjrang-genjreng
|
2017-07-26 15:12:52 +03:00
|
|
|
genome-wide
|
|
|
|
geo-politik
|
2017-07-27 15:46:30 +03:00
|
|
|
gerabak-gerubuk
|
2017-07-26 15:12:52 +03:00
|
|
|
gerak-gerik
|
|
|
|
gerak-geriknya
|
|
|
|
gerakan-gerakan
|
2017-07-24 10:10:16 +03:00
|
|
|
gerbas-gerbus
|
2017-07-26 15:12:52 +03:00
|
|
|
gereja-gereja
|
2017-07-24 10:10:16 +03:00
|
|
|
gereng-gereng
|
2017-07-27 15:46:30 +03:00
|
|
|
geriak-geriuk
|
|
|
|
gerit-gerit
|
2017-07-24 10:10:16 +03:00
|
|
|
gerot-gerot
|
2017-07-27 15:46:30 +03:00
|
|
|
geruh-gerah
|
2017-07-24 10:10:16 +03:00
|
|
|
getak-getuk
|
|
|
|
getem-getem
|
|
|
|
geti-geti
|
2017-07-27 15:46:30 +03:00
|
|
|
gial-gial
|
|
|
|
gial-giul
|
|
|
|
gila-gila
|
2017-07-26 15:12:52 +03:00
|
|
|
gila-gilaan
|
2017-07-27 15:46:30 +03:00
|
|
|
gilang-gemilang
|
|
|
|
gilap-gemilap
|
2017-07-24 10:10:16 +03:00
|
|
|
gili-gili
|
2017-07-27 15:46:30 +03:00
|
|
|
giling-giling
|
|
|
|
gilir-bergilir
|
|
|
|
ginang-ginang
|
2017-07-24 10:10:16 +03:00
|
|
|
girap-girap
|
2017-07-27 15:46:30 +03:00
|
|
|
girik-girik
|
2017-07-24 10:10:16 +03:00
|
|
|
giring-giring
|
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688)
Typo fix: "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option
This is necessary to allow testing an installed spacy by running:
pytest --pyargs spacy
* Add contributor agreement
* update bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Less norm computations in token similarity
* Contributor agreement
* Remove ')' for clarity (#2737)
Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intention than please let me know.
* added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement
* Added polish version of english lex_attrs
* Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list
* add exception token
* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception
* add tokenizer exceptions list
* combining base_norms with norm_exceptions
* adding norm_exception
* fix double key in lemmatizer
* remove unused import on punctuation.py
* reformat stop_words to reduce number of lines, improve readibility
* updating tokenizer exception
* implement is_currency for lang/id
* adding orth_first_upper in tokenizer_exceptions
* update the norm_exception list
* remove bunch of abbreviations
* adding contributors file
* Fixed spaCy+Keras example (#2763)
* bug fixes in keras example
* created contributor agreement
* Adding French hyphenated first name (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
* Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.
* Adding contributor agreement
* Updating contributor agreement
* Also include lowercase norm exceptions
* Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
* added spaCy Contributor Agreement
* Add charlax's contributor agreement (#2805)
* agreement of contributor, may I introduce a tiny pl languge contribution (#2799)
* Contributors agreement
* Contributors agreement
* Contributors agreement
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf37cfcfab60be2b97d6deae949b30e9e.
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types. Fixes #2800
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese Language (#2790)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols
* Extended punctuation and norm_exceptions in the Portuguese language
* Correct error in spacy universe docs concerning spacy-lookup (#2814)
* Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example
* created contributor agreement
* baseline for Parikh model
* initial version of parikh 2016 implemented
* tested asymmetric models
* fixed grevious error in normalization
* use standard SNLI test file
* begin to rework parikh example
* initial version of running example
* start to document the new version
* start to document the new version
* Update Decompositional Attention.ipynb
* fixed calls to similarity
* updated the README
* import sys package duh
* simplified indexing on mapping word to IDs
* stupid python indent error
* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test"
This reverts commit bdebbef45552d698d390aa430b527ee27830f11b.
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->
## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experience slow and unsatisfying results.
### Types of change
enhancements
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement
* Correct some grammatical inaccuracies in lang\ru\examples.py
* Move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian(Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentioned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor info (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if the effect of an ID clash isn't noticeable here).
### Types of change
bug fix
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Adding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
<!--- Provide a general summary of your changes in the title. -->
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
|
|
|
go-auto
|
|
|
|
go-bills
|
|
|
|
go-bluebird
|
|
|
|
go-box
|
|
|
|
go-car
|
|
|
|
go-clean
|
|
|
|
go-food
|
|
|
|
go-glam
|
|
|
|
go-jek
|
2017-07-26 15:12:52 +03:00
|
|
|
go-kart
|
2018-11-29 18:30:29 +03:00
|
|
|
go-mart
|
|
|
|
go-massage
|
|
|
|
go-med
|
|
|
|
go-points
|
|
|
|
go-pulsa
|
|
|
|
go-ride
|
|
|
|
go-send
|
|
|
|
go-shop
|
|
|
|
go-tix
|
2017-07-26 15:12:52 +03:00
|
|
|
go-to-market
|
2017-07-27 15:46:30 +03:00
|
|
|
goak-goak
|
2017-07-26 15:12:52 +03:00
|
|
|
goal-line
|
|
|
|
gol-gol
|
2017-07-24 10:10:16 +03:00
|
|
|
golak-galik
|
|
|
|
gondas-gandes
|
2017-07-26 15:12:52 +03:00
|
|
|
gonjang-ganjing
|
2017-07-27 15:46:30 +03:00
|
|
|
gonjlang-ganjling
|
2017-07-24 10:10:16 +03:00
|
|
|
gonta-ganti
|
2017-07-27 15:46:30 +03:00
|
|
|
gontok-gontokan
|
|
|
|
gorap-gorap
|
2017-07-24 10:10:16 +03:00
|
|
|
gorong-gorong
|
2017-07-26 15:12:52 +03:00
|
|
|
gotong-royong
|
2017-07-24 10:10:16 +03:00
|
|
|
gresek-gresek
|
2017-07-26 15:12:52 +03:00
|
|
|
gua-gua
|
2017-07-27 15:46:30 +03:00
|
|
|
gual-gail
|
2017-07-26 15:12:52 +03:00
|
|
|
gubernur-gubernur
|
2017-07-24 10:10:16 +03:00
|
|
|
gudu-gudu
|
2017-07-26 15:12:52 +03:00
|
|
|
gula-gula
|
2017-07-24 10:10:16 +03:00
|
|
|
gulang-gulang
|
2017-07-27 15:46:30 +03:00
|
|
|
gulung-menggulung
|
|
|
|
guna-ganah
|
|
|
|
guna-guna
|
|
|
|
gundala-gundala
|
|
|
|
guntang-guntang
|
|
|
|
gunung-ganang
|
|
|
|
gunung-gemunung
|
|
|
|
gunung-gunungan
|
2017-07-26 15:12:52 +03:00
|
|
|
guru-guru
|
2017-07-27 15:46:30 +03:00
|
|
|
habis-habis
|
2017-07-26 15:12:52 +03:00
|
|
|
habis-habisan
|
|
|
|
hak-hak
|
|
|
|
hak-hal
|
|
|
|
hakim-hakim
|
|
|
|
hal-hal
|
2017-07-24 10:10:16 +03:00
|
|
|
halai-balai
|
2017-07-26 15:12:52 +03:00
|
|
|
half-time
|
|
|
|
hama-hama
|
2017-07-27 15:46:30 +03:00
|
|
|
hampir-hampir
|
|
|
|
hancur-hancuran
|
|
|
|
hancur-menghancurkan
|
2017-07-26 15:12:52 +03:00
|
|
|
hands-free
|
|
|
|
hands-on
|
|
|
|
hang-out
|
|
|
|
hantu-hantu
|
|
|
|
happy-happy
|
|
|
|
harap-harap
|
2017-07-27 15:46:30 +03:00
|
|
|
harap-harapan
|
2017-07-26 15:12:52 +03:00
|
|
|
hard-disk
|
|
|
|
harga-harga
|
|
|
|
hari-hari
|
|
|
|
harimau-harimau
|
2017-07-27 15:46:30 +03:00
|
|
|
harum-haruman
|
2017-07-26 15:12:52 +03:00
|
|
|
hasil-hasil
|
2017-07-24 10:10:16 +03:00
|
|
|
hasta-wara
|
2017-07-26 15:12:52 +03:00
|
|
|
hat-trick
|
|
|
|
hati-hati
|
|
|
|
hati-hatilah
|
|
|
|
head-mounted
|
|
|
|
head-to-head
|
|
|
|
head-up
|
|
|
|
heads-up
|
|
|
|
heavy-duty
|
2017-07-27 15:46:30 +03:00
|
|
|
hebat-hebatan
|
2017-07-26 15:12:52 +03:00
|
|
|
hewan-hewan
|
|
|
|
hexa-core
|
|
|
|
hidup-hidup
|
|
|
|
hidup-mati
|
2017-07-24 10:10:16 +03:00
|
|
|
hila-hila
|
2017-07-27 15:46:30 +03:00
|
|
|
hilang-hilang
|
|
|
|
hina-menghinakan
|
2017-07-26 15:12:52 +03:00
|
|
|
hip-hop
|
2017-07-24 10:10:16 +03:00
|
|
|
hiru-biru
|
|
|
|
hiru-hara
|
2017-07-26 15:12:52 +03:00
|
|
|
hiruk-pikuk
|
|
|
|
hitam-putih
|
|
|
|
hitung-hitung
|
|
|
|
hitung-hitungan
|
2017-07-27 15:46:30 +03:00
|
|
|
hormat-menghormati
|
2017-07-26 15:12:52 +03:00
|
|
|
hot-swappable
|
|
|
|
hotel-hotel
|
|
|
|
how-to
|
2017-07-24 10:10:16 +03:00
|
|
|
hubar-habir
|
|
|
|
hubaya-hubaya
|
2017-07-26 15:12:52 +03:00
|
|
|
hukum-red
|
|
|
|
hukuman-hukuman
|
|
|
|
hula-hoop
|
2017-07-24 10:10:16 +03:00
|
|
|
hula-hula
|
2017-07-26 15:12:52 +03:00
|
|
|
hulu-hilir
|
|
|
|
humas-humas
|
2017-07-24 10:10:16 +03:00
|
|
|
hura-hura
|
|
|
|
huru-hara
|
|
|
|
ibar-ibar
|
2017-07-26 15:12:52 +03:00
|
|
|
ibu-anak
|
|
|
|
ibu-ibu
|
2017-07-24 10:10:16 +03:00
|
|
|
icak-icak
|
2017-07-26 15:12:52 +03:00
|
|
|
icip-icip
|
2017-07-27 15:46:30 +03:00
|
|
|
idam-idam
|
2017-07-26 15:12:52 +03:00
|
|
|
ide-ide
|
2017-07-27 15:46:30 +03:00
|
|
|
igau-igauan
|
2017-07-24 10:10:16 +03:00
|
|
|
ikan-ikan
|
2017-07-27 15:46:30 +03:00
|
|
|
ikut-ikut
|
|
|
|
ikut-ikutan
|
2017-07-24 10:10:16 +03:00
|
|
|
ilam-ilam
|
|
|
|
ilat-ilatan
|
2017-07-26 15:12:52 +03:00
|
|
|
ilmu-ilmu
|
2017-07-27 15:46:30 +03:00
|
|
|
imbang-imbangan
|
2017-07-24 10:10:16 +03:00
|
|
|
iming-iming
|
|
|
|
imut-imut
|
2017-07-27 15:46:30 +03:00
|
|
|
inang-inang
|
|
|
|
inca-binca
|
2017-07-24 10:10:16 +03:00
|
|
|
incang-incut
|
2017-07-26 15:12:52 +03:00
|
|
|
industri-industri
|
2017-07-27 15:46:30 +03:00
|
|
|
ingar-bingar
|
|
|
|
ingar-ingar
|
2017-07-24 10:10:16 +03:00
|
|
|
ingat-ingat
|
2017-07-27 15:46:30 +03:00
|
|
|
ingat-ingatan
|
|
|
|
ingau-ingauan
|
2017-07-24 10:10:16 +03:00
|
|
|
inggang-inggung
|
2017-07-27 15:46:30 +03:00
|
|
|
injak-injak
|
2017-07-26 15:12:52 +03:00
|
|
|
input-output
|
|
|
|
instansi-instansi
|
|
|
|
instant-on
|
|
|
|
instrumen-instrumen
|
|
|
|
inter-governmental
|
2017-07-24 10:10:16 +03:00
|
|
|
ira-ira
|
|
|
|
irah-irahan
|
2017-07-27 15:46:30 +03:00
|
|
|
iras-iras
|
2017-07-26 15:12:52 +03:00
|
|
|
iring-iringan
|
2017-07-27 15:46:30 +03:00
|
|
|
iris-irisan
|
2017-07-24 10:10:16 +03:00
|
|
|
isak-isak
|
2017-07-26 15:12:52 +03:00
|
|
|
isat-bb
|
2017-07-27 15:46:30 +03:00
|
|
|
iseng-iseng
|
2017-07-26 15:12:52 +03:00
|
|
|
istana-istana
|
|
|
|
istri-istri
|
|
|
|
isu-isu
|
|
|
|
iya-iya
|
|
|
|
jabatan-jabatan
|
|
|
|
jadi-jadian
|
|
|
|
jagoan-jagoan
|
2017-07-27 15:46:30 +03:00
|
|
|
jaja-jajaan
|
2017-07-26 15:12:52 +03:00
|
|
|
jaksa-jaksa
|
2017-07-27 15:46:30 +03:00
|
|
|
jala-jala
|
2017-07-26 15:12:52 +03:00
|
|
|
jalan-jalan
|
2017-07-24 10:10:16 +03:00
|
|
|
jali-jali
|
2017-07-27 15:46:30 +03:00
|
|
|
jalin-berjalin
|
|
|
|
jalin-menjalin
|
2017-07-26 15:12:52 +03:00
|
|
|
jam-jam
|
2017-07-27 15:46:30 +03:00
|
|
|
jamah-jamahan
|
|
|
|
jambak-jambakan
|
|
|
|
jambu-jambu
|
2017-07-26 15:12:52 +03:00
|
|
|
jampi-jampi
|
|
|
|
janda-janda
|
|
|
|
jangan-jangan
|
|
|
|
janji-janji
|
2017-07-27 15:46:30 +03:00
|
|
|
jarang-jarang
|
2017-07-26 15:12:52 +03:00
|
|
|
jari-jari
|
|
|
|
jaring-jaring
|
2017-07-24 10:10:16 +03:00
|
|
|
jarum-jarum
|
2017-07-26 15:12:52 +03:00
|
|
|
jasa-jasa
|
|
|
|
jatuh-bangun
|
|
|
|
jauh-dekat
|
|
|
|
jauh-jauh
|
2017-07-27 15:46:30 +03:00
|
|
|
jawi-jawi
|
|
|
|
jebar-jebur
|
|
|
|
jebat-jebatan
|
2017-07-24 10:10:16 +03:00
|
|
|
jegal-jegalan
|
2017-07-26 15:12:52 +03:00
|
|
|
jejak-jejak
|
2017-07-27 15:46:30 +03:00
|
|
|
jelang-menjelang
|
2017-07-26 15:12:52 +03:00
|
|
|
jelas-jelas
|
2017-07-24 10:10:16 +03:00
|
|
|
jelur-jelir
|
2017-07-26 15:12:52 +03:00
|
|
|
jembatan-jembatan
|
|
|
|
jenazah-jenazah
|
2017-07-27 15:46:30 +03:00
|
|
|
jendal-jendul
|
2017-07-26 15:12:52 +03:00
|
|
|
jenderal-jenderal
|
2017-07-27 15:46:30 +03:00
|
|
|
jenggar-jenggur
|
2017-07-26 15:12:52 +03:00
|
|
|
jenis-jenis
|
|
|
|
jenis-jenisnya
|
2017-07-24 10:10:16 +03:00
|
|
|
jentik-jentik
|
2017-07-27 15:46:30 +03:00
|
|
|
jerah-jerih
|
|
|
|
jinak-jinak
|
2017-07-26 15:12:52 +03:00
|
|
|
jiwa-jiwa
|
2017-07-27 15:46:30 +03:00
|
|
|
joli-joli
|
|
|
|
jolong-jolong
|
|
|
|
jongkang-jangking
|
2017-07-24 10:10:16 +03:00
|
|
|
jongkar-jangkir
|
|
|
|
jongkat-jangkit
|
2017-07-26 15:12:52 +03:00
|
|
|
jor-joran
|
2017-07-27 15:46:30 +03:00
|
|
|
jotos-jotosan
|
|
|
|
juak-juak
|
2017-07-26 15:12:52 +03:00
|
|
|
jual-beli
|
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688)
Typo fix: "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option
This is necessary to allow testing an installed spacy by running:
pytest --pyargs spacy
* Add contributor agreement
* update bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Less norm computations in token similarity
* Contributor agreement
* Remove ')' for clarity (#2737)
Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intentional, then please let me know.
* added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement
* Added polish version of english lex_attrs
* Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list
* add exception token
* removing lines containing spaces, as they won't matter since we use the .split() method in the end; added new tokens in exception
* add tokenizer exceptions list
* combining base_norms with norm_exceptions
* adding norm_exception
* fix double key in lemmatizer
* remove unused import on punctuation.py
* reformat stop_words to reduce number of lines, improve readability
* updating tokenizer exception
* implement is_currency for lang/id
* adding orth_first_upper in tokenizer_exceptions
* update the norm_exception list
* remove bunch of abbreviations
* adding contributors file
* Fixed spaCy+Keras example (#2763)
* bug fixes in keras example
* created contributor agreement
* Adding French hyphenated first name (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
* Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.
* Adding contributor agreement
* Updating contributor agreement
* Also include lowercase norm exceptions
* Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
* added spaCy Contributor Agreement
* Add charlax's contributor agreement (#2805)
* agreement of contributor, may I introduce a tiny pl language contribution (#2799)
* Contributors agreement
* Contributors agreement
* Contributors agreement
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf37cfcfab60be2b97d6deae949b30e9e.
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types. Fixes #2800
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese Language (#2790)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols
* Extended punctuation and norm_exceptions in the Portuguese language
* Correct error in spacy universe docs concerning spacy-lookup (#2814)
* Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example
* created contributor agreement
* baseline for Parikh model
* initial version of parikh 2016 implemented
* tested asymmetric models
* fixed grievous error in normalization
* use standard SNLI test file
* begin to rework parikh example
* initial version of running example
* start to document the new version
* start to document the new version
* Update Decompositional Attention.ipynb
* fixed calls to similarity
* updated the README
* import sys package duh
* simplified indexing on mapping word to IDs
* stupid python indent error
* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test"
This reverts commit bdebbef45552d698d390aa430b527ee27830f11b.
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->
## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experience slow and unsatisfying results.
### Types of change
enhancements
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement
* Correct some grammatical inaccuracies in lang\ru\examples.py
* Move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian(Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
Add a rule-based French Lemmatizer following the English one and the excellent PR for [Greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentioned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor info (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticeable here.)
### Types of change
bug fix
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Adding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
<!--- Provide a general summary of your changes in the title. -->
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
|
|
|
juang-juang
|
2017-07-27 15:46:30 +03:00
|
|
|
julo-julo
|
2017-07-26 15:12:52 +03:00
|
|
|
julung-julung
|
2017-07-27 15:46:30 +03:00
|
|
|
julur-julur
|
|
|
|
jumbai-jumbai
|
|
|
|
jungkang-jungkit
|
|
|
|
jungkat-jungkit
|
|
|
|
jurai-jurai
|
2017-07-24 10:10:16 +03:00
|
|
|
kabang-kabang
|
2017-07-26 15:12:52 +03:00
|
|
|
kabar-kabari
|
2017-07-27 15:46:30 +03:00
|
|
|
kabir-kabiran
|
|
|
|
kabruk-kabrukan
|
2017-07-24 10:10:16 +03:00
|
|
|
kabu-kabu
|
2017-07-26 15:12:52 +03:00
|
|
|
kabupaten-kabupaten
|
|
|
|
kabupaten-kota
|
|
|
|
kaca-kaca
|
2017-07-24 10:10:16 +03:00
|
|
|
kacang-kacang
|
|
|
|
kacang-kacangan
|
2017-07-26 15:12:52 +03:00
|
|
|
kacau-balau
|
|
|
|
kadang-kadang
|
|
|
|
kader-kader
|
|
|
|
kades-kades
|
|
|
|
kadis-kadis
|
2017-07-27 15:46:30 +03:00
|
|
|
kail-kail
|
2017-07-26 15:12:52 +03:00
|
|
|
kain-kain
|
|
|
|
kait-kait
|
|
|
|
kakak-adik
|
|
|
|
kakak-beradik
|
|
|
|
kakak-kakak
|
2017-07-27 15:46:30 +03:00
|
|
|
kakek-kakek
|
2017-07-26 15:12:52 +03:00
|
|
|
kakek-nenek
|
|
|
|
kaki-kaki
|
2017-07-24 10:10:16 +03:00
|
|
|
kala-kala
|
|
|
|
kalau-kalau
|
2017-07-27 15:46:30 +03:00
|
|
|
kaleng-kalengan
|
|
|
|
kali-kalian
|
2017-07-26 15:12:52 +03:00
|
|
|
kalimat-kalimat
|
|
|
|
kalung-kalung
|
2017-07-24 10:10:16 +03:00
|
|
|
kalut-malut
|
2017-07-26 15:12:52 +03:00
|
|
|
kambing-kambing
|
2017-07-27 15:46:30 +03:00
|
|
|
kamit-kamit
|
2017-07-26 15:12:52 +03:00
|
|
|
kampung-kampung
|
|
|
|
kampus-kampus
|
2017-07-24 10:10:16 +03:00
|
|
|
kanak-kanak
|
2017-07-26 15:12:52 +03:00
|
|
|
kanak-kanan
|
|
|
|
kanan-kanak
|
|
|
|
kanan-kiri
|
2017-07-27 15:46:30 +03:00
|
|
|
kangen-kangenan
|
2017-07-26 15:12:52 +03:00
|
|
|
kanwil-kanwil
|
2017-07-27 15:46:30 +03:00
|
|
|
kapa-kapa
|
2017-07-26 15:12:52 +03:00
|
|
|
kapal-kapal
|
2017-07-27 15:46:30 +03:00
|
|
|
kapan-kapan
|
2017-07-26 15:12:52 +03:00
|
|
|
kapolda-kapolda
|
|
|
|
kapolres-kapolres
|
|
|
|
kapolsek-kapolsek
|
2017-07-27 15:46:30 +03:00
|
|
|
kapu-kapu
|
|
|
|
karang-karangan
|
|
|
|
karang-mengarang
|
2017-07-24 10:10:16 +03:00
|
|
|
kareseh-peseh
|
2017-07-26 15:12:52 +03:00
|
|
|
karut-marut
|
|
|
|
karya-karya
|
2017-07-24 10:10:16 +03:00
|
|
|
kasak-kusuk
|
2017-07-26 15:12:52 +03:00
|
|
|
kasus-kasus
|
|
|
|
kata-kata
|
2017-07-24 10:10:16 +03:00
|
|
|
katang-katang
|
2017-07-26 15:12:52 +03:00
|
|
|
kava-kava
|
2017-07-24 10:10:16 +03:00
|
|
|
kawa-kawa
|
2017-07-26 15:12:52 +03:00
|
|
|
kawan-kawan
|
|
|
|
kawin-cerai
|
2017-07-27 15:46:30 +03:00
|
|
|
kawin-mawin
|
|
|
|
kayu-kayu
|
|
|
|
kayu-kayuan
|
|
|
|
ke-Allah-an
|
|
|
|
keabu-abuan
|
|
|
|
kearab-araban
|
|
|
|
keasyik-asyikan
|
|
|
|
kebarat-baratan
|
|
|
|
kebasah-basahan
|
|
|
|
kebat-kebit
|
|
|
|
kebata-bataan
|
|
|
|
kebayi-bayian
|
|
|
|
kebelanda-belandaan
|
|
|
|
keberlarut-larutan
|
|
|
|
kebesar-hatian
|
2017-07-26 15:12:52 +03:00
|
|
|
kebiasaan-kebiasaan
|
|
|
|
kebijakan-kebijakan
|
2017-07-27 15:46:30 +03:00
|
|
|
kebiru-biruan
|
|
|
|
kebudak-budakan
|
2017-07-26 15:12:52 +03:00
|
|
|
kebun-kebun
|
|
|
|
kebut-kebutan
|
|
|
|
kecamatan-kecamatan
|
2017-07-27 15:46:30 +03:00
|
|
|
kecentang-perenangan
|
2017-07-26 15:12:52 +03:00
|
|
|
kecil-kecil
|
|
|
|
kecil-kecilan
|
2017-07-27 15:46:30 +03:00
|
|
|
kecil-mengecil
|
|
|
|
kecokelat-cokelatan
|
|
|
|
kecomak-kecimik
|
2017-07-24 10:10:16 +03:00
|
|
|
kecuh-kecah
|
2017-07-27 15:46:30 +03:00
|
|
|
kedek-kedek
|
|
|
|
kedekak-kedekik
|
|
|
|
kedesa-desaan
|
2017-07-26 15:12:52 +03:00
|
|
|
kedubes-kedubes
|
|
|
|
kedutaan-kedutaan
|
2017-07-27 15:46:30 +03:00
|
|
|
keempat-empatnya
|
|
|
|
kegadis-gadisan
|
|
|
|
kegelap-gelapan
|
2017-07-26 15:12:52 +03:00
|
|
|
kegiatan-kegiatan
|
2017-07-27 15:46:30 +03:00
|
|
|
kegila-gilaan
|
|
|
|
kegirang-girangan
|
2017-07-26 15:12:52 +03:00
|
|
|
kehati-hatian
|
2017-07-27 15:46:30 +03:00
|
|
|
keheran-heranan
|
|
|
|
kehijau-hijauan
|
|
|
|
kehitam-hitaman
|
|
|
|
keinggris-inggrisan
|
|
|
|
kejaga-jagaan
|
2017-07-26 15:12:52 +03:00
|
|
|
kejahatan-kejahatan
|
|
|
|
kejang-kejang
|
|
|
|
kejar-kejar
|
|
|
|
kejar-kejaran
|
2017-07-27 15:46:30 +03:00
|
|
|
kejar-mengejar
|
|
|
|
kejingga-jinggaan
|
|
|
|
kejut-kejut
|
2017-07-26 15:12:52 +03:00
|
|
|
kejutan-kejutan
|
2017-07-27 15:46:30 +03:00
|
|
|
kekabur-kaburan
|
|
|
|
kekanak-kanakan
|
|
|
|
kekoboi-koboian
|
|
|
|
kekota-kotaan
|
2017-07-26 15:12:52 +03:00
|
|
|
kekuasaan-kekuasaan
|
2017-07-27 15:46:30 +03:00
|
|
|
kekuning-kuningan
|
2017-07-24 10:10:16 +03:00
|
|
|
kelak-kelik
|
|
|
|
kelak-keluk
|
2017-07-27 15:46:30 +03:00
|
|
|
kelaki-lakian
|
2017-07-24 10:10:16 +03:00
|
|
|
kelang-kelok
|
|
|
|
kelap-kelip
|
2017-07-27 15:46:30 +03:00
|
|
|
kelasah-kelusuh
|
|
|
|
kelek-kelek
|
|
|
|
kelek-kelekan
|
2017-07-24 10:10:16 +03:00
|
|
|
kelemak-kelemek
|
|
|
|
kelik-kelik
|
2017-07-27 15:46:30 +03:00
|
|
|
kelip-kelip
|
2017-07-26 15:12:52 +03:00
|
|
|
kelompok-kelompok
|
2017-07-24 10:10:16 +03:00
|
|
|
kelontang-kelantung
|
2017-07-26 15:12:52 +03:00
|
|
|
keluar-masuk
|
|
|
|
kelurahan-kelurahan
|
2017-07-24 10:10:16 +03:00
|
|
|
kelusuh-kelasah
|
2017-07-27 15:46:30 +03:00
|
|
|
kelut-melut
|
|
|
|
kemak-kemik
|
|
|
|
kemalu-maluan
|
2017-07-26 15:12:52 +03:00
|
|
|
kemana-mana
|
2017-07-27 15:46:30 +03:00
|
|
|
kemanja-manjaan
|
|
|
|
kemarah-marahan
|
|
|
|
kemasam-masaman
|
|
|
|
kemati-matian
|
2017-07-26 15:12:52 +03:00
|
|
|
kembang-kembang
|
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688)
Typo fix: "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option
This is necessary to allow testing an installed spacy by running:
pytest --pyargs spacy
* Add contributor agreement
* update bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Less norm computations in token similarity
* Contributor agreement
* Remove ')' for clarity (#2737)
Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intention than please let me know.
* added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement
* Added polish version of english lex_attrs
* Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list
* add exception token
* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception
* add tokenizer exceptions list
* combining base_norms with norm_exceptions
* adding norm_exception
* fix double key in lemmatizer
* remove unused import on punctuation.py
* reformat stop_words to reduce number of lines, improve readibility
* updating tokenizer exception
* implement is_currency for lang/id
* adding orth_first_upper in tokenizer_exceptions
* update the norm_exception list
* remove bunch of abbreviations
* adding contributors file
* Fixed spaCy+Keras example (#2763)
* bug fixes in keras example
* created contributor agreement
* Adding French hyphenated first name (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
* Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.
* Adding contributor agreement
* Updating contributor agreement
* Also include lowercase norm exceptions
* Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
* added spaCy Contributor Agreement
* Add charlax's contributor agreement (#2805)
* agreement of contributor, may I introduce a tiny pl languge contribution (#2799)
* Contributors agreement
* Contributors agreement
* Contributors agreement
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf37cfcfab60be2b97d6deae949b30e9e.
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types. Fixes #2800
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese Language (#2790)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols
* Extended punctuation and norm_exceptions in the Portuguese language
* Correct error in spacy universe docs concerning spacy-lookup (#2814)
* Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example
* created contributor agreement
* baseline for Parikh model
* initial version of parikh 2016 implemented
* tested asymmetric models
* fixed grevious error in normalization
* use standard SNLI test file
* begin to rework parikh example
* initial version of running example
* start to document the new version
* start to document the new version
* Update Decompositional Attention.ipynb
* fixed calls to similarity
* updated the README
* import sys package duh
* simplified indexing on mapping word to IDs
* stupid python indent error
* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test"
This reverts commit bdebbef45552d698d390aa430b527ee27830f11b.
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->
## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results.
### Types of change
enhancements
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement
* Correct some grammatical inaccuracies in lang\ru\examples.py
* Move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian(Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentionned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so they must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor info (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes a problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. A random ID prefix is now generated so that even identical parses won't receive the same IDs, for consistency (even if the effect of an ID clash isn't noticeable here).
### Types of change
bug fix
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Adding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
|
|
|
kemenpan-rb
|
2017-07-26 15:12:52 +03:00
|
|
|
kementerian-kementerian
|
2017-07-27 15:46:30 +03:00
|
|
|
kemerah-merahan
|
|
|
|
kempang-kempis
|
|
|
|
kempas-kempis
|
|
|
|
kemuda-mudaan
|
|
|
|
kena-mengena
|
|
|
|
kenal-mengenal
|
2017-07-26 15:12:52 +03:00
|
|
|
kenang-kenangan
|
2017-07-27 15:46:30 +03:00
|
|
|
kencang-kencung
|
|
|
|
kencing-mengencingi
|
2017-07-24 10:10:16 +03:00
|
|
|
kencrang-kencring
|
2017-07-27 15:46:30 +03:00
|
|
|
kendang-kendang
|
|
|
|
kendang-kendangan
|
|
|
|
keningrat-ningratan
|
|
|
|
kentung-kentung
|
|
|
|
kenyat-kenyit
|
2017-07-26 15:12:52 +03:00
|
|
|
kepala-kepala
|
2017-07-27 15:46:30 +03:00
|
|
|
kepala-kepalaan
|
|
|
|
kepandir-pandiran
|
|
|
|
kepang-kepot
|
|
|
|
keperak-perakan
|
|
|
|
kepetah-lidahan
|
|
|
|
kepilu-piluan
|
2017-07-26 15:12:52 +03:00
|
|
|
keping-keping
|
2017-07-27 15:46:30 +03:00
|
|
|
kepucat-pucatan
|
|
|
|
kepuh-kepuh
|
|
|
|
kepura-puraan
|
|
|
|
keputih-putihan
|
|
|
|
kerah-kerahan
|
|
|
|
kerancak-rancakan
|
|
|
|
kerang-kerangan
|
2017-07-24 10:10:16 +03:00
|
|
|
kerang-keroh
|
2017-07-27 15:46:30 +03:00
|
|
|
kerang-kerot
|
|
|
|
kerang-keruk
|
|
|
|
kerang-kerung
|
|
|
|
kerap-kerap
|
|
|
|
keras-mengerasi
|
2017-07-24 10:10:16 +03:00
|
|
|
kercap-kercip
|
|
|
|
kercap-kercup
|
|
|
|
keriang-keriut
|
2017-07-26 15:12:52 +03:00
|
|
|
kerja-kerja
|
2017-07-24 10:10:16 +03:00
|
|
|
kernyat-kernyut
|
2017-07-27 15:46:30 +03:00
|
|
|
kerobak-kerabit
|
|
|
|
kerobak-kerobek
|
|
|
|
kerobak-kerobik
|
|
|
|
kerobat-kerabit
|
|
|
|
kerong-kerong
|
2017-07-24 10:10:16 +03:00
|
|
|
keropas-kerapis
|
2017-07-27 15:46:30 +03:00
|
|
|
kertak-kertuk
|
|
|
|
kertap-kertap
|
2017-07-24 10:10:16 +03:00
|
|
|
keruntang-pungkang
|
2017-07-26 15:12:52 +03:00
|
|
|
kesalahan-kesalahan
|
2017-07-24 10:10:16 +03:00
|
|
|
kesap-kesip
|
2017-07-27 15:46:30 +03:00
|
|
|
kesemena-menaan
|
|
|
|
kesenak-senakan
|
|
|
|
kesewenang-wenangan
|
|
|
|
kesia-siaan
|
|
|
|
kesik-kesik
|
|
|
|
kesipu-sipuan
|
2017-07-24 10:10:16 +03:00
|
|
|
kesu-kesi
|
|
|
|
kesuh-kesih
|
|
|
|
kesuk-kesik
|
|
|
|
ketakar-keteker
|
2017-07-26 15:12:52 +03:00
|
|
|
ketakutan-ketakutan
|
2017-07-24 10:10:16 +03:00
|
|
|
ketap-ketap
|
2017-07-27 15:46:30 +03:00
|
|
|
ketap-ketip
|
2017-07-26 15:12:52 +03:00
|
|
|
ketar-ketir
|
|
|
|
ketentuan-ketentuan
|
2017-07-27 15:46:30 +03:00
|
|
|
ketergesa-gesaan
|
|
|
|
keti-keti
|
|
|
|
ketidur-tiduran
|
|
|
|
ketiga-tiganya
|
2017-07-24 10:10:16 +03:00
|
|
|
ketir-ketir
|
2017-07-26 15:12:52 +03:00
|
|
|
ketua-ketua
|
2017-07-27 15:46:30 +03:00
|
|
|
ketua-tuaan
|
|
|
|
ketuan-tuanan
|
|
|
|
keungu-unguan
|
|
|
|
kewangi-wangian
|
2017-07-26 15:12:52 +03:00
|
|
|
ki-ka
|
2017-07-27 15:46:30 +03:00
|
|
|
kia-kia
|
2017-07-26 15:12:52 +03:00
|
|
|
kiai-kiai
|
2017-07-27 15:46:30 +03:00
|
|
|
kiak-kiak
|
|
|
|
kial-kial
|
2017-07-24 10:10:16 +03:00
|
|
|
kiang-kiut
|
2017-07-26 15:12:52 +03:00
|
|
|
kiat-kiat
|
2017-07-24 10:10:16 +03:00
|
|
|
kibang-kibut
|
|
|
|
kicang-kecoh
|
|
|
|
kicang-kicu
|
2017-07-26 15:12:52 +03:00
|
|
|
kick-off
|
2017-07-24 10:10:16 +03:00
|
|
|
kida-kida
|
|
|
|
kijang-kijang
|
2017-07-27 15:46:30 +03:00
|
|
|
kilau-mengilau
|
|
|
|
kili-kili
|
|
|
|
kilik-kilik
|
2017-07-26 15:12:52 +03:00
|
|
|
kincir-kincir
|
|
|
|
kios-kios
|
2017-07-24 10:10:16 +03:00
|
|
|
kira-kira
|
2017-07-27 15:46:30 +03:00
|
|
|
kira-kiraan
|
2017-07-26 15:12:52 +03:00
|
|
|
kiri-kanan
|
2017-07-27 15:46:30 +03:00
|
|
|
kirim-berkirim
|
2017-07-26 15:12:52 +03:00
|
|
|
kisah-kisah
|
|
|
|
kisi-kisi
|
|
|
|
kitab-kitab
|
2017-07-27 15:46:30 +03:00
|
|
|
kitang-kitang
|
2017-07-24 10:10:16 +03:00
|
|
|
kiu-kiu
|
2017-07-26 15:12:52 +03:00
|
|
|
klaim-klaim
|
2017-07-27 15:46:30 +03:00
|
|
|
klik-klikan
|
2017-07-26 15:12:52 +03:00
|
|
|
klip-klip
|
|
|
|
klub-klub
|
2017-07-24 10:10:16 +03:00
|
|
|
kluntang-klantung
|
2017-07-26 15:12:52 +03:00
|
|
|
knock-knock
|
|
|
|
knock-on
|
|
|
|
knock-out
|
|
|
|
ko-as
|
|
|
|
ko-pilot
|
2017-07-27 15:46:30 +03:00
|
|
|
koak-koak
|
|
|
|
koboi-koboian
|
2017-07-24 10:10:16 +03:00
|
|
|
kocah-kacih
|
|
|
|
kocar-kacir
|
2017-07-26 15:12:52 +03:00
|
|
|
kodam-kodam
|
|
|
|
kode-kode
|
|
|
|
kodim-kodim
|
2017-07-24 10:10:16 +03:00
|
|
|
kodok-kodok
|
|
|
|
kolang-kaling
|
|
|
|
kole-kole
|
|
|
|
koleh-koleh
|
2017-07-27 15:46:30 +03:00
|
|
|
kolong-kolong
|
|
|
|
koma-koma
|
2017-07-24 10:10:16 +03:00
|
|
|
komat-kamit
|
2017-07-26 15:12:52 +03:00
|
|
|
komisaris-komisaris
|
|
|
|
komisi-komisi
|
|
|
|
komite-komite
|
|
|
|
komoditas-komoditas
|
2017-07-27 15:46:30 +03:00
|
|
|
kongko-kongko
|
2017-07-26 15:12:52 +03:00
|
|
|
konsulat-konsulat
|
|
|
|
konsultan-konsultan
|
2017-07-24 10:10:16 +03:00
|
|
|
kontal-kantil
|
|
|
|
kontang-kanting
|
2017-07-26 15:12:52 +03:00
|
|
|
kontra-terorisme
|
|
|
|
kontrak-kontrak
|
|
|
|
konvensi-konvensi
|
2017-07-24 10:10:16 +03:00
|
|
|
kopat-kapit
|
2017-07-26 15:12:52 +03:00
|
|
|
koperasi-koperasi
|
|
|
|
kopi-kopi
|
|
|
|
koran-koran
|
2017-07-27 15:46:30 +03:00
|
|
|
koreng-koreng
|
2017-07-26 15:12:52 +03:00
|
|
|
kos-kosan
|
2017-07-24 10:10:16 +03:00
|
|
|
kosak-kasik
|
2017-07-26 15:12:52 +03:00
|
|
|
kota-kota
|
|
|
|
kota-wakil
|
2017-07-24 10:10:16 +03:00
|
|
|
kotak-katik
|
2017-07-26 15:12:52 +03:00
|
|
|
kotak-kotak
|
2017-07-27 15:46:30 +03:00
|
|
|
koyak-koyak
|
|
|
|
kuas-kuas
|
|
|
|
kuat-kuat
|
2017-07-26 15:12:52 +03:00
|
|
|
kubu-kubuan
|
2017-07-24 10:10:16 +03:00
|
|
|
kucar-kacir
|
2017-07-27 15:46:30 +03:00
|
|
|
kucing-kucing
|
2017-07-26 15:12:52 +03:00
|
|
|
kucing-kucingan
|
2017-07-24 10:10:16 +03:00
|
|
|
kuda-kuda
|
2017-07-27 15:46:30 +03:00
|
|
|
kuda-kudaan
|
|
|
|
kudap-kudap
|
2017-07-26 15:12:52 +03:00
|
|
|
kue-kue
|
2017-07-27 15:46:30 +03:00
|
|
|
kulah-kulah
|
2018-11-29 18:30:29 +03:00
|
|
|
kulak-kulak
|
2017-07-24 10:10:16 +03:00
|
|
|
kulik-kulik
|
2017-07-27 15:46:30 +03:00
|
|
|
kulum-kulum
|
|
|
|
kumat-kamit
|
2017-07-26 15:12:52 +03:00
|
|
|
kumpul-kumpul
|
2017-07-24 10:10:16 +03:00
|
|
|
kunang-kunang
|
|
|
|
kunar-kunar
|
2017-07-26 15:12:52 +03:00
|
|
|
kung-fu
|
|
|
|
kuning-hitam
|
2017-07-24 10:10:16 +03:00
|
|
|
kupat-kapit
|
|
|
|
kupu-kupu
|
|
|
|
kura-kura
|
2017-07-27 15:46:30 +03:00
|
|
|
kurang-kurang
|
2017-07-24 10:10:16 +03:00
|
|
|
kusat-mesat
|
|
|
|
kutat-kutet
|
2017-07-27 15:46:30 +03:00
|
|
|
kuti-kuti
|
2017-07-24 10:10:16 +03:00
|
|
|
kuwung-kuwung
|
2017-07-26 15:12:52 +03:00
|
|
|
kyai-kyai
|
2017-07-24 10:10:16 +03:00
|
|
|
laba-laba
|
|
|
|
labi-labi
|
2017-07-27 15:46:30 +03:00
|
|
|
labu-labu
|
2017-07-26 15:12:52 +03:00
|
|
|
laga-laga
|
|
|
|
lagi-lagi
|
|
|
|
lagu-lagu
|
2017-07-24 10:10:16 +03:00
|
|
|
laguh-lagah
|
2017-07-26 15:12:52 +03:00
|
|
|
lain-lain
|
|
|
|
laki-laki
|
2017-07-24 10:10:16 +03:00
|
|
|
lalu-lalang
|
2017-07-26 15:12:52 +03:00
|
|
|
lalu-lintas
|
2017-07-27 15:46:30 +03:00
|
|
|
lama-kelamaan
|
|
|
|
lama-lama
|
2017-07-24 10:10:16 +03:00
|
|
|
lamat-lamat
|
2017-07-27 15:46:30 +03:00
|
|
|
lambat-lambat
|
2017-07-26 15:12:52 +03:00
|
|
|
lampion-lampion
|
|
|
|
lampu-lampu
|
2017-07-27 15:46:30 +03:00
|
|
|
lancang-lancang
|
2017-07-24 10:10:16 +03:00
|
|
|
lancar-lancar
|
|
|
|
langak-longok
|
2017-07-27 15:46:30 +03:00
|
|
|
langgar-melanggar
|
2017-07-24 10:10:16 +03:00
|
|
|
langit-langit
|
2017-07-26 15:12:52 +03:00
|
|
|
langkah-langka
|
|
|
|
langkah-langkah
|
2017-07-27 15:46:30 +03:00
|
|
|
lanja-lanjaan
|
2017-07-26 15:12:52 +03:00
|
|
|
lapas-lapas
|
2017-07-24 10:10:16 +03:00
|
|
|
lapat-lapat
|
2017-07-26 15:12:52 +03:00
|
|
|
laporan-laporan
|
|
|
|
laptop-tablet
|
|
|
|
large-scale
|
2017-07-27 15:46:30 +03:00
|
|
|
lari-lari
|
2017-07-26 15:12:52 +03:00
|
|
|
lari-larian
|
|
|
|
laskar-laskar
|
|
|
|
lauk-pauk
|
2017-07-27 15:46:30 +03:00
|
|
|
laun-laun
|
2017-07-26 15:12:52 +03:00
|
|
|
laut-timur
|
2017-07-27 15:46:30 +03:00
|
|
|
lawah-lawah
|
|
|
|
lawak-lawak
|
2017-07-26 15:12:52 +03:00
|
|
|
lawan-lawan
|
2017-07-27 15:46:30 +03:00
|
|
|
lawi-lawi
|
2017-07-26 15:12:52 +03:00
|
|
|
layang-layang
|
2017-07-27 15:46:30 +03:00
|
|
|
layu-layuan
|
|
|
|
lebih-lebih
|
2017-07-26 15:12:52 +03:00
|
|
|
lecet-lecet
|
2017-07-24 10:10:16 +03:00
|
|
|
legak-legok
|
2017-07-27 15:46:30 +03:00
|
|
|
legum-legum
|
2017-07-24 10:10:16 +03:00
|
|
|
legup-legup
|
|
|
|
leha-leha
|
|
|
|
lekak-lekuk
|
|
|
|
lekap-lekup
|
2017-07-27 15:46:30 +03:00
|
|
|
lekas-lekas
|
|
|
|
lekat-lekat
|
2017-07-24 10:10:16 +03:00
|
|
|
lekuh-lekih
|
|
|
|
lekum-lekum
|
|
|
|
lekup-lekap
|
2017-07-26 15:12:52 +03:00
|
|
|
lembaga-lembaga
|
2017-07-27 15:46:30 +03:00
|
|
|
lempar-lemparan
|
2017-07-24 10:10:16 +03:00
|
|
|
lenggak-lenggok
|
2017-07-27 15:46:30 +03:00
|
|
|
lenggok-lenggok
|
|
|
|
lenggut-lenggut
|
|
|
|
lengket-lengket
|
2017-07-24 10:10:16 +03:00
|
|
|
lentam-lentum
|
|
|
|
lentang-lentok
|
2017-07-27 15:46:30 +03:00
|
|
|
lentang-lentung
|
|
|
|
lepa-lepa
|
|
|
|
lerang-lerang
|
|
|
|
lereng-lereng
|
2017-07-26 15:12:52 +03:00
|
|
|
lese-majeste
|
2017-07-27 15:46:30 +03:00
|
|
|
letah-letai
|
2017-07-24 10:10:16 +03:00
|
|
|
lete-lete
|
2017-07-27 15:46:30 +03:00
|
|
|
letuk-letuk
|
|
|
|
letum-letum
|
|
|
|
letup-letup
|
2017-07-26 15:12:52 +03:00
|
|
|
leyeh-leyeh
|
2017-07-27 15:46:30 +03:00
|
|
|
liang-liuk
|
|
|
|
liang-liut
|
2017-07-26 15:12:52 +03:00
|
|
|
liar-liar
|
2017-07-27 15:46:30 +03:00
|
|
|
liat-liut
|
2017-07-24 10:10:16 +03:00
|
|
|
lidah-lidah
|
2017-07-26 15:12:52 +03:00
|
|
|
life-toxins
|
|
|
|
liga-liga
|
|
|
|
light-emitting
|
|
|
|
lika-liku
|
|
|
|
lil-alamin
|
|
|
|
lilin-lilin
|
|
|
|
line-up
|
|
|
|
lintas-selat
|
2017-07-27 15:46:30 +03:00
|
|
|
lipat-melipat
|
2017-07-26 15:12:52 +03:00
|
|
|
liquid-cooled
|
|
|
|
lithium-ion
|
|
|
|
lithium-polymer
|
2017-07-24 10:10:16 +03:00
|
|
|
liuk-liuk
|
|
|
|
liung-liung
|
|
|
|
lobi-lobi
|
2017-07-26 15:12:52 +03:00
|
|
|
lock-up
|
|
|
|
locked-in
|
|
|
|
lokasi-lokasi
|
|
|
|
long-term
|
2017-07-24 10:10:16 +03:00
|
|
|
longak-longok
|
|
|
|
lontang-lanting
|
|
|
|
lontang-lantung
|
2017-07-27 15:46:30 +03:00
|
|
|
lopak-lapik
|
|
|
|
lopak-lopak
|
2017-07-26 15:12:52 +03:00
|
|
|
low-cost
|
|
|
|
low-density
|
|
|
|
low-end
|
|
|
|
low-light
|
|
|
|
low-multi
|
|
|
|
low-pass
|
|
|
|
lucu-lucu
|
|
|
|
luka-luka
|
|
|
|
lukisan-lukisan
|
2017-07-24 10:10:16 +03:00
|
|
|
lumba-lumba
|
|
|
|
lumi-lumi
|
|
|
|
luntang-lantung
|
|
|
|
lupa-lupa
|
2017-07-27 15:46:30 +03:00
|
|
|
lupa-lupaan
|
2017-07-26 15:12:52 +03:00
|
|
|
lurah-camat
|
2017-07-27 15:46:30 +03:00
|
|
|
maaf-memaafkan
|
|
|
|
mabuk-mabukan
|
|
|
|
mabul-mabul
|
|
|
|
macam-macam
|
|
|
|
macan-macanan
|
2017-07-26 15:12:52 +03:00
|
|
|
machine-to-machine
|
|
|
|
mafia-mafia
|
|
|
|
mahasiswa-mahasiswi
|
* Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf37cfcfab60be2b97d6deae949b30e9e.
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types. Fixes #2800
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese Language (#2790)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols
* Extended punctuation and norm_exceptions in the Portuguese language
* Correct error in spacy universe docs concerning spacy-lookup (#2814)
* Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example
* created contributor agreement
* baseline for Parikh model
* initial version of parikh 2016 implemented
* tested asymmetric models
* fixed grevious error in normalization
* use standard SNLI test file
* begin to rework parikh example
* initial version of running example
* start to document the new version
* start to document the new version
* Update Decompositional Attention.ipynb
* fixed calls to similarity
* updated the README
* import sys package duh
* simplified indexing on mapping word to IDs
* stupid python indent error
* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test"
This reverts commit bdebbef45552d698d390aa430b527ee27830f11b.
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->
## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results.
### Types of change
enhancements
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement
* Correct some grammatical inaccuracies in lang\ru\examples.py
* Move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian(Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentionned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so much be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor indo (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticable here.)
### Types of change
bug fix
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Ddding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
<!--- Provide a general summary of your changes in the title. -->
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
|
|
|
mahasiswa/i
|
2017-07-24 10:10:16 +03:00
|
|
|
mahi-mahi
|
2017-07-26 15:12:52 +03:00
|
|
|
main-main
|
2017-07-27 15:46:30 +03:00
|
|
|
main-mainan
|
2017-07-26 15:12:52 +03:00
|
|
|
main-mainlah
|
|
|
|
majelis-majelis
|
|
|
|
maju-mundur
|
|
|
|
makam-makam
|
2017-07-27 15:46:30 +03:00
|
|
|
makan-makan
|
|
|
|
makan-makanan
|
2017-07-26 15:12:52 +03:00
|
|
|
makanan-red
|
|
|
|
make-up
|
|
|
|
maki-maki
|
2017-07-27 15:46:30 +03:00
|
|
|
maki-makian
|
2017-07-26 15:12:52 +03:00
|
|
|
mal-mal
|
2017-07-27 15:46:30 +03:00
|
|
|
malai-malai
|
2017-07-26 15:12:52 +03:00
|
|
|
malam-malam
|
2017-07-27 15:46:30 +03:00
|
|
|
malar-malar
|
|
|
|
malas-malasan
|
2017-07-24 10:10:16 +03:00
|
|
|
mali-mali
|
2017-07-26 15:12:52 +03:00
|
|
|
malu-malu
|
|
|
|
mama-mama
|
|
|
|
man-in-the-middle
|
|
|
|
mana-mana
|
|
|
|
manajer-manajer
|
2017-07-27 15:46:30 +03:00
|
|
|
manik-manik
|
2017-07-26 15:12:52 +03:00
|
|
|
manis-manis
|
2017-07-27 15:46:30 +03:00
|
|
|
manis-manisan
|
2017-07-26 15:12:52 +03:00
|
|
|
marah-marah
|
|
|
|
mark-up
|
|
|
|
mas-mas
|
|
|
|
masa-masa
|
2017-07-27 15:46:30 +03:00
|
|
|
masak-masak
|
2017-07-26 15:12:52 +03:00
|
|
|
masalah-masalah
|
|
|
|
mash-up
|
2017-07-24 10:10:16 +03:00
|
|
|
masing-masing
|
2017-07-26 15:12:52 +03:00
|
|
|
masjid-masjid
|
|
|
|
masuk-keluar
|
2017-07-27 15:46:30 +03:00
|
|
|
mat-matan
|
2017-07-24 10:10:16 +03:00
|
|
|
mata-mata
|
2017-07-26 15:12:52 +03:00
|
|
|
match-fixing
|
|
|
|
mati-mati
|
2017-07-27 15:46:30 +03:00
|
|
|
mati-matian
|
|
|
|
maya-maya
|
2017-07-26 15:12:52 +03:00
|
|
|
mayat-mayat
|
|
|
|
mayday-mayday
|
|
|
|
media-media
|
|
|
|
mega-bintang
|
|
|
|
mega-tsunami
|
2017-07-24 10:10:16 +03:00
|
|
|
megal-megol
|
|
|
|
megap-megap
|
2017-07-27 15:46:30 +03:00
|
|
|
meger-meger
|
2017-07-24 10:10:16 +03:00
|
|
|
megrek-megrek
|
|
|
|
melak-melak
|
2017-07-27 15:46:30 +03:00
|
|
|
melambai-lambai
|
|
|
|
melambai-lambaikan
|
|
|
|
melambat-lambatkan
|
|
|
|
melaun-laun
|
|
|
|
melawak-lawak
|
|
|
|
melayang-layang
|
|
|
|
melayap-layap
|
|
|
|
melayap-layapkan
|
|
|
|
melebih-lebihi
|
|
|
|
melebih-lebihkan
|
|
|
|
melejang-lejangkan
|
|
|
|
melek-melekan
|
|
|
|
meleleh-leleh
|
|
|
|
melengah-lengah
|
|
|
|
melihat-lihat
|
|
|
|
melimpah-limpah
|
|
|
|
melincah-lincah
|
|
|
|
meliuk-liuk
|
|
|
|
melolong-lolong
|
|
|
|
melompat-lompat
|
|
|
|
meloncat-loncat
|
|
|
|
melonco-lonco
|
|
|
|
melongak-longok
|
|
|
|
melonjak-lonjak
|
|
|
|
memacak-macak
|
|
|
|
memada-madai
|
|
|
|
memadan-madan
|
|
|
|
memaki-maki
|
|
|
|
memaksa-maksa
|
|
|
|
memanas-manasi
|
|
|
|
memancit-mancitkan
|
|
|
|
memandai-mandai
|
|
|
|
memanggil-manggil
|
|
|
|
memanis-manis
|
|
|
|
memanjut-manjut
|
💫 Port master changes over to develop (#2979)
2018-11-29 18:30:29 +03:00
|
|
|
memantas-mantas
|
2017-07-27 15:46:30 +03:00
|
|
|
memasak-masak
|
2017-07-26 15:12:52 +03:00
|
|
|
memata-matai
|
2017-07-27 15:46:30 +03:00
|
|
|
mematah-matah
|
|
|
|
mematuk-matuk
|
|
|
|
mematut-matut
|
|
|
|
memau-mau
|
💫 Port master changes over to develop (#2979)
2018-11-29 18:30:29 +03:00
|
|
|
memayah-mayahkan
|
2017-07-27 15:46:30 +03:00
|
|
|
membaca-baca
|
|
|
|
membacah-bacah
|
|
|
|
membagi-bagikan
|
|
|
|
membalik-balik
|
|
|
|
membangkit-bangkit
|
|
|
|
membarut-barut
|
|
|
|
membawa-bawa
|
|
|
|
membayang-bayangi
|
|
|
|
membayang-bayangkan
|
|
|
|
membeda-bedakan
|
|
|
|
membelai-belai
|
|
|
|
membeli-beli
|
|
|
|
membelit-belitkan
|
|
|
|
membelu-belai
|
|
|
|
membenar-benar
|
|
|
|
membenar-benari
|
|
|
|
memberai-beraikan
|
|
|
|
membesar-besar
|
|
|
|
membesar-besarkan
|
|
|
|
membikin-bikin
|
|
|
|
membilah-bilah
|
|
|
|
membolak-balikkan
|
|
|
|
membongkar-bangkir
|
|
|
|
membongkar-bongkar
|
2017-07-26 15:12:52 +03:00
|
|
|
membuang-buang
|
2017-07-27 15:46:30 +03:00
|
|
|
membuat-buat
|
|
|
|
membulan-bulani
|
|
|
|
membunga-bungai
|
|
|
|
membungkuk-bungkuk
|
|
|
|
memburu-buru
|
|
|
|
memburu-burukan
|
|
|
|
memburuk-burukkan
|
|
|
|
memelintir-melintir
|
|
|
|
memencak-mencak
|
|
|
|
memencar-mencar
|
|
|
|
memercik-mercik
|
|
|
|
memetak-metak
|
|
|
|
memetang-metangkan
|
|
|
|
memetir-metir
|
|
|
|
memijar-mijar
|
|
|
|
memikir-mikir
|
|
|
|
memikir-mikirkan
|
|
|
|
memilih-milih
|
|
|
|
memilin-milin
|
|
|
|
meminang-minang
|
|
|
|
meminta-minta
|
|
|
|
memisah-misahkan
|
|
|
|
memontang-mantingkan
|
|
|
|
memorak-perandakan
|
|
|
|
memorak-porandakan
|
|
|
|
memotong-motong
|
|
|
|
memperamat-amat
|
|
|
|
memperamat-amatkan
|
|
|
|
memperbagai-bagaikan
|
|
|
|
memperganda-gandakan
|
|
|
|
memperganduh-ganduhkan
|
|
|
|
memperimpit-impitkan
|
|
|
|
memperkuda-kudakan
|
|
|
|
memperlengah-lengah
|
|
|
|
memperlengah-lengahkan
|
|
|
|
mempermacam-macamkan
|
|
|
|
memperolok-olok
|
|
|
|
memperolok-olokkan
|
|
|
|
mempersama-samakan
|
|
|
|
mempertubi-tubi
|
|
|
|
mempertubi-tubikan
|
|
|
|
memperturut-turutkan
|
|
|
|
memuja-muja
|
|
|
|
memukang-mukang
|
|
|
|
memulun-mulun
|
|
|
|
memundi-mundi
|
|
|
|
memundi-mundikan
|
|
|
|
memutar-mutar
|
|
|
|
memuyu-muyu
|
2017-07-26 15:12:52 +03:00
|
|
|
men-tweet
|
2017-07-27 15:46:30 +03:00
|
|
|
menagak-nagak
|
|
|
|
menakut-nakuti
|
2017-07-26 15:12:52 +03:00
|
|
|
menang-kalah
|
2017-07-27 15:46:30 +03:00
|
|
|
menanjur-nanjur
|
2017-07-24 10:10:16 +03:00
|
|
|
menanti-nanti
|
2017-07-26 15:12:52 +03:00
|
|
|
menari-nari
|
2017-07-27 15:46:30 +03:00
|
|
|
mencabik-cabik
|
|
|
|
mencabik-cabikkan
|
|
|
|
mencacah-cacah
|
|
|
|
mencaing-caing
|
|
|
|
mencak-mencak
|
|
|
|
mencakup-cakup
|
|
|
|
mencapak-capak
|
|
|
|
mencari-cari
|
|
|
|
mencarik-carik
|
|
|
|
mencarik-carikkan
|
|
|
|
mencarut-carut
|
|
|
|
mencengis-cengis
|
|
|
|
mencepak-cepak
|
|
|
|
mencepuk-cepuk
|
|
|
|
mencerai-beraikan
|
|
|
|
mencetai-cetai
|
|
|
|
menciak-ciak
|
|
|
|
menciap-ciap
|
|
|
|
menciar-ciar
|
|
|
|
mencita-citakan
|
|
|
|
mencium-cium
|
|
|
|
menciut-ciut
|
2017-07-24 10:10:16 +03:00
|
|
|
mencla-mencle
|
2017-07-27 15:46:30 +03:00
|
|
|
mencoang-coang
|
|
|
|
mencoba-coba
|
|
|
|
mencocok-cocok
|
|
|
|
mencolek-colek
|
|
|
|
menconteng-conteng
|
|
|
|
mencubit-cubit
|
|
|
|
mencucuh-cucuh
|
|
|
|
mencucuh-cucuhkan
|
|
|
|
mencuri-curi
|
|
|
|
mendecap-decap
|
|
|
|
mendegam-degam
|
|
|
|
mendengar-dengar
|
|
|
|
mendengking-dengking
|
|
|
|
mendengus-dengus
|
|
|
|
mendengut-dengut
|
|
|
|
menderai-deraikan
|
|
|
|
menderak-derakkan
|
|
|
|
menderau-derau
|
|
|
|
menderu-deru
|
|
|
|
mendesas-desuskan
|
|
|
|
mendesus-desus
|
|
|
|
mendetap-detap
|
|
|
|
mendewa-dewakan
|
|
|
|
mendudu-dudu
|
|
|
|
menduga-duga
|
|
|
|
menebu-nebu
|
|
|
|
menegur-neguri
|
|
|
|
menepak-nepak
|
|
|
|
menepak-nepakkan
|
|
|
|
mengabung-ngabung
|
|
|
|
mengaci-acikan
|
|
|
|
mengacu-acu
|
|
|
|
mengada-ada
|
2017-07-26 15:12:52 +03:00
|
|
|
mengada-ngada
|
2017-07-27 15:46:30 +03:00
|
|
|
mengadang-adangi
|
|
|
|
mengaduk-aduk
|
|
|
|
mengagak-agak
|
|
|
|
mengagak-agihkan
|
|
|
|
mengagut-agut
|
|
|
|
mengais-ngais
|
|
|
|
mengalang-alangi
|
|
|
|
mengali-ali
|
|
|
|
mengalur-alur
|
|
|
|
mengamang-amang
|
|
|
|
mengamat-amati
|
|
|
|
mengambai-ambaikan
|
|
|
|
mengambang-ambang
|
|
|
|
mengambung-ambung
|
|
|
|
mengambung-ambungkan
|
|
|
|
mengamit-ngamitkan
|
|
|
|
mengancai-ancaikan
|
|
|
|
mengancak-ancak
|
|
|
|
mengancar-ancar
|
|
|
|
mengangan-angan
|
|
|
|
mengangan-angankan
|
|
|
|
mengangguk-angguk
|
|
|
|
menganggut-anggut
|
|
|
|
mengangin-anginkan
|
|
|
|
mengangkat-angkat
|
|
|
|
menganjung-anjung
|
|
|
|
menganjung-anjungkan
|
|
|
|
mengap-mengap
|
|
|
|
mengapa-apai
|
|
|
|
mengapi-apikan
|
|
|
|
mengarah-arahi
|
|
|
|
mengarang-ngarang
|
|
|
|
mengata-ngatai
|
|
|
|
mengatup-ngatupkan
|
|
|
|
mengaum-aum
|
|
|
|
mengaum-aumkan
|
|
|
|
mengejan-ejan
|
|
|
|
mengejar-ngejar
|
|
|
|
mengejut-ngejuti
|
|
|
|
mengelai-ngelai
|
|
|
|
mengelepik-ngelepik
|
|
|
|
mengelip-ngelip
|
|
|
|
mengelu-elukan
|
|
|
|
mengelus-elus
|
|
|
|
mengembut-embut
|
|
|
|
mengempas-empaskan
|
|
|
|
mengenap-enapkan
|
|
|
|
mengendap-endap
|
|
|
|
mengenjak-enjak
|
|
|
|
mengentak-entak
|
|
|
|
mengentak-entakkan
|
|
|
|
mengepak-ngepak
|
|
|
|
mengepak-ngepakkan
|
|
|
|
mengepal-ngepalkan
|
|
|
|
mengerjap-ngerjap
|
|
|
|
mengerling-ngerling
|
|
|
|
mengertak-ngertakkan
|
|
|
|
mengesot-esot
|
|
|
|
menggaba-gabai
|
|
|
|
menggali-gali
|
|
|
|
menggalur-galur
|
|
|
|
menggamak-gamak
|
|
|
|
menggamit-gamitkan
|
|
|
|
menggapai-gapai
|
|
|
|
menggapai-gapaikan
|
|
|
|
menggaruk-garuk
|
|
|
|
menggebu-gebu
|
|
|
|
menggebyah-uyah
|
|
|
|
menggeleng-gelengkan
|
|
|
|
menggelepar-gelepar
|
|
|
|
menggelepar-geleparkan
|
|
|
|
menggeliang-geliutkan
|
|
|
|
menggelinding-gelinding
|
|
|
|
menggemak-gemak
|
|
|
|
menggembar-gemborkan
|
|
|
|
menggerak-gerakkan
|
|
|
|
menggerecak-gerecak
|
|
|
|
menggesa-gesakan
|
|
|
|
menggili-gili
|
|
|
|
menggodot-godot
|
|
|
|
menggolak-galikkan
|
|
|
|
menggorek-gorek
|
|
|
|
menggoreng-goreng
|
|
|
|
menggosok-gosok
|
|
|
|
menggoyang-goyangkan
|
|
|
|
mengguit-guit
|
|
|
|
menghalai-balaikan
|
|
|
|
menghalang-halangi
|
|
|
|
menghambur-hamburkan
|
|
|
|
menghinap-hinap
|
|
|
|
menghitam-memutihkan
|
|
|
|
menghitung-hitung
|
|
|
|
menghubung-hubungkan
|
|
|
|
menghujan-hujankan
|
|
|
|
mengiang-ngiang
|
|
|
|
mengibar-ngibarkan
|
|
|
|
mengibas-ngibas
|
|
|
|
mengibas-ngibaskan
|
|
|
|
mengidam-idamkan
|
|
|
|
mengilah-ngilahkan
|
|
|
|
mengilai-ilai
|
|
|
|
mengilat-ngilatkan
|
|
|
|
mengilik-ngilik
|
|
|
|
mengimak-imak
|
|
|
|
mengimbak-imbak
|
|
|
|
mengiming-iming
|
|
|
|
mengincrit-incrit
|
|
|
|
mengingat-ingat
|
|
|
|
menginjak-injak
|
|
|
|
mengipas-ngipas
|
|
|
|
mengira-ngira
|
|
|
|
mengira-ngirakan
|
|
|
|
mengiras-iras
|
|
|
|
mengiras-irasi
|
|
|
|
mengiris-iris
|
|
|
|
mengitar-ngitar
|
|
|
|
mengitik-ngitik
|
|
|
|
mengodol-odol
|
|
|
|
mengogok-ogok
|
|
|
|
mengolak-alik
|
|
|
|
mengolak-alikkan
|
|
|
|
mengolang-aling
|
|
|
|
mengolang-alingkan
|
|
|
|
mengoleng-oleng
|
2017-07-26 15:12:52 +03:00
|
|
|
mengolok-olok
|
2017-07-27 15:46:30 +03:00
|
|
|
mengombang-ambing
|
|
|
|
mengombang-ambingkan
|
|
|
|
mengongkang-ongkang
|
|
|
|
mengongkok-ongkok
|
|
|
|
mengonyah-anyih
|
|
|
|
mengopak-apik
|
|
|
|
mengorak-arik
|
|
|
|
mengorat-oret
|
|
|
|
mengorek-ngorek
|
|
|
|
mengoret-oret
|
|
|
|
mengorok-orok
|
|
|
|
mengotak-atik
|
|
|
|
mengotak-ngatikkan
|
|
|
|
mengotak-ngotakkan
|
|
|
|
mengoyak-ngoyak
|
|
|
|
mengoyak-ngoyakkan
|
|
|
|
mengoyak-oyak
|
|
|
|
menguar-nguarkan
|
|
|
|
menguar-uarkan
|
2017-07-26 15:12:52 +03:00
|
|
|
mengubah-ubah
|
2017-07-27 15:46:30 +03:00
|
|
|
mengubek-ubek
|
|
|
|
menguber-uber
|
|
|
|
mengubit-ubit
|
|
|
|
mengubrak-abrik
|
|
|
|
mengucar-ngacirkan
|
|
|
|
mengucek-ngucek
|
|
|
|
mengucek-ucek
|
|
|
|
menguik-uik
|
|
|
|
menguis-uis
|
|
|
|
mengulang-ulang
|
|
|
|
mengulas-ulas
|
|
|
|
mengulit-ulit
|
|
|
|
mengulum-ngulum
|
|
|
|
mengulur-ulur
|
|
|
|
menguman-uman
|
|
|
|
mengumbang-ambingkan
|
|
|
|
mengumpak-umpak
|
|
|
|
mengungkat-ungkat
|
|
|
|
mengungkit-ungkit
|
|
|
|
mengupa-upa
|
|
|
|
mengurik-urik
|
|
|
|
mengusil-usil
|
|
|
|
mengusil-usilkan
|
|
|
|
mengutak-atik
|
|
|
|
mengutak-ngatikkan
|
|
|
|
mengutik-ngutik
|
|
|
|
mengutik-utik
|
|
|
|
menika-nika
|
|
|
|
menimang-nimang
|
|
|
|
menimbang-nimbang
|
|
|
|
menimbun-nimbun
|
|
|
|
menimpang-nimpangkan
|
|
|
|
meningkat-ningkat
|
|
|
|
meniru-niru
|
2017-07-26 15:12:52 +03:00
|
|
|
menit-menit
|
2017-07-27 15:46:30 +03:00
|
|
|
menitar-nitarkan
|
|
|
|
meniup-niup
|
|
|
|
menjadi-jadi
|
|
|
|
menjadi-jadikan
|
|
|
|
menjedot-jedotkan
|
|
|
|
menjelek-jelekkan
|
|
|
|
menjengek-jengek
|
|
|
|
menjengit-jengit
|
|
|
|
menjerit-jerit
|
|
|
|
menjilat-jilat
|
|
|
|
menjungkat-jungkit
|
2017-07-26 15:12:52 +03:00
|
|
|
menko-menko
|
|
|
|
menlu-menlu
|
2017-07-27 15:46:30 +03:00
|
|
|
menonjol-nonjolkan
|
2017-07-26 15:12:52 +03:00
|
|
|
mentah-mentah
|
2017-07-27 15:46:30 +03:00
|
|
|
mentang-mentang
|
2017-07-26 15:12:52 +03:00
|
|
|
menteri-menteri
|
2017-07-24 10:10:16 +03:00
|
|
|
mentul-mentul
|
2017-07-27 15:46:30 +03:00
|
|
|
menuding-nuding
|
|
|
|
menumpah-numpahkan
|
|
|
|
menunda-nunda
|
|
|
|
menunduk-nunduk
|
|
|
|
menusuk-nusuk
|
|
|
|
menyala-nyala
|
|
|
|
menyama-nyama
|
|
|
|
menyama-nyamai
|
|
|
|
menyambar-nyambar
|
|
|
|
menyangkut-nyangkutkan
|
|
|
|
menyanjung-nyanjung
|
|
|
|
menyanjung-nyanjungkan
|
|
|
|
menyapu-nyapu
|
|
|
|
menyarat-nyarat
|
|
|
|
menyayat-nyayat
|
|
|
|
menyedang-nyedang
|
|
|
|
menyedang-nyedangkan
|
|
|
|
menyelang-nyelangkan
|
|
|
|
menyelang-nyeling
|
|
|
|
menyelang-nyelingkan
|
|
|
|
menyenak-nyenak
|
|
|
|
menyendi-nyendi
|
|
|
|
menyentak-nyentak
|
|
|
|
menyentuh-nyentuh
|
|
|
|
menyepak-nyepakkan
|
|
|
|
menyerak-nyerakkan
|
|
|
|
menyeret-nyeret
|
|
|
|
menyeru-nyerukan
|
|
|
|
menyetel-nyetel
|
|
|
|
menyia-nyiakan
|
|
|
|
menyibak-nyibak
|
|
|
|
menyobek-nyobek
|
|
|
|
menyorong-nyorongkan
|
|
|
|
menyungguh-nyungguhi
|
|
|
|
menyuruk-nyuruk
|
|
|
|
meraba-raba
|
2017-07-26 15:12:52 +03:00
|
|
|
merah-hitam
|
|
|
|
merah-merah
|
2017-07-27 15:46:30 +03:00
|
|
|
merambang-rambang
|
|
|
|
merangkak-rangkak
|
|
|
|
merasa-rasai
|
|
|
|
merata-ratakan
|
|
|
|
meraung-raung
|
|
|
|
meraung-raungkan
|
|
|
|
merayau-rayau
|
|
|
|
merayu-rayu
|
2017-07-24 10:10:16 +03:00
|
|
|
mercak-mercik
|
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688)
Typo fix: "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option
This is necessary to allow testing an installed spacy by running:
pytest --pyargs spacy
* Add contributor agreement
* update bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Less norm computations in token similarity
* Contributor agreement
* Remove ')' for clarity (#2737)
Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intention than please let me know.
* added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement
* Added polish version of english lex_attrs
* Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list
* add exception token
* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception
* add tokenizer exceptions list
* combining base_norms with norm_exceptions
* adding norm_exception
* fix double key in lemmatizer
* remove unused import on punctuation.py
* reformat stop_words to reduce number of lines, improve readibility
* updating tokenizer exception
* implement is_currency for lang/id
* adding orth_first_upper in tokenizer_exceptions
* update the norm_exception list
* remove bunch of abbreviations
* adding contributors file
* Fixed spaCy+Keras example (#2763)
* bug fixes in keras example
* created contributor agreement
* Adding French hyphenated first name (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
* Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.
* Adding contributor agreement
* Updating contributor agreement
* Also include lowercase norm exceptions
* Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
* added spaCy Contributor Agreement
* Add charlax's contributor agreement (#2805)
* agreement of contributor, may I introduce a tiny pl languge contribution (#2799)
* Contributors agreement
* Contributors agreement
* Contributors agreement
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf37cfcfab60be2b97d6deae949b30e9e.
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types. Fixes #2800
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese Language (#2790)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols
* Extended punctuation and norm_exceptions in the Portuguese language
* Correct error in spacy universe docs concerning spacy-lookup (#2814)
* Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example
* created contributor agreement
* baseline for Parikh model
* initial version of parikh 2016 implemented
* tested asymmetric models
* fixed grevious error in normalization
* use standard SNLI test file
* begin to rework parikh example
* initial version of running example
* start to document the new version
* start to document the new version
* Update Decompositional Attention.ipynb
* fixed calls to similarity
* updated the README
* import sys package duh
* simplified indexing on mapping word to IDs
* stupid python indent error
* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test"
This reverts commit bdebbef45552d698d390aa430b527ee27830f11b.
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->
## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results.
### Types of change
enhancements
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement
* Correct some grammatical inaccuracies in lang\ru\examples.py
* Move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian(Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
## Description
Add a rule-based French Lemmatizer following the English one and the excellent PR for [Greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e. PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentioned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
## Description
### Types of change
## Checklist
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor indo (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticeable here).
### Types of change
bug fix
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Ddding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
### Types of change
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
|
|
|
mercedes-benz
|
2017-07-26 15:12:52 +03:00
|
|
|
merek-merek
|
|
|
|
mereka-mereka
|
2017-07-27 15:46:30 +03:00
|
|
|
mereka-reka
|
|
|
|
merelap-relap
|
|
|
|
merem-merem
|
|
|
|
meremah-remah
|
|
|
|
meremas-remas
|
|
|
|
meremeh-temehkan
|
|
|
|
merempah-rempah
|
|
|
|
merempah-rempahi
|
|
|
|
merengek-rengek
|
|
|
|
merengeng-rengeng
|
|
|
|
merenik-renik
|
|
|
|
merenta-renta
|
|
|
|
merenyai-renyai
|
|
|
|
meresek-resek
|
|
|
|
merintang-rintang
|
|
|
|
merintik-rintik
|
|
|
|
merobek-robek
|
|
|
|
meronta-ronta
|
|
|
|
meruap-ruap
|
|
|
|
merubu-rubu
|
|
|
|
merungus-rungus
|
|
|
|
merungut-rungut
|
2017-07-26 15:12:52 +03:00
|
|
|
meta-analysis
|
|
|
|
metode-metode
|
2017-07-27 15:46:30 +03:00
|
|
|
mewanti-wanti
|
|
|
|
mewarna-warnikan
|
|
|
|
meyakin-yakini
|
2017-07-26 15:12:52 +03:00
|
|
|
mid-range
|
|
|
|
mid-size
|
2017-07-27 15:46:30 +03:00
|
|
|
miju-miju
|
2017-07-26 15:12:52 +03:00
|
|
|
mikro-kecil
|
|
|
|
mimpi-mimpi
|
|
|
|
minggu-minggu
|
2017-07-27 15:46:30 +03:00
|
|
|
minta-minta
|
2017-07-26 15:12:52 +03:00
|
|
|
minuman-minuman
|
|
|
|
mixed-use
|
|
|
|
mobil-mobil
|
|
|
|
mobile-first
|
|
|
|
mobile-friendly
|
2017-07-27 15:46:30 +03:00
|
|
|
moga-moga
|
2017-07-26 15:12:52 +03:00
|
|
|
mola-mola
|
|
|
|
momen-momen
|
2017-07-24 10:10:16 +03:00
|
|
|
mondar-mandir
|
2017-07-26 15:12:52 +03:00
|
|
|
monyet-monyet
|
2017-07-27 15:46:30 +03:00
|
|
|
morak-marik
|
2017-07-24 10:10:16 +03:00
|
|
|
morat-marit
|
2017-07-26 15:12:52 +03:00
|
|
|
move-on
|
|
|
|
muda-muda
|
|
|
|
muda-mudi
|
2018-11-29 18:30:29 +03:00
|
|
|
muda/i
|
2017-07-26 15:12:52 +03:00
|
|
|
mudah-mudahan
|
|
|
|
muka-muka
|
2017-07-27 15:46:30 +03:00
|
|
|
mula-mula
|
2017-07-26 15:12:52 +03:00
|
|
|
multiple-output
|
|
|
|
muluk-muluk
|
2017-07-27 15:46:30 +03:00
|
|
|
mulut-mulutan
|
2017-07-26 15:12:52 +03:00
|
|
|
mumi-mumi
|
|
|
|
mundur-mundur
|
|
|
|
muntah-muntah
|
|
|
|
murid-muridnya
|
|
|
|
musda-musda
|
|
|
|
museum-museum
|
|
|
|
muslim-muslimah
|
|
|
|
musuh-musuh
|
|
|
|
musuh-musuhnya
|
|
|
|
nabi-nabi
|
2017-07-27 15:46:30 +03:00
|
|
|
nada-nadanya
|
|
|
|
naga-naga
|
|
|
|
naga-naganya
|
2017-07-26 15:12:52 +03:00
|
|
|
naik-naik
|
|
|
|
naik-turun
|
2017-07-27 15:46:30 +03:00
|
|
|
nakal-nakalan
|
2017-07-26 15:12:52 +03:00
|
|
|
nama-nama
|
2017-07-27 15:46:30 +03:00
|
|
|
nanti-nantian
|
2017-07-26 15:12:52 +03:00
|
|
|
nanya-nanya
|
2017-07-24 10:10:16 +03:00
|
|
|
nasi-nasi
|
2017-07-27 15:46:30 +03:00
|
|
|
nasib-nasiban
|
2017-07-26 15:12:52 +03:00
|
|
|
near-field
|
|
|
|
negara-negara
|
|
|
|
negera-negara
|
|
|
|
negeri-negeri
|
|
|
|
negeri-red
|
2017-07-24 10:10:16 +03:00
|
|
|
neka-neka
|
2017-07-27 15:46:30 +03:00
|
|
|
nekat-nekat
|
2017-07-26 15:12:52 +03:00
|
|
|
neko-neko
|
|
|
|
nenek-nenek
|
|
|
|
neo-liberalisme
|
|
|
|
next-gen
|
|
|
|
next-generation
|
2017-07-27 15:46:30 +03:00
|
|
|
ngeang-ngeang
|
2017-07-26 15:12:52 +03:00
|
|
|
ngeri-ngeri
|
|
|
|
nggak-nggak
|
|
|
|
ngobrol-ngobrol
|
|
|
|
ngumpul-ngumpul
|
|
|
|
nilai-nilai
|
|
|
|
nine-dash
|
|
|
|
nipa-nipa
|
2017-07-24 10:10:16 +03:00
|
|
|
nong-nong
|
2017-07-26 15:12:52 +03:00
|
|
|
norma-norma
|
|
|
|
novel-novel
|
2017-07-27 15:46:30 +03:00
|
|
|
nyai-nyai
|
|
|
|
nyolong-nyolong
|
|
|
|
nyut-nyutan
|
2017-07-26 15:12:52 +03:00
|
|
|
ob-gyn
|
|
|
|
obat-obat
|
|
|
|
obat-obatan
|
|
|
|
objek-objek
|
|
|
|
obok-obok
|
|
|
|
obrak-abrik
|
|
|
|
octa-core
|
|
|
|
odong-odong
|
|
|
|
oedipus-kompleks
|
|
|
|
off-road
|
2017-07-24 10:10:16 +03:00
|
|
|
ogah-agih
|
|
|
|
ogah-ogah
|
2017-07-27 15:46:30 +03:00
|
|
|
ogah-ogahan
|
2017-07-24 10:10:16 +03:00
|
|
|
ogak-agik
|
|
|
|
ogak-ogak
|
2017-07-26 15:12:52 +03:00
|
|
|
ogoh-ogoh
|
2017-07-24 10:10:16 +03:00
|
|
|
olak-alik
|
|
|
|
olak-olak
|
|
|
|
olang-aling
|
2017-07-27 15:46:30 +03:00
|
|
|
olang-alingan
|
2017-07-26 15:12:52 +03:00
|
|
|
ole-ole
|
2017-07-24 10:10:16 +03:00
|
|
|
oleh-oleh
|
2017-07-26 15:12:52 +03:00
|
|
|
olok-olok
|
2017-07-27 15:46:30 +03:00
|
|
|
olok-olokan
|
2017-07-24 10:10:16 +03:00
|
|
|
olong-olong
|
2017-07-26 15:12:52 +03:00
|
|
|
om-om
|
2017-07-24 10:10:16 +03:00
|
|
|
ombang-ambing
|
2017-07-26 15:12:52 +03:00
|
|
|
omni-channel
|
|
|
|
on-board
|
|
|
|
on-demand
|
|
|
|
on-fire
|
|
|
|
on-line
|
|
|
|
on-off
|
|
|
|
on-premises
|
|
|
|
on-roll
|
|
|
|
on-screen
|
|
|
|
on-the-go
|
2017-07-24 10:10:16 +03:00
|
|
|
onde-onde
|
|
|
|
ondel-ondel
|
2017-07-27 15:46:30 +03:00
|
|
|
ondos-ondos
|
2017-07-26 15:12:52 +03:00
|
|
|
one-click
|
|
|
|
one-to-one
|
|
|
|
one-touch
|
|
|
|
one-two
|
2017-07-24 10:10:16 +03:00
|
|
|
oneng-oneng
|
2017-07-27 15:46:30 +03:00
|
|
|
ongkang-ongkang
|
2017-07-24 10:10:16 +03:00
|
|
|
ongol-ongol
|
2017-07-26 15:12:52 +03:00
|
|
|
online-to-offline
|
2017-07-24 10:10:16 +03:00
|
|
|
ontran-ontran
|
|
|
|
onyah-anyih
|
|
|
|
onyak-anyik
|
|
|
|
opak-apik
|
2017-07-26 15:12:52 +03:00
|
|
|
opsi-opsi
|
|
|
|
opt-in
|
2017-07-24 10:10:16 +03:00
|
|
|
orak-arik
|
|
|
|
orang-aring
|
2017-07-26 15:12:52 +03:00
|
|
|
orang-orang
|
2017-07-27 15:46:30 +03:00
|
|
|
orang-orangan
|
2017-07-24 10:10:16 +03:00
|
|
|
orat-oret
|
2017-07-26 15:12:52 +03:00
|
|
|
organisasi-organisasi
|
|
|
|
ormas-ormas
|
2017-07-24 10:10:16 +03:00
|
|
|
orok-orok
|
|
|
|
orong-orong
|
2017-07-26 15:12:52 +03:00
|
|
|
oseng-oseng
|
2017-07-24 10:10:16 +03:00
|
|
|
otak-atik
|
|
|
|
otak-otak
|
2017-07-27 15:46:30 +03:00
|
|
|
otak-otakan
|
2017-07-26 15:12:52 +03:00
|
|
|
over-heating
|
|
|
|
over-the-air
|
|
|
|
over-the-top
|
|
|
|
pa-pa
|
|
|
|
pabrik-pabrik
|
2017-07-27 15:46:30 +03:00
|
|
|
padi-padian
|
2017-07-26 15:12:52 +03:00
|
|
|
pagi-pagi
|
|
|
|
pagi-sore
|
|
|
|
pajak-pajak
|
|
|
|
paket-paket
|
2017-07-24 10:10:16 +03:00
|
|
|
palas-palas
|
|
|
|
palato-alveolar
|
2017-07-27 15:46:30 +03:00
|
|
|
paling-paling
|
2017-07-26 15:12:52 +03:00
|
|
|
palu-arit
|
2017-07-27 15:46:30 +03:00
|
|
|
palu-memalu
|
2017-07-26 15:12:52 +03:00
|
|
|
panas-dingin
|
2017-07-27 15:46:30 +03:00
|
|
|
panas-panas
|
2017-07-26 15:12:52 +03:00
|
|
|
pandai-pandai
|
2017-07-27 15:46:30 +03:00
|
|
|
pandang-memandang
|
2017-07-26 15:12:52 +03:00
|
|
|
panel-panel
|
|
|
|
pangeran-pangeran
|
|
|
|
panggung-panggung
|
|
|
|
pangkalan-pangkalan
|
|
|
|
panja-panja
|
|
|
|
panji-panji
|
|
|
|
pansus-pansus
|
|
|
|
pantai-pantai
|
2017-07-24 10:10:16 +03:00
|
|
|
pao-pao
|
2017-07-27 15:46:30 +03:00
|
|
|
para-para
|
2017-07-24 10:10:16 +03:00
|
|
|
parang-parang
|
2017-07-26 15:12:52 +03:00
|
|
|
parpol-parpol
|
|
|
|
partai-partai
|
|
|
|
paru-paru
|
|
|
|
pas-pasan
|
|
|
|
pasal-pasal
|
2017-07-27 15:46:30 +03:00
|
|
|
pasang-memasang
|
2017-07-26 15:12:52 +03:00
|
|
|
pasang-surut
|
|
|
|
pasar-pasar
|
2017-07-24 10:10:16 +03:00
|
|
|
pasu-pasu
|
2017-07-26 15:12:52 +03:00
|
|
|
paus-paus
|
2017-07-27 15:46:30 +03:00
|
|
|
paut-memaut
|
2017-07-26 15:12:52 +03:00
|
|
|
pay-per-click
|
2017-07-24 10:10:16 +03:00
|
|
|
paya-paya
|
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688)
Typo fix: "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option
This is necessary to allow testing an installed spacy by running:
pytest --pyargs spacy
* Add contributor agreement
* update bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Less norm computations in token similarity
* Contributor agreement
* Remove ')' for clarity (#2737)
Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intentional then please let me know.
* added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement
* Added polish version of english lex_attrs
* Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list
* add exception token
* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception
* add tokenizer exceptions list
* combining base_norms with norm_exceptions
* adding norm_exception
* fix double key in lemmatizer
* remove unused import on punctuation.py
* reformat stop_words to reduce number of lines, improve readability
* updating tokenizer exception
* implement is_currency for lang/id
* adding orth_first_upper in tokenizer_exceptions
* update the norm_exception list
* remove bunch of abbreviations
* adding contributors file
* Fixed spaCy+Keras example (#2763)
* bug fixes in keras example
* created contributor agreement
* Adding French hyphenated first name (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
* Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.
* Adding contributor agreement
* Updating contributor agreement
* Also include lowercase norm exceptions
* Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
* added spaCy Contributor Agreement
* Add charlax's contributor agreement (#2805)
* agreement of contributor, may I introduce a tiny pl languge contribution (#2799)
* Contributors agreement
* Contributors agreement
* Contributors agreement
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf37cfcfab60be2b97d6deae949b30e9e.
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types. Fixes #2800
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese Language (#2790)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols
* Extended punctuation and norm_exceptions in the Portuguese language
* Correct error in spacy universe docs concerning spacy-lookup (#2814)
* Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example
* created contributor agreement
* baseline for Parikh model
* initial version of parikh 2016 implemented
* tested asymmetric models
* fixed grievous error in normalization
* use standard SNLI test file
* begin to rework parikh example
* initial version of running example
* start to document the new version
* start to document the new version
* Update Decompositional Attention.ipynb
* fixed calls to similarity
* updated the README
* import sys package duh
* simplified indexing on mapping word to IDs
* stupid python indent error
* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test"
This reverts commit bdebbef45552d698d390aa430b527ee27830f11b.
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->
## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experience slow and unsatisfying results.
### Types of change
enhancements
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement
* Correct some grammatical inaccuracies in lang\ru\examples.py
* Move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian(Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentioned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor indo (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticeable here.)
### Types of change
bug fix
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Adding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
<!--- Provide a general summary of your changes in the title. -->
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
|
|
|
pdi-p
|
2017-07-26 15:12:52 +03:00
|
|
|
pecah-pecah
|
|
|
|
pecat-pecatan
|
|
|
|
peer-to-peer
|
|
|
|
pejabat-pejabat
|
2017-07-27 15:46:30 +03:00
|
|
|
pekak-pekak
|
|
|
|
pekik-pekuk
|
2017-07-26 15:12:52 +03:00
|
|
|
pelabuhan-pelabuhan
|
|
|
|
pelacur-pelacur
|
|
|
|
pelajar-pelajar
|
|
|
|
pelan-pelan
|
|
|
|
pelangi-pelangi
|
|
|
|
pem-bully
|
|
|
|
pemain-pemain
|
2017-07-27 15:46:30 +03:00
|
|
|
pemata-mataan
|
2017-07-26 15:12:52 +03:00
|
|
|
pemda-pemda
|
|
|
|
pemeluk-pemeluknya
|
|
|
|
pemerintah-pemerintah
|
|
|
|
pemerintah-red
|
|
|
|
pemerintah-swasta
|
2017-07-27 15:46:30 +03:00
|
|
|
pemetang-metangan
|
2017-07-26 15:12:52 +03:00
|
|
|
pemilu-pemilu
|
|
|
|
pemimpin-pemimpin
|
2017-07-27 15:46:30 +03:00
|
|
|
peminta-minta
|
2017-07-26 15:12:52 +03:00
|
|
|
pemuda-pemuda
|
|
|
|
pemuda-pemudi
|
|
|
|
penanggung-jawab
|
2017-07-27 15:46:30 +03:00
|
|
|
pengali-ali
|
2017-07-26 15:12:52 +03:00
|
|
|
pengaturan-pengaturan
|
2017-07-27 15:46:30 +03:00
|
|
|
penggembar-gemboran
|
|
|
|
pengorak-arik
|
|
|
|
pengotak-ngotakan
|
|
|
|
pengundang-undang
|
2017-07-26 15:12:52 +03:00
|
|
|
pengusaha-pengusaha
|
2017-07-27 15:46:30 +03:00
|
|
|
pentung-pentungan
|
2017-07-26 15:12:52 +03:00
|
|
|
penyakit-penyakit
|
2017-07-24 10:10:16 +03:00
|
|
|
perak-perak
|
2017-07-26 15:12:52 +03:00
|
|
|
perang-perangan
|
2017-07-24 10:10:16 +03:00
|
|
|
peras-perus
|
2017-07-26 15:12:52 +03:00
|
|
|
peraturan-peraturan
|
|
|
|
perda-perda
|
|
|
|
perempat-final
|
|
|
|
perempuan-perempuan
|
|
|
|
pergi-pergi
|
|
|
|
pergi-pulang
|
2017-07-27 15:46:30 +03:00
|
|
|
perintang-rintang
|
2017-07-26 15:12:52 +03:00
|
|
|
perkereta-apian
|
|
|
|
perlahan-lahan
|
2017-07-27 15:46:30 +03:00
|
|
|
perlip-perlipan
|
2017-07-26 15:12:52 +03:00
|
|
|
permen-permen
|
|
|
|
pernak-pernik
|
2017-07-27 15:46:30 +03:00
|
|
|
pernik-pernik
|
2017-07-24 10:10:16 +03:00
|
|
|
pertama-tama
|
2017-07-26 15:12:52 +03:00
|
|
|
pertandingan-pertandingan
|
|
|
|
pertimbangan-pertimbangan
|
|
|
|
perudang-undangan
|
|
|
|
perundang-undangan
|
|
|
|
perundangan-undangan
|
|
|
|
perusahaan-perusahaan
|
|
|
|
perusahaan-perusahan
|
|
|
|
perwakilan-perwakilan
|
|
|
|
pesan-pesan
|
|
|
|
pesawat-pesawat
|
|
|
|
peta-jalan
|
2017-07-27 15:46:30 +03:00
|
|
|
petang-petang
|
2017-07-24 10:10:16 +03:00
|
|
|
petantang-petenteng
|
|
|
|
petatang-peteteng
|
|
|
|
pete-pete
|
2017-07-26 15:12:52 +03:00
|
|
|
piala-piala
|
2017-07-27 15:46:30 +03:00
|
|
|
piat-piut
|
2017-07-26 15:12:52 +03:00
|
|
|
pick-up
|
|
|
|
picture-in-picture
|
|
|
|
pihak-pihak
|
2017-07-27 15:46:30 +03:00
|
|
|
pijak-pijak
|
|
|
|
pijar-pijar
|
|
|
|
pijat-pijat
|
2017-07-26 15:12:52 +03:00
|
|
|
pikir-pikir
|
|
|
|
pil-pil
|
|
|
|
pilah-pilih
|
|
|
|
pilih-pilih
|
|
|
|
pilihan-pilihan
|
2017-07-27 15:46:30 +03:00
|
|
|
pilin-memilin
|
2017-07-26 15:12:52 +03:00
|
|
|
pilkada-pilkada
|
2017-07-24 10:10:16 +03:00
|
|
|
pina-pina
|
2017-07-26 15:12:52 +03:00
|
|
|
pindah-pindah
|
|
|
|
ping-pong
|
|
|
|
pinjam-meminjam
|
|
|
|
pintar-pintarlah
|
2017-07-27 15:46:30 +03:00
|
|
|
pisang-pisang
|
|
|
|
pistol-pistolan
|
|
|
|
piting-memiting
|
2017-07-26 15:12:52 +03:00
|
|
|
planet-planet
|
|
|
|
play-off
|
|
|
|
plin-plan
|
2017-07-24 10:10:16 +03:00
|
|
|
plintat-plintut
|
|
|
|
plonga-plongo
|
2017-07-26 15:12:52 +03:00
|
|
|
plug-in
|
|
|
|
plus-minus
|
|
|
|
plus-plus
|
|
|
|
poco-poco
|
2017-07-27 15:46:30 +03:00
|
|
|
pohon-pohonan
|
2017-07-26 15:12:52 +03:00
|
|
|
poin-poin
|
|
|
|
point-of-sale
|
|
|
|
point-of-sales
|
|
|
|
pokemon-pokemon
|
|
|
|
pokja-pokja
|
|
|
|
pokok-pokok
|
2017-07-27 15:46:30 +03:00
|
|
|
pokrol-pokrolan
|
|
|
|
polang-paling
|
2017-07-26 15:12:52 +03:00
|
|
|
polda-polda
|
2017-07-27 15:46:30 +03:00
|
|
|
poleng-poleng
|
|
|
|
polong-polongan
|
2017-07-26 15:12:52 +03:00
|
|
|
polres-polres
|
|
|
|
polsek-polsek
|
|
|
|
polwan-polwan
|
2017-07-27 15:46:30 +03:00
|
|
|
poma-poma
|
2017-07-26 15:12:52 +03:00
|
|
|
pondok-pondok
|
|
|
|
ponpes-ponpes
|
2017-07-24 10:10:16 +03:00
|
|
|
pontang-panting
|
2017-07-26 15:12:52 +03:00
|
|
|
pop-up
|
2017-07-24 10:10:16 +03:00
|
|
|
porak-parik
|
|
|
|
porak-peranda
|
|
|
|
porak-poranda
|
2017-07-26 15:12:52 +03:00
|
|
|
pos-pos
|
|
|
|
posko-posko
|
2017-07-27 15:46:30 +03:00
|
|
|
potong-memotong
|
2017-07-26 15:12:52 +03:00
|
|
|
praktek-praktek
|
|
|
|
praktik-praktik
|
|
|
|
produk-produk
|
|
|
|
program-program
|
|
|
|
promosi-degradasi
|
|
|
|
provinsi-provinsi
|
|
|
|
proyek-proyek
|
|
|
|
puing-puing
|
|
|
|
puisi-puisi
|
2017-07-27 15:46:30 +03:00
|
|
|
puji-pujian
|
2017-07-24 10:10:16 +03:00
|
|
|
pukang-pukang
|
2017-07-27 15:46:30 +03:00
|
|
|
pukul-memukul
|
2017-07-26 15:12:52 +03:00
|
|
|
pulang-pergi
|
|
|
|
pulau-pulai
|
|
|
|
pulau-pulau
|
|
|
|
pull-up
|
2017-07-27 15:46:30 +03:00
|
|
|
pulut-pulut
|
2017-07-26 15:12:52 +03:00
|
|
|
pundi-pundi
|
2017-07-24 10:10:16 +03:00
|
|
|
pungak-pinguk
|
2017-07-27 15:46:30 +03:00
|
|
|
punggung-memunggung
|
2017-07-24 10:10:16 +03:00
|
|
|
pura-pura
|
|
|
|
puruk-parak
|
2017-07-27 15:46:30 +03:00
|
|
|
pusar-pusar
|
2017-07-26 15:12:52 +03:00
|
|
|
pusat-pusat
|
|
|
|
push-to-talk
|
|
|
|
push-up
|
|
|
|
push-ups
|
2017-07-27 15:46:30 +03:00
|
|
|
pusing-pusing
|
2017-07-26 15:12:52 +03:00
|
|
|
puskesmas-puskesmas
|
2017-07-27 15:46:30 +03:00
|
|
|
putar-putar
|
2017-07-26 15:12:52 +03:00
|
|
|
putera-puteri
|
|
|
|
putih-hitam
|
|
|
|
putih-putih
|
|
|
|
putra-putra
|
|
|
|
putra-putri
|
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688)
Typo fix: "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option
This is necessary to allow testing an installed spacy by running:
pytest --pyargs spacy
* Add contributor agreement
* update bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Less norm computations in token similarity
* Contributor agreement
* Remove ')' for clarity (#2737)
Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intention than please let me know.
* added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement
* Added polish version of english lex_attrs
* Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list
* add exception token
* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception
* add tokenizer exceptions list
* combining base_norms with norm_exceptions
* adding norm_exception
* fix double key in lemmatizer
* remove unused import on punctuation.py
* reformat stop_words to reduce number of lines, improve readibility
* updating tokenizer exception
* implement is_currency for lang/id
* adding orth_first_upper in tokenizer_exceptions
* update the norm_exception list
* remove bunch of abbreviations
* adding contributors file
* Fixed spaCy+Keras example (#2763)
* bug fixes in keras example
* created contributor agreement
* Adding French hyphenated first name (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
* Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.
* Adding contributor agreement
* Updating contributor agreement
* Also include lowercase norm exceptions
* Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
* added spaCy Contributor Agreement
* Add charlax's contributor agreement (#2805)
* agreement of contributor, may I introduce a tiny pl languge contribution (#2799)
* Contributors agreement
* Contributors agreement
* Contributors agreement
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf37cfcfab60be2b97d6deae949b30e9e.
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types. Fixes #2800
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese Language (#2790)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols
* Extended punctuation and norm_exceptions in the Portuguese language
* Correct error in spacy universe docs concerning spacy-lookup (#2814)
* Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example
* created contributor agreement
* baseline for Parikh model
* initial version of parikh 2016 implemented
* tested asymmetric models
* fixed grevious error in normalization
* use standard SNLI test file
* begin to rework parikh example
* initial version of running example
* start to document the new version
* start to document the new version
* Update Decompositional Attention.ipynb
* fixed calls to similarity
* updated the README
* import sys package duh
* simplified indexing on mapping word to IDs
* stupid python indent error
* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test"
This reverts commit bdebbef45552d698d390aa430b527ee27830f11b.
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->
## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results.
### Types of change
enhancements
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement
* Correct some grammatical inaccuracies in lang\ru\examples.py
* Move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian(Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentionned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so much be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor indo (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes a problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. A random ID prefix is now generated, so even identical parses won't receive the same IDs, for consistency (even if the effect of an ID clash isn't noticeable here).
### Types of change
bug fix
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Adding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
<!--- Provide a general summary of your changes in the title. -->
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
|
|
|
putra/i
|
2017-07-26 15:12:52 +03:00
|
|
|
putri-putri
|
|
|
|
putus-putus
|
|
|
|
putusan-putusan
|
2017-07-24 10:10:16 +03:00
|
|
|
puvi-puvi
|
2017-07-26 15:12:52 +03:00
|
|
|
quad-core
|
2017-07-27 15:46:30 +03:00
|
|
|
raba-rabaan
|
2017-07-24 10:10:16 +03:00
|
|
|
raba-rubu
|
2017-07-27 15:46:30 +03:00
|
|
|
rada-rada
|
2017-07-26 15:12:52 +03:00
|
|
|
radio-frequency
|
|
|
|
ragu-ragu
|
2017-07-27 15:46:30 +03:00
|
|
|
rahasia-rahasiaan
|
2017-07-26 15:12:52 +03:00
|
|
|
raja-raja
|
2017-07-24 10:10:16 +03:00
|
|
|
rama-rama
|
2017-07-26 15:12:52 +03:00
|
|
|
ramai-ramai
|
|
|
|
ramalan-ramalan
|
2017-07-27 15:46:30 +03:00
|
|
|
rambeh-rambeh
|
2017-07-26 15:12:52 +03:00
|
|
|
rambu-rambu
|
|
|
|
rame-rame
|
2017-07-27 15:46:30 +03:00
|
|
|
ramu-ramuan
|
2017-07-24 10:10:16 +03:00
|
|
|
randa-rondo
|
2017-07-27 15:46:30 +03:00
|
|
|
rangkul-merangkul
|
2017-07-24 10:10:16 +03:00
|
|
|
rango-rango
|
2017-07-26 15:12:52 +03:00
|
|
|
rap-rap
|
2017-07-27 15:46:30 +03:00
|
|
|
rasa-rasanya
|
2017-07-26 15:12:52 +03:00
|
|
|
rata-rata
|
2017-07-27 15:46:30 +03:00
|
|
|
raun-raun
|
2017-07-26 15:12:52 +03:00
|
|
|
read-only
|
|
|
|
real-life
|
|
|
|
real-time
|
2017-07-27 15:46:30 +03:00
|
|
|
rebah-rebah
|
|
|
|
rebah-rebahan
|
|
|
|
rebas-rebas
|
2017-07-26 15:12:52 +03:00
|
|
|
red-eye
|
2017-07-27 15:46:30 +03:00
|
|
|
redam-redam
|
|
|
|
redep-redup
|
2017-07-26 15:12:52 +03:00
|
|
|
rehab-rekon
|
2017-07-27 15:46:30 +03:00
|
|
|
reja-reja
|
|
|
|
reka-reka
|
|
|
|
reka-rekaan
|
2017-07-26 15:12:52 +03:00
|
|
|
rekan-rekan
|
|
|
|
rekan-rekannya
|
|
|
|
rekor-rekor
|
|
|
|
relief-relief
|
2017-07-27 15:46:30 +03:00
|
|
|
remah-remah
|
2017-07-26 15:12:52 +03:00
|
|
|
remang-remang
|
2017-07-27 15:46:30 +03:00
|
|
|
rembah-rembah
|
|
|
|
rembah-rembih
|
|
|
|
remeh-cemeh
|
|
|
|
remeh-temeh
|
2017-07-26 15:12:52 +03:00
|
|
|
rempah-rempah
|
|
|
|
rencana-rencana
|
2017-07-27 15:46:30 +03:00
|
|
|
renyai-renyai
|
2017-07-24 10:10:16 +03:00
|
|
|
rep-repan
|
2017-07-27 15:46:30 +03:00
|
|
|
repot-repot
|
|
|
|
repuh-repuh
|
2017-07-26 15:12:52 +03:00
|
|
|
restoran-restoran
|
2017-07-27 15:46:30 +03:00
|
|
|
retak-retak
|
2017-07-24 10:10:16 +03:00
|
|
|
riang-riang
|
|
|
|
ribu-ribu
|
2017-07-26 15:12:52 +03:00
|
|
|
ribut-ribut
|
|
|
|
rica-rica
|
|
|
|
ride-sharing
|
2017-07-24 10:10:16 +03:00
|
|
|
rigi-rigi
|
2017-07-27 15:46:30 +03:00
|
|
|
rinai-rinai
|
2017-07-24 10:10:16 +03:00
|
|
|
rintik-rintik
|
2017-07-26 15:12:52 +03:00
|
|
|
ritual-ritual
|
2017-07-24 10:10:16 +03:00
|
|
|
robak-rabik
|
|
|
|
robat-rabit
|
2017-07-26 15:12:52 +03:00
|
|
|
robot-robot
|
|
|
|
role-play
|
|
|
|
role-playing
|
|
|
|
roll-on
|
2017-07-24 10:10:16 +03:00
|
|
|
rombang-rambing
|
|
|
|
romol-romol
|
2017-07-27 15:46:30 +03:00
|
|
|
rompang-romping
|
2017-07-24 10:10:16 +03:00
|
|
|
rondah-rondih
|
|
|
|
ropak-rapik
|
2017-07-27 15:46:30 +03:00
|
|
|
royal-royalan
|
2017-07-26 15:12:52 +03:00
|
|
|
royo-royo
|
2017-07-27 15:46:30 +03:00
|
|
|
ruak-ruak
|
2017-07-24 10:10:16 +03:00
|
|
|
ruba-ruba
|
2017-07-26 15:12:52 +03:00
|
|
|
rudal-rudal
|
2017-07-27 15:46:30 +03:00
|
|
|
ruji-ruji
|
|
|
|
ruku-ruku
|
2017-07-26 15:12:52 +03:00
|
|
|
rumah-rumah
|
2017-07-27 15:46:30 +03:00
|
|
|
rumah-rumahan
|
2017-07-24 10:10:16 +03:00
|
|
|
rumbai-rumbai
|
2017-07-27 15:46:30 +03:00
|
|
|
rumput-rumputan
|
|
|
|
runding-merunding
|
2017-07-24 10:10:16 +03:00
|
|
|
rundu-rundu
|
|
|
|
runggu-rangga
|
2017-07-26 15:12:52 +03:00
|
|
|
runner-up
|
2017-07-24 10:10:16 +03:00
|
|
|
runtang-runtung
|
2017-07-26 15:12:52 +03:00
|
|
|
rupa-rupa
|
2017-07-27 15:46:30 +03:00
|
|
|
rupa-rupanya
|
2017-07-26 15:12:52 +03:00
|
|
|
rusun-rusun
|
|
|
|
rute-rute
|
|
|
|
saat-saat
|
2017-07-27 15:46:30 +03:00
|
|
|
saban-saban
|
2017-07-24 10:10:16 +03:00
|
|
|
sabu-sabu
|
2017-07-27 15:46:30 +03:00
|
|
|
sabung-menyabung
|
2017-07-26 15:12:52 +03:00
|
|
|
sah-sah
|
|
|
|
sahabat-sahabat
|
|
|
|
saham-saham
|
2017-07-27 15:46:30 +03:00
|
|
|
sahut-menyahut
|
|
|
|
saing-menyaing
|
|
|
|
saji-sajian
|
|
|
|
sakit-sakitan
|
2017-07-26 15:12:52 +03:00
|
|
|
saksi-saksi
|
2017-07-27 15:46:30 +03:00
|
|
|
saku-saku
|
|
|
|
salah-salah
|
2017-07-24 10:10:16 +03:00
|
|
|
sama-sama
|
2017-07-27 15:46:30 +03:00
|
|
|
samar-samar
|
|
|
|
sambar-menyambar
|
|
|
|
sambung-bersambung
|
|
|
|
sambung-menyambung
|
|
|
|
sambut-menyambut
|
2017-07-24 10:10:16 +03:00
|
|
|
samo-samo
|
💫 Port master changes over to develop (#2979)
2018-11-29 18:30:29 +03:00
|
|
|
sampah-sampah
|
2017-07-24 10:10:16 +03:00
|
|
|
sampai-sampai
|
2017-07-27 15:46:30 +03:00
|
|
|
samping-menyamping
|
2017-07-26 15:12:52 +03:00
|
|
|
sana-sini
|
2017-07-27 15:46:30 +03:00
|
|
|
sandar-menyandar
|
2017-07-26 15:12:52 +03:00
|
|
|
sandi-sandi
|
2017-07-27 15:46:30 +03:00
|
|
|
sangat-sangat
|
|
|
|
sangkut-menyangkut
|
|
|
|
sapa-menyapa
|
|
|
|
sapai-sapai
|
2017-07-26 15:12:52 +03:00
|
|
|
sapi-sapi
|
2017-07-27 15:46:30 +03:00
|
|
|
sapu-sapu
|
2017-07-26 15:12:52 +03:00
|
|
|
saran-saran
|
|
|
|
sarana-prasarana
|
2017-07-27 15:46:30 +03:00
|
|
|
sari-sari
|
2017-07-24 10:10:16 +03:00
|
|
|
sarit-sarit
|
2017-07-26 15:12:52 +03:00
|
|
|
satu-dua
|
|
|
|
satu-satu
|
|
|
|
satu-satunya
|
|
|
|
satuan-satuan
|
|
|
|
saudara-saudara
|
2017-07-27 15:46:30 +03:00
|
|
|
sauk-menyauk
|
|
|
|
sauk-sauk
|
2017-07-26 15:12:52 +03:00
|
|
|
sayang-sayang
|
|
|
|
sayap-sayap
|
2017-07-27 15:46:30 +03:00
|
|
|
sayup-menyayup
|
|
|
|
sayup-sayup
|
2017-07-26 15:12:52 +03:00
|
|
|
sayur-mayur
|
2017-07-27 15:46:30 +03:00
|
|
|
sayur-sayuran
|
2017-07-26 15:12:52 +03:00
|
|
|
sci-fi
|
2017-07-27 15:46:30 +03:00
|
|
|
seagak-agak
|
|
|
|
seakal-akal
|
|
|
|
seakan-akan
|
|
|
|
sealak-alak
|
|
|
|
seari-arian
|
2017-07-24 10:10:16 +03:00
|
|
|
sebaik-baiknya
|
2017-07-27 15:46:30 +03:00
|
|
|
sebelah-menyebelah
|
|
|
|
sebentar-sebentar
|
|
|
|
seberang-menyeberang
|
|
|
|
seberuntung-beruntungnya
|
2017-07-26 15:12:52 +03:00
|
|
|
sebesar-besarnya
|
2017-07-27 15:46:30 +03:00
|
|
|
seboleh-bolehnya
|
|
|
|
sedalam-dalamnya
|
|
|
|
sedam-sedam
|
|
|
|
sedang-menyedang
|
2017-07-26 15:12:52 +03:00
|
|
|
sedang-sedang
|
2017-07-27 15:46:30 +03:00
|
|
|
sedap-sedapan
|
|
|
|
sedapat-dapatnya
|
|
|
|
sedikit-dikitnya
|
|
|
|
sedikit-sedikit
|
|
|
|
sedikit-sedikitnya
|
|
|
|
sedini-dininya
|
|
|
|
seelok-eloknya
|
|
|
|
segala-galanya
|
|
|
|
segan-menyegan
|
|
|
|
segan-menyegani
|
2017-07-24 10:10:16 +03:00
|
|
|
segan-segan
|
2017-07-27 15:46:30 +03:00
|
|
|
sehabis-habisnya
|
2017-07-26 15:12:52 +03:00
|
|
|
sehari-hari
|
2017-07-27 15:46:30 +03:00
|
|
|
sehari-harian
|
2017-07-26 15:12:52 +03:00
|
|
|
sehari-harinya
|
2017-07-27 15:46:30 +03:00
|
|
|
sejadi-jadinya
|
2017-07-24 10:10:16 +03:00
|
|
|
sekali-kali
|
2017-07-27 15:46:30 +03:00
|
|
|
sekali-sekali
|
|
|
|
sekenyang-kenyangnya
|
|
|
|
sekira-kira
|
2017-07-26 15:12:52 +03:00
|
|
|
sekolah-sekolah
|
2017-07-24 10:10:16 +03:00
|
|
|
sekonyong-konyong
|
2017-07-27 15:46:30 +03:00
|
|
|
sekosong-kosongnya
|
2017-07-26 15:12:52 +03:00
|
|
|
sektor-sektor
|
2017-07-27 15:46:30 +03:00
|
|
|
sekuasa-kuasanya
|
|
|
|
sekuat-kuatnya
|
2017-07-24 10:10:16 +03:00
|
|
|
sekurang-kurangnya
|
2017-07-26 15:12:52 +03:00
|
|
|
sel-sel
|
2017-07-27 15:46:30 +03:00
|
|
|
sela-menyela
|
|
|
|
sela-sela
|
2017-07-24 10:10:16 +03:00
|
|
|
selak-seluk
|
|
|
|
selama-lamanya
|
2017-07-26 15:12:52 +03:00
|
|
|
selambat-lambatnya
|
2017-07-24 10:10:16 +03:00
|
|
|
selang-seli
|
|
|
|
selang-seling
|
2017-07-27 15:46:30 +03:00
|
|
|
selar-belar
|
|
|
|
selat-latnya
|
2017-07-26 15:12:52 +03:00
|
|
|
selatan-tenggara
|
2017-07-27 15:46:30 +03:00
|
|
|
selekas-lekasnya
|
2017-07-24 10:10:16 +03:00
|
|
|
selentang-selenting
|
2017-07-27 15:46:30 +03:00
|
|
|
selepas-lepas
|
2017-07-26 15:12:52 +03:00
|
|
|
self-driving
|
|
|
|
self-esteem
|
|
|
|
self-healing
|
|
|
|
self-help
|
2017-07-27 15:46:30 +03:00
|
|
|
selir-menyelir
|
|
|
|
seloyong-seloyong
|
2017-07-24 10:10:16 +03:00
|
|
|
seluk-beluk
|
2017-07-27 15:46:30 +03:00
|
|
|
seluk-semeluk
|
2017-07-24 10:10:16 +03:00
|
|
|
sema-sema
|
2017-07-27 15:46:30 +03:00
|
|
|
semah-semah
|
|
|
|
semak-semak
|
|
|
|
semaksimal-maksimalnya
|
|
|
|
semalam-malaman
|
2017-07-24 10:10:16 +03:00
|
|
|
semang-semang
|
2017-07-27 15:46:30 +03:00
|
|
|
semanis-manisnya
|
|
|
|
semasa-masa
|
2017-07-24 10:10:16 +03:00
|
|
|
semata-mata
|
2017-07-27 15:46:30 +03:00
|
|
|
semau-maunya
|
2017-07-26 15:12:52 +03:00
|
|
|
sembunyi-sembunyi
|
2017-07-27 15:46:30 +03:00
|
|
|
sembunyi-sembunyian
|
|
|
|
sembur-sembur
|
2017-07-26 15:12:52 +03:00
|
|
|
semena-mena
|
2017-07-27 15:46:30 +03:00
|
|
|
semenda-menyemenda
|
|
|
|
semengga-mengga
|
|
|
|
semenggah-menggah
|
|
|
|
sementang-mentang
|
|
|
|
semerdeka-merdekanya
|
2017-07-26 15:12:52 +03:00
|
|
|
semi-final
|
|
|
|
semi-permanen
|
2017-07-27 15:46:30 +03:00
|
|
|
sempat-sempatnya
|
|
|
|
semu-semu
|
|
|
|
semua-muanya
|
|
|
|
semujur-mujurnya
|
|
|
|
semut-semutan
|
|
|
|
sen-senan
|
2017-07-26 15:12:52 +03:00
|
|
|
sendiri-sendiri
|
2017-07-27 15:46:30 +03:00
|
|
|
sengal-sengal
|
2017-07-24 10:10:16 +03:00
|
|
|
sengar-sengir
|
2017-07-27 15:46:30 +03:00
|
|
|
sengau-sengauan
|
|
|
|
senggak-sengguk
|
|
|
|
senggang-tenggang
|
|
|
|
senggol-menyenggol
|
2017-07-26 15:12:52 +03:00
|
|
|
senior-junior
|
|
|
|
senjata-senjata
|
|
|
|
senyum-senyum
|
2017-07-24 10:10:16 +03:00
|
|
|
seolah-olah
|
|
|
|
sepala-pala
|
2017-07-27 15:46:30 +03:00
|
|
|
sepandai-pandai
|
|
|
|
sepetang-petangan
|
💫 Port master changes over to develop (#2979)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class.
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version.
- Add several files containing exhaustive list of words for each part of speech
- Add some lemma rules
- Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX
- Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentionned
- Modify the lemmatize function to check in lookup table as a last resort
- Init files are updated so the model can support all the functionalities mentioned above
- Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [X] I have submitted the spaCy Contributor Agreement.
- [X] I ran the tests, and all new and existing tests passed.
- [X] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the Stop words' example (#2856)
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so much be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Change PyThaiNLP Url (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version"
This reverts commit 62358dd867d15bc6a475942dff34effba69dd70a.
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci-skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe json [ci skip]
* Removed space in docs + added contributor indo (#2909)
* - removed unneeded space in documentation
* - added contributor info
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
* Minor formatting changes [ci skip]
* Fix image [ci skip]
Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticable here.)
### Types of change
bug fix
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix typo [ci skip]
* fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948
* Update spacy/compat.py
Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan Language Support (#2940)
* Catalan language Support
* Ddding Catalan to documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965)
<!--- Provide a general summary of your changes in the title. -->
## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2
Fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
2018-11-29 18:30:29 +03:00
sepoi-sepoi
2017-07-27 15:46:30 +03:00
sepraktis-praktisnya
sepuas-puasnya
serak-serak
serak-serik
serang-menyerang
serang-serangan
2017-07-26 15:12:52 +03:00
serangan-serangan
2017-07-27 15:46:30 +03:00
seraya-menyeraya
2017-07-26 15:12:52 +03:00
serba-serbi
2017-07-24 10:10:16 +03:00
serbah-serbih
serembah-serembih
2017-07-26 15:12:52 +03:00
serigala-serigala
2017-07-27 15:46:30 +03:00
sering-sering
serobot-serobotan
serong-menyerong
serta-menyertai
2017-07-24 10:10:16 +03:00
serta-merta
2017-07-27 15:46:30 +03:00
serta-serta
2017-07-26 15:12:52 +03:00
seru-seruan
service-oriented
2017-07-27 15:46:30 +03:00
sesak-menyesak
sesal-menyesali
sesayup-sayup
2017-07-26 15:12:52 +03:00
sesi-sesi
2017-07-27 15:46:30 +03:00
sesuang-suang
sesudah-sudah
sesudah-sudahnya
sesuka-suka
sesuka-sukanya
2017-07-26 15:12:52 +03:00
set-piece
2017-07-27 15:46:30 +03:00
setempat-setempat
2017-07-26 15:12:52 +03:00
setengah-setengah
2017-07-24 10:10:16 +03:00
setidak-tidaknya
2017-07-26 15:12:52 +03:00
setinggi-tingginya
2017-07-27 15:46:30 +03:00
seupaya-upaya
seupaya-upayanya
2017-07-26 15:12:52 +03:00
sewa-menyewa
2017-07-27 15:46:30 +03:00
sewaktu-waktu
2017-07-26 15:12:52 +03:00
sewenang-wenang
2017-07-27 15:46:30 +03:00
sewot-sewotan
2017-07-26 15:12:52 +03:00
shabu-shabu
short-term
short-throw
2017-07-24 10:10:16 +03:00
sia-sia
2017-07-26 15:12:52 +03:00
siang-siang
siap-siap
siapa-siapa
2017-07-27 15:46:30 +03:00
sibar-sibar
sibur-sibur
sida-sida
2017-07-26 15:12:52 +03:00
side-by-side
sign-in
2017-07-27 15:46:30 +03:00
siku-siku
sikut-sikutan
silah-silah
silang-menyilang
silir-semilir
2017-07-26 15:12:52 +03:00
simbol-simbol
simpan-pinjam
2017-07-27 15:46:30 +03:00
sinar-menyinar
sinar-seminar
sinar-suminar
sindir-menyindir
2017-07-26 15:12:52 +03:00
singa-singa
2017-07-27 15:46:30 +03:00
singgah-menyinggah
2017-07-26 15:12:52 +03:00
single-core
sipil-militer
2017-07-27 15:46:30 +03:00
sir-siran
sirat-sirat
2017-07-26 15:12:52 +03:00
sisa-sisa
sisi-sisi
siswa-siswa
siswa-siswi
💫 Port master changes over to develop (#2979)
2018-11-29 18:30:29 +03:00
siswa/i
2017-07-26 15:12:52 +03:00
siswi-siswi
situ-situ
situs-situs
six-core
six-speed
2017-07-24 10:10:16 +03:00
slintat-slintut
2017-07-26 15:12:52 +03:00
slo-mo
slow-motion
snap-on
2017-07-27 15:46:30 +03:00
sobek-sobekan
sodok-sodokan
2017-07-26 15:12:52 +03:00
sok-sokan
2017-07-27 15:46:30 +03:00
solek-menyolek
2017-07-26 15:12:52 +03:00
solid-state
2017-07-24 10:10:16 +03:00
sorak-sorai
2017-07-26 15:12:52 +03:00
sorak-sorak
sore-sore
sosio-ekonomi
soya-soya
spill-resistant
split-screen
sponsor-sponsor
2017-07-27 15:46:30 +03:00
sponsor-sponsoran
2017-07-26 15:12:52 +03:00
srikandi-srikandi
staf-staf
stand-by
stand-up
start-up
stasiun-stasiun
state-owned
striker-striker
studi-studi
2017-07-27 15:46:30 +03:00
suam-suam
2017-07-26 15:12:52 +03:00
suami-isteri
suami-istri
suami-suami
2017-07-27 15:46:30 +03:00
suang-suang
2017-07-26 15:12:52 +03:00
suara-suara
sudin-sudin
2017-07-24 10:10:16 +03:00
sudu-sudu
2017-07-27 15:46:30 +03:00
sudung-sudung
sugi-sugi
2017-07-26 15:12:52 +03:00
suka-suka
suku-suku
2017-07-27 15:46:30 +03:00
sulang-menyulang
2017-07-24 10:10:16 +03:00
sulat-sulit
2017-07-27 15:46:30 +03:00
sulur-suluran
2017-07-26 15:12:52 +03:00
sum-sum
sumber-sumber
2017-07-24 10:10:16 +03:00
sumpah-sumpah
2017-07-27 15:46:30 +03:00
sumpit-sumpit
sundut-bersundut
2017-07-26 15:12:52 +03:00
sungai-sungai
sungguh-sungguh
2017-07-27 15:46:30 +03:00
sungut-sungut
sunting-menyunting
2017-07-26 15:12:52 +03:00
super-damai
super-rahasia
super-sub
supply-demand
supply-side
2017-07-27 15:46:30 +03:00
suram-suram
surat-menyurat
2017-07-26 15:12:52 +03:00
surat-surat
2017-07-27 15:46:30 +03:00
suruh-suruhan
suruk-surukan
2017-07-26 15:12:52 +03:00
susul-menyusul
2017-07-27 15:46:30 +03:00
suwir-suwir
2017-07-26 15:12:52 +03:00
syarat-syarat
system-on-chip
t-shirt
t-shirts
2017-07-24 10:10:16 +03:00
tabar-tabar
2017-07-27 15:46:30 +03:00
tabir-mabir
tabrak-tubruk
tabuh-tabuhan
tabun-menabun
tahu-menahu
tahu-tahu
2017-07-26 15:12:52 +03:00
tahun-tahun
2017-07-27 15:46:30 +03:00
takah-takahnya
2017-07-24 10:10:16 +03:00
takang-takik
2017-07-26 15:12:52 +03:00
take-off
2017-07-27 15:46:30 +03:00
takut-takut
takut-takutan
tali-bertali
tali-tali
talun-temalun
2017-07-26 15:12:52 +03:00
taman-taman
2017-07-27 15:46:30 +03:00
tampak-tampak
tanak-tanakan
tanam-menanam
tanam-tanaman
2017-07-26 15:12:52 +03:00
tanda-tanda
2017-07-27 15:46:30 +03:00
tangan-menangan
2017-07-26 15:12:52 +03:00
tangan-tangan
tangga-tangga
tanggal-tanggal
tanggul-tanggul
2017-07-27 15:46:30 +03:00
tanggung-menanggung
tanggung-tanggung
2017-07-26 15:12:52 +03:00
tank-tank
2018-11-29 18:30:29 +03:00
|
|
|
tante-tante
|
2017-07-26 15:12:52 +03:00
|
|
|
tanya-jawab
|
2017-07-27 15:46:30 +03:00
|
|
|
tapa-tapa
|
|
|
|
tapak-tapak
|
|
|
|
tari-menari
|
|
|
|
tari-tarian
|
2017-07-26 15:12:52 +03:00
|
|
|
tarik-menarik
|
|
|
|
tarik-ulur
|
|
|
|
tata-tertib
|
2017-07-27 15:46:30 +03:00
|
|
|
tatah-tatah
|
2017-07-26 15:12:52 +03:00
|
|
|
tau-tau
|
2017-07-24 10:10:16 +03:00
|
|
|
tawa-tawa
|
|
|
|
tawak-tawak
|
|
|
|
tawang-tawang
|
2017-07-27 15:46:30 +03:00
|
|
|
tawar-menawar
|
2017-07-24 10:10:16 +03:00
|
|
|
tawar-tawar
|
2017-07-27 15:46:30 +03:00
|
|
|
tayum-temayum
|
2017-07-26 15:12:52 +03:00
|
|
|
tebak-tebakan
|
2017-07-27 15:46:30 +03:00
|
|
|
tebu-tebu
|
|
|
|
tedong-tedong
|
|
|
|
tegak-tegak
|
|
|
|
tegerbang-gerbang
|
|
|
|
teh-tehan
|
2017-07-26 15:12:52 +03:00
|
|
|
tek-tek
|
2017-07-24 10:10:16 +03:00
|
|
|
teka-teki
|
2017-07-26 15:12:52 +03:00
|
|
|
teknik-teknik
|
|
|
|
teman-teman
|
|
|
|
teman-temanku
|
2017-07-27 15:46:30 +03:00
|
|
|
temas-temas
|
|
|
|
tembak-menembak
|
|
|
|
temeh-temeh
|
|
|
|
tempa-menempa
|
2017-07-26 15:12:52 +03:00
|
|
|
tempat-tempat
|
2017-07-27 15:46:30 +03:00
|
|
|
tempo-tempo
|
2017-07-24 10:10:16 +03:00
|
|
|
temut-temut
|
2017-07-26 15:12:52 +03:00
|
|
|
tenang-tenang
|
|
|
|
tengah-tengah
|
2017-07-27 15:46:30 +03:00
|
|
|
tenggang-menenggang
|
|
|
|
tengok-menengok
|
2017-07-26 15:12:52 +03:00
|
|
|
teori-teori
|
2017-07-27 15:46:30 +03:00
|
|
|
teraba-raba
|
|
|
|
teralang-alang
|
|
|
|
terambang-ambang
|
|
|
|
terambung-ambung
|
|
|
|
terang-terang
|
2017-07-26 15:12:52 +03:00
|
|
|
terang-terangan
|
2017-07-27 15:46:30 +03:00
|
|
|
teranggar-anggar
|
|
|
|
terangguk-angguk
|
|
|
|
teranggul-anggul
|
|
|
|
terangin-angin
|
|
|
|
terangkup-angkup
|
|
|
|
teranja-anja
|
|
|
|
terapung-apung
|
|
|
|
terayan-rayan
|
|
|
|
terayap-rayap
|
|
|
|
terbada-bada
|
2017-07-26 15:12:52 +03:00
|
|
|
terbahak-bahak
|
2017-07-27 15:46:30 +03:00
|
|
|
terbang-terbang
|
|
|
|
terbata-bata
|
|
|
|
terbatuk-batuk
|
|
|
|
terbayang-bayang
|
|
|
|
terbeda-bedakan
|
|
|
|
terbengkil-bengkil
|
|
|
|
terbengong-bengong
|
2017-07-26 15:12:52 +03:00
|
|
|
terbirit-birit
|
2017-07-27 15:46:30 +03:00
|
|
|
terbuai-buai
|
|
|
|
terbuang-buang
|
|
|
|
terbungkuk-bungkuk
|
2017-07-26 15:12:52 +03:00
|
|
|
terburu-buru
|
2017-07-27 15:46:30 +03:00
|
|
|
tercangak-cangak
|
|
|
|
tercengang-cengang
|
|
|
|
tercilap-cilap
|
|
|
|
tercongget-congget
|
|
|
|
tercoreng-moreng
|
|
|
|
tercungap-cungap
|
|
|
|
terdangka-dangka
|
|
|
|
terdengih-dengih
|
|
|
|
terduga-duga
|
|
|
|
terekeh-ekeh
|
|
|
|
terembut-embut
|
|
|
|
terembut-rembut
|
|
|
|
terempas-empas
|
|
|
|
terengah-engah
|
|
|
|
teresak-esak
|
|
|
|
tergagap-gagap
|
|
|
|
tergagau-gagau
|
|
|
|
tergaguk-gaguk
|
|
|
|
tergapai-gapai
|
|
|
|
tergegap-gegap
|
|
|
|
tergegas-gegas
|
|
|
|
tergelak-gelak
|
|
|
|
tergelang-gelang
|
|
|
|
tergeleng-geleng
|
|
|
|
tergelung-gelung
|
|
|
|
tergerai-gerai
|
|
|
|
tergerenyeng-gerenyeng
|
2017-07-26 15:12:52 +03:00
|
|
|
tergesa-gesa
|
|
|
|
tergila-gila
|
2017-07-27 15:46:30 +03:00
|
|
|
tergolek-golek
|
|
|
|
tergontai-gontai
|
|
|
|
tergudik-gudik
|
|
|
|
tergugu-gugu
|
|
|
|
terguling-guling
|
|
|
|
tergulut-gulut
|
|
|
|
terhambat-hambat
|
|
|
|
terharak-harak
|
|
|
|
terharap-harap
|
|
|
|
terhengit-hengit
|
|
|
|
terheran-heran
|
|
|
|
terhinggut-hinggut
|
|
|
|
terigau-igau
|
|
|
|
terimpi-impi
|
|
|
|
terincut-incut
|
|
|
|
teringa-inga
|
2017-07-24 10:10:16 +03:00
|
|
|
teringat-ingat
|
2017-07-27 15:46:30 +03:00
|
|
|
terinjak-injak
|
|
|
|
terisak-isak
|
|
|
|
terjembak-jembak
|
|
|
|
terjerit-jerit
|
|
|
|
terkadang-kadang
|
|
|
|
terkagum-kagum
|
|
|
|
terkaing-kaing
|
|
|
|
terkakah-kakah
|
|
|
|
terkakak-kakak
|
|
|
|
terkampul-kampul
|
|
|
|
terkanjar-kanjar
|
|
|
|
terkantuk-kantuk
|
|
|
|
terkapah-kapah
|
|
|
|
terkapai-kapai
|
|
|
|
terkapung-kapung
|
|
|
|
terkatah-katah
|
2017-07-26 15:12:52 +03:00
|
|
|
terkatung-katung
|
2017-07-27 15:46:30 +03:00
|
|
|
terkecap-kecap
|
|
|
|
terkedek-kedek
|
|
|
|
terkedip-kedip
|
|
|
|
terkejar-kejar
|
|
|
|
terkekau-kekau
|
|
|
|
terkekeh-kekeh
|
|
|
|
terkekek-kekek
|
|
|
|
terkelinjat-kelinjat
|
|
|
|
terkelip-kelip
|
|
|
|
terkempul-kempul
|
|
|
|
terkemut-kemut
|
|
|
|
terkencar-kencar
|
|
|
|
terkencing-kencing
|
|
|
|
terkentut-kentut
|
|
|
|
terkepak-kepak
|
|
|
|
terkesot-kesot
|
|
|
|
terkesut-kesut
|
|
|
|
terkial-kial
|
|
|
|
terkijai-kijai
|
|
|
|
terkikih-kikih
|
|
|
|
terkikik-kikik
|
|
|
|
terkincak-kincak
|
|
|
|
terkindap-kindap
|
|
|
|
terkinja-kinja
|
|
|
|
terkirai-kirai
|
|
|
|
terkitar-kitar
|
|
|
|
terkocoh-kocoh
|
|
|
|
terkojol-kojol
|
|
|
|
terkokol-kokol
|
|
|
|
terkosel-kosel
|
|
|
|
terkotak-kotak
|
|
|
|
terkoteng-koteng
|
|
|
|
terkuai-kuai
|
|
|
|
terkumpal-kumpal
|
|
|
|
terlara-lara
|
|
|
|
terlayang-layang
|
|
|
|
terlebih-lebih
|
|
|
|
terlincah-lincah
|
|
|
|
terliuk-liuk
|
|
|
|
terlolong-lolong
|
|
|
|
terlongong-longong
|
2017-07-26 15:12:52 +03:00
|
|
|
terlunta-lunta
|
2017-07-27 15:46:30 +03:00
|
|
|
termangu-mangu
|
|
|
|
termanja-manja
|
|
|
|
termata-mata
|
|
|
|
termengah-mengah
|
|
|
|
termenung-menung
|
|
|
|
termimpi-mimpi
|
|
|
|
termonyong-monyong
|
|
|
|
ternanti-nanti
|
|
|
|
terngiang-ngiang
|
|
|
|
teroleng-oleng
|
2017-07-26 15:12:52 +03:00
|
|
|
terombang-ambing
|
2017-07-27 15:46:30 +03:00
|
|
|
terpalit-palit
|
|
|
|
terpandang-pandang
|
|
|
|
terpecah-pecah
|
|
|
|
terpekik-pekik
|
|
|
|
terpencar-pencar
|
|
|
|
terpereh-pereh
|
|
|
|
terpijak-pijak
|
|
|
|
terpikau-pikau
|
|
|
|
terpilah-pilah
|
|
|
|
terpinga-pinga
|
|
|
|
terpingkal-pingkal
|
|
|
|
terpingkau-pingkau
|
|
|
|
terpontang-panting
|
|
|
|
terpusing-pusing
|
|
|
|
terputus-putus
|
|
|
|
tersanga-sanga
|
|
|
|
tersaruk-saruk
|
|
|
|
tersedan-sedan
|
|
|
|
tersedih-sedih
|
|
|
|
tersedu-sedu
|
|
|
|
terseduh-seduh
|
|
|
|
tersendat-sendat
|
|
|
|
tersendeng-sendeng
|
|
|
|
tersengal-sengal
|
|
|
|
tersengguk-sengguk
|
|
|
|
tersengut-sengut
|
|
|
|
terseok-seok
|
|
|
|
tersera-sera
|
|
|
|
terserak-serak
|
|
|
|
tersetai-setai
|
|
|
|
tersia-sia
|
|
|
|
tersipu-sipu
|
|
|
|
tersoja-soja
|
|
|
|
tersungkuk-sungkuk
|
|
|
|
tersuruk-suruk
|
|
|
|
tertagak-tagak
|
|
|
|
tertahan-tahan
|
|
|
|
tertatih-tatih
|
|
|
|
tertegun-tegun
|
|
|
|
tertekan-tekan
|
|
|
|
terteleng-teleng
|
|
|
|
tertendang-tendang
|
|
|
|
tertimpang-timpang
|
|
|
|
tertitar-titar
|
|
|
|
terumbang-ambing
|
|
|
|
terumbang-umbang
|
|
|
|
terungkap-ungkap
|
2017-07-26 15:12:52 +03:00
|
|
|
terus-menerus
|
|
|
|
terus-terusan
|
|
|
|
tete-a-tete
|
|
|
|
text-to-speech
|
|
|
|
think-tank
|
|
|
|
think-thank
|
|
|
|
third-party
|
|
|
|
third-person
|
|
|
|
three-axis
|
|
|
|
three-point
|
|
|
|
tiap-tiap
|
2017-07-24 10:10:16 +03:00
|
|
|
tiba-tiba
|
2017-07-27 15:46:30 +03:00
|
|
|
tidak-tidak
|
|
|
|
tidur-tidur
|
|
|
|
tidur-tiduran
|
2017-07-26 15:12:52 +03:00
|
|
|
tie-dye
|
|
|
|
tie-in
|
2017-07-27 15:46:30 +03:00
|
|
|
tiga-tiganya
|
|
|
|
tikam-menikam
|
2017-07-26 15:12:52 +03:00
|
|
|
tiki-taka
|
|
|
|
tikus-tikus
|
2017-07-27 15:46:30 +03:00
|
|
|
tilik-menilik
|
2017-07-26 15:12:52 +03:00
|
|
|
tim-tim
|
2017-07-24 10:10:16 +03:00
|
|
|
timah-timah
|
2017-07-27 15:46:30 +03:00
|
|
|
timang-timangan
|
|
|
|
timbang-menimbang
|
2017-07-26 15:12:52 +03:00
|
|
|
time-lapse
|
2017-07-27 15:46:30 +03:00
|
|
|
timpa-menimpa
|
2017-07-24 10:10:16 +03:00
|
|
|
timu-timu
|
2017-07-27 15:46:30 +03:00
|
|
|
timun-timunan
|
2017-07-26 15:12:52 +03:00
|
|
|
timur-barat
|
|
|
|
timur-laut
|
|
|
|
timur-tenggara
|
2017-07-27 15:46:30 +03:00
|
|
|
tindih-bertindih
|
|
|
|
tindih-menindih
|
|
|
|
tinjau-meninjau
|
|
|
|
tinju-meninju
|
2017-07-26 15:12:52 +03:00
|
|
|
tip-off
|
|
|
|
tipu-tipu
|
2017-07-27 15:46:30 +03:00
|
|
|
tiru-tiruan
|
2017-07-26 15:12:52 +03:00
|
|
|
titik-titik
|
|
|
|
titik-titiknya
|
2017-07-27 15:46:30 +03:00
|
|
|
tiup-tiup
|
2017-07-26 15:12:52 +03:00
|
|
|
to-do
|
2017-07-27 15:46:30 +03:00
|
|
|
tokak-takik
|
2017-07-26 15:12:52 +03:00
|
|
|
toko-toko
|
|
|
|
tokoh-tokoh
|
2017-07-27 15:46:30 +03:00
|
|
|
tokok-menokok
|
|
|
|
tolak-menolak
|
|
|
|
tolong-menolong
|
2017-07-26 15:12:52 +03:00
|
|
|
tong-tong
|
|
|
|
top-level
|
|
|
|
top-up
|
2017-07-27 15:46:30 +03:00
|
|
|
totol-totol
|
2017-07-26 15:12:52 +03:00
|
|
|
touch-screen
|
|
|
|
trade-in
|
|
|
|
training-camp
|
|
|
|
trans-nasional
|
|
|
|
treble-winner
|
|
|
|
tri-band
|
|
|
|
trik-trik
|
|
|
|
triple-core
|
|
|
|
truk-truk
|
|
|
|
tua-tua
|
|
|
|
tuan-tuan
|
2017-07-24 10:10:16 +03:00
|
|
|
tuang-tuang
|
2017-07-27 15:46:30 +03:00
|
|
|
tuban-tuban
|
2017-07-26 15:12:52 +03:00
|
|
|
tubuh-tubuh
|
|
|
|
tujuan-tujuan
|
|
|
|
tuk-tuk
|
2017-07-27 15:46:30 +03:00
|
|
|
tukang-menukang
|
|
|
|
tukar-menukar
|
2017-07-26 15:12:52 +03:00
|
|
|
tulang-belulang
|
2017-07-27 15:46:30 +03:00
|
|
|
tulang-tulangan
|
2017-07-24 10:10:16 +03:00
|
|
|
tuli-tuli
|
2017-07-27 15:46:30 +03:00
|
|
|
tulis-menulis
|
|
|
|
tumbuh-tumbuhan
|
2017-07-24 10:10:16 +03:00
|
|
|
tumpang-tindih
|
2017-07-26 15:12:52 +03:00
|
|
|
tune-up
|
2017-07-27 15:46:30 +03:00
|
|
|
tunggang-tunggik
|
|
|
|
tunggang-tungging
|
|
|
|
tunggang-tunggit
|
|
|
|
tunggul-tunggul
|
|
|
|
tunjuk-menunjuk
|
2017-07-24 10:10:16 +03:00
|
|
|
tupai-tupai
|
2017-07-27 15:46:30 +03:00
|
|
|
tupai-tupaian
|
|
|
|
turi-turian
|
2017-07-26 15:12:52 +03:00
|
|
|
turn-based
|
|
|
|
turnamen-turnamen
|
|
|
|
turun-temurun
|
2017-07-27 15:46:30 +03:00
|
|
|
turut-menurut
|
|
|
|
turut-turutan
|
|
|
|
tuyuk-tuyuk
|
2017-07-26 15:12:52 +03:00
|
|
|
twin-cam
|
|
|
|
twin-turbocharged
|
|
|
|
two-state
|
|
|
|
two-step
|
|
|
|
two-tone
|
|
|
|
u-shape
|
2017-07-27 15:46:30 +03:00
|
|
|
uang-uangan
|
|
|
|
uar-uar
|
|
|
|
ubek-ubekan
|
|
|
|
ubel-ubel
|
2017-07-24 10:10:16 +03:00
|
|
|
ubrak-abrik
|
|
|
|
ubun-ubun
|
|
|
|
ubur-ubur
|
|
|
|
uci-uci
|
2017-07-26 15:12:52 +03:00
|
|
|
udang-undang
|
2017-07-27 15:46:30 +03:00
|
|
|
udap-udapan
|
2017-07-24 10:10:16 +03:00
|
|
|
ugal-ugalan
|
|
|
|
uget-uget
|
|
|
|
uir-uir
|
2017-07-27 15:46:30 +03:00
|
|
|
ujar-ujar
|
2017-07-26 15:12:52 +03:00
|
|
|
uji-coba
|
|
|
|
ujung-ujung
|
|
|
|
ujung-ujungnya
|
|
|
|
uka-uka
|
2017-07-27 15:46:30 +03:00
|
|
|
ukir-mengukir
|
|
|
|
ukir-ukiran
|
2017-07-24 10:10:16 +03:00
|
|
|
ula-ula
|
2017-07-27 15:46:30 +03:00
|
|
|
ulak-ulak
|
|
|
|
ulam-ulam
|
2017-07-24 10:10:16 +03:00
|
|
|
ulang-alik
|
|
|
|
ulang-aling
|
2017-07-27 15:46:30 +03:00
|
|
|
ulang-ulang
|
2017-07-24 10:10:16 +03:00
|
|
|
ulap-ulap
|
|
|
|
ular-ular
|
|
|
|
ular-ularan
|
2017-07-27 15:46:30 +03:00
|
|
|
ulek-ulek
|
2017-07-24 10:10:16 +03:00
|
|
|
ulu-ulu
|
|
|
|
ulung-ulung
|
|
|
|
umang-umang
|
|
|
|
umbang-ambing
|
2017-07-26 15:12:52 +03:00
|
|
|
umbi-umbian
|
2017-07-24 10:10:16 +03:00
|
|
|
umbul-umbul
|
|
|
|
umbut-umbut
|
|
|
|
uncang-uncit
|
2017-07-27 15:46:30 +03:00
|
|
|
undak-undakan
|
2017-07-26 15:12:52 +03:00
|
|
|
undang-undang
|
|
|
|
undang-undangnya
|
2017-07-24 10:10:16 +03:00
|
|
|
unduk-unduk
|
|
|
|
undung-undung
|
|
|
|
undur-undur
|
|
|
|
unek-unek
|
|
|
|
ungah-angih
|
|
|
|
unggang-anggit
|
|
|
|
unggat-unggit
|
2017-07-27 15:46:30 +03:00
|
|
|
unggul-mengungguli
|
|
|
|
ungkit-ungkit
|
2017-07-26 15:12:52 +03:00
|
|
|
unit-unit
|
|
|
|
universitas-universitas
|
|
|
|
unsur-unsur
|
2017-07-24 10:10:16 +03:00
|
|
|
untang-anting
|
2017-07-27 15:46:30 +03:00
|
|
|
unting-unting
|
|
|
|
untung-untung
|
|
|
|
untung-untungan
|
|
|
|
upah-mengupah
|
|
|
|
upih-upih
|
2017-07-26 15:12:52 +03:00
|
|
|
upside-down
|
2017-07-24 10:10:16 +03:00
|
|
|
ura-ura
|
|
|
|
uran-uran
|
2017-07-26 15:12:52 +03:00
|
|
|
urat-urat
|
2017-07-27 15:46:30 +03:00
|
|
|
uring-uringan
|
|
|
|
urup-urup
|
|
|
|
urup-urupan
|
|
|
|
urus-urus
|
2017-07-26 15:12:52 +03:00
|
|
|
usaha-usaha
|
2017-07-24 10:10:16 +03:00
|
|
|
user-user
|
2017-07-27 15:46:30 +03:00
|
|
|
user-useran
|
2017-07-24 10:10:16 +03:00
|
|
|
utak-atik
|
2017-07-26 15:12:52 +03:00
|
|
|
utang-piutang
|
|
|
|
utang-utang
|
2017-07-24 10:10:16 +03:00
|
|
|
utar-utar
|
2017-07-26 15:12:52 +03:00
|
|
|
utara-jauh
|
|
|
|
utara-selatan
|
2017-07-24 10:10:16 +03:00
|
|
|
uter-uter
|
2017-07-26 15:12:52 +03:00
|
|
|
utusan-utusan
|
|
|
|
v-belt
|
|
|
|
v-neck
|
|
|
|
value-added
|
|
|
|
very-very
|
|
|
|
video-video
|
|
|
|
visi-misi
|
|
|
|
visi-misinya
|
|
|
|
voa-islam
|
|
|
|
voice-over
|
|
|
|
volt-ampere
|
|
|
|
wajah-wajah
|
|
|
|
wajar-wajar
|
|
|
|
wake-up
|
|
|
|
wakil-wakil
|
|
|
|
walk-in
|
|
|
|
walk-out
|
2017-07-27 15:46:30 +03:00
|
|
|
wangi-wangian
|
2017-07-26 15:12:52 +03:00
|
|
|
wanita-wanita
|
2017-07-24 10:10:16 +03:00
|
|
|
wanti-wanti
|
2017-07-27 15:46:30 +03:00
|
|
|
wara-wara
|
2017-07-24 10:10:16 +03:00
|
|
|
wara-wiri
|
2017-07-26 15:12:52 +03:00
|
|
|
warna-warna
|
2017-07-24 10:10:16 +03:00
|
|
|
warna-warni
|
2017-07-26 15:12:52 +03:00
|
|
|
was-was
|
|
|
|
water-cooled
|
|
|
|
web-based
|
|
|
|
wide-angle
|
|
|
|
wilayah-wilayah
|
|
|
|
win-win
|
2017-07-24 10:10:16 +03:00
|
|
|
wira-wiri
|
|
|
|
wora-wari
|
2017-07-26 15:12:52 +03:00
|
|
|
work-life
|
|
|
|
world-class
|
2017-07-24 10:10:16 +03:00
|
|
|
yang-yang
|
2017-07-26 15:12:52 +03:00
|
|
|
yayasan-yayasan
|
|
|
|
year-on-year
|
|
|
|
yel-yel
|
|
|
|
yo-yo
|
|
|
|
zam-zam
|
2017-07-24 10:10:16 +03:00
|
|
|
zig-zag
|
2018-11-30 19:03:03 +03:00
|
|
|
""".split()
|
|
|
|
)
|