💫 Tidy up and auto-format .py files (#2983)
## Description
- [x] Use [`black`](https://github.com/ambv/black) to auto-format all `.py` files.
- [x] Update flake8 config to exclude very large files (lemmatization tables etc.)
- [x] Update code to be compatible with flake8 rules
- [x] Fix various small bugs, inconsistencies and messy stuff in the language data
- [x] Update docs to explain new code style (`black`, `flake8`, when to use `# fmt: off` and `# fmt: on` and what `# noqa` means)
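A minimal sketch of how the markers from the last item are typically used, assuming a hand-aligned table that `black` would otherwise reflow and an import that is kept on purpose even though it is unused. `TAG_MAP` and `lookup_pos` are hypothetical names for illustration only and are not taken from the spaCy code base.

```python
# The trailing "# noqa: F401" tells flake8 to skip its "imported but unused"
# warning for this one line only. The import itself is illustrative.
from os import path  # noqa: F401

# black leaves everything between "# fmt: off" and "# fmt: on" untouched,
# so the manual column alignment below survives auto-formatting.
# fmt: off
TAG_MAP = {
    "NN":   {"pos": "NOUN"},
    "VB":   {"pos": "VERB"},
    "PUNC": {"pos": "PUNCT"},
}
# fmt: on


def lookup_pos(tag):
    """Return the coarse part-of-speech for a tag, or None if it is unknown."""
    return TAG_MAP.get(tag, {}).get("pos")
```

Outside the marked region `black` formats as usual, so the markers are best kept to data tables where the alignment genuinely helps readability.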
Once #2932 is merged, which auto-formats and tidies up the CLI, we'll be able to run `flake8 spacy` and actually get meaningful results.
At the moment, the code style and linting aren't applied automatically, but I'm hoping that the new [GitHub Actions](https://github.com/features/actions) will let us auto-format pull requests and post comments with relevant linting information.
### Types of change
enhancement, code style
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-30 19:03:03 +03:00
STOP_WORDS = set(
"""
Remove significant or not very frequent words from stop word list [es]
The list of stop words for Spanish contained many inadequate words, see:
https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100
Removed words:
- verb forms of 'trabajar' (work) and 'intentar' (try)
- words related to 'empleo' (employment)
- incorrect words: ampleamos, arribaabajo, soyos, paìs
- miscellaneous words that are too significant or too infrequent:
actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general,
pais, principalmente, raras
Added other stop words for completion:
- Spanish one-letter words
- numbers up to twelve
Some reformatting to 79 columns.
When in doubt, the English and German lists have been consulted as good
examples.
2022-04-18 23:04:02 +03:00
a acuerdo adelante ademas además afirmó agregó ahi ahora ahí al algo alguna
algunas alguno algunos algún alli allí alrededor ambos ante anterior antes
apenas aproximadamente aquel aquella aquellas aquello aquellos aqui aquél
aquélla aquéllas aquéllos aquí arriba aseguró asi así atras aun aunque añadió
aún
2016-12-18 18:54:19 +03:00
bajo bastante bien breve buen buena buenas bueno buenos
cada casi cierta ciertas cierto ciertos cinco claro comentó como con conmigo
conocer conseguimos conseguir considera consideró consigo consigue consiguen
consigues contigo contra creo cual cuales cualquier cuando cuanta cuantas
cuanto cuantos cuatro cuenta cuál cuáles cuándo cuánta cuántas cuánto cuántos
cómo
da dado dan dar de debajo debe deben debido decir dejó del delante demasiado
demás dentro deprisa desde despacio despues después detras detrás dia dias dice
dicen dicho dieron diez diferente diferentes dijeron dijo dio doce donde dos
durante día días dónde
e el ella ellas ello ellos embargo en encima encuentra enfrente enseguida
entonces entre era eramos eran eras eres es esa esas ese eso esos esta estaba
estaban estado estados estais estamos estan estar estará estas este esto estos
estoy estuvo está están excepto existe existen explicó expresó él ésa ésas ése
ésos ésta éstas éste éstos
fin final fue fuera fueron fui fuimos
gran grande grandes
ha haber habia habla hablan habrá había habían hace haceis hacemos hacen hacer
hacerlo haces hacia haciendo hago han hasta hay haya he hecho hemos hicieron
hizo hoy hubo
igual incluso indicó informo informó ir
junto
la lado largo las le les llegó lleva llevar lo los luego
mal manera manifestó mas mayor me mediante medio mejor mencionó menos menudo mi
mia mias mientras mio mios mis misma mismas mismo mismos modo mucha muchas
mucho muchos muy más mí mía mías mío míos
nada nadie ni ninguna ningunas ninguno ningunos ningún no nos nosotras nosotros
nuestra nuestras nuestro nuestros nueva nuevas nueve nuevo nuevos nunca
o ocho once os otra otras otro otros
para parece parte partir pasada pasado paìs peor pero pesar poca pocas poco
pocos podeis podemos poder podria podriais podriamos podrian podrias podrá
podrán podría podrían poner por porque posible primer primera primero primeros
pronto propia propias propio propios proximo próximo próximos pudo pueda puede
pueden puedo pues
qeu que quedó queremos quien quienes quiere quiza quizas quizá quizás quién
quiénes qué
realizado realizar realizó repente respecto
sabe sabeis sabemos saben saber sabes salvo se sea sean segun segunda segundo
según seis ser sera será serán sería señaló si sido siempre siendo siete sigue
siguiente sin sino sobre sois sola solamente solas solo solos somos son soy su
supuesto sus suya suyas suyo suyos sé sí sólo
tal tambien también tampoco tan tanto tarde te temprano tendrá tendrán teneis
tenemos tener tenga tengo tenido tenía tercera tercero ti tiene tienen toda
todas todavia todavía todo todos total tras trata través tres tu tus tuvo tuya
tuyas tuyo tuyos tú
u ultimo un una unas uno unos usa usais usamos usan usar usas uso usted ustedes
última últimas último últimos
va vais vamos van varias varios vaya veces ver verdad verdadera verdadero vez
vosotras vosotros voy vuestra vuestras vuestro vuestros
y ya yo
""".split()
)
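
The construction above can be illustrated on a much smaller scale. This is only a sketch under stated assumptions: `SAMPLE_STOP_WORDS` is a hypothetical subset of the list, and `remove_stop_words` is an illustrative helper, not part of spaCy's API. It shows that `str.split()` with no arguments splits on any run of whitespace, so line breaks and blank lines in the block have no effect on the resulting set.

```python
# Hypothetical miniature of the pattern: a whitespace-separated block of
# words is split into tokens and de-duplicated by set().
SAMPLE_STOP_WORDS = set(
    """
a e o u y

dos tres cuatro

el la los las
""".split()
)


def remove_stop_words(tokens, stop_words=SAMPLE_STOP_WORDS):
    """Drop tokens whose lowercased form is in the stop word set."""
    return [token for token in tokens if token.lower() not in stop_words]


filtered = remove_stop_words(["La", "casa", "y", "los", "tres", "perros"])
print(filtered)  # ['casa', 'perros']
```

Because membership is checked against a set, each lookup takes constant time on average regardless of how long the list grows.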