spaCy/spacy/cli/info.py
Ines Montani 37c7c85a86 💫 New JSON helpers, training data internals & CLI rewrite (#2932)
* Support nowrap setting in util.prints

* Tidy up and fix whitespace

* Simplify script and use read_jsonl helper

* Add JSON schemas (see #2928)

* Deprecate Doc.print_tree

Will be replaced with Doc.to_json, which will produce a unified format

* Add Doc.to_json() method (see #2928)

Converts Doc objects to JSON using the same unified format as the training data. Method also supports serializing selected custom attributes in the doc._. space.

* Remove outdated test

* Add write_json and write_jsonl helpers

* WIP: Update spacy train

* Tidy up spacy train

* WIP: Use wasabi for formatting

* Add GoldParse helpers for JSON format

* WIP: add debug-data command

* Fix typo

* Add missing import

* Update wasabi pin

* Add missing import

* 💫 Refactor CLI (#2943)

To be merged into #2932.

## Description
- [x] refactor CLI To use [`wasabi`](https://github.com/ines/wasabi)
- [x] use [`black`](https://github.com/ambv/black) for auto-formatting
- [x] add `flake8` config
- [x] move all messy UD-related scripts to `cli.ud`
- [x] make converters function that take the opened file and return the converted data (instead of having them handle the IO)

### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Update wasabi pin

* Delete old test

* Update errors

* Fix typo

* Tidy up and format remaining code

* Fix formatting

* Improve formatting of messages

* Auto-format remaining code

* Add tok2vec stuff to spacy.train

* Fix typo

* Update wasabi pin

* Fix path checks for when train() is called as function

* Reformat and tidy up pretrain script

* Update argument annotations

* Raise error if model language doesn't match lang

* Document new train command
2018-11-30 20:16:14 +01:00

78 lines
2.5 KiB
Python

# coding: utf8
from __future__ import unicode_literals
import plac
import platform
from pathlib import Path
from wasabi import Printer
from ._messages import Messages
from ..compat import path2str
from .. import util
from .. import about
@plac.annotations(
model=("Optional shortcut link of model", "positional", None, str),
markdown=("Generate Markdown for GitHub issues", "flag", "md", str),
silent=("Don't print anything (just return)", "flag", "s"),
)
def info(model=None, markdown=False, silent=False):
"""
Print info about spaCy installation. If a model shortcut link is
speficied as an argument, print model information. Flag --markdown
prints details in Markdown for easy copy-pasting to GitHub issues.
"""
msg = Printer()
if model:
if util.is_package(model):
model_path = util.get_package_path(model)
else:
model_path = util.get_data_path() / model
meta_path = model_path / "meta.json"
if not meta_path.is_file():
msg.fail(Messages.M020, meta_path, exits=1)
meta = util.read_json(meta_path)
if model_path.resolve() != model_path:
meta["link"] = path2str(model_path)
meta["source"] = path2str(model_path.resolve())
else:
meta["source"] = path2str(model_path)
if not silent:
title = "Info about model '{}'".format(model)
model_meta = {
k: v for k, v in meta.items() if k not in ("accuracy", "speed")
}
if markdown:
util.print_markdown(model_meta, title=title)
else:
msg.table(model_meta, title=title)
return meta
data = {
"spaCy version": about.__version__,
"Location": path2str(Path(__file__).parent.parent),
"Platform": platform.platform(),
"Python version": platform.python_version(),
"Models": list_models(),
}
if not silent:
title = "Info about spaCy"
if markdown:
util.print_markdown(data, title=title)
else:
msg.table(data, title=title)
return data
def list_models():
def exclude_dir(dir_name):
# exclude common cache directories and hidden directories
exclude = ("cache", "pycache", "__pycache__")
return dir_name in exclude or dir_name.startswith(".")
data_path = util.get_data_path()
if data_path:
models = [f.parts[-1] for f in data_path.iterdir() if f.is_dir()]
return ", ".join([m for m in models if not exclude_dir(m)])
return "-"