mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-11-01 00:17:44 +03:00 
			
		
		
		
	
		
			
				
	
	
		
			38 lines
		
	
	
		
			1.4 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			38 lines
		
	
	
		
			1.4 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| ---
 | |
| title: Corpus
 | |
| teaser: An annotated corpus
 | |
| tag: class
 | |
| source: spacy/gold/corpus.py
 | |
| new: 3
 | |
| ---
 | |
| 
 | |
| This class manages annotated corpora and can read training and development
 | |
| datasets in the [DocBin](/api/docbin) (`.spacy`) format.
 | |
| 
 | |
| ## Corpus.\_\_init\_\_ {#init tag="method"}
 | |
| 
 | |
| Create a `Corpus`. The input data can be a file or a directory of files.
 | |
| 
 | |
| | Name        | Type         | Description                                                      |
 | |
| | ----------- | ------------ | ---------------------------------------------------------------- |
 | |
| | `train`     | str / `Path` | Training data (`.spacy` file or directory of `.spacy` files).    |
 | |
| | `dev`       | str / `Path` | Development data (`.spacy` file or directory of `.spacy` files). |
 | |
| | `limit`     | int          | Maximum number of examples returned.                             |
 | |
| | **RETURNS** | `Corpus`     | The newly constructed object.                                    |
 | |
| 
 | |
| <!-- TODO: document remaining methods / decide which to document -->
 | |
| 
 | |
| ## Corpus.walk_corpus {#walk_corpus tag="staticmethod"}
 | |
| 
 | |
| ## Corpus.make_examples {#make_examples tag="method"}
 | |
| 
 | |
| ## Corpus.make_examples_gold_preproc {#make_examples_gold_preproc tag="method"}
 | |
| 
 | |
| ## Corpus.read_docbin {#read_docbin tag="method"}
 | |
| 
 | |
| ## Corpus.count_train {#count_train tag="method"}
 | |
| 
 | |
| ## Corpus.train_dataset {#train_dataset tag="method"}
 | |
| 
 | |
| ## Corpus.dev_dataset {#dev_dataset tag="method"}
 |