tcflib.tcf module¶
This module provides an API for TCF documents.
-
class tcflib.tcf.AnnotationLayerBase(initialdata=None)[source]¶
Bases: object
Base class for annotation layers.
-
corpus= None¶ The corpus this layer belongs to.
-
parent= None¶ The parent layer, in case of nested layers.
-
tcf¶ Return the layer as an etree.Element.
-
-
class tcflib.tcf.AnnotationLayer(initialdata=None)[source]¶
Bases: tcflib.tcf.AnnotationLayerBase, collections.UserList
Annotation layer that acts like a list of Annotations.
-
class tcflib.tcf.AnnotationLayerWithIDs(initialdata=None)[source]¶
Bases: tcflib.tcf.AnnotationLayerBase, collections.UserDict
Annotation layer that holds IDs of annotations.
This class acts like a hybrid of a list and a dict. It can be used like a list: it has an append method and it iterates over its values. Its items can also be set and retrieved by annotation ID, using dict-like element access.
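The hybrid behaviour described above can be sketched with the standard library alone. The class name IdLayer, the prefix argument, and the ID scheme are hypothetical and chosen for illustration; they are not taken from tcflib.

```python
from collections import UserDict


class IdLayer(UserDict):
    """Hypothetical stand-in for a layer keyed by annotation IDs."""

    def __init__(self, prefix="x"):
        super().__init__()
        self.prefix = prefix  # assumed ID prefix, not taken from tcflib

    def append(self, item):
        # List-like: derive an ID from the current size and store the item.
        # (A real implementation would need to guard against ID collisions.)
        key = "{}_{}".format(self.prefix, len(self.data))
        self.data[key] = item
        return key

    def __iter__(self):
        # List-like: iterate over the stored values, not over the keys.
        return iter(self.data.values())


layer = IdLayer(prefix="t")
first_id = layer.append("Hello")
layer.append("world")
layer["t_9"] = "!"        # dict-like assignment by ID
words = list(layer)       # iteration yields values in insertion order
```

Overriding __iter__ is what makes the object feel like a list, while item access by ID is inherited from UserDict.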
-
class tcflib.tcf.AnnotationElement(*, tokens=None)[source]¶
Bases: object
Base class for annotation elements.
-
parent= None¶ The annotation layer the element belongs to.
-
tcf¶ Return the element as an etree.Element.
-
-
class tcflib.tcf.TokenList(initialdata=None)[source]¶
Bases: collections.UserList
Proxy token list that sets token attributes.
Used for the token lists of AnnotationElement objects that maintain a relation between the element and its tokens. For example, appending a token to reference.tokens should set the token's reference attribute.
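A minimal sketch of such a proxy list, assuming only what the description states. BackrefList and the stand-in Token and Reference classes are hypothetical, and only append is covered.

```python
from collections import UserList


class Token:
    """Stand-in token; only the attributes needed for the sketch."""

    def __init__(self, text):
        self.text = text
        self.reference = None  # back-reference filled in by the proxy list


class BackrefList(UserList):
    """Hypothetical proxy list that records its owner on each appended item."""

    def __init__(self, owner, attr):
        super().__init__()
        self.owner = owner  # the element that holds this list
        self.attr = attr    # name of the attribute to set on each token

    def append(self, item):
        # Set the back-reference first, then store the item as usual.
        setattr(item, self.attr, self.owner)
        super().append(item)


class Reference:
    """Stand-in element whose token list maintains the relation."""

    def __init__(self):
        self.tokens = BackrefList(self, "reference")


ref = Reference()
tok = Token("she")
ref.tokens.append(tok)
```

After the append, the relation can be followed in both directions: from the element to its tokens and from each token back to the element.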
-
class tcflib.tcf.TextCorpus(input_data=None, *, layers=None)[source]¶
Bases: object
The main class that represents a TextCorpus.
A TextCorpus consists of a series of AnnotationLayers.
Parameters:
- input_data (str or None) – The XML input.
- layers (list or None) – A list of layers that should be parsed.
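The effect of a layers filter can be illustrated without tcflib. The sketch below uses the standard library parser on a simplified, namespace-free document; the function parse_layers and the rule that layers=None selects every layer are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a TCF document (an assumption: real TCF uses
# XML namespaces and further attributes).
XML = """<TextCorpus lang="en">
  <text>Hello world</text>
  <tokens><token ID="t_0">Hello</token><token ID="t_1">world</token></tokens>
  <lemmas><lemma tokenIDs="t_0">hello</lemma></lemmas>
</TextCorpus>"""


def parse_layers(input_data, layers=None):
    """Return {layer tag: element}, restricted to `layers` when given."""
    root = ET.fromstring(input_data)
    wanted = None if layers is None else set(layers)
    return {
        child.tag: child
        for child in root
        if wanted is None or child.tag in wanted
    }


selected = parse_layers(XML, layers=["text", "tokens"])
everything = parse_layers(XML)
```

Restricting the parsed layers avoids building objects for annotations a caller never reads.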
-
tree¶ Return the corpus as an etree.ElementTree.
The original XML tree is kept in memory, so that only newly added layers get serialized. This makes sure that the original tree is not touched.
-
write(file_or_path, *, encoding='utf-8', pretty_print=True)[source]¶ Write the XML tree into a file.
This method writes each layer successively and discards it afterwards. This is more memory efficient than building the whole tree at once.
Parameters: file_or_path (A file object or a file path.) – The target to which to write the XML tree.
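The layer-by-layer strategy can be sketched as follows. This is an illustration built on the standard library, not the tcflib implementation; write_layers, layer_elements, and the bare TextCorpus wrapper element are assumptions.

```python
import io
import xml.etree.ElementTree as ET


def write_layers(target, layers):
    """Write each layer element in turn instead of building one big tree.

    `layers` may be a generator, so that only one layer is alive at a time.
    """
    target.write(b'<?xml version="1.0" encoding="utf-8"?>\n')
    target.write(b"<TextCorpus>\n")
    for layer in layers:
        # Serialize this layer, write it, and let it go out of scope.
        target.write(ET.tostring(layer, encoding="utf-8"))
        target.write(b"\n")
    target.write(b"</TextCorpus>\n")


def layer_elements():
    # Generator: each layer is built only when the writer asks for it.
    text = ET.Element("text")
    text.text = "Hello world"
    yield text
    tokens = ET.Element("tokens")
    for i, word in enumerate(["Hello", "world"]):
        tok = ET.SubElement(tokens, "token", ID="t_{}".format(i))
        tok.text = word
    yield tokens


buffer = io.BytesIO()
write_layers(buffer, layer_elements())
document = buffer.getvalue()
```

Because the writer never holds more than one layer, peak memory depends on the largest layer rather than on the whole corpus.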
-
add_layer(layer)[source]¶ Add an AnnotationLayerBase object to the corpus.
-
class tcflib.tcf.Text(text)[source]¶
Bases: tcflib.tcf.AnnotationLayerBase
The text annotation layer.
-
text= None¶ The unannotated text.
-
tcf¶ Return the layer as an etree.Element.
-
-
class tcflib.tcf.Tokens(initialdata=None)[source]¶
Bases: tcflib.tcf.AnnotationLayerWithIDs
The tokens annotation layer.
It holds a sequence of Token objects.
-
class tcflib.tcf.Token(text)[source]¶
Bases: tcflib.tcf.AnnotationElement
The token annotation element.
-
text= None¶ The token text.
-
lemma= None¶ The token lemma.
-
tag= None¶ The POS tag value.
-
entity= None¶ The NamedEntity object for the token.
-
wordsenses= None¶ The list of word senses for the token.
-
tcf¶ Return the element as an etree.Element.
-
postag¶ The POS tag as a POSTagBase.
-
semantic_unit¶ The semantic unit for a token.
The semantic unit can be the (disambiguated) lemma, a named entity, or a referenced semantic unit.
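One way to read this description is as a fallback chain. The sketch below is hypothetical: the Tok class, the function name, and in particular the order of precedence (referenced unit, then named entity, then lemma) are assumptions and are not confirmed by this reference.

```python
class Tok:
    """Hypothetical token carrying the three possible sources of a unit."""

    def __init__(self, lemma, entity=None, reference=None):
        self.lemma = lemma
        self.entity = entity
        self.reference = reference


def semantic_unit(token):
    # Assumed priority: referenced unit, then named entity, then lemma.
    if token.reference is not None:
        return token.reference
    if token.entity is not None:
        return token.entity
    return token.lemma


plain = semantic_unit(Tok("run"))
named = semantic_unit(Tok("Paris", entity="LOC:Paris"))
referred = semantic_unit(Tok("she", reference="Ada Lovelace"))
```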
-
-
class tcflib.tcf.Lemmas(initialdata=None)[source]¶
Bases: tcflib.tcf.AnnotationLayer
The lemmas annotation layer.
-
tcf¶ Return the layer as an etree.Element.
-
-
class tcflib.tcf.Wsd(source)[source]¶
Bases: tcflib.tcf.AnnotationLayer
The word senses (wsd) annotation layer.
-
tcf¶ Return the layer as an etree.Element.
-
Bases: tcflib.tcf.AnnotationLayer
The POStags annotation layer.
Return the layer as an etree.Element.
-
class tcflib.tcf.DepParsing(tagset, emptytoks=False, multigovs=False)[source]¶
Bases: tcflib.tcf.AnnotationLayerWithIDs
The depparsing annotation layer.
It holds a sequence of DepParse objects.
-
tcf¶ Return the layer as an etree.Element.
-
-
class tcflib.tcf.DepParse[source]¶
Bases: tcflib.tcf.AnnotationLayer
The parse annotation element.
It holds a sequence of Dependency objects.
-
class tcflib.tcf.Dependency(func, gov_tokens=None, dep_tokens=None)[source]¶
Bases: tcflib.tcf.AnnotationElement
The dependency annotation element.
-
tcf¶ Return the element as an etree.Element.
-
-
class tcflib.tcf.NamedEntities(type)[source]¶
Bases: tcflib.tcf.AnnotationLayerWithIDs
The namedEntities annotation layer.
It holds a sequence of NamedEntity objects.
-
tcf¶ Return the layer as an etree.Element.
-
-
class tcflib.tcf.NamedEntity(class_=None, tokens=None)[source]¶
Bases: tcflib.tcf.AnnotationElement
The named entity annotation element.
-
tcf¶ Return the element as an etree.Element.
-
-
class tcflib.tcf.References(typetagset, reltagset, extrefs)[source]¶
Bases: tcflib.tcf.AnnotationLayer
The references annotation layer.
-
tcf¶ Return the layer as an etree.Element.
-
-
class tcflib.tcf.Entity[source]¶
Bases: tcflib.tcf.AnnotationLayerWithIDs
The entity annotation element.
This class represents a coreference entity inside the references annotation layer. The entity inside the namedEntities annotation layer is represented by the NamedEntity class. In TCF, both share the entity tag name.
An entity holds a sequence of Reference objects.
-
tcf¶ Return the layer as an etree.Element.
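Because both kinds of entity share one tag name, a parser has to look at the enclosing layer to decide which class applies. The sketch below shows that idea on a simplified, namespace-free document; classify_entities and the sample XML are assumptions and do not reflect how tcflib itself dispatches.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a TCF document (no namespaces, minimal attributes).
XML = """<TextCorpus>
  <namedEntities type="demo"><entity class="PER"/></namedEntities>
  <references><entity/><entity/></references>
</TextCorpus>"""


def classify_entities(root):
    """Map every <entity> element to a class name based on its layer."""
    kinds = []
    for layer in root:
        for _elem in layer.iter("entity"):
            if layer.tag == "namedEntities":
                kinds.append("NamedEntity")
            elif layer.tag == "references":
                kinds.append("Entity")
    return kinds


kinds = classify_entities(ET.fromstring(XML))
```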
-
-
class tcflib.tcf.Reference(*, type=None, rel=None, target=None, tokens=None)[source]¶
Bases: tcflib.tcf.AnnotationElement
The reference annotation element.
-
tokens¶ The tokens for this reference.
-
tcf¶ Return the element as an etree.Element.
-
-
class tcflib.tcf.Sentences(initialdata=None)[source]¶
Bases: tcflib.tcf.AnnotationLayerWithIDs
The sentences annotation layer.
It holds a sequence of Sentence objects.
-
class tcflib.tcf.Sentence(*, tokens=None)[source]¶
Bases: tcflib.tcf.AnnotationElement
The sentence annotation element.
-
class tcflib.tcf.TextStructure(initialdata=None)[source]¶
Bases: tcflib.tcf.AnnotationLayer
The textstructure annotation layer.
It holds a sequence of TextSpan objects.
-
class tcflib.tcf.TextSpan(type=None)[source]¶
Bases: tcflib.tcf.AnnotationElement
The text span annotation element.
-
type= None¶ The type of span.
-
tcf¶ Return the element as an etree.Element.
-
-
class tcflib.tcf.Graph(*, label='lemma', weight='count')[source]¶
Bases: tcflib.tcf.AnnotationLayerBase
The graph annotation layer.
This layer implements a graph API to store graph representations of the text (e.g., cooccurrence graphs).
-
tcf¶ Return the layer as an etree.Element.
-
-
exception tcflib.tcf.LoopError[source]¶
Bases: Exception
This exception is raised if a request to add an edge would result in a loop.
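The check can be illustrated with a small stand-alone graph. Everything here is a sketch under stated assumptions: the local LoopError mirrors the exception above, TinyGraph and add_edge are hypothetical names, and "loop" is read as an edge that connects a node to itself.

```python
class LoopError(Exception):
    """Local stand-in for the exception described above."""


class TinyGraph:
    """Hypothetical undirected graph with weighted edges."""

    def __init__(self):
        self.edges = {}

    def add_edge(self, source, target, weight=1):
        # Assumption: a loop is an edge whose two endpoints are the same node.
        if source == target:
            raise LoopError(
                "edge {!r} -> {!r} would be a loop".format(source, target)
            )
        # Undirected edge: the same key is used for both directions.
        key = frozenset((source, target))
        self.edges[key] = self.edges.get(key, 0) + weight


graph = TinyGraph()
graph.add_edge("cat", "dog")
graph.add_edge("dog", "cat")   # same undirected edge, weight accumulates
try:
    graph.add_edge("cat", "cat")
    rejected = False
except LoopError:
    rejected = True
```

Rejecting the edge before it is stored keeps the graph valid even when the caller ignores the exception.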
-
tcflib.tcf.serialize(obj)[source]¶ Serialize an object into a byte string.
Parameters: obj – A TextCorpus, etree.ElementTree or string.
Return type: bytes
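A type-dispatching serializer of this kind can be sketched with the standard library. The function to_bytes is hypothetical and uses xml.etree.ElementTree rather than whichever etree implementation tcflib relies on; TextCorpus objects are left out because their interface beyond this reference is not assumed.

```python
import xml.etree.ElementTree as ET


def to_bytes(obj, encoding="utf-8"):
    """Turn a string, Element or ElementTree into a byte string."""
    if isinstance(obj, bytes):
        return obj                      # already serialized
    if isinstance(obj, str):
        return obj.encode(encoding)
    if isinstance(obj, ET.ElementTree):
        obj = obj.getroot()             # serialize the tree via its root
    if ET.iselement(obj):
        return ET.tostring(obj, encoding=encoding)
    raise TypeError("cannot serialize {!r}".format(type(obj)))


element = ET.Element("token")
element.text = "Hi"
from_string = to_bytes("<token>Hi</token>")
from_element = to_bytes(element)
from_tree = to_bytes(ET.ElementTree(element))
```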