tcflib.tcf module
This module provides an API for TCF documents.
-
class tcflib.tcf.AnnotationLayerBase(initialdata=None)
Bases: object
Base class for annotation layers.
-
corpus = None
The corpus this layer belongs to.
-
parent = None
The parent layer, in case of nested layers.
-
tcf
Return the layer as an etree.Element.
-
-
class tcflib.tcf.AnnotationLayer(initialdata=None)
Bases: tcflib.tcf.AnnotationLayerBase, collections.UserList
Annotation layer that acts like a list of Annotations.
-
class tcflib.tcf.AnnotationLayerWithIDs(initialdata=None)
Bases: tcflib.tcf.AnnotationLayerBase, collections.UserDict
Annotation layer that holds IDs of annotations.
This class acts like a hybrid of a list and a dict. It can be used like a list: it has an append method and it iterates over its values. Its items can also be set and retrieved by annotation ID, using dict-like element access.
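The list/dict hybrid described above can be pictured with a small stand-in. This is a minimal sketch built on collections.UserDict, not the tcflib implementation: the class name IdLayerSketch, the "x" ID prefix, and the counter-based ID scheme are all assumptions made for illustration.

```python
from collections import UserDict


class IdLayerSketch(UserDict):
    """Hypothetical stand-in for a layer with IDs (not tcflib code)."""

    prefix = "x"  # assumed ID prefix, for illustration only

    def __init__(self, initialdata=None):
        super().__init__()
        self._counter = 0
        for item in initialdata or []:
            self.append(item)

    def append(self, item):
        # List-like: store the item under a freshly generated ID.
        key = "{}_{}".format(self.prefix, self._counter)
        self._counter += 1
        self[key] = item
        return key

    def __iter__(self):
        # List-like: iterate over the stored values, not the keys.
        return iter(self.data.values())


layer = IdLayerSketch(["Hello", "world"])
layer["custom"] = "!"      # dict-like: set an item by explicit ID
second = layer["x_1"]      # dict-like: retrieve an item by ID
items = list(layer)        # list-like: iteration yields the values
```

Overriding __iter__ to yield values is what makes the mapping feel like a list; the trade-off is that generic mapping helpers that rely on key iteration no longer behave as they do for a plain dict.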
-
class tcflib.tcf.AnnotationElement(*, tokens=None)
Bases: object
Base class for annotation elements.
-
parent = None
The annotation layer the element belongs to.
-
tcf
Return the element as an etree.Element.
-
-
class tcflib.tcf.TokenList(initialdata=None)
Bases: collections.UserList
Proxy token list that sets token attributes.
Used for the token lists of AnnotationElement objects that maintain a relation between the element and the token. For example, appending a token to reference.tokens should set the token’s reference attribute.
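The back-reference behaviour described above can be sketched with a small UserList subclass. This is an illustration under stated assumptions, not the tcflib code: TokenSketch, ReferenceSketch, and ProxyTokenList are hypothetical names, and the (owner, attr) constructor arguments differ from the documented TokenList(initialdata=None) signature.

```python
from collections import UserList


class TokenSketch:
    """Hypothetical token with a back-reference slot."""

    def __init__(self, text):
        self.text = text
        self.reference = None


class ProxyTokenList(UserList):
    """Hypothetical proxy list: appending a token also sets an attribute on it."""

    def __init__(self, owner, attr, initialdata=None):
        super().__init__()
        self.owner = owner  # the element that owns this token list
        self.attr = attr    # name of the token attribute to set
        for token in initialdata or []:
            self.append(token)

    def append(self, token):
        # Maintain the relation: point the token back at the owning element.
        setattr(token, self.attr, self.owner)
        super().append(token)


class ReferenceSketch:
    """Hypothetical element whose token list maintains token.reference."""

    def __init__(self):
        self.tokens = ProxyTokenList(self, "reference")


ref = ReferenceSketch()
tok = TokenSketch("she")
ref.tokens.append(tok)
```

After the append, tok.reference points at ref, so the relation can be followed from either side. Only append is covered here; a fuller proxy would also need to handle extend, insert, and item assignment.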
-
class tcflib.tcf.TextCorpus(input_data=None, *, layers=None)
Bases: object
The main class that represents a TextCorpus.
A TextCorpus consists of a series of AnnotationLayers.
Parameters:
- input_data (str or None) – The XML input.
- layers (list or None) – A list of layers that should be parsed.
-
tree
Return the corpus as an etree.ElementTree.
The original XML tree is kept in memory, so that only newly added layers get serialized. This makes sure that the original tree is not touched.
-
write(file_or_path, *, encoding='utf-8', pretty_print=True)
Write the XML tree into a file.
This method writes each layer successively and discards it afterwards. This is more memory efficient than building the whole tree at once.
Parameters: file_or_path (a file object or a file path) – The target to which the XML tree is written.
-
add_layer(layer)
Add an AnnotationLayerBase object to the corpus.
-
class tcflib.tcf.Text(text)
Bases: tcflib.tcf.AnnotationLayerBase
The text annotation layer.
-
text = None
The unannotated text.
-
tcf
Return the layer as an etree.Element.
-
-
class tcflib.tcf.Tokens(initialdata=None)
Bases: tcflib.tcf.AnnotationLayerWithIDs
The tokens annotation layer.
It holds a sequence of Token objects.
-
class tcflib.tcf.Token(text)
Bases: tcflib.tcf.AnnotationElement
The token annotation element.
-
text = None
The token text.
-
lemma = None
The token lemma.
-
tag = None
The POS tag value.
-
entity = None
The NamedEntity object for the token.
-
wordsenses = None
The list of word senses for the token.
-
tcf
Return the element as an etree.Element.
-
postag
The POS tag as a POSTagBase object.
-
semantic_unit
The semantic unit for a token.
The semantic unit can be the (disambiguated) lemma, a named entity, or a referenced semantic unit.
-
-
class tcflib.tcf.Lemmas(initialdata=None)
Bases: tcflib.tcf.AnnotationLayer
The lemmas annotation layer.
-
tcf
Return the layer as an etree.Element.
-
-
class tcflib.tcf.Wsd(source)
Bases: tcflib.tcf.AnnotationLayer
The word senses (wsd) annotation layer.
-
tcf
Return the layer as an etree.Element.
-
Bases: tcflib.tcf.AnnotationLayer
The POStags annotation layer.
-
tcf
Return the layer as an etree.Element.
-
class tcflib.tcf.DepParsing(tagset, emptytoks=False, multigovs=False)
Bases: tcflib.tcf.AnnotationLayerWithIDs
The depparsing annotation layer.
It holds a sequence of DepParse objects.
-
tcf
Return the layer as an etree.Element.
-
-
class tcflib.tcf.DepParse
Bases: tcflib.tcf.AnnotationLayer
The parse annotation element.
It holds a sequence of Dependency objects.
-
class tcflib.tcf.Dependency(func, gov_tokens=None, dep_tokens=None)
Bases: tcflib.tcf.AnnotationElement
The dependency annotation element.
-
tcf
Return the element as an etree.Element.
-
-
class tcflib.tcf.NamedEntities(type)
Bases: tcflib.tcf.AnnotationLayerWithIDs
The namedEntities annotation layer.
It holds a sequence of NamedEntity objects.
-
tcf
Return the layer as an etree.Element.
-
-
class tcflib.tcf.NamedEntity(class_=None, tokens=None)
Bases: tcflib.tcf.AnnotationElement
The named entity annotation element.
-
tcf
Return the element as an etree.Element.
-
-
class tcflib.tcf.References(typetagset, reltagset, extrefs)
Bases: tcflib.tcf.AnnotationLayer
The references annotation layer.
-
tcf
Return the layer as an etree.Element.
-
-
class tcflib.tcf.Entity
Bases: tcflib.tcf.AnnotationLayerWithIDs
The entity annotation element.
This class represents a coreference entity inside the references annotation layer. The entity inside the namedEntities annotation layer is represented by the NamedEntity class. In TCF, both share the entity tag name.
An entity holds a sequence of Reference objects.
-
tcf
Return the layer as an etree.Element.
-
-
class tcflib.tcf.Reference(*, type=None, rel=None, target=None, tokens=None)
Bases: tcflib.tcf.AnnotationElement
The reference annotation element.
-
tokens
The tokens for this reference.
-
tcf
Return the element as an etree.Element.
-
-
class tcflib.tcf.Sentences(initialdata=None)
Bases: tcflib.tcf.AnnotationLayerWithIDs
The sentences annotation layer.
It holds a sequence of Sentence objects.
-
class tcflib.tcf.Sentence(*, tokens=None)
Bases: tcflib.tcf.AnnotationElement
The sentence annotation element.
-
class tcflib.tcf.TextStructure(initialdata=None)
Bases: tcflib.tcf.AnnotationLayer
The textstructure annotation layer.
It holds a sequence of TextSpan objects.
-
class tcflib.tcf.TextSpan(type=None)
Bases: tcflib.tcf.AnnotationElement
The textspan annotation element.
-
type = None
The type of span.
-
tcf
Return the element as an etree.Element.
-
-
class tcflib.tcf.Graph(*, label='lemma', weight='count')
Bases: tcflib.tcf.AnnotationLayerBase
The graph annotation layer.
This layer implements a graph API to store graph representations of the text (e.g., cooccurrence graphs).
-
tcf
Return the layer as an etree.Element.
-
-
exception tcflib.tcf.LoopError
Bases: Exception
This exception is raised if a request to add an edge would result in a loop.
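A loop check of this kind might look like the following sketch. It does not use tcflib: LoopError here is a local stand-in class, and GraphSketch, its add_edge method, and the count-based edge weight are hypothetical names and behaviour chosen only to show where such an exception would be raised.

```python
class LoopError(Exception):
    """Local stand-in for tcflib.tcf.LoopError (not imported from tcflib)."""


class GraphSketch:
    """Hypothetical undirected graph that counts repeated edges."""

    def __init__(self):
        self.edges = {}

    def add_edge(self, source, target):
        # An edge from a node to itself would be a loop, so reject it.
        if source == target:
            raise LoopError("edge {!r} -> {!r} is a loop".format(source, target))
        key = frozenset((source, target))
        # Assumed weighting: count how often the edge was requested.
        self.edges[key] = self.edges.get(key, 0) + 1
        return self.edges[key]


graph = GraphSketch()
graph.add_edge("cat", "mat")
weight = graph.add_edge("mat", "cat")  # same undirected edge, counted again

try:
    graph.add_edge("cat", "cat")
    loop_rejected = False
except LoopError:
    loop_rejected = True
```

Callers that build cooccurrence graphs from token windows can catch the exception and skip the pair, since a token cooccurring with itself carries no information.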
-
tcflib.tcf.serialize(obj)
Serialize an object into a byte string.
Parameters: obj – A TextCorpus, etree.ElementTree or string.
Return type: bytes
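The type dispatch implied by the parameter description can be sketched as follows. This is an assumption-laden illustration, not the tcflib implementation: it uses the standard library's xml.etree.ElementTree rather than whatever etree tcflib uses, and serialize_sketch and CorpusSketch are hypothetical names. The only documented facts relied on are that the function returns bytes and that a corpus exposes a tree property.

```python
import xml.etree.ElementTree as ET


def serialize_sketch(obj):
    """Hypothetical serializer: return a byte string for several input types."""
    if isinstance(obj, bytes):
        return obj
    if isinstance(obj, str):
        return obj.encode("utf-8")
    if isinstance(obj, ET.ElementTree):
        return ET.tostring(obj.getroot(), encoding="utf-8")
    if hasattr(obj, "tree"):
        # Corpus-like object: serialize its ElementTree.
        return serialize_sketch(obj.tree)
    raise TypeError("cannot serialize {!r}".format(type(obj)))


class CorpusSketch:
    """Hypothetical corpus-like object exposing a tree property."""

    @property
    def tree(self):
        return ET.ElementTree(ET.Element("TextCorpus"))


from_text = serialize_sketch("<TextCorpus/>")
from_tree = serialize_sketch(ET.ElementTree(ET.Element("TextCorpus")))
from_corpus = serialize_sketch(CorpusSketch())
```

Returning bytes for every input type lets callers write the result to a binary stream without checking what they started with.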