The project builds a set of interrelated digital resources designed for interdisciplinary use. In particular, we are developing a corpus of epigraphic texts, a computational lexicon for the languages involved, a dataset of bibliographic references, and an experimental semantic dataset of the research data.
The corpus is managed and exploited in a digital archive that holds a formal representation of the texts, encoded according to the TEI/EpiDoc schema; it was necessary to create an ad hoc schema to handle the peculiarities of languages of fragmentary attestation (Restsprachen). The application of the EpiDoc model to Restsprachen is thus a complete novelty in the field of Digital Epigraphy.
Each text in the archive is enriched with shared, standard metadata that allow for its accurate description, both as a linguistic object (text: language, alphabet, date, etc.) and as a material object (support: chronology, date of discovery, material, etc.). Upon completion, a facsimile of the inscriptions is provided. For each inscription, an .xml file is released containing the entire description in TEI/EpiDoc.
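As an illustration only, the following Python sketch shows how the linguistic and material metadata of one such .xml file could be read back. The sample record is hypothetical and heavily simplified: it uses standard TEI elements in the TEI namespace, but it does not reproduce the project's ad hoc schema, and the identifier `EX-0001`, the placeholder text, and the function name `describe` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; EpiDoc files are TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified record; NOT the project's actual schema or data.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample inscription</title></titleStmt>
      <publicationStmt><p>Illustrative example only</p></publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier><idno>EX-0001</idno></msIdentifier>
          <physDesc>
            <objectDesc>
              <supportDesc>
                <support><material>bronze</material></support>
              </supportDesc>
            </objectDesc>
          </physDesc>
        </msDesc>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <langUsage><language ident="xve">Venetic</language></langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="edition" xml:lang="xve"><ab>placeholder text</ab></div>
    </body>
  </text>
</TEI>"""


def describe(xml_text):
    """Collect a few descriptive fields from a TEI/EpiDoc-style record."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        # Return the stripped text of the first match, or None if absent.
        node = root.find(path, TEI_NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "id": text_of(".//tei:msIdentifier/tei:idno"),
        "material": text_of(".//tei:support/tei:material"),   # material object
        "language": text_of(".//tei:langUsage/tei:language"),  # linguistic object
        "edition": text_of(".//tei:div[@type='edition']/tei:ab"),
    }


info = describe(SAMPLE)
```

Keeping the description of the support and the description of the text in separate parts of the same file is what lets a single record serve both the material and the linguistic view of an inscription.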
The project additionally experiments with the CRMtex and CRMinf extensions of CIDOC CRM, the de facto standard ontology in the Digital Humanities. CRMtex supports the semantic description of textual entities, and CRMinf that of their scientific interpretations.
The lexical entities present in the texts are described and thoroughly investigated in order to produce a multilingual (Venetic, Oscan, Faliscan, and Celtic) computational lexicon. The project investigated the specific requirements for designing an efficient computational lexical model dedicated to languages of fragmentary attestation. We adopt Semantic Web standards and vocabularies to provide a structured, formal representation of the lexical items and their related information, and to enable sophisticated semantic access to the corpus of inscriptions. The challenges of lexical modelling are numerous, since we are dealing with Restsprachen; they range from lemmatisation issues to sense representation, given that meanings can often be reconstructed only partially and hypothetically.
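The modelling problem described here can be sketched, under stated assumptions, as a small Python data structure. This is not the project's lexical model, which relies on Semantic Web vocabularies: the class names, the `certainty` labels, and the placeholder lemma, forms, and inscription identifiers are all hypothetical and serve only to show how a tentative sense and the links from lexicon to corpus might be recorded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    gloss: str
    # Hypothetical scale: "certain", "probable" or "hypothetical".
    certainty: str = "hypothetical"


@dataclass
class Attestation:
    inscription_id: str  # identifier of the inscription in the corpus
    form: str            # the form as it is attested in that inscription


@dataclass
class LexicalEntry:
    lemma: str
    language: str  # language code, e.g. "xve" for Venetic
    senses: List[Sense] = field(default_factory=list)
    attestations: List[Attestation] = field(default_factory=list)


def inscriptions_for(entries, lemma, language):
    """Return the sorted identifiers of inscriptions attesting a lemma."""
    ids = set()
    for entry in entries:
        if entry.lemma == lemma and entry.language == language:
            ids.update(a.inscription_id for a in entry.attestations)
    return sorted(ids)


# Placeholder data only; no real lemma or inscription is represented here.
entry = LexicalEntry(
    lemma="LEMMA-A",
    language="xve",
    senses=[Sense("gloss placeholder", "hypothetical")],
    attestations=[
        Attestation("EX-0002", "form-1"),
        Attestation("EX-0001", "form-2"),
    ],
)
found = inscriptions_for([entry], "LEMMA-A", "xve")
```

Attaching the degree of certainty to each sense, rather than to the entry as a whole, allows a well-understood reading and a purely conjectural one to coexist for the same lemma.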
Corpus and lexicon interact with each other and will be equipped with a bibliographic apparatus structured according to digital bibliographic models. A bibliography of the languages of ancient Italy, with particular reference to the languages analysed by the project, was created on the Zotero bibliographic platform as a public group library and is constantly updated. Furthermore, the bibliography is released in TEI format, to make it compatible with the EpiDoc standard, and is mapped to FRBRoo, a formal ontology intended to represent the underlying semantics of bibliographic information, for the LOD release.
We also experiment with Domain-Specific Languages to deploy a system that can assist scholars in creating the textual digital resources and ensure compatibility with the standards adopted.
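To give a rough idea of what such assistance might look like, the Python sketch below translates a tiny bracket notation into EpiDoc markup. The notation and the function `to_epidoc` are assumptions made for this example and are not the project's Domain-Specific Language. Only two rules are covered: `[abc]` for text restored by the editor and `[---]` for a lacuna of unknown extent, rendered with the EpiDoc elements `<supplied>` and `<gap>`.

```python
import re
from xml.sax.saxutils import escape

# Matches "[---]" (lacuna of unknown extent) or "[abc]" (restored text).
TOKEN = re.compile(r"\[(---|[^\[\]]*)\]")


def to_epidoc(line):
    """Translate one line of the hypothetical notation into an <ab> element."""
    out = []
    pos = 0
    for match in TOKEN.finditer(line):
        plain = line[pos:match.start()]
        if "[" in plain or "]" in plain:
            raise ValueError("unbalanced bracket in: " + line)
        out.append(escape(plain))
        inner = match.group(1)
        if inner == "---":
            out.append('<gap reason="lost" extent="unknown" unit="character"/>')
        else:
            out.append('<supplied reason="lost">' + escape(inner) + "</supplied>")
        pos = match.end()
    tail = line[pos:]
    if "[" in tail or "]" in tail:
        # Rejecting malformed input early is how the tool guards the encoding.
        raise ValueError("unbalanced bracket in: " + line)
    out.append(escape(tail))
    return "<ab>" + "".join(out) + "</ab>"


encoded = to_epidoc("ab[cd] ef[---]")
```

The scholar types a compact, familiar notation, while the tool produces the markup and refuses input it cannot map to it, such as an unclosed bracket.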
All resources are fully interoperable and available to the scientific community through an advanced query platform. The tools and resources developed within the project will ultimately be made available through relevant Europe-wide Research Infrastructures such as CLARIN and DARIAH, currently the two main infrastructures for the e-Humanities and (immaterial) Cultural Heritage. This will ensure both the long-term preservation of the resources produced and the full valorisation of this heritage.