The project will create a set of interrelated digital resources designed for interdisciplinary use. In particular, we plan to develop: a corpus of epigraphic texts, a computational lexicon for the languages involved, a dataset of bibliographic references, and an experimental semantic dataset of the research data.
The corpus will be managed and queried in a digital archive that holds a formal representation of each text, encoded according to the TEI/EpiDoc schema; it may be necessary to create an ad hoc schema to handle the peculiarities of languages of fragmentary attestation (Restsprachen). The application of the EpiDoc model to Restsprachen is a complete novelty in the field of Digital Epigraphy.
Each text in the archive will be enriched with shared, standard metadata allowing for its accurate description, both as a linguistic object (text: language, alphabet, date, etc.) and as a material object (support: chronology, date of discovery, material, etc.). Upon completion, facsimiles of the inscriptions will be created. For each inscription, an .xml file containing the entire description in TEI/EpiDoc will be released.
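As an illustration of what such a per-inscription file could look like, the following minimal Python sketch builds a skeletal TEI document with a few of the metadata fields mentioned above. The element layout follows common EpiDoc practice, but it is only an assumption about the eventual schema; the function name `build_epidoc`, the record keys, and all sample values are hypothetical placeholders, not project data.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_epidoc(record):
    """Build a skeletal EpiDoc-style tree from a dict (hypothetical keys)."""
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = record["title"]
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Illustrative sketch"

    # Material object: identifier, support, chronology, find context.
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ms_desc = ET.SubElement(source_desc, tei("msDesc"))
    ms_id = ET.SubElement(ms_desc, tei("msIdentifier"))
    ET.SubElement(ms_id, tei("idno")).text = record["id"]
    phys_desc = ET.SubElement(ms_desc, tei("physDesc"))
    object_desc = ET.SubElement(phys_desc, tei("objectDesc"))
    support_desc = ET.SubElement(object_desc, tei("supportDesc"))
    support = ET.SubElement(support_desc, tei("support"))
    ET.SubElement(support, tei("material")).text = record["material"]
    history = ET.SubElement(ms_desc, tei("history"))
    origin = ET.SubElement(history, tei("origin"))
    ET.SubElement(origin, tei("origDate")).text = record["date"]
    found = ET.SubElement(history, tei("provenance"), {"type": "found"})
    found.text = record["findspot"]

    # Linguistic object: language declaration and the edited text.
    profile_desc = ET.SubElement(header, tei("profileDesc"))
    lang_usage = ET.SubElement(profile_desc, tei("langUsage"))
    language = ET.SubElement(lang_usage, tei("language"), {"ident": record["lang"]})
    language.text = record["lang_name"]
    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    edition = ET.SubElement(
        body, tei("div"), {"type": "edition", f"{{{XML_NS}}}lang": record["lang"]}
    )
    ET.SubElement(edition, tei("ab")).text = record["text"]
    return root


# Placeholder record: every value here is invented for illustration only.
SAMPLE = {
    "id": "demo-0001",
    "title": "Placeholder inscription",
    "material": "bronze",
    "date": "date placeholder",
    "findspot": "findspot placeholder",
    "lang": "xve",  # ISO 639-3 code for Venetic
    "lang_name": "Venetic",
    "text": "TEXT PLACEHOLDER",
}

xml_string = ET.tostring(build_epidoc(SAMPLE), encoding="unicode")
```

The resulting string can be written to a file such as `demo-0001.xml`; validating it against the EpiDoc schema would be a separate step not shown here.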
The project also plans to experiment with CRMtex and CRMinf, two extensions of CIDOC CRM, the de facto standard ontology in the Digital Humanities. CRMtex supports the semantic description of textual entities, and CRMinf that of their scientific interpretations.
The lexical entities present in the texts will be described and thoroughly investigated in order to produce a multilingual (Venetic, Oscan, Faliscan, and Celtic) computational lexicon. The project will investigate the requirements for designing an efficient computational lexical model specifically dedicated to languages of fragmentary attestation. We will adopt Semantic Web standards and vocabularies to provide a structured, formal representation of the lexical items and their related information, and to enable sophisticated semantic access to the corpus of inscriptions. Because we are dealing with Restsprachen, the challenges in lexical modelling are numerous: they range from lemmatisation issues to sense representation, given that meanings can often be reconstructed only partially and hypothetically.
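To make the modelling problem concrete, here is a minimal Python sketch of a lexical entry whose senses carry an explicit degree of certainty, serialised as Turtle. It assumes the OntoLex-Lemon vocabulary as one possible Semantic Web standard (the text above does not name a specific one); the `ex:` namespace, the `ex:certainty` property, the certainty scale, and all entry data are hypothetical placeholders.

```python
from dataclasses import dataclass, field

PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/lexicon/> .\n"  # hypothetical namespace
)


@dataclass
class Sense:
    gloss: str
    certainty: str  # hypothetical scale, e.g. "certain", "probable", "hypothetical"


@dataclass
class Entry:
    entry_id: str
    lemma: str
    lang: str
    forms: list = field(default_factory=list)   # attested forms other than the lemma
    senses: list = field(default_factory=list)  # list of Sense objects


def literal(value, lang=None):
    """Format a Turtle string literal, optionally language-tagged."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_turtle(entry):
    """Serialise one entry and its senses as a Turtle document."""
    subject = f"ex:{entry.entry_id}"
    sense_ids = [f"{subject}_sense{i}" for i in range(1, len(entry.senses) + 1)]

    # Predicate-object pairs for the entry itself.
    pairs = ["a ontolex:LexicalEntry"]
    pairs.append(
        f"ontolex:canonicalForm [ ontolex:writtenRep {literal(entry.lemma, entry.lang)} ]"
    )
    for form in entry.forms:
        pairs.append(
            f"ontolex:otherForm [ ontolex:writtenRep {literal(form, entry.lang)} ]"
        )
    for sense_id in sense_ids:
        pairs.append(f"ontolex:sense {sense_id}")
    blocks = [PREFIXES, subject + " " + " ;\n    ".join(pairs) + " ."]

    # One block per sense; ex:certainty is an illustrative, non-standard property.
    for sense_id, sense in zip(sense_ids, entry.senses):
        sense_pairs = [
            "a ontolex:LexicalSense",
            f"skos:definition {literal(sense.gloss, 'en')}",
            f"ex:certainty {literal(sense.certainty)}",
        ]
        blocks.append(sense_id + " " + " ;\n    ".join(sense_pairs) + " .")
    return "\n".join(blocks) + "\n"


# Placeholder entry: lemma, form, and gloss are invented for illustration.
demo_entry = Entry(
    entry_id="entry1",
    lemma="lemma-a",
    lang="xve",
    forms=["lemma-am"],
    senses=[Sense("placeholder gloss", "hypothetical")],
)
turtle_text = to_turtle(demo_entry)
```

Recording certainty on the sense rather than on the entry keeps a securely lemmatised word separate from a tentative interpretation of its meaning, which is one way to address the partial reconstructions mentioned above.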
Corpus and lexicon will interact with each other and will be equipped with a bibliographic apparatus structured according to digital bibliographic models.
We also plan to experiment with Domain-Specific Languages (DSLs) to build a system that assists scholars in creating the digital textual resources and ensures compatibility with the adopted standards.
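A sketch of how such a DSL might work: the Python fragment below translates a tiny, hypothetical bracket notation (loosely inspired by Leiden conventions) into EpiDoc elements and rejects malformed input, so that the scholar never writes XML by hand. The notation, the function names, and the sample input are assumptions made for illustration; the project's actual DSL may differ substantially.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mini-notation:
#   [...]   a lacuna of unknown extent
#   [abc]   text lost on the support and restored by the editor
#   abc     text read on the support
TOKEN = re.compile(r"\[\.\.\.\]|\[([^\[\]]+)\]|([^\[\]]+)")


def line_to_epidoc(line, number):
    """Translate one line of the mini-notation into EpiDoc markup."""
    parts = [f'<lb n="{number}"/>']
    position = 0
    for match in TOKEN.finditer(line):
        if match.start() != position:
            # Something was skipped, e.g. an unbalanced or empty bracket.
            raise ValueError(f"line {number}: unexpected character at {position}")
        position = match.end()
        if match.group(0) == "[...]":
            parts.append('<gap reason="lost" extent="unknown" unit="character"/>')
        elif match.group(1) is not None:
            parts.append(f'<supplied reason="lost">{escape(match.group(1))}</supplied>')
        else:
            parts.append(escape(match.group(2)))
    if position != len(line):
        raise ValueError(f"line {number}: unexpected character at {position}")
    return "".join(parts)


def to_epidoc(source):
    """Translate a multi-line transcription into an <ab> element string."""
    lines = [line.strip() for line in source.strip().splitlines()]
    body = "\n".join(line_to_epidoc(line, i) for i, line in enumerate(lines, 1))
    return f"<ab>\n{body}\n</ab>"


# Placeholder transcription, not a real inscription.
demo_markup = to_epidoc("abc[de]\n[...]fg")
```

Because every construct of the notation maps to exactly one EpiDoc pattern, output produced this way stays consistent with the encoding standard by construction, and errors surface at input time rather than at schema validation.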
All resources will be fully interoperable and will be made available to the scientific community through an advanced query platform. The tools and resources developed within the project will ultimately be made available through relevant European Research Infrastructures such as CLARIN and DARIAH, currently the two main infrastructures for the e-Humanities and (immaterial) Cultural Heritage. This will ensure both the long-term preservation of the resources produced and greater visibility and reuse of this heritage.