A bilingual recipe corpus · Romania ◇ Italia

More than three thousand recipes,
two languages,
one open corpus.

SAVOR brings together interwar Romanian manuscripts, Pellegrino Artusi's 1891 cookbook, and 500 contemporary student recipes — structured, semantically annotated, and made queryable. A FAIR benchmark for the digital humanities, an invitation for the curious.

3,300 recipes · ro + it
~180k semantic triples
M12 delivery · Zenodo DOI

For digital humanists

A clean, validated corpus structured against schema.org/Recipe and FoodOn. Provenance is preserved per record; manuscripts are cited at folio level.

For NLP researchers

Multilingual LaBSE embeddings, FAISS index, and a REST/SPARQL surface for cross-lingual retrieval, clustering, and zero-shot evaluation.

For the curious

Browse Sunday-lunch dishes from Cluj alongside Sunday-lunch dishes from Romagna. Find what changes, what doesn't, and what surprises.
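The cross-lingual retrieval offered to NLP researchers can be pictured with a minimal sketch. It assumes every recipe already has an embedding vector. The three-dimensional vectors, the dish entries, and the `search` helper are invented for illustration and are not records from the corpus; a brute-force cosine ranking stands in here for the LaBSE model and the FAISS index.

```python
import math

# Toy vectors standing in for LaBSE embeddings (hypothetical values).
# Keys are (language, title) pairs; none are taken from the real corpus.
CORPUS = {
    ("ro", "Sarmale"): [0.9, 0.1, 0.2],
    ("it", "Involtini di cavolo"): [0.8, 0.2, 0.3],
    ("it", "Zuppa inglese"): [0.1, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, k=2):
    """Return the k corpus keys most similar to the query vector."""
    ranked = sorted(
        CORPUS.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [key for key, _ in ranked[:k]]

# A query vector that, in this toy space, sits near the cabbage-roll dishes.
hits = search([1.0, 0.0, 0.2])
print(hits)
```

Because the query and the recipes share one vector space, the Romanian and the Italian dish can both rank near the top regardless of the language the query was written in. An approximate index such as FAISS would replace the exhaustive sort once the corpus is too large to scan.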

The corpus, at a glance

3,300 recipes, three sources, two languages.

Each square is one recipe. Aubergine is Romanian manuscript heritage — interwar notebooks, church archives, ethnographic collections. Ochre is contemporary student cooking from USAMV Cluj. Terracotta is Pellegrino Artusi's 1891 cookbook, digitised by Casa Artusi.

2,000 · RO manuscripts
500 · RO student
800 · IT Artusi
3,300 of 3,300 ingested · M9

Semantic search

Ask the corpus the way you'd ask a fellow gourmand.

Not a search box — a correspondence. Write to SAVOR in plain Romanian, Italian, or English. It will write back with recipes, citations, and the occasional aside.

SAVOR / Tavola

For people who cook, teach, and curate.

The website, the search, the recipe pages, the exhibits. Warm, editorial, marginal. This is where the corpus becomes legible.

SAVOR / Lab

For people who index, embed, and infer.

The schema, the SPARQL endpoint, the embedding index, the GitHub releases. Quiet, structured, technical. The corpus as data, ready to compose with.

From manuscript to machine

Six stages, twelve months, three quality gates.

01
Source consolidation
XML from Casa Artusi, CSV from USAMV, OCR text from Romanian archives. Harmonised in UTF-8.
M1 — M2
02
Schema & validation
SAVOR-JSON schema, extending schema.org/Recipe. CI workflows on GitHub for every record.
M1 — M4
03
Semantic annotation
spaCy + FoodOn + Wikidata for ingredients. GeoNames + PeriodO for context. FAO for nutrition.
M3 — M7
04
AI-readiness
LaBSE multilingual embeddings, FAISS index, REST API, Docker container, Jupyter notebooks.
M4 — M10
05
Quality assurance
Schema validation, linguistic proofreading, expert review. 10% sample manually verified.
ongoing
06
Cloud ingestion
RDF/Turtle, BagIt, ECHOES Cloud PIDs, OAI-PMH endpoint, Zenodo DOI. Annual rebuilds.
M8 — M12
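The record validation in stage 02 and the 10% manual check in stage 05 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's CI code: the SAVOR-JSON schema is not reproduced here, so the `provenance` field, the `validate` and `review_sample` helpers, and both example records are hypothetical. Only `name`, `inLanguage`, and `recipeIngredient` are real schema.org/Recipe property names.

```python
import random

# Required keys: three schema.org/Recipe properties plus a hypothetical
# "provenance" field standing in for SAVOR's per-record provenance.
REQUIRED = {
    "name": str,
    "inLanguage": str,
    "recipeIngredient": list,
    "provenance": str,
}
LANGUAGES = {"ro", "it"}

def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in record:
            problems.append(f"missing {key}")
        elif not isinstance(record[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    if record.get("inLanguage") not in LANGUAGES:
        problems.append("inLanguage must be ro or it")
    return problems

def review_sample(record_ids, fraction=0.10, seed=0):
    """Pick a reproducible random subset of ids for manual verification."""
    size = round(len(record_ids) * fraction)
    return sorted(random.Random(seed).sample(record_ids, size))

# Hypothetical records: one complete, one incomplete.
good = {
    "name": "Ciorbă",
    "inLanguage": "ro",
    "recipeIngredient": ["borș", "leuștean"],
    "provenance": "ms. A, f. 12r",  # invented folio citation
}
bad = {"name": "Risotto", "inLanguage": "fr"}

print(validate(good))
print(validate(bad))

# A 10% sample of 3,300 record ids for expert review.
sample = review_sample(list(range(3300)))
print(len(sample))
```

Fixing the seed keeps the review sample stable between builds, so reviewers can be pointed at the same records after a re-run. A real pipeline would more likely validate against a JSON Schema document than a hand-written dictionary of types.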
Consortium

Four institutions, two countries, one corpus.

RO · Coordinator
Romanian Academy
Cluj-Napoca Branch: digitisation, ethnography, technical coordination.
Cultural Heritage Institution / Archive

RO
RACAI
Institute for Artificial Intelligence "Mihai Drăgănescu": NLP pipelines, multilingual lexical resources.
Research institute

RO
USAMV Cluj-Napoca
University of Agricultural Sciences & Veterinary Medicine: food science, nutrition, contemporary corpus.
University

IT
Casa Artusi
Forlimpopoli: Pellegrino Artusi's archive, food-history scholarship, cultural communication.
Cultural Heritage Institution / Museum