"defoe: A Spark-based Toolbox for Analysing Digital Historical Textual Data"

Friday, October 4, 2019, 11:00 am PST
6th Floor Large Conference Room (689), ISI MdR
This event is open to the public.
ISI Seminar Series
Rosa Filgueira, University of Edinburgh

In this talk I will present "defoe," a new, scalable, and portable digital toolbox that enables historical research. It extracts knowledge from historical data by running text analyses in parallel across large digital collections, such as historical newspapers and books. It offers a rich set of text mining queries, which have been defined by humanities researchers. We have included NLP preprocessing techniques to mitigate OCR errors and standardise the textual data. We have tested defoe using six different large-scale historical text datasets and three HPC environments, as well as on desktops.

Speaker Bio

Rosa Filgueira is a research fellow at EPCC (University of Edinburgh), working on several nationally and internationally funded projects. Before that, she worked as a Senior Data Scientist at the British Geological Survey, as a Senior Research Associate in the Data Intensive Research Group of the University of Edinburgh, and as a Research and Teaching Assistant in the Computer Architecture Group of University Carlos III of Madrid. Her research is concerned with two closely related topics. The first is developing adaptive communication techniques that optimise data movement for data-intensive applications at different HPC levels. The second is facilitating the development of scientific workflows and applications that can be run in many HPC environments while hiding the complexity from users.

This seminar is brought to you by the Science Automation Technologies (SciTech) team.