The project has several phases:
1/ An implementation phase for the anonymization tool, which consists of two application modules:
- The anonymization engine
- The curation interface
The anonymization engine is a batch process that is integrated into an existing EAI-like chain and runs nightly. It receives as input a list of documents in several formats (.doc, .txt, etc.) and produces as output a list of anonymized documents in the same or different formats (.doc, .txt, .xml, etc.). New input and output formats can be integrated easily. To identify surnames, first names, and addresses, the engine does not use any structured database-type information; it relies only on an exhaustive syntactic analysis of each decision.
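As an illustration of how new input and output formats could be plugged in, here is a minimal Python sketch of a format registry. It is not the tool's actual design: the names `register_format`, `convert`, and `anonymize_text`, as well as the `<document>` wrapper element, are hypothetical, and the anonymization step is left as a placeholder because the syntactic analysis itself is not described here.

```python
from typing import Callable, Dict
from xml.sax.saxutils import escape

# Hypothetical registries: file extension -> function.
READERS: Dict[str, Callable[[bytes], str]] = {}
WRITERS: Dict[str, Callable[[str], bytes]] = {}


def register_format(ext, reader=None, writer=None):
    """Register a reader and/or a writer for one file extension."""
    if reader is not None:
        READERS[ext] = reader
    if writer is not None:
        WRITERS[ext] = writer


def anonymize_text(text: str) -> str:
    # Placeholder (assumption): the real engine would apply its
    # syntactic analysis here. This sketch returns the text unchanged.
    return text


def convert(data: bytes, in_ext: str, out_ext: str) -> bytes:
    """Read a document, anonymize it, and write it in the target format."""
    text = READERS[in_ext](data)
    return WRITERS[out_ext](anonymize_text(text))


# Plain text: decode and encode as UTF-8 (the encoding is an assumption).
register_format(
    ".txt",
    reader=lambda raw: raw.decode("utf-8"),
    writer=lambda text: text.encode("utf-8"),
)

# XML output: wrap the escaped text in a hypothetical <document> element.
register_format(
    ".xml",
    writer=lambda text: ("<document>" + escape(text) + "</document>").encode("utf-8"),
)
```

With this layout, supporting an additional format amounts to one more `register_format` call, and `convert(data, ".txt", ".xml")` covers the case where the output format differs from the input format.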
The curation interface lets different users control the anonymization process by scheduling and tracking the execution of batches of documents to be anonymized. It also lets them consult the anonymization results, compare a document's initial and anonymized forms, correct the anonymization of a document, and relaunch the anonymization of a document or a batch of documents. Access to the curation interface is restricted to authorized users, and advanced profile management defines each user's access to features and to documents from different jurisdictions. For administration, in addition to managing users and profiles, the interface allows the anonymization engine to evolve by enriching the dictionaries it uses.
The tool meets the following objectives:
- Anonymize the input documents consistently (remove individuals' personal information while preserving the meaning of the document)
- Automatically anonymize more than 90% of the documents to be processed
- Allow the supervision of the anonymization operation from a dedicated interface
- Identify the documents that could not be anonymized, or about which a doubt remains, and allow their manual anonymization from a dedicated interface
- Allow comparison of documents in their initial and anonymized versions
- Process approximately 600 documents in less than 3 hours
- Offer close to 100% availability, including during maintenance
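To make the quantitative objectives above concrete, the following Python sketch works out what they imply per document. It assumes, for illustration only, that documents are processed one after another and that a batch contains exactly 600 documents; the function names are hypothetical.

```python
BATCH_SIZE = 600      # "approximately 600 documents"
WINDOW_HOURS = 3      # "in less than 3 hours"
AUTO_PERCENT = 90     # "more than 90%" anonymized automatically


def per_document_budget_seconds(batch_size: int, window_hours: int) -> float:
    """Average time available per document if processing is sequential."""
    return window_hours * 3600 / batch_size


def max_manual_documents(batch_size: int, auto_percent: int) -> int:
    """Upper bound on documents left for manual anonymization."""
    return batch_size * (100 - auto_percent) // 100


budget = per_document_budget_seconds(BATCH_SIZE, WINDOW_HOURS)
manual = max_manual_documents(BATCH_SIZE, AUTO_PERCENT)
print(f"Average budget per document: {budget} s")
print(f"Documents left for manual review: at most {manual}")
```

Under these assumptions the engine has an average of 18 seconds per document, and at most 60 documents per batch of 600 are left for manual handling in the curation interface. Parallel processing would relax the per-document figure.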
2/ A deployment, production start-up, and training phase
3/ A TMA (Third Party Application Maintenance) phase, which includes:
- Corrective and adaptive maintenance
- Implementation of changes relative to the initial scope of the project