Document Archive Classification

A client engaged us to prove that documents from a variety of media could be classified automatically by category and retention length.

Technology Stack


Natural Language Processing


The client was struggling with record-keeping compliance because the task was menial. Staff would either skip it or lump everything into one category regardless of accuracy. As a result, the client wanted to implement an automated record classification and archiving bot.


The solution uses a support vector machine to classify documents into retention categories based on their text. The model was trained to recognise seven classes. Each class maps to a retention period for that document type, so an automated workflow can ingest all assets (from emails to confidential documents) and file them for retention with limited human interaction.
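The step from predicted class to retention period can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the category labels, the retention periods in years, and the `file_for_retention` helper are hypothetical, and the trained model is represented only by a `classify` callable rather than the actual support vector machine.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical labels and retention periods (years). The only fact taken from
# the project is that there are seven classes, each with a retention period.
RETENTION_YEARS = {
    "category_1": 1,
    "category_2": 2,
    "category_3": 3,
    "category_4": 5,
    "category_5": 7,
    "category_6": 10,
    "category_7": 25,
}


@dataclass(frozen=True)
class ArchiveRecord:
    document_id: str
    category: str
    retain_until: date


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def file_for_retention(
    document_id: str,
    text: str,
    received: date,
    classify: Callable[[str], str],
) -> ArchiveRecord:
    """Classify a document's text and compute the date until which it is kept."""
    category = classify(text)
    if category not in RETENTION_YEARS:
        raise ValueError(f"Unknown retention category: {category!r}")
    return ArchiveRecord(
        document_id=document_id,
        category=category,
        retain_until=add_years(received, RETENTION_YEARS[category]),
    )


def stub_classifier(text: str) -> str:
    # Stand-in for the trained model's prediction step (assumption).
    return "category_3"


record = file_for_retention(
    "doc-001", "example document text", date(2024, 2, 29), stub_classifier
)
print(record)
```

Keeping the retention table separate from the classifier means a retention period can be changed without retraining the model, and an unrecognised label fails loudly instead of being filed under a default category.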


The solution archives documents automatically and files them under the correct category, so employees no longer need to set aside time in their day to do it.

Documents are now classified correctly and retained for the correct length of time, which increased compliance, freed skilled employees for non-menial work, and saved time.