EVENT PREDICTION AND NAMED ENTITY EXTRACTION
WHO ARE WE?
A producer and distributor of high value-added information on companies, Infolegale is a fast-growing company founded in 2008. Our offering consists of differentiated solutions addressing solvency, credit risk management, compliance and marketing information. Located in Lyon Part-Dieu, Infolegale is a human-sized company that, despite its 10-year history, has kept its “start-up” mentality.
By joining the Data Innovation team of Infolegale, you will have the opportunity to work on exciting topics within a dynamic team where personal initiative is highly valued.
Many events in the life of a company require the publication of legal announcements.
Since its creation, the digitization of legal notices has been a strong differentiating factor for Infolegale. For the last ten years we have been digitizing the entirety of French legal announcements, which represents more than 10 million legal notices published in more than 500 different newspapers.
The complex digitizing process is, for the most part, manual. In order to keep that competitive advantage, Infolegale wishes to bring its current process to the next level.
To automate this digitizing, we are looking for a motivated Data Scientist intern interested in growing their skills in NLP and NER techniques.
Since Infolegale has been generating the events for more than 10 million ads, this problem is purely a supervised one and will involve NLP and classic modelling skills. From the raw legal ad text, predict which event or series of events it describes.
Particular attention must be paid to performance assessment.
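As a rough illustration of what this performance assessment could look like for a single event type, the sketch below computes sensitivity (recall) and the Gini coefficient, taken here as 2 x AUC - 1. The labels, scores and the 0.5 decision threshold are made-up values for illustration only, not Infolegale data or an Infolegale method.

```python
def sensitivity(y_true, y_pred):
    """Share of actual positives that were predicted positive (recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count for one half. Quadratic in the number of samples, which is
    fine for a sketch but not for millions of ads.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini coefficient of a scoring model, defined as 2 * AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


# Hypothetical example: 1 means the ad describes the event, 0 means it does not.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]            # made-up model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

example_sensitivity = sensitivity(y_true, y_pred)
example_gini = gini(y_true, scores)
```

Since an ad can describe a series of events, one plausible choice is to compute such metrics per event type and then look at the least well-covered events rather than only at an overall average.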
Extract the information from the ad’s text (e.g. company name, share capital, address, executive name and position…)
This is a classic but challenging NLP/NER problem where an annotated corpus can be created from the already digitized ads. Special attention should be paid to the specific syntax, vocabulary and punctuation used in legal notices.
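One way such an annotated corpus might be bootstrapped is to project the already digitized field values back onto the raw text and emit BIO tags. The sketch below is only an assumption about how this could be done: the field names (`COMPANY`, `CAPITAL`, `EXECUTIVE`), the sample ad and the exact-match strategy are all hypothetical, and real notices would need fuzzier matching to cope with their specific punctuation and abbreviations.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def bio_tags(text, known_fields):
    """Tag tokens with B-/I-/O labels by exact match of known field values.

    known_fields maps a label to the value already digitized for this ad.
    Only the first verbatim occurrence of each value is tagged.
    """
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for label, value in known_fields.items():
        start = text.find(value)
        if start == -1:
            continue  # value does not appear verbatim in the text
        end = start + len(value)
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            inside = tok_start >= start and tok_end <= end
            if inside and tags[i] == "O":
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Hypothetical ad text and digitized record, invented for this example.
ad_text = "DUPONT SARL au capital de 10000 euros. Gerant : Jean Martin."
record = {
    "COMPANY": "DUPONT SARL",
    "CAPITAL": "10000 euros",
    "EXECUTIVE": "Jean Martin",
}

annotated = bio_tags(ad_text, record)
```

Ads where a digitized value cannot be found verbatim are simply left untagged here; in practice they would be worth counting, since a high miss rate signals that the matching rule is too strict for the corpus.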
CANDIDATES ARE EXPECTED TO HAVE:
- Good understanding of machine learning concepts
- Good understanding of performance measurements of classification models (sensitivity, Gini…)
- Strong analytic and communication skills
- Strong programming skills (especially Python)
- Basic SQL skills
- Some knowledge of the R programming language would be a plus.
- Previous experience in Natural Language Processing and Named Entity Recognition would be appreciated.
During this internship you will have the opportunity to discover the whole process of setting up a production-grade, data-centric solution, master NLP/NER techniques and level up your analytic and programming skills.
A permanent position may be offered at the end of the internship.
Remuneration: 1100€/month + target-based bonus