CREyield

Client

A group of Capital Real Estate business enthusiasts sketched out a plan to empower property owners and investors. The venture was built on novel insights driven by Artificial Intelligence.

The company wanted to build a system that could adapt to any real estate business across the United States. Because CRE's team includes experts in every aspect of real estate, they set out to develop a system that would minimize ambiguity about rules and procedures as far as possible.

Objective

The client's objective was a system capable of supporting owners and investors in all their real estate concerns. The core idea was to help users understand ongoing fluctuations in real estate trends and, from that, value their property. Because the information needed for property valuation could reside anywhere on the web, the system had to be able to search the whole web to surface the most relevant and up-to-date content.

Challenges

1. Securities and Exchange Commission (SEC) EDGAR

a. SEC EDGAR is an online database of the financial filings submitted by companies that report to the SEC. The goal was to fetch Real Estate Investment Trust (REIT) data from this enormous pool. The extracted information could then be used to value a property and to build a system that stays consistent and up to date with the rest of the real estate companies in the market.

b. This data was semi-structured: it came in tabular form, with tables nested inside other tables, and extracting the relevant information from it was a major challenge.


2. Annotation

a. To capture the latest and most relevant real estate information, the system had to process news articles about properties, which often packed many details into a single sentence.

b. This data was unstructured, and extracting the relevant information from it was a tedious task.
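The case study does not say how REIT filings were located in EDGAR. A minimal sketch, assuming EDGAR's public submissions endpoint (`https://data.sec.gov/submissions/CIK##########.json`) and SIC code 6798 ("Real Estate Investment Trusts") as the REIT filter, shows one way to list a filer's annual and quarterly reports, whose financial tables hold the data described above. The helper names and the choice of 10-K/10-Q forms are assumptions for illustration, not the team's method.

```python
import json
import urllib.request

# EDGAR submissions endpoint; the CIK is zero-padded to 10 digits.
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"
# SIC 6798 is "Real Estate Investment Trusts" in the SEC's SIC code list.
REIT_SIC_CODE = "6798"


def submissions_url(cik: int) -> str:
    """Build the submissions URL for a company's Central Index Key (CIK)."""
    return SUBMISSIONS_URL.format(cik=cik)


def fetch_submissions(cik: int, user_agent: str) -> dict:
    """Download a filer's index. The SEC asks clients to identify themselves
    in the User-Agent header (for example "Example Co admin@example.com")."""
    request = urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def is_reit(submissions: dict) -> bool:
    """True if the filer's SIC code marks it as a REIT."""
    return str(submissions.get("sic", "")) == REIT_SIC_CODE


def periodic_reports(submissions: dict, forms=("10-K", "10-Q")) -> list:
    """List recent filings of the given form types from the 'recent' block,
    which stores each field as a parallel array."""
    recent = submissions["filings"]["recent"]
    columns = zip(
        recent["form"],
        recent["accessionNumber"],
        recent["filingDate"],
        recent["primaryDocument"],
    )
    return [
        {"form": form, "accession": acc, "filed": filed, "document": doc}
        for form, acc, filed, doc in columns
        if form in forms
    ]
```

Only `fetch_submissions` touches the network; the filtering functions work on the parsed JSON, so they can be tested against saved responses.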

Process

SEC EDGAR:
Annotation:

Solution

SEC EDGAR

Extracting information from semi-structured data required a lengthy R&D effort. After days of research and experimentation, the team adopted an approach based on OCR (Optical Character Recognition): screenshots of the nested tables were fed to a customized OCR algorithm, which output the relevant data as CSV files.
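The customized OCR algorithm itself is not described. A minimal sketch of the step that turns OCR output into CSV, assuming the engine returns word-level bounding boxes (as Tesseract's `image_to_data` does): words are grouped into rows by vertical position, each row is split into cells wherever the horizontal gap between words is large, and the rows are written with Python's `csv` module. The `Word` type and the `row_tolerance` and `column_gap` pixel thresholds are hypothetical and would need tuning for real screenshots.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR-recognized word and its bounding box (pixels)."""
    text: str
    left: int
    top: int
    width: int


def group_rows(words, row_tolerance=5):
    """Group words whose top edges lie within row_tolerance of the row's first word."""
    rows = []
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        if rows and abs(rows[-1][0].top - word.top) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within a row, read words left to right.
    return [sorted(row, key=lambda w: w.left) for row in rows]


def split_cells(row, column_gap=20):
    """Start a new cell whenever the gap to the previous word exceeds column_gap."""
    cells, current = [], [row[0].text]
    for prev, word in zip(row, row[1:]):
        gap = word.left - (prev.left + prev.width)
        if gap > column_gap:
            cells.append(" ".join(current))
            current = [word.text]
        else:
            current.append(word.text)
    cells.append(" ".join(current))
    return cells


def words_to_csv(words):
    """Render OCR words as CSV text, one table row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in group_rows(words):
        writer.writerow(split_cells(row))
    return buffer.getvalue()
```

Nested tables would still need a rule for which sub-table a row belongs to; this sketch only recovers the flat grid of cells.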

Annotation

IBM Watson Knowledge Studio is a cloud-based application that enables developers and domain experts to collaborate on creating custom annotator components, which identify mentions and relations in unstructured text. Systems like IBM Watson provide the Natural Language Processing capability needed to “understand” such data. A glossary was therefore defined to identify the relevant information in news articles and extract useful details from the unstructured text.
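Watson Knowledge Studio is a hosted tool, so the sketch below does not use its API. It is a minimal, local illustration of glossary-driven annotation, assuming the glossary simply maps surface terms to entity types: every glossary term found in a sentence is reported with its type and character offsets, which shows how a single dense news sentence yields several mentions. The glossary entries and entity type names are hypothetical.

```python
import re
from typing import NamedTuple


class Mention(NamedTuple):
    text: str
    entity_type: str
    start: int
    end: int


# Hypothetical glossary: surface term -> entity type.
GLOSSARY = {
    "office building": "PROPERTY_TYPE",
    "square feet": "AREA_UNIT",
    "Manhattan": "LOCATION",
    "acquired": "TRANSACTION",
}


def build_pattern(glossary):
    """One case-insensitive regex; longer terms first so they win over prefixes."""
    terms = sorted(glossary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def annotate(text, glossary=GLOSSARY):
    """Return every glossary mention in text, in order of appearance."""
    lookup = {term.lower(): label for term, label in glossary.items()}
    pattern = build_pattern(glossary)
    return [
        Mention(m.group(0), lookup[m.group(0).lower()], m.start(), m.end())
        for m in pattern.finditer(text)
    ]
```

A trained annotator would also capture relations (for example, which property was acquired by whom); exact-match lookup like this only finds the entity mentions.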

Technologies