Technology Toolkit 2021 is a technical white paper describing core technologies that are being researched and developed by Samsung SDS R&D Center. This paper introduces a total of seven technologies across AI, Blockchain, Cloud, and Security, with details on their technical definitions, key features, differentiating points, and use cases, to give our readers some insight into our work.
A knowledge graph is a representation of knowledge structure that shows related information as nodes and edges. When information is stored in knowledge graph format, highly relevant information can be identified easily, which gives users richer information.
Information retrieval is one area where knowledge graphs can be put to their best use. In the past, data were stored for information retrieval using the inverted index method, which links keywords to their locations within a set of documents. This method has no trouble returning the documents that contain a query, but it is limited in that it does not show related information.
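The inverted index described above can be illustrated with a minimal Python sketch. The documents, the function names, and the whitespace tokenization are assumptions made for illustration; they are not part of any product described in this paper.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each keyword to the (document id, word position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index

def search(index, keyword):
    """Return the sorted ids of the documents that contain the keyword."""
    return sorted({doc_id for doc_id, _ in index.get(keyword.lower(), [])})

# Hypothetical documents, for illustration only.
docs = {
    "doc1": "knowledge graph stores entities and relations",
    "doc2": "inverted index maps keywords to documents",
    "doc3": "graph database stores nodes and edges",
}
index = build_inverted_index(docs)
print(search(index, "graph"))  # ['doc1', 'doc3']
```

The lookup returns only the documents that literally contain the keyword. Nothing in the index records how doc1 and doc3 relate to each other, which is the limitation noted above.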
Building a knowledge graph is the key to advancing information retrieval and delivering the desired answers together with related information in a clear format. The high utility of knowledge graph technology is reflected in its inclusion in Gartner's Top 10 Data and Analytics Technology Trends for 2019, and we expect to see it applied to broader areas going forward. In this paper, we give an overview of knowledge graphs, which leading global IT companies such as Google and Amazon are actively adopting to store knowledge for advanced search, and of some of the solutions that incorporate them.
① Knowledge Graph
A knowledge graph is knowledge stored in a graph format comprising nodes and edges: a node holds data on an individual entity, and an edge represents an association between entities. Data is presented in this format because a graph is the most useful data structure for accumulating and delivering knowledge together with its associations.
② Knowledge Design
Because knowledge carries different concepts and purposes depending on the area in which it was accumulated, the nodes and edges of a knowledge graph should contain information that differs by area. Knowledge design is the process of deciding specifically which types of nodes and edges to use, according to the final information that the knowledge graph will provide to users.
③ Knowledge Engineering
A knowledge graph is constructed by analyzing data collected from multiple data sources. Text data such as documents is analyzed with natural language analysis technology, converted, and accumulated as nodes and edges. To overcome the limitations of natural language processing technology and to reflect newly emerging data, knowledge experts must check the knowledge graph periodically and correct data that was accumulated incorrectly. This process is called knowledge engineering.
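The three concepts above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions: the node types, relation names, and the `KnowledgeGraph` class are hypothetical and do not reflect the internal design of any Samsung SDS product. The schema stands in for knowledge design, and rejecting an edge that violates it hints at the kind of check a knowledge engineer might perform.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    id: str
    type: str       # hypothetical node type, e.g. "Person" or "Document"

@dataclass(frozen=True)
class Edge:
    source: str     # id of the source node
    relation: str   # hypothetical relation name, e.g. "CITES"
    target: str     # id of the target node

# Knowledge design: the (source type, relation, target type) combinations allowed.
SCHEMA = {
    ("Person", "AUTHORED", "Document"),
    ("Document", "CITES", "Document"),
}

class KnowledgeGraph:
    def __init__(self, schema):
        self.schema = schema
        self.nodes = {}      # node id -> Node
        self.edges = set()   # set of Edge

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        """Accept an edge only if it matches the designed schema."""
        src, dst = self.nodes[edge.source], self.nodes[edge.target]
        if (src.type, edge.relation, dst.type) not in self.schema:
            raise ValueError(f"edge not allowed by schema: {edge}")
        self.edges.add(edge)

kg = KnowledgeGraph(SCHEMA)
kg.add_node(Node("alice", "Person"))
kg.add_node(Node("proposal", "Document"))
kg.add_edge(Edge("alice", "AUTHORED", "proposal"))       # allowed by the schema

try:
    kg.add_edge(Edge("proposal", "AUTHORED", "alice"))   # Document -> Person is not designed
except ValueError as err:
    print("rejected:", err)
```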
A knowledge graph offers several advantages over traditional table-based databases. First, it allows you to infer relationships that are not explicitly defined (reasoning). With a table-based database, a relationship between entities that is not clearly defined in a table cannot be inferred, whereas a knowledge graph makes it possible to explore and infer new relationships from the relationships already defined in the graph.
Second, a knowledge graph performs better on queries that can only be answered by following multiple references. Suppose you enter the query "show me the document cited by the document that was cited by the proposal". With a table-based database, both the document table and the tables representing the citation relationship must be referenced, which increases the number of data accesses. A knowledge graph, on the other hand, needs only a few accesses to answer the query because the documents are linked directly by citation relationships.
Finally, a knowledge graph can easily incorporate changes in data, such as the addition of new data or the deletion or modification of existing data. With a table-based database, every time a new type of data is added, a table must be extended and its relevance to existing data has to be considered thoroughly. This not only adds computational cost but may also compromise data integrity. A knowledge graph is much easier to manage: you only need to add or delete nodes, and connect or delete edges according to the relationships between existing nodes.
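The citation query above can be sketched as a two-hop traversal. This is a minimal illustration in Python, assuming a hypothetical set of documents stored as an adjacency map; a real graph database would express the same traversal in its own query language.

```python
# Hypothetical citation edges: each document maps to the documents it cites.
cites = {
    "proposal": ["report_a", "report_b"],
    "report_a": ["paper_x"],
    "report_b": ["paper_x", "paper_y"],
    "paper_x": [],
}

def cited_by(graph, start, hops):
    """Follow citation edges from `start` for exactly `hops` steps."""
    frontier = {start}
    for _ in range(hops):
        frontier = {target for node in frontier for target in graph.get(node, [])}
    return sorted(frontier)

# "Show me the documents cited by the documents that were cited by the proposal."
print(cited_by(cites, "proposal", 2))  # ['paper_x', 'paper_y']

# Adding new data is a local change: one new node and one new edge.
cites["paper_y"] = ["paper_z"]
print(cited_by(cites, "proposal", 3))  # ['paper_z']
```

Each hop only touches the edges of the nodes already reached, which is why the number of accesses stays small compared with repeatedly joining a document table against a citation table.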
We provide the functions needed to organize, manage, and use unstructured text data as a knowledge graph in a set of technologies called SDS Insight Engine. SDS Insight Engine builds a knowledge graph from the knowledge information databases owned by a company using AI-based unstructured text analysis technology. In addition, it offers APIs that can advance information retrieval and recommendation services using the knowledge graph.
Knowledge information data held by a company is stored in various repositories, ranging from databases and the web to the cloud. The information is stored as images as well as text and as attachments of various formats. Insight Engine primarily extracts knowledge from unstructured text data, but it can also extract and analyze text that is in image format or is embedded in office documents of various formats (pdf, doc, ppt, html).
Insight Engine builds a knowledge graph by analyzing unstructured text data obtained from data sources. To support this analysis, it provides basic language analysis tools such as morpheme analysis, language identification, named entity recognition, a tokenizer, and a relation extraction module. We also provide an API that builds a personalized recommendation model based on the knowledge stored in the knowledge graph. To build and use a knowledge graph, you need a graph database equipped with functions for storing and finding knowledge information. Insight Engine is designed to build knowledge graphs on JanusGraph or Neo4j, both open source graph databases. We provide APIs that add or delete nodes and edges and that, when a query is entered, search for nodes as well as their sub-nodes in the knowledge graph.
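The kinds of operations listed above (adding and deleting nodes and edges, and searching a node together with its sub-nodes) can be sketched with a small in-memory stand-in. This is not the Insight Engine API, nor the JanusGraph or Neo4j API; the class and method names below are hypothetical and chosen only to illustrate the operations.

```python
from collections import defaultdict, deque

class InMemoryGraph:
    """Illustrative stand-in for a graph store; not a real graph database client."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(set)   # node id -> {(relation, target id)}

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.out_edges[source].add((relation, target))

    def delete_edge(self, source, relation, target):
        self.out_edges[source].discard((relation, target))

    def delete_node(self, node_id):
        """Remove a node together with every edge that starts or ends at it."""
        self.nodes.pop(node_id, None)
        self.out_edges.pop(node_id, None)
        for edges in self.out_edges.values():
            edges -= {edge for edge in edges if edge[1] == node_id}

    def sub_nodes(self, start, depth):
        """Breadth-first search: ids reachable from `start` within `depth` hops."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist == depth:
                continue
            for _, target in self.out_edges.get(node, ()):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, dist + 1))
        seen.discard(start)
        return sorted(seen)

g = InMemoryGraph()
g.add_node("alice", type="Person")
g.add_node("proposal", type="Document")
g.add_node("report", type="Document")
g.add_edge("alice", "AUTHORED", "proposal")
g.add_edge("proposal", "CITES", "report")
print(g.sub_nodes("alice", 2))   # ['proposal', 'report']

g.delete_node("report")
print(g.sub_nodes("alice", 2))   # ['proposal']
```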
The knowledge accumulated in a knowledge graph can be used in conjunction with existing AI models designed for search, recommendation, classification, and prediction. In addition, Insight Engine can be applied to application services that provide complex question answering over natural language knowledge stored in the knowledge graph, using natural language processing (NLP) and natural language understanding (NLU) technology. SDS Insight Engine offers the following features.
Integration of multiple data sources
∙ Integrates and leverages image and audio data as well as text (E-Mail, HTML) data
Recognizes intention and situation
∙ Recognizes intention and situation through AI-based natural language processing
∙ Provides meaningful information by extracting relationships from knowledge graph
∙ Expands knowledge with configuration of relationships between data sources
∙ Efficiently updates and operates information collected/changed in real time
∙ Provides complex question-answering using natural language processing and understanding (NLP, NLU) technology
∙ Connects to existing search/recommendation/classification/predictive AI model
Customers who are thinking of adopting a knowledge graph to advance their information retrieval system are often burdened by the thought of having to build and run a new knowledge graph separately from their existing search system. SDS Insight Engine addresses this concern. It comes with an automation function that integrates knowledge information databases and stores their information in a knowledge graph with ease. For example, Insight Engine refers to the database schema and extracts node information such as the names of people, places, and companies and the document category, as well as their relationships. Moreover, it extracts text data from attachment files and uses it in natural language processing analysis.
Nowadays, a lot of knowledge is stored in image formats (jpg, png, etc.), such as document scans, or in structured formats that contain both text and tables (such as html), which makes it difficult to extract information and create a knowledge graph using existing technology. The advantage of SDS Insight Engine is that it can handle information stored in various formats: it extracts and analyzes unstructured data using the appropriate text extraction technology and builds a knowledge graph from it.
Insight Engine takes unstructured text data from the knowledge information dataset and its associated attachments, and uses AI-based natural language processing technology to extract meaningful entity names and their relationships. Named entity recognition is a technology that extracts words fitting pre-defined classes, such as the name of a person, organization, or place, from unstructured text, and these extracted words become the nodes of the knowledge graph. The named entity recognition technology in SDS Insight Engine is unique in that it uses deep learning and a dictionary-based method in a complementary way. The deep learning model understands context and flexibly recognizes names it has not been trained on, such as a new company or person, while a customizable dictionary handles documents from unfamiliar domains such as humanities, business, medical science, or finance. Furthermore, when the relationship between entities needs to be defined in the knowledge graph, our AI-based relation extraction model refers to the context of the document and selects the relationship that best describes them.
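The complementary use of a dictionary and a learned model can be sketched as follows. This is a minimal illustration, not the Insight Engine implementation: the dictionary contents are hypothetical, the "model" is a simple capitalization heuristic standing in for a trained deep learning model, and the rule that dictionary matches take precedence over overlapping model predictions is an assumption made for this example.

```python
import re

# Hypothetical domain dictionary: surface form (lowercase) -> entity label.
DOMAIN_DICTIONARY = {
    "samsung sds": "ORGANIZATION",
    "seoul": "LOCATION",
}

def dictionary_entities(text, dictionary):
    """Return (start, end, label) spans for every dictionary term found in the text."""
    spans = []
    for term, label in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return spans

def stub_model_entities(text):
    """Stand-in for a trained model: tags runs of capitalized words as ENTITY."""
    pattern = r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b"
    return [(m.start(), m.end(), "ENTITY") for m in re.finditer(pattern, text)]

def overlaps(a, b):
    """True if two (start, end, label) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def hybrid_ner(text, dictionary, model=stub_model_entities):
    """Keep dictionary matches, then add model spans that do not overlap them."""
    spans = dictionary_entities(text, dictionary)
    for candidate in model(text):
        if not any(overlaps(candidate, kept) for kept in spans):
            spans.append(candidate)
    return [(text[start:end], label) for start, end, label in sorted(spans)]

sentence = "Jane Kim works at Samsung SDS in Seoul."
print(hybrid_ner(sentence, DOMAIN_DICTIONARY))
# [('Jane Kim', 'ENTITY'), ('Samsung SDS', 'ORGANIZATION'), ('Seoul', 'LOCATION')]
```

In this sketch the dictionary supplies the domain labels, while the model stand-in contributes the name that the dictionary does not contain. The resulting entities would become candidate nodes of the knowledge graph.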
We provide application APIs that allow you to apply the knowledge graph to areas such as QA systems and natural language-based search enhancement. Our knowledge graph-based technology is differentiated by our multi-hop QA technology (a QA system that answers questions by analyzing, or making inferences across, multiple documents) and our Korean machine reading comprehension (MRC) technology.
For detailed descriptions and differentiating points of our QA technology, please refer to “smart QA model that even understands complex tables”. Personalized recommendation is another area where a knowledge graph can be useful. The limitations of traditional recommendation systems can be alleviated by using the knowledge graph to show information related to the items that a personalized recommendation algorithm recommends. Please refer to “4. Business Cases” for real examples of using a knowledge graph for advanced information retrieval and personalized recommendation.
Knowledge management, the process of leveraging knowledge databases to handle various business issues and make sound business decisions, is becoming a global business trend. As a result, there is growing demand for technology that enables companies to integrate, manage, and search the various knowledge data they own.
A knowledge graph shows its strengths best in enterprise search. Alongside the requested data, it surfaces related information from large volumes of corporate documents and in-house technologies, giving users a rich set of information.
In line with this global trend, we applied a knowledge graph-based search enhancement and knowledge recommendation solution to ARISAM, our internal knowledge portal system. We used Insight Engine to integrate and analyze large volumes of documents on business opportunities, proposals, and project deliverables that are scattered across 28 different internal web sites, together with their metadata and attachments of various formats, and we built all of this knowledge into a single knowledge graph structure. In addition, we designed the system so that when a question is entered in the search box, it retrieves related information (keywords, businesses, employees, and recommended knowledge) from the knowledge graph and shows it on the result screen along with the requested information.
Because knowledge graph technology is still a work in progress, a lot needs to be done before it is ready for business adoption. We need to plan specifically, and in advance, which data to use to build which service, and to design which information to accumulate in the knowledge graph. We also need knowledge engineers in the implementation and operation phases to provide continuous quality management of the knowledge.
We expect that this remaining work will help us build a more insightful search engine, one that lets us flexibly explore the various connecting relationships in data.
▶ The content is protected by law and the copyright belongs to the author.
▶ Copying or quoting the content without the author's permission is prohibited.
ML Research Team at Samsung SDS R&D Center
As a machine learning-based model and solution researcher and engineer with a major in brain engineering, Sunghoon JOO is involved in knowledge graph construction and advanced search using AI natural language processing technology.
If you have any inquiries, comments, or ideas for improvement concerning technologies introduced in Technology Toolkit 2021, please contact us at firstname.lastname@example.org.