Technology Toolkit 2021 is a technical white paper describing core technologies that
are being researched and developed by Samsung SDS R&D Center. We would like to introduce in this paper a total of
seven technologies concerning AI, Blockchain, Cloud, and Security with details on their technical definition, key
features, differentiating points, and use cases to give our readers some insights into our work.
Knowledge graph is a representation of knowledge structure showing related information in edges and nodes. When
information is stored in knowledge graph format, information of high relevance can easily be identified and thus
provide users with richer information.
Information retrieval is one area where knowledge graph can be put to its best use. In the past, data were stored
using inverted index method (linking the texts and location of keywords within a set of documents) for information
retrieval. This method can show documents containing query without a problem, but it is limited in that it
doesn’t show related information.
Building knowledge graph is the key to advancing information retrieval and delivering desired answers with related
information in clear format. Knowledge graph technology has high utility value as proven by its inclusion in Gartner's
Top Ten Data and Analysis Technology Trends in 2019, and we expect to see it applied to broader area
moving forward. In this paper, we will give you an overview of knowledge graph – which is also being actively
adopted by leading global IT companies like Google and Amazon for storing knowledge for advanced search– and
some of the solutions that incorporate knowledge graph.
① Knowledge Graph
Knowledge graph refers to knowledge stored in graph format comprising of nodes and edges – node for data on individual entity and edge for an association between each entity. The main purpose of presenting data in knowledge graph format is that graph is the most useful data structure for accumulating and delivering knowledge with their association.
② Knowledge Design
Because knowledge holds different concept and purpose depending on the area where it was accumulated, nodes and edges of knowledge graph should contain information that differs by area. Knowledge design is the process of specifically deciding which types of nodes and edges to use in accordance with the final information that will be provided to users using knowledge graph.
③ Knowledge Engineering
Knowledge graph is constructed by analyzing data collected from multiple data sources. It takes text data such as documents and uses natural language analysis technology to analyze, convert and accumulate them in nodes and edges. In order to overcome the limitations of natural language processing technology and to reflect newly emerging data, knowledge experts must check knowledge graph periodically and modify data not correctly accumulated. This process is called knowledge engineering.
Knowledge graph offers many advantages over traditional table-based databases. First, it allows you to infer
relationship (reasoning) that is not explicitly defined. With existing table-based database, if the relationship
between entities is not clearly defined in the table, it would be impossible to infer their relationship, but with
knowledge graph, it is possible to explore and infer new relationship based on previously defined relationship of
knowledge graph. Second, knowledge graph yields better performance for query that can only be answered using multiple
references. Let’s say you enter the following query "show me the document cited by the document that was cited
by the proposal" into database. With existing table-based database, both the document table and the tables
representing the citation relationship must be referenced, which increases the number of data accesses. On the other
hand, the knowledge graph requires only few approaches to find an answer to your query because the documents are
linked to citation relationships. Finally, when there is a change in data such as adding of a new data or deleting or
changing of existing data, the knowledge graph can easily incorporate these changes. With existing database, every
time a new type of data is added, a table must be extended, and its relevance to existing data has to be considered
thoroughly. This will not only put a financial burden on calculation, but may compromise data integrity as well. It is
very easy to manage knowledge graph because all you have to do is just add or delete nodes from the existing knowledge
graph, and connect or delete edges based on the relationship between existing nodes.
We provide functions needed to organize, manage and use unstructured text data as a knowledge graph in a set of
technologies called SDS Insight Engine. Our SDS Insight Engine builds knowledge graph from knowledge information
database owned by a company using AI- based unstructured text analysis technology. In addition, our solution offers
API that can advance information retrieval and recommendation service using knowledge graph.
Knowledge information data held by a company are stored in various repositories ranging from DB, web, to cloud. The
information is stored as images as well as text and attachments of various formats. Our Insight Engine basically
extracts knowledge from unstructured text data but it can also extract and analyze texts that are in image format or
are included in office documents of various format (pdf, doc, ppt, pdf, html).
Our Insight Engine builds knowledge graph by analyzing unstructured text data obtained from data source, therefore
basic language analysis tools such as morpheme analysis, language identification, entity name recognition, tokenizer,
and relation extraction module are provided to support the analysis. We provide API that builds a personalized
recommendation model based on the knowledge stored in knowledge graph. To build and utilize knowledge graph, you need
graph database that’s equipped with a function for storing and finding knowledge information. Our Insight Engine
is designed to build knowledge graph using JanusGraph or neo4j, an open source graph database. We provide API with
varying functions that can add or delete edges and nodes and search nodes as well as sub-nodes in knowledge graph when
query is entered.
The knowledge accumulated in knowledge graph can be used in conjunction with existing AI model designed for search,
recommendation, classification, and prediction. In addition, our Insight Engine can be applied to application service
that provides complex question-answering from natural language knowledge stored in knowledge graph using natural
language processing (NLP) and understanding technology (NLU). SDS Insight Engine offers the following features.
Integration of multiple data sources
∙ Integrates and leverages image and audio data as well as text (E-Mail, HTML) data
Recognizes intention and situation
∙ Recognizes intention and situation through AI-based natural language processing
∙ Provides meaningful information by extracting relationships from knowledge graph
∙ Expands knowledge with configuration of relationships between data sources
∙ Efficiently updates and operates information collected/changed in real time
∙ Provides complex question-answering using natural language processing and understanding (NLP, NLU) technology
∙ Connects to existing search/recommendation/classification/predictive AI model
Customers who are thinking of adopting knowledge graph to advance their information retrieval system are burdened by
the thought of having to build and run a new knowledge graph separately from their existing search system. But this is
not a problem with our SDS Insight Engine. Our Insight Engine comes with automation function that integrates knowledge
information databases and stores information in knowledge graph with ease. For example, our Insight Engine refers to
database schema and extracts node information like the name of a person/place/company and document category as well as
their relationships. Moreover, it extracts text data from the attachment file and uses it in natural language
Nowadays, a lot of knowledge is stored in image format (jpg, png, etc.) like document scan, or in structure format containing both text and tables (such as html), making it difficult to extract information and create knowledge graph using existing technology. The advantage of SDS Insight Engine lies in that it can handle information stored in various formats – it can extract and analyze unstructured data using the right text extract technology and build knowledge graph.
Our Insight Engine brings unstructured text data from knowledge information dataset and associated attachments, and
uses AI-based natural language processing technology to extract meaningful entity names and their relationship. Named
entity recognition is a technology that extracts words fitting pre-defined classification from unstructured text such
as the name of a person, organization, or place, and these extracted words become the nodes of knowledge graph. Named
entity recognition technology incorporated into our SDS Insight Engine is unique in that it complementarily uses deep
learning technology and dictionary-based method which means our deep-learning based model understands context and
recognizes unlearned name of a new company or person with flexibility and customizes dictionary to handle documents of
unfamiliar new domains such as humanities, business medical science or finance. Furthermore, when the relationship
between entities need to be defined in knowledge graph, our AI-based relationship extraction model refers to the
context of document and selects relationship that best describes them.
We provide application API that allows you to use knowledge graph to areas such as QA system or natural
language-based search enhancement. We provide differentiated knowledge graph-based technology using our top notch
multi-hop QA technology (QA system that answers questions by analyzing or making inferences upon review of multiple
documents) and Korean reading comprehension (MRC) technology.
For detailed descriptions and differentiating points of our QA technology, please refer to “smart QA model that even understands complex tables”. Personalized recommendation is also one area where knowledge graph can become of use. The limitations of traditional recommendation system can be alleviated by showing information related to the data recommended by personalized recommendation algorithm using knowledge graph. Please refer to “4. Business Cases” for real examples of using knowledge graph for advanced information retrieval and personalized recommendation.
Search konwledge graph, Search subgraphs using search interface Question answering system
Question : Does insurance company s insure manual therapy?
Knowledge Management, the process of leveraging knowledge database to handle various business issues and make sound
business decisions, is becoming a global business trend. As a result, there is a growing demand for technology that
enables companies to integrate, manage and search various knowledge data they own.
Knowledge graph shows its qualities the best when it comes to enterprise search. It shows related information along with the data requested from scads of corporate documents and in-house technologies thereby promising users with rich set of information.
In line with this global trend, we applied knowledge graph-based search enhancement and knowledge recommendation solution to ARISAM, our internal knowledge portal system. We used our Insight Engine to integrate and analyze scads of documents pertaining to business opportunities, proposals, and project deliverables that are scattered across 28 different internal web sites, as well as their metadata and attachments of various formats and we built all this knowledge into a single knowledge graph structure. In addition, we designed the system so that when questions are entered in a search box, the system retrieves related information - keywords, businesses, employees, and recommended knowledge - from knowledge graph and show them on the result screen along with the information requested.
Because knowledge graph technology is still a work in progress, a lot needs to be done before we can actually make it
available for business adoption. We need to make specific plans in advance as to what data to use to build which
service and design which information to accumulate in knowledge graph. We would also need knowledge engineers in
implementation and operation phase to provide continuous quality management of knowledge.
We expect these remaining works will help us build more insightful search engine that will allow us to explore various connecting relationships in data with flexibility.
▶ The content is proected by law and the copyright belongs to the author.
▶ The content is prohibited to copy or quote without the author's permission.
ML Research Team at Samsung SDS R&D Center
As a machine learning-based model and solution researcher & engineer with a major in a brain engineering, Sunghoon JOO is involved in knowledge graph construction and advanced search using AI natural language processing technology.
If you have any inquiries, comments, or ideas for improvement concerning technologies introduced in Technology Toolkit 2021, please contact us at firstname.lastname@example.org.
Brightics DL is a platform for unstructured data analysis, which provides technologies for accelerating AI developments. Through Brightics DL, enterprises can quickly and easily apply deep learning analytics services to their businesses through automatic labeling and distributed machine learning (DML).