Social Data Analysis
This is truly the age of big data. Our day starts with reading the news, checking email, or posting messages on social media from a smartphone. We no longer just consume data that others have generated; we actively produce it ourselves. This is not limited to individuals.
Enterprises already generate and use data in areas such as production, logistics, service and R&D, and the volume of data will keep growing within the framework known as the 4th industrial revolution.
The market research firm IDC forecast that data would grow from 0.8 zettabytes in 2009 to 35.2 zettabytes in 2020, an approximately 44-fold increase. On Facebook, 1,440 million users send an average of 31.25 million messages and watch 2.77 million video clips per minute. In the past it was difficult to analyze data meaningfully because there was too little of it; now it is difficult because there is too much.
Messages that people once posted on social media for fun or to keep in touch have now become the most essential ingredient of what is called social data analysis.
When negative words such as 'gloomy' and 'nervous' increased on Facebook in 2011, the unemployment rate in the U.S. increased. More than 88% of the companies that drew a large number of negative comments on Twitter experienced a drop in stock price. Moreover, organizations and companies use social data in areas such as new product development, natural disaster contingency planning and marketing strategy.
As business needs for social data analysis grow, various consulting and information analysis firms provide related services. IBM, under the name Social Intelligence, uses social data analysis to help enterprises increase revenue, reduce costs and develop markets.
For example, in China, an actress wearing a skirt from a fashion company appears in a TV program watched by hundreds of millions of people. Within 10 minutes, at least one million female bloggers, mostly office workers who are the company's key customers, post messages about the actress.
IBM's Social Intelligence service analyzes the emotional trend based on popular topics, related sentiments and local characteristics within 10 minutes and sends the product production team an email notifying them of a new sales opportunity. The email contains the analysis results for the skirt, such as the fabric, length, color and optimized asymmetric shape that consumers prefer. The product production team then produces a new limited edition product that differs from the original.
The new trend information is delivered to the design team and the distribution team so that they can establish product development and sales strategies and respond to the trend promptly by taking pre-orders through the online store. In addition, given the vast Chinese middle-class consumer market, insight into regional style preferences is derived. It then becomes possible to sell the new product at a price 25% higher than that of the original product.
As this case shows, analyzing social data promptly and applying it to the business can create value for a company in ways that differ from the past.
Social data analysis is performed based on two different data types.
They are structured data, which can be stored in a database in a specified form, and unstructured data, such as documents, images and videos, whose data fields are not defined. Structured data include quantifiable aspects of social media, such as relationships between people or access frequencies. Unstructured data include content shared in the form of text or video. 85% of big data is unstructured, and opinion mining mainly analyzes unstructured data.
Opinion mining is performed to analyze people's attitudes and emotions towards specific products, services, candidates or recent issues, and to help make sense of their preferences.
Opinion mining, often called sentiment analysis, is a natural language processing technology that analyzes subjective data such as people's opinions or tendencies expressed in text.
One of the most successful cases was the 2012 U.S. presidential election. President Obama's camp collected and analyzed social data and used election strategies customized for individual voters to maximize donations and win swing votes. By analyzing social data, they found that the voters most likely to donate at a campaign fundraising event were women in their forties, deduced that the celebrity who could appeal to this group was George Clooney, and successfully raised donations using this information.
It is very hard to extract people's opinions. Even with mind reading, it would be difficult to read the minds of many people correctly. However, the messages that people post on social media, unconsciously or frankly, accumulate into social big data. Being able to read people's emotions from this data almost in real time would be very attractive. Of course, this is a completely different task from IBM's Watson beating human contestants on Jeopardy!, the number one quiz show in the U.S.
This takes more than simply storing data and using it. It is necessary to collect people's opinions, decompose natural language sentences into words, and score the sentiment of each word as positive, negative, or neutral. The sentiment scores of the words are then combined into the sentiment score of a sentence, which becomes the criterion for determining whether the writer's opinion is positive or negative.
General steps to conduct opinion mining
The first step is to collect the text documents to be analyzed. Recently, the technique called web crawling makes it possible to gather such text automatically. After a massive amount of text is saved in a database, the second step performs subjectivity detection, which discards the parts not related to sentiment. The non-subjective parts are removed so that only the sentiment-related text is used in the opinion mining process. In the third step, the text is decomposed into words, whose polarities are then analyzed.
Each word in the text is analyzed to determine whether it is positive or negative, which gives its polarity. If polarity scores have already been derived for the words, only the word extraction is performed in this step. The last step is to detect the polarity of each text based on the polarities of its words. In general, the total polarity of a text is computed as a weighted sum of the frequency and sentiment score of the words in the text. For the sentiment score, a value between -1 (negative) and 1 (positive) is assigned to each word depending on its sentiment.
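The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any particular product: the lexicon and its scores are made up, a list of strings stands in for the crawled database of step one, subjectivity detection is reduced to "the sentence contains at least one lexicon word", and the weighted sum is divided by the total word frequency so that the result stays between -1 and 1. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sentiment lexicon: word -> score in [-1, 1]. Values are made up.
LEXICON = {
    "great": 0.8,
    "love": 0.9,
    "good": 0.6,
    "slow": -0.4,
    "bad": -0.7,
    "terrible": -0.9,
}


def tokenize(text):
    """Step 3: decompose a text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def is_subjective(sentence):
    """Step 2 (very rough): keep a sentence only if it has a lexicon word."""
    return any(word in LEXICON for word in tokenize(sentence))


def text_polarity(text):
    """Step 4: frequency-weighted average of word scores, in [-1, 1]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if is_subjective(s)]
    counts = Counter(w for s in sentences for w in tokenize(s) if w in LEXICON)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no sentiment-bearing words: treat as neutral
    return sum(LEXICON[w] * n for w, n in counts.items()) / total


def label(score, threshold=0.1):
    """Turn a numeric polarity into positive / negative / neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Step 1 stand-in: made-up texts instead of a crawled database.
REVIEWS = [
    "The screen is great. I love it. Shipping was slow.",
    "It arrived on Monday.",
    "Terrible battery, bad support.",
]

for review in REVIEWS:
    score = text_polarity(review)
    print(f"{label(score):8s} {score:+.2f}  {review}")
```

In practice the lexicon would hold far more words, and its scores would come from a learning procedure rather than being written by hand.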
Areas for utilizing opinion mining
First, trend identification analyzes how people perceive current issues. People who are interested in political issues actively express their opinions online, which makes a useful basis for opinion mining. The same applies equally to areas such as entertainment and sports.
Product or service evaluation has a more direct use, since it is directly connected to a company's sales. In particular, many product or service evaluations and complaints are posted in online communities, and purchase reviews are shared on online shopping sites.
Since the scores customers assign often do not match their reviews, opinion mining can provide insight that cannot be found from the review scores alone. Lastly, future prediction is very difficult but definitely necessary. Since opinion mining is ultimately a tool that assists decision making, it needs to provide information on matters that may occur in the future. Opinion mining is already used to predict stock prices, and there are various attempts to predict national economic crises as well.
Opinion Mining Based on Artificial Intelligence (AI)
The most difficult part of applying opinion mining is determining the polarities of so many words.
Since the meaning of a word varies depending on the context, context should be considered in any analysis. However, it is difficult to account for this with human judgment or simple analysis alone.
In addition, AI is essential for analyzing a vast amount of text. Opinion mining can be used in a great many areas, from establishing election strategies to generating profit. Artificial intelligence, which can perform these activities faster, more accurately and at a lower cost, is therefore at the forefront of related studies.
For opinion mining, AI is mainly used to analyze the polarities of words. Word polarity can be determined either by classifying the sentiment into 'good' and 'bad' or by assigning scores to the words.
The former uses a machine learning algorithm such as Naïve Bayes or a Support Vector Machine. For the latter, ways of deriving sentiment scores with machine learning include Word2vec and graph-based semi-supervised learning.
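The classification approach can be illustrated with a small Naïve Bayes classifier written from scratch. The sketch below is an assumption-laden toy, not a production model: the training texts and their 'good'/'bad' labels are made up, words are split on whitespace only, and add-one (Laplace) smoothing is used so that unseen word and class combinations do not get zero probability. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_texts):
    """Count words per class from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    class_counts = Counter()            # label -> number of texts
    vocabulary = set()
    for text, label in labeled_texts:
        words = text.lower().split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocabulary.update(words)
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_texts = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: share of training texts that carry this label.
        score = math.log(class_counts[label] / total_texts)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, for illustration only.
TRAINING = [
    ("good quality and fast delivery", "good"),
    ("really good price", "good"),
    ("love the color", "good"),
    ("bad quality and slow delivery", "bad"),
    ("really bad fit", "bad"),
    ("hate the fabric", "bad"),
]

model = train_naive_bayes(TRAINING)
print(classify("good color", model))
print(classify("slow and bad", model))
```

A classifier like this only outputs a class label. It does not assign a graded score to each word, which is what the score-based methods aim to do.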
Word2vec is a machine learning method that analyzes the relationships between words in texts by considering their contexts.
If two words frequently appear together or in similar contexts, as measured by word frequency data, the vector distance between the two words becomes small. With enough data, it becomes possible to identify the meaning of a word and to derive the relationships between words accurately. As shown in the figure, the relationships between similar words are taken into account so that such words are mapped to nearby locations.
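The idea that words used in similar contexts end up with nearby vectors can be illustrated with a much simpler stand-in than Word2vec itself. The sketch below does not train a Word2vec model, which learns dense vectors with a neural network. Instead it builds raw co-occurrence count vectors and compares them with cosine similarity. The function names, the window size, and the corpus are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up corpus, for illustration only.
CORPUS = [
    "the skirt is pretty",
    "the skirt is lovely",
    "the dress is pretty",
    "the dress is lovely",
    "the delivery was late",
    "the delivery was slow",
]

vectors = cooccurrence_vectors(CORPUS)
sim_close = cosine_similarity(vectors["pretty"], vectors["lovely"])
sim_far = cosine_similarity(vectors["pretty"], vectors["late"])
print("pretty vs lovely:", round(sim_close, 3))
print("pretty vs late:  ", round(sim_far, 3))
```

In this toy corpus 'pretty' and 'lovely' share the same neighbors and so come out as highly similar, while 'pretty' and 'late' share none. A real Word2vec model captures the same kind of regularity from far larger text collections.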
Once the relationships between words have been analyzed by AI using Word2vec, the polarity analysis, the most important step in opinion mining, is performed. In this step, the graph-based semi-supervised learning mentioned above is used.
Semi-supervised learning performs very well when only a small number of words have polarities assigned and you want to find the polarities of the words that do not. When analyzing texts, you may sometimes want to assign either 1 or -1 to a word because it is unambiguously positive or negative. However, since such words are rare, they are used as seed words, and the relationships between words derived from Word2vec are used to derive the sentiment scores of the remaining words.
Graph-based semi-supervised learning represents the relationships between words as a graph. The learning algorithm then uses the sentiment scores of the seed words and the inter-word relationships from Word2vec to derive sentiment scores for all the words.
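One common way to realize this idea is label propagation, sketched below under stated assumptions. The article does not say which graph-based algorithm is meant, so this is only one plausible instance: seed words keep their fixed scores, and every other word is repeatedly set to the weighted average of its neighbors' scores. The similarity weights are made up and merely stand in for relationships that would come from Word2vec. The function name is hypothetical.

```python
def propagate_sentiment(edges, seeds, iterations=200):
    """Spread seed scores over a weighted word graph.

    edges: dict mapping (word_a, word_b) -> similarity weight (> 0)
    seeds: dict mapping seed word -> fixed score (1.0 or -1.0)
    """
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    # Unlabeled words start at 0 (neutral); seeds start at their fixed score.
    scores = {word: seeds.get(word, 0.0) for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word, linked in neighbors.items():
            if word in seeds:
                updated[word] = seeds[word]  # seed scores never change
                continue
            total_weight = sum(linked.values())
            updated[word] = (
                sum(scores[n] * w for n, w in linked.items()) / total_weight
            )
        scores = updated
    return scores


# Made-up similarity graph, standing in for Word2vec-derived relationships.
EDGES = {
    ("excellent", "lovely"): 0.9,
    ("lovely", "pretty"): 0.8,
    ("pretty", "plain"): 0.3,
    ("plain", "ugly"): 0.6,
    ("ugly", "terrible"): 0.9,
}
SEEDS = {"excellent": 1.0, "terrible": -1.0}

scores = propagate_sentiment(EDGES, SEEDS)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word:10s} {scores[word]:+.2f}")
```

Words strongly linked to a positive seed drift towards 1, words strongly linked to a negative seed drift towards -1, and weakly connected words settle somewhere in between.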
Using the two AI methods above, it is possible to derive opinions intelligently from massive accumulations of text. I think that further breakthroughs in artificial intelligence will provide the opportunity to increase the accuracy and performance of opinion mining even more.
Professor Yoon Byeong-un received his bachelor's, master's, and doctoral degrees in Industrial Engineering from Seoul National University. After completing a postdoc at the Centre for Technology Management (CTM) of Cambridge University in England, he became a professor in the Department of Industrial Systems Engineering at Dongguk University. His areas of study include technology forecasting, technology roadmaps, patent analysis, artificial intelligence, big data analysis, and technology-human society convergence. In recent years, Prof. Yoon has been working to establish and spread the concept of technology intelligence.