
AI frontdesk – improve office security and working conditions

Imagine someone in your office who serves as doorkeeper, takes care of visitors, and even looks after your working conditions, 24/7. One of our missions at Ailabs.tw is to explore AI solutions that address society’s problems and improve people’s quality of life, and we have developed an AI-powered front desk to do all of the tasks mentioned above.

According to the 2016 annual report from Taiwan’s MOL (Ministry of Labor), the average Taiwanese employee works 2,106 hours per year. Compared with OECD statistics, this ranks third in the world, just below Mexico and Costa Rica.

On December 4, 2017, the revision of the Labor Standards Act passed its first review. The new version of the law will allow flexible work-time arrangements and raise the monthly maximum work hours to 300. Other major changes in the amendment include conditionally allowing employees to work 12 days in a row and reducing the minimum break between shifts from 11 hours to 8 hours. The ruling party plans to finish the second and third readings of this revision early next year (2018), and it will put 9 million Taiwanese workers in a worse working environment. To get rid of the bad reputation of “Taiwan – The Island of Overwork”, we need a system that notifies both employee and employer when someone has been working extreme overtime, and whose attendance reports cannot easily be manipulated.

In May 2017, Luo Yufen, an employee of Pxmart, one of Taiwan’s major supermarket chains, died from prolonged overwork after seven days in a coma. However, the OSHA (Occupational Safety and Health Administration) initially found no evidence of overwork after reviewing the clocking report provided by Pxmart, which looked ‘normal’. It wasn’t until August, when further investigation of Luo’s case was requested, that her real working hours before her death proved that she had been overworked.


PTT Hired First AI Reporter Named Copycat (記者快抄)

In early July, Ailabs.tw released an AI reporter named Copycat (記者快抄) that produces news from content on PTT, Taiwan’s largest online forum. It works faster and produces more content than its human colleagues, in real time.

 

 

Copycat can now automatically write about 500 news articles on popular topics every day.

The Requirements of the Media Industry Nowadays

Attracting readers’ attention to published content, and getting that content to rank higher on social networks and search engines, are becoming more and more important for the media industry. To meet this goal, reporters need to produce as many articles as they can, update them quickly, and search for interesting material all over the world. Copycat (記者快抄), an AI reporter, can do this task as well by generating news based on the most discussed topics on PTT, Taiwan’s largest online forum.

In the beginning this was a side project. However, we found that people were interested in the website, so we put some effort into improving it.

 

PTT, the biggest and non-commercial forum in Taiwan.

 

Generate News Automatically

PTT is the largest terminal-based bulletin board system (BBS) in Taiwan. It has more than 1.5 million registered users, with over 150,000 users online at peak times. This BBS is a non-commercial, open-source online platform that has over 20,000 boards covering a multitude of topics and generates 500,000 comments every day.

Our system now fetches important articles and posts from PTT every 30 minutes, parses them, and posts the results on the dashboard. Likes and Boos are also collected and displayed on each post, indicating the general public’s reactions.
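As a rough illustration of the Like/Boo collection step, here is a minimal sketch in Python. It assumes each comment line starts with a one-character marker ("推" for a Like, "噓" for a Boo, "→" for a neutral comment). The function name and the sample lines are made up for this example and are not taken from our production parser.

```python
from collections import Counter

# Assumed comment markers: "推" (Like), "噓" (Boo), "→" (neutral).
MARKERS = {"推": "like", "噓": "boo", "→": "neutral"}


def tally_reactions(comment_lines):
    """Count Likes, Boos and neutral comments from raw comment lines."""
    counts = Counter({"like": 0, "boo": 0, "neutral": 0})
    for line in comment_lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip empty lines
        kind = MARKERS.get(stripped[0])
        if kind is not None:
            counts[kind] += 1
    return dict(counts)


# Hypothetical sample comments, for illustration only.
sample = [
    "推 user1: great post",
    "噓 user2: disagree",
    "→ user3: hmm",
    "推 user4: +1",
]
print(tally_reactions(sample))
```

The resulting counts could then be attached to each post on the dashboard.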

Three Steps to Generate News Articles

Summary

First, summarization. Based on the popular posts on the PTT forum, we describe the main idea in a few sentences. Article content is broken down into sentences, and each sentence is given a score that represents how tightly it connects with the other sentences in the article. In addition, deep learning techniques such as word embeddings are used to support the algorithm.
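The sentence-scoring idea can be sketched as follows. This is a simplified stand-in rather than Copycat's actual model: it assumes plain bag-of-words cosine similarity instead of word embeddings, treats each Chinese character as a token instead of using a proper word segmenter, and scores a sentence by the sum of its similarities to all other sentences. All function names are illustrative.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split on common Chinese and English sentence terminators."""
    parts = re.findall(r"[^。！？.!?]+[。！？.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Latin words and single CJK characters (a crude assumption)."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def summarize(text, k=2):
    """Return the k best-connected sentences, in their original order."""
    sentences = split_sentences(text)
    vectors = [Counter(tokenize(s)) for s in sentences]
    scores = [
        sum(cosine(v, w) for j, w in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


post = (
    "The typhoon will hit Taipei tomorrow. "
    "Schools in Taipei close when the typhoon arrives. "
    "I had noodles for lunch."
)
print(summarize(post, k=2))
```

In this toy post the off-topic sentence shares no words with the others, so it gets the lowest score and is left out of the summary.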

 

AI generated news from PTT

 

Fill-In

With a list of candidate sentences, we algorithmically pick and compile them into an article. We collected some widely used news templates so that Copycat can mix the key sentences with these templates and turn out an ordinary daily news article.
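A minimal sketch of the fill-in step, using Python's standard `string.Template`. The template wording and the field names (`board`, `title`, `summary`) are hypothetical and only show the mechanism of mixing key sentences into a fixed news pattern.

```python
from string import Template

# Hypothetical news template; the real templates are not shown in this post.
NEWS_TEMPLATE = Template(
    'Users on the $board board are discussing "$title". $summary'
)


def fill_in(template, board, title, key_sentences):
    """Merge the selected key sentences into a news template."""
    summary = " ".join(key_sentences)
    return template.substitute(board=board, title=title, summary=summary)


article = fill_in(
    NEWS_TEMPLATE,
    board="Gossiping",
    title="Typhoon day off",
    key_sentences=[
        "The typhoon will hit Taipei tomorrow.",
        "Schools in Taipei close when the typhoon arrives.",
    ],
)
print(article)
```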

Generate

The last part is to make the news article more readable. PTT users often write posts in their own styles and formats, with unexpected new lines and spaces. This makes it hard for a machine to read and understand the content. To deal with this problem, we built a model from newspaper text to act as a grammar corrector that teaches Copycat how to write like a professional reporter.
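The grammar-correction model itself is beyond a short example, but the formatting problem it faces can be illustrated with a simple rule-based cleanup. The sketch below is an assumption about what such a preprocessing pass might look like, not our actual pipeline: it treats blank lines as paragraph breaks, joins hard-wrapped lines, collapses repeated spaces, and removes spaces between two CJK characters.

```python
import re

# CJK ideographs plus common CJK punctuation and full-width forms.
CJK = r"\u4e00-\u9fff\u3000-\u303f\uff00-\uffef"


def normalize_post(text):
    """Remove unexpected line breaks and spaces from a forum post."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Join hard-wrapped lines inside one paragraph.
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        joined = " ".join(lines)
        # Collapse runs of spaces, tabs and full-width spaces.
        joined = re.sub(r"[ \t\u3000]+", " ", joined)
        # Chinese text does not use spaces between characters.
        joined = re.sub(rf"(?<=[{CJK}]) (?=[{CJK}])", "", joined)
        cleaned.append(joined)
    return "\n\n".join(cleaned)


raw = "今天天氣\n很好   真的\n\nHello\nworld"
print(normalize_post(raw))
```

A learned corrector would go further than this, but a pass like this already removes most of the layout noise before any model sees the text.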

Feature Image Selection

Text alone is not enough. A news article should have images. Posts on the PTT forum often include image links, which can be a great resource. However, many posts do not have an associated image.

To search for an image the way a human editor does, we trained a multi-layer document retrieval RNN model as an image search engine. This engine retrieves an image by comparing the text similarity between the image’s description and the news content.

Now our AI reporter Copycat can not only copy the images from the original post but also find a related image when needed.
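The selection logic can be sketched as below. Note the substitution: instead of the RNN retrieval model, this example uses plain Jaccard token overlap as the text-similarity measure, and the image index, URLs and function names are invented for illustration.

```python
import re


def tokens(text):
    """Lower-cased Latin words and single CJK characters as a set."""
    return set(re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower()))


def jaccard(a, b):
    """Size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_feature_image(article_text, post_image_urls, image_index):
    """Prefer an image from the original post; otherwise search by description."""
    if post_image_urls:
        return post_image_urls[0]
    article_tokens = tokens(article_text)
    best_url, best_score = None, 0.0
    for url, description in image_index.items():
        score = jaccard(article_tokens, tokens(description))
        if score > best_score:
            best_url, best_score = url, score
    return best_url  # None when nothing overlaps at all


# Hypothetical image index: URL -> text description.
index = {
    "https://example.com/typhoon.jpg": "typhoon clouds over Taipei",
    "https://example.com/noodles.jpg": "a bowl of beef noodles",
}
print(pick_feature_image("The typhoon will hit Taipei tomorrow", [], index))
```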

 

The figure is auto-selected by Copycat based on text content

More to Come

The original categories on PTT and the topics extracted by Copycat are useful tags that help people find related news articles. The discussions and re-posts on the forum are potential data for showing further, differing standpoints on certain topics.

After importing our face and speech recognition module, Copycat can search video clips all over the Internet for celebrities’ comments related to a specific topic. This news knowledge graph can also benefit human reporters.

We believe that artificial intelligence will be a support rather than a threat, helping reporters produce higher-quality news. By automating the process of picking topics and generating articles online, reporters can move the needle on the content generation process and focus on creating insights or stories for readers.

Copycat is constantly improving and on its way to becoming a better reporter.

 

Featured image by filipe ferreira / CC BY

Recognize The Speech of Taiwan

We are exploring the new ways people interact with technology in the age of AI, and speech is one of the most common and natural means of communication. In this post we introduce our core recipes for an automatic speech recognition system for Taiwan.

Cornerstone of Natural Human-Computer Interaction

Mobile devices, IoT, wearable devices and robots: our daily lives are more and more likely to be surrounded by smart devices in the future. To interact with them naturally, just as we do with human beings, we need to develop related AI techniques such as machine learning, computer vision, natural language processing and speech processing.

Speech recognition, also called ASR (automatic speech recognition), is one of the cornerstones that link all these interactions together. With deep-learning-based models and graphical decoders, ASR is becoming more reliable in both accuracy and speed.

 

Unique Language Habits in Taiwan

Different word usages, new phrases and sentence structures emerge every day in our modern society and between cultures. This is especially true in Taiwan, where people’s language habits differ from those of other Mandarin speakers.

For these reasons, current ASR solutions in the Mandarin-speaking space have limitations when it comes to supporting general usage in Taiwanese people’s daily lives. For example, PTT, the biggest forum and Internet community in Taiwan, invents hundreds of words and phrases every month. The newly created words may be used repeatedly and spread by millions of users in online chats and posts.

Therefore, the challenge of building a localized ASR system is not only about training a local neural network model, but also about how the system updates and adapts rapidly to a dynamically evolving language.
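One small piece of such an update loop is spotting terms that are suddenly common online but missing from the recognizer's vocabulary. The sketch below is only an assumption about how that could be done: it expects posts that are already split into words, and the lexicon, threshold and tokens are placeholders.

```python
from collections import Counter


def find_new_words(tokenized_posts, lexicon, min_count=3):
    """Return out-of-lexicon words seen at least min_count times, most frequent first."""
    counts = Counter(
        token
        for post in tokenized_posts
        for token in post
        if token not in lexicon
    )
    return [word for word, count in counts.most_common() if count >= min_count]


# Placeholder data: "newterm" stands for a freshly coined forum word.
posts = [
    ["today", "newterm"],
    ["newterm", "good"],
    ["newterm", "today", "rare"],
]
known = {"today", "good"}
print(find_new_words(posts, known))
```

Words found this way would still need a pronunciation and updated language-model statistics before the recognizer can use them.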

 

 

With a Taiwan-specific language model, our ASR can be much friendlier for speech-related applications in Taiwan.

 

Multi-Language Speech Recognition

Although Mandarin is the official language in Taiwan, a Mandarin-only ASR system cannot satisfy our goals. Taiwan is a place with many different cultures. In addition to Mandarin, other languages such as English, Taiwanese, Hakka and Indigenous languages are also used quite often. To deal with this, Ailabs.tw gathered linguistics, phonetics and machine learning experts to set up a standard process for when ASR faces cross-language requirements.

 

 

This process includes enriching the language model with multiple languages and handling mixed-language words and sentences. Our early ASR experiments on Taiwanese work, and we are now bringing the system up to production level.
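To give a feel for what mixed-language text looks like to a system, the sketch below splits a sentence into runs of Han characters and Latin letters. This is a toy illustration, not our process: script alone cannot tell Mandarin from Taiwanese or Hakka written in Han characters, and the function name is made up.

```python
import re

# A run is either consecutive Han characters or a Latin word.
RUN = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z][A-Za-z']*")


def split_by_script(text):
    """Split mixed text into (script, chunk) runs, in order of appearance."""
    runs = []
    for match in RUN.finditer(text):
        chunk = match.group()
        script = "han" if "\u4e00" <= chunk[0] <= "\u9fff" else "latin"
        runs.append((script, chunk))
    return runs


print(split_by_script("我想要check一下email"))
```

Each run could then be handled with vocabulary and statistics from the matching language.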

 

ASR Applications in Ailabs.tw

The ASR system is already powering the front-desk system at Ailabs.tw. When employees arrive at the office, they interact with the ASR system for door access and no longer need ID cards or badges.

An employee asks the ASR system for door access

Another application is generating automatic transcripts or captions. Videos of news, conferences and interviews can be converted to text files in real time using ASR.

News video can now generate live captions with ASR
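As an example of turning recognition results into captions, the sketch below writes timed text segments in the widely used SubRip (SRT) layout. It assumes the recognizer returns (start seconds, end seconds, text) tuples. That segment format and the sample lines are assumptions for this example, not our API.

```python
def format_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp style used by SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Hypothetical recognizer output.
segments = [
    (0.0, 2.5, "Good evening, here is tonight's news."),
    (2.5, 5.0, "A typhoon is approaching Taiwan."),
]
print(to_srt(segments))
```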

Our ASR API is ready to open. Contact us if you are interested in further cooperation.

 

Looking Forward

Speed, accuracy, multi-language support and rapid updates are the core aspects of an easy-to-use ASR system. We are continuously improving these and trying different deep learning algorithms to reach the point where AI does a better job than humans in this field. If you are interested in working on this problem, please contact us. We are actively hiring!

 

Featured image by Peter Coombe / CC BY