Label360: An Implementation of a 360 Segmentation Labelling Tool

The image above shows an example of a segmentation mask overlaid on a 360 image captured by our drone. The image was labeled by one of our in-house labelers. 

 

Semantic segmentation is one of the key problems in computer vision. It is important for image analysis tasks and paves the way toward scene understanding. Semantic segmentation refers to assigning each pixel of an image a class label, such as sky, road, or person. Numerous applications benefit from inferring knowledge from imagery, including self-driving vehicles, human-computer interaction, and virtual reality. 

360 images and videos are popular nowadays for applications like game design, surveillance systems, and virtual tourism. Researchers use 360 images as input for object detection and semantic segmentation models. However, they usually convert 360 images to a normal field-of-view before labelling them. For example, the Stanford 2D-3D-Semantics Dataset contains 360 images, but its segmentation annotations are on images sampled from the equirectangular projection at various fields-of-view [1]. Other 360 datasets, such as Salient360 and the 360 video saliency dataset, have saliency labels but no segmentation [2][3]. Lastly, there are 360 datasets with many equirectangular images that are not yet labeled, such as Pano2Vid and Sports360 [4][5]. 

To our knowledge, there are no public annotation tools suitable for 360 images, so we decided to build a semantic segmentation annotator from the ground up specifically for 360 images, hoping to increase the amount of research on semantic segmentation of equirectangular images. 

The first problem with labelling 360 images is that it is difficult to recognize and label objects at the top and bottom of equirectangular images. When the spherical surface is projected onto a plane, the top and bottom of the sphere are stretched to the full width of the image. 

Converting to a cubemap solves that problem but raises another: objects that span two faces of the cube are harder to label. To deal with both problems, we use a cubemap and provide a drawing canvas with an expanded field-of-view. We describe these methods in detail below. 

 

Our 360 segmentation tool

UI Components: 

  1. Toolbar: It has plotting, editing, zooming, and undoing functions.
  2. Drawing canvas: The user can annotate on the drawing canvas. The canvas displays a face of the cubemap with an expanded field-of-view. 
  3. Cubemap viewer: The user can select a face in the cubemap to annotate and view annotations in cubemap. 
  4. Image navigator: The user can navigate to different images.
  5. Equirectangular viewer: The user can see mapped annotations in equirectangular view in real-time.
  6. Class selector: The user can view annotations of different classes.

In the cubemap viewer, the border color of each face indicates the status of its annotations: faces with existing annotations have a green border, and those without have a red border. The face currently shown in the drawing canvas is outlined in yellow. We describe the drawing canvas in detail below. 

User journey:

Annotation Process:

The flow chart below shows the annotation process from the equirectangular image to the input of the semantic segmentation model. 

Design:

  • The use of a cubemap solves the problem of distortion at the top and bottom of 360 images 

The main difference between 360 images and normal field-of-view images is that the top and bottom of 360 images are distorted. The distortion arises because points near the top and bottom of the image are stretched to fit its full width. If we load the equirectangular image directly into a widely used annotation tool, the top and bottom of the image are difficult to label, and tracing the curves in those areas with polygons is harder and more time-consuming. 
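The stretch can be quantified: a row of pixels at latitude φ represents a circle of latitude whose circumference shrinks with cos φ, yet it is drawn across the full image width. The small sketch below is purely illustrative (not part of the tool) and computes the resulting horizontal stretch factor:

```python
import math

def horizontal_stretch(latitude_deg):
    """Horizontal stretch factor of an equirectangular image at a latitude.

    Each pixel row spans the full image width, but the circle of latitude it
    represents has circumference proportional to cos(latitude), so content
    there is stretched horizontally by 1 / cos(latitude).
    """
    return 1.0 / math.cos(math.radians(latitude_deg))

# No stretch at the equator; near the poles the stretch explodes.
print(round(horizontal_stretch(0), 2))   # 1.0
print(round(horizontal_stretch(60), 2))  # 2.0
print(round(horizontal_stretch(85), 1))  # 11.5
```

At 85 degrees of latitude every object is already drawn more than eleven times wider than it should be, which is why polygon labelling near the poles of the raw equirectangular image is so painful.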

To make it easier to recognize and label equirectangular images, we designed our annotator to display cubemaps instead. The image below shows the conversion between a cubemap (left) and an equirectangular image (right). 

By converting the equirectangular image to a cubemap, we let annotators see objects in a normal field-of-view. In addition, users can annotate each face of the cubemap separately. Below are our original image (right) and the cubemap converted from it (left). 
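For readers curious how the conversion works: each pixel of a cubemap face corresponds to a ray from the sphere's center, and that ray's longitude and latitude index into the equirectangular image. The sketch below maps a point on one face to equirectangular pixel coordinates; the "front face looks down the +z axis" layout is our assumption for illustration, and a full converter would handle all six faces:

```python
import math

def front_face_to_equirect(u, v, eq_width, eq_height):
    """Map normalized coords (u, v) in [-1, 1] on a cubemap's front face
    (90-degree FOV) to pixel coordinates in the equirectangular image.

    Assumed convention: front face looks down the +z axis, x right, y down.
    """
    x, y, z = u, v, 1.0                        # ray through the face pixel
    lon = math.atan2(x, z)                     # longitude in [-pi, pi]
    lat = math.atan2(y, math.hypot(x, z))      # latitude in [-pi/2, pi/2]
    px = (lon / math.pi + 1.0) * 0.5 * (eq_width - 1)
    py = (lat / (math.pi / 2) + 1.0) * 0.5 * (eq_height - 1)
    return px, py

# The face center maps to the center of the equirectangular image.
print(front_face_to_equirect(0.0, 0.0, 4096, 2048))  # (2047.5, 1023.5)
```

Sampling the equirectangular image at these coordinates (with interpolation) fills in the face; the inverse mapping sends face annotations back onto the sphere.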

   

 

  • Annotation on an expanded field-of-view and real-time display of equirectangular annotations solve the border problem

As we developed the segmentation annotator, we found that the borders between faces of the cube have gaps or do not appear connected. This is a problem because a road that crosses several faces of the cube may become discontinuous. Moreover, it is difficult to draw near the borders; the annotator has to spend a lot of time snapping points to them. 

  

The images above show our method of dealing with the border problem. The drawing canvas shows a 100-degree field-of-view centered on one face of the cube. The yellow square inside the canvas marks the face's own 90-degree field-of-view. Annotators can label objects anywhere in the expanded field-of-view, but only the annotations inside the normal field-of-view are saved. 
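Because the face is rendered with a perspective projection, the 90-degree region occupies a tan(45°)/tan(50°) ≈ 0.84 fraction of the canvas width. The following sketch reflects our assumed geometry, not the tool's actual code, and computes the pixel bounds of the saved region:

```python
import math

def save_region(canvas_px, canvas_fov_deg=100.0, keep_fov_deg=90.0):
    """Pixel bounds (lo, hi) of the kept field-of-view inside the expanded
    drawing canvas, assuming a perspective (gnomonic) rendering of the face.

    Illustrative sketch of the expanded-canvas idea, not the tool's code.
    """
    frac = math.tan(math.radians(keep_fov_deg / 2)) / \
           math.tan(math.radians(canvas_fov_deg / 2))
    half = canvas_px / 2.0
    lo = half - half * frac
    hi = half + half * frac
    return lo, hi, frac

lo, hi, frac = save_region(1000)
print(round(frac, 3))        # 0.839
print(round(lo), round(hi))  # 80 920
```

On a 1000-pixel canvas, roughly an 80-pixel margin on each side is drawable overflow: strokes there help the annotator continue an object past the face border, but only the central 840 pixels are written into the saved mask.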

We are also able to use the cubemap viewer and the equirectangular viewer to see how the annotations turn out and whether annotations that cross different sides are connected properly. 

  

The mask on the left (above) is an example of discontinuous objects across faces of the cubemap: there are white borders around each face, objects do not connect well, and they are likely to be labeled with different classes. The mask on the right is an example of continuous objects across the faces of the cubemap. 

 

Summary:

Our 360 annotation platform sets itself apart from other annotation tools with features designed specifically for 360 images: annotating a single face of the cubemap, a distinct drawing canvas with an expanded field-of-view, and real-time display of annotations in the equirectangular viewer. These features solve the problems of annotating 360 images with off-the-shelf platforms, namely the distortion and border problems described earlier. We hope our 360 segmentation labelling platform helps produce more semantic segmentation datasets for 360 images, and thereby fosters research on semantic segmentation of 360 images. 

 

References:

  1. Armeni, Iro, et al. “Joint 2d-3d-semantic data for indoor scene understanding.” arXiv preprint arXiv:1702.01105 (2017).
  2. Gutiérrez, Jesús, et al. “Introducing UN Salient360! Benchmark: A platform for evaluating visual attention models for 360° contents.” 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2018.
  3. Zhang, Ziheng, et al. “Saliency detection in 360 videos.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.
  4. Su, Yu-Chuan, Dinesh Jayaraman, and Kristen Grauman. “Pano2Vid: Automatic cinematography for watching 360° videos.” Asian Conference on Computer Vision (ACCV). 2016.
  5. Hu, Hou-Ning, et al. “Deep 360 pilot: Learning a deep agent for piloting through 360 sports videos.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.

Expanding Computer Vision Multi-View Stereo Capabilities: Automatic Generation of 3-dimensional Models via 360 Camera Footage

As the world we live in is three-dimensional, the 3D model is the most iconic representation of it. 3D modeling allows people to see what they cannot see in 2D, and to grasp how much physical space an object occupies from all perspectives. Beyond building models from scratch, we can also generate 3D models from video automatically.

Taiwan AI Labs has built a virtual aerial tour website, droneye.tw, with a wealth of 360 video resources. This triggered an idea: can we use 360 videos to generate high-quality 3D models from only a few flights, minimizing the cost of obtaining them? It is indeed feasible, whether with multi-view stereo computer vision algorithms or with a deep learning approach. The concept of 3D reconstruction is illustrated in Figure 1: we can solve for a 3D point's coordinates from the same point observed in at least two images.
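As a minimal illustration of this idea, the sketch below triangulates a 3D point as the point closest to two viewing rays (the midpoint method). Real pipelines use calibrated projection matrices and least squares over many views; the camera poses and the target point here are made up:

```python
import math

def triangulate_midpoint(c1, d1, c2, d2):
    """Closest point to two viewing rays (camera center c, unit direction d).

    Minimal illustration of recovering a 3D point from the same feature seen
    in two images; assumes the rays are not parallel.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    sub = lambda a, b: [x - y for x, y in zip(a, b)]
    w0 = sub(c1, c2)
    b, d, e = dot(d1, d2), dot(d1, w0), dot(d2, w0)
    denom = 1.0 - b * b
    s = (b * e - d) / denom          # parameter along ray 1
    t = (e - b * d) / denom          # parameter along ray 2
    p1 = [c + s * x for c, x in zip(c1, d1)]
    p2 = [c + t * x for c, x in zip(c2, d2)]
    return [(a + b2) / 2 for a, b2 in zip(p1, p2)]

# Two cameras 2 m apart, both looking at the point (0, 0, 5).
n = math.sqrt(29)
p = triangulate_midpoint([0, 0, 0], [0, 0, 1],
                         [2, 0, 0], [-2 / n, 0, 5 / n])
print([round(v, 3) + 0.0 for v in p])  # [0.0, 0.0, 5.0]
```

With real images, the ray directions come from matched feature pixels back-projected through the calibrated cameras; the midpoint then estimates the scene point.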

Fig 1. Concept of 3D Reconstruction (modified from Tokkari et al., 2017)

 

360-degree cameras capture the whole scene around the photographer in a single shot and are becoming a new paradigm for photogrammetry: the camera can be pointed in any direction, and the large field of view reduces the number of photographs needed [1]. The camera we used on earlier flights is the Virb 360, which records video as equirectangular panoramas, so its footage cannot be used for 3D reconstruction directly. We first project the equirectangular frames to perspective images of any field-of-view (Fig. 2) or to a cubemap, and then use these images to build our 3D models. This also supports reconstructing models inside buildings: we can choose the viewing angles we want, or simply choose the cubemap format, and the reprojected images are used to generate the 3D model automatically.

Fig 2. Different angle of views from drone

 

First of all, the Tanks and Temples website, which presents a benchmark for image-based 3D reconstruction, indicates that 3D models generated by deep learning still do not surpass those from multi-view stereo computer vision algorithms. We implemented both approaches and compared the results. The test input was 16 high-resolution photos of the NTU campus taken from a drone. The deep learning algorithm we used is R-MVSNet, an end-to-end architecture for depth-map inference from multi-view images [9]. The computer vision algorithm is structure from motion with semi-global matching. The results show that the state-of-the-art deep learning approach to 3D modeling still has a way to go. Nonetheless, deep learning for 3D reconstruction undoubtedly has great prospects.

Fig. The Architecture of R-MVSNet (Yao et al., 2019)

 

Table. Point Cloud results of deep learning and conventional computer vision

Semi-Global Matching 

R-MVSNet

   

 

Based on the above results, we decided to apply structure from motion and patch-matching algorithms to reconstruct our 3D models. The main steps are sparse reconstruction and dense reconstruction. In sparse reconstruction, we use SIFT to match features, giving us corresponding points on each image; we then use bundle adjustment to refine the camera extrinsics and intrinsics by the least-squares method [4]. These sub-steps yield more accurate camera parameters for the dense reconstruction. In dense reconstruction, we first use the camera poses to compute a depth map for each image by semi-global matching [2], then fuse neighboring depth maps to generate a 3D point cloud. However, the data volume of the point cloud is relatively large, so we simplify the points into a Delaunay triangulation, or mesh, which turns the points into surfaces [5]. Finally, we texture the mesh with the corresponding images [7]. 
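To give a concrete feel for the dense step: semi-global matching outputs a disparity for each pixel, which converts to depth via Z = f·B/d, where f is the focal length in pixels and B the baseline between the camera positions. A toy sketch with made-up numbers:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a pixel from its stereo disparity.

    Semi-global matching yields a disparity d per pixel; with focal length f
    (pixels) and baseline B (meters), depth Z = f * B / d. Sample values
    below are for illustration only.
    """
    return focal_px * baseline_m / disparity_px

# 1000 px focal length, 0.5 m baseline, 20 px disparity -> 25 m away.
print(depth_from_disparity(1000, 0.5, 20))  # 25.0
```

Note the inverse relationship: halving the disparity doubles the estimated depth, which is why distant geometry is the noisiest part of the fused point cloud.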

Fig. 3D reconstruction pipeline (Bianco et al., 2018)

 

 

Table. Result of NTU Green House

Real Picture   3D Model       
 

 

Table. Result of Tainan Qingping Park

Real 360 Picture

3D Model

   

 

Although 360 video has the advantage of capturing multiple views in a single frame, it has significant drawbacks: the camera intrinsics are unstable, and the resolution decreases drastically when projecting to perspective images. We therefore adopt a super-resolution algorithm, ESRGAN, to overcome the low quality of the images. This strategy not only increases the detail of the 3D models, especially when texturing the mesh, but also densifies the point cloud. To obtain better results, we can train our own model on Taiwan landscapes to prevent the bias of an unsuitable pre-trained model and to meet the special needs of drone data in Taiwan. 
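The resolution loss can be made concrete by comparing angular resolution: along the equator, an equirectangular frame spreads its width over 360 degrees, while a perspective crop spreads its width over its field-of-view, with the central pixels covering the largest angle. A rough sketch, using sample (not measured) image sizes:

```python
import math

def equirect_deg_per_px(eq_width):
    """Angular resolution of an equirectangular image along the equator."""
    return 360.0 / eq_width

def perspective_center_deg_per_px(fov_deg, width_px):
    """Approximate angular size of the central pixel of a perspective view.

    The image half-width in tangent space is tan(fov/2), so one pixel at the
    center spans about 2 * tan(fov/2) / width radians.
    """
    return math.degrees(2.0 * math.tan(math.radians(fov_deg / 2)) / width_px)

# A 5760-px-wide 360 frame vs. a 1080-px-wide 90-degree crop (sample sizes).
print(round(equirect_deg_per_px(5760), 4))                # 0.0625
print(round(perspective_center_deg_per_px(90, 1080), 4))  # 0.1061
```

With these sample sizes, the rendered perspective view is roughly 40% coarser per pixel than the source panorama at the equator, which is the gap super-resolution is asked to close.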

 

Bilinear x4 SR x4      

 

Nonetheless, ESRGAN does not restore the original information; it infers high-frequency detail in the images [8]. For this reason, it can hurt the quality of structure from motion. To take advantage of the better results of super-resolution while maintaining the quality of structure from motion, we can use the SR images as input only at the dense-matching step. To sum up, by using state-of-the-art deep learning algorithms such as super-resolution (ESRGAN), we can reduce some drawbacks of 360 video and generate the desired 3D models.

 

Fig. Chimei Museum

 

Fig. Lin Mo-Niang Park

 

Fig. Tzu Chi Senior High School

 

References

  1. Barazzetti, L., Previtali, M., & Roncoroni, F. (2018). Can we use low-cost 360 degree cameras to create accurate 3D models?. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences, 42(2).
  2. Barnes, C., Shechtman, E., Finkelstein, A., & Goldman, D. B. (2009). PatchMatch: A randomized correspondence algorithm for structural image editing. In ACM Transactions on Graphics (ToG) (Vol. 28, No. 3, p. 24).
  3. Bianco, S., Ciocca, G., & Marelli, D. (2018). Evaluating the performance of structure from motion pipelines. Journal of Imaging, 4(8), 98.
  4. Fraundorfer, F., Scaramuzza, D., & Pollefeys, M. (2010). A constricted bundle adjustment parameterization for relative scale estimation in visual odometry. In 2010 IEEE International Conference on Robotics and Automation (pp. 1899-1904). IEEE.
  5. Jancosek, M., & Pajdla, T. (2014). Exploiting visibility information in surface reconstruction to preserve weakly supported surfaces. International scholarly research notices.
  6. Shen, S. (2013). Accurate multiple view 3d reconstruction using patch-based stereo for large-scale scenes. IEEE transactions on image processing, 22(5), 1901-1914.
  7. Waechter, M., Moehrle, N., & Goesele, M. (2014). Let there be color! Large-scale texturing of 3D reconstructions. In European Conference on Computer Vision (pp. 836-850).
  8. Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C.,& Change Loy, C. (2018). Esrgan: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 0-0).
  9. Yao, Y., Luo, Z., Li, S., Fang, T., & Quan, L. (2018). Mvsnet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 767-783).

Virtual Aerial Tour

Written By: Yi-Chun Kuo, Hao-Kai Wen

Chi Po-lin’s Beyond Beauty: Taiwan from Above brought a new perspective on our familiar island. In this spirit, we architected a smart city service that explores the surroundings of our daily life from a unique angle. To develop a prototype, we flew drones over the campus of National Taiwan University and over Anping District, Tainan City, recording street-view videos with a 360 camera, and built an aerial tour application. A 360 camera provides flexible viewing angles when processing the videos: we only need to fly over a road once to present various flying experiences. Eco-House, NTU Library, or Drunken Moon Lake may be familiar to you, but have you ever seen them from above? You are invited to visit the campus of NTU from a drone’s-eye view:

https://smartcity.ailabs.tw/aerial-tour/ntu/.

Text-to-speech attraction introduction

We integrate AI Labs’s signature technologies into our service. With text-to-speech, a virtual tour guide vividly introduces the attractions to users. Videos with adjusted color, tone, brightness, and contrast present a more beautiful scene, and drawing-style videos show a distinctive city view to visitors. Below we introduce the methodology behind the scenes.

style transferred Anping scene

 

Shaky video is an issue for our platform: the instability of drone footage is due to the vibration of the propellers. We applied Kopf’s method [1] to estimate camera poses between frames. Once we have the camera poses, we can offset the shaking and stabilize the videos.

Color inconsistency between videos is another issue. Because the videos were taken on different days, the light and weather conditions differ a lot, and abrupt color changes in the scene make users uncomfortable. Many trials were made to overcome this problem. Style transfer approaches failed because of non-photorealistic results; the method most convincing to our audience is [2]: use deep features to match semantically similar regions between images, and transfer the color of corresponding regions accordingly. That is to say, the color of a tree is transferred to a tree, and the color of a car to a car. In this way, videos taken on different days keep consistent color.

left: original image, right: color transferred image

 

In addition to color transfer, a sky replacement algorithm, such as [3], helps reduce the color inconsistency between videos: if the sky is the same across videos, viewers will assume they were shot on the same day. We developed a sky replacement algorithm to replace the cloudy sky with a clear one. First, we use semantic segmentation models to detect the coarse skyline and a matting algorithm to refine the details. Then the sequence of 360 camera positions is recovered, as in 360 video stabilization, and we rotate the sky image according to these camera poses to simulate a new sunny sky. Finally, we composite the new sky with the original videos to generate appealing street-view videos.
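The final compositing step can be sketched per pixel: with the matte α produced by segmentation and matting, the output is α·sky + (1 − α)·frame. A toy example with made-up RGB values:

```python
def composite_sky(frame_px, sky_px, alpha):
    """Alpha-composite a new sky over one frame pixel.

    alpha is the sky matte from segmentation + matting: 1 = sky, 0 = scene,
    fractional values along the refined skyline. RGB in 0-255; toy values.
    """
    return tuple(round(alpha * s + (1 - alpha) * f)
                 for f, s in zip(frame_px, sky_px))

cloudy = (180, 180, 185)  # original cloudy-sky pixel
sunny = (90, 160, 240)    # pixel from the rotated clear-sky image
print(composite_sky(cloudy, sunny, 1.0))  # (90, 160, 240)
print(composite_sky(cloudy, sunny, 0.0))  # (180, 180, 185)
```

The fractional alphas along the skyline are what make the matting refinement matter: a hard 0/1 mask would leave a visible halo of cloudy pixels around trees and buildings.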

top: original image, bottom: sky-replaced image

This is where we now stand in building an aerial tour service. We keep exploring interesting and innovative methods to bring a unique perspective on our beautiful home.

 

Reference

[1] Kopf, Johannes. “360 video stabilization.” ACM Transactions on Graphics (TOG) 35.6 (2016): 195.

[2] Liao, Jing, et al. “Visual attribute transfer through deep image analogy.” arXiv preprint arXiv:1705.01088 (2017).

[3] Tsai, Yi-Hsuan, et al. “Sky is not the limit: semantic-aware sky replacement.” ACM Trans. Graph. 35.4 (2016): 149-1.


ptt.ai, open source blockchain for AI Data Justice

[ 換日線報導中文連結 ]

The beginning of the “Data Justice” movement

By collaborating with online citizens and social science workers in Taiwan, Taiwan AILabs promotes “Data Justice” under the following principles:

  1. Prioritize Privacy and Integrity with goodwill for applications before data collection
    • In addition to privacy protection acts, review tech giants for potential abuse of a monopoly position that forces users to give up their privacy, or for misuse of user content and data for other purposes. In particular, organizations that have become market monopolies should be reviewed regularly by the local administration to determine whether data is abused when users unwillingly give up their privacy.
  2. Users’ data and activities belong to users
    • The platform should remain neutral to avoid misuse of users’ data and creations.
  3. Public data collected should be open for public research
    • The government organization holding the data is responsible for its openness, with privacy and integrity secured. Examples include health insurance data for public health and smart city data for traffic research.
  4. Regulate mandatory data openness
    • For data critical to major public welfare but controlled by a private monopoly, the administration should be empowered to mandate data openness.
    • For example, Taipower’s electricity-usage data in Taiwan.

Monopoly now is worse than the “oil monopoly”

In 1882, the American oil giant John D. Rockefeller founded the Standard Oil Trust, uniting some 40 oil-related companies to control prices. In 1890, the U.S. passed the Sherman Antitrust Act, under which the government later sued Standard Oil to break up the monopoly. Antitrust laws were formulated to ensure fair trade and fair competition and to prevent price manipulation, and governments in other countries followed with anti-monopoly laws of their own. In 1984, the telecom giant AT&T was split into several companies under antitrust law. Microsoft was taken to court for bundling Internet Explorer with its operating system, with a ruling in 2001.

In 2003, the Network Neutrality principle mandated that ISPs (Internet Service Providers) treat all data on the Internet the same. The FCC (Federal Communications Commission) successfully stopped Comcast, AT&T, Verizon, and other giants from slowing or throttling traffic at the application or domain level. Apple FaceTime, Google YouTube, and Netflix benefited from the principle. A decade later, the oil and ISP companies are no longer among the ten most valuable companies in the world; instead, the Internet companies protected by Network Neutrality have become the new giants. In the US market, the world’s most valuable companies dominate market share in many areas: as of February 2018, Apple held 50% of the smartphone market, Google more than 60% of search traffic, and Facebook nearly 70% of social traffic. Facebook and Google together control 73% of the online ads market, and Amazon is on the path to grabbing 50% of online shopping revenue. In China, the situation is even worse: Alipay, owned by Alibaba, and WeChat Pay, owned by Tencent, together account for about 90% of China’s payment market.

When data becomes a weapon, innovators and users become meat on the chopping block

After a series of AI breakthroughs in the 2010s, big data became as important as crude oil. In the Internet era, users granted Internet companies permission to collect their personal data in exchange for the convenience of connecting with credible users and content. For example, a magazine publishes articles on Facebook because Facebook lets users subscribe to them, and the publisher can manage subscriber relationships through the messenger system. The recommendation system ranks users and the content they publish. All these free services are sponsored by advertisements, which pay for Internet hosting and traffic. This model encouraged more users to join the platform, and the users and content accumulated there attracted still more users. In the 4G mobile era, users are always online, pushing data aggregation to a whole new level. After mergers and acquisitions among Internet companies, a few companies now dominate users’ daily data. New initiatives can no longer reach users simply by launching a website or an app. Internet giants, on the other hand, can easily issue a copycat of an innovation and leverage their traffic, funding, and data to take over the territory. Startups have little choice but to be acquired or burn out under unfair competition. There are fewer and fewer stories of innovation from garages, and more and more stories of tech giants copying startup ideas before they take shape. A well-known saying in China puts it bluntly: “Be acquired or die; no startup can bypass the giants now.” The monopoly also limits users’ choices: if a user does not consent to a data collection policy, there is usually no alternative platform.

Net Neutrality repealed, giants eat the world

Nasim Aghdam’s anger at YouTube casts a nightmarish shadow over how it deals with creators and advertisers. She opened fire at YouTube headquarters, injuring three people, and then killed herself. At the beginning of the Internet era, innovative content creators could be reasonably rewarded for their creations. After the platforms became monopolies, however, content providers found their work ranked by opaque algorithms that pushed it farther and farther away from their loyal subscribers, with poor-quality advertising and fake news standing in the way. To retain their original reach, content creators must now pay for advertisement; reputable providers are suddenly charged for reaching their own loyal subscribers. Even worse, their subscribers’ information and behavior are consumed by the platform’s machine learning algorithms to serve targeted ads. Meanwhile, the platform does not effectively screen advertisers, so low-quality fake news and fake ads are served, and the platform has become known for scams and election manipulation. After the Facebook scandal, users discovered that their private data had been used by analysis tools to manipulate their minds. Yet during the #deletefacebook movement, users found no alternative platform because of the tech giants’ monopoly: their friends and audiences are on the platform.

In December 2017, the FCC voted to repeal the Net Neutrality principle, on the reasoning that the US had failed to achieve Net Neutrality and that the ISPs were not the ones to blame. A decade on, the Internet companies that benefited from Net Neutrality are now the monopoly giants, and Net Neutrality was never applied to their private ranking and censorship algorithms. Facebook, for example, offers mobile access to selected sites on its platform at different data-service charges, a practice widely panned for violating net neutrality principles yet still active in 63 other countries around the world. The situation is getting worse in the era of AI. Tech giants have leveraged their data power to step into the automotive, medical, home, manufacturing, retail, and financial sectors. Through acquisitions, they rapidly accumulate new types of vertical data and force traditional industries to open up their data ownership. Traditional industries now face an even larger and smarter technology monopoly than the ISPs or oil companies of past decades.

Taiwan’s experience may mitigate the global data monopoly

Starting from the root cause, from a vertical point of view, the user who contributes data is motivated by trust in friends or in a reputable content provider. For convenience and better service, the user consents to the collection of private data and grants the platform the right to analyze it. The user who contributes content consents to publishing creations on the platform because the audience is already there. The platform thus holds the power over data and content that should originally belong to the users and publishers. Citing privacy, safety, and convenience, the platform then prevents other platforms or users from consuming the data. Repeatedly, this results in an exclusive platform for users and content providers.

From a horizontal point of view, startups sign unfair agreements with the platform in order to reach users, data, and traffic. In the end, good innovations are usually swallowed by the platform, because the platform also owns the data and traffic behind them. The platform thus grows larger and larger by either merging with or copying good innovations.

In order to break this vicious cycle and create a fair competitive environment for AI research, Taiwan AILabs spoke at the Taipei Global Smart City Expo on March 27, 2018, and on a panel at the Taiwan-German Global Solution Workshop on March 28, exchanging Taiwan’s unique experience of Data Justice with visiting experts and scholars on data policymaking. In the discussion, we identified opportunities that could potentially break the cycle.

The opportunities come from the following observations about Taiwan. The mainstream social network platforms in the world today are provided by private companies optimized for advertising revenue. Taiwan, by contrast, has a mature network of users, open source contributors, and open data campaigns; “Internet users” in Taiwan are closer to “online citizens.” The Taiwanese Internet platform PTT (ptt.cc), for example, is not run for profit: its users elect the board managers directly. Over the years this culture has not cooled down, and PTT still dominates. Because every voice carries equal weight, the platform is difficult to manipulate with advertising money, and fake news and fraud are easily exposed by the online record. PTT is more of a major platform for public opinion in Taiwan than Facebook. Through collaboration between PTT and Taiwan AILabs, it now has an AI news writer that reports news from its users’ activities, minimizing editorial bias.

g0v.tw is another non-profit group in Taiwan focusing on citizen science and technology. It promotes the transparency and openness of government organizations through hackathons, and collaborates with the government, academia, non-governmental organizations, and international organizations on opening public data through open source collaboration in various fields.

Introducing the ptt.ai project: using blockchain for “Data Justice” in the AI era

PTT is Taiwan’s most impactful online platform and has been running for 23 years. It has its own digital currency (the P coin), instant messaging, e-mail, users, elections, and administrators elected by its users. However, the services hosting the platform are still relatively centralized.

In the past, users chose a trusted platform for trusted information, and for convenience and Internet space, users and content providers consented to unfair data collection. Blockchain technology offers a new direction that avoids centralized data storage: it can certify users and content through its chain of trust. The credit system is not built on a single owner, and the content storage system is also built on the chain, avoiding control by any single organization that could become a super power.

Ptt.ai is a research project that learns from PTT’s data economy, combines it with the latest blockchain encryption technology, and implements it in a decentralized way.

The mainstream social network platforms in China and the United States created a new super power of data from the creations of users and their friends, and they will continue to collect more information by horizontally merging with industries of unequal data power. The launch of ptt.ai rethinks data ownership from a different direction. We hope to study how to upgrade PTT for the era of AI, and to use the platform as a basis for enabling more industries to cooperate on data platforms, giving data control back to users and mitigating the data monopoly now underway. Ptt.ai will also collaborate with leading players in the automotive, medical, smart-home, manufacturing, retail, and financial sectors who are interested in creating open community platforms.

Currently, this experiment is starting on an independent platform; it does not yet involve the operation or migration of the current PTT. Please follow the latest news about ptt.ai at http://ptt.ai.

 

[2018/10/24 Updates]:

The open source project is on github now: https://github.com/ailabstw/go-pttai

[2019/4/2 Updates]:

More open source projects are on github now:

 

AILabs builds intelligence behind Taipei Traffic Density Network

Taipei is a city with millions of cars and motorcycles, and heavy traffic congestion occurs on its streets daily. Incidents have a serious impact on traffic. Taipei has installed tens of thousands of citywide cameras recording real-time traffic video, and police officers use the detection results to relieve congestion. However, the existing method requires human effort, and only 16% of incidents are detected manually.

Unlike cities in mainland China, Taiwan faces two major constraints in building smart city systems. First, humanity with privacy and integrity is the top priority. Taiwan is highly sensitive to human rights, so policy making needs to ensure goodwill with integrity: protecting privacy and preventing future abuse and misuse come first when building the system. Taipei therefore uses video at a resolution too low to recognize residents’ identities. Second, the solution needs to be green and environmentally friendly. Forcing the city government to retire its old cameras would be challenging, and Taipei uses low-frame-rate cameras to reduce energy consumption. Under these two constraints, Taiwan AILabs collaborated with the Taipei City Government to build an autonomous traffic detection and prediction system.

Existing Traffic Cameras (CCTV) of Taipei City

The videos taken by the existing traffic cameras are low-resolution and low-frame-rate. We designed a robust network to detect and predict traffic congestion in real time: the Taipei Traffic Density Network (TTDN). TTDN is a fully convolutional network that precisely estimates the vehicle density of a defined region in real time. It extracts multi-scale features from traffic video frames and obtains the vehicle density estimate through pixel-wise regression.

 

The Outline of Taipei Traffic Density Network (TTDN)

We would like to remark that our approach is intertwined with our key ingredients of a smart city: protection of privacy and conservation of energy. Since the traffic videos are low-resolution, it is hard to perform any face or license plate recognition, which ensures privacy protection. Meanwhile, besides reusing the existing traffic cameras, the low-resolution, low-frame-rate data also reduces storage space and computational cost.

Using the mathematical assumption behind the vehicle density map, we can efficiently detect and predict traffic congestion. Besides using the density map directly, existing work shows that the density map is also a strong feature map. Fusing the density map with other features may make it possible to detect accidents, road construction, and so on. This is an interesting topic we are currently working on.
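The density-map assumption above can be illustrated with a short sketch (a simplified illustration, not TTDN itself): ground truth for density-based counting networks is commonly built by placing one unit-mass Gaussian at each annotated vehicle location, so that the map's integral equals the vehicle count. The image size, kernel parameters, and annotation points below are hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """2-D Gaussian kernel normalized to sum to 1 (unit mass)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def density_map(shape, points, size=15, sigma=3.0):
    """Place one unit-mass Gaussian per annotated vehicle location,
    so the map's integral equals the vehicle count."""
    h, w = shape
    dmap = np.zeros((h, w))
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    for (y, x) in points:
        # Clip the kernel at the image borders.
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        x0, x1 = max(0, x - half), min(w, x + half + 1)
        ky0, kx0 = y0 - (y - half), x0 - (x - half)
        dmap[y0:y1, x0:x1] += kernel[ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]
    return dmap

vehicles = [(40, 60), (42, 64), (100, 30)]  # hypothetical annotations (y, x)
dmap = density_map((120, 160), vehicles)
print(round(dmap.sum(), 2))  # 3.0 — one unit of mass per vehicle
```

Summing any sub-region of such a map gives the estimated vehicle count in that region, which is why the network only needs a pixel-wise regression target rather than individual detections.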


The 2018 Smart City Summit & Expo (SCSE)

Featured Photo by highwaysagency / CC BY 2.0


Humanity with Privacy and Integrity is Taiwan AI Mindset

The 2018 Smart City Summit & Expo (SCSE), along with three sub-expos, took place at the Taipei Nangang Exhibition Center on March 27 with 210 exhibitors from around the world this year, exhibiting a diversity of innovative applications and solutions for building a smart city. Taiwan is known for its friendly and healthy business environment, ranked 11th by the World Bank. With 40+ years in ICT manufacturing and top-level embedded systems, its companies form a vigorous ecosystem. With this openness toward innovation, 17 of Taiwan's 22 cities have made it to the top of the Intelligent Community Forum (ICF) rankings.

Ethan Tu, Taiwan AILabs founder, gave a talk on “AI in Smart Society for City Governance” and laid out Taiwan's AI position: smart cities are for “humanity with privacy and integrity”, beyond “safety and convenience”. He said, “AI in Taiwan is for humanity. Privacy and integrity will also be protected.” The maturity of crowd participation, transparency, and an open data mindset are the key assets driving Taiwan's smart cities to deliver humanity with privacy and integrity. Taiwan AILabs cited the socially participatory, AI-collaboratively edited open-source news site http://news.ptt.cc as an example; city governments now consume its news to detect social events happening in Taiwan in real time, relying on the AI news' robustness and reliability at scale. AILabs collaborated with Tainan City on an AI drone project to emulate “Beyond Beauty” director Chi Po-lin, who died in a helicopter crash. AILabs also established the “Taipei Traffic Density Network (TTDN)”, supporting real-time traffic detection and prediction for Taipei City with citizens' privacy secured: no person or car can be identified without necessity.

The Global Solutions (GS) Taipei Workshop 2018, themed “Shaping the Future of an Inclusive Digital Society”, took place at the Ambassador Hotel in Taipei on March 28, 2018. It was co-organized by the Chung-Hua Institute for Economic Research (CIER) and the Kiel Institute for the World Economy. The panel session “Using Big Data to Support Economic and Societal Development” was hosted by Dennis Görlich, Head of the Global Challenges Center at the Kiel Institute for the World Economy. Chien-Chih Liu, founder of the Asia IoT Alliance (AIOTA); Thomas Losse-Müller, Senior Fellow at the Hertie School of Governance; and Reuben Ng, Assistant Professor at the Lee Kuan Yew School of Public Policy, National University of Singapore, all participated in the discussion. Big data has been identified as the oil of AI and economic growth. Ethan shared his vision in the panel: “We don't have to sacrifice for safety or convenience. On the other hand, the Facebook movement is a good example that tech giants who overlook privacy and integrity will be dumped.”

Ethan explained three key principles from Taiwanese society on big data collection. The following principles exist thanks to, and were contributed by, the mature open internet communities and movements in Taiwan. AILabs will promote them as fundamental guidance for data collection on medical records, government records, open communities, and so on.

1. Data produced by users belongs to users. Policy makers shall ensure that no single authority, such as a social media platform, becomes so dominant that it can force users to give up data ownership.

2. Data collected by public agencies belongs to the public. Policy makers shall ensure that data collected by public agencies comes with a roadmap for opening it to the general public for research. g0v.tw, for example, is an NPO for the open data movement.

3. “Net neutrality” applies not only to ISPs but also to social media and content hosting services. Ptt.cc, for example, persists in equality of voice without ads. Over time, that equality of voice has overcome fake news through evidence that stands out.

“Humanity is the direction for AILabs. Privacy and integrity are what we insist on,” said Ethan.

Smart City workshop with the Amsterdam Innovation Exchange Lab from the Netherlands

SITEC from Malaysia visiting AILabs.tw

Learn from Chi Po-lin’s view

To love it, one needs to see the beauty of it, as well as its problems, only then can one pray for Taiwan’s future from the heart.

— Chi Po-lin

Chi Po-lin’s documentary film “Beyond Beauty: Taiwan from Above” (看見台灣) captures Taiwan entirely in aerial cinematography and broke the Taiwanese box office records for the largest opening weekend and the highest total gross of a locally produced documentary. It brings a completely different perspective on the beauty of the land and raises awareness of environmental issues, which later prompted calls on the government to amend laws and repeal Asia Cement’s mining license.

 

Beyond Beauty: Taiwan from Above official trailer

 

During the press conference for the sequel to “Beyond Beauty: Taiwan from Above”, someone asked Chi why he did not use drones. He explained that the poor image quality and the monotony of the camera movement kept him from considering filming with drones.

On June 10, 2017, Chi died in a helicopter crash in a mountainous area of Hualien County while his group was shooting footage for the sequel. Since using helicopters for aerial cinematography puts photographers in tremendous danger, using drones with the aid of artificial intelligence might be worth a try.

We decided to start this project: learning from Chi’s view and shooting the documentary by AI.

 

The helicopter crashed in a mountainous area of Hualien County while Chi’s group was shooting footage for the sequel

 

Camera angle is one of the most important factors in producing a scenic view. Where the camera is placed in relation to the subject, and the angle from which the shot is taken, affect how the viewer perceives the subject and can evoke feelings and emotions.

Chi’s film uses different camera angles to achieve its landscape videography, the majority of them bird’s-eye views. These constitute 71 percent of the shots, capturing subjects such as coastlines, paddy fields, and cities. This shot gives the audience a wider view and creates a spatial perspective rare for a human viewpoint, in which objects and people seem harmless and insignificant.

The rest of the documentary consists of high-angle and eye-level views. High-angle views are taken when the camera is placed above the subject with the lens pointing down, while eye-level views are taken when the camera looks straight at the subject.

 

Composition of camera angles of “Beyond Beauty: Taiwan from Above”

 

Let’s also look at the basic camera moves used in Chi’s film. The two main techniques are dollying and orbiting, which constitute 31 and 26 percent of the moves, respectively. In these techniques, the camera on the helicopter flies along an object, often a coastline or field road, and moves slowly up, down, or side-to-side. Sometimes the camera orbits an object such as a lighthouse or a mountain peak.

 

Composition of camera movements of “Beyond Beauty: Taiwan from Above”

 

Other moves, including tracking shots, pans, tilts, and zooms, are also used in the film. We analyze these techniques and their usage to understand how a videographer captures the beauty of our land. They provide our system with the camera strategies and styles needed to create AI-powered observational documentaries.

We have started collaborations with the Tainan City Government, the Department of Aeronautics and Astronautics at National Cheng Kung University, and GEOSAT Aerospace & Technology Inc. to enable AI to shoot a documentary; learning from “Beyond Beauty: Taiwan from Above” is just the start.

 

Featured Photo by 總統府 / CC BY


Meet JARVIS – The Engine Behind AILabs

At Taiwan AI Labs, we are constantly teaching computers to see the world, hear the world, and feel the world so that they can make sense of it and interact with people in exciting new ways. The process requires moving a large amount of data through various training and evaluation stages, each of which consumes a substantial amount of compute. In other words, the computations we perform are both CPU/GPU bound and I/O bound.

This imposes a tremendous challenge in engineering such a computing environment, as conventional systems are either CPU bound or I/O bound, but rarely both.

We recognized this need and crafted our own computing environment from day one. We call it Jarvis internally, named after the system that runs everything for Iron Man. It primarily comprises a frontdoor endpoint that accepts media and control streams from the outside world, a cluster master that manages bare metal resources within the cluster, a set of streaming and routing endpoints that are capable of muxing and demuxing media streams for each computing stage, and a storage system to store and feed data to cluster members.

The core system is written in C++ with a Python adapter layer to integrate with various machine learning libraries.

 

 

The design of Jarvis emphasizes real-time processing capability. The core of Jarvis lets data streams flow between computing processors with minimal latency, and each processing stage is engineered to achieve a required throughput per second. For a long, complex procedure, we break it down into smaller sub-tasks and use Jarvis to form a computing pipeline that achieves the target throughput. We also use muxing and demuxing techniques to process portions of the data stream in parallel, further increasing throughput without incurring too much latency. Once the computational tasks are defined, the blueprint is handed over to the cluster master, which allocates the underlying hardware resources and dispatches tasks to run on them. The allocation algorithm has to take special care with GPUs, as they are scarce resources that cannot be virtualized at the moment.
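The pipeline idea above can be sketched with plain Python threads and queues (a toy illustration only, not Jarvis's actual C++ implementation; the stage names and functions are made up). Each stage pulls items from an input queue, applies its function, and pushes results downstream; setting `workers > 1` demuxes a stage across parallel workers.

```python
import queue
import threading

SENTINEL = object()  # end-of-stream marker

def stage(fn, inq, outq, workers=1):
    """Run `fn` over items from `inq`, writing results to `outq`.
    workers > 1 demuxes the stream across parallel workers."""
    def loop():
        while True:
            item = inq.get()
            if item is SENTINEL:
                inq.put(SENTINEL)  # let sibling workers terminate too
                break
            outq.put(fn(item))
    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for t in threads:
        t.start()
    def close():
        for t in threads:
            t.join()
        outq.put(SENTINEL)  # propagate end-of-stream downstream
    return close

# Toy two-stage pipeline: "decode" then "analyze", with the second
# stage demuxed across two parallel workers.
q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
close_decode = stage(lambda frame: frame * 2, q0, q1)
close_analyze = stage(lambda frame: frame + 1, q1, q2, workers=2)

for frame in range(5):
    q0.put(frame)
q0.put(SENTINEL)
close_decode()   # join stage 1, then signal stage 2
close_analyze()  # join stage 2, then signal the consumer

results = []
while True:
    item = q2.get()
    if item is SENTINEL:
        break
    results.append(item)
print(sorted(results))  # [1, 3, 5, 7, 9]
```

Because the demuxed stage processes items concurrently, outputs may arrive out of order; a real media pipeline would carry sequence numbers so the muxing stage can reorder frames.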

Altogether, Jarvis is a powerful yet agile platform for machine learning tasks. It handles a huge amount of work with minimal overhead. Moreover, Jarvis can be scaled horizontally with little effort, simply by adding new machines to the cluster. It suits our needs well. We have re-engineered Jarvis several times in the past few months and will continue to evolve it. Jarvis is our engine for moving fast in this fast-changing AI field.

 

Featured image by Nathan Rupert / CC BY


AI frontdesk – improve office security and working conditions

Imagine someone in your office who serves as doorkeeper, takes care of visitors, and even watches over your working conditions, 24/7. One of our missions at Ailabs.tw is to explore AI solutions that address society's problems and improve people's quality of life, and we have developed an AI-powered front desk that does all of the tasks above.

Based on the 2016 annual report from Taiwan's MOL (Ministry of Labor), the average annual work time of a Taiwanese employee is 2,106 hours. Compared with OECD statistics, this ranks third in the world, just below Mexico and Costa Rica.

Recently, on December 4, 2017, the first review of the Labor Standards Act revision was passed. The new version of the law will allow flexible work-time arrangements and expand the maximum monthly work hours up to 300. Other major changes in the amendment include conditionally allowing employees to work 12 days in a row and reducing the minimum break between shifts from 11 hours to 8. The ruling party plans to finish the second and third readings of this revision early next year (2018), which may leave 9 million Taiwanese workers in a worse working environment. To shed the bad reputation of “Taiwan – The Island of Overwork”, what is needed is a system that notifies both employee and employer when someone has been working extreme hours, and whose attendance reports cannot easily be manipulated.

In May 2017, Luo Yufen, an employee of Pxmart, one of Taiwan's major supermarket chains, died from long-term overwork after seven days in a coma. However, the OSHA (Occupational Safety and Health Administration) initially found no evidence of overwork after reviewing the clock-in records provided by Pxmart, which looked “normal”. It wasn't until August, when Luo's case was sent back for further investigation, that her real working hours before her death proved the overwork.

Read more

The Road to Understand Aerial Cinematography

How does AI pick out the best views and shoot an aerial documentary? At Ailabs, we've built a system, equipped with a drone and a 360 camera, that has the eye of a videographer.

We want filmmaking to achieve camera movement and tracking shots through artificial intelligence itself. We designed our system to pick out interesting objects and features, such as a lighthouse or coastline, from a 360-degree video and create a flat, standard documentary without manual control of camera movement or angle.

 

The drone carries a 360-degree 4K camera and flies over Tainan City, Taiwan, collecting videos that are later post-edited by our system.

 

This project is inspired by Chi Po-lin’s documentary film “Beyond Beauty: Taiwan from Above” (看見台灣), which captures Taiwan entirely in aerial cinematography and broke the Taiwanese box office records for the largest opening weekend and the highest total gross of a locally produced documentary. Unfortunately, Chi died in a helicopter crash in a mountainous area of Hualien County while his group was shooting footage for the sequel.

Chi had pointed out that the poor image quality and the monotony of the camera movement kept him from considering filming with drones. Moreover, flying in helicopters for aerial cinematography puts photographers in tremendous danger. For these reasons, we started the “Chi Po-lin project” using a drone, a 360 camera, and our AI-powered post-editing software.

Automatic Cinematography for 360

When using a helicopter for videography, the pilot flies the route while the videographer operates the video camera. In the case of a drone, the videographer is replaced by a 360 camera and a post-editing algorithm that determines where to focus within the 360 images.

 

 

As in the scenario above, we installed a 360 camera on the drone to let the AI control the perspective in the 360 image and render a portion of the 360 image to create virtual camera movements such as pan, tilt, and zoom.

 

Algorithmically controlled perspectives from the 360 images 

 

One reason for the monotony of aerial drone videos is that it is very difficult for one person to manipulate the drone controls and consider composition details at the same time. Moreover, conventional automatic cinematography methods simplify the problem to aligning the camera with the center of the object of interest. So we hand camera control to the AI, which, unlike existing methods, takes into account the composition and the semantic flow of the scene.

360 cameras record every point of view. After the videos are collected, the AI recognizes the scenes and objects of interest encountered during the flight, selects the best angle for each moment, and automatically plans multiple sets of suitable trajectories to assist the user in editing the movie.
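The virtual camera described above boils down to resampling the equirectangular panorama along pinhole-camera rays. Below is a minimal NumPy sketch, assuming a simple yaw/pitch camera model and nearest-neighbor sampling (an illustration of the projection math, not our production renderer):

```python
import numpy as np

def render_view(equi, yaw, pitch, fov, out_w=320, out_h=240):
    """Render a pinhole-camera view out of an equirectangular panorama.
    Angles are in radians; sampling is nearest neighbor."""
    H, W = equi.shape[:2]
    f = (out_w / 2) / np.tan(fov / 2)  # focal length in pixels
    x = np.arange(out_w) - out_w / 2
    y = np.arange(out_h) - out_h / 2
    xx, yy = np.meshgrid(x, y)
    # Camera-space ray for every output pixel (z forward, y down).
    dirs = np.stack([xx, yy, np.full_like(xx, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate rays by pitch (around x) then yaw (around y).
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    d = dirs @ (Ry @ Rx).T
    # Convert rays to longitude/latitude, then to panorama pixels.
    lon = np.arctan2(d[..., 0], d[..., 2])       # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))   # [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = ((lat / np.pi + 0.5) * H).astype(int).clip(0, H - 1)
    return equi[v, u]

pano = np.zeros((100, 200, 3), dtype=np.uint8)  # placeholder panorama
view = render_view(pano, yaw=np.radians(30), pitch=0.0, fov=np.radians(90))
```

Animating `yaw`, `pitch`, and `fov` over time produces the virtual pan, tilt, and zoom movements mentioned above without the drone ever reorienting its physical camera.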

Automatic Color Enhancement

To enhance image quality, we start with color enhancement. We want our video to look as appealing and clear as professionally produced film. We leveraged a model that learns color enhancement from original input photos and high-quality HDR images; it is trained in an unsupervised way with a GAN. To reduce model complexity and speed up training, the model is trained on low-resolution images and thus can only output low resolution. We extend it to high-resolution images with a patch-based method: we divide the high-resolution image into several overlapping patches and use alpha blending to stitch the results. Although it is an image enhancement model, it is stable enough that we can apply it directly to video frames without temporal artifacts. The result looks more appealing than the original video in terms of color, and thanks to detail enhancement it also shows more detail; for example, the sky looks clearer than in the original input video.
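The patch-based stitching step can be sketched as follows (a simplified illustration, not our actual enhancement model; the triangular blending window and patch sizes are assumptions). Each pixel of the output is a weight-normalized average of every enhanced patch that covers it, which hides the seams between patches.

```python
import numpy as np

def blend_patches(img, enhance, patch=64, overlap=16):
    """Apply `enhance` patch-by-patch and stitch with alpha blending:
    each pixel is a weight-normalized average of overlapping patches."""
    h, w = img.shape[:2]
    out = np.zeros_like(img, dtype=float)
    weight = np.zeros((h, w, 1))
    # Triangular window: highest weight at the patch center, so seams
    # between neighboring patches fade into each other.
    win1d = 1 - np.abs(np.linspace(-1, 1, patch))
    win = np.clip(np.outer(win1d, win1d), 1e-3, None)[..., None]
    step = patch - overlap
    for y0 in range(0, h, step):
        for x0 in range(0, w, step):
            y1, x1 = min(y0 + patch, h), min(x0 + patch, w)
            p = enhance(img[y0:y1, x0:x1])
            wpatch = win[: y1 - y0, : x1 - x0]
            out[y0:y1, x0:x1] += p * wpatch
            weight[y0:y1, x0:x1] += wpatch
    return out / weight

# With an identity "enhancer", the stitched result equals the input,
# confirming that the blending weights normalize correctly.
img = np.random.rand(100, 100, 3)
stitched = blend_patches(img, lambda p: p)
print(np.allclose(stitched, img))  # True
```

In practice `enhance` would be the low-resolution GAN model applied to each patch; the overlap size trades off stitching quality against the number of patches to process.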

Comparison video

Final result with automatic color enhancement

Partners:

– AILabs (台灣人工智慧實驗室)
– Southern Taiwan Science Park Bureau (科技部南科管理局)
– Tainan City Government (台南市政府)
– Department of Aeronautics and Astronautics, NCKU (成大航太)
– GEOSAT Aerospace & Technology Inc (經緯航太)

Sponsored:

– Microsoft
– Nvidia
– Garmin

 

Featured image by YELLOW Mao. 黃毛, Photographer / CC BY