Decoding the Complex World of Somatic Mutations: A Pioneering Whole-Exome Sequencing Analysis Benchmark Study

In the quest to understand and treat cancer more effectively, our genomics team has embarked on a groundbreaking journey. By collaborating with Dr. Shu-Jui Hsu’s team at National Taiwan University College of Medicine, we’ve delved into the intricate world of somatic mutations, those genetic alterations that occur in cancer cells but not in healthy cells. We leveraged the comprehensive tumor Whole Exome Sequencing (WES) data from the Sequencing and Quality Control Phase 2 (SEQC2) project to make a significant leap in evaluating the accuracy and robustness of somatic mutation analysis in clinical settings.

By Jia-Hsin Huang

Our study aims to improve the detection of genetic changes in cancer cells using whole exome sequencing techniques. Recently, the FDA’s SEQC2 consortium has made available a new dataset for evaluating how well various methods detect cancer mutations from tumor DNA. This dataset is expected to improve the quality of cancer genetic data and set a new standard for research in this area. Finding reliable analysis tools is challenging, but vital for accurate results. We investigated 18 combinations of reputable mutation callers and aligners for detecting specific types of changes, including single nucleotide variants (SNVs) and small insertions/deletions (INDELs). This approach could significantly enhance cancer diagnosis and treatment planning. The preprint of this study is now available on BioRxiv.


In this study, we discovered that mutation callers play a more pivotal role than aligners in determining overall sensitivity in mutation detection. No single combination stood out across all tests, suggesting that combining different tools could provide the most accurate detection of genetic mutations in cancer. However, DeepVariant might not be ideal for somatic mutation analysis, given its poor performance in our tests, while the combination of BWA and Mutect2 demonstrated the best performance among open-source software for SNV detection. Our study highlights the benefits of using a mix of genetic analysis tools, showing improved detection of cancer mutations over using BWA_Mutect2 alone. Merging Mutect2 with various aligners slightly boosted the accuracy for single changes (SNVs), while for insertions/deletions (INDELs), these combinations outperformed DRAGEN in precision and overall accuracy.
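As a rough illustration of how such aligner/caller combinations are scored against a known truth set, the sketch below computes precision, recall (sensitivity), and F1 for a called variant set. The variants shown are hypothetical placeholders, not SEQC2 data or our actual evaluation code:

```python
# Illustrative sketch: benchmarking a variant-calling pipeline against a
# truth set of known somatic variants. Variants are (chrom, pos, change)
# tuples; the examples below are hypothetical.

def benchmark(called: set, truth: set) -> dict:
    tp = len(called & truth)   # variants correctly called
    fp = len(called - truth)   # false positives
    fn = len(truth - called)   # false negatives (missed variants)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

truth = {("chr7", 140453136, "A>T"),
         ("chr12", 25398284, "C>A"),
         ("chr17", 7577120, "G>A")}
called = {("chr7", 140453136, "A>T"),
          ("chr12", 25398284, "C>A"),
          ("chr1", 115258747, "C>T")}
print(benchmark(called, truth))  # precision, recall, and F1 all 2/3 here
```

Ranking the 18 pipeline combinations then reduces to comparing these per-pipeline scores on SNVs and INDELs separately.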

Understanding somatic mutations is not just a scientific pursuit; it has real-world implications in clinical oncology. We annotated the detected variants using the Catalogue Of Somatic Mutations In Cancer (COSMIC) to evaluate the accuracy of various variant caller tools that detect mutations in genes related to cancer. DeepVariant, in particular, struggled to identify key gene variants, suggesting it might not be the best fit for pinpointing somatic mutations crucial for cancer treatment decisions. Additionally, we found that TNscope, while generally effective, had a higher tendency to miss certain mutations. This highlights the importance of choosing the right tools for genetic analysis, as these choices can significantly influence the direction and effectiveness of a patient’s treatment.


The ability to pinpoint specific mutations aids in targeted therapies and can flag potential drug resistance, underscoring the importance of our study. We also assessed how different genetic analysis tools detect mutations linked to drug resistance and treatment choices. Key findings show that while some tools accurately identify critical mutations, others, like DeepVariant, missed more than half, which could have major implications for treatment efficacy. For instance, certain mutations that contribute to chemotherapy resistance or affect targeted therapy response were not consistently detected by all tools. This variability underscores the necessity for precise genetic testing to inform personalized medicine approaches.

Furthermore, Tumor Mutational Burden (TMB) estimation can be indicative of a tumor’s instability and has been used as a biomarker to predict the efficacy of immunotherapy. In our research on measuring TMB, we found that different tools and sample preparation methods can produce varying results. For instance, Mutect2 often predicted higher TMB levels than were actually present, especially in samples prepared with a specific Roche kit. Meanwhile, DRAGEN_C provided results that closely matched the expected TMB, and TNscope generally gave lower TMB estimates. These findings highlight that choosing the right tools and methods is crucial for accurate TMB measurement, which is key to making the best clinical decisions for cancer treatment.
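TMB itself is a simple ratio, mutations per megabase of sequenced territory, so the tool-to-tool differences above come down to which mutations each pipeline counts. A minimal sketch of the calculation (the mutation count and panel size below are hypothetical examples, not figures from our study):

```python
# Sketch of the basic TMB calculation: eligible somatic mutations per
# megabase of sequenced territory. Inputs are illustrative.

def tumor_mutational_burden(n_somatic_mutations: int, covered_bases: int) -> float:
    """Return TMB in mutations per megabase (mut/Mb)."""
    return n_somatic_mutations / (covered_bases / 1_000_000)

# e.g. 350 eligible mutations over a 35 Mb exome territory -> 10 mut/Mb
print(tumor_mutational_burden(350, 35_000_000))  # 10.0
```

Because the numerator depends on the caller's false-positive rate, an over-calling pipeline (like Mutect2 in our tests) inflates TMB even when the territory is fixed.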

In summary, our research offers a beacon of guidance for cancer genomic researchers, providing a detailed comparison of diverse tool combinations for tumor mutation identification. It’s a step forward in our collective mission to demystify cancer genetics and enhance the precision of medical treatments.



Somatic mutation detection workflow validity distinctly influences clinical decision

Pei-Miao Chien, Chinyi Cheng, Tzu-Hang Yuan, Yu-Bin Wang, Pei-Lung Chen, Chien-Yu Chen, Jia-Hsin Huang, Jacob Shujui Hsu. bioRxiv 2023.10.26.562640

Taiwan AI Labs and Turing College Sign Memorandum of Cooperation to Foster Collaboration in the Field of Artificial Intelligence

Taiwan AI Labs, a leading artificial intelligence research institution, has announced a significant milestone in its global expansion efforts with the signing of a memorandum of cooperation with Turing College, a pioneering educational institution based in Lithuania. The collaboration aims to facilitate intensive exchanges in areas such as federated learning and will also encompass internship programs, fostering a mutually beneficial partnership.


Turing College is a renowned online educational institution offering cutting-edge artificial intelligence courses to working professionals. Students have the unique opportunity to learn from experts in the field, including AI industry leaders such as Google, Meta, Amazon, and now, Taiwan AI Labs, as well as prestigious research universities like Cambridge and Stanford. Currently, the college has over 500 students from more than 20 countries, with the ambitious goal of becoming one of the largest AI academies in the European Union by 2025.


Founder of Taiwan AI Labs, Mr. Ethan Tu, expressed his excitement about the collaboration, stating, “This partnership provides Taiwan AI Labs with an exceptional opportunity to showcase Taiwan’s technological prowess and jointly develop trustworthy AI technologies that align with the principles of privacy and human rights in the European Union. We also anticipate that this collaboration will foster greater industry partnerships in the EU region, shining a spotlight on Taiwan’s contributions to the global AI landscape.”


Mr. Jason Huang, Representative of the Taiwanese Representative Office in Lithuania, extended his congratulations, highlighting the synergies between Taiwan and Lithuania as both economies thrive on advanced technology. He emphasized the immense potential for collaboration in the burgeoning AI industry, stating, “Both Taiwan and Lithuania are leaders in advanced technology, and together, we have substantial room for cooperation in the realm of AI knowledge.”


The collaboration between Taiwan AI Labs and Turing College was formalized following intensive discussions and consensus on various aspects, including the development of AI education, utilization of federated learning methodologies, establishment of best practices, and the initiation of an internship exchange program. The memorandum of cooperation was signed in April, marking the commencement of a fruitful partnership in the field of AI application and academia.


Lukas Kaminskis, CEO of Turing College, expressed his enthusiasm for the collaboration, stating, “Partnering with Taiwan AI Labs provides our students with an exceptional opportunity to learn from one of the leading AI institutions in Asia. We are excited about the possibilities this collaboration holds for their educational journey.”


With this collaboration, Taiwan AI Labs and Turing College aim to foster knowledge exchange, nurture AI talent, and contribute to the advancement of the global AI ecosystem. 


About Taiwan AI Labs:

Taiwan AI Labs is a renowned research institution at the forefront of artificial intelligence technology. With a mission to push the boundaries of AI innovation, Taiwan AI Labs focuses on groundbreaking research, development of cutting-edge AI applications, and fostering partnerships to create a positive impact on society. For more information, please visit [website link].


About Turing College

Turing College is an online educational institution dedicated to providing high-quality AI courses to professionals worldwide. Through its industry partnerships and expert-led curriculum, Turing College aims to empower individuals with the skills and knowledge required to thrive in the AI-driven world. To learn more about Turing College, please visit [website link].


Press Contact

Mr. Barrett Tsai

PR Account Executive | Taiwan AI Labs 

Call to Action: confirm leading-edge medical AI technology improves the radiosurgery clinical workflow

To distinguished medical professionals,

We would like to request your assistance with an AI project designed to make radiosurgery more precise and accessible. Participants will have access to leading-edge AI technology that seeks to revolutionize radiosurgery procedures for patients, clinicians, and institutions. This project is initiated by the Taiwan AI Federated Learning Alliance (TAIFA) and supported by Taiwan AI Labs.

To start, we propose each institution participating in the project verify that DeepMets makes the radiosurgery clinical workflow more efficient. We anticipate the benefits of this technology will include:

  1. Faster clinical workflow
  2. Increased cost-effectiveness of procedures
  3. Improved accuracy and precision in diagnostics

DeepMets is part of the DeepBraiM family of AI-powered tools. DeepBraiM is designed to aid the detection of various brain tumors and other abnormal neurological conditions to facilitate the treatment workflow. DeepMets was developed by Taiwan AI Labs and Taipei Veterans General Hospital, and it is approved by the Taiwan FDA. It quickly identifies brain metastases on magnetic resonance images (MRI), aiding clinicians and radiologists in their detection.

DeepMets works by analyzing MRI scans, quickly identifying and contouring the locations of brain metastases (between 4 mm and 40 mm), typically within minutes per examination. Reports in both PDF and RTSS formats are generated for review and validation by the clinician or radiologist. The report can be used to create a treatment plan and evaluate treatment effects.
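As a toy illustration of the 4–40 mm operating range, a post-processing step might filter lesion candidates by estimated diameter before they reach the report. The candidate list and field names below are hypothetical; this is not DeepMets’ internal logic:

```python
# Toy illustration of a size-range filter: keep only lesion candidates
# whose estimated diameter falls inside the supported 4-40 mm range.
# Candidates and field names are hypothetical.

def filter_candidates(candidates, min_mm=4.0, max_mm=40.0):
    return [c for c in candidates if min_mm <= c["diameter_mm"] <= max_mm]

candidates = [
    {"id": 1, "diameter_mm": 2.5},   # below range -> not reported
    {"id": 2, "diameter_mm": 12.0},  # reported and contoured
    {"id": 3, "diameter_mm": 55.0},  # above range -> not reported
]
print([c["id"] for c in filter_candidates(candidates)])  # [2]
```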

In addition to the clinical implementation of DeepMets, the technology offers the benefits of Federated Learning (FL) and Federated Validation (FV). FL, also known as decentralized learning, is a machine learning technique in which many institutions first train their individual local AI model with their local dataset. After the local AI model has been trained, this model is shared with other institutions in a manner that protects patient privacy, to collectively develop a global AI model that has effectively been trained by multiple institutions, using multiple datasets.
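The aggregation step at the heart of FL can be sketched as weighted parameter averaging, in the spirit of the widely used FedAvg scheme. This is an illustrative sketch under that assumption, not Taiwan AI Labs’ actual implementation, and the institutions and parameter values are hypothetical:

```python
# Minimal sketch of federated averaging: each institution trains locally,
# only model parameters (never patient data) are shared, and an aggregator
# merges them into a global model, here weighted by local dataset size
# as in the common FedAvg scheme.

def federated_average(local_weights: list, sizes: list) -> dict:
    total = sum(sizes)
    merged = {}
    for name in local_weights[0]:
        merged[name] = sum(w[name] * n for w, n in zip(local_weights, sizes)) / total
    return merged

# Two hypothetical institutions sharing one scalar parameter "w"
hospital_a = {"w": 0.2}   # trained on 100 local scans
hospital_b = {"w": 0.8}   # trained on 300 local scans
print(federated_average([hospital_a, hospital_b], [100, 300]))  # {'w': 0.65}
```

Note that only the parameter dictionaries cross institutional boundaries; the scan counts are metadata, and the raw images never leave their home institution.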

FV, also known as decentralized validation, is where an institution’s local AI Model can be validated against available datasets that have been collected as part of the FL efforts. This allows individual institutions to verify that their specific AI model is correct in its inference capabilities by using a diverse dataset provided by multiple institutions. As with FL, patient privacy is protected, as the dataset information is not shared between institutions during FV. The requesting institution receives only the results of the FV.

The primary objective of this project is to verify the suitability of DeepMets for use in clinical scenarios. The expectation is that DeepMets will increase the efficiency of clinical teams in designing treatments by automating the identification of brain metastases. This, in turn, offers improved benefits to the institution and patients in the following ways:

  1. The treatment team members will spend less time on each procedure, reducing the time commitment and stress experienced by both the patient and the medical professionals;
  2. Treatment decisions will become faster, less costly, and more accurate for patients in need of treatment;
  3. As workflow efficiency increases with DeepMets, the institution ultimately saves on operating costs.

By supporting this project, you will be part of a team of pioneers that aims to make medical service more accessible to everyone.

To join this project, please visit this web link:

☞ Click Here for International Collaborators

or Scan the QR code

Thank you for your time and consideration.


Wan-Yuo Guo, M.D., Ph.D.
Head of Medical Solutions, Taiwan AI Labs
Professor, School of Medicine, National Yang Ming Chiao Tung University, Taiwan
Emeritus Radiologist, Taipei Veterans General Hospital, Taiwan

Extracting The Most Significant and Relevant Relational Facts From Large-Scale Biomedical Literature

Eunice You-Chi Liu


  • Introduction

pubmedKB is “a novel literature search engine that combines a large number of state-of-the-art text-mining tools optimized to automatically identify the complex relationships between biomedical entities—variants, genes, diseases, and chemicals—in PubMed abstracts” (Li et al.). Currently, the pubmedKB Relation Extraction (RE) module comprises three submodules: Relational Phrases (an algorithm built on spaCy, an open-source library with a state-of-the-art syntactic dependency parser), Relational Facts (a model built by integrating the R-BERT relation classification framework with BioBERT), and Odds Ratio Info (a tool built on T5, a pre-trained general-purpose sequence-to-sequence model).


  • Motivation

Indeed, through powerful literature mining, pubmedKB allows researchers and medical practitioners to gain extensive knowledge and insights on any biomedical entity more effectively. Because this literature search engine takes all PubMed abstracts into account (pubmedKB currently contains relations for 10.8M PubMed abstracts and 1.7M PubMed full texts), the relations extracted can be excessive, especially if users want to efficiently identify the most pertinent relations or concisely present what relations a biomedical entity or entity pair entails. For instance, suppose we are interested in the relations between HGVS:p.V600E and MESH:D009369. There are more than 2,000 relations extracted for this entity pair (Figure 1). Depending on the users’ needs, scrolling through a sea of relations for this pair might not be the most effective approach. Specifically, the vast majority of the relations are relational facts (denoted as rbert_cre). Moreover, looking further into these rbert_cre relations, more than 1,500 imply a similar meaning—MESH:D009369 patients that carry HGVS:p.V600E—which is redundant and makes it difficult to grasp what is important (Figure 2). We therefore ask: given any entity or PubMed Unique Identifier (PMID), how can we extract and demonstrate the most relevant and significant biomedical relations?


Figure 1: Using the Web GUI to search for the relations between the biomedical entities pair: HGVS: p.V600E and MESH: D009369, the number of relations found for each submodule is shown.


Figure 2: A demonstration of a small portion of the in-patient relations extracted between HGVS: p.V600E and MESH: D009369.


  • Methods

To answer the research question posed in the Motivation section, our methods can be broadly divided into two parts: 1) extracting and 2) demonstrating the most relevant and significant biomedical relations. With the goal of acquiring the most relevant and significant relations for given entities, we developed a scispaCy open relation extraction (ORE) module and researched the application of neural ORE. After finding the most accurate relations, we developed the evidence summary to present selected relations in a concise and informative way.


1) Develop scispaCy ORE

We decided to develop scispaCy ORE because scispaCy is a spaCy model specialized for biomedical text processing, which fits our main interest in working with biomedical literature and extracting significant relations from it. scispaCy dependency parsers are trained on GENIA 1.0 and then transformed into the Universal Dependencies (UD) format (Neumann et al.), whereas, according to spaCy’s official documentation, spaCy is trained on the Clear format (English · spaCy models documentation). This difference in how the dependencies are generated and trained might have resulted in the syntactic parsing and part-of-speech (POS) tagging differences between scispaCy and spaCy (Figure 3).


Figure 3: A very simple sentence that entails a relation between two biomedical entities, “All of ENT0ITY relates to ENT1ITY,” in which ENT0ITY and ENT1ITY denote two different biomedical entities. The upper part (labeled a) is the dependency parse using spaCy and the lower part (labeled b) is the dependency parse using scispaCy. Notably, the dependency parses of the same sentence (shown by arrows) and the part-of-speech (POS) tags (marked as capitalized abbreviations under the words) are distinct in parts a and b.


Reusing spaCy’s grammatical patterns and word-relationship definitions on scispaCy parses can result in misleading relations being identified. For example, as shown in Figure 3b, the main verb “relates” is linked directly to the object “ENT1ITY” rather than to the preposition “to” as in Figure 3a.

To ensure that the most relevant and important relations are extracted, understanding and redefining scispaCy’s head-dependent relationships is critical, as they oftentimes “directly encode important information that is often buried in the more complex phrase-structure parses” (Jurafsky and Martin 310-334). We therefore redefined the active and passive sentence structures for scispaCy, capturing the subject, predicate, meaningful preposition, and object, and ensured that prepositions following the predicate and preceding entities in a noun modifier (nmod) dependency are captured. We also created a customized noun chunk extraction and set it as an extension, since the default noun chunk extraction does not correspond to scispaCy’s dependency parsing. Besides the difference in dependency parsing, scispaCy’s and spaCy’s part-of-speech tagging diverge, so we re-tag certain groups of words to ensure that complete entities are pinpointed. For instance, in Figure 3 the first entity should be “All of ENT0ITY” instead of only “ENT0ITY”: if the determiner were changed from “all” to “none,” the extracted relation would otherwise be inaccurate.
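To make the captured pattern concrete, the toy sketch below applies the subject–predicate–preposition–object rule described above to a hand-annotated stand-in for a dependency parse. The token structure is purely illustrative (the actual module operates on scispaCy Doc objects, which this sketch does not require):

```python
# Toy illustration of the subject-predicate-preposition-object pattern our
# ORE captures. Each token records its dependency label and the index of
# its head, imitating a dependency parse of
# "All of ENT0ITY relates to ENT1ITY".

tokens = [
    {"text": "All of ENT0ITY", "dep": "nsubj", "head": 1},  # re-tagged entity chunk
    {"text": "relates",        "dep": "ROOT",  "head": 1},
    {"text": "to",             "dep": "prep",  "head": 1},
    {"text": "ENT1ITY",        "dep": "pobj",  "head": 2},
]

def extract_triplet(tokens):
    """Return (head mention, predicate + preposition, tail mention) or None."""
    for i, tok in enumerate(tokens):
        if tok["dep"] != "ROOT":
            continue
        subj = next((t for t in tokens if t["dep"] == "nsubj" and t["head"] == i), None)
        prep = next((t for t in tokens if t["dep"] == "prep" and t["head"] == i), None)
        if subj is None or prep is None:
            continue
        j = tokens.index(prep)
        obj = next((t for t in tokens if t["dep"] == "pobj" and t["head"] == j), None)
        if obj is not None:
            return (subj["text"], f'{tok["text"]} {prep["text"]}', obj["text"])
    return None

print(extract_triplet(tokens))  # ('All of ENT0ITY', 'relates to', 'ENT1ITY')
```

Note how the re-tagged noun chunk keeps the determiner phrase “All of” inside the head mention, which is exactly the re-tagging behavior motivated above.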

The relations extracted using spaCy or scispaCy ORE are called triplets as they all include one head mention, one predicate, and one tail mention. After the triplets are identified, we calculate the number of triplets extracted and randomly select 20,000 sentences with or without triplets and manually label them. In order to be counted as a valid triplet, the triplet has to match the original meaning of the sentence, should be an assertion, and be understandable (unambiguous) without looking at the original sentence. 


2) Research on neural ORE

The current pubmedKB ORE submodule (spaCy) and the newly developed scispaCy ORE are both rule-based and extract relations at the sentence level. As our goal is to extract the most significant and relevant relations, focusing on the sentence level might not be sufficient. We therefore decided to research and develop neural ORE, which can enable us to extract relations from paragraphs, sections, and, hopefully, whole papers. Inspired by the paper “OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction,” we adopted and applied the OpenIE6 model to our own data of PubMed abstracts and full texts. The OpenIE6 model contains a series of pipelines, including a BERT encoder that “computes contextualized embeddings for each word” and a 2-layer transformer that attains the contextual embeddings at every level (Kolluru et al.). Extractions—in other words, triplets—are generated at every iteration after the contextualized embeddings are passed in to produce word labels at a specific level (Kolluru et al.).

The OpenIE6 model is trained on Wikipedia sentences, which is the same dataset as the one used for OpenIE4 (Kolluru et al.). We prepared the biomedical sentences from PubMed abstracts and full texts to get extractions. In order to evaluate the performance of the model on our biomedical sentences, we also prepared gold standard data derived from spaCy’s triplets that match the format of OpenIE6’s testing data. 


3) Develop and generate the evidence summary

After gathering the most relevant and significant relations, we hope to demonstrate them in a clear and intuitive way. Figure 4 gives an overview of how the evidence summary is generated. Aiming to be as diverse and informative as possible, we designed the generated summary to contain three kinds of relations: odds ratio, relational fact, and relational phrase via ORE. To keep the summary concise, we selected the relations with the highest confidence values—the highest odds ratios and the most supported relational phrases. In particular, for relational facts with identical confidence values, we rank by informativeness: a causal relation between genetic variants and diseases is favored over a relation stating that certain patients carry some specific disease, which in turn is favored over an appositional relation of genetic variants and diseases. In terms of which relational phrases to feature, at most two different relational phrases from distinct sentences are included in the summary. The relational phrases are selected based on the number of relations extracted from a single sentence, since sentences yielding many relations are likely to contain crucial information.
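The selection logic described above can be sketched as follows. The relation-type names, field names, and confidence values are illustrative placeholders, not pubmedKB’s actual schema:

```python
# Sketch of the evidence-summary selection logic: pick top-confidence
# relational facts, breaking ties by informativeness (causal > in-patient >
# appositional), and keep at most two relational phrases from distinct
# sentences, preferring sentences that yielded many relations.

FACT_PRIORITY = {"cause": 0, "in-patient": 1, "appositive": 2}  # lower = more informative

def select_facts(facts, k=1):
    # higher confidence first; informativeness breaks ties
    return sorted(facts, key=lambda f: (-f["conf"], FACT_PRIORITY[f["type"]]))[:k]

def select_phrases(phrases, k=2):
    # at most one phrase per sentence, richest sentences first
    best = {}
    for p in sorted(phrases, key=lambda p: -p["n_relations_in_sentence"]):
        best.setdefault(p["sent_id"], p)
    return list(best.values())[:k]

facts = [
    {"type": "in-patient", "conf": 0.9},
    {"type": "cause",      "conf": 0.9},  # same confidence, more informative
]
print(select_facts(facts)[0]["type"])  # cause
```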

Figure 4: A quick overview and explanation of how the evidence summary is being designed and produced. 


  • Results

Using the newly developed scispaCy ORE, among 95,441 unique sentences extracted from various PubMed abstracts, we were able to identify and extract 1,000 more triplets than when employing spaCy ORE (Table 1).

Table 1: Among 95,441 individual sentences extracted from various PubMed abstracts, the number of sentences containing identified triplets and the number of identified triplets are both higher when applying the newly developed scispaCy ORE.  


We then performed a qualitative evaluation by manually reviewing the triplets that occurred in the randomly selected 20,000 sentences. Based on Table 2, the yield when applying scispaCy ORE is higher, as a larger number of triplets are identified. The precision of the correct triplets (those that fit the criteria mentioned in the Methods section) extracted by scispaCy and spaCy is similar (Table 2). We cannot meaningfully calculate recall, because it would be nearly impossible to identify false-negative triplets.

Furthermore, we found that scispaCy ORE allows us to generate triplets from sentences with possessive relationships and adjective comparisons, which the current spaCy ORE is not yet able to achieve. To illustrate, for the sentence “Our results indicate that CYFRA 21-1 may be a useful tumor marker in NSCLC, especially in carcinoma planocellulare,” the triplet “CYFRA 21-1, marker in, NSCLC” is identified and extracted by scispaCy ORE but not spaCy ORE. scispaCy ORE is thus capable of identifying relations beyond those centered on an action verb, and such relations are also common and critical in biomedical literature.

Table 2: After randomly selecting 20,000 sentences and labeling them, the number of correct triplets extracted by scispaCy is 382 and by spaCy is 244. Of these, 190 triplets are identical between scispaCy ORE and spaCy ORE. The precision of the correct triplets extracted by scispaCy and spaCy is similar: 0.75 and 0.79, respectively.


Besides having scispaCy ORE extract and identify important triplets that are missing from spaCy ORE, we found that the performance of neural ORE, specifically the OpenIE6 model, is also promising. By creating a one-to-one mapping between the predicted extractions (generated by the OpenIE6 model) and the original spaCy triplets, we evaluated precision and recall. Notably, the AUC and F1 score, 49.0 and 62.9, are comparable to the testing results on the standard CaRB dataset, a large-scale Open IE benchmark annotation that aims to contribute to the standardization of Open IE evaluation (Stanovsky and Dagan).

After developing and researching the most suitable ORE to extract the most significant and relevant relations, we compiled that information into a well-organized and informative paragraph (Figure 5). Moreover, the relations produced by each annotator—odds ratio, relational phrase, and relational fact—are labeled in distinct colors for easier comprehension (Figure 5).

Figure 5: The summary of the relations between HGVS: p.V600E and MESH: D009369. 


  • Discussion

In this study, we ask and answer the question: given any entity or PubMed Unique Identifier (PMID), how can we extract and demonstrate the most relevant and significant biomedical relations? We build on previous work, especially the development of pubmedKB, by developing a new biomedical-specific scispaCy ORE and researching the application of neural ORE. Moreover, we organize and present the selected relations in an informative paragraph ready for use in academic and professional contexts. Future studies can further expand on the development of neural ORE given the positive performance of the OpenIE6 model. Currently, the OpenIE6 model is trained on sentences from Wikipedia; training it on both biomedical sentences from PubMed and general sentences from Wikipedia could potentially yield greater performance. As for the scispaCy ORE, although it is a comparatively developed and mature submodule ready to be used, it might also be interesting to explore scispaCy’s named-entity recognition (NER) module and see how it could be integrated into the ORE.


Works Cited

“Dependency Parsing.” Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Dan Jurafsky and James H. Martin, Pearson, 2022, pp. 310-334.

“English · spaCy Models Documentation.”

Kolluru, Keshav, et al. “OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction.” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, doi:10.18653/v1/2020.emnlp-main.306.

Li, Peng-Hsuan, et al. “pubmedKB: An Interactive Web Server for Exploring Biomedical Entity Relations in the Biomedical Literature.” Nucleic Acids Research, vol. 50, no. W1, 2022, doi:10.1093/nar/gkac310.

Neumann, Mark, et al. “ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing.” Proceedings of the 18th BioNLP Workshop and Shared Task, 2019, doi:10.18653/v1/w19-5034.

Stanovsky, Gabriel, and Ido Dagan. “Creating a Large Benchmark for Open Information Extraction.” Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016.



The Third-Party Federated Learning Model Interfacing

Ying-Chih Lin


Federated learning (FL) is a deep learning approach that trains a model across multiple centers without circulating the data. Because the local training datasets are never exchanged between centers, the data remain well protected and more private. This technique enables training a robust model under the constraints of data privacy, data security, and data access rights, which makes it well suited to the field of medical imaging. Medical images are collected from the screening records of patients at hospitals of all levels; since the images all come from patients, personal data protection is a serious issue when utilizing them. FL is a state-of-the-art technique that addresses these privacy problems by sharing only model parameters (the weights and biases of a model) between the models trained at each local server (edge). FL may play an important role in the next generation of deep learning in medicine.

Model Introduction

The third-party model we brought in from the NYCU BSP/BML Lab, led by Professor Chen (陳永昇教授), is a binary classification model targeting 10 regions of the human brain. The Alberta Stroke Program Early CT Score (ASPECTS) is a 10-point quantitative topographic CT scan score, used in our model to evaluate the severity of ischemic stroke. Given a noncontrast-enhanced CT (NCCT) image, the model can infer the severity of the patient’s condition.
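ASPECTS scoring itself is simple arithmetic over the 10 regions: starting from 10 points, one point is deducted for each region showing early ischemic change, so 10 indicates a normal CT and lower scores indicate more extensive ischemia. A minimal sketch (the per-region flags stand in for the binary classifier's per-region output):

```python
# Sketch of ASPECTS scoring over the 10 standard regions: caudate (C),
# lentiform nucleus (L), internal capsule (IC), insular ribbon (I), and
# the six MCA cortical regions M1-M6. One point is deducted per affected
# region from a starting score of 10.

ASPECTS_REGIONS = ["C", "L", "IC", "I", "M1", "M2", "M3", "M4", "M5", "M6"]

def aspects_score(affected: dict) -> int:
    """10 = normal CT; lower scores indicate more extensive ischemic change."""
    return 10 - sum(1 for region in ASPECTS_REGIONS if affected.get(region, False))

# e.g. the classifier flags M1 and the insular ribbon (I) as affected
print(aspects_score({"M1": True, "I": True}))  # 8
```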

Figure of 10 regions of ASPECTS evaluation in the human brain




Our main task is to interface the third-party model with the FL system of AI Labs. In stage one, we need to become familiar with the Manual Harmonia (MH) system and the model format the system requires for interfacing. FL is a technique that requires communication between edges: to make sure every epoch at the different edges is fully trained before the next epoch starts, we need to insert multi-processing code into the original training code. After an epoch is trained, the models at all edges load the weights merged by the aggregator from all of the previous epoch’s models (e.g., with 2 edges in this task, both models load the weights the aggregator merged from the two first-epoch models at the beginning of the second epoch).
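The synchronization requirement described above can be sketched with a multiprocessing barrier: no edge may start epoch N+1 until every edge has finished epoch N, leaving a window for the aggregator to merge weights. This is an illustrative stand-in, not how the Harmonia system actually coordinates edges:

```python
# Sketch of per-epoch edge synchronization in FL: every edge trains one
# epoch, waits at a barrier until all edges have finished, (conceptually)
# loads the aggregator-merged weights, and only then continues.

import multiprocessing as mp

ctx = mp.get_context("fork")  # fork keeps this sketch simple to run

def train_edge(edge_id, barrier, epochs, log):
    for epoch in range(epochs):
        # ... one epoch of local training would happen here ...
        log.append((edge_id, epoch, "trained"))
        barrier.wait()  # block until every edge has finished this epoch
        # ... load the aggregator-merged weights before continuing ...
        log.append((edge_id, epoch, "loaded"))

n_edges, epochs = 2, 2
barrier = ctx.Barrier(n_edges)
log = ctx.Manager().list()
procs = [ctx.Process(target=train_edge, args=(i, barrier, epochs, log))
         for i in range(n_edges)]
for p in procs:
    p.start()
for p in procs:
    p.join()
print(sorted(log))  # 8 events: 2 edges x 2 epochs x (trained, loaded)
```

The barrier guarantees that both edges record “trained” for epoch 0 before either records “trained” for epoch 1, which is exactly the ordering the merged-weight handoff needs.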

Figure of implementation in FL system


Finally, we obtained a file of model weights (merged.ckpt) at the end of training. After evaluating the model’s performance, we ran the whole training process in the Physical environment, which is stage two.

In this experiment, the training dataset contained 170 3D NCCT images, with another 90 for testing. The training data was split evenly between the 2 FL edges (85/85). The final evaluation compared the performance of local training at NYCU against 2-edge FL at AI Labs.

Table of performance comparison



We observed that, for Class 0, the recall rate was slightly higher and the precision rate slightly lower in both the MH and Physical stages. The final FL results at AI Labs showed that performance was maintained at the same level as local training at NYCU, which means the interfacing task was successful.


Azoospermia with Deep Learning Object Detection

Introduce Azoospermia

Azoospermia is a medical term for the condition of no measurable sperm in a man’s semen, and it is the main challenge in male infertility. Azoospermia can be divided into two classes: obstructive azoospermia (OA) and non-obstructive azoospermia (NOA). In OA, testicular size and the serum hormone profile are normal. In NOA, the process of spermatogenesis is abnormal, and to make a further diagnosis the doctor has to examine the cell findings of testicular specimens. Originally, it took 2-3 days to complete a pathological diagnosis. To make the process more efficient, the Department of Urology of Taipei Veterans General Hospital has developed a standard process using the testicular touch print smear (TPS) technique to make real-time diagnoses. However, learning to interpret TPS takes considerable time. Machine learning and deep learning technologies have been applied to many kinds of medical images and have become a new field with many researchers involved. AI Labs is cooperating with Dr. William J. Huang (黃志賢醫師) from Taipei Veterans General Hospital to have AI assist surgeons in reading TPS slides. We aim to apply object detection techniques to testicular specimens to find 6 cell types: Sertoli cells, primary spermatocytes, round spermatids, elongated spermatids, immature sperm, and mature sperm. These cells are essential for determining the different stages of azoospermia.



In this task, our goal is to detect cells of the above six classes in an individual image; the desired outputs are accurate bounding boxes and their labels. Since no existing open dataset covers this work, we built our own. The input images are 2D testicular specimens captured from a microscope, with ground-truth boxes and labels. The cell dataset currently contains 120 images with over 4,500 cells, annotated and reviewed by Dr. Huang and his assistant. Considering the input size, the number of classes, and performance, we chose EfficientDet as our model for detecting cells in TPS.


Example of input image with annotations. Different color means different class.


What is EfficientDet?

EfficientDet is a current state-of-the-art network for object detection. It consists of a backbone network and a feature network. Inputs are fed into the backbone network; features are extracted from different layers of the backbone and sent to the feature network. Feature maps are combined with different strategies depending on the network used. At the end of the feature network, two heads with several layers predict the final bounding-box positions and class labels. In our setting, EfficientDet uses an EfficientNet pretrained on ImageNet as the backbone and a BiFPN as the feature network.

EfficientDet model structure.


EfficientNet is a model that applies a compound scaling strategy to improve accuracy. When pursuing higher performance, researchers often scale up model width, depth, or resolution. However, the results are often contrary to expectations if the model becomes too complicated. The authors of EfficientNet combined the different scaling dimensions through Neural Architecture Search to find a suitable composition, hence the name compound scaling. There are 8 levels of EfficientNet in total, and we chose EfficientNet-B3 as our backbone considering the difficulty of our task, the input size, and the model size.
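Compound scaling can be written down in a few lines. The base factors below (α = 1.2, β = 1.1, γ = 1.15) are the ones reported in the EfficientNet paper [1]; the helper function is our own sketch.

```python
# Compound scaling as described in the EfficientNet paper [1]:
# depth, width, and resolution are scaled jointly by a single
# coefficient phi, using base factors found via grid search.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution factors

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for a given phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# The base factors are chosen so that FLOPS grow roughly 2x per
# unit of phi: alpha * beta^2 * gamma^2 ~= 2.
d, w, r = compound_scale(3)  # roughly the EfficientNet-B3 setting
```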


Illustration of compound scaling strategy.


BiFPN is a feature pyramid network (FPN) with both top-down and bottom-up paths for combining feature maps, whereas the original FPN has only a top-down path. The purpose is to enrich the feature representation so that bounding-box regression and classification perform better.


Comparison of different FPNs.

Implementation Details

We made some modifications to apply EfficientDet to our data. First, we reduced the anchor size and the intersection-over-union (IoU) threshold used to match anchors to ground-truth boxes. The reason is that the smallest cell is only around 8 px, much smaller than the default base anchor size of 32 px. Also, since the boxes are quite small, matching ground-truth boxes and anchors under a looser condition makes them easier to learn. Furthermore, we sample K = 20 matched anchors instead of all candidates when computing losses and updates. The image size is set to 768×1024.
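A minimal sketch of the matching change described above, in plain Python. The function names, box format, and default threshold are illustrative; the real implementation operates on the model’s anchor tensors.

```python
import random

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def sample_matched_anchors(anchors, gt_box, iou_thresh=0.4, k=20):
    """Match anchors to a ground-truth box under a loosened IoU
    threshold, then sample at most K matches for loss computation."""
    matched = [a for a in anchors if iou(a, gt_box) >= iou_thresh]
    return random.sample(matched, min(k, len(matched)))
```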

With this small dataset of 108 training images and 12 testing images, we reach 71% mAP and 76% recall.

Figure of input ground-truth data (bold frames) and predicted boxes (thin frames). The numbers are confidence scores.



We demonstrate that modern machine learning and deep learning methods can be applied to medical images and achieve satisfying performance. This model will help surgeons interpret the smear more easily, and may even speed up surgery as we actively work on improving the model with more data.



[1] Tan, M. & Le, Q.. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning, in PMLR 97:6105-6114

[2] M. Tan, R. Pang and Q. V. Le, “EfficientDet: Scalable and Efficient Object Detection,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 10778-10787, doi: 10.1109/CVPR42600.2020.01079.

DockCoV2: a drug database against SARS-CoV-2

The COVID-19 pandemic is a global health crisis. From December 2019 to September 2020, SARS-CoV-2 infected over 32 million people and caused more than one million deaths worldwide. One of the best-known ways to fight the novel coronavirus is to block enzymes essential for viral entry or replication. The Genomics team at Taiwan AI Labs collaborated with Professor Hsuan-Cheng Huang at National Yang-Ming University, Professor Chien-Yu Chen and Distinguished Professor Hsueh-Fen Juan at National Taiwan University, with MOST and NTU support, to develop the database DockCoV2, which aims to find effective antiviral drugs against SARS-CoV-2.

The research team explores new opportunities for drug repurposing, the process of finding new uses for existing approved drugs, which is believed to offer great benefits over de novo drug discovery and to enable rapid clinical trials and regulatory review for COVID-19 therapy.

Here we developed the database DockCoV2 by performing molecular docking analyses of seven proteins, including the spike protein, 3CLpro, PLpro, RdRp, the N protein, ACE2, and TMPRSS2, against 2,285 FDA-approved and 1,478 NHI drugs. DockCoV2 also provides appropriate validation information with literature support. Several databases focus on delivering repurposed drugs against SARS-CoV-2; to our knowledge, however, none provides a more up-to-date and comprehensive resource with drug-target docking results for repurposed drugs against SARS-CoV-2.

DockCoV2 is easy to use and search against, is well cross-linked to external databases, and provides state-of-the-art prediction results in one site. DockCoV2 offers not only the related Docking structure and Ligand information but also Experimental data, including biological assays, pathway information, and gene set enrichment analysis drawn from other validated databases. Users can download the drug-protein docking data of interest and examine additional drug-related information on DockCoV2. We have also released our scripts and source code on GitHub.

Article link:
Ting-Fu Chen, Yu-Chuan Chang, Yi Hsiao, Ko-Han Lee, Yu-Chun Hsiao, Yu-Hsiang Lin, Yi-Chin Ethan Tu, Hsuan-Cheng Huang, Chien-Yu Chen, Hsueh-Fen Juan. DockCoV2: a drug database against SARS-CoV-2. Nucleic Acids Research (2020)


Figure. The overview of the database content. In addition to the docking scores, DockCoV2 designed a joint panel section to provide the following related information: Docking structure, Ligand information, and Experimental data.

Harmonia: an Open-source Federated Learning Framework

Federated learning is a machine learning method that enables multiple parties (e.g., mobile devices or organizations) to collaboratively train a model orchestrated by a trustable central server while keeping data local. It has gained a lot of attention recently due to the increasing awareness of data privacy.

In Taiwan AI Labs, we started an open-source project that aims to develop systems, infrastructure, and libraries to ease the adoption of federated learning for research and production use. It is named Harmonia, after the Greek goddess of harmony, to reflect the spirit of federated learning: multiple parties collaboratively building an ML model for the common good.

System Architecture

Figure 1: Harmonia system architecture


The design of the Harmonia system is inspired by GitOps. GitOps is a popular DevOps practice in which a Git repository maintains declarative descriptions of the production infrastructure, and updates to the repository trigger an automated process that makes the production environment match the state described in the repository. Harmonia leverages Git for access control, model version control, and synchronization among the server and participants in a federated learning (FL) run. The FL training strategy, global models, and local models/gradients are kept in Git repositories. Updates to these Git repositories trigger FL system state transitions, which automates the FL training process.

An FL participant is activated as a Kubernetes (K8s) pod composed of an operator container and an application container. The operator container is in charge of maintaining FL system state and communicates with the application container via gRPC. Local training and aggregation functions are encapsulated in application containers. This design enables easy deployment in a Kubernetes cluster environment and quick plug-in of existing ML (machine learning) workflows.

Figure 2: Illustration of FL with two clients

Figure 2 illustrates the Harmonia workflow with two local training nodes. The numbers in the figure indicate the steps of an FL run in the first Harmonia release. To start FL training, a training plan is registered in the Git registry (1), and the registry notifies all participants via webhooks (2). The two local nodes are then triggered to load a pretrained global model (3) and start local training for a predefined number of epochs (4). When a local node completes its training, the resulting model (called a local model) is pushed to the registry (5) and the aggregator pulls it (6). Once the aggregator has received local models from all participants, it performs model aggregation (7), and the aggregated model is pushed to the Git registry (8). The aggregated model is then pulled by the local nodes to start another round of local training (9). These steps are repeated until a user-defined convergence condition is met, e.g., a number of rounds. The sequence diagram of an FL run is shown in Figure 3.
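The round loop described above can be sketched as a tiny simulation. The dict-based registry, the stub training rule, and all function names are stand-ins for the real Git-backed components; only the overall control flow mirrors the steps in Figure 2.

```python
# A minimal sketch of the Harmonia round loop. The registry here is a
# plain dict standing in for the Git registry; training is a stub that
# nudges weights toward the mean of each edge's data.

def local_train(global_weights, edge_data, epochs):
    """Stand-in for a real local training loop on one edge."""
    target = sum(edge_data) / len(edge_data)
    w = list(global_weights)
    for _ in range(epochs):
        w = [wi + 0.5 * (target - wi) for wi in w]
    return w

def aggregate(local_models):
    """Plain (unweighted) average of the edges' weights."""
    n = len(local_models)
    return [sum(ws) / n for ws in zip(*local_models)]

def run_fl(edges, rounds, epochs):
    registry = {"global": [0.0]}  # training plan registered, initial model
    for _ in range(rounds):       # repeat until the round count is reached
        locals_ = [local_train(registry["global"], data, epochs)  # steps 3-5
                   for data in edges]
        registry["global"] = aggregate(locals_)                   # steps 6-8
    return registry["global"]     # pulled by edges for the next round (9)
```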

Figure 3: Sequence diagram

Below we detail the design of the Git repositories, the workflows of the operator containers for both the aggregator and the local nodes, and the application container.

Git Repositories

There are three types of repositories in the registry:

  1. Training Plan: it stores the required parameters for an FL run as a JSON file:

    {
        "edgeCount": 2,
        "roundCount": 100,
        "epochs": 100
    }

  2. Aggregated Model: it stores aggregated models pushed by the aggregator container. The final aggregated model is tagged with inference-<commit_hash_of_train_plan>.
  3. Edge Model: these repositories store local models pushed by each node separately.
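An operator can read such a plan with any JSON library. The field names below follow the sample file above; everything else is illustrative.

```python
import json

# Parse a training plan like the sample above. In a real deployment the
# file would be pulled from the Training Plan repository.
plan = json.loads('{"edgeCount": 2, "roundCount": 100, "epochs": 100}')

edge_count = plan["edgeCount"]    # number of participating edges
round_count = plan["roundCount"]  # FL rounds before stopping
epochs = plan["epochs"]           # local epochs per round
```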


The edge and aggregator operator containers control FL states. Figures 4 and 5 show the workflows of the edge and aggregator operators, respectively.

Edge Operator

Figure 4: Workflow in an edge node

When a training plan is registered in Git, an edge node starts a training process with its local data. The resulting new local model weights are pushed to the Git registry, and the edge node then waits for the aggregator to merge the model updates from all participating edge nodes. Another round of local training starts once the aggregated model is ready in the Git registry. This process repeats until the number of rounds specified in the training plan is reached.

Aggregator Operator

Figure 5: Workflow in Aggregator

For a new FL run, the aggregator operator starts in a state where it waits for all edges to finish their local training, then notifies the application container in the aggregator server to perform model aggregation. The newly aggregated model is pushed to the Git registry. The process iterates until the number of rounds specified in the training plan is reached.


The local training and model aggregation tasks are encapsulated in an application container, which is implemented by users. An application container communicates with its operator container via gRPC. Harmonia works with any ML framework. In the SDK, we provide an application container template so that users can easily plug in their training pipelines and aggregation functions without handling the gRPC communication themselves.


We demonstrate the usage of Harmonia with pneumonia detection on chest X-rays. The experiment is based on a neural network architecture developed by Taiwan AI Labs.

We took the open RSNA Pneumonia dataset [1] and composed two different FL datasets. In this experiment, we assumed 3 hospitals. We first randomly split the whole dataset into a model training set (80%) and a testing set (20%). In the first FL dataset, we randomly assigned training data to edges. In a real-world scenario, data from different hospitals are often non-IID (not independent and identically distributed). Therefore, in the second FL dataset, the ratios of positive to negative data on each edge were set differently. Table 1 shows the numbers of positive and negative training samples for centralized training and for federated training with IID and non-IID data, respectively.


Table 1: Number of positive and negative training data of each training method

We adopted Federated Averaging (FedAvg) [2] as our aggregation method in this experiment. Local models are averaged by the aggregator proportionally to the number of training samples on each edge. Edges trained for one epoch in each round, and the total number of epochs is the same as in centralized training.
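The FedAvg aggregation step described above can be sketched as a weighted average. This assumes each local model is a flat list of weights; the function name is ours.

```python
def fedavg(local_weights, sample_counts):
    """Federated Averaging [2]: average local models weighted by the
    number of training samples on each edge."""
    total = sum(sample_counts)
    return [
        sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
        for i in range(len(local_weights[0]))
    ]
```

An edge with three times as many samples contributes three times as much to each averaged weight, which is what keeps the global model unbiased when edge dataset sizes differ.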

Figure 6: Classification accuracy


The results are shown in Figure 6. Both IID and non-IID FL achieve classification accuracy comparable to centralized training, but the non-IID setting takes more epochs to reach convergence; in other words, non-IID FL converges more slowly than IID FL.

Privacy Module

To enforce differential privacy (DP), Harmonia provides a PyTorch-based package that implements two types of DP mechanisms. The first is based on the algorithm proposed in [3, 5], a differentially private version of SGD that randomly adds noise to SGD updates. Users can simply replace the original training optimizer with the DPSGD optimizer provided by Harmonia. The second technique is known as the Sparse Vector Technique (SVT) [4], which protects models by selectively sharing distorted components of the weights. To adopt this privacy protection mechanism, a user passes a trained model into the ModelSanitizer function provided by Harmonia.
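For intuition, the clip-and-noise step at the heart of DP-SGD [3, 5] can be sketched as follows. This is not Harmonia’s DPSGD optimizer; the function name, parameters, and defaults are illustrative only.

```python
import math
import random

def dp_sgd_update(per_example_grads, clip_norm=1.0, noise_mult=1.1,
                  rng=random):
    """One DP-SGD step in the style of [3, 5]: clip each per-example
    gradient to an L2 norm bound, sum, add Gaussian noise, average."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        # Scale the gradient down so its L2 norm is at most clip_norm.
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * scale
    # Noise proportional to the clipping bound masks any single example.
    sigma = noise_mult * clip_norm
    return [(summed[i] + rng.gauss(0.0, sigma)) / n for i in range(dim)]
```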


The first release includes the Harmonia-Operator SDK and the differential privacy modules. We will continue to develop essential components of FL (e.g., participant selection) and enable more flexible ways of describing FL training strategies. We welcome contributions of new aggregation algorithms, privacy mechanisms, datasets, etc. Let’s work together to help federated learning flourish.


[1] RSNA Pneumonia Detection Challenge.

[2] Communication-Efficient Learning of Deep Networks from Decentralized Data. Brendan McMahan et al., in Proceedings of AISTATS, 2017

[3] Deep Learning with Differential Privacy. Martín Abadi et al., in Proceedings of ACM CCS, 2016

[4] Understanding the sparse vector technique for differential privacy. Min Lyu et al., in Proceedings of VLDB Endowment, 2017

[5] Stochastic gradient descent with differentially private updates. Shuang Song et al., in Proceedings of GlobalSIP Conference, 2013


AI Labs released an annotation system: Long live the medical diagnosis experience.

The Dilemma of Taiwan’s Medical Laboratory Sciences

Thanks to the Bureau of National Health Insurance in Taiwan, abundant medical data are appropriately recorded. This is surely good news for us, an AI-based company. However, most of the medical data have not been labeled yet. What’s worse, Taiwan currently faces a severe shortage of medical talent: the number of experienced masters of medical laboratory sciences keeps shrinking. Take malaria diagnosis for example. Malaria parasites belong to the genus Plasmodium (phylum Apicomplexa). In humans, malaria is caused by P. falciparum, P. malariae, P. ovale, P. vivax, and P. knowlesi. It is undoubtedly arduous work for a human to detect affected cells and classify them into these five classes. Unfortunately, only one retiring master in this field in Taiwan can truly confirm the correctness of the diagnosis. We must take remedial action right away, yet it costs too much time and money to train a human being to become a malaria master. Only through the power of technology can we preserve this valuable diagnostic experience.

So we decided to solve the problem by transferring human experience to machines, and the first step is annotating the medical data. Since the single master cannot address the overwhelming amount of data by himself, he needs helpers to do the first-pass detection, with the master making the final confirmation. For this, we need a system that allows multiple users to cooperate. It should record the file path of the labeled data, the annotators, and the label time. We searched assorted off-the-shelf annotation systems but, unfortunately again, none of them met our specification. So we rolled up our sleeves and revised the most relevant one for our purpose.

An Improved Annotation System

Starting from an open-source annotation system [1], AI Labs revised it and released a new, easy-to-use annotation system to help those who want to create valuable labeled data. With our labeling system, you know who the previous annotator was and can systematically revise others’ work.

This system is designed for object labeling. By drawing a rectangular box on your target object, you can assign a label name and label coordinates to the chosen target. See the example below.

You will also obtain the label categories and coordinates in an XML file in PASCAL VOC format. You can then use the output XML file directly as input to your machine learning programs.
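To give a feel for the output, here is a sketch of reading such a file with Python’s standard library. The annotation content (filename, cell label, coordinates) is made up for illustration; only the tag layout follows the PASCAL VOC convention.

```python
import xml.etree.ElementTree as ET

# A minimal PASCAL VOC annotation of the kind this system exports;
# the filename, label, and coordinates below are invented examples.
VOC_XML = """
<annotation>
  <filename>smear_001.png</filename>
  <object>
    <name>mature sperm</name>
    <bndbox>
      <xmin>120</xmin><ymin>80</ymin><xmax>160</xmax><ymax>130</ymax>
    </bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (label, (xmin, ymin, xmax, ymax)) pairs from a VOC file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(t).text)
                       for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text, coords))
    return boxes
```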

How does it work?

Three steps: Load, Draw and Save.

In Load: Feed an image into the system. It is fine if you do not have an XML file yet, since it is your first time operating this system.

In Draw: Create as many labels in an image as you want. Don’t forget that you can zoom in/out if the image is not clear enough.

In Save: Click the Save button and everything is done. The system outputs an XML file including all the label data for the image.

What’s Next?

With sufficient annotated data, we can then train our machines on the labels annotated by the medical master, making them able to make diagnoses as brilliant as the last master’s. We will keep working on it!


[1] Tzutalin. LabelImg. Git code (2015).

AI reads Medical Literature “variant2literature”

The importance of AI

Before medical AI systems existed, medical professionals and genome researchers were limited by the extensive time and money required to compensate professionals for their work in genomic standardization, analysis, and the comparison of gene variants and symptoms. Moreover, full genomic analysis is difficult to achieve because the human genome comprises over twenty thousand individual genes.

AI applies to many areas of genomic analysis, such as gene-assisted diagnosis, human genome annotation, and quantifying the level of correlation between a gene variant and diseases. There is still a long way to go before we understand the purposes and mechanisms of all twenty-thousand-plus human genes and can predict how gene variants affect them, but AI can assist in establishing correlation matrices and prediction models, except in cases such as genetically transmitted diseases, drug reactions, and cancer-related genes, where data are significantly scarcer.

As such, we used AI to develop variant2literature, which peruses a large amount of medical literature to find variants related to diseases of interest, helping medical professionals efficiently predict possible underlying diseases correlated with a variant while raising the precision of diagnosis. In addition to finding the literature containing the variants of interest, variant2literature provides an association prediction when a variant appears along with a disease name in a single sentence.

The following paragraphs detail our methods and experimental results.


Data collection and test results

To determine the association between diseases and variants, the first indispensable step is extracting biomedical terms from the literature. In variant2literature, we employed GNormPlus, tmVar, and DNorm to identify the genes, variants, and diseases mentioned in PubMed Central (PMC). These tools are provided by the National Center for Biotechnology Information (NCBI), part of the U.S. National Library of Medicine (NLM).

GNormPlus is a system that identifies gene mentions in text. It is composed of two components: mention recognition and concept normalization. For mention recognition, GNormPlus uses conditional random fields (CRFs) to recognize gene descriptions. However, each description still needs to be matched to the gene it describes, which is why GNormPlus uses GenNorm in its concept normalization module to find the matching gene via exact matching or “bag-of-words” vector matching on descriptions.

tmVar is also a CRF-based text-mining approach, used to extract a wide range of sequence variants of both proteins and genes. These variants are written according to the standard sequence variant nomenclature developed by the Human Genome Variation Society (HGVS). tmVar pre-processes the input text with tokenization and uses a CRF-based model to extract variant mentions for the final output.
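For intuition about what a variant mention looks like, a heavily simplified rule for one corner of the HGVS nomenclature (DNA-level substitutions such as c.76A>T) could look as follows. tmVar itself is CRF-based and covers far more of the nomenclature than this toy regex.

```python
import re

# A simplified pattern for DNA-level HGVS substitutions like "c.76A>T".
# Real variant recognition (tmVar) uses a trained CRF model, not a regex.
HGVS_SUBSTITUTION = re.compile(r"\bc\.\d+[ACGT]>[ACGT]\b")

def find_variant_mentions(sentence):
    """Return HGVS-style substitution mentions found in a sentence."""
    return HGVS_SUBSTITUTION.findall(sentence)
```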

DNorm is used to identify disease mentions. These are recognized by the BANNER named entity recognizer, a trainable system that also uses CRFs. The mentions output by BANNER are then normalized and identified as diseases using pairwise learning to rank.

Finally, a recurrent neural network (RNN) deep learning model is used to predict the association between variants and diseases. This model was trained on a small set of literature annotated by our experts. We identified the genes, variants, and diseases in these articles and categorized the types of relationships. When a disease and a variant are observed in the same sentence, experts denote whether the pair is correlated: “Y” for yes and “O” for no. In other words, this is a machine learning algorithm that performs binary classification. These labeled sentences first pass through our self-trained Word2Vec model, which converts each tokenized sentence into vectors; the vectors are fed into our RNN model, which outputs the relationship between the two instances (“Y” or “O”). After training, this RNN model can be applied to the entirety of the PMC Open Access subset (PMCOA) to identify all relationships between diseases and variants.



On variant2literature, the user inputs either a gene or a variant. variant2literature normalizes the input, searches the indexed papers, then outputs the related papers with the relevant genes, variants, and diseases labeled. If a disease and a gene variant both appear within the same sentence, variant2literature determines the correlation between the two based on the surrounding context.

Through AI-assisted analysis, variant2literature automatically determines the correlation between a disease and a gene variant, greatly reducing the time spent comparing data, reducing the cost of gene analysis, allowing medical professionals to detect underlying diseases efficiently, and setting a new milestone for disease-related genetic testing in Taiwan.