Tuesday, October 27, 2020

Data Is Like Oil or Sunlight. Process It First

 

Data has often been compared to the new oil that business needs to run. IBM CEO Ginni Rometty explained the metaphor at the World Economic Forum in Davos in 2019: “I think the real point to that metaphor,” Rometty said, “is value goes to those that actually refine it, not to those that just hold it.”

A different view of data came from Alphabet CFO Ruth Porat. “Data is actually more like sunlight than it is like oil because it is actually unlimited,” she said during a panel discussion in Davos. “It can be applied to multiple applications at one time. We keep using it and regenerating.”

An article entitled “Are data more like oil or sunlight?”, published in The Economist in February 2020, highlighted both aspects of data: it is considered the “most valuable resource”, and at the same time it can be a public asset that people should share and make the most of collectively.

AI is booming, yet the data labeling behind it is inefficient

Many industries are actively embracing AI as part of their structural transformation. From autonomous driving to drones, from diagnostic support systems in medicine to digital marketing, AI has made more and more fields more efficient and intelligent.

Turing Award winner Yann LeCun has noted that developers need labeled data to train AI models, and that, from both a business and a technology perspective, more high-quality labeled data produces more accurate AI systems. LeCun is one of the godfathers of deep learning and the inventor of convolutional neural networks (CNNs), one of the key elements that have spurred a revolution in AI over the past decade.

Facing this AI blue ocean, a large number of data providers have poured in. Data service companies recruit large numbers of data labelers, train them on each specific task and distribute the workload across different teams. Alternatively, they subcontract the labeling project to smaller data factories that in turn recruit people to intensively process the divided datasets. These subcontractors or data factories are usually located in India or China, where labor is cheaper. When the subcontractors complete the first data inspection, they collect the labeled datasets and pass them on to the final data service provider, which runs its own data inspection once again and delivers the results to the AI team.

Complicated, right? For an industry as fast-moving as AI, such a traditional workflow is inefficient: it means longer processing times and higher overhead costs, much of which is wasted in the secondary and tertiary distribution stages. ML companies are forced to pay high prices, yet the small teams doing the actual labeling hardly benefit.

ByteBridge: an automated data annotation platform to empower AI

ByteBridge.io has made a breakthrough with its automated data labeling platform, empowering data scientists and AI companies in an effective and engaging way.

With a completely automated data service system, ByteBridge.io has developed a mature and transparent workflow. In ByteBridge’s dashboard, developers can create projects themselves and monitor progress in real time on a pay-per-task model, with clear estimates of time and price.

ByteBridge.io focuses on application scenarios such as autonomous driving, retail, agriculture and smart households. It is dedicated to providing the best data solutions for AI development and unleashing the full value of data. “We focus on addressing practical issues in different application scenarios for AI development through one-stop, automated data solutions. The data labeling industry should make technology its core competitiveness, backed by efficiency and cost advantages,” said Brian Cheong, CEO and founder of ByteBridge.io.

It is undeniable that data has become a rare and precious social resource. Whatever the metaphor, whether new gold, oil, currency or sunlight, raw data is meaningless at first. It needs to be collected, cleaned and labeled before it becomes a valuable good. ByteBridge.io has recognized the power of data and aims to provide the best data labeling service to accelerate the development of AI with accuracy and efficiency.

Monday, October 26, 2020

Better Data for Smarter Chatbots

Chatbots, computer programs that interact with users through natural language, have become extraordinarily popular thanks to technological advances. Among the various types of chatbots, the need for conversational AI chatbots has become acute in order to facilitate human-computer interactions through messaging applications, phones, websites, or mobile apps. In fact, a chatbot is just one typical example of an AI system, many of which are powered by machine learning.


Not-so-intelligent Chatbots

According to a 2019 survey by Usabilla, 54% of respondents said they would prefer a chatbot to a human customer support representative if it saved them 10 minutes. Moreover, 59% of consumers in a PwC survey said they want more humanized experiences with chatbots. Yet although customers view AI solutions positively for their efficiency, several industries still field not-so-intelligent chatbots that cannot manage even basic conversations.

Most chatbots we see today are based on machine learning. They incorporate the ability to understand human language and are trained on data. With machine learning, computer systems learn by being exposed to many examples: the training dataset. The chatbot’s algorithms extract and save patterns with each input of data. In this way, a chatbot uses training data to understand user behavior and present the most applicable conversation for a personalized experience.

If not properly considered and developed, chatbots can fail in many ways. For example, when a customer starts a conversation with the word “howdy”, a chatbot whose training dataset only contains the greetings “hello” and “hi” has no clue how to respond.
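The failure mode above comes down to training-data coverage. A toy sketch (the function names and responses here are illustrative, not from any real chatbot framework) shows how a bot that only knows the greetings present in its labeled dataset falls back to a default on anything unseen:

```python
# Toy intent matcher: it only recognizes utterances that appear in its
# labeled training examples. Illustrative only, not a real chatbot stack.

def build_intent_lookup(labeled_examples):
    """Map each known utterance to its labeled intent."""
    return {text.lower(): intent for text, intent in labeled_examples}

def respond(lookup, utterance):
    intent = lookup.get(utterance.lower())
    if intent == "greeting":
        return "Hello! How can I help you?"
    return "Sorry, I don't understand."  # fallback for unseen inputs

# Training data without "howdy" leaves the bot clueless:
lookup = build_intent_lookup([("hello", "greeting"), ("hi", "greeting")])
print(respond(lookup, "howdy"))   # falls through to the fallback

# One more labeled example closes the gap:
lookup = build_intent_lookup([("hello", "greeting"), ("hi", "greeting"),
                              ("howdy", "greeting")])
print(respond(lookup, "howdy"))   # now recognized as a greeting
```

Real conversational models generalize beyond exact matches, but the principle holds: intents missing from the labeled data tend to fall through to the fallback response.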

The quality of training data is key

Ben Virdee-Chapman, head of product at Kairos.com, once said that “the quality of the training data that allows algorithms to learn is key.” Preparing the training dataset for a chatbot is not easy. For a customer service chatbot, a dataset containing a massive amount of discussion text between customers and human customer support needs to be collected, cleaned and labeled so that NLP can make sense of it and the AI-enabled chatbot can learn to communicate with people.

Conversational AI agents such as Alexa and Siri are built with manually annotated data. The accuracy of the ML models can be constantly improved by manually transcribed and annotated data. However, large-scale manual annotation is usually expensive and time-consuming. Thus, abundant and useful datasets are valuable assets for chatbot development.

Manual annotation gives a chatbot a competitive advantage and differentiates it from its competitors. AI and ML companies are seeking high-quality datasets to train their algorithms. The choice among labeling services can have an enormous impact on the quality of the training data and on the time and cost required.

Chatbots need sufficient data to understand human intention. Traditional data providers collect text data or transcribe audio data offline from all available resources and upload the entire dataset to a given software platform, which incurs unnecessary communication costs. Moreover, data quality is often not guaranteed. Obtaining task-oriented text datasets and getting them annotated at scale thus remains a bottleneck for developers.

ByteBridge.io is one of the leading data service companies aiming to transform the data labeling industry. With a unique and user-friendly platform, ByteBridge.io enables users to complete data labeling tasks online conveniently. Moreover, the blockchain-driven data company replaces the traditional model with an innovative and precise consensus mechanism, which dramatically improves working efficiency and accuracy.

Partnered with over 30 different language speaking communities across the globe, ByteBridge.io now provides data collection and annotation services covering languages such as English, Chinese, Spanish, Korean, Bengali, Vietnamese, Indonesian, Turkish, Arabic, Russian and more. With rich access to contractors worldwide, it ensures training data quality while expanding its service to a wider range of locations. ByteBridge’s high-quality data collecting and labeling service has empowered various industries such as healthcare, retail, robotics and self-driving, making it possible to integrate AI into such fields.

Chatbots are evolving and becoming increasingly sophisticated in their endeavor to mimic how people talk. Good chatbot applications can not only enhance the customer experience but also improve operational efficiency by reducing costs. To succeed, developers must obtain the right datasets for training and optimizing the chatbot system.

Monday, September 28, 2020

Data matters for machine learning, but how to acquire the right data?

Over the last few years, there has been a burst of excitement for AI-based applications across businesses, governments, and the academic community. Natural language processing (NLP) and image analysis, where input values are high-dimensional and high-variance, are areas in which deep learning techniques are highly useful. AI has shifted from algorithms that rely on programmed rules and logic to machine learning, where algorithms contain few rules and ingest training data to learn and train themselves. "The current generations of AI is what we call machine learning (ML) — in the sense that we’re not just programming computers, but we’re training and teaching them with data,” said Michael Chui, McKinsey Global Institute partner, in a podcast.


AI feeds heavily on data. Andrew Ng, former AI head at Google and Baidu, says that data is the rocket fuel needed to power the ML rocket ship. He also notes that companies and organizations taking AI seriously are working hard to acquire the right, useful data they need. Supervised learning needs more data than other types of machine learning models: its algorithms learn from labeled data, so data must be labeled and categorized before models can be trained. As the number of parameters and the complexity of problems increase, the need for data grows exponentially.
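As a minimal, hypothetical illustration of supervised learning from labeled data (the feature vectors and labels below are made up for demonstration), a 1-nearest-neighbour classifier simply predicts the label of whichever labeled training example lies closest to a new point; it can only be as good as its labeled data:

```python
# 1-nearest-neighbour classification over labeled examples.
# Training data is a list of (feature vector, label) pairs; the labels
# are what supervised learning consumes.

def nearest_neighbour(train, point):
    """Return the label of the labeled example closest to `point`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist(ex[0], point))
    return label

# Made-up labeled examples: (feature vector, label)
train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
         ((5.0, 5.0), "dog"), ((4.8, 5.2), "dog")]

print(nearest_neighbour(train, (1.1, 0.9)))  # "cat"
print(nearest_neighbour(train, (5.1, 4.9)))  # "dog"
```

Without the labels, the same feature vectors would be useless to this classifier, which is exactly why labeling is a prerequisite for supervised training.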




Data limitations: the new bottlenecks in machine learning


An Alegion survey reported that nearly 8 out of 10 enterprises currently engaged in AI and ML projects have seen those projects stall. The same study revealed that 81% of respondents admit the process of training AI with data is more difficult than they expected. According to a 2019 report by O’Reilly, data issues rank as the second-highest obstacle to AI adoption. Gartner predicted that 85% of AI projects will deliver erroneous outcomes due to bias in data, algorithms, or the teams managing them. The data limitations in machine learning include, but are not limited to:


  • Data collection. Issues like inaccurate data, insufficient representativeness, biased views, loopholes, and ambiguity affect ML’s decisions and precision, to say nothing of how hard it is to access large volumes of high-quality datasets for model development, especially during Covid-19, when data has simply not been available to some AI enterprises.
  • Data quality. Low-quality labeled data can backfire twice: first during model training and again when the model consumes the labeled data to make future decisions. For example, popular face datasets, such as the AT&T Database of Faces, contain primarily light-skinned male images, which leaves systems struggling to recognize dark-skinned and female faces. To create, validate, and maintain high-performing machine learning models in production, ML engineers need to use trusted, reliable data.
  • Data labeling. Since most machine learning algorithms use supervised approaches, data is useless for ML applications that rely on computer vision and supervised learning unless it is labeled properly. The new bottleneck in machine learning is no longer just the collection of qualified data, but also the speed and accuracy of the labeling process.


Solution


ML needs vast amounts of high-quality labeled data for model training to arrive at accurate predictions, and labeling that training data is increasingly one of the primary concerns in implementing machine learning algorithms. AI companies are eager to acquire high-quality labeled datasets that match their models’ requirements. ByteBridge.io is a data collection and labeling platform that allows users to train state-of-the-art machine learning models without manually labeling any training data themselves. ByteBridge.io’s datasets include diverse and rich data such as text, images, audio and video, with full coverage of languages, races and regions across the globe. Its integrated data platform eliminates intermediate processes such as recruiting the human-in-the-loop workforce, testing, verification and so forth.


Automated data training platform


ByteBridge.io takes full advantage of the platform’s consensus algorithm, which greatly improves data labeling efficiency and produces a large amount of accurately labeled data in a short time. The Data Verification Engine, equipped with advanced AI algorithms, and the project management dashboard have automated the annotation process, fulfilling the needs and standards of AI companies in a flexible and effective way.


“We believe data collection and labeling is a crucial factor in establishing successful machine learning models. We are committed to building the most effective data training platform and helping companies take full advantage of AI's capabilities,” said Brian Cheong, CEO of ByteBridge.io. “We have streamlined data collection and labeling process to relieve machine learning engineers from data preparation. The vision behind ByteBridge.io is to enable engineers to focus on their ML projects and get the value out of data.”


Compared with its competitors, ByteBridge.io offers a customized, automated data labeling system built on natural language processing (NLP) enabled software. Its easy-to-integrate API enables the continuous feeding of high-quality data into a new application system.


Both the quality and the quantity of data matter for the success of AI outcomes. Designed to power the AI and ML industry, ByteBridge.io promises to usher in a new era of data labeling and collection and to accelerate the arrival of the smart AI future.

Wednesday, September 23, 2020

Data labeling: a potential and problematic industry behind AI


Data labeling is not as mysterious as AI. Put simply, it applies various labeling tools to process data, the basic element of AI, so as to make that data understandable to computer vision and to "teach" AI to identify, judge and act like human beings. If data is the oil that fuels AI, data labeling is the refining of crude oil into gasoline.


At present, data labeling powers industries such as autonomous driving, agriculture, healthcare and retail, making them more efficient through the AI revolution.


For example, Baidu's AI data annotation center completed a labeling project for facial recognition with masks during the Covid-19 period. Data labelers marked key points on people's eyebrows, eyes and cheekbones so that AI scanners could identify faces and measure temperatures even when people wear masks.


According to Fractovia, the data annotation tools market was valued at $650 million in 2019 and is projected to surpass $5 billion by 2026. Another report, released by McKinsey in April 2017, estimates that the total market for AI applications may reach $127 billion by 2025. This expected market growth reflects the increasing demand for high-quality data labeling services as the AI industry develops.


However, compared to fancy high-tech AI, data labeling is labor-intensive at its core. Considering their contribution to fueling the AI industry, data labelers deserve more attention to improve their pay and social status. The number of full-time data labelers in China has reached 100,000, and part-time labelers total almost 1 million. An ordinary data labeler in a Baidu AI center labels 1,300 images and earns less than 25 dollars a day, which is still much better than the cheaper labor of small labeling teams in less developed counties and villages in China.


The data labeling industry has a low threshold for newcomers, and projects are often subcontracted through middlemen at every level because of their huge volume, tight budgets and schedules. Middlemen tend to push down costs to seek higher profit. For a typical small labeling team of 20 staff, the labor cost is about $15-$25 per person per day. Unfortunately, such small teams cannot guarantee data quality or project delivery times, for reasons ranging from incompetence and miscommunication to poor regulation and dysfunctional competition, which in turn wastes money and time for the AI companies involved.


"We are eager to find reliable and cost-effective data labeling teams. The accuracy and quality of the processed data determines the outcome of our machine learning training and final product," says Mr. Wang, a project manager in an AI company.


Bytebridge.io, a blockchain-driven data company, has also realized such urgent problems in the data labeling industry and committed itself to powering AI development through its automated data labeling platform.


Developers can create their data collection and labeling projects on ByteBridge's dashboard. The automated platform enables developers to customize labeling projects, specify their requirements, upload raw datasets and control the labeling process in a transparent and dynamic way. Developers can check the processed data, speed, estimated price and time at any moment, from anywhere.


In order to reduce data training time and cost on complicated tasks, ByteBridge.io has built consensus algorithm rules to optimize the labeling system: before task distribution, a consensus index, such as 90%, is set for a task. If 90% of the labeling results are essentially the same, the system considers that a consensus has been reached. In this way, the platform can produce a large amount of accurate data in a short time. If the machine learning model demands higher annotation accuracy, say 99%, developers can use "multi-round consensus" to repeat tasks until the accuracy of the final data delivery is high enough. The consensus mechanism not only guarantees data quality efficiently but also saves budget by cutting out the middlemen and optimizing the workflow with AI technology.
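The consensus index described above can be sketched in a few lines. This is an illustrative reading of the mechanism, not ByteBridge's actual implementation, and it assumes "basically the same" means identical labels:

```python
# Check whether a set of labels from independent labelers meets a
# consensus index: the most common answer must cover at least that
# fraction of all answers.
from collections import Counter

def reaches_consensus(labels, index=0.9):
    """True if the most common label covers at least `index` of answers."""
    if not labels:
        return False
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels) >= index

# Nine of ten labelers agree, meeting a 90% consensus index:
print(reaches_consensus(["car"] * 9 + ["truck"], index=0.9))   # True

# Against a stricter 99% target, the same answers fall short:
print(reaches_consensus(["car"] * 9 + ["truck"], index=0.99))  # False
```

Under "multi-round consensus", a task that fails the stricter check would simply be redistributed for further rounds of labeling until agreement reaches the target.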


ByteBridge's easy-to-integrate API enables the continuous feeding of high-quality data into machine learning systems. Data can be processed 24/7 by global partners and in-house experts through a distribution mechanism based on education level, language capability and other parameters.


No middlemen, complete automation, access to a global 24x7 workforce, more control over project status, built by developers for developers: ByteBridge.io has cut out intermediary costs and enabled AI companies to get projects done cost-effectively with its high-quality data services.


In fierce market competition, only companies that focus on quality and service, with their own complete and independent set of resources and technology, can survive. ByteBridge.io is one such company in the data labeling industry, determined to accelerate the AI revolution.

CONTACT:

contact: support@bytebridge.io
website: bytebridge.io




Monday, September 21, 2020

Can AI Address Real World Issues, such as Agriculture?

To quote a classic paper titled “Machine Learning that Matters” by NASA computer scientist Kiri Wagstaff: “Much of current machine learning (ML) research has lost its connection to problems of import to the larger world of science and society.”

Ordinary people unfamiliar with AI and ML may consider them fictional, but their applications are stepping out of the science community to address real-life issues.

According to the UN Food and Agriculture Organization (FAO), the world population will increase to 10 billion by 2050, yet only 4% additional land will come under cultivation by then, to say nothing of the threats from climate change and rising sea levels. Traditional methods are not enough to handle such tough problems, and AI is steadily emerging as one of the innovative approaches to agriculture. AI-powered solutions should not only enable farmers to produce more with fewer resources but also improve food quality and security for the consumer market.

AI’s Booming in Agriculture

The global market for AI in agriculture is expected to grow at a CAGR of 24.8% from 2020 to 2030, according to an industry report. At this rate, the market would rise from $852.2 million in 2019 to $8,379.5 million in 2030. At present, AI in agriculture is commonly used for precision farming, crop monitoring, soil management and agricultural robots, with more to come.

Take precision farming as an example: the comprehensive application of AI technologies such as machine learning, computer vision, and predictive analytics. It comprises the collection and analysis of farm-related data in order to help farmers make accurate decisions and increase the productivity of their farmland.

Dr. Yiannis Ampatzidis, an assistant professor of precision agriculture and machine learning at the University of Florida (UF), notes that ML applications are already at work in agriculture, including imaging, robotics, and big data analysis.

“In precision agriculture, AI is used for detecting plant diseases and pests, plant nutrition, and water management,” Ampatzidis says. He and his team at UF have developed the AgroView cloud-based technology that uses AI algorithms to process, analyze, and visualize data being collected from aerial- and ground-based platforms.

“The amount of these data is huge, and it’s very difficult for a human brain to process and analyze them,” he says. “AI algorithms can detect patterns in these data that can help growers make ‘smart’ decisions. For example, Agroview can detect and count citrus trees, estimate tree height and measure plant nutrient levels.” Ampatzidis believes AI holds great potential in the analytics of agricultural big data. AI is the key to unlocking the power of the massive amounts of data being generated from farms and agricultural research.


Stepping Stone: Labeled Data

A weeding robot makes real-time decisions, identifying crops and weeds through the close cooperation of built-in cameras, computer vision, machine learning and robotics. As the machine drives through the field, high-resolution cameras collect images at a high frame rate. A neural network analyzes each frame and produces a pixel-accurate map of where the crops and weeds are. Once the plants are identified, each weed and crop is mapped to its field location, and the robot sprays only the weeds. The entire process takes just milliseconds.
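The per-frame loop described above can be sketched roughly as follows. The segmentation model, camera frame, and sprayer here are placeholder stand-ins, since a real system would use a trained neural network and hardware drivers:

```python
# Schematic per-frame weeding loop: segment a frame into crop/weed
# regions, then spray only the pixels labeled as weeds.

def process_frame(frame, segment, spray):
    """Segment one camera frame and spray every region labeled 'weed'."""
    mask = segment(frame)                # pixel-accurate crop/weed map
    for (x, y), label in mask.items():
        if label == "weed":
            spray(x, y)                  # target only the weeds

# Toy stand-ins for the camera frame, segmentation model, and sprayer:
frame = "frame-0"
toy_mask = {(0, 0): "crop", (0, 1): "weed", (1, 1): "weed"}
sprayed = []
process_frame(frame, lambda f: toy_mask, lambda x, y: sprayed.append((x, y)))
print(sorted(sprayed))  # [(0, 1), (1, 1)]
```

The quality of the segmentation callback, which a real robot would learn from labeled field images, is what decides whether crops get sprayed by mistake.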

It is challenging to train the neural network models because many weeds look similar to crops. Traditionally, agronomists and weed scientists label millions of field images, but data labeling is arduous and time-intensive. ML models need to be fed high-quality labeled data in large quantities to be trained and to keep getting “smarter” automatically and constantly.

The high cost of gathering labeled data restrains AI’s application in agriculture. Aware of this dilemma, some data service companies, such as ByteBridge.io, provide premium-quality data collection and labeling services to bring AI applications to practical industries such as agriculture.

ByteBridge.io has made a breakthrough with its automated data collection and labeling platform, where agricultural researchers can create data annotation and collection projects by themselves. Since most of the data processing is completed on the platform, researchers can keep track of project progress, speed, quality issues, and even the cost required in real time, improving work efficiency and risk management in a transparent and dynamic way. They can upload raw data and download processed results through ByteBridge’s dashboard. ByteBridge.io has also utilized blockchain technology to make sure its labeled data service is cost-effective and productive.

Data is powerful, but labeling makes it useful: labeled data can be used to train machine learning models effectively. Furthermore, automated, AI-driven labeling platforms such as ByteBridge.io can help speed up the data labeling process and accelerate the development of an AI industry that aims to address real-world issues such as agriculture.

AI Creates More Jobs, but Without ByteBridge.io the Opportunity Is Conditional

The Rise of Machine Learning

Machine learning is actively creating job opportunities across the world. According to the World Economic Forum (WEF), Artificial Intelligence (AI) could create a net increase of 58 million new jobs by 2022. Beyond tech giants such as Apple, Facebook, Google and Amazon hiring machine learning engineers, other industries are increasingly leveraging this emerging technology at advanced levels. The huge demand for machine learning skills spans all kinds of fields, including but not limited to financial analysis, smart farming, health services and online education.

COVID-19’s AI Boom

During the current COVID-19 pandemic, the appeal of remote work is surging globally, and the need for intelligent machines in our workforce is growing with the rise of telecommuting. Especially when facing huge datasets, AI enables smooth cybersecurity checks, optimizes data retrieval, and fosters telework with significant gains in efficiency and time savings.

Apart from benefiting the remote workforce, this powerful emerging technology has been deployed to fight the virus. Microsoft AI, for instance, can be used for early detection of COVID-19 and for effectively allocating limited resources, such as medical supplies and hospital space, through smart decision-making.


The future of intelligent machines is promising. According to Acumen Research and Consulting, global investment in machine-learning-related products and services is expected to reach US$76.8 billion by 2026. The application of AI technology will soon break new ground as industry keeps fueling the world market.

Skilled-biased Opportunity

While AI is widely agreed to have great potential, the application of this emerging digital technology marks a major shift in the quality, location and requirements of new roles. Machine learning, a subfield of AI that automates analytical model building, is now increasingly adopted across industries. But not everyone stands to benefit automatically.

According to Uria-Recio’s TEDxIMU talk, AI will continuously push human professionals up the skillset ladder toward cognitive human skills. Process-oriented employment, i.e., jobs with repetitive activities such as machine operation, is declining; over the next decades, more than 80% of such work will be done by intelligent machines.


In the meantime, many of the job opportunities created involve cross-functional reasoning skills. As routine jobs are replaced by AI systems, businesses start looking for educated workers to fill new roles. Accordingly, “human-machine collaboration” favors applicants with advanced cognitive skills.

Workforce Transition

Given the increasing demand for creative and reasoning labor, job seekers should upgrade their skills to adapt to new opportunities. ByteBridge.io, a tech startup providing data training solutions, facilitates this workforce transition.

ByteBridge.io reduces the advanced cognitive skills required of labelers. By interpreting complicated annotation rules, organizing the model into multiple stages, and dividing a big task into small pieces, ByteBridge.io lowers the need for highly educated workers. The design of ByteBridge’s platform is clear and easy to use.

Moreover, unlike traditional machine learning companies that hire trained employees or managed teams for data labeling, ByteBridge.io incorporates blockchain technology into its data training solutions. ByteBridge’s algorithm borrows the idea of a consensus mechanism from cryptocurrency, distributing tasks to all users on the data platform.

ByteBridge.io replaces the technical quality checks of trained labelers with a general agreement system. The platform assigns several people to do the same work, and the correct answer is the one returned by the majority of labelers. A single task can be completed multiple times by different users. As a result, the process draws on the contributions of hundreds of thousands of participants who verify and authenticate the data labeling.

For businesses with data training needs, ByteBridge.io provides options for accuracy levels. A benchmark score measures the consistency among users: a score of 75 indicates that 75% of users agree the label is correct. Higher benchmark scores therefore improve the accuracy of the data labeling task, implying better data quality. The consensus mechanism greatly improves distribution efficiency, and customers can get a large amount of accurate data in a very short time.
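The majority-vote scoring described above can be sketched in a few lines; the function name and labels are illustrative, not ByteBridge's actual code:

```python
# Majority-vote quality check: the accepted label is the one most users
# return, and the agreement score is the share of users who agree with it.
from collections import Counter

def majority_label(answers):
    """Return (winning label, agreement score) for one task."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)

# Three of four labelers say "dog": the label wins with a score of 0.75,
# corresponding to a benchmark score of 75.
label, score = majority_label(["dog", "dog", "dog", "cat"])
print(label, score)  # dog 0.75
```

Raising the required benchmark score means more labelers must agree before an answer is accepted, trading throughput for accuracy.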


Addressing Worldwide Technological Unemployment

ByteBridge.io not only provides work locally but is also tackling the skill-biased machine learning revolution for individuals around the world. With more than 100,000 registered users across Asia, North America, the EU, and Africa, ByteBridge.io offers tens of millions of online job opportunities based on big data and recommendation services.

Importantly, ByteBridge.io benefits business customers as well. This worldwide data factory “hires” all kinds of workers; based on education level, languages spoken, and competency-based assessment scores, the workforce can cover a wide range of customer needs.

ByteBridge.io, as an intermediary data solution provider, bridges the mass of new advanced roles with less-skilled workers, bringing work opportunities around the globe.

How Data Labeling Contributes to the War against Covid-19

The healthcare industry is under enormous pressure, especially in the midst of the Covid-19 period. The unexpected global pandemic has presented overwhelming challenges for human beings. Scientists, medical experts, doctors and nurses across the globe have taken up the responsibility to fight the disease. However, with a shortage of healthcare workers, we still cannot deny how limited current medical capacity is.

On December 30, 2019, HealthMap, an artificial intelligence (AI) data-driven system that scans data sources for signs of disease outbreaks, detected unusual activity around a new type of pneumonia emerging in China. One day later, BlueDot, an AI outbreak risk software, raised a similar alarm after scanning thousands of Chinese news reports with its machine learning algorithms.

There’s no doubt that Covid-19 has been a catalyst for strengthening the increasing connection and cooperation between AI and healthcare industry.

Medical image diagnosis for future healthcare

AI and ML can be powerful tools for nearly everything in healthcare: medical research, diagnosis, disease prevention and control, patient treatment, and even administrative and personnel management. AI/ML-enabled systems improve capability and effectiveness by automating the most repetitive and homogeneous activities, and the technology is now moving out of the lab and into real-world applications in the health sector.

When it comes to medical images, ML’s applications cover the entire cycle, from image creation and reconstruction to diagnosis and outcome prediction. AI-backed machines use computer vision to detect patterns the human eye cannot catch, correlate them with similar medical image data to identify possible diseases, and prepare reports after analysis. X-ray, computed tomography (CT), magnetic resonance imaging (MRI), and other image-based test reports can be screened to predict various illnesses in an automated, accurate, and fast way.

Some healthcare companies now use ML technology to detect organ anomalies, such as identifying tumors in an MRI scan of the brain, training their algorithms on millions of labeled medical images that mark the affected areas. For example, semantic segmentation can be used in liver and brain diagnosis; polygon annotation in dentistry; bounding boxes for kidney stones; and annotation for cancer-cell detection. Medical image annotation yields greater accuracy in the early detection, diagnosis, and treatment of disease, as well as in understanding normal anatomy. Medical imaging diagnosis is seen as a powerful method for future applications in the health sector.
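The annotation types just mentioned differ mainly in the geometry they record. A minimal sketch of what such annotation records might look like is shown below; the field and class names are hypothetical, intended only to illustrate how a bounding box differs from a polygon mask on a labeled medical image.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BoundingBox:
    # Axis-aligned box in pixel coordinates, e.g. around a suspected kidney stone
    x_min: int
    y_min: int
    x_max: int
    y_max: int

@dataclass
class PolygonMask:
    # Closed polygon outlining an irregular region, e.g. a tumor boundary on an MRI slice
    points: List[Tuple[int, int]]

@dataclass
class ImageAnnotation:
    image_id: str
    label: str                                   # clinical category assigned by the annotator
    boxes: List[BoundingBox] = field(default_factory=list)
    masks: List[PolygonMask] = field(default_factory=list)

# One labeled MRI slice with a single box around the region of interest
ann = ImageAnnotation(
    image_id="mri_0001",
    label="glioma",
    boxes=[BoundingBox(120, 80, 210, 160)],
)
print(ann.label, len(ann.boxes))  # glioma 1
```

Boxes are fast to draw and suit roughly rectangular findings; polygon masks cost far more annotator time but capture irregular boundaries, which is why segmentation-grade labels are so much more expensive to produce.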

Bottlenecks of medical image labeling

High-quality training data is the key to building ML models and helps improve medical image-based diagnosis. A great challenge in this field, however, is the lack of high-quality data and annotation. Specifically, medical imaging annotation must be performed by clinical specialists, which is costly and time-consuming.


As DJ Patil and Hilary Mason write in Data Driven, “Cleaning the data is often the most taxing part of data science, and is frequently 80% of the work.” The lack of precise, high-quality data presents an overwhelming challenge for the machine learning industry, limiting its ability to provide the “right data” to answer specific questions. Currently, most medical research organizations have access only to limited data samples from certain geographic areas.

The hardest part of building AI products is not the AI or the algorithms but data preparation and labeling. For example, retinal images are used to develop automated diagnostic systems for conditions such as diabetic retinopathy and age-related macular degeneration. To do that, millions of medical images need to be labeled structurally by condition. This is laborious, as it requires identifying very small structures, and experts usually take hours to annotate each image carefully.

Turning points

Aware of these challenges, ByteBridge.io takes a big step forward with its automated data collection and labeling platform, which gives researchers access to high-quality labeled datasets related to healthcare and public health.

ByteBridge’s innovative data training platform empowers healthcare researchers and ML medical companies to use data cost-effectively and improve healthcare outcomes. From data collection to data labeling to machine learning applications, ByteBridge.io provides professional data annotation services for medical images with the highest quality and maximum accuracy.

Unlike traditional data labeling companies, ByteBridge’s dashboard lets researchers create data projects themselves, upload raw data, download processed results, and check ongoing labeling progress, all on a pay-per-task model with clear time estimates and more control over project status.

Compared with existing Western data annotation outsourcers, ByteBridge.io charges 90% less, and its prices are 50% lower than those of competitors in China and India. Moreover, ByteBridge’s data processing is more than ten times faster than that of current data annotation companies.

“I believe that we can achieve great innovation in this field based on our product development capabilities and underlying blockchain-based technology. ByteBridge.io is aimed at accelerating the development of ML industry and seamlessly transforming it into other essential areas such as healthcare,” said Brian Cheong, CEO of ByteBridge.io.

Imagine a day when patients can simply undergo a fast AI scan for diagnosis; when smart wearable devices such as the Apple Watch can analyze physical data, flag abnormalities, and raise an alarm before a heart attack or stroke; when medical detection and prediction are fully automated and supervised with little human intervention. Thanks to ML and AI technology, such scenes can be realized in the near future.

Machine learning has achieved unprecedented success in computer vision and other fields so far, and it is now drastically revolutionizing healthcare with indispensable support from automated data labeling services.
