
Monday, February 15, 2021

No Bias Labeled Data — the New Bottleneck in Machine Learning

 

The Performance of an AI System Depends More on the Training Data Than the Code


Over the last few years, there has been a burst of excitement for AI-based applications across businesses, governments, and the academic community. Computer vision and natural language processing (NLP), for example, are areas where output values are high-dimensional and high-variance, and where machine learning techniques are highly useful.

Indeed, AI depends more on the training data than the code. “The current generations of AI are what we call machine learning (ML) — in the sense that we’re not just programming computers, but we’re training and teaching them with data,” said Michael Chui, McKinsey Global Institute partner, in a podcast interview.

AI feeds heavily on data. Andrew Ng, former AI head at Google and Baidu, calls data the rocket fuel needed to power the ML rocket ship. He also notes that companies and organizations that take AI seriously are eager to acquire the right and useful data. Moreover, as the number of parameters and the complexity of problems increase, the need for high-quality data at scale grows exponentially.


Data Ranks the Second-Highest Obstacle in AI Adoption

An Alegion survey reports that nearly 8 out of 10 enterprises currently engaged in AI and ML projects say those projects have stalled. The research also reveals that 81% of respondents admit the process of training AI with data is more difficult than they expected.

This is not an isolated finding. According to a 2019 report by O’Reilly, data issues rank as the second-highest obstacle to AI adoption. Gartner predicted that 85% of AI projects will deliver erroneous outcomes due to bias in labeled data, algorithms, the R&D team’s management, and other factors.

The data limitations in machine learning include, but are not limited to:

Data Collection: Issues such as inaccurate data, insufficient representativeness, biased views, loopholes, and data ambiguity affect ML’s decisions and precision. During Covid-19 in particular, certain data has not been available to some AI enterprises.

Data Quality: Since most machine learning algorithms use supervised approaches, ML engineers need consistent, reliable data in order to create, validate, and maintain high-performing machine learning models in production. Low-quality labeled data can actually backfire twice: first during model training and again during future decision-making.

Efficiency: In a typical machine learning project, 25% of development time is spent on data annotation, while only 5% is spent on training algorithms. Data labeling consumes so much time for the following reasons:

  • Algorithm engineers need repeated tests to determine which labeled data is most suitable for training the algorithm.
  • Training a model needs tens of thousands or even millions of training examples, which takes a lot of time. For example, an in-house team of 10 labelers and 3 QA inspectors can complete around 10,000 autonomous-driving lane image annotations in 8 days.
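As a sanity check on the throughput figure above, here is a quick back-of-the-envelope calculation (assuming the 10 labelers work in parallel and QA does not add wall-clock time):

```python
# Rough throughput implied by the example above: 10 labelers,
# 10,000 lane images, 8 days (QA assumed to run concurrently).
labelers = 10
images = 10_000
days = 8

per_labeler_per_day = images / (labelers * days)
print(per_labeler_per_day)  # 125.0 images per labeler per day
```

At roughly 125 images per labeler per day, it is easy to see why annotation dominates project timelines once datasets reach millions of examples.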

How can sample bias be avoided while obtaining large-scale data?

Solution

Accuracy

When dealing with complex tasks, each task is automatically broken into tiny components to keep quality as high as possible and maintain consistency.

All work results are fully screened and inspected by both the machine and the human workforce.

Efficiency

Real-time QA and QC are integrated into the labeling workflow.

ByteBridge takes full advantage of the platform’s consensus mechanism which greatly improves the data labeling efficiency and gets a large amount of accurate data labeled in a short time.

Consensus — the same task is assigned to several workers, and the correct answer is the one returned by the majority.
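The majority-vote rule described above can be sketched in a few lines of Python (a simplified illustration, not ByteBridge’s actual implementation):

```python
from collections import Counter

def consensus(answers):
    """Return the answer given by the most workers (simple majority vote)."""
    (winner, _count), = Counter(answers).most_common(1)
    return winner

# Five workers label the same image; the majority answer wins.
print(consensus(["car", "car", "truck", "car", "bus"]))  # -> car
```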

Ease of use

The easy-to-integrate API enables the continuous feeding of high-quality data into a new application system.


End

“We have streamlined data collection and labeling process to relieve machine learning engineers from data preparation. The vision behind ByteBridge is to enable engineers to focus on their ML projects and get the value out of data,” said Brian Cheong, CEO of ByteBridge.

Both the quality and quantity of data matter for the success of an AI outcome. Designed to power the AI and ML industry, ByteBridge promises to usher in a new era for data labeling and collection, and to accelerate the advent of the smart AI future.

Sunday, December 13, 2020

Data Annotation Market sees tremendous growth in the coming years


The global data annotation market was valued at US$695.5 million in 2019 and is projected to reach US$6.45 billion by 2027, according to a ResearchAndMarkets report. Growing at an expected CAGR of 32.54% from 2020 to 2027, the data annotation market is booming.

This is not a surprising trend. The rapid growth of the data labeling industry boils down to the rising integration of machine learning into various industries.

                   (Source: Statista)

Unlabeled raw data exists all around us: emails, documents, photos, presentation videos, and speech recordings. The majority of machine learning algorithms today need labeled data in order to learn and train themselves. Data labeling is the process in which annotators manually tag various types of data, such as text, video, images, and audio, through computers or smartphones. Once finished, the manually labeled dataset is packaged and fed into algorithms for model training.


However, data annotation itself is a laborious and time-consuming process. There are two ways to run data labeling projects. One is to do the work in-house, which means the company builds or buys labeling tools and hires an in-house labeling team. The other is to outsource the work to established data labeling companies such as Appen and LionBridge, which can handle production scale and guarantee quality.

The booming data annotation market has also stimulated multiple new players to secure niche positions in the competition. For example, Playment, a data labeling platform for AI, teamed up with Ouster, a leading LiDAR sensor provider, in 2018 to annotate and calibrate 3D imagery captured by its sensors.

ByteBridge.io, a data labeling platform, has innovated the industry through its robust tools for real-time workflow management. On ByteBridge’s platform, developers can define and start data labeling projects and get the results back instantly. This not only improves efficiency dramatically but also allows clients to customize tasks based on their needs. As a fully managed platform, it enables developers to manage and monitor the overall data labeling process and provides an API for data transfer. The platform also allows users to audit the data quality.

“High-quality data is the fuel that keeps the AI engine running smoothly and the machine learning community can’t get enough of it. The better the annotation, the more accurate the algorithm’s results become. Properly annotating data paves the way for efficient machine learning,” said Brian Cheong, founder and CEO of ByteBridge.io.

Designed to empower the AI and ML industry, ByteBridge.io promises to usher in a new era for data labeling and to accelerate the advent of the smart AI future.

Sunday, November 29, 2020

By Typing Captcha, you are Actually Helping AI’s Training

Living in the Internet age, how often have you come across tricky CAPTCHA tests while entering a password or filling out a form to prove that you’re fully human? For example, typing the letters and numbers of a warped image, rotating objects to certain angles, or moving puzzle pieces into position.


What is CAPTCHA and how does it work?

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart; it filters out the overwhelming armies of spambots. Researchers at Carnegie Mellon University developed CAPTCHA in the early 2000s. Initially, the program displayed garbled, warped, or distorted text that a computer could not read but a human could. Users were asked to type the text into a box to gain access to the website.

The program achieved wild success, and CAPTCHA has grown into a ubiquitous part of the internet user experience. Websites need CAPTCHAs to block the “bots” of spammers and other computer underworld types. “Anybody can write a program to sign up for millions of accounts, and the idea was to prevent that,” said Luis von Ahn, a pioneer of the early CAPTCHA team and founder of Google’s reCAPTCHA, one of the biggest CAPTCHA services. The little puzzles work because computers are not as good as humans at reading distorted text. Google says that people solve 200 million CAPTCHAs a day.

Over the past years, Google’s reCAPTCHA button saying “I’m not a robot” was followed by more complicated challenges, such as selecting all the traffic lights, crosswalks, and buses in an image grid. The images soon became increasingly obscured to stay ahead of improving optical character recognition programs in the arms race with bot makers and spammers.


CAPTCHA’s potential influence on AI

While used mostly for security reasons, CAPTCHAs also serve as a benchmark task for artificial intelligence technologies. According to “CAPTCHA: Using Hard AI Problems for Security” by von Ahn, Blum, Hopper, and Langford, “any program that has high success over a captcha can be used to solve a hard, unsolved Artificial Intelligence (AI) problem. CAPTCHAs have many applications.”

Starting in 2011, reCAPTCHA digitized the entire Google Books archive and 13 million articles from The New York Times catalog, dating back to 1851. After finishing that task, it began selecting snippets of photos from Google Street View in 2012, asking users to recognize door numbers and other signs and symbols. From 2014, the system started training its artificial intelligence (AI) engines.

The warped characters users identify and fill in for reCAPTCHA serve a bigger purpose: users have unknowingly transcribed texts for Google. The system shows the same content to several users across the world and automatically verifies whether a word has been transcribed correctly by comparing the results. Clicks on blurry images likewise help identify objects that computing systems fail to recognize; in this process, internet users are sorting and classifying images to train Google’s AI engine.

Through such mechanisms, Google has been able to give back to users through better image recognition, better Google Search results, and improved Google Maps data.

ByteBridge: an automated data annotation platform to empower AI

Turing Award winner Yann LeCun once noted that developers need labeled data to train AI models, and that more quality-labeled data yields more accurate AI systems, from both a business and a technology perspective.

Facing this AI blue ocean, a large number of data providers have poured in. ByteBridge.io has made a breakthrough with its automated data labeling platform, built to empower data scientists and AI companies effectively.

With a completely automated data service system, ByteBridge.io has developed a mature and transparent workflow. In ByteBridge’s dashboard, developers can create projects by themselves and check ongoing progress in real time on a pay-per-task model with clear estimated times and prices.

ByteBridge.io focuses on application scenarios such as autonomous driving, retail, agriculture, and smart households. It is dedicated to providing the best data solutions for AI development and unleashing the real power of data. “We focus on addressing practical issues in different application scenarios for AI development through one-stop, automated data services. The data labeling industry should take technology-driven tools as its core competitiveness,” said Brian Cheong, CEO and founder of ByteBridge.io.

As a rare and precious resource, data needs to be collected, cleaned, and labeled before it becomes valuable. ByteBridge.io has recognized the power of data and aims to provide the best data labeling service to accelerate the development of AI.

Thursday, November 12, 2020

How Data Training Accelerates the Implementation of AI in the Medical Industry

COVID-19 has undoubtedly accelerated the application of AI in healthcare, such as virus surveillance, diagnosis, and patient risk assessment. AI-powered drones, robots, and digital assistants are improving the healthcare industry with better accuracy and efficiency. These have enabled doctors to provide more effective and personalized treatment with real-time data monitoring and analysis.

Garbage in, garbage out

As one of the most popular and promising subsets of AI, machine learning gives algorithms the ability to "learn" from training data, identifying patterns and making decisions with little human intervention. However, as the saying goes, "garbage in, garbage out": ensuring that correct data is fed into ML algorithms is not easy work.

According to a report, "the Digital Universe Driving Data Growth in Healthcare," published by EMC with research and analysis from IDC, hospitals produce 50 petabytes of data per year. Almost 90% of this data consists of medical imaging, i.e., digital images from scans such as MRIs or CTs. However, more than 97% of this data goes unanalyzed or unused.

Unstructured raw data needs to be labeled for computer vision so that when it is fed into an algorithm to train an ML model, the algorithm can recognize and learn from it. As DJ Patil and Hilary Mason write in Data Driven, "cleaning and labeling the data is often the most taxing part of data science, and is frequently 80% of the work."

Many enterprises wish to apply AI to their business practices. They have a glut of data, such as vast quantities of camera images and document text. The challenge, however, is how to process and label that data to make it useful and productive. Many organizations are struggling to get AI and ML projects into production due to data labeling limitations and a lack of real-time validation.

A robust data labeling platform with real-time monitoring and high efficiency

An entire ecosystem of tech startups has emerged to support the data labeling process. Among them, ByteBridge.io, a data labeling platform, addresses the challenge with robust tools for real-time workflow management and automated data labeling operations. Aiming to increase flexibility, quality, and efficiency across the data labeling industry, it specializes in high volumes, high variance, and complex data, and provides a full-stack solution for AI companies.

"On the dashboard, users can seamlessly manage all projects with powerful tools in real time to meet their unique requirements. The automated platform ensures data quality, reduces the challenge of workforce management, and lowers costs with transparent, standardized pricing," said Brian Cheong, CEO and founder of ByteBridge.io.

The quality of a labeled dataset determines the success of an AI project, making it vital to look for a reliable platform that can help developers overcome data labeling challenges. Demand for data labeling will continue to rise with the development of AI programs.

Human beings benefit from the implementation of AI systems in the medical industry: from diagnosis to treatment, from drug experiments to general use. These are all exciting areas for AI developers. But before that, providing high-quality training data lays the cornerstone for making that progress.

Tuesday, November 3, 2020

How an Automated Data Labeling Platform Accelerates AI industry’s Development During COVID-19

 The impact of AI on COVID-19 has been widely reported across the globe, yet the impact of COVID-19 on AI has not received much attention. As a direct result of Covid-19, AI enterprises are enhancing their strategies for digital transformation and business automation.

Data is the core of any AI/ML development, and the quality and depth of data determine the level of AI applications. The better the data that goes into building the ML training model, the better the output, so ML teams need to go through proper data preparation: data collection, cleansing, and labeling.

Data labeling is a simple but difficult task

Data labeling is the essential step of processing raw data (images, text files, videos, etc.) for computer vision so that machine learning models can learn from the labeled dataset. Due to the pandemic, some data labeling companies were forced to move to a work-from-home model, which has posed challenges in communication, data quality, and inspection. For example, Google Cloud officially announced that its data labeling services are limited or unavailable until further notice: users can only request data labeling tasks through email and cannot start new tasks through the Cloud Console, the Google Cloud SDK, or the API.

Insiders say that data labeling is a simple but difficult task. On one hand, once the labeling standard is set, data labelers just need to follow the rules with patience and professionalism. On the other hand, data labeling must deliver the high quality that ML demands, which requires accuracy and efficiency and carries a high cost in labor and time given the massive amount of data to be labeled.

A majority of AI organizations say the process of training AI with data has been more difficult than expected, according to a report released by Alegion. Lack of data and data quality issues are their main obstacles to AI application.

An automated data labeling platform aims to transform the industry

To deal with such issues, Bytebridge.io launched its automated data labeling platform this year. It aims to provide high-quality data efficiently through real-time workflow management, freeing AI developers from the pressure of data preparation.

An autonomous driving company in Korea needed to label roadblocks and 2D bounding boxes for cars. For data security, it had built an in-house labeling team, but it ran into unexpected problems due to improper labeling tools and low efficiency. After adopting Bytebridge, its project managers were able to improve working efficiency through Bytebridge’s online real-time monitoring function: the number of monthly labeled images increased from 600k to 750k, and the company saved 60% of its budget.

On Bytebridge’s dashboard, developers can upload raw data and create labeling projects by themselves. They can check labeling status and quality at any time, along with the estimated price and time required. Such an automated, online platform greatly improves labeling efficiency and quality. Bytebridge’s easy-to-integrate API enables continuous feeding of high-quality data into machine learning systems, and data can be processed 24/7 by global contractors, in-house experts, and AI technology.

“We want to create an automated data labeling platform that helps AI/ML companies to accelerate their data project and generate high-quality work,” said Brian Cheong, CEO and founder of Bytebridge.io.

Monday, September 28, 2020

Data matters for machine learning, but how to acquire the right data?

Over the last few years, there has been a burst of excitement for AI-based applications across businesses, governments, and the academic community. Natural language processing (NLP) and image analysis, for example, where input values are high-dimensional and high-variance, are areas where deep learning techniques are highly useful. AI has shifted from algorithms that rely on programmed rules and logic to machine learning, where algorithms contain few rules and ingest training data to learn and train themselves. "The current generations of AI are what we call machine learning (ML) — in the sense that we’re not just programming computers, but we’re training and teaching them with data,” said Michael Chui, McKinsey Global Institute partner, in a podcast interview.


AI feeds heavily on data. Andrew Ng, former AI head at Google and Baidu, calls data the rocket fuel needed to power the ML rocket ship. He also notes that companies and organizations that take AI seriously work hard to acquire the right and useful data. Supervised learning needs more data than other model types in machine learning: algorithms learn from labeled data, so data must be labeled and categorized for training models. As the number of parameters and the complexity of problems increase, the need for data volume grows exponentially.
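To make the point about labeled data concrete, here is a toy supervised learner (a 1-nearest-neighbor classifier on made-up data, purely illustrative): the model can only ever learn the mapping its labels encode, which is why label quality matters so much.

```python
def nearest_neighbor_predict(labeled_data, point):
    """Classify `point` with the label of its closest training example."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(labeled_data, key=lambda ex: dist2(ex[0], point))
    return label

# Labeled training data: (features, label) pairs produced by annotators.
training = [((0.0, 0.0), "cat"), ((0.1, 0.2), "cat"),
            ((1.0, 1.0), "dog"), ((0.9, 1.1), "dog")]

print(nearest_neighbor_predict(training, (0.05, 0.1)))  # -> cat
```

If an annotator had mislabeled the nearby training examples, the prediction would flip, which is exactly the "backfire" the text describes.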




Data limitations: the new bottlenecks in machine learning


An Alegion survey reported that nearly 8 out of 10 enterprises currently engaged in AI and ML projects say those projects have stalled. The same study also revealed that 81% of respondents admit the process of training AI with data is more difficult than they expected. According to a 2019 report by O’Reilly, data issues rank as the second-highest obstacle to AI adoption. Gartner predicted that 85% of AI projects will deliver erroneous outcomes due to bias in data, algorithms, team management, and other factors. The data limitations in machine learning include, but are not limited to:


  • Data collection. Issues like inaccurate data, insufficient representativeness, biased views, loopholes, and ambiguity in data affect ML’s decisions and precision, let alone the difficulty of accessing large volumes of high-quality datasets for model development, especially during Covid-19, when data has not been available to some AI enterprises.
  • Data quality. Low-quality labeled data can actually backfire twice: first during model building and again when the model consumes the labeled data to make future decisions. For example, popular face datasets, such as the AT&T Database of Faces, contain primarily light-skinned male images, which leaves systems struggling to recognize dark-skinned and female faces. To create, validate, and maintain high-performing machine learning models in production, ML engineers need to use trusted, reliable data.
  • Data labeling. Since most machine learning algorithms use supervised approaches, data is useless for ML applications that rely on computer vision and supervised learning unless it is labeled properly. The new bottleneck in machine learning is no longer just the collection of qualified data, but also the speed and accuracy of the labeling process.


Solution


ML needs vast amounts of labeled, high-quality data for model training to arrive at accurate predictions. Labeling training data is progressively one of the primary concerns in the implementation of machine learning algorithms, and AI companies are eager to acquire high-quality labeled datasets that match their AI model requirements. One option is ByteBridge.io, a data collection and labeling platform that lets users train state-of-the-art machine learning models without manually marking any training data themselves. ByteBridge.io's datasets include diverse and rich data such as text, images, audio, and video, with full coverage of languages, races, and regions across the globe. Its integrated data platform eliminates intermediate processes such as recruiting human-in-the-loop labor, testing, verification, and so forth.


Automated data training platform


ByteBridge.io takes full advantage of the platform's consensus mechanism, which greatly improves data labeling efficiency and gets a large amount of accurate data labeled in a short time. The Data Verification Engine, equipped with advanced AI algorithms, and the well-developed project management dashboard have automated the annotation process, meeting the needs and standards of AI companies in a flexible and effective way.


“We believe data collection and labeling is a crucial factor in establishing successful machine learning models. We are committed to building the most effective data training platform and helping companies take full advantage of AI's capabilities,” said Brian Cheong, CEO of ByteBridge.io. “We have streamlined data collection and labeling process to relieve machine learning engineers from data preparation. The vision behind ByteBridge.io is to enable engineers to focus on their ML projects and get the value out of data.”


Compared with competitors, ByteBridge.io has customized its automated data labeling system thanks to natural language processing (NLP)-enabled software. Its easy-to-integrate API enables continuous feeding of high-quality data into a new application system.


Both the quality and quantity of data matter for the success of an AI outcome. Designed to power the AI and ML industry, ByteBridge.io promises to usher in a new era for data labeling and collection, and to accelerate the advent of the smart AI future.

Monday, September 21, 2020

Can AI Address Real World Issues, such as Agriculture?

 To quote a classic paper titled “Machine Learning that Matters” by NASA computer scientist Kiri Wagstaff: “Much of current machine learning (ML) research has lost its connection to problems of import to the larger world of science and society.”

Ordinary people who are not familiar with AI and ML may consider them fictional, but their applications are stepping out of the science community to address real-life issues.

According to the UN Food and Agriculture Organization (FAO), the worldwide population will increase to 10 billion by 2050. However, only 4% additional land will come under cultivation by then, let alone the threats from climate change and rising sea levels. Traditional methods are not enough to handle these tough problems, and AI is steadily emerging as one of the innovative approaches to agriculture. AI-powered solutions should not only enable farmers to produce more with fewer resources but also improve food quality and security for the consumer market.

AI’s Booming in Agriculture

The global AI-in-agriculture market is expected to grow at a CAGR of 24.8% from 2020 to 2030, according to an industry report. At this rate, the market would rise from $852.2 million in 2019 to $8,379.5 million in 2030. At present, AI in agriculture is commonly used for precision farming, crop monitoring, soil management, and agricultural robots, with more to come.

Take precision farming as an example: the comprehensive application of AI technologies such as machine learning, computer vision, and predictive analytics tools. It comprises farm-related data collection and analysis to help farmers make accurate decisions and increase the productivity of farmland.

Dr. Yiannis Ampatzidis, an assistant professor of precision agriculture and machine learning at the University of Florida (UF), notes that ML applications are already at work in agriculture, including imaging, robotics, and big data analysis.

“In precision agriculture, AI is used for detecting plant diseases and pests, plant nutrition, and water management,” Ampatzidis says. He and his team at UF have developed the AgroView cloud-based technology that uses AI algorithms to process, analyze, and visualize data being collected from aerial- and ground-based platforms.

“The amount of these data is huge, and it’s very difficult for a human brain to process and analyze them,” he says. “AI algorithms can detect patterns in these data that can help growers make ‘smart’ decisions. For example, Agroview can detect and count citrus trees, estimate tree height and measure plant nutrient levels.” Ampatzidis believes AI holds great potential in the analytics of agricultural big data. AI is the key to unlocking the power of the massive amounts of data being generated from farms and agricultural research.


Stepping Stone: Labeled Data

A weeding robot makes real-time decisions to identify crops and weeds through the close cooperation of built-in cameras, computer vision, machine learning, and robotics. As the machine drives through the field, high-resolution cameras collect images at a high frame rate. A neural network analyzes each frame and produces a pixel-accurate map of where the crops and weeds are. Once the plants are identified, each weed and crop is mapped to a field location, and the robot sprays only the weeds. The entire process takes just milliseconds.
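The per-frame loop described above can be sketched as follows; `segment_frame` is a hypothetical stand-in for the robot's trained neural network, shown here only to illustrate the frame-to-spray-target pipeline:

```python
def segment_frame(frame):
    """Stub for the neural network: assigns a class to every pixel."""
    return [["weed" if px > 0.5 else "crop" for px in row] for row in frame]

def weed_targets(frame):
    """Turn one camera frame into the pixel coordinates the sprayer should hit."""
    mask = segment_frame(frame)
    return [(r, c) for r, row in enumerate(mask)
            for c, cls in enumerate(row) if cls == "weed"]

frame = [[0.1, 0.9], [0.2, 0.3]]   # toy 2x2 "image"
print(weed_targets(frame))          # -> [(0, 1)]
```

In the real system the stub would be a trained segmentation model, and the quality of its output depends directly on the labeled field images discussed next.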

It is challenging to train the neural network models because many weeds look similar to crops. Traditionally, agronomists and weed scientists label millions of field images. However, data labeling is arduous and time-intensive, and ML models need to be fed labeled data of high quality and quantity to keep getting trained, or "smarter," automatically and constantly.

The high cost of gathering labeled data restrains AI's application in agriculture. Aware of this dilemma, some data service companies, such as ByteBridge.io, provide premium-quality data collection and labeling services to bring AI applications to practical industries such as agriculture.

ByteBridge.io has made a breakthrough on its automated data collection and labeling platform where agricultural researchers can create the data annotation and collection projects by themselves. Since most of the data processing is completed on the platform, researchers can keep track of the project progress, speed, quality issues, and even cost required in real-time, thereby improving work efficiency and risk management in a transparent and dynamic way. They can upload raw data and download processed results through ByteBridge’s dashboard. Not only that, ByteBridge.io has utilized blockchain technology to make sure the labeled data service is cost-effective and productive.

Data is powerful, but labeling makes it useful: labeled data can be used to train machine learning models effectively. Furthermore, automated, AI-driven labeling platforms such as ByteBridge.io can help speed up the data labeling process and accelerate the development of an AI industry that aims to address real-world issues such as agriculture.

An International Online Factory Brings High-tech Innovation into Pig Farming Industry

Woodrow Wilson Bledsoe, known as the father of facial recognition, developed a system in the 1960s that could read faces using a 10-inch-square tablet with vertical and horizontal coordinates. Over the past 60 years, countries across the world have substantially increased their investment in facial recognition systems. Today, programmers extend facial intelligence to the livestock farming industry, using facial recognition technology to assess the emotional well-being of pigs.

Pig farms benefit significantly from this modern technique. With machine learning technology, each piglet's health condition can be accurately tracked from birth by interpreting its facial expressions. The system also improves pigs' health by monitoring their daily and total feed consumption individually.

Alibaba, China’s e-commerce giant, has recently worked on automatic identification of pig faces, which can also be used for diagnosing breeding status and detecting diseases. Last year, Scotland’s Rural College (SRUC) used convolutional neural networks to analyze pig emotion and intention. Increasing numbers of farms around the world now use high-tech equipment to record pigs’ actions with a precision that can exceed manual performance. Note, however, that smart farming relies on a massive database with intensive support from machine learning techniques.

Data support is the primary condition for smart farming

The key to smart farming is the big-data support team behind it. Recently, a Korean pig farm was looking for a digital system to gain information on pigs’ productivity, behavior, and welfare, and hired Bytebridge.io’s team to improve farming efficiency.

“The smart system should be able to reflect every pig’s health condition by tracking its feeding patterns and behaviors. We were looking for a data annotation company to process the data structurally according to different machine languages. The tricky part is, we set a very strict time limit for the team. We needed the labeling to be done as soon as possible,” said the owner of the pig farm.


“Surprisingly, Bytebridge.io perfectly resolved this problem and improved our system. After handing over millions of images, we received their package even sooner than we expected. We got our data labeled within 3 working days.”

Traditional data labeling companies, after receiving similar projects, would employ an agent to call up a tagging team and train them for at least a day based on the customer’s requirements; the communication cost in this process can be significant. Bytebridge.io, on the other hand, hugely cuts the time and cost. Its output labeling accuracy reaches 99.5% (i.e., over 995 out of 1,000 pictures are correctly labeled), and it processes data in one-tenth the time of traditional data labeling companies.

Online data processing factory

Bytebridge.io has millions of registered users all over the world, with daily active users reaching up to 100,000. According to customers’ needs, the platform divides and distributes tasks to global users, building an online data processing factory. All users are grouped into different levels based on their education, language, and task-capability coefficient, and customers can optimize their cost by picking among those options.

Consensus algorithm

To cut communication and training costs when dealing with complex task flows, Bytebridge.io employs consensus decision-making to optimize the labeling system. For complex tasks, the platform reduces task difficulty by splitting the task flow and then sets a consensus index to unify the results through algorithmic rules.

In the pig farm project, the final delivered data is structured: the number, position, and posture of the pigs in each picture. The task flow can therefore be divided into sub-tasks: counting pigs, drawing boxes around pigs, and interpreting posture.

Before task distribution, a consensus index, such as 90%, is set for the task. If 90% of the workers’ answers are essentially the same, the system judges that they have reached a consensus. If customers require the highest accuracy of data annotation, they can use “multi-round consensus” to repeat tasks and improve the accuracy of the final data delivery.
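Under the semantics described above (assumed here: an answer is accepted only when its share of workers meets the consensus index, and a failed round is simply rerun), the rule might look like this sketch:

```python
from collections import Counter

def reach_consensus(answers, index=0.9):
    """Return the top answer if its share of workers meets `index`, else None."""
    (top, count), = Counter(answers).most_common(1)
    return top if count / len(answers) >= index else None

round1 = ["5 pigs"] * 8 + ["6 pigs"] * 2   # 80% agreement: below the index
round2 = ["5 pigs"] * 9 + ["6 pigs"]       # 90% agreement: consensus reached
print(reach_consensus(round1))  # -> None
print(reach_consensus(round2))  # -> 5 pigs
```

A `None` result would trigger another round of workers, which is the "multi-round consensus" option mentioned above.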

Bytebridge.io, an innovative data training platform, replaces the traditional model with an innovative and precise consensus-algorithm mechanism. It runs a powerful online data processing platform operating efficiently around the globe. By connecting the international fragmented labor force for data labeling and collection and promoting labor-structure transformation, Bytebridge.io develops a new model for the data labeling industry.

Meanwhile, data processing efficiency improves while data security is ensured through its API. By building an online data processing factory and replacing manual audits, Bytebridge.io provides revolutionary data solutions for all industries.
