ByteBridge data labeling outsourced service: get your ML training datasets cheaper and faster!: face recognition

Monday, November 9, 2020

The Human-power Behind AI: Machine Learning Needs Annotators

“The global data collection and labeling market size was valued at USD 1.0 billion in 2019 and is expected to witness a CAGR of 26.0% from 2020 to 2027,” according to a market analysis report by Grand View Research.

Artificial intelligence is being applied in a growing range of scenarios, and these applications are changing our lives by providing automated, smart services. Behind the rapid growth of the AI industry, the new profession of data annotator is also expanding. A popular saying in the data annotation industry goes, “the more intelligence, the more labor”: the data that AI algorithms learn from must be annotated piece by piece by human annotators.

These annotation workers don’t need to leave their homes. They can be trained to categorize and annotate data for machine learning on various platforms, such as CloudFactory, Labelbox, and Bytebridge.io, all of which let annotators work remotely from any location. Through the hard work of these distributed annotators, who in effect become “AI trainers,” machines can quickly learn to recognize text, pictures, videos, and other content.

Machine learning requires data annotation

AI data annotators are called “the people behind Artificial Intelligence”. “Data is the blood of AI. It can be said that whoever has mastered the data is very likely to do well,” said Brian Cheong, CEO of bytebridge.io, an automated data labeling platform. He explained that today's Artificial Intelligence could also be called data intelligence, because how machine learning evolves depends on the quality and quantity of data. “For example, current face recognition systems work well on young and middle-aged people, because younger people are more likely to travel and stay in hotels, so their faces are more easily collected. On the other hand, there is less data on kids and the elderly.”

But at the same time, data alone is useless. For deep learning, data only becomes meaningful once it is tagged and used for machines’ learning and evolution. Labeling is a must.

Every step, from data collection and cleaning to labeling and calibration, relies entirely on annotators. The most basic form of data annotation is image annotation. For example, if the detection target is a car, the annotator needs to mark every car in a picture. If the frame drawn on the picture does not mark the car accurately, the machine may “break down” because of the inaccuracy. Another example is human posture recognition, which uses 18 key points. Only trained annotators can master these key points, so that the annotated data meets the standard machines need to learn from.

“We are proud that we provide various functions on our platform. Many platforms provide only a few functions, but we are a one-stop solution for AI firms. Everything can be automated with us,” said the CEO of bytebridge.io.

Different data types require different skill sets from annotators. Some annotation is relatively simple and can be mastered through training, but some requires a professional background. For example, medical data requires the annotator to segment medical images and mark tumor areas, work that needs to be done by annotators with a medical background. Another example is local dialects or foreign languages, which require annotators who have mastered the language in question.

“We have annotators around the globe. We work with people from developed countries and developing areas alike. Since we provide a mobile version of the labeling toolset, our annotators have very diverse backgrounds, which can meet the skill requirements of different tasks.”

AI has now entered the stage of real-world application in major industries, including security, finance, home, and transportation. As the AI industry moves on to competing for individual market segments, the data annotation industry and its annotators will follow.

Monday, September 21, 2020

An International Online Factory Brings High-tech Innovation into Pig Farming Industry

In the 1960s, Woodrow Wilson Bledsoe, known as the father of facial recognition, developed a system that could read faces using a 10-inch-square tablet with vertical and horizontal coordinates. Over the past 60 years, countries across the world have substantially increased their investment in facial recognition systems. Today, programmers are extending facial recognition to the livestock farming industry, using it to assess the emotional well-being of pigs.

Pig farms have benefited significantly from this modern technique. With machine learning technology, each piglet’s health can be monitored accurately from birth by reading its facial expressions. The system also helps improve pigs’ health by monitoring each animal’s daily and total feed consumption.

Alibaba (China’s e-commerce giant) has recently taken up automatic identification of pig faces, which can also be used to diagnose a pig’s breeding status and detect diseases. Last year, Scotland’s Rural College (SRUC) used convolutional neural networks to analyze pig emotion and intention. A growing number of farms around the world now use high-tech equipment to record pigs’ actions with a precision that can exceed manual observation. Note, however, that smart farming relies on a massive database and intensive support from machine learning techniques.

Data support is the primary condition for smart farming

The key to smart farming is the big data support team behind it. Recently, a Korean pig farm was looking for a digital system to gather information on its pigs’ productivity, behavior, and welfare. It hired Bytebridge.io’s team to improve farming efficiency.

“The smart system should be able to reflect every pig’s health condition by tracking its feeding patterns and behaviors. We were looking for a data annotation company to process the data structurally according to different machine languages. The tricky part is that we set a very strict time limit for the team. We needed the labeling to be done as soon as possible,” said the owner of the pig farm.

“Surprisingly, Bytebridge.io perfectly resolved this problem and improved our system. After handing over millions of images, we received their package even sooner than we expected. We got our data labeled within 3 working days.”

Traditional data labeling companies, after receiving a similar project, would employ an agent to call up the tagging team and train them for at least a day on the customer’s requirements. The communication cost in this process can be significant. Bytebridge.io, on the other hand, cut the time and cost substantially. Its labeling accuracy reaches 99.5% (i.e. about 995 of every 1,000 pictures are labeled correctly). Bytebridge.io’s data processing time is one-tenth that of traditional data labeling companies.

Online data processing factory

Bytebridge.io has millions of registered users all over the world, with up to 100,000 daily active users. According to customers’ needs, the platform divides tasks and distributes them to users worldwide, building an online data processing factory. All users are grouped into levels based on their education, language, and task capability coefficient. Customers can optimize their cost by choosing the level that fits their task.

Consensus algorithm

To cut communication and training costs when dealing with a complex task flow, Bytebridge.io employs consensus decision-making to optimize the labeling system. For a complex task, the platform reduces the difficulty by splitting the task flow into smaller steps, and then sets a consensus index to unify the results through algorithmic rules.

In the pig farm project, the final data is delivered as structured data that includes the number, position, and posture of the pigs shown in each picture. The task flow can therefore be divided into sub-tasks: counting the pigs, drawing frames around them, and interpreting their posture.
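The structured record and the three sub-tasks described above could be sketched roughly as follows. This is only an illustration, not Bytebridge.io's actual data format: the class names, the `merge_subtasks` helper, the pixel-box convention, and the posture labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class PigAnnotation:
    box: Box        # position of one pig (output of the framing sub-task)
    posture: str    # e.g. "standing" or "lying" (hypothetical label set)


@dataclass
class ImageAnnotation:
    image_id: str
    pigs: List[PigAnnotation] = field(default_factory=list)

    @property
    def count(self) -> int:
        # Number of pigs in the picture (output of the counting sub-task).
        return len(self.pigs)


def merge_subtasks(image_id: str, count: int,
                   boxes: List[Box], postures: List[str]) -> ImageAnnotation:
    """Combine the results of the three sub-tasks into one structured record.

    The sub-tasks are done separately, so their results are cross-checked:
    the reported count must match the number of frames and posture labels.
    """
    if not (count == len(boxes) == len(postures)):
        raise ValueError("sub-task results disagree on the number of pigs")
    pigs = [PigAnnotation(box, posture) for box, posture in zip(boxes, postures)]
    return ImageAnnotation(image_id, pigs)


record = merge_subtasks(
    "img_001", 2,
    boxes=[(0, 0, 10, 10), (20, 20, 40, 40)],
    postures=["standing", "lying"],
)
print(record.count, record.pigs[1].posture)
```

Splitting the work this way keeps each sub-task simple enough for an annotator to do without lengthy training, while the merge step catches disagreements between sub-tasks.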

Before a task is distributed, a consensus index, such as 90%, is set for it. If 90% of the annotators give essentially the same answer, the system judges that they have reached a consensus. Customers who require the highest annotation accuracy can use “multi-round consensus”, which repeats the task over several rounds to improve the accuracy of the final data delivery.
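A minimal sketch of how a consensus index might be applied is shown below. It is not Bytebridge.io's algorithm, whose details the article does not give. It assumes that answers are simple labels compared by exact match, and it assumes one possible reading of “multi-round consensus” in which a task is sent out for further rounds and the answers are pooled until the index is met. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def reach_consensus(answers: List[str],
                    consensus_index: float = 0.9) -> Tuple[Optional[str], float]:
    """Return (label, share) if the most common answer meets the index,
    otherwise (None, share), where share is that answer's fraction of votes."""
    if not answers:
        raise ValueError("answers must not be empty")
    label, votes = Counter(answers).most_common(1)[0]
    share = votes / len(answers)
    return (label if share >= consensus_index else None), share


def multi_round_consensus(rounds: Iterable[List[str]],
                          consensus_index: float = 0.9
                          ) -> Tuple[Optional[str], int]:
    """Pool answers round by round (an assumed interpretation) and stop as
    soon as the pooled answers reach consensus. Returns (label, rounds_used);
    label is None if no round produced a consensus."""
    pooled: List[str] = []
    rounds_used = 0
    for answers in rounds:
        rounds_used += 1
        pooled.extend(answers)
        label, _ = reach_consensus(pooled, consensus_index)
        if label is not None:
            return label, rounds_used
    return None, rounds_used


# Nine of ten annotators agree, which meets a 90% consensus index.
print(reach_consensus(["standing"] * 9 + ["lying"]))

# The first round is split 2 to 1; a second round settles it.
print(multi_round_consensus([["lying", "standing", "standing"],
                             ["standing"] * 17]))
```

For answers that are not simple labels, such as frames drawn around pigs, “essentially the same” would need a similarity measure rather than exact equality; that is left out of this sketch.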

Bytebridge.io, an innovative data training platform, replaces the traditional model with a precise consensus algorithm mechanism. It runs a powerful online data processing platform that operates efficiently around the globe. By connecting the fragmented international labor force for data labeling and collection, and by promoting the transformation of the labor structure, Bytebridge.io is developing a new model for the data labeling industry.

Meanwhile, API technology keeps data secure, which in turn helps improve data processing efficiency. By building an online data processing factory and replacing manual audits, Bytebridge.io provides revolutionary data solutions for all industries.
