Monday, November 9, 2020

The Human Power Behind AI: Machine Learning Needs Annotators

“The global data collection and labeling market size was valued at USD 1.0 billion in 2019 and is expected to witness a CAGR of 26.0% from 2020 to 2027,” according to a market analysis report by Grand View Research.

At present, the application scenarios of artificial intelligence are constantly expanding, and AI applications are changing our lives by providing automated, smart services. Behind the rapid growth of the AI industry, the new profession of data annotator is also growing. A popular saying in the data annotation industry goes, “the more intelligence, the more labor.” The data that AI algorithms learn from must be annotated, item by item, by human annotators.

These annotation workers don’t need to leave their homes. They can be trained to categorize and annotate data for machine learning on platforms such as CloudFactory, Labelbox, and ByteBridge.io, all of which let annotators work remotely from any location. Through the hard work of these distributed annotators, who effectively become “AI trainers,” machines can quickly learn to recognize text, images, videos, and other content.

Machine learning requires data annotation

AI data annotators are called “the people behind artificial intelligence.” “Data is the blood of AI. It can be said that whoever has mastered the data is very likely to do well,” said Brian Cheong, CEO of ByteBridge.io, an automated data labeling platform. He explained that current artificial intelligence could also be called data intelligence, because how machine learning evolves depends on the quality and quantity of its data. “For example, current face recognition systems work well on young and middle-aged people, because younger people are more likely to travel and stay in hotels, so their faces are more easily collected. On the other hand, there is less data on children and the elderly.”

At the same time, data alone is useless. For deep learning, data only makes sense once it has been labeled and used for a machine’s learning and evolution. Labeling is a must.

Every stage, from data collection and cleaning to labeling and calibration, relies entirely on annotators. The most basic form of data annotation is image annotation. For example, if the detection target is a car, the annotator needs to mark every car in a picture. If the bounding box does not accurately enclose the car, the model may “break down” because of the inaccuracy. Another example is human pose recognition, which uses 18 key points. Only trained annotators can master these key points well enough for the annotated data to meet the standard machines need to learn from.
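To make the car and pose examples a little more concrete, here is a minimal Python sketch of how such annotations might be quality-checked, assuming a hypothetical annotation format and review step. An annotator’s bounding box is compared with a reviewer’s reference box using intersection over union (IoU), and a pose annotation is checked for all 18 key points. The `Box` type, the function names, the 0.7 acceptance threshold, and the keypoint names (which follow OpenPose’s COCO-18 ordering, one common 18-point scheme) are illustrative assumptions, not a description of how any platform named in this article works.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed 18-keypoint scheme (OpenPose COCO-18 ordering); the article does not
# say which 18 points it refers to.
KEYPOINT_NAMES: List[str] = [
    "nose", "neck",
    "r_shoulder", "r_elbow", "r_wrist",
    "l_shoulder", "l_elbow", "l_wrist",
    "r_hip", "r_knee", "r_ankle",
    "l_hip", "l_knee", "l_ankle",
    "r_eye", "l_eye", "r_ear", "l_ear",
]


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box in pixel coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp negative widths/heights to zero so empty boxes have area 0.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    inter = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter_area = inter.area()
    union = a.area() + b.area() - inter_area
    return inter_area / union if union > 0 else 0.0


def box_passes_review(annotated: Box, reference: Box, threshold: float = 0.7) -> bool:
    """Accept a box only if it overlaps the reference box enough (threshold is an assumption)."""
    return iou(annotated, reference) >= threshold


def missing_keypoints(pose: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of keypoints the annotator has not placed."""
    return [name for name in KEYPOINT_NAMES if name not in pose]


if __name__ == "__main__":
    reference = Box(100, 50, 300, 200)   # reviewer's box around a car
    tight = Box(105, 55, 295, 198)       # close fit: IoU is about 0.91
    loose = Box(60, 20, 360, 260)        # sloppy fit: IoU is about 0.42
    print(box_passes_review(tight, reference))  # True
    print(box_passes_review(loose, reference))  # False
    print(len(missing_keypoints({"nose": (10, 20), "neck": (10, 40)})))  # 16
```

IoU is used here because it penalizes both boxes that are too small and boxes that are too large, which matches the article’s point that an inaccurate frame around the car degrades what the model learns.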

“We are proud to provide a wide range of functions on our platform. Many platforms offer only a few functions, but we are a one-stop solution for AI firms. Everything can be automated with us,” said the CEO of ByteBridge.io.

Different data types require different skill sets from annotators. While some annotation is relatively simple and can be mastered through training, other annotation tasks require a professional background. For example, with medical data, the annotator needs to segment medical images and mark tumor areas, work that must be done by annotators with a medical background. Similarly, local dialects and foreign languages require annotators who have mastered those languages.

“We have annotators around the globe, working with people from developed countries to developing areas. Since we provide a mobile version of our labeling toolset, our annotators come from very diverse backgrounds, which lets us meet the skill requirements of different tasks.”

AI has now entered the stage of applying technology to real-world scenarios, across security, finance, the home, transportation, and other major industries. In the future, the data annotation industry will likewise move into a stage of competing for specialized market segments, following the path of the AI industry.
