Researchers from MIT and other institutions found that machine-learning models trained to replicate human decision-making often make harsher judgments than humans would, and that the cause lies in the kind of data used for training. The study argues that such models should be trained on normative data, in which human labelers are asked explicitly whether an item violates a specific rule. Instead, many models are trained on descriptive data, in which labelers are asked only to identify factual features, and models trained this way tend to over-predict rule violations.
The research found significant discrepancies between descriptive and normative labels. For example, in a task built around an apartment pet policy, human labelers were notably more stringent when giving descriptive labels (identifying the features of a dog that the policy concerns) than when giving normative labels (judging whether the dog violates the policy). Because a model inherits the stringency of its labels, this difference could have serious implications in real-world applications such as legal decisions or content moderation, where stricter judgments can mean heavier penalties or restrictions.
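One way to make this kind of discrepancy concrete is to collect both label types for the same items and compare how often each one flags a violation. The sketch below is only an illustration, not the study's method: the `LabeledItem` type, the `compare_label_sets` helper, and the four toy records are all hypothetical and were invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LabeledItem:
    """One item carrying both label types (hypothetical structure)."""
    item_id: str
    descriptive_flag: bool  # labeler said the feature named in the rule is present
    normative_flag: bool    # labeler said the item violates the rule


def flag_rate(flags: Iterable[bool]) -> float:
    """Fraction of flags that are True; 0.0 for an empty input."""
    values = list(flags)
    return sum(values) / len(values) if values else 0.0


def compare_label_sets(items: List[LabeledItem]) -> dict:
    """Compare how often each label type flags an item as a violation."""
    disagreements = [
        item.item_id
        for item in items
        if item.descriptive_flag != item.normative_flag
    ]
    return {
        "descriptive_rate": flag_rate(i.descriptive_flag for i in items),
        "normative_rate": flag_rate(i.normative_flag for i in items),
        "disagreement_rate": len(disagreements) / len(items) if items else 0.0,
        "disagreements": disagreements,
    }


# Toy records, made up for illustration only (not data from the study).
items = [
    LabeledItem("dog_01", descriptive_flag=True, normative_flag=True),
    LabeledItem("dog_02", descriptive_flag=True, normative_flag=False),
    LabeledItem("dog_03", descriptive_flag=False, normative_flag=False),
    LabeledItem("dog_04", descriptive_flag=True, normative_flag=False),
]

report = compare_label_sets(items)
print(report)
```

A descriptive rate well above the normative rate on the same items would suggest that a model trained on the descriptive labels is likely to flag more violations than human rule-based judgment would.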
The researchers emphasize the importance of transparency about how training data were collected and the need to match the labeling context to the context in which the model will be used. Future work may explore fine-tuning descriptively trained models on normative data to improve accuracy and reduce potential biases in machine-learning systems.