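Reproducing papers has long been a hard problem in machine learning. Several academic conferences have recently encouraged authors to submit code, but the reproducibility problem is still far from solved. What problems do we most often run into when trying to reproduce a machine learning paper? Derek Chia, a machine learning engineer based in Singapore, has summarized them.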
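Some papers that reach a new SOTA draw attention in the news media;
Readers study the paper in depth or skim through it;
Readers are impressed by the experimental results in the paper and become interested in reproducing them.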
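The README file is incomplete or missing;
Dependencies are undefined, the code has bugs, or pretrained models are missing;
Parameters are not disclosed;
The dataset is private or the preprocessing steps are missing;
The GPU resource requirements are unrealistic.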
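Several of the problems listed above (undefined dependencies, undisclosed parameters) come down to information that was never written down. As a rough illustration, and not something taken from Derek Chia's article, the Python sketch below records the random seed, hyperparameters, Python version, and installed package versions in a single manifest that could be committed alongside the code. The names `build_manifest` and `seeded_sample`, and the example hyperparameters, are hypothetical and chosen only for this sketch.

```python
import json
import platform
import random
from importlib import metadata


def build_manifest(hyperparams, seed, packages):
    """Collect details a reader would need to rerun an experiment.

    Hypothetical helper for illustration; not part of any library.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "seed": seed,
        "hyperparams": dict(hyperparams),
        "packages": versions,
    }


def seeded_sample(seed, n=3):
    """Draw n pseudo-random numbers from a generator fixed by `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    # Example values only; a real project would list its own settings.
    manifest = build_manifest(
        hyperparams={"learning_rate": 0.001, "batch_size": 32},
        seed=42,
        packages=["numpy", "torch"],
    )
    print(json.dumps(manifest, indent=2))
```

A manifest like this does not address private datasets or unrealistic GPU requirements, but it removes some of the guesswork about dependencies and parameters. Note that fixing a seed for Python's `random` module does not by itself make a deep learning framework deterministic; each framework has its own seeding and determinism settings.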
Why Can’t I Reproduce Their Results (http://theorangeduck.com/page/reproduce-their-results)
Rules of Machine Learning: Best Practices for ML Engineering (https://developers.google.com/machine-learning/guides/rules-of-ml)
Curated list of awesome READMEs (https://github.com/matiassingers/awesome-readme)
How the AI community can get serious about reproducibility (https://ai.facebook.com/blog/how-the-ai-community-can-get-serious-about-reproducibility/)
ML Code Completeness Checklist (https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501)
Designing the Reproducibility Program for NeurIPS 2020 (https://medium.com/@NeurIPSConf/designing-the-reproducibility-program-for-neurips-2020-7fcccaa5c6ad)
Tips for Publishing Research Code (https://github.com/paperswithcode/releasing-research-code)