Build and curate high-quality datasets, leveraging them to train, fine-tune, and optimize Large Language Models. Responsibilities include data pipeline development, distributed model training, model evaluation, and performance analysis to improve model accuracy and efficiency. Previously worked in small teams.