Research Scientist, Speech & Audio - Generative AI (PhD)
Meta
117k - 173k USD/year
Office
Menlo Park, CA | Seattle, WA | New York, NY
Full Time
Meta is looking for a Research Scientist to join our audio team under the AGI Multimedia pillar. The team conducts industry-leading research on building foundation models for audio understanding and audio generation. We also work closely with vision research teams to push the frontier of multimodal (audio, video, language) research. Individuals in this role will work with an interdisciplinary team of scientists, engineers, and cross-functional partners with a broad range of experiences, perspectives, approaches, and backgrounds, and will have access to cutting-edge technology, resources, and research facilities.
$117,000/year to $173,000/year + bonus + equity + benefits
Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.
Equal Employment Opportunity
Meta is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice. Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.
Responsibilities
- Develop algorithms based on state-of-the-art machine learning and neural network methodologies
- Work with and create large datasets
- Conduct research to advance the science and technology of intelligent machines
- Conduct research towards long-term ambitious research goals while identifying intermediate milestones
- Conduct research that enables learning the semantics of data across multiple modalities (speech, audio, images, video, text, and other modalities)
- Open-source high-quality code and reproducible results for the community
Minimum Qualifications
- Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
- Currently has, or is in the process of obtaining, a PhD in the field of Speech, Audio, Language, Machine Learning, a related field, or equivalent practical experience. Degree must be completed prior to joining Meta
- Research and/or hands-on experience in one or more of the following areas: audio (speech, sound, or music) generation, text-to-speech (TTS) synthesis, text-to-music generation, text-to-sound generation, speech recognition, speech / audio representation learning, vision perception, image / video generation, video-to-audio generation, audio-visual learning, audio language models, lip sync, lip movement generation / correction, lip reading, etc.
- Experience with Python and PyTorch
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Preferred Qualifications
- Proven track record of achieving significant results as demonstrated by grants, fellowships, patents, as well as publications at leading workshops, journals or conferences such as ICML, NeurIPS, ICLR, ICASSP, Interspeech, ACL, EMNLP, CVPR, and other similar venues
- Demonstrated software engineering experience via an internship, work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)
- Experience solving complex problems and comparing alternative solutions, tradeoffs, and different perspectives to determine a path forward
- Experience with large-scale data processing
- Experience communicating research findings to public audiences of peers
August 5, 2025