# Mu Li (李沐)

Principal Architect

Principal Architect, Baidu, 2014 -

Intern, Google Research, Summer 2013

Senior Researcher and Developer, Baidu, 2011 - 2012

Research Assistant, Hong Kong University of Science and Technology, 2009 - 2010

Intern, Microsoft Research Asia, Summer 2007

## Papers (50)

Bag of Tricks for Image Classification with Convolutional Neural Networks.

Language Models with Transformers.

On the Powerball Method: Variants of Descent Methods for Accelerated Optimization.

Joint Training for Neural Machine Translation Models with Monolingual Data.

Coarse-To-Fine Learning for Neural Machine Translation.

Data Driven Resource Allocation for Distributed Learning.

Chunk-based Decoder for Neural Machine Translation.

DiFacto: Distributed Factorization Machines.

AdaDelay: Delay Adaptive Distributed Stochastic Optimization.

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation.

Large-scale Nyström kernel matrix approximation using randomized SVD.

Inferring Movement Trajectories from GPS Snippets.

High Performance Latent Variable Models.

A Recursive Recurrent Neural Network for Statistical Machine Translation.

Efficient mini-batch training for stochastic optimization.

Stability analysis of dynamic quantized systems with data dropout and communication delay.

Machine Translation with Real-Time Web Search.

Communication Efficient Distributed Machine Learning with the Parameter Server.

A Coupled Similarity Kernel for Pairwise Support Vector Machine.

Compact Video Fingerprinting via Structural Graphical Models.

Bilingual Data Cleaning for SMT using Graph-based Random Walk.

Robust video fingerprinting via structural graphical models.

Forced derivation tree based model training to statistical machine translation.

Stability analysis of dynamic quantized feedback system with packet loss.

Time and space efficient spectral clustering via column sampling.

Online multiple instance learning with no regret.

PASCAL: A Protocol-Based Approach for Service Composition and Dependable Optimization.

Mixture model-based minimum Bayes risk decoding using multiple machine translation systems.

Making Large-Scale Nyström Approximation Possible.

Adaptive development data selection for log-linear model in statistical machine translation.

Emotion classification based on gamma-band EEG.

Extracting Keyphrases from Chinese News Articles Using TextRank and Query Log Knowledge.

Detecting Drowsiness in Driving Simulation Based on EEG.

Estimating vigilance in driving simulation using probabilistic PCA.

A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation.