Welcome to Barna Saha's homepage


Barna Saha
AT&T Shannon Research Laboratory
180 Park Avenue, Florham Park, Room No. C257
Phone No. 973-360-7214
Email. [my first name]@research.att.com

I am a Research Scientist at AT&T Shannon Research Laboratory, New Jersey, where I joined after completing my PhD from University of Maryland College Park in August 2011. I am also a Permanent Member of Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) at Rutgers.

My research interests span algorithm design and analysis, discrete optimization and foundational aspects of databases and data management. I examine optimization problems related to BigData platform (resource and workload management in Cloud, distributed data management) and enhancing utility of BigData (data quality, data integration, pattern mining). These are parts of AT&T's initiatives on BigData and Cloud services.

My works are based on developing approximation algorithms, use of randomization, probabilistic methods and graph theory to explore the trade-off between accuracy of result vs time and space of processing. My dissertation research was on designing approximation and scalable algorithhms for resource allocation and scheduling on parallel and distributed setting.


Selected Publications (See the Full List By Topic, Chronologically)

Author names are in alphabetical order in most of my publications as per tradition in TCS.
  1. A Constant Factor Approximation Algorithms for Fault Tolerant k median,with Mohammadtaghi Hajiaghayi, Jian Li, Shi Li, Wei Hu. SODA, 2014. This paper gives the first constant factor approximation algorithm for the fault tolerant k median problem improving upon previous logarithm approximation.

  2. Efficiently Computing Edit Distance to Dyck Language, Barna Saha, Submitted, 2014. This paper gives the first nontrivial (polylog) approximation algorithm that runs in near linear time for computing edit distance to Dyck Language, a theoretical treatment of the work in [3] and a significant generalization of the string edit distance problem.

  3. On Repairing Structural Problems in Semi-structured Data,with Flip Korn, Divesh Srivastava and Shanshan Ying. VLDB 2013. The paper proposes new model for data quality problems in semi-structured data.

  4. Discovering Conservation Rules, with Lukasz Golab, Howard Karloff, Flip Korn and Divesh Srivastava. To appear in IEEE TKDE Special Issue on the Best Papers of ICDE. Conference version in ICDE 2012, The paper presents a new model for data quality in certain structured data. AMONG FINALISTS FOR BEST PAPER AWARD.

  5. Set Cover revisited: Hypergraph Cover with Hard Capacities, with Samir Khuller. ICALP 2012. This paper settles an open question in basic set covering type problem from [FOCS 2002].

  6. New Constructive Aspects of the Lovasz Local Lemma, with Bernhard Haeupler and Aravind Srinivasan. Journal of the ACM (JACM), 2011. Conference version in FOCS 2010. This paper settles an important open question on resource allocation and scheduling from [STOC 2006].

  7. A Unified Approach to Ranking in Probabilistic Databases, with Jian Li and Amol Deshpande. The VLDB Journal 2011 Special Issue on the Best Papers from VLDB. Conference version in VLDB 2009, BEST PAPER AWARD.

  8. AdCell-Ad Allocation in Cellular Networks, with Saeed Alaei, Mohammad Taghi Hajiaghayi, Vahid Liaghat and Dan Pei. ESA 2011. One of the first papers to consider optimization for location based targeted advertisement.

  9. Energy Efficient Scheduling via Partial Shutdown, with Samir Khuller and Jian Li. SODA 2010. This paper proposes a new model for energy efficiency while scheduling in data centers.

  10. A New Approximation Technique for Resource-Allocation Problems, with Aravind Srinivasan. ITCS 2010. This paper proposes a unified scheme for rounding linear programs and application to resource allocation problems.

Patents

  • Lukasz Golab, Howard Karloff, Flip Korn, Barna Saha, Divesh Srivastava, ``Conservation Dependency," US patent filed US20120130935, AT&T Research Laboratory, New Jersey, 2012.

  • Kook Jin Ahn, Seungjoon Lee, Barna Saha, ``Resource Management for Batch Processing for Infrastructure/Platform as a Service,'' US patent in process, AT&T Research Laboratory, New Jersey, 2013.

  •  

    Link to My CV (Updated December, 2013)

    Services

  • Panelist: NSF BigData

  • Committee Member: VLDB 2015, APPROX 2014, GHC 2012.

  • Reviewer: (Journals) IEEE Transactions on Knowledge Discovery; Internet Mathematics; Operations Research Letters, Maths of Operations Research, Algorithmica, Transactions on Algorithms, Transaction on Database Systems, Maths of Programming.

    (Conferences): FOCS, SODA, STOC, ESA, VLDB, PODS, SIGMOD, ICDE, ISAAC, KDD, SDM, APPROX, RANDOM.

  • Recent/Upcoming Talks
    • Upcoming Tutorial at ICDE 2014: Data Quality: The other Face of Big Data, Barna Saha, Divesh Srivastava
    • Perfecting the Imperfect: Reasoning with ``real'' Data via a Probabilistic Lens
      UIUC, Columbia, ASU, U of Arizona, February-March, 2014
    • Clean Data and Unlimited Resources: A Probabilistic Journey to a Fantasyland
      Purdue, UMass-Amherst, Dartmouth, Virginia Tech, Washington University, Boston University, U. Waterloo, U. C. Davis, February-April, 2014
    • The Language Edit Distance Problem
      ITA, UCSD, February 2014
    • Making your Clean Data Big: Scalable Algorithms for Data Quality Problems
      MIT, January 2014
    • Computing Dense Subgraphs from Theory to Practice
      UMN Colloquim, October 2013
    • Energy Efficient Scheduling & Covering with Hard Capacities
      UIUC, October 2013

    Teaching

  • Algorithmic Techniques for Big Data Analysis, Fall 2013

  • Teaching Assistant: Cryptography, Algorithm Design, Data Streaming Algorithms, Artificial Intelligence, Parallel Algorithms.

    Student Mentoring

  • Manish Purohit , University of Maryland College Park, (Summer Intern June 2013-August 2013), Project: Firewall Placement on Cloud Infrastructure (short version in SOCC 2013, longer version in preparation)

  • Harmeet Jandu, Rutgers University, (September 2012-December 2012), Project: Analyzing Social Network Structure on Hadoop (Masters' Thesis)

  • Shanshan Ying, National University of Singapore, (Summer Intern, June 2012-August 2012), Project: Repairing Malformed XML Documents (appeared in VLDB 2013 )

  • Kook Jin Ahn University of Pennsylvania, (Summer Intern, June 2012-August 2012), Project: Scalable LP-Solver for Resource Allocation on Cloud

  • Donatella Firmani ,Sapienza University of Rome, (April 2012-July 2012), Project: Crowd-Sourced Record Linkage


  • Publications By Topic[Back to Top]

  • Scheduling, Resource Allocation, Facility Location

  • Data Quality and Data Integration

  • Graph Algorithms, Covering Problems: Data Analytics, Distributed Data Management

      Scheduling, Resource Allocations, Facility Location[Back to Top]

    1. Scheduling a Cloud In Advance,
          with Kook Jin Ahn and Seungjoon Lee.
          Submitted.

    2. A Constant Factor Approximation Algorithm for Fault Tolerant k-Median Problem,
          with Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li and Shi Li.
          ACM-SIAM Symposium on Discrete Algorithms (SODA), 2014. .

    3. Renting a Cloud,
          Barna Saha
          Foundations of Software Technology and Theoretical Computer Science (FSTTCS), 2013.

    4. Firewall Placement on Cloud Data Centers,
          with Manish Purohit and Seungjoon Lee.
          Proc. in ACM Symposium on Cloud Computing (SOCC) [poster], 2013.

    5. New Approximation Results for Resource Replication Problems,
          with Samir Khuller and Kanthi Sarpatwar.
          15th. International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX), 2012.

    6. New Constructive Aspects of the Lovasz Local Lemma,
          with Bernhard Haeupler and Aravind Srinivasan.
          Journal of the ACM (JACM), 2011.

    7. The Matroid Median Problem,
          with Ravishankar Krishnaswamy and Vishwanath Nagarajan.
          ACM-SIAM Symposium on Discrete Algorithms (SODA), 2011.
      Merged with a paper by Amit Kumar and Yogish Sabharwal who obtained similar results for the case of partition matroid.

    8. New Constructive Aspects of the Lovasz Local Lemma,
          with Bernhard Haeupler and Aravind Srinivasan.
          IEEE Symposium on Foundations of Computer Science (FOCS) 2010
      This paper settles an outstanding open question related to resource allocation, known as the Santa Claus problem.

    9. Energy Efficient Scheduling via Partial Shutdown,
          with Samir Khuller and Jian Li.
          ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010.

    10. A New Approximation Technique for Resource-Allocation Problems,
          with Aravind Srinivasan.
          Innovations in Theoretical Computer Science (ITCS) 2010.

      Data Quality and Data Integration[Back to Top]

    11. Data Quality: The other Face of Big Data,
          with Divesh Srivastava.
          Tutorial, 30th IEEE International Conference on Data Engineering (ICDE), 2014.

    12. Efficiently Computing Edit Distance to Dyck Language,
          Submitted.

    13. Discovering Conservation Rules,
          with Lukasz Golab, Howard Karloff, Flip Korn and Divesh Srivastava.
          To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE) Special Issue on the Best Papers of ICDE 2012.

    14. On Repairing Structural Problems in Semi-structured Data,
          with Flip Korn, Divesh Srivastava and Shanshan Ying.
          35th International Conference on Very Large Data Bases (VLDB), 2013.

    15. Less is More, Selecting Sources Wisely for Integration,
          with Xin Luna Dong and Divesh Srivastava.
          35th International Conference on Very Large Data Bases (VLDB), 2013.

    16. Discovering Conservation Rules,
          with Lukasz Golab, Howard Karloff, Flip Korn and Divesh Srivastava.
          28th IEEE International Conference on Data Engineering (ICDE), 2012.
      Invited to IEEE Transactions on Knowledge and Data Engineering (TKDE) special issue on the best papers of ICDE 2012.

    17. A Unified Approach to Ranking in Probabilistic Databases,
          with Jian Li and Amol Deshpande.
          The VLDB Journal, 2011. Special Issue on the Best Papers from VLDB 2009.

    18. Schema Covering: A Step Towards Enabling Reusability in Information Integration,
          with Ioana Stanoi and Ken Clarkson.
          Proc. 26th IEEE International Conference on Data Engineering (ICDE), 2010.

    19. A Unified Approach to Ranking in Probabilistic Databases,
          with Jian Li and Amol Deshpande.
          35th International Conference on Very Large Data Bases (VLDB), 2009. BEST PAPER AWARD

    20. Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration,
          with Bogdan Alexe, Michael Gubanov, Mauricio A. Hernandez, Howard Ho, Jen-Wei Huang,
      Yannis Katsis, Lucian Popa and Ioana Stanoi.
          Lecture Notes in Business Information Processing, Revised Selected Papers from
      Second International VLDB Workshop, BIRTE 2008.

      Graph Algorithms, Covering Problems: Data Analytics, Distributed Data Management[Back to Top]

    21. Hierarchical Graph Partitioning,
          with Mohammadtaghi Hajiaghayi, Theodore Johnson and Reza Khani.
          26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) 2014.

    22. Size-Constrained Weighted Set Cover with Applications,
          with Lukasz Golab, Flip Korn, Feng Li and Divesh Srivastava.
          Submitted.

    23. The Closest String Problem and the Chebyshev Radius,
          with Yury Polyanskiy and Arya Mazumdar.
          IEEE International Symposium on Information Theory (ISIT), 2013.

    24. Set Cover revisited: Hypergraph Cover with Hard Capacities,
          with Samir Khuller.
          International Colloquium on Automata, Languages, and Programming (ICALP, Track A), 2012.
      This paper settles an open question raised by Chuzhoy and Naor in [FOCS02].

    25. AdCell-Ad Allocation in Cellular Networks,
          with Saeed Alaei, Mohammad Taghi Hajiaghayi, Vahid Liaghat and Dan Pei.
          European Symposia on Algorithms (ESA), 2011.

    26. Link Prediction for Annotation Graphs,
          with Philip Anderson, Samir Khuller, Saket Navlakha, Louiqa Raschid, Andreas Thor and Xiao-Ning Zhang.
          International Semantic Web Conference (ISWC), 2011.

    27. On Capacitated Set Cover Problems,
          with Nikhil Bansal and Ravishankar Krishnaswamy.
          14th. International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX), 2011.

    28. Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs,
          with Allison Hoch, Samir Khuller, Louiqa Raschid and Xiao-Ning Zhang.
          International Conference on Research in Computational Molecular Biology (RECOMB) 2010.

    29. On Finding Dense Subgraphs,
          with Samir Khuller.
          International Colloquium on Automata, Languages, and Programming (ICALP), 2009.

    30. On Maximum Coverage in the Streaming Model & Application to Multi-topic Blog-Watch,
          with Lise Getoor.
          Ninth SIAM International Conference on Data Mining (SDM), 2009.

    31. Group Proximity Measure for Recommending Groups in Online Social Networks.,
          with Lise Getoor.
          2nd ACM SIGKDD Workshop on Social Network Mining and Analysis (SNA-KDD), 2008.

    32. On Estimating Path-Aggregates over Streaming Graphs,
          with Sumit Ganguly.
          17th International Symposium on Algorithms and Computations (ISAAC), 2006.

    33. Bidirectional Fuzzy-Regression Model for Road-lines Detection,
          with Arya Mazumdar, N R Pal.
          IEEE International Conference on Engineering of Intelligent Systems (ICEIS), 2006.


    Publications in Reverse Chronological Order [Back to Top]

    1. Data Quality: The other Face of Big Data,
          with Divesh Srivastava.
          Tutorial, 30th IEEE International Conference on Data Engineering (ICDE), 2014.

    2. Efficiently Computing Edit Distance to Dyck Language,
          Submitted.

    3. Size-Constrained Weighted Set Cover with Applications,
          with Lukasz Golab, Flip Korn, Feng Li and Divesh Srivastava.
          Submitted.

    4. Scheduling a Cloud In Advance,
          with Kook Jin Ahn and Seungjoon Lee.
          Submitted.

    5. Hierarchical Graph Partitioning,
          with Mohammadtaghi Hajiaghayi, Theodore Johnson and Reza Khani.
          26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) 2014.

    6. A Constant Factor Approximation Algorithm for Fault Tolerant k-Median Problem,
          with Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li and Shi Li.
          ACM-SIAM Symposium on Discrete Algorithms (SODA), 2014. .

    7. Renting a Cloud,
          Barna Saha
          Foundations of Software Technology and Theoretical Computer Science (FSTTCS), 2013.

    8. Firewall Placement on Cloud Data Centers,
          with Manish Purohit Ahn and Seungjoon Lee.
          Proc. ACM Symposium on Cloud Computing (SOCC) [poster], 2013.

    9. Discovering Conservation Rules,
          with Lukasz Golab, Howard Karloff, Flip Korn and Divesh Srivastava.
          To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE) Special Issue on the Best Papers of ICDE 2012.

    10. On Repairing Structural Problems in Semi-structured Data,
          with Flip Korn, Divesh Srivastava and Shanshan Ying.
          35th International Conference on Very Large Data Bases (VLDB), 2013.

    11. Less is More, Selecting Sources Wisely for Integration,
          with Xin Luna Dong and Divesh Srivastava.
          35th International Conference on Very Large Data Bases (VLDB), 2013.

    12. The Closest String Problem and the Chebyshev Radius,
          with Yury Polyanskiy and Arya Mazumdar.
          IEEE International Symposium on Information Theory (ISIT), 2013.

    13. New Approximation Results for Resource Replication Problems,
          with Samir Khuller and Kanthi Sarpatwar.
          15th. International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX), 2012.

    14. Set Cover revisited: Hypergraph Cover with Hard Capacities,
          with Samir Khuller.
          International Colloquium on Automata, Languages, and Programming (ICALP, Track A), 2012.
      This paper settles an open question raised by Chuzhoy and Naor in [FOCS02].

    15. Discovering Conservation Rules,
          with Lukasz Golab, Howard Karloff, Flip Korn and Divesh Srivastava.
          28th IEEE International Conference on Data Engineering (ICDE), 2012.
      Invited to IEEE Transactions on Knowledge and Data Engineering (TKDE) special issue on the best papers of ICDE 2012.

    16. The Matroid Median Problem,
          with Ravishankar Krishnaswamy and Vishwanath Nagarajan.
          ACM-SIAM Symposium on Discrete Algorithms (SODA), 2011.
      Merged with a paper by Amit Kumar and Yogish Sabharwal who obtained similar results for the case of partition matroid.

    17. A Unified Approach to Ranking in Probabilistic Databases,
          with Jian Li and Amol Deshpande.
          The VLDB Journal, 2011. Special Issue on the Best Papers from VLDB 2009.

    18. New Constructive Aspects of the Lovasz Local Lemma,
          with Bernhard Haeupler and Aravind Srinivasan.
          Journal of the ACM (JACM), 2011.

    19. AdCell-Ad Allocation in Cellular Networks,
          with Saeed Alaei, Mohammad Taghi Hajiaghayi, Vahid Liaghat and Dan Pei.
          European Symposia on Algorithms (ESA), 2011.

    20. On Capacitated Set Cover Problems,
          with Nikhil Bansal and Ravishankar Krishnaswamy.
          14th. International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX), 2011.

    21. Link Prediction for Annotation Graphs,
          with Philip Anderson, Samir Khuller, Saket Navlakha, Louiqa Raschid, Andreas Thor and Xiao-Ning Zhang.
          International Semantic Web Conference (ISWC), 2011.

    22. New Constructive Aspects of the Lovasz Local Lemma,
          with Bernhard Haeupler and Aravind Srinivasan.
          IEEE Symposium on Foundations of Computer Science (FOCS) 2010
      This paper settles an outstanding open question related to resource allocation, known as the Santa Claus problem.

    23. Energy Efficient Scheduling via Partial Shutdown,
          with Samir Khuller and Jian Li.
          ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010.

    24. A New Approximation Technique for Resource-Allocation Problems,
          with Aravind Srinivasan.
          Innovations in Theoretical Computer Science (ITCS) 2010.

    25. Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs,
          with Allison Hoch, Samir Khuller, Louiqa Raschid and Xiao-Ning Zhang.
          International Conference on Research in Computational Molecular Biology (RECOMB) 2010.

    26. Schema Covering: A Step Towards Enabling Reusability in Information Integration,
          with Ioana Stanoi and Ken Clarkson.
          Proc. 26th IEEE International Conference on Data Engineering (ICDE), 2010.

    27. A Unified Approach to Ranking in Probabilistic Databases,
          with Jian Li and Amol Deshpande.
          35th International Conference on Very Large Data Bases (VLDB), 2009. BEST PAPER AWARD

    28. On Finding Dense Subgraphs,
          with Samir Khuller.
          International Colloquium on Automata, Languages, and Programming (ICALP), 2009.

    29. On Maximum Coverage in the Streaming Model & Application to Multi-topic Blog-Watch,
          with Lise Getoor.
          Ninth SIAM International Conference on Data Mining (SDM), 2009.

    30. Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration,
          with Bogdan Alexe, Michael Gubanov, Mauricio A. Hernandez, Howard Ho, Jen-Wei Huang,
      Yannis Katsis, Lucian Popa and Ioana Stanoi.
          Lecture Notes in Business Information Processing, Revised Selected Papers from
      Second International VLDB Workshop, BIRTE 2008.

    31. Group Proximity Measure for Recommending Groups in Online Social Networks.,
          with Lise Getoor.
          2nd ACM SIGKDD Workshop on Social Network Mining and Analysis (SNA-KDD), 2008.

    32. On Estimating Path-Aggregates over Streaming Graphs,
          with Sumit Ganguly.
          17th International Symposium on Algorithms and Computations (ISAAC), 2006.

    33. Bidirectional Fuzzy-Regression Model for Road-lines Detection,
          with Arya Mazumdar, N R Pal.
          IEEE International Conference on Engineering of Intelligent Systems (ICEIS), 2006.