Peter Petrov’s Weblog

var me = from practice in programming where practice.IsBestPractice && practice.UseLambda select practice.OptimalPerformance;

Parse string having comma separated integers August 1, 2011

Filed under: .NET Framework,C# — ppetrov @ 8:01 pm

I’ve read a blog post about splitting a string of comma-separated integers and turning the result into an array. The code is short and simple, but there’s a little problem: it uses the Split() method, so the result is an array, and potentially a big one. What if you only need the first 10 results out of ten thousand integers? You will do the work of splitting all ten thousand integers, consume a lot more memory than you need, and create a lot of garbage. I’ve written a class that solves these issues by parsing the integers lazily, one at a time.

using System;
using System.Collections.Generic;
using System.Text;

public class IntConverter
{
    private readonly char _separator;

    public IntConverter()
        : this(',')
    {
    }

    public IntConverter(char separator)
    {
        _separator = separator;
    }

    public IEnumerable<int> From(string input)
    {
        // Validate eagerly; the iterator below runs only when it is enumerated.
        if (input == null) throw new ArgumentNullException("input");

        return FromImplementation(input.Trim(), _separator);
    }

    public string To(IEnumerable<int> numbers)
    {
        if (numbers == null) throw new ArgumentNullException("numbers");

        var buffer = new StringBuilder();

        foreach (var n in numbers)
        {
            if (buffer.Length > 0)
            {
                buffer.Append(_separator);
            }
            buffer.Append(n.ToString());
        }

        return buffer.ToString();
    }

    private IEnumerable<int> FromImplementation(string input, char separator)
    {
        if (input == string.Empty)
        {
            yield break;
        }

        var buffer = new StringBuilder();

        // Index the string directly; ToCharArray() would copy the whole input.
        for (var i = 0; i < input.Length; i++)
        {
            var symbol = input[i];
            if (symbol == separator)
            {
                yield return int.Parse(buffer.ToString());
                buffer.Length = 0;
            }
            else
            {
                buffer.Append(symbol);
            }
        }

        // The last number has no trailing separator.
        yield return int.Parse(buffer.ToString());
    }
}

I wish there were an overload of int.Parse or int.TryParse which accepts a StringBuilder as input, to reduce the memory usage in this scenario even more.
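
Until such an overload exists, one way to avoid the intermediate strings is to accumulate the digits straight into an int. This is only a minimal sketch under stated assumptions: LazyIntParser and its Parse method are hypothetical names, it accepts only ASCII digits with an optional leading minus sign, and unlike the From method above it does not trim whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the IntConverter class.
public static class LazyIntParser
{
    public static IEnumerable<int> Parse(string input, char separator = ',')
    {
        // Validate eagerly; the iterator runs only when it is enumerated.
        if (input == null) throw new ArgumentNullException("input");
        return ParseIterator(input, separator);
    }

    private static IEnumerable<int> ParseIterator(string input, char separator)
    {
        var value = 0;          // accumulated as a negative number so int.MinValue fits
        var negative = false;
        var hasDigits = false;

        for (var i = 0; i <= input.Length; i++)
        {
            var atEnd = i == input.Length;   // the end of the input closes the last item
            if (atEnd || input[i] == separator)
            {
                if (!hasDigits)
                {
                    if (atEnd && i == 0) yield break;   // empty input: no items
                    throw new FormatException("Missing number before position " + i + ".");
                }
                yield return negative ? value : checked(-value);
                value = 0;
                negative = false;
                hasDigits = false;
                continue;
            }

            var c = input[i];
            if (c == '-' && !negative && !hasDigits)
            {
                negative = true;
            }
            else if (c >= '0' && c <= '9')
            {
                // checked: throw OverflowException instead of wrapping around.
                value = checked(value * 10 - (c - '0'));
                hasDigits = true;
            }
            else
            {
                throw new FormatException("Unexpected character at position " + i + ".");
            }
        }
    }
}
```

Because the parsing is lazy, LazyIntParser.Parse("1,-2,oops").Take(2) yields 1 and -2 and never reaches the invalid third item.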

 

ForEach method on IEnumerable January 22, 2009

Filed under: .NET Framework,C#,Extension methods — ppetrov @ 5:37 pm

I’ve noticed that when I use IEnumerable<T> I very often have a construction like this

.ToList().ForEach(x=> ...)

in my code. I’ve thought about it and realized that ToList() will create a (probably large) list just to iterate over it and perform some action. That’s only because IEnumerable<T> doesn’t have a ForEach method. Using an extension method I’m able to fill this gap. So here’s the ForEach method on IEnumerable<T>:

public static void ForEach<T>(this IEnumerable<T> values, Action<T> action)
{
    foreach (var v in values)
    {
        action(v);
    }
}

It’s possible to return the values reference to allow method chaining, but personally I think the foreach must be the last thing to perform; otherwise it will lead you to a bad practice. Let me explain what I mean. Imagine we have this Person class and a method that returns all Persons.

public static IEnumerable<Person> GetAllPersons()
{
    // ...
    yield break;
}
public class Person
{
    public long Fortune { get; private set; }

    public void SlowMethod()
    {
        Thread.Sleep(1000);
        // ...
    }
}

Suppose for a moment that ForEach returned the values to allow chaining, and take a look at the following two statements that retrieve all rich persons

var richPersonsSlow = GetAllPersons().ForEach(p => p.SlowMethod()).Where(p => p.Fortune > 1000000);
var richPersonsFast = GetAllPersons().Where(p => p.Fortune > 1000000).ForEach(p => p.SlowMethod());

Take this scenario: we have 3600 persons and only one has more than one million.

The first line will take an hour to return the millionaire.

The second line will take only one second to return the millionaire because p.SlowMethod will be called only on the millionaire.

There’s a little semantic difference in the example but I think you get the point. Using the ForEach returning void eliminates this possible issue.
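
The point can be checked with a small sketch. ForEachDemo and CountSlowCalls are hypothetical names, and a counter stands in for SlowMethod so the example runs instantly; the ForEach extension is the void-returning one from this post with null checks added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Returns void on purpose, so it can only be the last call in a chain.
    public static void ForEach<T>(this IEnumerable<T> values, Action<T> action)
    {
        if (values == null) throw new ArgumentNullException("values");
        if (action == null) throw new ArgumentNullException("action");

        foreach (var v in values)
        {
            action(v);
        }
    }
}

// Hypothetical demo type: the counter replaces the slow method.
public static class ForEachDemo
{
    public static int CountSlowCalls(IEnumerable<long> fortunes)
    {
        var calls = 0;

        // Filter first, then act: the action runs only on the items that pass.
        fortunes.Where(f => f > 1000000).ForEach(f => calls++);

        return calls;
    }
}
```

With the fortunes 10, 2000000 and 500 the action runs exactly once, for the single value above one million.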

I hope the BCL Team will include this method (not necessarily my implementation) in C# 4.0.

 

Useful methods – 11 of N – Copy/Move Directory September 12, 2008

Today I had to copy an entire directory, including all files, subdirectories and their files, to another directory. The .NET Framework doesn’t provide such a method (at least I don’t know of one), so I have coded one myself. Here’s the implementation of the CopyDirectory method.

        public static void CopyDirectory(string source, string destination)
        {
            // CreateDirectory does nothing if the directory already exists.
            Directory.CreateDirectory(destination);

            foreach (var e in Directory.GetFileSystemEntries(source))
            {
                // Path.Combine adds the separator only when it is missing.
                var target = Path.Combine(destination, Path.GetFileName(e));
                if (Directory.Exists(e))
                {
                    // Recurse into subdirectories.
                    CopyDirectory(e, target);
                }
                else
                {
                    // Overwrite files that already exist in the destination.
                    File.Copy(e, target, true);
                }
            }
        }

MoveDirectory is very simple once we have CopyDirectory. I know it’s not as fast as it could be (if we used File.Move()), but it still gets the job done.

        public static void MoveDirectory(string source, string destination)
        {
            CopyDirectory(source, destination);
            // Delete recursively; Directory.Delete(source) throws if the directory is not empty.
            Directory.Delete(source, true);
        }
 

Useful methods – 10 of N – Dictionary with unique values July 4, 2008

The .NET Dictionary<TKey, TValue> is a great class.

MSDN:

The Dictionary<TKey, TValue> generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary<TKey, TValue> class is implemented as a hash table.

I’m using this class very often. I’m not going to explain when to use a dictionary, an array, a sorted array, a tree or any other data structure; there are a lot of articles on this subject on the net. Instead I’ll extend the class and constrain the set of values to be unique, which will give us a one-to-one mapping. It’s a very simple thing to do.

public class UniqueValuesDictionary<TKey, TValue> : Dictionary<TKey, TValue>
{
    // Note: "new" only hides Dictionary.Add. The indexer, and Add called through a
    // Dictionary<TKey, TValue> or IDictionary<TKey, TValue> reference, bypass this check.
    public new void Add(TKey key, TValue value)
    {
        // ContainsValue is a linear search over all stored values.
        if (this.ContainsValue(value))
        {
            throw new ArgumentException("An element with the same value already exists.");
        }
        base.Add(key, value);
    }
}

And here’s an example which demonstrates how this class will help us find errors

            var lookup = new Dictionary<int, string>();
            lookup.Add(1, "One");
            lookup.Add(2, "One");   // No exception as expected

            var uniqueLookup = new UniqueValuesDictionary<int, string>();
            uniqueLookup.Add(1, "One");
            uniqueLookup.Add(2, "One"); // Here we'll get an exception
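
The uniqueness check in Add searches all stored values, and hiding Add with new does not cover the indexer. One possible alternative, sketched here under my own assumptions, wraps a dictionary instead of inheriting from it and tracks the values in a HashSet. OneToOneMap is a hypothetical name, and only a few members are shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: composition instead of inheritance, so every write goes through Add.
public class OneToOneMap<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _forward = new Dictionary<TKey, TValue>();
    private readonly HashSet<TValue> _values = new HashSet<TValue>();

    public int Count
    {
        get { return _forward.Count; }
    }

    public void Add(TKey key, TValue value)
    {
        // Hash lookup instead of a linear search over the values.
        if (_values.Contains(value))
        {
            throw new ArgumentException("An element with the same value already exists.");
        }

        // Dictionary.Add throws on a null or duplicate key before _values is touched.
        _forward.Add(key, value);
        _values.Add(value);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return _forward.TryGetValue(key, out value);
    }

    public bool Remove(TKey key)
    {
        TValue value;
        if (!_forward.TryGetValue(key, out value))
        {
            return false;
        }

        // Keep both collections in sync so the value can be reused later.
        _forward.Remove(key);
        _values.Remove(value);
        return true;
    }
}
```

The trade-off is extra memory for the second collection in exchange for a constant-time check on average.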
 

Useful methods – 9 of N – Count string occurrences July 2, 2008

One more extension method (hopefully useful) on String. If we want to count how many times a string contains another string, this method will help us.

    public static int CountOccurences(this string original, string value)
    {
        return original.CountOccurences(value, StringComparison.CurrentCulture);
    }

    public static int CountOccurences(this string original, string value, StringComparison comparisionType)
    {
        if (value == null) throw new ArgumentNullException("value");

        // Step over the whole match so that occurrences never overlap.
        return GenericCountOccurences(original, value, comparisionType, value.Length);
    }

    public static int CountOverlapOccurences(this string original, string value)
    {
        return GenericCountOccurences(original, value, StringComparison.CurrentCulture, 1);
    }

    public static int CountOverlapOccurences(this string original, string value, StringComparison comparisionType)
    {
        // Step by one character so that overlapping occurrences are counted too.
        return GenericCountOccurences(original, value, comparisionType, 1);
    }

    private static int GenericCountOccurences(string original, string value, StringComparison comparisionType, int step)
    {
        if (value == null) throw new ArgumentNullException("value");

        int occurences = 0;

        // An empty value matches at every index and, with a step of 0, the loop would never advance.
        if (!string.IsNullOrEmpty(original) && value.Length > 0)
        {
            int foundIndex = original.IndexOf(value, 0, comparisionType);
            while (foundIndex >= 0)
            {
                occurences++;
                foundIndex = original.IndexOf(value, foundIndex + step, comparisionType);
            }
        }

        return occurences;
    }

We have two versions – simple and overlapping.
When we use the simple version like this

            var input = "aaaa";
            var count = input.CountOccurences("aa");

we’ll have count = 2.
When we use the overlapping one on the same input

            var input = "aaaa";
            var count = input.CountOverlapOccurences("aa");

we’ll have count = 3.

That’s because in the overlapping version the search for the next occurrence begins one character after the start of the previous match, while in the simple version it begins after the end of the previous match.

Note: I’m sure my colleague and friend Vlado will appreciate this ;)

 

Useful method – 8 of N – String Capitalize First (ToTitleCase) June 30, 2008

I wanted to rename a lot of files, and I wanted the file names to follow my convention of capitalizing the first letter of every word. I couldn’t find such functionality in the string class. I googled and found the TextInfo class and its ToTitleCase method. It gets the job done and suits my needs perfectly.

Here’s how we can use it

TextInfo ti = Thread.CurrentThread.CurrentCulture.TextInfo;
string name = "petar petrov - XML developer";
string properName = ti.ToTitleCase(name);
// properName = "Petar Petrov - XML Developer"

Note that XML isn’t transformed to Xml, which is the correct behavior for me.
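
The flip side is that ToTitleCase treats every all-uppercase word as an acronym and leaves it alone, so an all-caps file name stays as it is unless it is lowercased first. The sketch below illustrates that; TitleCaseDemo, Capitalize and the lowerFirst parameter are hypothetical names, and the invariant culture is used only to keep the result independent of the machine settings.

```csharp
using System.Globalization;

// Hypothetical helper showing the effect of lowercasing before ToTitleCase.
public static class TitleCaseDemo
{
    public static string Capitalize(string text, bool lowerFirst)
    {
        TextInfo ti = CultureInfo.InvariantCulture.TextInfo;

        // Lowercasing first also turns acronyms such as "XML" into "Xml".
        return ti.ToTitleCase(lowerFirst ? ti.ToLower(text) : text);
    }
}
```

Capitalize("PETAR PETROV", false) returns the input unchanged, while Capitalize("PETAR PETROV", true) returns "Petar Petrov".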

 

Useful method – 7 of N – Ignore case on String.Replace() June 27, 2008

Filed under: .NET Framework,C# — ppetrov @ 3:51 pm

I’ve posted a method to determine whether a string contains another string, ignoring case. I’ve also looked through all the string methods and found that Replace isn’t available in an ignore-case variant, so I’ve created the missing overload (as an extension method).

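public static string Replace(this string original, string oldValue, string newValue, StringComparison comparisionType)
{
    if (original == null)
        throw new ArgumentNullException("original");
    if (oldValue == null)
        throw new ArgumentNullException("oldValue");
    if (oldValue.Length == 0)
        throw new ArgumentException("The value to replace cannot be empty.", "oldValue");
    if (newValue == null)
        throw new ArgumentNullException("newValue");

    var buffer = new StringBuilder();
    int lastIndex = 0;
    int index;

    // Resume each search after the previous match so that matches never overlap.
    // There is no shortcut for oldValue == newValue: with an ignore-case comparison
    // the matched text can still differ in case and must be replaced.
    while ((index = original.IndexOf(oldValue, lastIndex, comparisionType)) >= 0)
    {
        buffer.Append(original, lastIndex, index - lastIndex);
        buffer.Append(newValue);

        lastIndex = index + oldValue.Length;
    }
    // Copy whatever follows the last match.
    buffer.Append(original, lastIndex, original.Length - lastIndex);

    return buffer.ToString();
}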

UPDATE : I’ve updated the method to allow newValue to be string.Empty. It will perform like a remove method. WordPress source code posting is broken :(

 

Useful method – 6 of N – Ignore case on String.Contains()

Filed under: .NET Framework,C# — ppetrov @ 12:39 pm

The easiest way to see if a String contains another string is to use the method Contains().

The Remarks section of the documentation says:

This method performs an ordinal (case-sensitive and culture-insensitive) comparison. The search begins at the first character position of this string and continues through the last character position.

But what if we want a case-insensitive comparison? The answer is simple: we can use IndexOf(). The problem is that the name of the method isn’t as intuitive as Contains() (Contains calls IndexOf under the hood). For a little comfort I’ve wrapped the IndexOf() call in an extension method which determines whether a string contains another string, ignoring case.

public static bool Contains(this string original, string value, StringComparison comparisionType)
{
    return original.IndexOf(value, comparisionType) >= 0;
}

In fact we pass StringComparison as parameter so it’s easy to specify the culture and the case.

 

Don’t use .ToUpper() or .ToLower()

Filed under: .NET Framework,C# — ppetrov @ 11:36 am

In many code snippets and posts I’ve seen inappropriate use of the ToUpper() and ToLower() methods of the string class. It’s a well-known fact that string is immutable, so these methods return a modified copy of the original string, and calling them just to compare two strings allocates that copy for nothing.

A String object is called immutable (read-only) because its value cannot be modified once it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.

If we need to compare two strings ignoring case, we should use the static method Equals()


string.Equals("abc", "ABC", StringComparison.OrdinalIgnoreCase)

or the instance one.


"abc".Equals("ABC", StringComparison.OrdinalIgnoreCase)

StartsWith(), EndsWith(), IndexOf() etc. all provide a way to specify the StringComparison type.
An interesting method is Contains(): there’s no way to specify the StringComparison type or to say ignoreCase = true.
Actually it’s an alias for IndexOf(). In Reflector we can see the following implementation

public bool Contains(string value)
{
    return (this.IndexOf(value, StringComparison.Ordinal) >= 0);
}

To fill this gap I’ve decided to write an extension method with an additional parameter

        public static bool Contains(this string original, string value, StringComparison comparisionType)
        {
            return original.IndexOf(value, comparisionType) >= 0;
        }

Now we can use it like this

"abc".Contains("AB", StringComparison.OrdinalIgnoreCase)

I think it’s a useful method so I’ll post this extension method as a separate post.
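
For completeness, here is a small sketch of the built-in overloads mentioned above that already take a StringComparison, so no upper- or lowercased copy is created. IgnoreCaseDemo and its method names are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical helpers; each one wraps a built-in overload that takes a StringComparison.
public static class IgnoreCaseDemo
{
    public static bool HasExtension(string fileName, string extension)
    {
        return fileName.EndsWith(extension, StringComparison.OrdinalIgnoreCase);
    }

    public static bool IsHttpUrl(string url)
    {
        return url.StartsWith("http://", StringComparison.OrdinalIgnoreCase);
    }

    public static int IndexOfIgnoreCase(string text, string value)
    {
        return text.IndexOf(value, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, HasExtension("PHOTO.JPG", ".jpg") is true and IndexOfIgnoreCase("Hello World", "WORLD") is 6.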

 

Useful method – 5 of N – Format/Beautify XML June 24, 2008

Filed under: C#,XML — ppetrov @ 7:27 pm

Here’s a method to format an XML string or an XML file.

    public static string Format(string xmlContents)
    {
        StringBuilder buffer = new StringBuilder();

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlContents);

        using (var writer = XmlWriter.Create(buffer, new XmlWriterSettings() { Indent = true }))
        {
            doc.Save(writer);
        }

        return buffer.ToString();
    }

    public static void FormatFile(string inputFile)
    {
        FormatFile(inputFile, inputFile);
    }

    public static void FormatFile(string inputFile, string outputFile)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(inputFile);

        using (var writer = XmlWriter.Create(outputFile, new XmlWriterSettings() { Indent = true }))
        {
            doc.Save(writer);
        }
    }

I’ve tried to optimize these methods by replacing the XmlDocument with XPathDocument. The documentation of XPathDocument says:

Provides a fast, read-only, in-memory representation of an XML document using the XPath data model.

My first thought was that the read-only nature of XPathDocument would speed up my code, so I ended up with this method.

    public static string FormatUsingXPath(string xmlContents)
    {
        StringBuilder buffer = new StringBuilder();

        using (var writer = XmlWriter.Create(buffer, new XmlWriterSettings() { Indent = true }))
        {
            using (XmlTextReader reader = new XmlTextReader(xmlContents, XmlNodeType.Document, null))
            {
                XPathDocument doc = new XPathDocument(reader);
                writer.WriteNode(doc.CreateNavigator(), false);
            }
        }

        return buffer.ToString();
    }

Unfortunately the XPath version is 10% slower. I think the difference comes from the creation of the XmlTextReader.

If we apply the method on the following unformatted XML


<?xml version="1.0"?>
<catalog>
 <book id="bk101">
 <author>Gambardella, Matthew</author><title>XML Developer's Guide</title>

 <genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
 <description>An in-depth look at creating applications
 with XML.</description>

 </book>
</catalog>

we will receive a well-formatted version of our input XML.

Unfortunately there’s a problem with wordpress.com XML formatting and the well-formatted XML isn’t shown as expected.
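
Another option, which was not measured in the comparison above, is LINQ to XML: XDocument.Parse drops insignificant whitespace by default and ToString() returns indented XML. This is only a sketch; LinqXmlFormatter is a hypothetical name, and ToString() omits the XML declaration, so it differs slightly from the methods above.

```csharp
using System.Xml.Linq;

// Hypothetical alternative based on LINQ to XML (System.Xml.Linq).
public static class LinqXmlFormatter
{
    public static string Format(string xmlContents)
    {
        // Parse discards insignificant whitespace; ToString() writes the tree
        // indented and without the XML declaration.
        return XDocument.Parse(xmlContents).ToString();
    }
}
```

Calling LinqXmlFormatter.Format("<a><b>1</b></a>") puts the b element on its own line, indented by two spaces.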

 

 