Retrieving data in record containing "" [message #33644] Sat, 03 September 2011 16:47
forlano
Messages: 1185
Registered: March 2006
Location: Italy
Senior Contributor
Hello,

I have records like this

"81116","180","","Lüigi,MacFarlane,Dr.","M","D","1948 ","201113","2577","173","2593","GM","4600010","GER "

I need to retrieve the data separated by ','. Unfortunately, this character is also used inside column 4, which causes problems for Split(). Column 4 sometimes contains one ',' and sometimes two.
Is there a special method for processing such records?

Thanks,
Luigi
Re: Retrieving data in record containing "" [message #33645 is a reply to message #33644] Sat, 03 September 2011 19:16
tojocky
Messages: 607
Registered: April 2008
Location: UK
Contributor

Hi,

It is quite simple:
0. If the current char is a newline, start a new record.
1. Check whether the next char is ".
2. If it is ", read the value up to the next ", skip the following ',' and repeat from 1.
3. If it is not ", read the value up to the next ',', skip it and repeat from 1.

In your example, step 3 is never used.


Hope this helps.
Ion.
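
The steps above could be sketched roughly as follows. This is a minimal illustration in standard C++ rather than U++ types, so that it compiles on its own; the function name SplitQuotedRecord is hypothetical, it handles a single record (step 0 is left to the caller), and it assumes fields never contain a doubled quote ("").

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of U++): splits one CSV record by
// following steps 1-3 above. Doubled quotes ("") inside a field are
// NOT handled.
std::vector<std::string> SplitQuotedRecord(const std::string& line)
{
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (pos <= line.size()) {
        std::string value;
        if (pos < line.size() && line[pos] == '"') {
            // Step 2: quoted field, take everything up to the closing quote.
            std::size_t close = line.find('"', pos + 1);
            if (close == std::string::npos)
                close = line.size();   // unterminated quote: take the rest
            value = line.substr(pos + 1, close - pos - 1);
            // Move to the separator that follows the closing quote, if any.
            pos = line.find(',', close < line.size() ? close + 1 : close);
        }
        else {
            // Step 3: unquoted field, take everything up to the next comma.
            std::size_t comma = line.find(',', pos);
            value = line.substr(pos, comma == std::string::npos
                                         ? std::string::npos
                                         : comma - pos);
            pos = comma;
        }
        fields.push_back(value);
        if (pos == std::string::npos)
            break;                     // no more separators: record is done
        ++pos;                         // skip the ','
    }
    return fields;
}
```

Called on the record from the first post, this should return 14 fields, with the commas inside column 4 preserved.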

Re: Retrieving data in record containing "" [message #33646 is a reply to message #33645] Sat, 03 September 2011 19:52
forlano
Messages: 1185
Registered: March 2006
Location: Italy
Senior Contributor
Hi Ion,

I was hoping for something at the level of Split().
What I did was replace each '",' with '|' to obtain a string suitable for Split(), then remove the remaining '"' from each substring.
Perhaps your solution is better.

Luigi
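
For comparison, the replace-then-split workaround described in this post might look like the sketch below. It uses standard C++ instead of U++'s Split() so that it is self-contained; the name SplitByReplacing is hypothetical, and it assumes that no field contains '|' or an embedded '"'.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: replace every '",' pair with '|', split on '|',
// then drop the quotes left in each piece.
// Assumes no field contains '|' or an embedded '"'.
std::vector<std::string> SplitByReplacing(std::string line)
{
    // 1. Replace each two-character sequence ", with a single |
    for (std::size_t p = line.find("\","); p != std::string::npos;
         p = line.find("\",", p + 1))
        line.replace(p, 2, "|");

    // 2. Split on '|'
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;) {
        std::size_t bar = line.find('|', start);
        std::string piece = line.substr(
            start, bar == std::string::npos ? std::string::npos : bar - start);
        // 3. Remove the quotes that remain in the piece
        piece.erase(std::remove(piece.begin(), piece.end(), '"'), piece.end());
        fields.push_back(piece);
        if (bar == std::string::npos)
            break;
        start = bar + 1;
    }
    return fields;
}
```

The approach is short, but it breaks as soon as the data itself contains the chosen replacement character, which is why a character-by-character parser is usually the safer choice.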
Re: Retrieving data in record containing "" [message #33649 is a reply to message #33646] Sat, 03 September 2011 21:29
sergeynikitin
Messages: 748
Registered: January 2008
Location: Moscow, Russia
Contributor

This is very slow. I recommend using regular expressions (package plugin/pcre).

CSV-parser (regex pattern below)
^(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)$

or



And in other words:
http://www.google.com/search?client=ubuntu&channel=fs&q=retrieving+CVS+regular+expression&ie=utf-8&oe=utf-8


SergeyNikitin<U++>( linux, wine )
{
    under( Ubuntu || Debian || Raspbian );
}
Re: Retrieving data in record containing "" [message #33653 is a reply to message #33649] Sun, 04 September 2011 10:19
forlano
Messages: 1185
Registered: March 2006
Location: Italy
Senior Contributor
sergeynikitin wrote on Sat, 03 September 2011 21:29

This is very slow. I recommend using regular expressions (package plugin/pcre).



Hi Sergey,

I agree it is slow. I tried your pattern:
#include <Core/Core.h>
#include <plugin/pcre/pcre.h>

using namespace Upp;

CONSOLE_APP_MAIN
{
	String ss, s = LoadFile("in.txt");
	String p1 = "^((\"(?:[^\"]|\"\")*\"|[^,]*)(,(\"(?:[^\"]|\"\")*\"|[^,]*))*)$";
	RegExp r0(p1);
	r0.Match(s);
	Vector<String> field = r0.GetStrings();
	for (int i = 0; i < field.GetCount(); i++) {
		ss << field[i] << "\n";
	}
	SaveFile("out.txt", ss);
}

and in.txt is
"Luigi", "Ivan, Petrovic", "43434"
but it gives me weird output. I would like to get the data inside the quotes:

Luigi
Ivan, Petrovic
43434

Perhaps I need to modify the pattern, as I have 14 fields:
http://www.kimgentes.com/worshiptech-web-tools-page/2008/10/14/regex-pattern-for-parsing-csv-files-with-embedded-commas-dou.html

Luigi
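
A likely reason for the unexpected output: in PCRE, as in most regex engines, a capture group inside a repeated group keeps only the text of its last iteration, so one whole-line match cannot return one capture per field. An alternative is to match one field at a time and iterate. The sketch below illustrates that idea with std::regex from the C++ standard library rather than the U++ pcre plugin, which is an assumption made to keep the example self-contained. The name RegexCsvFields and the per-field pattern are hypothetical, and doubled quotes ("") are left as they are in the result.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper using std::regex (not the U++ pcre plugin).
// Each match consumes ONE field: start of line or a comma, optional
// spaces, then a quoted value (group 1) or an unquoted value (group 2).
std::vector<std::string> RegexCsvFields(const std::string& line)
{
    static const std::regex field(
        R"rx((?:^|,)\s*(?:"((?:[^"]|"")*)"|([^,]*)))rx");

    std::vector<std::string> out;
    for (std::sregex_iterator it(line.begin(), line.end(), field), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Prefer the quoted capture; fall back to the unquoted one.
        out.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return out;
}
```

With the in.txt line above, this should produce the three values listed in the post, without the surrounding quotes.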

[Updated on: Sun, 04 September 2011 10:31]

Re: Retrieving data in record containing "" [message #33670 is a reply to message #33644] Mon, 05 September 2011 11:19
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
If those strings are C-like, use CParser.

If it is CSV, I happen to have some routines (not in the U++ sources) for that as well...

Mirek
Re: Retrieving data in record containing "" [message #33674 is a reply to message #33670] Mon, 05 September 2011 12:52
forlano
Messages: 1185
Registered: March 2006
Location: Italy
Senior Contributor
Yes, it is a CSV file (the same one that caused problems with accented characters).
Luigi
Re: Retrieving data in record containing "" [message #33679 is a reply to message #33674] Mon, 05 September 2011 14:03
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
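Currently I am using this to parse CSV:

// Reads one CSV record from 's' and returns its fields,
// converted from 'charset' to UTF-8.
Vector<String> ReadCsvLine(Stream& s, int separator, byte charset)
{
	Vector<String> r;
	bool instring = false; // true while inside a quoted field
	String val;
	for(;;) {
		int c = s.Get();
		if(c == '\n' || c < 0) { // end of line or end of stream closes the record
			r.Add(ToCharset(CHARSET_UTF8, val, charset));
			return r;
		}
		else
		if(c == separator && !instring) { // separator outside quotes ends the field
			r.Add(ToCharset(CHARSET_UTF8, val, charset));
			val.Clear();
		}
		else
		if(c == '\"') {
			if(instring && s.Term() == '\"') { // doubled quote inside a field is a literal "
				s.Get();
				val.Cat('\"');
			}
			else
				instring = !instring; // opening or closing quote
		}
		else
		if(c != '\r') // ignore carriage returns
			val.Cat(c);
	}
}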